cjwbw / controlvideo

Training-free Controllable Text-to-Video Generation

  • Public
  • 2.4K runs
  • A100 (80GB)

Input

string

Text description of target video

Default: "A striking mallard floats effortlessly on the sparkling pond."

file (required)

Source video

string

Condition of structure sequence

Default: "depth"

integer

Length of synthesized video

Default: 15

string

Timesteps at which to apply the interleaved-frame smoother, separated by commas

Default: "19, 20"

boolean

Whether to use the hierarchical sampler to produce a long video

Default: false

integer

Number of denoising steps

Default: 50

number
(minimum: 1, maximum: 20)

Scale for classifier-free guidance

Default: 12.5

string

Random seed. Leave blank to randomize the seed
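
The inputs listed above can be gathered into a single payload and checked before a run is submitted. The following is a minimal sketch, not the model's actual schema: the listing gives only types, descriptions, and defaults, so every key name here (`source_video`, `long_video`, `denoising_steps`, and so on) is a hypothetical label chosen for illustration. Only the default values and the 1 to 20 range for the guidance scale are taken from the listing, and treating the source video as mandatory is an assumption.

```python
from __future__ import annotations

from typing import Any

# NOTE: all key names below are hypothetical; the listing above documents
# types, descriptions, and defaults, but not the parameter names.


def parse_smoother_steps(text: str) -> list[int]:
    """Parse a comma-separated string such as "19, 20" into integers."""
    return [int(part) for part in text.split(",") if part.strip()]


def build_inputs(
    source_video: str,
    prompt: str = "A striking mallard floats effortlessly on the sparkling pond.",
    condition: str = "depth",
    video_length: int = 15,
    smoother_steps: str = "19, 20",
    long_video: bool = False,
    denoising_steps: int = 50,
    guidance_scale: float = 12.5,
    seed: str = "",
) -> dict[str, Any]:
    """Collect the inputs into one dict, rejecting obviously invalid values."""
    if not source_video:
        # Assumption: the source video is a mandatory input.
        raise ValueError("a source video is required")
    if not 1 <= guidance_scale <= 20:
        raise ValueError("guidance_scale must be between 1 and 20")
    if video_length < 1 or denoising_steps < 1:
        raise ValueError("video_length and denoising_steps must be positive")
    parse_smoother_steps(smoother_steps)  # raises ValueError on non-integers

    inputs: dict[str, Any] = {
        "source_video": source_video,
        "prompt": prompt,
        "condition": condition,
        "video_length": video_length,
        "smoother_steps": smoother_steps,
        "long_video": long_video,
        "denoising_steps": denoising_steps,
        "guidance_scale": guidance_scale,
    }
    # A blank seed is left out so that the seed is randomized.
    if seed.strip():
        inputs["seed"] = str(int(seed))  # int() rejects non-numeric seeds
    return inputs


if __name__ == "__main__":
    payload = build_inputs("clip.mp4", seed="42")
    print(parse_smoother_steps(payload["smoother_steps"]))
```

Validating locally keeps a malformed smoother string or an out-of-range guidance scale from costing a failed run; the real parameter names should be taken from the model's published input schema before any of this is used.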

Run time and cost

This model costs approximately $0.016 to run on Replicate, or 62 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 12 seconds.
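
The per-run cost and the runs-per-dollar figure quoted above are two views of the same estimate. As a rough sanity check, the sketch below reproduces that arithmetic using only the approximate $0.016 figure; the function names are illustrative, and actual cost varies with the inputs.

```python
import math

COST_PER_RUN_USD = 0.016  # approximate per-run estimate quoted above


def runs_per_budget(budget_usd: float, cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Whole runs that fit in a budget at the estimated per-run cost."""
    # The small epsilon guards against floating-point results such as 9.999999.
    return math.floor(budget_usd / cost_per_run + 1e-9)


def estimated_cost(runs: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Estimated total cost in USD for a number of runs."""
    return runs * cost_per_run


if __name__ == "__main__":
    print(runs_per_budget(1.0))           # 1 / 0.016 = 62.5, so 62 whole runs
    print(round(estimated_cost(100), 2))  # 100 runs at the estimated rate
```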

Readme

ControlVideo

Official PyTorch implementation of “ControlVideo: Training-free Controllable Text-to-Video Generation”


ControlVideo adapts ControlNet to video generation without any fine-tuning, aiming to directly inherit its high-quality and consistent generation.

Citation

If you make use of our work, please cite our paper.

@article{zhang2023controlvideo,
  title={ControlVideo: Training-free Controllable Text-to-Video Generation},
  author={Zhang, Yabo and Wei, Yuxiang and Jiang, Dongsheng and Zhang, Xiaopeng and Zuo, Wangmeng and Tian, Qi},
  journal={arXiv preprint arXiv:2305.13077},
  year={2023}
}

Acknowledgement

This repository borrows heavily from Diffusers, ControlNet, Tune-A-Video, and RIFE.

There are also many interesting works on video generation, including Tune-A-Video, Text2Video-Zero, Follow-Your-Pose, and Control-A-Video, among others.