cjwbw / text2video-zero

Text-to-Image Diffusion Models are Zero-Shot Video Generators

  • Public
  • 41.7K runs
  • A100 (80GB)
  • GitHub
  • Paper
  • License

Input

string

Choose your model; it should be available on Hugging Face (HF).

Default: "dreamlike-art/dreamlike-photoreal-2.0"

string

Input Prompt

Default: "A horse galloping on a street"

string

Negative Prompt

Default: ""

integer

Perform DDPM steps from t0 to t1. The larger the gap between t0 and t1, the more variance between the frames. Ensure t0 < t1

Default: 44

integer

Perform DDPM steps from t0 to t1. The larger the gap between t0 and t1, the more variance between the frames. Ensure t0 < t1

Default: 47

integer
(minimum: -20, maximum: 20)

Default: 12

integer
(minimum: -20, maximum: 20)

Default: 12

integer

Video length in seconds

Default: 20

integer

Video frames per second

Default: 4

integer

Random seed. Leave blank to randomize the seed
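
The inputs above can be assembled into a request via the Replicate Python client. Note this is a sketch: the parameter names below (`model_name`, `t0`, `t1`, `video_length`, `fps`, etc.) are inferred from the descriptions on this page and may differ from the model's actual schema, so check the API tab before relying on them.

```python
# Sketch of invoking cjwbw/text2video-zero through the Replicate Python
# client. Parameter names are ASSUMED from the input descriptions above.
inputs = {
    "model_name": "dreamlike-art/dreamlike-photoreal-2.0",  # any HF diffusion model
    "prompt": "A horse galloping on a street",
    "negative_prompt": "",
    "t0": 44,            # DDPM start step (assumed name)
    "t1": 47,            # DDPM end step; must satisfy t0 < t1 (assumed name)
    "video_length": 20,  # seconds
    "fps": 4,
    # "seed" omitted -> randomized, per the description above
}
assert inputs["t0"] < inputs["t1"]  # the page requires t0 < t1

# Requires `pip install replicate` and REPLICATE_API_TOKEN in the environment:
# import replicate
# output = replicate.run("cjwbw/text2video-zero", input=inputs)
```

A larger gap between `t0` and `t1` increases frame-to-frame variance, so widen it for more motion and narrow it for smoother, more static clips.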

Output


Run time and cost

This model costs approximately $0.14 to run on Replicate, or 7 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 99 seconds. The predict time for this model varies significantly based on the inputs.
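The quoted figures are easy to sanity-check: dividing the approximate per-run cost by the typical runtime gives the implied per-second GPU rate, and inverting the per-run cost gives the runs-per-dollar figure. A quick back-of-the-envelope check using only the numbers on this page:

```python
# Sanity-check the pricing figures quoted above.
cost_per_run = 0.14      # USD, approximate, per this page
typical_runtime_s = 99   # typical prediction time on A100 (80GB)

per_second = cost_per_run / typical_runtime_s
runs_per_dollar = 1 / cost_per_run

print(f"~${per_second:.4f}/s, ~{runs_per_dollar:.0f} runs per $1")
# -> ~$0.0014/s, ~7 runs per $1
```

Since predict time varies significantly with the inputs (longer videos and higher FPS mean more frames to denoise), actual cost per run scales accordingly.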

Readme

Text2Video-Zero

Official code for Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
Levon Khachatryan, Andranik Movsisyan, Vahram Tadevosyan, Roberto Henschel, Zhangyang Wang, Shant Navasardyan, Humphrey Shi


Our method, Text2Video-Zero, enables zero-shot video generation using (i) a textual prompt (see rows 1, 2), (ii) a prompt combined with guidance from poses or edges (see lower right), and (iii) Video Instruct-Pix2Pix, i.e., instruction-guided video editing (see lower left). Results are temporally consistent and closely follow the guidance and textual prompts.

License

The code is published under the CreativeML Open RAIL-M license. The license provided in this repository applies to all additions and contributions we make to the original Stable Diffusion code. The original Stable Diffusion code is under the CreativeML Open RAIL-M license, which can be found here.

BibTeX

If you use our work in your research, please cite our publication:

@article{text2video-zero,
    title={Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators},
    author={Khachatryan, Levon and Movsisyan, Andranik and Tadevosyan, Vahram and Henschel, Roberto and Wang, Zhangyang and Navasardyan, Shant and Shi, Humphrey},
    journal={arXiv preprint arXiv:2303.13439},
    year={2023}
}