chenxwh / lavie

High-Quality Video Generation with Cascaded Latent Diffusion Models



LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models

LaVie is a text-to-video (T2V) generation framework and a main component of the Vchitect video generation system.


  title={LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models},
  author={Wang, Yaohui and Chen, Xinyuan and Ma, Xin and Zhou, Shangchen and Huang, Ziqi and Wang, Yi and Yang, Ceyuan and He, Yinan and Yu, Jiashuo and Yang, Peiqing and others},
  journal={arXiv preprint arXiv:2309.15103},


We disclaim responsibility for user-generated content. The model was not trained to realistically represent people or events, so using it to generate such content is beyond the model’s capabilities. Generating pornographic, violent, or bloody content is prohibited, as is generating content that is demeaning or harmful to people or their environment, culture, religion, etc. Users are solely liable for their actions. The project contributors are not legally affiliated with, nor accountable for, users’ behaviors. Use the generative model responsibly, adhering to ethical and legal standards.

Contact Us

Yaohui Wang:
Xinyuan Chen:
Xin Ma:


The code is built upon diffusers and Stable Diffusion; we thank all the contributors for open-sourcing their work.


The code is licensed under Apache-2.0. The model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please contact