cjwbw / lavie

High-Quality Video Generation with Cascaded Latent Diffusion Models

  • Public
  • 12.6K runs
  • GitHub
  • Paper
  • License

Run time and cost

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 62 seconds, though the predict time varies significantly depending on the inputs.

Readme

LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models

LaVie is a Text-to-Video (T2V) generation framework and the main part of the video generation system Vchitect.
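The hosted model can be called from Python with the Replicate client. The snippet below is a minimal sketch, not the official usage: the "prompt" field name and the example prompt are assumptions, and a specific version hash may be required; check the model's API tab for the exact input schema.

import replicate

# Minimal sketch: run the hosted model with a text prompt.
# The "prompt" field name is an assumption; consult the model's API schema.
output = replicate.run(
    "cjwbw/lavie",  # a ":<version-hash>" suffix may be needed to pin a version
    input={"prompt": "a teddy bear walking on the beach, cinematic lighting"},
)
print(output)  # typically a URL pointing to the generated video file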

BibTeX

@article{wang2023lavie,
  title={LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models},
  author={Wang, Yaohui and Chen, Xinyuan and Ma, Xin and Zhou, Shangchen and Huang, Ziqi and Wang, Yi and Yang, Ceyuan and He, Yinan and Yu, Jiashuo and Yang, Peiqing and others},
  journal={arXiv preprint arXiv:2309.15103},
  year={2023}
}

Disclaimer

We disclaim responsibility for user-generated content. The model was not trained to realistically represent people or events, so using it to generate such content is beyond its capabilities. It is prohibited to use the model to generate pornographic, violent, or gory content, or content that is demeaning or harmful to people or to their environment, culture, or religion. Users are solely liable for their actions. The project contributors are not legally affiliated with, nor accountable for, users' behavior. Please use the generative model responsibly and adhere to ethical and legal standards.

Contact Us

Yaohui Wang: wangyaohui@pjlab.org.cn
Xinyuan Chen: chenxinyuan@pjlab.org.cn
Xin Ma: xin.ma1@monash.edu

Acknowledgements

The code is built upon diffusers and Stable Diffusion; we thank all of the contributors for open-sourcing their work.

License

The code is licensed under Apache-2.0. The model weights are fully open for academic research and also allow free commercial use. To apply for a commercial license, please contact vchitect@pjlab.org.cn.