Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 130 seconds, but the predict time varies significantly with the inputs.



Implementation of the Vchitect/SEINE Image-To-Video model


title={SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction},
author={Chen, Xinyuan and Wang, Yaohui and Zhang, Lingjun and Zhuang, Shaobin and Ma, Xin and Yu, Jiashuo and Wang, Yali and Lin, Dahua and Qiao, Yu and Liu, Ziwei},
journal={arXiv preprint arXiv:2310.20700},


We disclaim responsibility for user-generated content. The model was not trained to realistically represent people or events, so using it to generate such content is beyond the model’s capabilities. Using the model to generate pornographic, violent, or gory content, or content that is demeaning or harmful to people or their environment, culture, religion, etc., is prohibited. Users are solely liable for their actions. The project contributors are not legally affiliated with, nor accountable for, users’ behaviors. Use the generative model responsibly, adhering to ethical and legal standards.

Contact Us

Xinyuan Chen: Yaohui Wang:


The code is built upon diffusers and Stable Diffusion; we thank all the contributors for open-sourcing their work.


The code is licensed under Apache-2.0. The model weights are fully open for academic research and also allow free commercial usage.