ali-vilab / i2vgen-xl

RESEARCH/NON-COMMERCIAL USE ONLY: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models

  • Public
  • 54.2K runs
  • GitHub
  • Paper

Input

Output

Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 4 minutes. The predict time for this model varies significantly based on the inputs.

Readme

VGen

figure1

VGen is an open-source video synthesis codebase developed by the Tongyi Lab of Alibaba Group, featuring state-of-the-art video generative models. This repository includes implementations of the following methods:

I2VGen-xl: High-quality image-to-video synthesis via cascaded diffusion models

BibTeX

If this repo is useful to you, please cite our corresponding technical paper.

@article{2023i2vgenxl,
  title={I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models},
  author={Zhang, Shiwei and Wang, Jiayu and Zhang, Yingya and Zhao, Kang and Yuan, Hangjie and Qing, Zhiwu and Wang, Xiang  and Zhao, Deli and Zhou, Jingren},
  booktitle={arXiv preprint arXiv:2311.04145},
  year={2023}
}

Acknowledgement

We would like to express our gratitude for the contributions of several previous works to the development of VGen. This includes, but is not limited to Composer, ModelScopeT2V, Stable Diffusion, OpenCLIP, WebVid-10M, LAION-400M, Pidinet and MiDaS. We are committed to building upon these foundations in a way that respects their original contributions.

Disclaimer

This open-source model is trained with using WebVid-10M and LAION-400M datasets and is intended for RESEARCH/NON-COMMERCIAL USE ONLY.