cjwbw / videocrafter

VideoCrafter2: Text-to-Video and Image-to-Video Generation and Editing

  • Public
  • 33.7K runs

Run time and cost

This model costs approximately $0.13 to run on Replicate, or 7 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 92 seconds. The predict time for this model varies significantly based on the inputs.
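As a rough illustration of the arithmetic behind these figures, the sketch below estimates how many runs fit in a budget and how long a batch might take. It assumes the approximate $0.13 per run and 92 seconds per prediction quoted above hold for every run, which they will not exactly, since both vary with the inputs. The function names are hypothetical and not part of any Replicate API.

```python
import math

# Approximate figures quoted on this page; both vary with the inputs.
COST_PER_RUN_USD = 0.13
TYPICAL_SECONDS_PER_RUN = 92


def runs_per_budget(budget_usd, cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs affordable for a budget (hypothetical helper)."""
    # The small epsilon guards against floating-point results such as 6.999999.
    return math.floor(budget_usd / cost_per_run + 1e-9)


def estimate_batch(n_runs, cost_per_run=COST_PER_RUN_USD,
                   seconds_per_run=TYPICAL_SECONDS_PER_RUN):
    """Return (estimated cost in USD, estimated sequential run time in seconds)."""
    return round(n_runs * cost_per_run, 2), n_runs * seconds_per_run


if __name__ == "__main__":
    print(runs_per_budget(1.00))   # runs affordable for $1
    print(estimate_batch(10))      # cost and time for 10 sequential runs
```

Treat the results as a planning estimate only; the actual bill depends on the measured prediction time of each run.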

Readme

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

🔆 Introduction

🤗🤗🤗 VideoCrafter is an open-source video generation and editing toolbox for crafting video content.
It currently includes the Text2Video and Image2Video models:

1. Generic Text-to-video Generation

Example prompts:

  • "A girl is looking at the camera smiling. High Definition."
  • "an astronaut running away from a dust storm on the surface of the moon, the astronaut is running towards the camera, cinematic"
  • "A giant spaceship is landing on mars in the sunset. High Definition."
  • "A blue unicorn flying over a mystical land"

2. Generic Image-to-video Generation

Example prompts:

  • "a black swan swims on the pond"
  • "a girl is riding a horse fast on grassland"
  • "a boy sits on a chair facing the sea"
  • "two galleons moving in the wind at sunset"

😉 Citation

The technical report is currently unavailable as it is still in preparation. In the meantime, you can cite the papers for our image-to-video model and the related base models.

@misc{chen2024videocrafter2,
      title={VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models}, 
      author={Haoxin Chen and Yong Zhang and Xiaodong Cun and Menghan Xia and Xintao Wang and Chao Weng and Ying Shan},
      year={2024},
      eprint={2401.09047},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{chen2023videocrafter1,
      title={VideoCrafter1: Open Diffusion Models for High-Quality Video Generation}, 
      author={Haoxin Chen and Menghan Xia and Yingqing He and Yong Zhang and Xiaodong Cun and Shaoshu Yang and Jinbo Xing and Yaofang Liu and Qifeng Chen and Xintao Wang and Chao Weng and Ying Shan},
      year={2023},
      eprint={2310.19512},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@article{xing2023dynamicrafter,
      title={DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors}, 
      author={Jinbo Xing and Menghan Xia and Yong Zhang and Haoxin Chen and Xintao Wang and Tien-Tsin Wong and Ying Shan},
      year={2023},
      eprint={2310.12190},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@article{he2022lvdm,
      title={Latent Video Diffusion Models for High-Fidelity Long Video Generation}, 
      author={Yingqing He and Tianyu Yang and Yong Zhang and Ying Shan and Qifeng Chen},
      year={2022},
      eprint={2211.13221},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

🤗 Acknowledgements

Our codebase builds on Stable Diffusion. Thanks to the authors for sharing their awesome codebase!

📢 Disclaimer

We develop this repository for RESEARCH purposes, so it can only be used for personal/research/non-commercial purposes.