Autoregressive Video Generation without Vector Quantization
This is the text2image demo; see the text2video demo here.
We present NOVA (NOn-Quantized Video Autoregressive Model), a model that enables efficient autoregressive image and video generation. NOVA reformulates video generation as non-quantized autoregressive modeling that combines temporal frame-by-frame prediction with spatial set-by-set prediction. NOVA generalizes well and supports diverse zero-shot generation tasks within a single unified model.
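The two-level ordering described above (frames predicted one after another in time, and tokens within each frame predicted in sets) can be pictured with a toy loop. This is a minimal sketch under stated assumptions, not NOVA's implementation: `generate_video`, `toy_predict`, the random spatial order, and the near-equal split into sets are all hypothetical. A real model would replace `toy_predict` with a network that outputs continuous (non-quantized) token values.

```python
import random
from typing import Callable, Dict, List, Sequence

Token = List[float]   # one continuous (non-quantized) token vector
Frame = List[Token]   # all spatial tokens of one frame, in raster order

# Hypothetical predictor signature: (earlier frames, tokens already generated in
# the current frame, indices to fill now) -> one continuous vector per index.
Predictor = Callable[[List[Frame], Dict[int, Token], Sequence[int]], List[Token]]


def generate_video(num_frames: int, tokens_per_frame: int, num_sets: int,
                   predict: Predictor, seed: int = 0) -> List[Frame]:
    """Toy two-level autoregressive loop: frames in temporal order, and
    within each frame, spatial tokens filled in one set at a time."""
    rng = random.Random(seed)
    frames: List[Frame] = []
    for _ in range(num_frames):
        # Temporal level: the new frame may condition on every earlier frame.
        known: Dict[int, Token] = {}
        # Spatial level: visit token positions in a random order (an assumption)
        # and split that order into num_sets contiguous, near-equal sets.
        order = list(range(tokens_per_frame))
        rng.shuffle(order)
        for s in range(num_sets):
            start = s * tokens_per_frame // num_sets
            stop = (s + 1) * tokens_per_frame // num_sets
            idx = order[start:stop]
            # All tokens in a set are predicted together, conditioned on the
            # earlier frames and on the sets already filled in this frame.
            for i, value in zip(idx, predict(frames, known, idx)):
                known[i] = value
        frames.append([known[i] for i in range(tokens_per_frame)])
    return frames


def toy_predict(prev_frames: List[Frame], known: Dict[int, Token],
                idx: Sequence[int]) -> List[Token]:
    """Hypothetical stand-in for a learned model: the first frame is all zeros,
    and each later frame adds 1.0 to the previous frame's token values."""
    if not prev_frames:
        return [[0.0, 0.0] for _ in idx]
    last = prev_frames[-1]
    return [[x + 1.0 for x in last[i]] for i in idx]


video = generate_video(num_frames=3, tokens_per_frame=8, num_sets=4, predict=toy_predict)
print(len(video), len(video[0]))  # 3 8
print(video[2][5])                # [2.0, 2.0]
```

Swapping `toy_predict` for a learned network leaves the loop unchanged; the sketch only illustrates the ordering, with frames generated causally in time and groups of tokens generated together within a frame.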
✨Highlights
- 🔥 Novel Approach: Non-quantized video autoregressive generation.
- 🔥 State-of-the-art Performance: High efficiency with state-of-the-art text-to-image (t2i) and text-to-video (t2v) results.
- 🔥 Unified Modeling: Multi-task capabilities in a single unified model.
Citation
If you find this repository useful, please consider giving a star ⭐ and citation 🦖:
@article{deng2024nova,
title={Autoregressive Video Generation without Vector Quantization},
author={Deng, Haoge and Pan, Ting and Diao, Haiwen and Luo, Zhengxiong and Cui, Yufeng and Lu, Huchuan and Shan, Shiguang and Qi, Yonggang and Wang, Xinlong},
journal={arXiv preprint arXiv:2412.14169},
year={2024}
}
Acknowledgement
We thank the authors of the following repositories: MAE, MAR, MaskGIT, DiT, Open-Sora-Plan, CogVideo, and CodeWithGPU.
License
Code and models are licensed under the Apache License 2.0.