chenxwh / nova-t2i

Autoregressive Image Generation without Vector Quantization

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Autoregressive Video Generation without Vector Quantization

This is the text2image demo; a separate text2video demo is also available.

We present NOVA (NOn-Quantized Video Autoregressive Model), a model that enables autoregressive image/video generation with high efficiency. NOVA reformulates the video generation problem as non-quantized autoregressive modeling of temporal frame-by-frame prediction and spatial set-by-set prediction. NOVA generalizes well and enables diverse zero-shot generation abilities in one unified model.
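The generation order described above (frames in temporal order, and within each frame, groups of tokens filled in one set at a time) can be illustrated with a toy sketch. This is not NOVA's implementation: the function names `generation_schedule` and `generate`, the random partition into sets, and the Gaussian placeholder standing in for a learned predictor are all assumptions made for illustration. The only points it tries to convey are the nested ordering and the fact that tokens are continuous values rather than indices into a codebook.

```python
import random


def generation_schedule(num_frames, tokens_per_frame, num_sets, seed=0):
    """Return a list of (frame, token_positions) steps in generation order.

    Frames are visited in temporal order. Inside a frame, token positions are
    shuffled and split into `num_sets` groups, filled one group at a time.
    The random partition is an assumption of this sketch.
    """
    rng = random.Random(seed)
    schedule = []
    for frame in range(num_frames):  # temporal: frame by frame
        positions = list(range(tokens_per_frame))
        rng.shuffle(positions)
        # Split the shuffled positions into roughly equal sets.
        sets = [positions[i::num_sets] for i in range(num_sets)]
        for token_set in sets:  # spatial: set by set
            schedule.append((frame, sorted(token_set)))
    return schedule


def generate(num_frames=2, tokens_per_frame=6, num_sets=3, seed=0):
    """Fill a toy 'video' of continuous tokens following the schedule."""
    rng = random.Random(seed)
    video = [[None] * tokens_per_frame for _ in range(num_frames)]
    steps = generation_schedule(num_frames, tokens_per_frame, num_sets, seed)
    for frame, token_set in steps:
        for pos in token_set:
            # Placeholder for a learned predictor: a real-valued token,
            # not a discrete codebook index (hence "non-quantized").
            video[frame][pos] = rng.gauss(0.0, 1.0)
    return video


if __name__ == "__main__":
    for frame, token_set in generation_schedule(2, 6, 3):
        print(f"frame {frame}: fill positions {token_set}")
```

In a real model, each step would condition on all previously generated frames and on the sets already filled in the current frame; the sketch omits that conditioning and keeps only the ordering.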

✨ Highlights

  • 🔥 Novel Approach: Non-quantized video autoregressive generation.
  • 🔥 State-of-the-art Performance: High efficiency with state-of-the-art t2i/t2v results.
  • 🔥 Unified Modeling: Multi-task capabilities in a single unified model.

Citation

If you find this repository useful, please consider giving it a star ⭐ and citing the paper 🦖:

@article{deng2024nova,
  title={Autoregressive Video Generation without Vector Quantization},
  author={Deng, Haoge and Pan, Ting and Diao, Haiwen and Luo, Zhengxiong and Cui, Yufeng and Lu, Huchuan and Shan, Shiguang and Qi, Yonggang and Wang, Xinlong},
  journal={arXiv preprint arXiv:2412.14169},
  year={2024}
}

Acknowledgement

We thank the repositories: MAE, MAR, MaskGIT, DiT, Open-Sora-Plan, CogVideo, and CodeWithGPU.

License

Code and models are licensed under the Apache License 2.0.