chenxwh / nova-t2i

Autoregressive Image Generation without Vector Quantization (Updated 5 months, 3 weeks ago)

  • Public
  • 15 runs
  • GitHub
  • Weights
  • Paper
  • License

Input

  • Input prompt (string). Default: "a shiba inu wearing a beret and black turtleneck."
  • Things you do not want to see in the output (string). Default: "low quality, deformed, distorted, disfigured, fused fingers, bad anatomy, weird hand"
  • Number of inference steps (integer, minimum: 1, maximum: 128). Default: 64
  • Number of diffusion steps (integer, minimum: 1, maximum: 50). Default: 25
  • Scale for classifier-free guidance (number, minimum: 1, maximum: 10). Default: 5
  • Random seed (integer). Leave blank to randomize the seed.
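
Below is a minimal sketch of calling this model with the Replicate Python client. The input field names (prompt, negative_prompt, num_inference_steps, num_diffusion_steps, guidance_scale, seed) are assumptions inferred from the descriptions above, not confirmed keys from the API schema; the values mirror the listed defaults.

import replicate

# Field names are assumed from the input descriptions above; check the
# model's API schema for the exact keys before relying on them.
output = replicate.run(
    "chenxwh/nova-t2i",
    input={
        "prompt": "a shiba inu wearing a beret and black turtleneck.",
        "negative_prompt": "low quality, deformed, distorted, disfigured, "
                           "fused fingers, bad anatomy, weird hand",
        "num_inference_steps": 64,  # integer, 1-128
        "num_diffusion_steps": 25,  # integer, 1-50
        "guidance_scale": 5,        # classifier-free guidance, 1-10
        # "seed": 42,               # omit to randomize the seed
    },
)
print(output)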

Output


Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Autoregressive Video Generation without Vector Quantization

This is the text-to-image demo; see the text-to-video demo here.

We present NOVA (NOn-Quantized Video Autoregressive Model), a model that enables autoregressive image/video generation with high efficiency. NOVA reformulates the video generation problem as non-quantized autoregressive modeling of temporal frame-by-frame prediction and spatial set-by-set prediction. NOVA generalizes well and enables diverse zero-shot generation abilities in one unified model.
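
To make the idea concrete, the toy sketch below illustrates what "non-quantized autoregressive modeling" means in practice: instead of classifying each token against a discrete VQ codebook, a small per-token diffusion head regresses continuous token values conditioned on the autoregressive context (temporal frame-by-frame, spatial set-by-set). This is an illustrative sketch, not the NOVA implementation; all class and function names here are hypothetical.

import torch
import torch.nn as nn

class DiffusionHead(nn.Module):
    # Hypothetical toy head: predicts the noise added to a continuous token,
    # conditioned on the autoregressive context vector and a timestep.
    def __init__(self, token_dim=16, ctx_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(token_dim + ctx_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, token_dim),
        )

    def forward(self, noisy_token, ctx, t):
        return self.net(torch.cat([noisy_token, ctx, t], dim=-1))

def per_token_diffusion_loss(head, clean_tokens, ctx):
    # clean_tokens: (B, N, D) continuous latents; ctx: (B, N, C) context from
    # the autoregressive backbone. No codebook and no cross-entropy: the
    # supervision is a denoising regression on continuous values.
    b, n, _ = clean_tokens.shape
    t = torch.rand(b, n, 1)                      # random diffusion timesteps
    noise = torch.randn_like(clean_tokens)
    noisy = (1 - t) * clean_tokens + t * noise   # simple interpolation schedule
    return ((head(noisy, ctx, t) - noise) ** 2).mean()

# Random tensors stand in for real latents/context so the sketch runs as-is.
head = DiffusionHead()
loss = per_token_diffusion_loss(head, torch.randn(2, 8, 16), torch.randn(2, 8, 64))
loss.backward()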

✨ Highlights

  • 🔥 Novel Approach: Non-quantized video autoregressive generation.
  • 🔥 State-of-the-art Performance: High efficiency with state-of-the-art t2i/t2v results.
  • 🔥 Unified Modeling: Multi-task capabilities in a single unified model.

Citation

If you find this repository useful, please consider giving it a star ⭐ and a citation 🦖:

@article{deng2024nova,
  title={Autoregressive Video Generation without Vector Quantization},
  author={Deng, Haoge and Pan, Ting and Diao, Haiwen and Luo, Zhengxiong and Cui, Yufeng and Lu, Huchuan and Shan, Shiguang and Qi, Yonggang and Wang, Xinlong},
  journal={arXiv preprint arXiv:2412.14169},
  year={2024}
}

Acknowledgement

We thank the following repositories: MAE, MAR, MaskGIT, DiT, Open-Sora-Plan, CogVideo, and CodeWithGPU.

License

Code and models are licensed under Apache License 2.0.