chenxwh / cogview3

Finer and Faster Text-to-Image Generation via Relay Diffusion


Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.


CogView3 & CogView-3Plus

Model Introduction

CogView-3-Plus builds upon CogView3 (ECCV'24) by adopting the latest DiT framework for further overall performance improvements. It uses Zero-SNR diffusion noise scheduling and incorporates a joint text-image attention mechanism. Compared to the commonly used MMDiT structure, this design effectively reduces training and inference costs while maintaining the model’s basic capabilities. CogView-3-Plus uses a VAE with a latent dimension of 16.
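To make the Zero-SNR idea concrete, here is a minimal sketch of one common way to get a zero terminal signal-to-noise ratio: rescaling an existing beta schedule so that the final timestep is pure noise. This README does not specify the exact schedule used by CogView-3-Plus, so the linear beta schedule, the 1000-step count, and the function names `rescale_zero_terminal_snr` and `snr` are assumptions for illustration only, not the model's actual implementation.

```python
import numpy as np


def rescale_zero_terminal_snr(betas):
    """Rescale a beta schedule so the last timestep has SNR exactly 0.

    Illustrative helper (hypothetical name), not taken from the CogView code.
    """
    alphas_bar = np.cumprod(1.0 - betas)
    sqrt_ab = np.sqrt(alphas_bar)

    first = sqrt_ab[0]
    last = sqrt_ab[-1]

    # Shift so the final sqrt(alpha_bar) becomes 0 (pure noise at step T).
    sqrt_ab = sqrt_ab - last
    # Scale so the first sqrt(alpha_bar) keeps its original value.
    sqrt_ab = sqrt_ab * first / (first - last)

    # Convert the cumulative products back into per-step betas.
    alphas_bar = sqrt_ab ** 2
    alphas = np.concatenate([alphas_bar[:1], alphas_bar[1:] / alphas_bar[:-1]])
    return 1.0 - alphas


def snr(betas):
    """Signal-to-noise ratio alpha_bar / (1 - alpha_bar) at every timestep."""
    alphas_bar = np.cumprod(1.0 - betas)
    return alphas_bar / (1.0 - alphas_bar)


# Assumed baseline: a plain linear beta schedule with 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)
zero_snr_betas = rescale_zero_terminal_snr(betas)

print("terminal SNR before:", snr(betas)[-1])
print("terminal SNR after: ", snr(zero_snr_betas)[-1])
```

With the unmodified schedule the last timestep still carries a small amount of signal, so training never sees pure noise even though sampling starts from it. The rescaled schedule closes that gap while leaving the first timestep unchanged.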

Citation

🌟 If you find our work helpful, feel free to cite our paper and leave a star.

@article{zheng2024cogview3,
  title={Cogview3: Finer and faster text-to-image generation via relay diffusion},
  author={Zheng, Wendi and Teng, Jiayan and Yang, Zhuoyi and Wang, Weihan and Chen, Jidong and Gu, Xiaotao and Dong, Yuxiao and Ding, Ming and Tang, Jie},
  journal={arXiv preprint arXiv:2403.05121},
  year={2024}
}