CogView3 & CogView-3Plus
Model Introduction
CogView-3-Plus builds upon CogView3 (ECCV’24) by adopting a DiT (Diffusion Transformer) framework to further improve overall performance. It uses Zero-SNR diffusion noise scheduling and incorporates a joint text-image attention mechanism. Compared to the commonly used MMDiT structure, this design reduces training and inference costs while maintaining the model’s basic capabilities. CogView-3-Plus uses a VAE with a latent dimension of 16.
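To make the Zero-SNR idea concrete, here is a minimal sketch of one common way to enforce zero terminal signal-to-noise ratio: rescaling an existing beta schedule so that the final timestep carries no signal. This is an illustration only, not the CogView-3-Plus training code. The linear schedule, its default values, and the helper names `linear_betas`, `rescale_zero_terminal_snr`, and `terminal_snr` are assumptions made for the example.

```python
import math


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Hypothetical linear beta schedule (values assumed for illustration)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def rescale_zero_terminal_snr(betas):
    """Rescale betas so the last timestep has exactly zero SNR."""
    # sqrt of the cumulative product of alphas, where alpha_t = 1 - beta_t
    alphas_bar_sqrt = []
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alphas_bar_sqrt.append(math.sqrt(prod))

    first, last = alphas_bar_sqrt[0], alphas_bar_sqrt[-1]

    # Shift so the last value becomes zero, then scale so the first is unchanged.
    rescaled = [(a - last) * first / (first - last) for a in alphas_bar_sqrt]

    # Convert the rescaled cumulative products back into per-step betas.
    alphas_bar = [a * a for a in rescaled]
    new_betas = [1.0 - alphas_bar[0]]
    for prev, cur in zip(alphas_bar, alphas_bar[1:]):
        new_betas.append(1.0 - cur / prev)
    return new_betas


def terminal_snr(betas):
    """SNR at the final timestep: alpha_bar_T / (1 - alpha_bar_T)."""
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
    return prod / (1.0 - prod)


betas = linear_betas()
fixed = rescale_zero_terminal_snr(betas)
print("terminal SNR before:", terminal_snr(betas))
print("terminal SNR after: ", terminal_snr(fixed))
```

With the unmodified schedule the final step still has a small positive SNR, so the model never sees pure noise during training. After rescaling, the final beta is 1 and the terminal SNR is zero, which matches the pure-noise starting point used at inference.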
Citation
🌟 If you find our work helpful, feel free to cite our paper and leave a star.
@article{zheng2024cogview3,
title={Cogview3: Finer and faster text-to-image generation via relay diffusion},
author={Zheng, Wendi and Teng, Jiayan and Yang, Zhuoyi and Wang, Weihan and Chen, Jidong and Gu, Xiaotao and Dong, Yuxiao and Ding, Ming and Tang, Jie},
journal={arXiv preprint arXiv:2403.05121},
year={2024}
}