This model runs predictions on NVIDIA A100 GPU hardware.
80% of predictions complete within 12 minutes, but prediction time varies significantly with the inputs.
Generate vivid images for any text (Chinese or English).
CogView2 is a hierarchical transformer (6B-9B-9B parameters) for general-domain text-to-image generation. This implementation is based on the SwissArmyTransformer library (v0.2).
@article{ding2022cogview2,
  title={CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers},
  author={Ding, Ming and Zheng, Wendi and Hong, Wenyi and Tang, Jie},
  journal={arXiv preprint arXiv:2204.14217},
  year={2022}
}