Generate vivid images for any (Chinese / English) text

Run time and cost

Predictions run on NVIDIA A100 GPU hardware and typically complete within 14 minutes. Prediction time for this model varies significantly with the inputs.
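Because a prediction can take many minutes, a client usually polls for completion rather than blocking on a single request. The following is a minimal sketch of that pattern, not this service's client API: `wait_for_prediction` and `get_status` are hypothetical names, and the status strings are assumptions. The default timeout of 20 minutes is an arbitrary margin over the typical 14 minutes quoted above.

```python
import time

# Assumed terminal states; adjust to whatever the hosting API actually reports.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(get_status, timeout_s=20 * 60, interval_s=10.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal state or the timeout elapses.

    get_status is a hypothetical zero-argument callable that returns the
    current status string of one prediction.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"prediction still {status!r} after {timeout_s} s"
            )
        # Wait before asking again so the API is not queried in a tight loop.
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the function testable without real waiting. In production the defaults can be left alone.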

Generate Vivid Images for Any (Chinese / English) Text

CogView2 is a hierarchical transformer (6B-9B-9B parameters) for general-domain text-to-image generation. This implementation is based on the SwissArmyTransformer library (v0.2).
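To give a feel for the scale implied by "6B-9B-9B", the sketch below adds up the parameter counts and estimates the storage needed for the weights alone. It assumes the three figures are per-stage parameter counts and that weights are stored in 16-bit floating point (2 bytes per parameter). Both are illustrative assumptions, not statements about how this deployment loads the model, and activations and other runtime memory are ignored.

```python
# Per-stage parameter counts in billions, read from "6B-9B-9B" (assumed meaning).
STAGE_PARAMS_BILLIONS = (6, 9, 9)

# Assumption: fp16 weights, i.e. 2 bytes per parameter.
BYTES_PER_PARAM_FP16 = 2


def weights_gb(params_billions, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Weight storage in GB (1 GB = 1e9 bytes) for a given parameter count."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


total_params_billions = sum(STAGE_PARAMS_BILLIONS)
per_stage_gb = [weights_gb(p) for p in STAGE_PARAMS_BILLIONS]
total_gb = sum(per_stage_gb)

print(f"total parameters: {total_params_billions}B")   # total parameters: 24B
print(f"fp16 weights per stage (GB): {per_stage_gb}")  # [12, 18, 18]
print(f"fp16 weights in total (GB): {total_gb}")       # 48
```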

Ding, Ming; Zheng, Wendi; Hong, Wenyi; Tang, Jie. "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers." arXiv preprint arXiv:2204.14217.