Generate vivid images from any Chinese or English text
15K runs

Run time and cost

Predictions run on NVIDIA A100 GPU hardware and typically complete within 14 minutes. Prediction time for this model varies significantly with the inputs.

CogView2 is a hierarchical transformer (6B-9B-9B parameters) for general-domain text-to-image generation. This implementation is based on the SwissArmyTransformer library (v0.2).
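To give a rough sense of scale for the "6B-9B-9B" figure, here is a minimal arithmetic sketch. It assumes the three numbers are the parameter counts of three separate stages in the hierarchy and that weights are stored at 2 bytes each (16-bit precision). Neither assumption is stated on this page, and the stage labels are placeholders rather than names from the model.

```python
# Parameter counts (in billions) read off the "6B-9B-9B" description.
# The stage labels are hypothetical placeholders, not names from the source.
STAGE_PARAMS_BILLIONS = {"stage_1": 6, "stage_2": 9, "stage_3": 9}

# Assumption: weights are held in 16-bit precision (2 bytes per parameter).
BYTES_PER_PARAM = 2


def total_params_billions(stages):
    """Sum the per-stage parameter counts (in billions)."""
    return sum(stages.values())


def weight_memory_gb(stages, bytes_per_param=BYTES_PER_PARAM):
    """Estimate weight storage in GB (10^9 bytes), ignoring activations."""
    # billions of parameters * bytes per parameter = gigabytes
    return total_params_billions(stages) * bytes_per_param


if __name__ == "__main__":
    print(total_params_billions(STAGE_PARAMS_BILLIONS))  # 24
    print(weight_memory_gb(STAGE_PARAMS_BILLIONS))       # 48
```

Under these assumptions the weights alone come to about 24 billion parameters, or roughly 48 GB. The estimate leaves out activations and any per-stage loading strategy, so it should not be read as the actual memory footprint of a prediction.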

