This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 16 seconds, but prediction time varies significantly with the inputs.


We used AltCLIP-m9 as the text encoder and trained a multilingual diffusion model based on Stable Diffusion, with training data from the WuDao dataset and LAION.

Our model aligns text and images well across multiple languages and is the strongest open-source version available today. It retains most of the capabilities of the original Stable Diffusion and in some cases performs even better than the original model.

The AltDiffusion-m9 model is backed by a multilingual CLIP model named AltCLIP-m9, which is also accessible in FlagAI. You can read this tutorial for more information.

Supported languages: English, Chinese, Spanish, French, Russian, Japanese, Korean, Arabic, and Italian.
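
As a rough illustration of prompting the model in one of the nine languages listed above, here is a minimal sketch. It is not taken from the project's own documentation and rests on several assumptions: that the weights are loaded through the Hugging Face diffusers library using its `AltDiffusionPipeline` class (which may live under a deprecated path in recent releases), that the checkpoint ID is `BAAI/AltDiffusion-m9`, and that a CUDA GPU is available. The language codes and the helper names `check_language` and `generate` are hypothetical and exist only for this example.

```python
# Hypothetical two-letter codes for the nine languages the model supports.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "zh": "Chinese",
    "es": "Spanish",
    "fr": "French",
    "ru": "Russian",
    "ja": "Japanese",
    "ko": "Korean",
    "ar": "Arabic",
    "it": "Italian",
}


def check_language(code: str) -> str:
    """Return the language name for a code, or raise ValueError if unsupported."""
    try:
        return SUPPORTED_LANGUAGES[code]
    except KeyError:
        raise ValueError(f"Unsupported language code: {code!r}") from None


def generate(prompt: str, language: str = "en", steps: int = 50):
    """Generate one image from a prompt written in a supported language."""
    check_language(language)

    # Heavy dependencies are imported lazily so the helpers above work without them.
    import torch
    from diffusers import AltDiffusionPipeline  # assumption: exported by the installed diffusers version

    # Assumption: checkpoint ID on the Hugging Face Hub; requires a CUDA GPU.
    pipe = AltDiffusionPipeline.from_pretrained(
        "BAAI/AltDiffusion-m9", torch_dtype=torch.float16
    ).to("cuda")

    # The prompt is passed as-is; the multilingual text encoder handles the language.
    return pipe(prompt, num_inference_steps=steps).images[0]
```

Under these assumptions, a call such as `generate("一只戴着帽子的猫", language="zh")` would return a PIL image. The language argument is used only for validation here, since the prompt itself carries the language.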