chenxwh / karlo

Text-conditional image generation model based on OpenAI's unCLIP

Run time and cost

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 59 seconds. The predict time for this model varies significantly based on the inputs.


Karlo is a text-conditional image generation model based on OpenAI’s unCLIP architecture. It improves on the standard super-resolution model, which upscales from 64px to 256px, by recovering high-frequency details in only a small number of denoising steps.

This alpha version of Karlo is trained on 115M image-text pairs, including the COYO-100M high-quality subset, CC3M, and CC12M. If you are interested in a better version of Karlo trained on larger-scale, high-quality datasets, please visit the landing page of our application B^DISCOVER.

Model Architecture


Karlo is a text-conditional diffusion model based on unCLIP, composed of prior, decoder, and super-resolution (SR) modules. In this repository, we include an improved version of the standard super-resolution module that upscales 64px to 256px in only 7 reverse steps, as illustrated in the figure below:

Specifically, the standard SR module, trained with the DDPM objective, upscales 64px to 256px in the first 6 denoising steps using the respacing technique. Then, an additional fine-tuned SR module, trained with a VQ-GAN-style loss, performs the final reverse step to recover high-frequency details. We observe that this approach is very effective for upscaling low-resolution images in a small number of reverse steps.
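
The 6 + 1 split of reverse steps can be sketched in a few lines of Python. This is a minimal illustration and not the repository's implementation: the function names `respaced_timesteps` and `assign_sr_modules`, the labels `standard_sr` and `finetuned_sr`, the 1000-step training schedule, and the even spacing of the retained timesteps are all assumptions made for the example.

```python
from typing import List, Tuple


def respaced_timesteps(num_train_steps: int, num_sample_steps: int) -> List[int]:
    """Keep `num_sample_steps` evenly spaced timesteps out of the training
    schedule, returned in descending order (the order of the reverse process)."""
    if not 2 <= num_sample_steps <= num_train_steps:
        raise ValueError("need 2 <= num_sample_steps <= num_train_steps")
    last = num_train_steps - 1
    # Integer arithmetic keeps the first (0) and last (T - 1) timesteps exact.
    kept = [(i * last) // (num_sample_steps - 1) for i in range(num_sample_steps)]
    return kept[::-1]


def assign_sr_modules(timesteps: List[int], num_final_steps: int = 1) -> List[Tuple[int, str]]:
    """Label each reverse step with the module that would handle it:
    the standard SR module first, the fine-tuned module for the final step(s)."""
    split = len(timesteps) - num_final_steps
    return [
        (t, "standard_sr" if i < split else "finetuned_sr")
        for i, t in enumerate(timesteps)
    ]


if __name__ == "__main__":
    # Assumed: a 1000-step training schedule respaced to 7 sampling steps.
    for timestep, module in assign_sr_modules(respaced_timesteps(1000, 7)):
        print(timestep, module)
```

Under these assumptions, the first six retained timesteps go to the standard SR module and only the last one goes to the fine-tuned module, which matches the division of work described above.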

License and Disclaimer

This project, including the weights, is distributed under the CreativeML Open RAIL-M license, equivalent to the version used by Stable Diffusion v1. You may use this model in commercial applications, but it is highly recommended to adopt a powerful safety checker as a post-processing step. We also note that we are not responsible for any use of the generated images.


If you find this repository useful in your research, please cite:

  title         = {Karlo-v1.0.alpha on COYO-100M and CC15M},
  author        = {Donghoon Lee, Jiseob Kim, Jisu Choi, Jongmin Kim, Minwoo Byeon, Woonhyuk Baek and Saehoon Kim},
  year          = {2022},
  howpublished  = {\url{}},



If you would like to collaborate with us or share feedback, please e-mail us.