Generate images from a text prompt
50.9K runs

Run time and cost

Predictions run on Nvidia A100 GPU hardware. Predictions typically complete within 68 seconds. The predict time for this model varies significantly based on the inputs.


Generate images from a text prompt

Our logo was generated with DALL·E mini using the prompt "logo of an armchair in the shape of an avocado".

Citing DALL·E mini

If you find DALL·E mini useful in your research or wish to refer, please use the following BibTeX entry.

      author = {Dayma, Boris and Patil, Suraj and Cuenca, Pedro and Saifullah, Khalid and Abraham, Tanishq and Lê Khắc, Phúc and Melas, Luke and Ghosh, Ritobrata},
      doi = {10.5281/zenodo.5146400},
      month = {7},
      title = {DALL·E Mini},
      url = {},
      year = {2021}


Original DALL·E from "Zero-Shot Text-to-Image Generation" with image quantization from "Learning Transferable Visual Models From Natural Language Supervision".

Image encoder from "Taming Transformers for High-Resolution Image Synthesis".

Sequence to sequence model based on "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension" with implementation of a few variants:

Main optimizer (Distributed Shampoo) from "Scalable Second Order Optimization for Deep Learning".