DALL·E Mini
Generate images from a text prompt
Our logo was generated with DALL·E mini using the prompt “logo of an armchair in the shape of an avocado”.
Citing DALL·E mini
If you find DALL·E mini useful in your research or wish to refer to it, please use the following BibTeX entry.
@misc{Dayma_DALL·E_Mini_2021,
author = {Dayma, Boris and Patil, Suraj and Cuenca, Pedro and Saifullah, Khalid and Abraham, Tanishq and Lê Khắc, Phúc and Melas, Luke and Ghosh, Ritobrata},
doi = {10.5281/zenodo.5146400},
month = {7},
title = {DALL·E Mini},
url = {https://github.com/borisdayma/dalle-mini},
year = {2021}
}
References
Original DALL·E from “Zero-Shot Text-to-Image Generation” with image quantization from “Learning Transferable Visual Models From Natural Language Supervision”.
Image encoder from “Taming Transformers for High-Resolution Image Synthesis”.
Sequence-to-sequence model based on “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension”, with implementations of a few variants:
- “GLU Variants Improve Transformer”
- “DeepNet: Scaling Transformers to 1,000 Layers”
- “NormFormer: Improved Transformer Pretraining with Extra Normalization”
- “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”
- “CogView: Mastering Text-to-Image Generation via Transformers”
- “Root Mean Square Layer Normalization”
- “Sinkformers: Transformers with Doubly Stochastic Attention”
Main optimizer (Distributed Shampoo) from “Scalable Second Order Optimization for Deep Learning”.