Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.
This is a custom Stable Diffusion pipeline modified to support longer prompts, up to 231 tokens (versus the 77-token limit of the standard Stable Diffusion pipeline). Source code for this implementation can be found here. Implementation by SkyTNT.
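As a minimal sketch of how a community pipeline like this is typically loaded with diffusers (the `custom_pipeline` identifier `"lpw_stable_diffusion"` and the base checkpoint below are assumptions for illustration; check the source link above for the exact usage):

```python
import torch
from diffusers import DiffusionPipeline

# Load the long-prompt-weighting community pipeline. The identifier
# "lpw_stable_diffusion" and the base checkpoint are illustrative;
# substitute the checkpoint you actually use.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16,
).to("cuda")

# Prompts longer than the usual 77-token CLIP limit are accepted,
# up to 231 tokens (three 77-token chunks).
image = pipe(prompt="a very long, detailed prompt ...").images[0]
image.save("output.png")
```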
Prompt weighting is also supported (a usage example follows the equivalence list below):
- Emphasize/weigh part of your prompt with parentheses, like so: `a baby deer with (big eyes)`
- De-emphasize part of your prompt with square brackets, like so: `a [baby] deer with big eyes`
- Precisely weigh part of your prompt, like so: `a baby deer with (big eyes:1.3)`
Prompt weighting equivalents:
- `a baby deer with` == `(a baby deer with:1.0)`
- `(big eyes)` == `(big eyes:1.1)`
- `((big eyes))` == `(big eyes:1.21)`
- `[big eyes]` == `(big eyes:0.91)`
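The nested brackets compound multiplicatively: each pair of parentheses multiplies the enclosed tokens' attention weight by 1.1 (so `((big eyes))` gives 1.1 × 1.1 = 1.21), and each pair of square brackets divides it by 1.1 (1/1.1 ≈ 0.91). A minimal usage sketch, reusing the `pipe` object loaded above (the prompt text is illustrative):

```python
# "(...)" multiplies the attention weight of the enclosed tokens by 1.1,
# "[...]" divides it by 1.1, and "(text:w)" sets the weight explicitly.
prompt = "a baby deer with (big eyes:1.3), forest background"
negative_prompt = "blurry, low quality"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("weighted_prompt.png")
```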
For an in-depth Stable Diffusion model card, see the official Replicate implementation of Stable Diffusion.
- Developed by: Robin Rombach, Patrick Esser
- Model type: Diffusion-based text-to-image generation model
- Language(s): English
- License: The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based.
- Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (CLIP ViT-L/14) as suggested in the Imagen paper.
- Resources for more information: GitHub Repository, Paper.
- Cite as:
```
@InProceedings{Rombach_2022_CVPR,
    author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
    title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {10684-10695}
}
```