Deep Image Diffusion Prior
by @nousr
Model description
Inverts CLIP text embeds to image embeds and visualizes with deep-image-prior.
Acknowledgements
Code and weights by @nousr, with help from:
- LAION for support, resources, and community
- @RiversHaveWings for making me aware of this technique
- Stability AI for compute which makes these models possible
- lucidrains for spearheading the open-source replication of DALLE 2
To avoid any confusion: this research is a recreation of (one part of) OpenAI’s DALLE2 paper. It is not “DALLE2”, the product/service from OpenAI you may have seen on the web.
Intended use
See the world “through CLIP’s eyes” by taking advantage of the diffusion prior, as replicated by LAION, to invert CLIP “ViT-L/14” text embeds to image embeds (as in unCLIP/DALLE2). Afterwards, a process known as deep-image-prior, developed by Katherine Crowson, is run to visualize the features in CLIP’s weights that correspond to the activations from your prompt.
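The two stages described above can be illustrated with a toy sketch. It is not the code or the models behind this project: `toy_prior`, `ENCODER`, and `visualize` are hypothetical stand-ins invented for illustration, the "generator" is a single linear layer rather than a convolutional network, and the loss is a plain squared distance chosen for simplicity. What it does share with the description is the overall shape: a text embed is mapped to an image embed, and then the weights of a generator fed with fixed noise are optimized until the embedding of its output matches that target.

```python
import random

random.seed(0)
EMB_DIM, IMG_DIM, NOISE_DIM = 4, 12, 3


def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Hypothetical stand-ins: random linear maps, NOT the real models.
PRIOR = rand_matrix(EMB_DIM, EMB_DIM)    # plays the role of the diffusion prior
ENCODER = rand_matrix(EMB_DIM, IMG_DIM)  # plays the role of CLIP's image encoder


def toy_prior(text_embed):
    """Map a text embed to an image embed (placeholder for the diffusion prior)."""
    return matvec(PRIOR, text_embed)


def loss_and_residual(weights, noise, target):
    image = matvec(weights, noise)   # generator output: a flat toy "image"
    embed = matvec(ENCODER, image)   # embedding of the generated image
    residual = [e - t for e, t in zip(embed, target)]
    return 0.5 * sum(r * r for r in residual), residual


def visualize(target, steps=200):
    # Fixed noise input: it is never updated, only the generator weights are.
    noise = [random.uniform(-1.0, 1.0) for _ in range(NOISE_DIM)]
    weights = rand_matrix(IMG_DIM, NOISE_DIM)
    # Conservative step size from a crude curvature bound for this quadratic loss.
    bound = sum(a * a for row in ENCODER for a in row) * sum(n * n for n in noise)
    lr = 1.0 / bound
    initial_loss, _ = loss_and_residual(weights, noise, target)
    for _ in range(steps):
        _, residual = loss_and_residual(weights, noise, target)
        # Gradient w.r.t. the image is ENCODER^T @ residual.
        grad_image = [
            sum(ENCODER[k][i] * residual[k] for k in range(EMB_DIM))
            for i in range(IMG_DIM)
        ]
        # Gradient w.r.t. weights[i][j] is grad_image[i] * noise[j].
        for i in range(IMG_DIM):
            for j in range(NOISE_DIM):
                weights[i][j] -= lr * grad_image[i] * noise[j]
    final_loss, _ = loss_and_residual(weights, noise, target)
    return initial_loss, final_loss


text_embed = [random.uniform(-1.0, 1.0) for _ in range(EMB_DIM)]
initial_loss, final_loss = visualize(toy_prior(text_embed))
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

In the sketch the image is never optimized directly; only the generator weights change while the noise stays fixed. With a linear generator this parameterization adds no real constraint, so it only mirrors the structure of the approach and says nothing about the image quality a real network would give.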
Caveats and recommendations
These visualizations can be quite abstract compared to those from other text-2-image models. However, this often gives them a dreamlike quality. Many outputs are artistically fantastic as a result, but whether the visual matches your prompt as often is another matter.