laion-ai/deep-image-diffusion-prior

Generate an image using text by visualizing CLIP features.

Public
1.1K runs

Input

• Prompt to generate (string). Default: ""
• Offset type (string). Default: "none"
• Number of scales (integer, minimum 1, maximum 10). Default: 6
• Strength of input noise (number, minimum 0, maximum 1). Default: 0
• Learning rate (number, minimum 0, maximum 10). Default: 0.001
• Learning rate factor for offset (number, minimum 0, maximum 10). Default: 1
• Learning rate decay (number, minimum 0, maximum 1). Default: 0.995
• Strength of parameter noise (number, minimum 0, maximum 1). Default: 0
• Display frequency (integer, minimum 0, maximum 100). Default: 25
• Number of iterations (integer, minimum 0, maximum 1000). Default: 250
• Number of samples per batch (integer, minimum 1, maximum 10). Default: 2
• Number of cutouts (integer, minimum 4, maximum 32). Default: 8
• Scale of conditioning (number, minimum 0, maximum 10). Default: 1
• Random seed (integer, minimum -1, maximum 100000). Default: -1

Output

The generated image for the prompt.

Run time and cost

This model costs approximately $0.080 to run on Replicate, or 12 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 6 minutes. The predict time for this model varies significantly based on the inputs.
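
For reference, the model can also be called programmatically with the Replicate Python client. The snippet below is a minimal sketch: `replicate.run` is the client's standard entry point, but the input keys shown (`prompt`, `num_iterations`, `seed`) are illustrative guesses based on the parameter descriptions above, not the model's confirmed schema, so check the model's API tab for the exact names (and you may need to pin a specific model version).

```python
# Minimal sketch of calling the model with the Replicate Python client.
# Requires `pip install replicate` and the REPLICATE_API_TOKEN env var.
# NOTE: the input keys below are hypothetical, inferred from the parameter
# descriptions above; consult the model's API schema for the real names.
import replicate

output = replicate.run(
    "laion-ai/deep-image-diffusion-prior",  # a pinned version id may be required
    input={
        "prompt": "a misty forest at dawn",  # Prompt to generate
        # "num_iterations": 250,             # Number of iterations (hypothetical key)
        # "seed": -1,                        # Random seed (hypothetical key)
    },
)
print(output)  # typically a URL (or list of URLs) to the generated image(s)
```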

Readme

Deep Image Diffusion Prior

by @nousr

Model description

Inverts CLIP text embeds to image embeds and visualizes them with deep-image-prior.

Acknowledgements

Code and weights by @nousr, with help from:

  • LAION for support, resources, and community

  • @RiversHaveWings for making me aware of this technique

  • Stability AI for compute which makes these models possible

  • lucidrains for spearheading the open-source replication of DALLE 2

Just to avoid any confusion: this research is a recreation of (one part of) OpenAI's DALLE2 paper. It is not "DALLE2", the product/service from OpenAI that you may have seen on the web.

Intended use

See the world "through CLIP's eyes" by using the diffusion prior, as replicated by LAION, to invert CLIP "ViT-L/14" text embeds into image embeds (as in unCLIP/DALLE2). Afterwards, a deep-image-prior process (as implemented by Katherine Crowson) is run to visualize the features in CLIP's weights that correspond to the activations from your prompt.
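
To make the two-stage pipeline concrete, below is a minimal PyTorch sketch of the second (visualization) stage only. It is not the repository's implementation: the real model obtains the target image embedding from the trained diffusion prior and uses cutouts, augmentations, and extra regularization, whereas this sketch simply substitutes CLIP's own text embedding as the optimization target.

```python
# Minimal sketch of CLIP-feature visualization with a deep image prior.
# Assumes PyTorch and OpenAI's `clip` package (pip install git+https://github.com/openai/CLIP.git).
# In the real model the target embedding comes from the diffusion prior, not encode_text.
import torch
import torch.nn as nn
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-L/14", device=device)
clip_model = clip_model.float().eval().requires_grad_(False)

# Stand-in target: a normalized CLIP text embedding for the prompt.
tokens = clip.tokenize(["a painting of a forest at dawn"]).to(device)
with torch.no_grad():
    target = clip_model.encode_text(tokens).float()
    target = target / target.norm(dim=-1, keepdim=True)

# Deep image prior: a small conv net with a fixed noise input parameterizes the
# image, so the network weights (not the pixels) are what get optimized.
net = nn.Sequential(
    nn.Conv2d(16, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
).to(device)
noise = torch.randn(1, 16, 224, 224, device=device)

# CLIP's input normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(250):
    opt.zero_grad()
    image = net(noise)                                    # (1, 3, 224, 224), values in [0, 1]
    embed = clip_model.encode_image((image - mean) / std)
    embed = embed / embed.norm(dim=-1, keepdim=True)
    loss = 1 - (embed * target).sum()                     # cosine distance to the target embedding
    loss.backward()
    opt.step()
```

The image itself is never optimized directly: the convolutional network acts as the prior over images, which is what gives these visualizations their characteristic painterly, dream-like texture.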

Caveats and recommendations

These visualizations can be quite abstract compared to the output of other text-to-image models, but they often have a dream-like quality as a result. Many outputs are artistically striking, though how closely the visual matches your prompt is another matter.

gemini-2.5-flash-image

google/gemini-2.5-flash-image

Generate images from a text prompt. Use Google’s Gemini 2.5 Flash Image to synthesize imaginative or photo-realistic visuals, follow multi-step composition instructions, and maintain style or character consistency across prompts for iterative creative workflows. Outputs a single image per prompt and embeds a SynthID provenance watermark.

48.4k runs
Official
deepfloyd-if

andreasjansson/deepfloyd-if

Generate images from a text prompt. Accepts a prompt and optional seed and returns a single synthesized image. Uses DeepFloyd IF text-to-image diffusion to produce photo-realistic scenes, illustrations, cartoons, or stylized art from descriptive prompts. Licensed for non-commercial research use.

2.0m runs
style-transfer

fofr/style-transfer

Transfer the artistic style of a reference image to images generated from a text prompt. Optionally guide composition with a structure image; a depth-based ControlNet preserves layout and aspect ratio, with denoising controls to balance original colors and structure. Choose from presets: fast, high-quality, realistic, cinematic, or animated. Configure size, seed, batch count, negative prompts, and output format/quality. Outputs still images.

1.1m runs
text2image

pixray/text2image

Generate images from a text prompt. Control rendering with Pixray drawers: vqgan, vdiff, clipdraw, pixel, fast_pixel, line_sketch, and fft for GAN, diffusion, drawing, and pixel-art styles. Optionally pass extra settings in name: value format to fine-tune style and composition. Outputs one or more images.

1.4m runs
flux-texture-abstract-painting

brunnolou/flux-texture-abstract-painting

Generate abstract, textured painting images from text prompts, or stylize existing images and inpaint masked regions. Apply an abstract fine-art aesthetic with painterly textures and modern art sensibilities. Supports a fast generation mode and optional combination with additional LoRA weights for style mixing. The trigger word for this LoRA fine-tune is 'TXTUR'.

2.5k runs
midjourney-diffusion

tstramer/midjourney-diffusion

Generate stylized images from text prompts in a Midjourney-like diffusion aesthetic. Produce concept art, character portraits, posters, sci‑fi and fantasy scenes with illustrative detail. Customize resolution (up to 1024×768), number of outputs, sampling scheduler, steps, and seed.

1.6m runs
ongo

laion-ai/ongo

Generate paintings from text prompts. Optionally transform an input image or inpaint specific regions using an init image and a mask. Steer outputs with negative prompts and CLIP aesthetic guidance via aesthetic_rating and aesthetic_weight. Produce one or more images at small resolutions (128–384 px) with reproducible seeds.

133.6k runs
kolors

asiryan/kolors

Generate images from text prompts or transform an input image with text guidance. Accept a prompt and optional source image, and output one or more images at configurable sizes. Adjust prompt strength to control how much the source image is preserved, and use seed and steps for reproducibility and variation.

21.6k runs