kuprel / min-dalle

Fast, minimal port of DALL·E Mini to PyTorch

  • Public
  • 505.5K runs
  • A100 (80GB)

Input

  • text (string). Default: "Dali painting of WALL·E"
  • save_as_png (boolean). Default: false
  • progressive_outputs (boolean). Default: true
  • seamless (boolean). Default: false
  • grid_size (integer, minimum: 1, maximum: 9). Default: 5
  • temperature (number, minimum: 0.01, maximum: 16). Default: 4
  • top_k (integer). Advanced setting, see Readme below if interested. Default: 64
  • supercondition_factor (integer). Advanced setting, see Readme below if interested. Default: 16
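
As a rough illustration of how these inputs fit together, the sketch below collects the defaults and documented ranges into a plain Python dictionary and checks overrides against them. The parameter names are taken from the Readme further down; pairing them with the defaults listed here is an assumption based on their order, and `build_input` is a hypothetical helper, not part of min-dalle or of any Replicate client.

```python
# Assumed pairing of Readme parameter names with the defaults listed on this page.
DEFAULTS = {
    "text": "Dali painting of WALL·E",
    "save_as_png": False,
    "progressive_outputs": True,
    "seamless": False,
    "grid_size": 5,
    "temperature": 4,
    "top_k": 64,
    "supercondition_factor": 16,
}


def build_input(**overrides):
    """Hypothetical helper: merge overrides into the defaults and check ranges."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    # Ranges as stated on this page: grid_size 1..9, temperature 0.01..16.
    if not 1 <= params["grid_size"] <= 9:
        raise ValueError("grid_size must be between 1 and 9")
    if not 0.01 <= params["temperature"] <= 16:
        raise ValueError("temperature must be between 0.01 and 16")
    return params


if __name__ == "__main__":
    print(build_input(text="panda with top hat reading a book", grid_size=3))
```

A dictionary built this way could then be supplied as the input payload of a prediction request, provided the names match what the deployed model version expects.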

Output

This example was created by a different version, kuprel/min-dalle:888c72d6.

Run time and cost

This model costs approximately $0.0099 to run on Replicate, or 101 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 8 seconds.

Readme

Input Parameter Descriptions

Basic

  • text: For long prompts, only the first 64 tokens will be used to generate the image.
  • save_as_png: If selected, the image is saved in lossless PNG format; otherwise it is saved as JPG.
  • progressive_outputs: Show intermediate outputs while running. This adds less than a second to the run time.
  • seamless: Tile images in token space instead of pixel space. This has the effect of blending the images at the borders.
  • grid_size: Side length of the image grid, so a value of n produces an n x n grid of images. A 5x5 grid takes about 15 seconds; a 9x9 grid takes about 40 seconds.

Advanced

  • temperature: A higher temperature increases the probability of sampling low-scoring image tokens.
  • top_k: Each image token is sampled from the k highest-scoring tokens.

Increasing temperature and/or top_k will increase variety in the generated images at the expense of the images being less coherent. Setting temperature high and top_k low can result in more variety without sacrificing coherence.
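
The interaction between the two settings can be sketched in a few lines of Python. This is a generic top-k sampler with temperature scaling, written under the assumption that scores are plain logits passed through a softmax; it is not the min-dalle implementation, and the function names `top_k_distribution` and `sample_token` are illustrative.

```python
import math
import random


def top_k_distribution(logits, temperature, top_k):
    """Return {token_index: probability} over the top_k highest logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = max(1, min(top_k, len(logits)))
    # Keep only the indices of the k largest logits.
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    best = logits[kept[0]]
    # Subtract the maximum for numerical stability, then scale by temperature.
    weights = {i: math.exp((logits[i] - best) / temperature) for i in kept}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def sample_token(logits, temperature, top_k, rng=random):
    """Draw one token index from the truncated, temperature-scaled distribution."""
    dist = top_k_distribution(logits, temperature, top_k)
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.0, -1.0]
    print(top_k_distribution(scores, temperature=0.5, top_k=4))
    print(top_k_distribution(scores, temperature=4.0, top_k=2))
```

In this sketch a low top_k removes the weakest candidates entirely, while a high temperature flattens the probabilities among the candidates that remain, which is one way to read the "high temperature, low top_k" advice above.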

Expert

  • supercondition_factor: Higher values can result in better agreement with the text. Let logits_cond be the logits computed from the text prompt, logits_uncond the logits computed from an empty text prompt, and a the super-condition factor. Then logits = logits_cond * a + logits_uncond * (1 - a).
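
A minimal sketch of this formula on plain Python lists may help. The function name `supercondition` and the example numbers are made up for illustration; only the formula itself comes from the description above.

```python
def supercondition(logits_cond, logits_uncond, a):
    """Blend logits as logits_cond * a + logits_uncond * (1 - a)."""
    if len(logits_cond) != len(logits_uncond):
        raise ValueError("both logit lists must have the same length")
    return [c * a + u * (1 - a) for c, u in zip(logits_cond, logits_uncond)]


if __name__ == "__main__":
    cond = [2.0, 0.0]    # illustrative logits from the text prompt
    uncond = [1.0, 1.0]  # illustrative logits from an empty prompt
    for a in (0, 1, 4):
        print(a, supercondition(cond, uncond, a))
```

The same expression can be rewritten as logits_uncond + a * (logits_cond - logits_uncond). With a = 1 it returns the conditioned logits unchanged, and with a > 1 it pushes the result further in the direction in which the text prompt moved the logits.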

Example

Consider the images generated for “panda with top hat reading a book” with different settings.

text = "panda with top hat reading a book"
temperature = 0.5
top_k = 128
supercondition_factor = 4

[Image: min-dalle output for the settings above]

text = "panda with top hat reading a book"
temperature = 4
top_k = 64
supercondition_factor = 16

[Image: min-dalle output for the settings above]