
kuprel/min-dalle

Fast, minimal port of DALL·E Mini to PyTorch
422,726 runs

Performance

This model runs predictions on Nvidia A100 GPU hardware.

80% of predictions complete within 30 seconds.

Readme


Input Parameter Descriptions

Basic

  • text: For long prompts, only the first 64 tokens will be used to generate the image.
  • save_as_png: If selected, the image is saved in lossless PNG format; otherwise it is saved as JPG.
  • progressive_outputs: Show intermediate outputs while running. This adds less than a second to the run time.
  • seamless: Tile images in token space instead of pixel space. This has the effect of blending the images at the borders.
  • grid_size: Size of the image grid. A 5x5 grid takes about 15 seconds; a 9x9 grid takes about 40 seconds.

Advanced

  • temperature: High temperature increases the probability of sampling low scoring image tokens.
  • top_k: Each image token is sampled from the top-k scoring tokens.

Increasing temperature and/or top_k will increase variety in the generated images at the expense of the images being less coherent. Setting temperature high and top_k low can result in more variety without sacrificing coherence.
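To make the interaction between temperature and top_k concrete, here is a minimal sketch of generic top-k sampling with temperature over a list of logits. It is not taken from the min-dalle source; the function name sample_top_k and its signature are assumptions made for illustration only.

```python
import math
import random


def sample_top_k(logits, top_k, temperature, rng=random):
    """Sample one token index from the top_k highest-scoring logits.

    Generic top-k / temperature sampling for illustration; the name and
    signature are hypothetical, not min-dalle's actual code.
    """
    if top_k < 1 or temperature <= 0:
        raise ValueError("top_k must be >= 1 and temperature must be > 0")
    # Rank token indices by score and keep only the top_k best.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = ranked[:top_k]
    # Scale by temperature; subtracting the best logit keeps exp() stable.
    best = logits[kept[0]]
    weights = [math.exp((logits[i] - best) / temperature) for i in kept]
    # choices() normalises the weights, so this is a softmax over `kept`.
    return rng.choices(kept, weights=weights, k=1)[0]
```

In this sketch, a higher temperature flattens the weights among the kept tokens, while a lower top_k removes the lowest-scoring candidates entirely, which is one way to read the advice about pairing a high temperature with a low top_k.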

Expert

  • supercondition_factor: Higher values can result in better agreement with the text. Let logits_cond be the logits computed from the text prompt, logits_uncond be the logits computed from an empty text prompt, and a be the super-condition factor. Then logits = logits_cond * a + logits_uncond * (1 - a).
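The formula above can be sketched in a few lines of plain Python. The function name supercondition and the use of Python lists are assumptions for illustration; the actual model presumably operates on tensors.

```python
def supercondition(logits_cond, logits_uncond, a):
    """Blend conditional and unconditional logits elementwise.

    Implements logits = logits_cond * a + logits_uncond * (1 - a).
    The function name is hypothetical, chosen for this example.
    """
    if len(logits_cond) != len(logits_uncond):
        raise ValueError("both logit lists must have the same length")
    return [c * a + u * (1 - a) for c, u in zip(logits_cond, logits_uncond)]


# With a = 1 the result equals logits_cond; with a > 1 it moves past
# logits_cond, further away from logits_uncond.
print(supercondition([1.0, 2.0], [0.5, 2.5], 4))
```

The same expression can be rewritten as logits_uncond + a * (logits_cond - logits_uncond), which shows that a scales the difference the text prompt makes relative to an empty prompt.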

Example

Consider the images generated for "panda with top hat reading a book" with different settings.

text = "panda with top hat reading a book"
temperature = 0.5
top_k = 128
supercondition_factor = 4

[Image: min-dalle output]

text = "panda with top hat reading a book"
temperature = 4
top_k = 64
supercondition_factor = 16

[Image: min-dalle output]

Credit to @AnnasVirtual for the example.
