kuprel/min-dalle

Fast, minimal port of DALL·E Mini to PyTorch

Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 4 minutes, though prediction time varies significantly with the inputs.

Readme


Input Parameter Descriptions

Basic

  • text: For long prompts, only the first 64 tokens will be used to generate the image.
  • save_as_png: If selected, the image is saved in lossless PNG format; otherwise it is saved as JPEG.
  • progressive_outputs: Show intermediate outputs while running. This adds less than a second to the run time.
  • seamless: Tile images in token space instead of pixel space. This has the effect of blending the images at the borders.
  • grid_size: Size of the image grid. 5x5 takes about 15 seconds, 9x9 takes about 40 seconds.
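
These parameters map onto the underlying Python package. Below is a minimal sketch assuming the MinDalle class and generate_image method from the min_dalle package; argument names follow the descriptions above and may differ slightly from the exact signature in the repo.

import torch
from min_dalle import MinDalle

# Load the model once; is_reusable keeps the weights in memory between calls.
model = MinDalle(dtype=torch.float16, device='cuda', is_mega=True, is_reusable=True)

# A 3x3 grid; is_seamless tiles the images in token space so they blend at the borders.
image = model.generate_image(
    text='panda with top hat reading a book',
    grid_size=3,
    is_seamless=True,
)

# The result is a PIL image; saving as PNG is lossless, JPEG is not.
image.save('panda.png')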

Advanced

  • temperature: Higher temperature increases the probability of sampling low-scoring image tokens.
  • top_k: Each image token is sampled from the top-k highest-scoring tokens.

Increasing temperature and/or top_k increases variety in the generated images at the expense of coherence. Setting temperature high and top_k low can yield more variety without sacrificing coherence.
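
Schematically, sampling a single image token with these two knobs looks like the following (an illustrative sketch, not the repo's exact implementation):

import torch

def sample_image_token(logits, temperature, top_k):
    # Keep only the top_k highest-scoring image tokens.
    top_logits, top_indices = logits.topk(top_k)
    # Dividing by temperature flattens (high temperature) or sharpens
    # (low temperature) the distribution over the surviving tokens.
    probs = torch.softmax(top_logits / temperature, dim=-1)
    # Draw one token from the truncated, rescaled distribution.
    choice = torch.multinomial(probs, num_samples=1)
    return top_indices[choice]

# A short top-k list sampled at low temperature stays coherent; raising
# temperature or top_k spreads probability over more tokens and adds variety.
token = sample_image_token(torch.randn(16384), temperature=0.5, top_k=128)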

Expert

  • supercondition_factor: Higher values can result in better agreement with the text. Let logits_cond be the logits computed from the text prompt, let logits_uncond be the logits computed from an empty text prompt, and let a be the super-condition factor; then logits = logits_cond * a + logits_uncond * (1 - a).
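
In code form, that blend is simply (again an illustrative sketch, not the repo's exact implementation):

import torch

def supercondition(logits_cond, logits_uncond, a):
    # With a > 1, the difference between the text-conditioned and unconditioned
    # logits is amplified, pushing sampling toward tokens the prompt favors.
    return logits_cond * a + logits_uncond * (1 - a)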

Example

Consider the images generated for “panda with top hat reading a book” with different settings.

text = "panda with top hat reading a book"
temperature = 0.5
top_k = 128
supercondition_factor = 4

(image grid generated with these settings)

text = "panda with top hat reading a book"
temperature = 4
top_k = 64
supercondition_factor = 16

(image grid generated with these settings)
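
To reproduce a comparison like this one, the two settings can be run back to back; a sketch, again assuming the generate_image signature used in the earlier example:

from min_dalle import MinDalle
import torch

model = MinDalle(dtype=torch.float16, device='cuda', is_mega=True, is_reusable=True)

settings = [
    dict(temperature=0.5, top_k=128, supercondition_factor=4),
    dict(temperature=4, top_k=64, supercondition_factor=16),
]

for i, s in enumerate(settings):
    image = model.generate_image(
        text='panda with top hat reading a book',
        grid_size=3,
        **s,
    )
    image.save(f'panda_{i}.png')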