This model runs predictions on NVIDIA A100 GPU hardware.
80% of predictions complete within 30 seconds.
text: For long prompts, only the first 64 tokens will be used to generate the image.
save_as_png: If selected, the image is saved in lossless PNG format; otherwise it is saved as JPG.
progressive_outputs: Show intermediate outputs while running. This adds less than a second to the run time.
seamless: Tile images in token space instead of pixel space. This has the effect of blending the images at the borders.
grid_size: Size of the image grid. 5x5 takes about 15 seconds, 9x9 takes about 40 seconds.
temperature: High temperature increases the probability of sampling low-scoring image tokens.
top_k: Each image token is sampled from the top-k scoring tokens.
Increasing temperature and/or top_k will increase variety in the generated images at the expense of the images being less coherent. Setting temperature high and top_k low can result in more variety without sacrificing coherence.
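The interaction between temperature and top_k can be illustrated with a minimal sketch of top-k sampling with temperature scaling. This is generic illustrative code, not necessarily how the model implements sampling; the function names top_k_distribution and sample_token and the toy logits are hypothetical.

```python
import math
import random

def top_k_distribution(logits, temperature, top_k):
    """Return {token_index: probability} over the top_k highest-scoring tokens.

    Assumes temperature > 0. Hypothetical helper for illustration only.
    """
    # Keep only the indices of the top_k highest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Dividing by temperature flattens (high) or sharpens (low) the distribution.
    scaled = [logits[i] / temperature for i in top]
    # Numerically stable softmax over the kept tokens.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

def sample_token(logits, temperature, top_k, rng):
    """Draw one token index from the truncated, temperature-scaled distribution."""
    dist = top_k_distribution(logits, temperature, top_k)
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]

# Toy logits for four image tokens (made-up values).
toy_logits = [2.0, 1.0, 0.0, -1.0]
sharp = top_k_distribution(toy_logits, temperature=0.5, top_k=2)
flat = top_k_distribution(toy_logits, temperature=4.0, top_k=2)
token = sample_token(toy_logits, temperature=4.0, top_k=2, rng=random.Random(0))
```

In this sketch, raising the temperature moves probability toward the lower-scoring token among those kept, while a small top_k ensures that very low-scoring tokens are never sampled at all.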
supercondition_factor: Higher values can result in better agreement with the text. Let logits_cond be the logits computed from the text prompt, let logits_uncond be the logits computed from an empty text prompt, and let a be the super-condition factor. Then
logits = logits_cond * a + logits_uncond * (1 - a)
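As a small worked sketch of the formula, the following applies it elementwise to plain Python lists, where a is the super-condition factor, logits_cond comes from the text prompt, and logits_uncond comes from an empty prompt. The function name supercondition and the toy values are hypothetical; the model itself would operate on tensors.

```python
def supercondition(logits_cond, logits_uncond, a):
    """Blend conditional and unconditional logits elementwise.

    Implements logits = logits_cond * a + logits_uncond * (1 - a).
    """
    return [c * a + u * (1 - a) for c, u in zip(logits_cond, logits_uncond)]

# Toy values: the text prompt favors token 0, the empty prompt is indifferent.
cond = [1.0, 0.0]
uncond = [0.5, 0.5]

same_as_cond = supercondition(cond, uncond, a=1)  # a = 1 leaves logits_cond unchanged
amplified = supercondition(cond, uncond, a=4)     # a > 1 pushes away from logits_uncond
```

The formula can also be read as logits_uncond + a * (logits_cond - logits_uncond), so values of a above 1 extrapolate past the conditional logits and exaggerate whatever the text prompt changed.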
Consider the images generated for "panda with top hat reading a book" with different settings.
text = "panda with top hat reading a book", temperature = 0.5, top_k = 128, supercondition_factor = 4
text = "panda with top hat reading a book", temperature = 4, top_k = 64, supercondition_factor = 16
Credit to @AnnasVirtual for the example.