How to use Stable Diffusion

Let's cover the basics. These are the parameters you'll see when using Stable Diffusion:

prompt
negative prompt
width and height
steps (or number of inference steps)
guidance scale (classifier-free guidance scale, or CFG scale)
seed
scheduler (or sampler)
batch size

Prompt

The most important parameter is the prompt. It's the text that tells the model what image to generate.

Generally you should:

use comma-separated terms (do not prompt Stable Diffusion the way you would talk to ChatGPT)
put the most important thing first
keep your prompt within 75 tokens (or about 60 words)

Example prompts:

a photo of a cat, photography, studio portrait, 50mm lens
an oil painting of a cat, abstract, impressionism, 1920s
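The guidelines above can be sketched as a small helper. This is only an illustration: `build_prompt` and its `max_words` parameter are hypothetical names, and counting words is a rough stand-in for the 75-token limit, since real token counts depend on the model's tokenizer.

```python
def build_prompt(subject, *modifiers, max_words=60):
    """Join a subject and modifiers into a comma-separated prompt.

    The subject goes first, following the advice to lead with the most
    important thing. max_words approximates the 75-token budget.
    """
    terms = [subject, *modifiers]
    # Drop empty terms and stray whitespace so the prompt stays tidy.
    terms = [t.strip() for t in terms if t and t.strip()]
    prompt = ", ".join(terms)
    word_count = len(prompt.split())
    if word_count > max_words:
        raise ValueError(
            f"prompt has {word_count} words; aim for {max_words} or fewer"
        )
    return prompt


prompt = build_prompt(
    "a photo of a cat", "photography", "studio portrait", "50mm lens"
)
print(prompt)
```

This rebuilds the first example prompt from its parts, which makes it easy to swap one modifier at a time.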

Input

prompt: a photo of a cat, photography, studio portrait, 50mm lens
width: 768
height: 768
seed: 31300


Negative prompt

A negative prompt is a list of the things you don't want in your image. Write it the same way you would a normal prompt: comma-separated terms, with the most important thing first.

If you're asking for a 'photo of a cat', and you're getting back results that look like paintings or illustrations instead of a photo, put 'painting, illustration' in your negative prompt.

Let’s take our previous cat example and add a negative prompt to see the difference. Imagine that we didn’t want a brown cat, and we didn't want a cat that looked so serious, so we'll use the negative prompt: 'brown cat, serious':

Input

prompt: a photo of a cat, photography, studio portrait, 50mm lens
negative_prompt: brown cat, serious
width: 768
height: 768
seed: 31300

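If you drive a model from code rather than a web form, the same settings are usually passed as a set of named inputs. The sketch below only assembles such a set; `build_inputs` is a hypothetical helper, and the key names simply mirror the labels in the input listing above, so the tool you use may name them differently.

```python
def build_inputs(prompt, negative_prompt="", width=768, height=768, seed=None):
    """Collect generation settings into a plain dict.

    Optional settings are left out when unset, so the model's own
    defaults apply.
    """
    inputs = {"prompt": prompt, "width": width, "height": height}
    if negative_prompt:
        inputs["negative_prompt"] = negative_prompt
    if seed is not None:
        inputs["seed"] = seed
    return inputs


inputs = build_inputs(
    "a photo of a cat, photography, studio portrait, 50mm lens",
    negative_prompt="brown cat, serious",
    seed=31300,
)
print(inputs)
```

Keeping the prompt, size, and seed identical and changing only the negative prompt makes the two results directly comparable.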

Width and height

For Stable Diffusion 1.5, outputs are optimised around 512x512 pixels. Many common fine-tuned versions of SD1.5 are optimised around 768x768.

The best resolutions for common aspect ratios are typically:

1:1 (square): 512x512, 768x768
3:2 (landscape): 768x512
2:3 (portrait): 512x768
4:3 (landscape): 768x576
3:4 (portrait): 576x768
16:9 (widescreen): 912x512
9:16 (tall): 512x912

For SDXL, outputs are optimised around 1024x1024 pixels. The best resolutions for common aspect ratios are typically:

1:1 (square): 1024x1024, 768x768
3:2 (landscape): 1152x768
2:3 (portrait): 768x1152
4:3 (landscape): 1152x864
3:4 (portrait): 864x1152
16:9 (widescreen): 1360x768
9:16 (tall): 768x1360

Width and height must be divisible by 8.
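A quick way to check a custom size, or to nudge one onto a valid value, is sketched below. The function names are hypothetical, and rounding to the nearest multiple is just one reasonable choice; it is not necessarily how the resolutions listed above were picked.

```python
def snap_to_multiple_of_8(value):
    """Round a pixel dimension to the nearest multiple of 8 (halves round up)."""
    return max(8, (int(value) + 4) // 8 * 8)


def is_valid_size(width, height):
    """Return True when both dimensions are positive multiples of 8."""
    return width > 0 and height > 0 and width % 8 == 0 and height % 8 == 0


print(is_valid_size(768, 576))     # a size from the SD1.5 list
print(is_valid_size(910, 512))     # 910 is not divisible by 8
print(snap_to_multiple_of_8(910))  # nearest valid width
```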

If you want to generate images larger than this, we recommend using an upscaler.

Number of inference steps

This is the number of denoising steps the model takes to generate your image.

More steps generally improve output quality, but the image takes longer to generate.

For vanilla SDXL and Stable Diffusion 1.5 you should start with a value of about 20 steps. Don’t go too high though, because after a point each step helps less and less. 50 steps is a good maximum.

Recent models like SDXL Turbo and SD Turbo can generate high quality images in just a single step, making them exceptionally fast.
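These starting points can be captured in a small lookup, sketched below. The model labels and the `choose_steps` helper are hypothetical; the numbers come from the text above (about 20 steps to start, 50 as a ceiling, and a single step for the Turbo models).

```python
# Starting step counts from the text above; the keys are made-up labels.
DEFAULT_STEPS = {"sd1.5": 20, "sdxl": 20, "sd-turbo": 1, "sdxl-turbo": 1}
MAX_STEPS = 50


def choose_steps(model, requested=None):
    """Return a step count: the model's default, or a request clamped to 1..50."""
    if requested is None:
        return DEFAULT_STEPS[model]
    return max(1, min(int(requested), MAX_STEPS))


print(choose_steps("sdxl"))
print(choose_steps("sdxl", requested=200))
print(choose_steps("sdxl-turbo"))
```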

Seed

A seed is a number used to initialize randomness for the model. By setting a seed and keeping the other settings the same, you can get the same output every time.
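The idea is the same as seeding any random number generator. The sketch below uses Python's standard `random` module as a stand-in; image tools use their own generators, and `initial_noise` is a hypothetical name.

```python
import random


def initial_noise(seed, count):
    """Draw `count` Gaussian noise values from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


same_a = initial_noise(31300, 4)
same_b = initial_noise(31300, 4)
other = initial_noise(31301, 4)

print(same_a == same_b)  # same seed, same noise
print(same_a == other)   # different seed, different noise
```

Because generation starts from this noise, repeating the seed repeats the starting point.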

If you find an image you like but want to tweak it or improve quality, you can use the same seed and change other parameters.

For example, keep the same seed and:

increase the number of steps to improve quality
adjust the prompt to change details of the image
experiment with guidance scale

If you keep the seed fixed but change the width or height of the image, the starting noise changes too, so you will not see consistent results.
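One way to organise this kind of experiment is to derive each variant from a shared base so that the seed never changes by accident. This is a sketch: `vary` is a hypothetical helper, and the key names `num_inference_steps` and `guidance_scale` are assumptions that may differ in your tool.

```python
def vary(base, **changes):
    """Return a copy of `base` with some settings changed; the rest is kept."""
    variant = dict(base)
    variant.update(changes)
    return variant


base = {
    "prompt": "a photo of a cat, photography, studio portrait, 50mm lens",
    "seed": 31300,
    "num_inference_steps": 20,  # assumed key name
    "guidance_scale": 7.5,      # assumed key name
}

variants = [
    vary(base, num_inference_steps=40),
    vary(base, prompt="a photo of a cat, photography, studio portrait, 85mm lens"),
    vary(base, guidance_scale=5.0),
]

for variant in variants:
    print(variant)
```

Each variant differs from the base in exactly one setting, so any change in the output can be attributed to that setting.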

Guidance scale

The guidance scale tells the model how closely the output should follow the prompt. Start with a value of 7 or 7.5.

If your outputs aren’t matching your prompt as much as you’d like, try increasing this number. If you want the AI to be more creative, lower it.
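For intuition, here is a minimal sketch of the usual classifier-free guidance formula. At each step the model makes two noise predictions, one without the prompt and one with it, and the scale controls how far the result is pushed toward the prompted one. The function name and the toy three-value lists are illustrative assumptions; real predictions are large tensors.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Combine two noise predictions with classifier-free guidance.

    Starts from the unconditional prediction and moves toward (and, for
    scales above 1, past) the prompt-conditioned prediction.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0, -1.0]  # toy prediction without the prompt
cond = [1.0, 1.0, 0.0]     # toy prediction with the prompt

print(apply_guidance(uncond, cond, 1.0))
print(apply_guidance(uncond, cond, 7.5))
```

With a scale of 1 the result equals the prompted prediction. Larger values exaggerate the difference between the two, which is why higher scales follow the prompt more strictly.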

SDXL Turbo and SD Turbo do not use a guidance scale. If the tool you use has an option for it, set it to 0.

Scheduler (or sampler)

The scheduler is an advanced parameter. Schedulers play a critical role in determining how the noise is incrementally reduced (denoising) to form the final output.

Many users will have a favorite scheduler and stick with it. Most schedulers give similar results, but some can sample faster, while others can get good results in fewer steps.

We recommend starting with Euler or Euler ancestral (EULER_A). Both are good schedulers for SD1.5 and SDXL.

DPM++ 2M Karras is a popular choice among users of AUTOMATIC1111, which is a user interface tool for Stable Diffusion.
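To make "incrementally reduced" concrete, the sketch below computes the noise levels used by Karras-style schedules, which is what the "Karras" in DPM++ 2M Karras refers to. The formula follows the schedule proposed by Karras et al. with `rho = 7`. The `sigma_min` and `sigma_max` defaults are illustrative assumptions, not values taken from this guide.

```python
def karras_sigmas(steps, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Return `steps` noise levels, decreasing from sigma_max to sigma_min.

    Levels are interpolated linearly in sigma**(1/rho), which spaces
    them more closely at the low-noise end.
    """
    if steps < 2:
        return [sigma_max]
    max_inv = sigma_max ** (1.0 / rho)
    min_inv = sigma_min ** (1.0 / rho)
    return [
        (max_inv + i / (steps - 1) * (min_inv - max_inv)) ** rho
        for i in range(steps)
    ]


sigmas = karras_sigmas(20)
for sigma in sigmas:
    print(round(sigma, 4))
```

The noise level drops quickly at first and then in smaller and smaller increments. A different scheduler would choose different levels or take a different kind of step between them.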

Batch size

This is the number of images that will be generated at once. A larger batch size needs more memory, but time per image is usually reduced.

The number of images you can batch at once is limited by the memory available.

On Replicate, SDXL image generation is limited to batches of 4.

Newer models that produce images in fewer steps also use less memory, and can batch more images at once.
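If you need more images than a single batch allows, you can split the request into several batches. The helper below is a hypothetical sketch; its default of 4 matches the SDXL limit mentioned above.

```python
def split_into_batches(total_images, max_batch_size=4):
    """Split a request for `total_images` into batch sizes of at most `max_batch_size`."""
    if total_images < 0 or max_batch_size < 1:
        raise ValueError("total_images must be >= 0 and max_batch_size >= 1")
    full, remainder = divmod(total_images, max_batch_size)
    return [max_batch_size] * full + ([remainder] if remainder else [])


print(split_into_batches(10))
```

Ten images become two batches of 4 and one batch of 2. Use a different seed for each batch, since repeating the seed would repeat the images.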