Fast Stable Diffusion: Turbo and latent consistency models (LCMs)

You might have noticed that Stable Diffusion is now fast. Very fast. First 8 frames per second, then 70fps, and now reports of over 100fps, all on consumer hardware.

Today you can do realtime image-to-image painting, and write prompts that return images before you’re done typing. There’s a whole new suite of applications for generative imagery. And other models, like text-to-video, will soon apply these techniques too.

In this guide we’ll explain the different models, the terms used, and some of the techniques powering this speed-up.

Let’s start with latent consistency models, or LCMs. They started it all, and they brought about the first realtime painting apps, like the ones Krea AI popularised.

In October 2023, Simian Luo et al. released the first latent consistency model and we blogged about how to run it on a Mac and make 1 image per second. You can also run it on Replicate.

What makes it so fast is that it needs only 4 to 8 steps to make a good image, compared with 20 to 30 for regular Stable Diffusion. LCMs can do this because they are designed to directly predict the reverse diffusion outcome in latent space. Essentially, this means they get to the good pictures faster. Dig into their research paper if you want to learn more.
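As a rough sketch of what that looks like in practice, here’s how you might run the Dreamshaper-based LCM with the Hugging Face diffusers library (this assumes a recent diffusers release with built-in LCM support; the prompt and output filename are just examples):

```python
import torch
from diffusers import DiffusionPipeline

# Load the Dreamshaper-based LCM released by Simian Luo et al.
pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16
).to("cuda")  # use "mps" on a Mac

image = pipe(
    prompt="a photo of a red fox in a snowy forest",
    num_inference_steps=4,  # 4-8 steps instead of 20-30
    guidance_scale=8.0,     # LCMs bake guidance into the model, so there's no negative prompt
).images[0]
image.save("fox.png")
```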

The latent consistency model that was released, the one based on a Stable Diffusion 1.5 finetune called Dreamshaper, is very fast. But what about other models? How can we speed them up?

One of the big drawbacks with LCMs is the way these models are made, in a process called distillation. An LCM must be distilled from a pre-trained text-to-image model, a training process that needs ~650K text-image pairs and 32 hours of A100 compute. The authors’ distillation code also hasn’t been released.

LoRAs to the rescue.

In November 2023, a new innovation followed – Huggingface blogged about a way to get all the benefits of LCM for any Stable Diffusion model or fine-tune, without needing to train a new distilled model. Full details are in their paper.

They did this by training an “LCM LoRA”. This is a much faster training process that captures the speed-ups of LCM by training just a few LoRA layers.
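To give a sense of how this fits together, here’s a minimal sketch of loading the SDXL LCM LoRA with diffusers (this assumes a recent diffusers release; the prompt and filename are placeholders):

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Start from the unchanged SDXL base model...
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# ...then swap in the LCM scheduler and load the LCM LoRA weights on top.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

image = pipe(
    prompt="an astronaut riding a horse, detailed, 8k",
    num_inference_steps=4,  # 4 steps instead of 20-30
    guidance_scale=1.0,     # keep guidance at 0, or between 1 and 2 (see below)
).images[0]
image.save("astronaut.png")
```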

You can experiment with these LoRAs using models on Replicate.

Suddenly we have very fast 4-step inference for SDXL, SD 1.5, and all the fine-tunes that go with them. On a Mac, where a 1024x1024 image would take upwards of 1 minute, it’s now ready in just 6 seconds.

There is a downside though. The images these LCM LoRAs generate are lower quality, and need more specific prompting to get good results.

You also need to use a guidance scale of 0 (i.e. turning it off) or between 1 and 2. Go outside these ranges and your images will look terrible.

LCM LoRAs and the LoRAs people use for styles and characters both use the same LoRA approach, but to achieve different goals. One changes an image style, the other makes images faster.

LoRA (Low-Rank Adaptation) is a general technique that accelerates the fine-tuning of models by training smaller matrices. These then get loaded into an unchanged base model to apply their effect.
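To make the “smaller matrices” idea concrete, here’s a toy sketch of the low-rank trick applied to a single weight matrix (purely illustrative, not any particular library’s API; the dimensions are made up):

```python
import torch

d, r = 1024, 8                # full dimension vs low rank
W = torch.randn(d, d)         # frozen base weight, never updated
A = torch.randn(r, d) * 0.01  # small trainable down-projection
B = torch.zeros(d, r)         # small trainable up-projection (starts at zero)

x = torch.randn(d)
y = W @ x + B @ (A @ x)       # base output plus the low-rank correction

# Only A and B are trained and shipped, instead of the full d x d weight matrix.
```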

On November 28, 2023, Stability AI announced “SDXL Turbo” (and more quietly its partner, SD Turbo). Now SDXL images can be made in just 1 step, down from 50 steps, and also down from the LCM LoRA’s 4 steps.

On an A100, SDXL Turbo generates a 512x512 image in 207ms (prompt encoding + a single denoising step + decoding, fp16), where 67ms are accounted for by a single UNet forward evaluation.
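Here’s a rough sketch of single-step generation with SDXL Turbo in diffusers (assuming a diffusers release that supports the checkpoint; the prompt and filename are just examples):

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

image = pipe(
    prompt="a cinematic photo of a lighthouse at dusk",
    num_inference_steps=1,  # a single denoising step
    guidance_scale=0.0,     # Turbo is trained to work without classifier-free guidance
    height=512, width=512,  # Turbo generates 512x512 images
).images[0]
image.save("lighthouse.png")
```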

Stability AI achieved this using a new technique called Adversarial Diffusion Distillation (ADD). The adversarial aspect is worth highlighting. During training, the model aims to fool a discriminator into thinking its generations are real images. This forces the model to generate real-looking images at every step, without any of the common AI distortions or blurriness you get with early steps in traditional diffusion models.
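To give a flavour of how the adversarial and distillation pressures combine, here’s a heavily simplified toy sketch of the loss structure. This is not Stability AI’s training code: the networks, shapes and loss weighting below are made up for illustration only.

```python
import torch
import torch.nn.functional as F

# Stand-ins for the real networks: a one-step student generator,
# a frozen diffusion teacher, and a discriminator.
student = torch.nn.Conv2d(3, 3, 3, padding=1)
teacher = torch.nn.Conv2d(3, 3, 3, padding=1)
discriminator = torch.nn.Sequential(
    torch.nn.Conv2d(3, 1, kernel_size=4, stride=4),
    torch.nn.Flatten(),
)

real_images = torch.randn(4, 3, 64, 64)
noisy = real_images + torch.randn_like(real_images)  # crude stand-in for forward diffusion

student_out = student(noisy)      # the student's one-step denoised guess
with torch.no_grad():
    teacher_out = teacher(noisy)  # the teacher's prediction, used as a target

# Adversarial term: push the discriminator to score the student's output as "real".
adv_loss = -discriminator(student_out).mean()

# Distillation term: stay close to what the teacher would have produced.
distill_loss = F.mse_loss(student_out, teacher_out)

loss = adv_loss + 2.5 * distill_loss  # the weighting here is illustrative
loss.backward()
```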

Read the full research paper, where the distillation process is also explained.

You can download the turbo models from Huggingface.

They can only be used for research purposes.

The generative AI community is already optimising the LCM and turbo models to achieve phenomenal speeds.

X user cumulo_autumn has demonstrated 40fps imagery using LCM. Meanwhile Dan Wood has achieved 77fps with SD Turbo and 167 images per second when using batches of 12. All of these are on consumer 4090 GPUs.