FLUX is fast and it's open source

Posted October 10, 2024

FLUX is now much faster on Replicate, and we’ve made our optimizations open-source so you can see exactly how they work and build upon them.

Here are the end-to-end speeds:

  • FLUX.1 [schnell] at 512x512 and 4 steps: 0.29 seconds (P90: 0.49 seconds)
  • FLUX.1 [schnell] at 1024x1024 and 4 steps: 0.72 seconds (P90: 0.95 seconds)
  • FLUX.1 [dev] at 1024x1024 and 28 steps: 3.03 seconds (P90: 3.90 seconds)

These timings were measured end-to-end from the west coast of the US using the Python client.
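
If you want to reproduce this kind of measurement yourself, here's a minimal timing sketch using the Python client. The prompt is illustrative, and your numbers will vary with your network and region:

```python
import time

import replicate

# Time a full round trip, from request to generated image,
# as seen from the client.
start = time.perf_counter()
output = replicate.run(
    "black-forest-labs/flux-schnell",
    input={"prompt": "a watercolor painting of a fox"},
)
print(f"end-to-end: {time.perf_counter() - start:.2f}s")
```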

There's a live demo built on FLUX.1 [schnell] that generates images as you type. The full app and its source code are available if you'd like to check it out.

How did we do it?

Most of the models on Replicate are contributed by our community, but we maintain the FLUX models in collaboration with Black Forest Labs.

We’ve done two main things to make FLUX faster:

  • We optimized the model. We used Alex Redden’s flux-fp8-api as a starting point, then sped it up with torch.compile and the fast cuDNN attention kernels in the nightly PyTorch builds (there’s a sketch of this pattern after the list).
  • We added a new synchronous HTTP API that makes all image models much faster on Replicate.
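
Here's a minimal sketch of that compile-plus-cuDNN-attention pattern. The toy block below stands in for the real FLUX transformer (the real thing lives in cog-flux), and it assumes a CUDA GPU and a recent PyTorch build that ships the cuDNN attention backend:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Toy self-attention block standing in for a FLUX transformer layer.
class Block(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))
        # scaled_dot_product_attention dispatches to a fused attention kernel.
        y = F.scaled_dot_product_attention(q, k, v)
        return self.out(y.transpose(1, 2).reshape(b, n, d))

model = Block().to("cuda", dtype=torch.bfloat16)
model = torch.compile(model, mode="max-autotune")  # trace once, fuse kernels

x = torch.randn(1, 256, 64, device="cuda", dtype=torch.bfloat16)
with torch.inference_mode(), sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    y = model(x)  # first call compiles; later calls run the fast path
```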

The fp8 quantization in flux-fp8-api slightly changes the model’s output, but we’ve found it has little impact on quality.
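
To get a feel for the precision involved, here's a quick round trip through fp8. This only illustrates the numeric format itself, not the exact quantization scheme flux-fp8-api uses:

```python
import torch

# Cast random bf16 weights to fp8 (e4m3) and back, then measure the error.
w = torch.randn(4096, 4096, dtype=torch.bfloat16)
w_fp8 = w.to(torch.float8_e4m3fn)
err = (w_fp8.to(torch.bfloat16) - w).abs()
print(f"max abs error: {err.max().item():.4f}, mean: {err.mean().item():.6f}")
```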

We’ve created a tool that compares outputs for thousands of prompts on FLUX.1 [schnell] and FLUX.1 [dev]. We’re not cherry-picking. Take a look for yourself.

You can disable these optimizations by setting the go_fast input on the model to false.
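
For example, here's how you'd turn it off with the Python client:

```python
import replicate

# go_fast=False runs the unquantized model; omit it (or pass True)
# to get the fast fp8 path.
output = replicate.run(
    "black-forest-labs/flux-schnell",
    input={
        "prompt": "a photo of a lighthouse at dusk",
        "go_fast": False,
    },
)
print(output)
```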

We want to be open with you about how we’re optimizing the models. It’s notoriously hard to compare output between models and providers, and it's often unclear whether providers are doing things that impact the quality of the model.

We’re just going to tell you how we did it and let you disable any optimizations, so you’re never left wondering whether the output you’re getting is the best quality it can be.

Most importantly, the code is open-source, so you can see exactly how it works: github.com/replicate/cog-flux

Open-source should be fast too

Open-source models are often slow out of the box. Model providers then optimize these models to make them fast and release them behind proprietary APIs, without contributing the improvements back to the community.

We want to change that. We think open-source should be fast too.

We’re open-sourcing all the improvements we make to FLUX. We’re also collaborating with the AI Compiler Study Group and other AI researchers to build a fast, open-source version of FLUX.

Making the FLUX optimizations open-source is not just the right thing to do; it also means experts around the world can collaborate to make it the fastest. Pull requests welcome.

It’s going to get faster

New techniques for making models faster come out all the time, and because we’re collaborating with the community, you can be sure they’ll land on Replicate as fast as possible. Stay tuned.

Do more with FLUX

You can do more than just run FLUX on Replicate.

Follow us on X to keep up to speed.