Torch compile caching
`torch.compile` can speed up your inference time significantly, but at the cost of slower startup times.
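If you haven’t used it before, wrapping a model is a one-liner. Here’s a minimal sketch with a toy model (standing in for a real diffusion or language model) that shows where the startup cost lives: the first call triggers compilation, and later calls take the fast path.

```python
import torch
import torch.nn as nn

# Toy model standing in for a real diffusion or language model.
model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))

# torch.compile returns an optimized wrapper around the module.
compiled_model = torch.compile(model)

x = torch.randn(8, 512)
out = compiled_model(x)  # first call: compiles (the slow startup step), then runs
out = compiled_model(x)  # subsequent calls: fast compiled path
```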
We’ve implemented caching of `torch.compile` artifacts across model instances to help your models boot faster.
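This caching happens automatically on Replicate, so you don’t need to change your model code. If you want a similar effect in your own PyTorch deployments, recent PyTorch releases (2.6+) expose a portable cache API. The sketch below illustrates that approach; it’s one way you might do it yourself, not a description of our internal mechanism, and `compile_cache.bin` is just a placeholder path.

```python
import torch
import torch.nn as nn

model = nn.Linear(256, 256)
compiled = torch.compile(model)
compiled(torch.randn(4, 256))  # warm-up call triggers compilation

# Serialize the compile artifacts so a future process can skip the
# expensive compilation step (PyTorch 2.6+ "Mega-Cache" API).
result = torch.compiler.save_cache_artifacts()
if result is not None:
    artifact_bytes, cache_info = result
    with open("compile_cache.bin", "wb") as f:
        f.write(artifact_bytes)

# --- in a fresh process, before compiling the same model ---
with open("compile_cache.bin", "rb") as f:
    torch.compiler.load_cache_artifacts(f.read())
# torch.compile on the same model now hits the pre-populated caches
# instead of recompiling from scratch.
```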
Models using `torch.compile`, like black-forest-labs/flux-kontext-dev, prunaai/flux-schnell, and prunaai/flux.1-dev-lora, now start 2-3x faster.
In our tests of inference speed with black-forest-labs/flux-kontext-dev, the compiled version runs over 30% faster than the uncompiled one, making `torch.compile` an important feature to explore.
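Speedups vary by model, input shapes, and hardware, so it’s worth measuring on your own workload. A minimal timing harness (again with a toy model in place of a real one) might look like this:

```python
import time
import torch
import torch.nn as nn

def mean_latency(fn, x, iters=50):
    # Warm up so one-time compilation and CUDA init don't skew the numbers.
    for _ in range(3):
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 1024))
x = torch.randn(16, 1024)

eager_ms = mean_latency(model, x) * 1e3
compiled_ms = mean_latency(torch.compile(model), x) * 1e3
print(f"eager: {eager_ms:.2f} ms/iter, compiled: {compiled_ms:.2f} ms/iter")
```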
For more details, check out the blog post. If you’re building your own custom models, see our guide to improving model performance with `torch.compile`. To learn how to use `torch.compile` itself, the official PyTorch torch.compile tutorial is a good place to start.