Torch compile caching
`torch.compile` can speed up your inference time significantly, but at the cost of slower startup times.
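If you haven’t used it before, wrapping a model is a one-liner. Here’s a minimal sketch with a toy model (standing in for a real diffusion or language model) that shows where the startup cost lives: the first call triggers compilation, and later calls take the fast path.

```python
import torch
import torch.nn as nn

# Toy model standing in for a real diffusion or language model.
model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))

# torch.compile returns an optimized wrapper around the module.
compiled_model = torch.compile(model)

x = torch.randn(8, 512)
out = compiled_model(x)  # first call: compiles (the slow startup step), then runs
out = compiled_model(x)  # subsequent calls: fast compiled path
```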
We’ve implemented caching of `torch.compile` artifacts across model instances to help your models boot faster.
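This caching happens automatically on Replicate, so you don’t need to change your model code. If you want a similar effect in your own PyTorch deployments, recent PyTorch releases (2.6+) expose a portable cache API. The sketch below illustrates that approach; it’s one way you might do it yourself, not a description of our internal mechanism, and `compile_cache.bin` is just a placeholder path.

```python
import torch
import torch.nn as nn

model = nn.Linear(256, 256)
compiled = torch.compile(model)
compiled(torch.randn(4, 256))  # warm-up call triggers compilation

# Serialize the compile artifacts so a future process can skip the
# expensive compilation step (PyTorch 2.6+ "Mega-Cache" API).
result = torch.compiler.save_cache_artifacts()
if result is not None:
    artifact_bytes, cache_info = result
    with open("compile_cache.bin", "wb") as f:
        f.write(artifact_bytes)

# --- in a fresh process, before compiling the same model ---
with open("compile_cache.bin", "rb") as f:
    torch.compiler.load_cache_artifacts(f.read())
# torch.compile on the same model now hits the pre-populated caches
# instead of recompiling from scratch.
```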
Models using `torch.compile`, like black-forest-labs/flux-kontext-dev, prunaai/flux-schnell, and prunaai/flux.1-dev-lora, now start 2-3x faster.
In our tests of inference speed with black-forest-labs/flux-kontext-dev, the compiled version runs over 30% faster than the uncompiled one, making `torch.compile` an important feature to explore.
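Speedups vary by model, input shapes, and hardware, so it’s worth measuring on your own workload. A minimal timing harness (again with a toy model in place of a real one) might look like this:

```python
import time
import torch
import torch.nn as nn

def mean_latency(fn, x, iters=50):
    # Warm up so one-time compilation and CUDA init don't skew the numbers.
    for _ in range(3):
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 1024))
x = torch.randn(16, 1024)

eager_ms = mean_latency(model, x) * 1e3
compiled_ms = mean_latency(torch.compile(model), x) * 1e3
print(f"eager: {eager_ms:.2f} ms/iter, compiled: {compiled_ms:.2f} ms/iter")
```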
For more details, check out the blog post. If you’re building your own custom models, see our guide to improving model performance with `torch.compile`. To learn how to use `torch.compile` itself, the official PyTorch torch.compile tutorial is a good place to start.