Rate limits

We limit the number of API requests that can be made to Replicate:

You can create predictions at up to 600 requests per minute.
You can call all other endpoints at up to 3000 requests per minute.
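
One way to stay under these limits is to pace requests on the client side. The following is a minimal sketch, not part of any Replicate client library: the `MinIntervalLimiter` class and its parameters are hypothetical names chosen for illustration, and it assumes that spacing calls evenly (for example, 0.1 seconds apart for 600 requests per minute) is enough to stay within the limit.

```python
import time


class MinIntervalLimiter:
    """Hypothetical helper: spaces calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self._clock = clock      # injectable so the pacing can be tested without waiting
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


# Assumed usage: call wait() before each request that creates a prediction.
prediction_limiter = MinIntervalLimiter(per_minute=600)
```

Call `prediction_limiter.wait()` immediately before each request. A separate limiter with `per_minute=3000` could cover the other endpoints. This sketch is not thread-safe, so a multi-threaded client would need a lock around `wait()`.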

If you exceed a limit, the API responds with HTTP status 429 and a body like:

{"detail":"Request was throttled. Expected available in 1 second."}
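
A client can recover from a 429 by waiting and retrying. The sketch below shows one possible approach under stated assumptions: `parse_retry_delay` and `call_with_retry` are hypothetical helpers, not Replicate APIs, and the code assumes the `detail` message keeps the "Expected available in N second(s)" wording shown above. If the message cannot be parsed, it falls back to a default delay. The `send` argument is any function you supply that performs one request and returns a `(status, body)` pair.

```python
import json
import re
import time


def parse_retry_delay(body, default=1.0):
    """Extract the suggested wait in seconds from a 429 body, or return `default`.

    Assumes the message format shown in the example response; this format
    is not guaranteed, hence the fallback.
    """
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, TypeError, AttributeError):
        return default
    match = re.search(r"available in (\d+(?:\.\d+)?) second", str(detail))
    return float(match.group(1)) if match else default


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a caller-supplied function returning (status_code, body_text).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait for the period the server suggested before trying again.
            sleep(parse_retry_delay(body))
    return status, body
```

With the example body above, `parse_retry_delay` returns `1.0`. If the final attempt is still throttled, `call_with_retry` returns the last 429 response so the caller can decide what to do next.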

If you want higher limits, contact us.
