HTTP API reference

Authentication

All API requests must be authenticated with a token. Include this header with all requests:

Authorization: Token <token>
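
For example, with Python's requests library you might attach the header to every call through a shared session. This is a minimal sketch, not an official client; it assumes REPLICATE_API_TOKEN is set in your environment:

import os
import requests

# Reuse one session so every request carries the Authorization header.
session = requests.Session()
session.headers["Authorization"] = f"Token {os.environ['REPLICATE_API_TOKEN']}"

resp = session.get("https://api.replicate.com/v1/predictions")
resp.raise_for_status()
print(resp.status_code)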

Rate limits

We limit the number of API requests that can be made to Replicate:

  • You can call the create prediction endpoint at 10 requests per second (average) with a burst capacity of up to 600 requests.
  • You can call all other endpoints at 200 requests per second (average) with a burst capacity of up to 12000 requests.

Whenever your request rate is below your limit, you will accumulate burst capacity up to the stated maximum. For example, if your rate limit is 10 requests per second with a burst capacity of up to 600 requests, then any of the following scenarios would be allowed:

  • send 600 requests all at once; wait 60 seconds; send 600 requests all at once
  • send 600 requests all at once; send 5 requests per second for the next 120 seconds; send 600 requests all at once
  • send 10 requests per second continuously

If you hit a limit, you will receive a response with status 429 with a body like:

{"detail":"Request was throttled. Expected available in 1 second."}

The numbers you see above may already reflect custom limits we have set for you. If you want higher limits, email us at team@replicate.com.
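
If your code may exceed these limits, one option is to retry throttled requests after a short wait. Here is a minimal sketch; the backoff schedule is an assumption, not an official recommendation:

import time
import requests

def request_with_retry(method, url, max_attempts=5, **kwargs):
    # Retry on HTTP 429, backing off a little longer each attempt.
    for attempt in range(max_attempts):
        resp = requests.request(method, url, **kwargs)
        if resp.status_code != 429:
            return resp
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    return resp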

Create a prediction

POST https://api.replicate.com/v1/predictions

Calling this endpoint starts a new prediction for the version and inputs you provide. Because models can take several seconds or more to run, the output will not be available immediately. To get the final result, either provide a webhook URL for us to call when the results are ready, or poll the get a prediction endpoint until the prediction reaches one of the terminated statuses.

The request body takes these parameters:

  • version: The ID of the model version that you want to run.

    You can get your model's versions using the API, or find them on the website by clicking the "Versions" tab on the Replicate model page, e.g. replicate.com/replicate/hello-world/versions, then copying the full SHA256 hash from the URL.

    The version ID is the same as the Docker image ID that's created when you build your model.
  • input: The model's input as a JSON object.

    The input depends on what model you are running. To see the available inputs, click the "Run with API" tab on the model you are running. For example, stability-ai/stable-diffusion takes prompt as an input.

    Files should be passed as data URLs or HTTP URLs; see the sketch after this list for one way to encode a local file as a data URL.
  • webhook_completed: An HTTPS URL for receiving a webhook when the prediction has completed. The webhook will be a POST request where the request body is the same as the response body of the get prediction endpoint. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once.
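
If a model takes a file input, a small file can be inlined as a data URL. Here is a minimal sketch; the file name and MIME type are placeholders, not values from the API:

import base64

# Read a local file and inline it as a base64 data URL.
with open("input.png", "rb") as f:  # placeholder file name
    encoded = base64.b64encode(f.read()).decode("utf-8")

data_url = f"data:image/png;base64,{encoded}"  # placeholder MIME type
# data_url can now be used as the value of a file-typed key in the "input" object.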

For example:

{
  "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
  "input": {
    "text": "Alice"
  }
}

Example curl request:

$ curl -s -X POST \
  -d '{"version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa", "input": {"text": "Alice"}}' \
  -H "Authorization: Token $REPLICATE_API_TOKEN" \
  -H 'Content-Type: application/json' \
  https://api.replicate.com/v1/predictions

The response is a JSON object in the following format:

{
  "id": "ufawqhfynnddngldkgtslldrkq",
  "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/ufawqhfynnddngldkgtslldrkq",
    "cancel": "https://api.replicate.com/v1/predictions/ufawqhfynnddngldkgtslldrkq/cancel"
  },
  "created_at": "2022-04-26T22:13:06.224088Z",
  "started_at": null,
  "completed_at": null,
  "status": "starting",
  "input": {
    "text": "Alice"
  },
  "output": null,
  "error": null,
  "logs": null,
  "metrics": {}
}
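
The same request could be made from Python with the requests library. This is a minimal sketch, not an official client:

import os
import requests

resp = requests.post(
    "https://api.replicate.com/v1/predictions",
    headers={"Authorization": f"Token {os.environ['REPLICATE_API_TOKEN']}"},
    json={
        "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
        "input": {"text": "Alice"},
    },
)
resp.raise_for_status()
prediction = resp.json()
print(prediction["id"], prediction["status"])  # e.g. ufawqhfynnddngldkgtslldrkq starting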

Get a prediction

GET https://api.replicate.com/v1/predictions/{prediction_id}

Returns the same response as create a prediction. status will be one of:

  • starting: the prediction is starting up. If this status lasts longer than a few seconds, then it's typically because a new worker is being started to run the prediction.
  • processing: the predict() method of the model is currently running.
  • succeeded: the prediction completed successfully.
  • failed: the prediction encountered an error during processing.
  • canceled: the prediction was canceled by the user.

In the case of success, output will be an object containing the output of the model. Any files will be represented as URLs. You'll need to pass the `Authorization` header to request them.

In the case of failure, error will contain the error encountered during the prediction.

Terminated predictions (with a status of succeeded, failed or canceled) include a metrics object with a predict_time property showing the amount of CPU or GPU time, in seconds, that this prediction used while running. This is the time you're billed for, and it doesn't include time waiting for the prediction to start.

Example curl request:

$ curl -s \
-H "Authorization: Token $REPLICATE_API_TOKEN" \
https://api.replicate.com/v1/predictions/rrr4z55ocneqzikepnug6xezpe

The response is the prediction object:

{
  "id": "rrr4z55ocneqzikepnug6xezpe",
  "version": "be04660a5b93ef2aff61e3668dedb4cbeb14941e62a3fd5998364a32d613e35e",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/rrr4z55ocneqzikepnug6xezpe",
    "cancel": "https://api.replicate.com/v1/predictions/rrr4z55ocneqzikepnug6xezpe/cancel"
  },
  "created_at": "2022-09-13T22:54:18.578761Z",
  "started_at": "2022-09-13T22:54:19.438525Z",
  "completed_at": "2022-09-13T22:54:23.236610Z",
  "source": "api",
  "status": "succeeded",
  "input": {
    "prompt": "oak tree with boletus growing on its branches"
  },
  "output": [
    "https://replicate.com/api/models/stability-ai/stable-diffusion/files/9c3b6fe4-2d37-4571-a17a-83951b1cb120/out-0.png"
  ],
  "error": null,
  "logs": "Using seed: 36941...",
  "metrics": {
    "predict_time": 4.484541
  }
}
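
If you are not using webhooks, you can poll this endpoint until the prediction reaches a terminated status. Here is a minimal sketch; the 1-second interval is an arbitrary choice, and the URL is the example ID from above (in practice, use the "get" URL returned when the prediction was created):

import os
import time
import requests

headers = {"Authorization": f"Token {os.environ['REPLICATE_API_TOKEN']}"}
get_url = "https://api.replicate.com/v1/predictions/rrr4z55ocneqzikepnug6xezpe"

# Poll until the prediction reaches a terminated status.
while True:
    prediction = requests.get(get_url, headers=headers).json()
    if prediction["status"] in ("succeeded", "failed", "canceled"):
        break
    time.sleep(1)

if prediction["status"] == "succeeded":
    print(prediction["output"], prediction["metrics"]["predict_time"])
else:
    print(prediction["status"], prediction["error"])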

Get a list of predictions

GET https://api.replicate.com/v1/predictions

Get a paginated list of predictions that you've created with your account. This includes predictions created from the API and the Replicate website. Returns 100 records per page.

Example curl request:

$ curl -s \
  -H "Authorization: Token $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/predictions

The response is a JSON object in the following format:

{
  "previous": null, 
  "next": "https://api.replicate.com/v1/predictions?cursor=cD0yMDIyLTAxLTIxKzIzJTNBMTglM0EyNC41MzAzNTclMkIwMCUzQTAw",
  "results": [{}, {}, {}]
}

The results key is a list of prediction objects in the following format:

{
  "id": "jpzd7hm5gfcapbfyt4mqytarku",
  "version": "b21cbe271e65c1718f2999b038c18b45e21e4fba961181fbfae9342fc53b9e05",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/jpzd7hm5gfcapbfyt4mqytarku",
    "cancel": "https://api.replicate.com/v1/predictions/jpzd7hm5gfcapbfyt4mqytarku/cancel"
  },
  "created_at": "2022-04-26T20:00:40.658234Z",
  "started_at": "2022-04-26T20:00:84.583803Z",
  "completed_at": "2022-04-26T20:02:27.648305Z",
  "source": "web",
  "status": "succeeded"
}
  • id: The unique ID of the prediction. Can be used to get a single prediction.
  • version: The unique ID of the model version used to create the prediction.
  • urls: A convenience object that can be used to construct new API requests against the given prediction.
  • source: Indicates where the prediction was created. Possible values are `web` or `api`.
  • status: Status of the prediction. Refer to get a single prediction for possible values.
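
To walk through every page, follow the next URL until it is null. A minimal sketch:

import os
import requests

headers = {"Authorization": f"Token {os.environ['REPLICATE_API_TOKEN']}"}
url = "https://api.replicate.com/v1/predictions"

while url is not None:
    page = requests.get(url, headers=headers).json()
    for prediction in page["results"]:
        print(prediction["id"], prediction["status"], prediction["source"])
    url = page["next"]  # null (None in Python) on the last page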

Cancel a prediction

POST https://api.replicate.com/v1/predictions/{prediction_id}/cancel
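
For example, a minimal sketch in Python; the prediction ID is the example used above:

import os
import requests

headers = {"Authorization": f"Token {os.environ['REPLICATE_API_TOKEN']}"}
prediction_id = "ufawqhfynnddngldkgtslldrkq"  # example ID from above

resp = requests.post(
    f"https://api.replicate.com/v1/predictions/{prediction_id}/cancel",
    headers=headers,
)
print(resp.status_code)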

Get a model

GET https://api.replicate.com/v1/models/{model_owner}/{model_name}

The request path takes these parameters:

  • model_owner: The name of the user or organization that owns the model.
  • model_name: The name of the model.

Example curl request:

$ curl -s \
-H "Authorization: Token $REPLICATE_API_TOKEN" \
https://api.replicate.com/v1/models/replicate/hello-world

The response is a model object in the following format:

{
  "url": "https://replicate.com/replicate/hello-world",
  "owner": "replicate",
  "name": "hello-world",
  "description": "A tiny model that says hello",
  "visibility": "public",
  "github_url": "https://github.com/replicate/cog-examples",
  "paper_url": null,
  "license_url": null,
  "latest_version": {...}
}

The latest_version is the model's most recently pushed version.
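
This makes it possible to look up a model's latest version ID and pass it straight to create a prediction. A minimal sketch:

import os
import requests

headers = {"Authorization": f"Token {os.environ['REPLICATE_API_TOKEN']}"}

model = requests.get(
    "https://api.replicate.com/v1/models/replicate/hello-world",
    headers=headers,
).json()
version_id = model["latest_version"]["id"]

resp = requests.post(
    "https://api.replicate.com/v1/predictions",
    headers=headers,
    json={"version": version_id, "input": {"text": "Alice"}},
)
print(resp.json()["status"])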

Get a list of model versions

GET https://api.replicate.com/v1/models/{model_owner}/{model_name}/versions

The request path takes these parameters:

  • model_owner: The name of the user or organization that owns the model.
  • model_name: The name of the model.

Example curl request:

$ curl -s \
-H "Authorization: Token $REPLICATE_API_TOKEN" \
https://api.replicate.com/v1/models/replicate/hello-world/versions

The response is a paginated list of version objects, sorted with the most recent version first:

{
  "previous": null,
  "next": null,
  "results": [
    {
      "id": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
      "created_at": "2022-04-26T19:29:04.418669Z",
      "cog_version": "0.3.0",
      "openapi_schema": {...}
    },
    {
      "id": "e2e8c39e0f77177381177ba8c4025421ec2d7e7d3c389a9b3d364f8de560024f",
      "created_at": "2022-03-21T13:01:04.418669Z",
      "cog_version": "0.3.0",
      "openapi_schema": {...}
    }
  ]
}

Get a model version

GET https://api.replicate.com/v1/models/{model_owner}/{model_name}/versions/{id}

The request path takes these parameters:

  • model_owner: The name of the user or organization that owns the model.
  • model_name: The name of the model.
  • id: The ID of the version.

Example curl request:

$ curl -s \
-H "Authorization: Token $REPLICATE_API_TOKEN" \
https://api.replicate.com/v1/models/replicate/hello-world/versions/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa

The response is the version object:

{
  "id": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
  "created_at": "2022-04-26T19:29:04.418669Z",
  "cog_version": "0.3.0",
  "openapi_schema": {...}
}

Get a collection of models

GET https://api.replicate.com/v1/collections/{collection_slug}

The request path takes these parameters:

  • collection_slug: The slug of the collection, e.g. super-resolution.

Example curl request:

$ curl -s \
-H "Authorization: Token $REPLICATE_API_TOKEN" \
https://api.replicate.com/v1/collections/super-resolution

The response is a collection object with a nested list of model objects within that collection:

{
  "name": "Super resolution",
  "slug": "super-resolution",
  "description": "Upscaling models that create high-quality images from low-quality images.",
  "models": [...]
}
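
For example, a minimal sketch that prints the owner and name of each model in the collection; each entry in models is a model object as described in get a model:

import os
import requests

headers = {"Authorization": f"Token {os.environ['REPLICATE_API_TOKEN']}"}

collection = requests.get(
    "https://api.replicate.com/v1/collections/super-resolution",
    headers=headers,
).json()

for model in collection["models"]:
    print(f'{model["owner"]}/{model["name"]}')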
