HTTP API reference
Contents
- Authentication
- Create a prediction
- Get a prediction
- List predictions
- Cancel a prediction
- Get a model
- Get a model version
- List model versions
- Delete a model version
- Create a training
- Get a training
- List trainings
- Cancel a training
- Get a collection of models
- List collections of models
- Rate limits
- OpenAPI schema
Authentication
All API requests must be authenticated with a token. Include this header with all requests:
Authorization: Token <paste-your-token-here>
Create a prediction
POST https://api.replicate.com/v1/predictions
Start a new prediction for the model version and inputs you provide.
Example request body:
{
"version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
"input": {
"text": "Alice"
}
}
Example cURL request:
$ curl -s -X POST \
-d '{"version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa", "input": {"text": "Alice"}}' \
-H "Authorization: Token <paste-your-token-here>" \
-H 'Content-Type: application/json' \
https://api.replicate.com/v1/predictions
The response will be the prediction object:
{
"id": "gm3qorzdhgbfurvjtvhg6dckhu",
"version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
"input": {
"text": "Alice"
},
"logs": "",
"error": null,
"status": "starting",
"created_at": "2023-09-08T16:19:34.765994657Z",
"urls": {
"cancel": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu/cancel",
"get": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu"
}
}
As models can take several seconds or more to run, the output will not be available immediately. To get the final result of the prediction you should either provide a webhook HTTPS URL for us to call when the results are ready, or poll the get a prediction endpoint until it has finished.
Input and output (including any files) will be automatically deleted after an hour, so you must save a copy of any files in the output if you’d like to continue using them.
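For example, a minimal polling loop from the shell might look like this (a sketch that assumes jq is installed; substitute your own prediction ID and token):
PREDICTION_ID="gm3qorzdhgbfurvjtvhg6dckhu"
while true; do
  # Fetch the prediction and read its status field (requires jq).
  STATUS=$(curl -s \
    -H "Authorization: Token <paste-your-token-here>" \
    "https://api.replicate.com/v1/predictions/$PREDICTION_ID" | jq -r '.status')
  echo "status: $STATUS"
  # Stop once the prediction reaches a terminal state.
  case "$STATUS" in
    succeeded|failed|canceled) break ;;
  esac
  sleep 1
done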
Request body
input
The model’s input as a JSON object. The input schema depends on what model you are running. To see the available inputs, click the “API” tab on the model you are running, or get the model version and look at its openapi_schema property. For example, stability-ai/sdxl takes prompt as an input.
Files should be passed as data URLs or HTTPS URLs.
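One way to pass a local file is to base64-encode it into a data URL from the shell. This is a sketch; the image input name and the model version are placeholders and depend on the model's schema:
$ IMAGE_DATA_URL="data:image/png;base64,$(base64 < input.png | tr -d '\n')"
$ curl -s -X POST \
  -H "Authorization: Token <paste-your-token-here>" \
  -H 'Content-Type: application/json' \
  -d "{\"version\": \"<model-version-id>\", \"input\": {\"image\": \"$IMAGE_DATA_URL\"}}" \
  https://api.replicate.com/v1/predictions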
stream
Request a URL to receive streaming output using server-sent events (SSE).
If the requested model version supports streaming, the returned prediction will have a stream entry in its urls property with an HTTPS URL that you can use to construct an EventSource.
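As a rough sketch, you can also watch those server-sent events from the command line. The stream URL placeholder below is whatever the prediction's urls.stream field contains; whether the Authorization header is required depends on how that URL is issued:
$ curl -s -N \
  -H "Authorization: Token <paste-your-token-here>" \
  -H "Accept: text/event-stream" \
  "<paste-the-urls.stream-value-here>"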
version
The ID of the model version that you want to run.
webhook
An HTTPS URL for receiving a webhook when the prediction has new output. The webhook will be a POST request where the request body is the same as the response body of the get prediction operation. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once.
webhook_events_filter
By default, we will send requests to your webhook URL whenever there are new logs, new outputs, or the prediction has finished. You can change which events trigger webhook requests by specifying webhook_events_filter in the prediction request:
- start: immediately on prediction start
- output: each time a prediction generates an output (note that predictions can generate multiple outputs)
- logs: each time log output is generated by a prediction
- completed: when the prediction reaches a terminal state (succeeded/canceled/failed)
For example, if you only wanted requests to be sent at the start and end of the prediction, you would provide:
{
"version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
"input": {
"text": "Alice"
},
"webhook": "https://example.com/my-webhook",
"webhook_events_filter": ["start", "completed"]
}
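As a cURL command, that request would look like this (the webhook URL is a placeholder):
$ curl -s -X POST \
  -d '{"version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa", "input": {"text": "Alice"}, "webhook": "https://example.com/my-webhook", "webhook_events_filter": ["start", "completed"]}' \
  -H "Authorization: Token <paste-your-token-here>" \
  -H 'Content-Type: application/json' \
  https://api.replicate.com/v1/predictions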
Requests for event types output and logs will be sent at most once every 500ms. If you request start and completed webhooks, then they’ll always be sent regardless of throttling.
Get a prediction
GET https://api.replicate.com/v1/predictions/{prediction_id}
Get the current state of a prediction.
Example cURL request:
$ curl -s \
-H "Authorization: Token <paste-your-token-here>" \
https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu
The response will be the prediction object:
{
"id": "gm3qorzdhgbfurvjtvhg6dckhu",
"version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
"input": {
"text": "Alice"
},
"logs": "",
"output": "hello Alice",
"error": null,
"status": "succeeded",
"created_at": "2023-09-08T16:19:34.765994Z",
"started_at": "2023-09-08T16:19:34.779176Z",
"completed_at": "2023-09-08T16:19:34.791859Z",
"metrics": {
"predict_time": 0.012683
},
"urls": {
"cancel": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu/cancel",
"get": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu"
}
}
status will be one of:
- starting: the prediction is starting up. If this status lasts longer than a few seconds, then it’s typically because a new worker is being started to run the prediction.
- processing: the predict() method of the model is currently running.
- succeeded: the prediction completed successfully.
- failed: the prediction encountered an error during processing.
- canceled: the prediction was canceled by its creator.
In the case of success, output will be an object containing the output of the model. Any files will be represented as HTTPS URLs. You’ll need to pass the Authorization header to request them.
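For example, to download one of those output files (the URL here is a placeholder for an HTTPS URL taken from the prediction's output):
$ curl -s -o output.png \
  -H "Authorization: Token <paste-your-token-here>" \
  "<paste-an-output-file-url-here>"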
In the case of failure, error will contain the error encountered during the prediction.
Terminated predictions (with a status of succeeded, failed, or canceled) will include a metrics object with a predict_time property showing the amount of CPU or GPU time, in seconds, that the prediction used while running. It won’t include time waiting for the prediction to start.
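For example, you could pull just the status and predict_time out of the response (a sketch assuming jq is installed):
$ curl -s \
  -H "Authorization: Token <paste-your-token-here>" \
  https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu \
  | jq '{status: .status, predict_time: .metrics.predict_time}'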
Request path parameters
prediction_id
The ID of the prediction to get.
List predictions
GET https://api.replicate.com/v1/predictions
Get a paginated list of predictions that you’ve created. This will include predictions created from the API and the website. It will return 100 records per page.
Example cURL request:
$ curl -s \
-H "Authorization: Token <paste-your-token-here>" \
https://api.replicate.com/v1/predictions
The response will be a paginated JSON array of prediction objects, sorted with the most recent prediction first:
{
"next": null,
"previous": null,
"results": [
{
"completed_at": "2023-09-08T16:19:34.791859Z",
"created_at": "2023-09-08T16:19:34.907244Z",
"error": null,
"id": "gm3qorzdhgbfurvjtvhg6dckhu",
"input": {
"text": "Alice"
},
"metrics": {
"predict_time": 0.012683
},
"output": "hello Alice",
"started_at": "2023-09-08T16:19:34.779176Z",
"source": "api",
"status": "succeeded",
"urls": {
"get": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu",
"cancel": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu/cancel"
},
"version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
}
]
}
id will be the unique ID of the prediction.
source will indicate how the prediction was created. Possible values are web or api.
status will be the status of the prediction. Refer to get a single prediction for possible values.
urls will be a convenience object that can be used to construct new API requests for the given prediction.
version will be the unique ID of the model version used to create the prediction.
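To page through every prediction, you can follow the next URL until it is null (a sketch that assumes jq is installed):
URL="https://api.replicate.com/v1/predictions"
while [ "$URL" != "null" ] && [ -n "$URL" ]; do
  # Fetch one page, print the prediction IDs it contains, then follow the cursor.
  PAGE=$(curl -s -H "Authorization: Token <paste-your-token-here>" "$URL")
  echo "$PAGE" | jq -r '.results[].id'
  URL=$(echo "$PAGE" | jq -r '.next')
done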
Cancel a prediction
POST https://api.replicate.com/v1/predictions/{prediction_id}/cancel
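Example cURL request (using the prediction ID from the examples above):
$ curl -s -X POST \
  -H "Authorization: Token <paste-your-token-here>" \
  https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu/cancel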
Request path parameters
prediction_id
The ID of the prediction to cancel.
Get a model
GET https://api.replicate.com/v1/models/{model_owner}/{model_name}
Example cURL request:
$ curl -s \
-H "Authorization: Token <paste-your-token-here>" \
https://api.replicate.com/v1/models/replicate/hello-world
The response will be a model object in the following format:
{
"url": "https://replicate.com/replicate/hello-world",
"owner": "replicate",
"name": "hello-world",
"description": "A tiny model that says hello",
"visibility": "public",
"github_url": "https://github.com/replicate/cog-examples",
"paper_url": null,
"license_url": null,
"run_count": 5681081,
"cover_image_url": "...",
"default_example": {...},
"latest_version": {...},
}
The cover_image_url string is an HTTPS URL for an image file. This can be:
- An image uploaded by the model author.
- The output file of the example prediction, if the model author has not set a cover image.
- The input file of the example prediction, if the model author has not set a cover image and the example prediction has no output file.
- A generic fallback image.
The default_example object is a prediction created with this model.
The latest_version object is the model’s most recently pushed version.
Request path parameters
model_owner
The name of the user or organization that owns the model.
model_name
The name of the model.
Get a model version
GET https://api.replicate.com/v1/models/{model_owner}/{model_name}/versions/{version_id}
Example cURL request:
$ curl -s \
-H "Authorization: Token <paste-your-token-here>" \
https://api.replicate.com/v1/models/replicate/hello-world/versions/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa
The response will be the version object:
{
"id": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
"created_at": "2022-04-26T19:29:04.418669Z",
"cog_version": "0.3.0",
"openapi_schema": {...}
}
Every model describes its inputs and outputs with OpenAPI Schema Objects in the openapi_schema property.
The openapi_schema.components.schemas.Input property for the replicate/hello-world model looks like this:
{
"type": "object",
"title": "Input",
"required": [
"text"
],
"properties": {
"text": {
"x-order": 0,
"type": "string",
"title": "Text",
"description": "Text to prefix with 'hello '"
}
}
}
The openapi_schema.components.schemas.Output property for the replicate/hello-world model looks like this:
{
"type": "string",
"title": "Output"
}
For more details, see the docs on Cog’s supported input and output types.
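For example, you could extract just the input schema from the version object (a sketch assuming jq is installed):
$ curl -s \
  -H "Authorization: Token <paste-your-token-here>" \
  https://api.replicate.com/v1/models/replicate/hello-world/versions/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa \
  | jq '.openapi_schema.components.schemas.Input'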
Request path parameters
model_owner
The name of the user or organization that owns the model.
model_name
The name of the model.
version_id
The ID of the version.
List model versions
GET https://api.replicate.com/v1/models/{model_owner}/{model_name}/versions
Example cURL request:
$ curl -s \
-H "Authorization: Token <paste-your-token-here>" \
https://api.replicate.com/v1/models/replicate/hello-world/versions
The response will be a JSON array of model version objects, sorted with the most recent version first:
{
"next": null,
"previous": null,
"results": [
{
"id": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
"created_at": "2022-04-26T19:29:04.418669Z",
"cog_version": "0.3.0",
"openapi_schema": {...}
}
]
}
Request path parameters
model_owner
The name of the user or organization that owns the model.
model_name
The name of the model.
Delete a model version
DELETE https://api.replicate.com/v1/models/{model_owner}/{model_name}/versions/{version_id}
Delete a model version and all associated predictions, including all output files.
Model version deletion has some restrictions:
- You can only delete versions from models you own.
- You can only delete versions from private models.
- You cannot delete a version if someone other than you has run predictions with it.
Example cURL request:
$ curl -s -X DELETE \
-H "Authorization: Token <paste-your-token-here>" \
https://api.replicate.com/v1/models/replicate/hello-world/versions/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa
The response will be an empty 202, indicating the deletion request has been accepted. It might take a few minutes to be processed.
Request path parameters
model_owner
The name of the user or organization that owns the model.
model_name
The name of the model.
version_id
The ID of the version.
Create a training
POST https://api.replicate.com/v1/models/{model_owner}/{model_name}/versions/{version_id}/trainings
Start a new training of the model version you specify.
Example request body:
{
"destination": "{new_owner}/{new_name}",
"input": {
"train_data": "https://example.com/my-input-images.zip",
},
"webhook": "https://example.com/my-webhook",
}
Example cURL request:
$ curl -s -X POST \
-d '{"destination": "{new_owner}/{new_name}", "input": {"input_images": "https://example.com/my-input-images.zip"}}' \
-H "Authorization: Token <paste-your-token-here>" \
-H 'Content-Type: application/json' \
https://api.replicate.com/v1/models/stability-ai/sdxl/versions/da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf/trainings
The response will be the training object:
{
"id": "zz4ibbonubfz7carwiefibzgga",
"version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
"input": {
"input_images": "https://example.com/my-input-images.zip"
},
"logs": "",
"error": null,
"status": "starting",
"created_at": "2023-09-08T16:32:56.990893084Z",
"urls": {
"cancel": "https://api.replicate.com/v1/predictions/zz4ibbonubfz7carwiefibzgga/cancel",
"get": "https://api.replicate.com/v1/predictions/zz4ibbonubfz7carwiefibzgga"
}
}
As models can take several minutes or more to train, the result will not be available immediately. To get the final result of the training you should either provide a webhook HTTPS URL for us to call when the results are ready, or poll the get a training endpoint until it has finished.
When a training completes, it creates a new version of the model at the specified destination.
To find some models to train on, check out the trainable language models collection.
Request path parameters
model_owner
The name of the user or organization that owns the model.
model_name
The name of the model.
version_id
The ID of the version.
Request body
destination
A string representing the desired model to push to, in the format {destination_model_owner}/{destination_model_name}. This should be an existing model owned by the user or organization making the API request. If the destination is invalid, the server will return an appropriate 4XX response.
input
An object containing inputs to the Cog model’s train() function.
webhook
An HTTPS URL for receiving a webhook when the training completes. The webhook will be a POST request where the request body is the same as the response body of the get training operation. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once.
Get a training
GET https://api.replicate.com/v1/trainings/{training_id}
Get the current state of a training.
Example cURL request:
$ curl -s \
-H "Authorization: Token <paste-your-token-here>" \
https://api.replicate.com/v1/trainings/zz4ibbonubfz7carwiefibzgga
The response will be the training object:
{
"completed_at": "2023-09-08T16:41:19.826523Z",
"created_at": "2023-09-08T16:32:57.018467Z",
"error": null,
"id": "zz4ibbonubfz7carwiefibzgga",
"input": {
"input_images": "https://example.com/my-input-images.zip"
},
"logs": "...",
"metrics": {
"predict_time": 502.713876
},
"output": {
"version": "...",
"weights": "..."
},
"started_at": "2023-09-08T16:32:57.112647Z",
"status": "succeeded",
"urls": {
"get": "https://api.replicate.com/v1/trainings/zz4ibbonubfz7carwiefibzgga",
"cancel": "https://api.replicate.com/v1/trainings/zz4ibbonubfz7carwiefibzgga/cancel"
},
"version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
}
status will be one of:
- starting: the training is starting up. If this status lasts longer than a few seconds, then it’s typically because a new worker is being started to run the training.
- processing: the train() method of the model is currently running.
- succeeded: the training completed successfully.
- failed: the training encountered an error during processing.
- canceled: the training was canceled by its creator.
In the case of success, output will be an object containing the output of the model. Any files will be represented as HTTPS URLs. You’ll need to pass the Authorization header to request them.
In the case of failure, error will contain the error encountered during the training.
Terminated trainings (with a status of succeeded, failed, or canceled) will include a metrics object with a predict_time property showing the amount of CPU or GPU time, in seconds, that the training used while running. It won’t include time waiting for the training to start.
Request path parameters
training_id
The ID of the training to get.
List trainings
GET https://api.replicate.com/v1/trainings
Get a paginated list of trainings that you’ve created. This will include trainings created from the API and the website. It will return 100 records per page.
Example cURL request:
$ curl -s \
-H "Authorization: Token <paste-your-token-here>" \
https://api.replicate.com/v1/trainings
The response will be a paginated JSON array of training objects, sorted with the most recent training first:
{
"next": null,
"previous": null,
"results": [
{
"completed_at": "2023-09-08T16:41:19.826523Z",
"created_at": "2023-09-08T16:32:57.018467Z",
"error": null,
"id": "zz4ibbonubfz7carwiefibzgga",
"input": {
"input_images": "https://example.com/my-input-images.zip"
},
"metrics": {
"predict_time": 502.713876
},
"output": {
"version": "...",
"weights": "..."
},
"started_at": "2023-09-08T16:32:57.112647Z",
"source": "api",
"status": "succeeded",
"urls": {
"get": "https://api.replicate.com/v1/trainings/zz4ibbonubfz7carwiefibzgga",
"cancel": "https://api.replicate.com/v1/trainings/zz4ibbonubfz7carwiefibzgga/cancel"
},
"version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
}
]
}
id will be the unique ID of the training.
source will indicate how the training was created. Possible values are web or api.
status will be the status of the training. Refer to get a single training for possible values.
urls will be a convenience object that can be used to construct new API requests for the given training.
version will be the unique ID of the model version used to create the training.
Cancel a training
POST https://api.replicate.com/v1/trainings/{training_id}/cancel
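Example cURL request (using the training ID from the examples above):
$ curl -s -X POST \
  -H "Authorization: Token <paste-your-token-here>" \
  https://api.replicate.com/v1/trainings/zz4ibbonubfz7carwiefibzgga/cancel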
Request path parameters
training_id
The ID of the training you want to cancel.
Get a collection of models
GET https://api.replicate.com/v1/collections/{collection_slug}
Example cURL request:
$ curl -s \
-H "Authorization: Token <paste-your-token-here>" \
https://api.replicate.com/v1/collections/super-resolution
The response will be a collection object with a nested list of the models in that collection:
{
"name": "Super resolution",
"slug": "super-resolution",
"description": "Upscaling models that create high-quality images from low-quality images.",
"models": [...]
}
Request path parameters
collection_slug
The slug of the collection, like super-resolution or image-restoration. See replicate.com/collections.
List collections of models
GET https://api.replicate.com/v1/collections
Example cURL request:
$ curl -s \
-H "Authorization: Token <paste-your-token-here>" \
https://api.replicate.com/v1/collections
The response will be a paginated JSON list of collection objects:
{
"next": "null",
"previous": null,
"results": [
{
"name": "Super resolution",
"slug": "super-resolution",
"description": "Upscaling models that create high-quality images from low-quality images."
}
]
}
Rate limits
We limit the number of API requests that can be made to Replicate:
- You can call create prediction at 10 requests per second (average) with a burst capacity of up to 600 requests.
- You can call all other endpoints at 50 requests per second (average) with a burst capacity of up to 3000 requests.
Whenever your request rate is below your limit, you will accumulate burst capacity up to the stated maximum. For example, if your rate limit is 2 requests per second with a burst capacity of up to 10 requests, then any of the following scenarios would be allowed:
- send 10 requests all at once; wait 5 seconds; send 10 requests all at once
- send 10 requests all at once; send 1 request per second for the next 10 seconds; send 10 requests all at once
- send 2 requests per second continuously
If you hit a limit, you will receive a response with status 429 and a body like:
{"detail":"Request was throttled. Expected available in 1 second."}
The numbers you see above may already reflect custom limits we have set for you. If you want higher limits, email us at team@replicate.com.
OpenAPI schema
OpenAPI (formerly known as Swagger) is a specification that provides a standard way to describe the structure of an HTTP API, including the available endpoints, their HTTP methods, expected request and response formats, and other metadata.
Download Replicate's OpenAPI schema at api.replicate.com/openapi.json.
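For example, you could download the schema and list the documented paths (a sketch assuming jq is installed):
$ curl -s https://api.replicate.com/openapi.json | jq -r '.paths | keys[]'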