Synchronous API

Our client libraries and API are now much faster at running models, particularly when the output is a file.

The API can now return output directly in the response to your original request. Previously, you had to poll to get the result.

If you’re using the Node.js or Python client libraries, you don’t have to worry about any of this: just upgrade to the latest version and model runs get faster automatically. The libraries also now return file objects instead of HTTP URLs, which makes it much easier to write outputs to storage or pass them along in HTTP responses.

Node.js

Install the latest version of the client library:

npm install replicate@latest

Then run the model:

import Replicate from "replicate";
import { writeFile } from "node:fs/promises";

const replicate = new Replicate();

const [output] = await replicate.run("black-forest-labs/flux-schnell", {
  input: { prompt: "astronaut riding a rocket like a horse" },
});

// It now returns a file object, which you can write straight to disk
await writeFile("my-image.webp", output);

// Or, you can still get an HTTP URL
console.log(output.url());
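
Because the file object behaves like a web ReadableStream (treat that as an assumption and check your client version), you can also pass it straight to an HTTP response instead of saving it to disk. Here’s a minimal sketch using Node’s built-in http module; the image/webp Content-Type assumes flux-schnell’s default output format.

import http from "node:http";
import { Readable } from "node:stream";
import Replicate from "replicate";

const replicate = new Replicate();

const server = http.createServer(async (req, res) => {
  // Generate an image for each request (hypothetical endpoint)
  const [output] = await replicate.run("black-forest-labs/flux-schnell", {
    input: { prompt: "astronaut riding a rocket like a horse" },
  });

  // The file object is a web ReadableStream, so convert it to a Node
  // stream and pipe it into the response
  res.writeHead(200, { "Content-Type": "image/webp" });
  Readable.fromWeb(output).pipe(res);
});

server.listen(3000);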

Python

Update the client library:

pip install --upgrade replicate

Then run the model:

import replicate

[output] = replicate.run(
    "black-forest-labs/flux-schnell",
    input={"prompt": "astronaut riding a rocket like a horse"},
)

# It now returns a file object, which you can write straight to disk
with open("output.webp", "wb") as file:
    file.write(output.read())

# Or, you can still get an HTTP URL
print(output.url)

Returning file objects instead of URLs is a breaking change in the client libraries, so be careful when you upgrade your apps.
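
If you need time to migrate, the Node.js client can keep returning plain URL strings while you update the rest of your code. This is a hedged sketch: the useFileOutput constructor option comes from the client’s README, so verify the name against the version you actually install.

import Replicate from "replicate";

// Opt out of file objects while migrating (option name assumed from the
// client README; check it against your installed version)
const replicate = new Replicate({ useFileOutput: false });

const [url] = await replicate.run("black-forest-labs/flux-schnell", {
  input: { prompt: "astronaut riding a rocket like a horse" },
});

console.log(url); // an https:// URL string, as before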

HTTP API

If you’re using the HTTP API for models or deployments, you can now pass the header Prefer: wait, which will keep the connection open until the prediction has finished:

curl -X POST https://api.replicate.com/v1/models/black-forest-labs/flux-schnell/predictions \
     -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
     -H "Content-Type: application/json" \
     -H "Prefer: wait=10" \
     -d '{"input": {"prompt": "a cat riding a narwhal with rainbows"}}'

Output:

{
  "id": "dapztkbwgxrg20cfgsmrz2gm38",
  "status": "processing",
  "output": ["https://..."]
}

By default it will wait up to 60 seconds before returning the in-progress prediction. You can adjust that by passing a number of seconds, like Prefer: wait=10 to wait 10 seconds.
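
If the prediction hasn’t finished when the wait window closes, the response still includes the prediction’s id and polling URL, so you can fall back to the old polling flow. Here’s a rough JavaScript sketch using fetch; it assumes the prediction object exposes urls.get for polling, per the prediction schema.

// Create a prediction, holding the connection open for up to 10 seconds
const headers = {
  Authorization: `Bearer ${process.env.REPLICATE_API_TOKEN}`,
  "Content-Type": "application/json",
  Prefer: "wait=10",
};

let prediction = await fetch(
  "https://api.replicate.com/v1/models/black-forest-labs/flux-schnell/predictions",
  {
    method: "POST",
    headers,
    body: JSON.stringify({
      input: { prompt: "a cat riding a narwhal with rainbows" },
    }),
  }
).then((response) => response.json());

// If the wait window elapsed first, poll until the prediction settles
const done = ["succeeded", "failed", "canceled"];
while (!done.includes(prediction.status)) {
  await new Promise((resolve) => setTimeout(resolve, 1000));
  prediction = await fetch(prediction.urls.get, { headers }).then(
    (response) => response.json()
  );
}

console.log(prediction.output);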

Take a look at the docs on creating a prediction for more details.