• More useful metadata from the model API

    March 21, 2023

    The "get a model" API operation now returns more metadata about the model:

    • run_count: an integer indicating how many times the model has been run.
    • default_example: a prediction object created with this model, and selected by the model owner as an example of the model's inputs and outputs.
    • cover_image_url: an HTTPS URL string for an image file. This is an image uploaded by the model author, or an output file or input file from the model's default example prediction.

    Here's an example using cURL and jq to get the Salesforce blip-2 model as a JSON object and pluck out some of its properties:

    curl -s \
      -H "Authorization: Token $REPLICATE_API_TOKEN" \
      -H 'Content-Type: application/json' \
      "https://api.replicate.com/v1/models/salesforce/blip-2" \
      | jq "{owner, name, run_count, cover_image_url, default_example}"

    Here's what the response looks like:

      "owner": "salesforce",
      "name": "blip-2",
      "run_count": 270306,
      "cover_image_url": "",
      "default_example": {
        "completed_at": "2023-02-13T22:26:49.396028Z",
        "created_at": "2023-02-13T22:26:48.385476Z",
        "error": null,
        "id": "uhd4lhedtvdlbnm2cyhzx65zpe",
        "input": {
          "image": "",
          "caption": false,
          "question": "what body of water does this bridge cross?",
          "temperature": 1
        "logs": "...",
        "metrics": {
          "predict_time": 0.949567
        "output": "san francisco bay",
        "started_at": "2023-02-13T22:26:48.446461Z",
        "status": "succeeded",
        "version": "4b32258c42e9efd4288bb9910bc532a69727f9acd26aa08e175713a0a857a608",

    See the "get a model" API docs for more details.
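    The same lookup works from any HTTP client, not just cURL. Here's a minimal Python sketch using only the standard library; the get_model and summarize_model helpers are our own names for illustration, not part of any Replicate client library:

```python
import json
import urllib.request

def get_model(owner: str, name: str, token: str) -> dict:
    """Fetch a model via the "get a model" endpoint (not called below)."""
    req = urllib.request.Request(
        f"https://api.replicate.com/v1/models/{owner}/{name}",
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def summarize_model(model: dict) -> dict:
    """Pluck out the metadata fields of interest, like the jq filter above."""
    keys = ("owner", "name", "run_count", "cover_image_url", "default_example")
    return {k: model.get(k) for k in keys}

# A trimmed version of the response shown above:
sample = {
    "owner": "salesforce",
    "name": "blip-2",
    "run_count": 270306,
    "cover_image_url": "",
    "default_example": {"status": "succeeded", "output": "san francisco bay"},
    "visibility": "public",  # any other fields are dropped by the summary
}
print(summarize_model(sample)["run_count"])
```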

  • Get model input and output schemas via the API

    March 20, 2023

    Every model on Replicate describes its inputs and outputs with OpenAPI Schema Objects in the openapi_schema property. This is a structured JSON object that includes the name, description, type, and allowed values for each input or output parameter.

    Today we've improved our API reference documentation to clarify how to get a model's input and output schema.

    See the updated docs in the API reference.

    Here's an example of how to get the input schema for Stable Diffusion using cURL:

    curl -s \
      -H "Authorization: Token $REPLICATE_API_TOKEN" \
      -H 'Content-Type: application/json' \
      "https://api.replicate.com/v1/models/stability-ai/stable-diffusion/versions/$VERSION_ID" \
      | jq ".openapi_schema.components.schemas.Input.properties"

    Using this command, we can see all the inputs to Stable Diffusion, including their types, descriptions, and minimum and maximum values:

      "seed": {
        "type": "integer",
        "title": "Seed",
        "x-order": 7,
        "description": "Random seed. Leave blank to randomize the seed"
      "prompt": {
        "type": "string",
        "title": "Prompt",
        "default": "a vision of paradise. unreal engine",
        "x-order": 0,
        "description": "Input prompt"
      "scheduler": {
        "allOf": [
            "$ref": "#/components/schemas/scheduler"
        "default": "DPMSolverMultistep",
        "x-order": 6,
        "description": "Choose a scheduler."
      "num_outputs": {
        "type": "integer",
        "title": "Num Outputs",
        "default": 1,
        "maximum": 4,
        "minimum": 1,
        "x-order": 3,
        "description": "Number of images to output."
      "guidance_scale": {
        "type": "number",
        "title": "Guidance Scale",
        "default": 7.5,
        "maximum": 20,
        "minimum": 1,
        "x-order": 5,
        "description": "Scale for classifier-free guidance"
      "negative_prompt": {
        "type": "string",
        "title": "Negative Prompt",
        "x-order": 2,
        "description": "Specify things to not see in the output"
      "image_dimensions": {
        "allOf": [
            "$ref": "#/components/schemas/image_dimensions"
        "default": "768x768",
        "x-order": 1,
        "description": "pixel dimensions of output image"
      "num_inference_steps": {
        "type": "integer",
        "title": "Num Inference Steps",
        "default": 50,
        "maximum": 500,
        "minimum": 1,
        "x-order": 4,
        "description": "Number of denoising steps"

    And here's a command to get the output schema:

    curl -s \
      -H "Authorization: Token $REPLICATE_API_TOKEN" \
      -H 'Content-Type: application/json' \
      "https://api.replicate.com/v1/models/stability-ai/stable-diffusion/versions/$VERSION_ID" \
      | jq ".openapi_schema.components.schemas.Output"

    From this command, we can see that Stable Diffusion's output format is a list of URL strings:

      {
        "type": "array",
        "items": {
          "type": "string",
          "format": "uri"
        },
        "title": "Output"
      }

  • See more models

    February 15, 2023

    You can now browse through all the models on Replicate. Check them out on the Explore page!

    Screenshot of a pagination interface above a bunch of latest models on Replicate

  • Improved webhook events and event filtering

    February 10, 2023

    When you create a prediction with the API, you can provide a webhook URL for us to call when your prediction is complete.

    Starting today, we send more webhook events at different stages of the prediction lifecycle: we make requests to your webhook URL whenever there are new logs or new outputs, or when the prediction has finished.

    You can change which events trigger webhook requests by specifying webhook_events_filter in the prediction request.

    • start: Emitted immediately on prediction start. This event is always sent.
    • output: Emitted each time a prediction generates an output (note that predictions can generate multiple outputs).
    • logs: Emitted each time log output is generated by a prediction.
    • completed: Emitted when the prediction reaches a terminal state (succeeded/canceled/failed). This event is always sent.

    For example, if you only wanted requests to be sent at the start and end of the prediction, you would provide:

      "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
      "input": {
        "text": "Alice"
      "webhook": "",
      "webhook_events_filter": ["start", "completed"]

    Requests for event types output and logs will be sent at most once every 500ms.

    Requests for event types start and completed will always be sent.

    If you're using the old webhook_completed property, you'll still get the same webhooks as before, but we recommend updating to the new webhook and webhook_events_filter properties.
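    Putting the pieces together, here's a sketch that assembles a prediction request body with an events filter. The build_prediction_request helper and its validation are our own; only the version, input, webhook, and webhook_events_filter keys come from the API described above:

```python
# The four event types the API documents above:
ALLOWED_EVENTS = {"start", "output", "logs", "completed"}

def build_prediction_request(version: str, input: dict, webhook: str,
                             events: list) -> dict:
    """Assemble a prediction request body, rejecting unknown event types."""
    unknown = set(events) - ALLOWED_EVENTS
    if unknown:
        raise ValueError(f"unknown webhook events: {sorted(unknown)}")
    return {
        "version": version,
        "input": input,
        "webhook": webhook,
        "webhook_events_filter": events,
    }

body = build_prediction_request(
    "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
    {"text": "Alice"},
    "https://example.com/webhooks/replicate",  # hypothetical receiver URL
    ["start", "completed"],
)
print(body["webhook_events_filter"])
```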


  • Python example code improvements

    January 19, 2023

    We've made it even easier to start building with Replicate's API. When you click on the API tab for a model, the Python example code now has everything you need to run a prediction, including code for all of the model's inputs and outputs. These new code snippets include documentation and defaults for each input, so you can focus on coding, with less context switching between the API docs and your editor.

    To learn more about how to get started, check out our "Run a model from Python" guide.

    Python example code for andreasjansson/cantable-diffuguesion

  • Cancel long running predictions

    January 16, 2023

    Have you ever kicked off a prediction, then, after thinking about it, realized you got one of the settings wrong or wanted to tweak the prompt? Well, now you can cancel that prediction, even if you've navigated away from the page you were on or created it through the API. On the website, go to your dashboard, find the running prediction, and you'll now see it live-updating with a handy "Cancel" button.

    A running prediction showing some dancing robots in training with a cancel button underneath.

  • brew install cog

    January 10, 2023

    🍏 Hey macOS users! There's now a Homebrew formula for Cog. Use brew install cog to install it, and brew upgrade cog to upgrade to the latest version.

  • Dreambooth support for img2img

    January 9, 2023

    We've added img2img support to models created with our DreamBooth API.

    This means you can optionally send both a prompt and initial image to generate new images (in addition to the other parameters specified in your DreamBooth model's API page).


    prompt: photo of zeke playing guitar on stage at concert

    To get started building and pushing your own DreamBooth model, check out the blog post.
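    As a rough sketch, an img2img request's input block might be built like this. Note the "image" and "prompt_strength" parameter names here are assumptions for illustration; your DreamBooth model's API page lists the model's actual input names:

```python
def img2img_input(prompt: str, image_url: str, strength: float = 0.8) -> dict:
    """Build the input block for an img2img DreamBooth prediction.

    "image" and "prompt_strength" are assumed parameter names; check your
    model's API page for the real ones.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return {"prompt": prompt, "image": image_url, "prompt_strength": strength}

print(img2img_input(
    "photo of zeke playing guitar on stage at concert",
    "https://example.com/zeke.png",  # hypothetical initial image URL
))
```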

  • Delete predictions from the web

    January 6, 2023

    You can now manually delete a prediction on the website. You'll find a "Delete" button on the prediction detail page, e.g. replicate.com/p/{prediction_id}. Clicking this button will completely remove the prediction from the site, including any output data and output files associated with it.


  • API prediction data no longer stored

    January 4, 2023

    By popular request, we no longer store data for predictions made using the API.

    User data is automatically removed from predictions an hour after they finish. The prediction itself is not deleted, but its input and output data are removed. This applies only to predictions created with the API, not to predictions created on the website.

    This is enabled for new accounts starting today, but we know that some users may be relying on prediction data to exist for more than an hour, so we've not enabled it for any existing accounts. If you want this enabled for your account, email us.

  • A proper changelog

    December 6, 2022

    We now have a changelog for product updates. We used to use a single tweet thread as our makeshift changelog, but decided it was time to make something a bit more flexible. Stay tuned for more frequent updates here!

  • Stable diffusion has release notes

    December 2, 2022

    Stable Diffusion now has release notes, so you can see what's changed. This only works on Stable Diffusion at the moment, but it's coming soon to all models, so you'll be able to set release notes on your own models and see what has changed on other people's.

  • Higher rate limits

    November 4, 2022

    We've increased our default rate limits. You can now create 10 predictions a second, bursting up to 600 predictions a second.

    We can support higher rates too – just email us.
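    If you do hit the limit, backing off and retrying is usually enough. Here's a minimal sketch; the RateLimited exception and with_backoff helper are illustrative, not part of any Replicate client:

```python
import time

class RateLimited(Exception):
    """Raised when the API answers a request with HTTP 429."""

def with_backoff(call, retries: int = 5, base_delay: float = 0.5,
                 sleep=time.sleep):
    """Retry `call` with exponential backoff whenever it raises RateLimited."""
    for attempt in range(retries):
        try:
            return call()
        except RateLimited:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...

# Simulate two rate-limited attempts followed by a success:
attempts = []
def fake_create_prediction():
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited()
    return {"status": "starting"}

result = with_backoff(fake_create_prediction, sleep=lambda s: None)
print(result["status"], len(attempts))
```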

  • Infrastructure improvements

    November 4, 2022

    Over the past few weeks, we've made some major improvements to our infrastructure to make it more reliable and perform better. Nothing's changed from your point of view, but you'll be seeing faster response times! 🚀

  • Run your own models on Nvidia A100 GPUs

    October 19, 2022

    You can now run your own models on Nvidia A100s. Click the settings tab on your model and select the hardware option to upgrade. 🚀

  • Set a monthly spend limit

    October 17, 2022

    You can now set a monthly spend limit on your account to avoid getting a surprising bill. 🦆

    To set a limit, visit your account settings.

  • Webhook support in Predictions API

    September 9, 2022

    🪝 Our API now supports webhooks as an alternative to polling. Specify your webhook URL when creating a prediction and we'll POST to that URL when your prediction has completed! See the API docs for details.
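    On the receiving side, your webhook handler just parses the POSTed prediction JSON. Here's a sketch of that parsing step; the is_finished helper and the simulated payload below are our own illustrations, not a documented schema:

```python
import json

# Terminal prediction states:
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}

def is_finished(payload: dict) -> bool:
    """True when the POSTed prediction has reached a terminal state."""
    return payload.get("status") in TERMINAL_STATUSES

# Simulated webhook request body:
raw = json.dumps({"id": "abc123", "status": "succeeded", "output": "hello"})
prediction = json.loads(raw)
print(is_finished(prediction))
```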

  • Introducing model collections

    June 16, 2022

    "We've started curating collections of models that perform similar tasks. First up is an assortment of ✨style transfer✨ models that take a content image and a style reference to produce a new image, like this starry night cat.

  • Scrubbing support for progressive outputs

    March 28, 2022

    "When you run a model that changes over time, you can scrub back and forth to see the previous output. We've now added that scrubber to predictions in the example gallery, so you can see how they morphed into being: