Announcing Replicate's remote MCP server
Last month we quietly published a local MCP server for Replicate’s HTTP API.
Today we’re announcing a hosted remote MCP server that you can use with apps like Claude Desktop, Claude Code, Cursor, and VS Code, giving you the power to explore and run all of Replicate’s HTTP APIs from a familiar chat-based natural language interface.
To get started, head over to 👉 mcp.replicate.com 👈
What is MCP?
MCP stands for Model Context Protocol. It’s an open standard, originally developed at Anthropic, for giving language models access to external tools, commonly called “tool use” or “function calling”. This makes language models way more powerful: instead of relying only on their internal knowledge, they can reach out to external tools and data sources.
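Under the hood, MCP clients and servers speak JSON-RPC. As a rough illustration (the tool name and arguments below are hypothetical, not the actual tools exposed by Replicate’s server), a client asking a server to run a tool sends a request like this:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_models",
    "arguments": { "query": "video generation" }
  }
}

The server runs the tool and sends the result back in a JSON-RPC response, which the model then reads as part of its context.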
Once you’ve installed the server, you can ask questions in Claude or Cursor like:
Find models:
“Find popular video models on Replicate that allow a starting frame as input”
Compare models:
“What are the differences between veo 3 and veo 3 fast on Replicate?”
Run models:
“Make a video of ‘a tortoise and a hare running in the Olympic 100m’ using veo 3 fast”
Here’s a video introducing MCP and showing how to use Replicate’s MCP server with Claude Desktop:
Two flavors: Remote and local
Our official MCP server is available as a hosted service, as well as a public npm package that you can run locally.
- Remote MCP server (recommended): This is the easiest option, and the one we recommend for most users. You just add the hosted server URL to apps like Claude or Cursor (see the example config after this list). Once connected, you’ll be directed to a web-based authentication flow where you provide a Replicate API token for the server to use on your behalf. To get started, go to mcp.replicate.com
- Local MCP server: You can run the server locally on your machine. All that’s required is a recent version of Node.js. To install and run the server locally, check out the docs.
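For the remote flavor, most MCP-capable apps accept a JSON entry pointing at the hosted URL. Here’s a minimal sketch of what that entry might look like in a Cursor-style mcp.json; the exact endpoint path and config file location vary by app, so follow the instructions at mcp.replicate.com for your client:

{
  "mcpServers": {
    "replicate": {
      "url": "https://mcp.replicate.com/mcp"
    }
  }
}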
JSON response filtering with jq
Some HTTP APIs can return very large JSON responses, and it’s easy to fill up a model’s context window with too much data. For example, Replicate’s search API returns paginated lists of models with extensive metadata for each model like inputs, outputs, description, and more. This metadata is useful, but can also be too large for the context windows of most large language models.
To work around this, we collaborated with the team at Stainless, who are building and maintaining our new API SDKs. They implemented tooling that dynamically filters large response objects down to their most relevant parts: the model writes a one-off filter expression specific to the given response schema and task, and a WebAssembly build of the popular jq command-line tool applies it to the response.
This approach lets the language model decide which parts of the response are most relevant to the task at hand, and return only those parts. Here’s an example, plucking out just the name, owner, and description of each model from the search response:
{
  "body": "upscaler",
  "jq_filter": ".results[] | {name: .name, owner: .owner, description: .description}"
}
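If you want to experiment with filter expressions yourself, the same syntax works with a regular local jq install. For example, against a search response you’ve saved to a file (search-response.json is just a placeholder name here):

jq '.results[] | {name: .name, owner: .owner, description: .description}' search-response.json

jq also accepts the shorthand {name, owner, description} for filters like this one.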
Check out this video to see the response filtering in action:
Secure authentication with Cloudflare Workers
Our new hosted MCP server runs on Cloudflare Workers, which makes it easy to deploy and scale. Cloudflare is leading the industry with great tooling, tutorials, and content about how to securely build and scale MCP servers.
We use Cloudflare’s OAuth Provider Framework for Workers to keep your Replicate API token secure. When you connect your AI tools to Replicate, you visit a web page where you enter your Replicate API token. This token gets stored in Cloudflare’s KV storage, which functions like a secure digital vault that only your MCP server can access.

The beauty of this approach is that your token never gets exposed to the AI tools themselves; instead, the MCP server acts as a trusted intermediary, using your stored token to make requests to Replicate on your behalf. This design means your credentials stay safe even if someone gets access to your AI tool’s configuration, and since everything runs on Cloudflare’s infrastructure, you get enterprise-grade security without the complexity.
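To make the pattern concrete, here’s a heavily simplified sketch of a Worker acting as that intermediary. It is not Replicate’s actual implementation (which uses the OAuth Provider Framework mentioned above); the binding name, key format, and header are all illustrative:

// Requires @cloudflare/workers-types for the KVNamespace type.
export interface Env {
  // KV namespace bound in wrangler.toml; the name is illustrative.
  TOKENS: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // In the real server this identity comes from the OAuth flow;
    // here it's a placeholder header for illustration.
    const userId = request.headers.get("X-User-Id") ?? "anonymous";

    // Read the stored Replicate API token for this user from KV.
    // The connected AI tool never sees this value.
    const token = await env.TOKENS.get(`replicate-token:${userId}`);
    if (!token) {
      return new Response("No Replicate token on file", { status: 401 });
    }

    // Call Replicate's API server-side, attaching the token here
    // rather than exposing it to the client.
    return fetch("https://api.replicate.com/v1/models", {
      headers: { Authorization: `Bearer ${token}` },
    });
  },
};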
Go forth and use your tools
Head over to mcp.replicate.com to get started, and let us know what you build! We’re excited to see what you create with these new capabilities.
If you have any questions or feedback, join us in Discord or reach out on Twitter/X.