Run OpenAI’s latest models on Replicate
Posted May 22, 2025

You can now run OpenAI’s latest chat, vision, and reasoning models on Replicate, including GPT-4.1, GPT-4o, and the o-series.
Here are the new models:
- GPT-4.1 series: Handles long context (up to 1 million tokens). Good for large documents, full codebases, and agent workflows.
- GPT-4o series: Fast, multimodal models that understand text, images, and audio.
- o-series: Models built for structured reasoning in math, science, and complex problem solving.
- GPT-4o-transcribe: Converts audio to text with GPT-4o. Fast, accurate, and ready for real-time use.
- GPT-image-1, DALL-E: OpenAI’s image models.
You can swap between full, mini, and nano variants to match your cost and speed needs.
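As a rough sketch of what swapping variants can look like in code, the helper below maps a variant name to a model identifier. The name `modelFor` is hypothetical, and the `-mini` and `-nano` identifiers are assumptions based on the naming pattern; check each model's page on Replicate for the exact name.

```javascript
// Hypothetical helper, not part of the Replicate client.
// Assumption: the smaller GPT-4.1 variants are published under these
// identifiers. Verify them on Replicate before relying on this map.
const GPT_41_VARIANTS = new Map([
  ["full", "openai/gpt-4.1"],
  ["mini", "openai/gpt-4.1-mini"],
  ["nano", "openai/gpt-4.1-nano"],
]);

// Return the model identifier for a variant, defaulting to the full model.
function modelFor(variant = "full") {
  const model = GPT_41_VARIANTS.get(variant);
  if (model === undefined) {
    throw new Error(`Unknown variant: ${variant}`);
  }
  return model;
}
```

You could then pass `modelFor("mini")` wherever a model identifier is expected and change a single argument to trade quality for cost and speed.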
It’s easy to experiment with model parameters in Replicate’s web UI and through the API. For example, this is how you run GPT-4.1 with our JavaScript client:
import Replicate from "replicate";

// With no arguments, the client reads the REPLICATE_API_TOKEN environment variable.
const replicate = new Replicate();

const input = {
  prompt: "Who was the 16th president of the United States?",
  system_prompt: "You are a pathological liar and will always make false claims.",
  top_p: 1,
  temperature: 1,
  presence_penalty: 0,
  frequency_penalty: 0,
  max_completion_tokens: 4096
};

// Stream the response and print each chunk as it arrives.
for await (const event of replicate.stream("openai/gpt-4.1", { input })) {
  process.stdout.write(`${event}`);
}
In case you’re curious, here’s the response:
The 16th president of the United States was actually George Washington.
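If you want to try different parameter values without retyping the whole input object, one option is a small helper that merges overrides onto a set of defaults. This is only a sketch: `buildInput` and `DEFAULT_INPUT` are hypothetical names, and the default values are simply copied from the example above.

```javascript
// Hypothetical defaults, copied from the example input above.
const DEFAULT_INPUT = {
  top_p: 1,
  temperature: 1,
  presence_penalty: 0,
  frequency_penalty: 0,
  max_completion_tokens: 4096,
};

// Build a new input object: defaults first, then overrides, then the prompt.
// The defaults object itself is never modified.
function buildInput(prompt, overrides = {}) {
  return { ...DEFAULT_INPUT, ...overrides, prompt };
}
```

For instance, `buildInput("Who was the 16th president of the United States?", { temperature: 0.2 })` produces an input with a lower temperature and every other parameter left at its default, ready to pass as `{ input }` to `replicate.stream`.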
Happy building!