Official

openai / gpt-4.1

OpenAI's flagship GPT model for complex tasks.

  • Public
  • 3.5K runs
  • Priced per token
  • Commercial use
  • License

Input

  • Prompt (string): The prompt to send to the model. Do not use if using messages.
  • System prompt (string): Sets the assistant's behavior.
  • Images (file[], default: []): List of images to send to the model, e.g. https://replicate.delivery/pbxt/MvnA4wptE8FOHD44bKsfVj8hQdXSvdDAcFgYs5GEODou9OP9/4b2ebb2d-89d8-43de-bc84-51c380365a40.jpg
  • Temperature (number, 0 to 2, default: 1): Sampling temperature.
  • Max completion tokens (integer, default: 4096): Maximum number of completion tokens to generate.
  • Top-p (number, 0 to 1, default: 1): Nucleus sampling parameter; the model considers only the tokens making up the top_p probability mass (0.1 means only the tokens comprising the top 10% of probability mass are considered).
  • Frequency penalty (number, -2 to 2, default: 0): Positive values penalize tokens based on how often they have already appeared, discouraging repetition.
  • Presence penalty (number, -2 to 2, default: 0): Positive values penalize tokens that have appeared in the text so far, increasing the model's likelihood to talk about new topics.

Output

In the image, someone is spreading butter on a slice of toast using a product labeled "BUTTER STICK TYPE." The product resembles a glue stick, but it is meant for butter, allowing the user to easily apply butter to toast by rubbing the stick directly onto the bread. This is a creative and convenient way to spread butter, especially while the toast is still warm and the butter softens and melts quickly.
  • Input tokens: 780
  • Output tokens: 84
  • Tokens per second: 21.06
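The run metrics relate as tokens per second = output tokens / generation time, so the reported figures imply roughly how long decoding took (assuming the throughput figure excludes time to first token):

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Approximate decoding time implied by a throughput figure."""
    return output_tokens / tokens_per_second


# 84 output tokens at 21.06 tokens/second, as reported above:
elapsed = generation_seconds(84, 21.06)  # about 3.99 seconds
```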

Pricing

Official model
Pricing for official models works differently from other models. Instead of being billed by time, you’re billed by input and output, making pricing more predictable.

This model is priced by how many input tokens are sent and how many output tokens are generated.

Type      Per unit             Per $1
Input     $2.00 / 1M tokens    500K tokens / $1
Output    $8.00 / 1M tokens    125K tokens / $1

For example, for $10 you can run around 1,776 predictions where the input is a sentence or two (15 tokens) and the output is a few paragraphs (700 tokens).
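The per-token arithmetic above can be sketched directly from the rates in the table:

```python
# Rates from the pricing table: $2.00 per 1M input tokens,
# $8.00 per 1M output tokens.
INPUT_USD_PER_TOKEN = 2.00 / 1_000_000
OUTPUT_USD_PER_TOKEN = 8.00 / 1_000_000


def prediction_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single prediction in US dollars."""
    return (input_tokens * INPUT_USD_PER_TOKEN
            + output_tokens * OUTPUT_USD_PER_TOKEN)


# A short prompt (15 tokens) with a few paragraphs of output (700 tokens):
cost = prediction_cost_usd(15, 700)   # about $0.00563 per prediction
runs_per_10_usd = int(10 / cost)      # about 1,776 predictions for $10
```

Output tokens dominate the bill here: at these rates, the 700-token output costs nearly 200x more than the 15-token input.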

Check out our docs for more information about how per-token pricing works on Replicate.

Readme

GPT-4.1 is a high-performance language model optimized for real-world applications, delivering major improvements in coding, instruction following, and long-context comprehension. It supports up to 1 million tokens of context, features a June 2024 knowledge cutoff, and is designed to be more reliable and cost-effective across a wide range of use cases — from building intelligent agents to processing large codebases and documents. GPT‑4.1 offers improved reasoning, faster output, and significantly enhanced formatting fidelity.


Key Capabilities

  • 1M token context window for large document/code handling
  • Improved instruction following, including format adherence, content control, and negative/ordered instructions
  • Top-tier performance in coding tasks and diffs
  • Optimized for agentic workflows, long-context reasoning, and tool use
  • Real-world tested across legal, financial, engineering, and developer tools

Benchmark Highlights

  • SWE-bench Verified (coding): 54.6%
  • MultiChallenge (instruction following): 38.3%
  • IFEval (format compliance): 87.4%
  • Video-MME (long video QA): 72.0%
  • Aider diff-format accuracy: 53%
  • Graphwalks (multi-hop reasoning): 62%

Use Cases

  • Building agentic systems with strong multi-turn coherence
  • Editing and understanding large codebases or diff formats
  • Complex data extraction from lengthy documents
  • Highly structured content generation
  • Multimodal reasoning tasks (e.g., charts, diagrams, videos)

🔧 Developer Notes

  • Available via OpenAI API only
  • Supports up to 32,768 output tokens
  • Compatible with prompt caching and Batch API
  • Designed for production-scale performance and reliability

🧪 Real-World Results

  • Windsurf: 60% higher accuracy on internal code benchmarks; smoother tool usage
  • Qodo: Better suggestions in 55% of pull request reviews, with higher precision and focus
  • Blue J: 53% more accurate on complex tax scenarios
  • Thomson Reuters: 17% improvement in long-document legal review
  • Carlyle: 50% better retrieval accuracy across large financial files