openai/gpt-4o

OpenAI's high-intelligence chat model

Official · 170.9K runs · Commercial use

Input

prompt (string)
The prompt to send to the model. Do not use if using messages.

system_prompt (string)
System prompt to set the assistant's behavior.

messages (string)
Only available via the API. A JSON string representing a list of messages, for example: [{"role": "user", "content": "Hello, how are you?"}]. If provided, prompt and system_prompt are ignored.

Default: []
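For example, a multi-turn conversation can be passed through messages as a serialized JSON string. A minimal sketch using the replicate Python client (assuming the client is installed and REPLICATE_API_TOKEN is set in the environment):

```python
import json
import replicate

# Chat history in role/content format, serialized to a JSON string
# as the messages input expects.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
]

output = replicate.run(
    "openai/gpt-4o",
    input={"messages": json.dumps(history)},  # prompt and system_prompt are ignored
)
print("".join(output))  # output arrives as an iterable of text chunks
```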

image_input (file[])
List of images to send to the model.

Default: []

temperature (number, minimum: 0, maximum: 2)
Sampling temperature between 0 and 2; higher values make the output more random, lower values more deterministic.

Default: 1

max_completion_tokens (integer)
Maximum number of completion tokens to generate.

Default: 4096

top_p (number, minimum: 0, maximum: 1)
Nucleus sampling parameter: the model considers only the tokens comprising the top_p probability mass, so 0.1 means only the tokens in the top 10% probability mass are considered.

Default: 1

frequency_penalty (number, minimum: -2, maximum: 2)
Frequency penalty: positive values penalize tokens in proportion to how often they have already appeared, reducing verbatim repetition.

Default: 0

presence_penalty (number, minimum: -2, maximum: 2)
Presence penalty: positive values penalize tokens that have already appeared in the text so far, increasing the model's likelihood to talk about new topics.

Default: 0
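Putting the schema together, a typical prompt-style call sets a prompt plus a few sampling controls. Another sketch with the replicate Python client; the image URL is a hypothetical placeholder, and image_input can be omitted for text-only calls:

```python
import replicate

output = replicate.run(
    "openai/gpt-4o",
    input={
        "prompt": "Describe what is happening in this image.",
        "system_prompt": "You are a concise visual analyst.",
        "image_input": ["https://example.com/photo.jpg"],  # hypothetical URL
        "temperature": 0.7,            # 0-2; lower = more deterministic
        "max_completion_tokens": 256,  # cap on generated tokens
    },
)
print("".join(output))
```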

Output

The 16th president of the United States was Abraham Lincoln. He served from March 4, 1861, until his assassination on April 15, 1865.
Input tokens: 29
Output tokens: 36
Tokens per second: 59.70
Pricing

Model pricing for openai/gpt-4o. Looking for volume pricing? Get in touch.

$2.50
per million input tokens

or 400,000 tokens for $1

$10.00
per million output tokens

or 100,000 tokens for $1
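As a quick sanity check at these rates, the example completion above (29 input tokens, 36 output tokens) costs well under a tenth of a cent:

```python
INPUT_RATE = 2.50 / 1_000_000    # dollars per input token
OUTPUT_RATE = 10.00 / 1_000_000  # dollars per output token

# 29 input tokens and 36 output tokens, as in the example output above
cost = 29 * INPUT_RATE + 36 * OUTPUT_RATE
print(f"${cost:.5f}")  # ≈ $0.00043
```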

Official models are always on, maintained, and have predictable pricing.

Check out our docs for more information about how pricing works on Replicate.

Readme

GPT‑4o is OpenAI’s most advanced flagship model, offering natively multimodal capabilities across text, vision, and audio. It delivers GPT-4‑level performance with faster response times and lower cost, making it ideal for real-time, high-volume applications. GPT‑4o supports audio inputs and outputs, handles images and text simultaneously, and is designed to feel conversational and responsive — like interacting with a human assistant in real time.


Key Capabilities

  • Multimodal input & output: Supports text, images, audio (input) and audio/text (output)
  • Real-time audio responsiveness: Latency as low as 232 ms
  • 128K token context window for deep reasoning over long content
  • High performance across reasoning, math, and code tasks
  • Unified model for all modalities—no need to switch between specialized models

Benchmark Highlights

MMLU (Language understanding):       87.2%
HumanEval (Python coding):           90.2%
GSM8K (Math word problems):          94.4%
MMMU (Vision QA):                    74.1%
VoxCeleb (Speaker ID):               95%+ (est.)
Audio latency (end-to-end):          ~232–320 ms

Use Cases

  • Real-time voice assistants and spoken dialogue agents
  • Multimodal document Q&A (PDFs with diagrams, charts, or images)
  • Code writing, explanation, and debugging
  • High-volume summarization and extraction from audio/text/image
  • Tutoring, presentations, and interactive education tools

Developer Notes

  • Available via OpenAI API and ChatGPT (Free, Plus, Team, Enterprise)
  • In ChatGPT, GPT‑4o is now the default GPT-4-level model
  • Audio input/output is supported only in ChatGPT for now
  • Image and text input supported via both API and ChatGPT
  • Supports streaming, function calling, tool use, and vision APIs (see the streaming sketch below)
  • Context window of 128K tokens via the API; ChatGPT limits vary by plan
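A minimal streaming sketch with the replicate Python client; replicate.stream yields server-sent events as tokens are generated:

```python
import replicate

# Print tokens as they arrive instead of waiting for the full completion.
for event in replicate.stream(
    "openai/gpt-4o",
    input={"prompt": "Explain nucleus sampling in one paragraph."},
):
    print(str(event), end="", flush=True)
print()
```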