
openai / gpt-4o

OpenAI's high-intelligence chat model

  • Public
  • 63.3K runs

Pricing

Official model
Pricing for official models works differently from other models on Replicate. Instead of being billed for compute time, you are billed for the input tokens you send and the output tokens the model generates, which makes costs more predictable.

Check out our docs for more information about how per-token pricing works on Replicate.
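
As a rough illustration of per-token billing, the sketch below estimates the cost of a single request. The rates used here are placeholders, not Replicate's actual prices for this model; check the model page for current figures.

```python
# Estimate the dollar cost of one request under per-token pricing.
# NOTE: these rates are placeholders for illustration, not Replicate's
# actual prices for this model; see the model page for current rates.
PRICE_PER_INPUT_TOKEN = 2.50 / 1_000_000    # hypothetical: $2.50 per 1M input tokens
PRICE_PER_OUTPUT_TOKEN = 10.00 / 1_000_000  # hypothetical: $10.00 per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for a single request."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

# Example: a 1,200-token prompt producing a 400-token completion.
print(f"${estimate_cost(1_200, 400):.6f}")
```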

Readme

GPT‑4o ("o" for "omni") is OpenAI's flagship model, natively multimodal across text, vision, and audio. It delivers GPT‑4-level performance with faster response times and lower cost, making it well suited to real-time, high-volume applications. GPT‑4o supports audio input and output, handles images and text together, and is designed to feel conversational and responsive, like interacting with a human assistant in real time.

Key Capabilities

  • Multimodal input & output: accepts text, image, and audio input; produces text and audio output
  • Real-time audio responsiveness: latency as low as 232 ms (roughly 320 ms on average)
  • 128K-token context window for deep reasoning over long content
  • High performance across reasoning, math, and code tasks
  • Unified model for all modalities, so there is no need to switch between specialized models (see the call sketch after this list)
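
To make the multimodal point concrete, here is a minimal sketch of a text-plus-image call through Replicate's Python client. The input field names (`prompt`, `image_input`) and the image URL are assumptions for illustration; this model's API tab lists the exact schema.

```python
# Minimal text + image call via the Replicate Python client.
# The input keys below (prompt, image_input) are assumptions based on
# common Replicate schemas; check this model's API tab for the real ones.
import replicate

output = replicate.run(
    "openai/gpt-4o",
    input={
        "prompt": "Describe this chart and the main trend it shows.",
        "image_input": ["https://example.com/chart.png"],  # placeholder URL
    },
)

# Official language models on Replicate yield output as chunks of text.
print("".join(output))
```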

Benchmark Highlights

MMLU (Language understanding):     87.2%
HumanEval (Python coding):         90.2%
GSM8K (Math word problems):        94.4%
MMMU (Vision QA):                  74.1%
VoxCeleb (Speaker ID):             95%+ (est.)
Audio latency (end-to-end):        ~232–320 ms

Use Cases

  • Real-time voice assistants and spoken dialogue agents
  • Multimodal document Q&A (PDFs with diagrams, charts, or images)
  • Code writing, explanation, and debugging
  • High-volume summarization and extraction from audio/text/image
  • Tutoring, presentations, and interactive education tools

Developer Notes

  • Available via OpenAI API and ChatGPT (Free, Plus, Team, Enterprise)
  • In ChatGPT, GPT‑4o is now the default GPT-4-level model
  • Audio input/output is supported only in ChatGPT for now
  • Image and text input supported via both API and ChatGPT
  • Supports streaming, function calling, tool use, and vision (see the streaming sketch after this list)
  • Context window of 128K tokens
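
Since streaming and function calling are called out above, here is a brief sketch using the OpenAI Python SDK (v1.x). The `get_weather` tool is a made-up example, and error handling is omitted for brevity.

```python
# Streaming a GPT-4o chat completion with a declared tool (OpenAI SDK v1.x).
# The get_weather function is hypothetical, for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather like in Paris?"}],
    tools=tools,
    stream=True,
)

# Text tokens (or tool-call fragments) arrive incrementally in chunks.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```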