Official

openai / o1-mini

A small model alternative to o1

  • Public
  • 85 runs
  • License

Pricing

Official model
Pricing for official models works differently from other models. Instead of being billed by time, you’re billed by input and output tokens, making pricing more predictable.

This model is priced by how many input tokens are sent and how many output tokens are generated.

Check out our docs for more information about how per-token pricing works on Replicate.
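As a rough illustration of how per-token billing adds up (the per-million-token prices below are placeholders for this sketch, not Replicate's actual rates):

```python
def token_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Cost in dollars for one request billed per input and output token."""
    return (input_tokens / 1_000_000) * price_in_per_m + \
           (output_tokens / 1_000_000) * price_out_per_m

# Hypothetical prices: $3 per 1M input tokens, $12 per 1M output tokens.
cost = token_cost(input_tokens=2_000, output_tokens=8_000,
                  price_in_per_m=3.00, price_out_per_m=12.00)
print(f"${cost:.4f}")  # 2,000 input + 8,000 output tokens at these rates
```

Because the bill depends only on token counts, you can estimate the cost of a workload up front from expected prompt and completion lengths.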

Readme

OpenAI o1-mini is a compact, cost-efficient reasoning model in the o1 series, designed to bring frontier-level reasoning to latency-sensitive and high-throughput applications. It is particularly strong at coding, math, and science tasks, and offers faster responses and lower cost than o1 while keeping reliable formatting, making it well suited to production-grade assistants, coding tools, and document processing at scale.


⚡️ Key Capabilities

  • Compact and responsive: Built for speed-critical tasks
  • Strong reasoning on coding, math, and science tasks, with reliable outputs
  • Supports a 128k-token context window for large document/code understanding
  • Instruction-following and formatting fidelity for structured workflows
  • Optimized for production-scale deployment with lower compute cost

📊 Benchmark Highlights

SWE-bench Verified (Coding):        35%
MultiChallenge (Instruction):       41%
IFEval (Format Compliance):         86%
Aider Diff Format Accuracy:         49%

πŸ§‘β€πŸ’» Use Cases

  • Lightweight virtual assistants and customer-facing bots
  • Code suggestion, completion, and inline diffing
  • Structured Q&A over long documents or API docs
  • High-volume email, ticket, or content summarization
  • Fast inference pipelines with compute constraints

🔧 Developer Notes

  • Model name: o1-mini
  • Available via the OpenAI API and in ChatGPT (Plus, Team, and Pro plans)
  • Supports streaming in the API; function calling, tool use, vision input, and system messages are not supported
  • Context window: 128k tokens, with up to 65,536 output tokens
  • Balanced for speed, cost, and quality
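A minimal sketch of a Chat Completions request body for this model. This only constructs the JSON payload; sending it requires the usual API endpoint and key. Note that o1-series reasoning models take `max_completion_tokens` rather than `max_tokens`, and o1-mini rejects a `system` role message, so instructions go in the user message:

```python
import json

# Build the request body for POST /v1/chat/completions.
body = {
    "model": "o1-mini",
    "messages": [
        # o1-mini does not accept a "system" message; fold any
        # instructions into the user message instead.
        {"role": "user", "content": "Summarize this changelog in three bullets: ..."}
    ],
    # o1-series models use max_completion_tokens, not max_tokens.
    "max_completion_tokens": 1000,
    "stream": True,  # streaming is supported in the API
}

payload = json.dumps(body)
print(payload)
```

The serialized `payload` is what an HTTP client would send; the official SDKs build an equivalent body from the same fields.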

💡 Why o1-mini?

  • Production-ready: Trusted for real-world latency-sensitive applications
  • Cost-effective: Substantially cheaper than o1
  • Reliable formatting: Great for diffs, tables, and structured outputs
  • Smart enough for tough tasks, fast enough for scale