Official

openai / gpt-4.1-mini

Fast, affordable version of GPT-4.1

  • Public
  • 184.4K runs

Pricing

Official model
Pricing for official models works differently from other models. Instead of being billed by execution time, you're billed by input and output tokens, which makes costs more predictable.

This model is priced by the number of input tokens sent and the number of output tokens generated.

Check out our docs for more information about how per-token pricing works on Replicate.
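As a rough illustration of how per-token billing adds up, here is a minimal sketch of the arithmetic. The rates used below are placeholders, not this model's actual prices; check the pricing table and docs for the current per-token rates.

```python
# Minimal sketch of per-token cost estimation.
# The rates below are PLACEHOLDERS, not the model's actual prices;
# see Replicate's pricing docs for current per-token rates.

INPUT_PRICE_PER_MILLION = 0.40   # hypothetical $ per 1M input tokens
OUTPUT_PRICE_PER_MILLION = 1.60  # hypothetical $ per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request in dollars."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    )

# Example: a request with 2,000 input tokens and 500 output tokens.
print(f"${estimate_cost(2_000, 500):.6f}")
```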

Readme

GPT‑4.1 mini is a compact, high-performance model designed for real-world applications that require fast response times and low cost—without sacrificing intelligence. It delivers performance competitive with GPT‑4o while cutting latency nearly in half and reducing cost by 83%.

Key Features

  • Fast and lightweight, ideal for latency-sensitive use cases
  • High accuracy across coding, reasoning, and instruction following
  • Supports a context window of up to 1 million tokens
  • Cost-effective for large-scale deployments
  • Reliable for long-context and format-specific tasks

Benchmark Highlights

  • SWE-bench Verified (coding): 24%
  • MultiChallenge (instruction following): 36%
  • IFEval (format compliance): 84%
  • Aider (diff format accuracy): 45%
  • MMMU (vision QA): 73%

Use Cases

  • Chatbots and assistants
  • Lightweight code generation and review
  • Document Q&A and summarization
  • Image reasoning
  • High-volume, cost-sensitive tasks

Notes

  • Available via the OpenAI API and on Replicate (see the example below)
  • Not currently available in ChatGPT
  • Supports up to 1 million tokens of context
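
As a minimal sketch of running the model from this page through Replicate's Python client: the example assumes the model exposes a `prompt` input, as Replicate's official language models generally do, and that `REPLICATE_API_TOKEN` is set in your environment. Check the model's API schema for the exact input parameters.

```python
# Minimal sketch of running the model with Replicate's Python client.
# Assumes a `prompt` input, as official language models on Replicate
# typically accept; check the model's API schema for exact parameters.
# Requires the REPLICATE_API_TOKEN environment variable to be set.
import replicate

output = replicate.run(
    "openai/gpt-4.1-mini",
    input={
        "prompt": "Summarize the key trade-offs of using a smaller model.",
    },
)

# Language models on Replicate return output as chunks of text;
# join them to get the full response.
print("".join(output))
```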