Official

anthropic / claude-4-sonnet

Claude Sonnet 4 is a significant upgrade to 3.7, delivering superior coding and reasoning while responding more precisely to your instructions

  • Public
  • 81.1K runs
Iterate in playground

Claude Sonnet 4

Claude Sonnet 4 is a hybrid reasoning model that offers both near-instant responses and extended thinking capabilities. It significantly improves upon Claude Sonnet 3.7’s performance while maintaining efficiency for everyday use cases.

Key Capabilities

Dual Operating Modes

  • Standard mode: Fast responses for typical tasks
  • Extended thinking: Deep reasoning for complex problems (up to 64K tokens)

Core Features

  • Advanced coding capabilities with 72.7% performance on SWE-bench
  • Enhanced instruction following and steerability
  • Parallel tool execution
  • Memory improvements when given access to local files
  • Web search integration during extended thinking (beta)
  • 65% reduction in shortcut/loophole behavior compared to Sonnet 3.7

Performance Benchmarks

Coding

  • SWE-bench Verified: 72.7%
  • Described as “state-of-the-art” for coding tasks

Reasoning (with extended thinking)

  • GPQA Diamond: 75.5% (70.0% without extended thinking)
  • MMMLU: 88.2% (85.4% without extended thinking)
  • MMMU: 77.6% (72.6% without extended thinking)
  • AIME: 40.0% (33.1% without extended thinking)

Pricing

  • Input: $3 per million tokens
  • Output: $15 per million tokens

Safety and Reliability

  • Implements AI Safety Level 3 (ASL-3) protections
  • Extensive testing and evaluation
  • Reduced tendency to use shortcuts or exploit loopholes
  • Thinking summaries available (condensed from full reasoning when needed)

Use Cases

Sonnet 4 is optimized for:

  • Daily coding tasks and development workflows
  • Complex instruction following
  • Multi-file codebase operations
  • Autonomous application development
  • Long-form reasoning and analysis
  • Agent-based workflows

Limitations

  • Does not match Claude Opus 4 performance in most domains
  • Extended thinking features require paid plans
  • Memory capabilities depend on developer-provided file access