anthropic/claude-4.5-sonnet

Claude Sonnet 4.5 is the best coding model to date, with significant improvements across the entire development lifecycle

Claude 4.5 Sonnet

Claude 4.5 Sonnet is Anthropic’s latest frontier AI model, designed to excel at coding, complex reasoning, agentic tasks, and computer use. It improves on previous models such as Claude Sonnet 4 and Opus 4.1, with state-of-the-art results on benchmarks such as SWE-bench Verified (77.2%) and OSWorld (61.4%). The model can maintain focus on multi-step tasks for more than 30 hours, which suits it to software development, to domain-specific analysis in finance, law, medicine, and STEM, and to building intelligent agents.

Released under AI Safety Level 3 (ASL-3) protections, Claude 4.5 Sonnet emphasizes safety and alignment: it reduces behaviors such as sycophancy and deception and is more resistant to prompt injection.
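As a starting point, the sketch below shows one way to call the model through Replicate's Python client using the identifier at the top of this page. It assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set. The input field names (`prompt`, `max_tokens`) and the helper names `build_input` and `ask` are assumptions for illustration, not taken from this page; check the model's input schema on Replicate before relying on them.

```python
import os

MODEL = "anthropic/claude-4.5-sonnet"  # identifier shown at the top of this page


def build_input(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble the input payload.

    The field names here are assumptions; confirm them against the
    model's input schema on Replicate.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"prompt": prompt, "max_tokens": max_tokens}


def ask(prompt: str) -> str:
    """Run the model once and return the full text of its response."""
    # Imported lazily so build_input() works without the package installed.
    import replicate

    output = replicate.run(MODEL, input=build_input(prompt))
    # Language models on Replicate usually yield text in chunks; join them.
    if isinstance(output, str):
        return output
    return "".join(str(chunk) for chunk in output)


# Only attempt a real call when an API token is available.
if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    print(ask("Write a Python function that reverses a linked list."))
```

Keeping the payload construction separate from the network call makes it easy to adjust field names once the actual schema is confirmed.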

Key Features

  • Coding Excellence: Best-in-class coding capabilities, leading on SWE-bench Verified (77.2% average over 10 trials) and supporting code execution, file creation (e.g., spreadsheets, slides, documents), and long-term focus on complex projects.
  • Agentic and Computer Use: Top performance on OSWorld (61.4%), enabling tasks like browser navigation, form filling, and real-time software generation via tools like “Imagine with Claude.”
  • Reasoning and Domain Knowledge: Dramatic improvements in reasoning, math, and specialized knowledge across finance, law, medicine, and STEM, outperforming older models in evaluations like AIME, MMMLU, and Finance Agent.
  • Safety and Alignment: Described by Anthropic as its most aligned frontier model yet, with fewer concerning behaviors and stronger defenses against prompt injection. Includes classifiers that detect risks related to chemical, biological, radiological, and nuclear (CBRN) weapons.
  • Product Integrations: Supports context editing and a memory tool in the Claude API, plus the Claude Agent SDK for building custom agents. Checkpoints for saving progress and a native VS Code extension improve usability.
  • Pricing: Same as Claude Sonnet 4, at $3 per million input tokens and $15 per million output tokens. Check Replicate for usage-based pricing details.
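To make the per-token rates concrete, here is a minimal cost estimate in Python. It uses only the $3/$15 per-million-token figures quoted above; the function name `estimate_cost_usd` and the token counts in the example are hypothetical, and actual charges on Replicate may differ from this arithmetic.

```python
from decimal import Decimal

# Rates quoted in the Pricing bullet, in USD per one million tokens.
INPUT_USD_PER_MTOK = Decimal("3")
OUTPUT_USD_PER_MTOK = Decimal("15")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / TOKENS_PER_MTOK


# Hypothetical request: 200,000 input tokens and 10,000 output tokens.
# (200,000 * 3 + 10,000 * 15) / 1,000,000 = 0.75
print(estimate_cost_usd(200_000, 10_000))  # prints 0.75
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift.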

Benchmarks

| Benchmark | Score | Notes |
|---|---|---|
| SWE-bench Verified | 77.2% (avg. 10 trials) | 200K thinking budget, no test-time compute on 500 problems; high-compute variant at 82.0% |
| OSWorld | 61.4% | Official framework, 100 max steps, averaged across 4 runs |
| Terminal-Bench | Leading performance | Specific configurations detailed in Anthropic docs |
| τ2-bench | State-of-the-art | - |
| AIME | Improved math reasoning | - |
| MMMLU | Enhanced knowledge | - |
| Finance Agent | Domain-specific gains | - |

For full methodologies, refer to the Anthropic announcement.

License and Safety

This model is provided under Anthropic’s terms of use. It includes safety features like ASL-3 protections. For ethical guidelines, refer to Anthropic’s Responsible Scaling Policy.

For more details, visit the official Anthropic documentation.