Claude Sonnet 4.5
Claude Sonnet 4.5 is Anthropic’s latest frontier AI model, designed to excel at coding, complex reasoning, agentic tasks, and computer use. It is a significant advance over previous models such as Claude Sonnet 4 and Claude Opus 4.1, with state-of-the-art results on benchmarks including SWE-bench Verified (77.2%) and OSWorld (61.4%). The model can maintain focus on multi-step tasks for more than 30 hours, making it well suited to software development, domain-specific analysis in fields such as finance, law, medicine, and STEM, and building intelligent agents.
Released under AI Safety Level 3 (ASL-3) protections, Claude Sonnet 4.5 emphasizes safety and alignment, with reduced rates of behaviors such as sycophancy and deception and stronger resistance to prompt injection.
Key Features
- Coding Excellence: Best-in-class coding capabilities, leading SWE-bench Verified (77.2%, averaged over 10 trials), with support for code execution, file creation (e.g., spreadsheets, slides, documents), and sustained focus on long-running, complex projects.
- Agentic and Computer Use: Top performance on OSWorld (61.4%), enabling tasks like browser navigation, form filling, and real-time software generation via tools like “Imagine with Claude.”
- Reasoning and Domain Knowledge: Dramatic improvements in reasoning, math, and specialized knowledge across finance, law, medicine, and STEM, outperforming older models in evaluations like AIME, MMMLU, and Finance Agent.
- Safety and Alignment: Described by Anthropic as its most aligned frontier model to date, with reduced rates of concerning behaviors and stronger defenses against prompt injection. Includes classifiers for detecting risks related to CBRN weapons.
- Product Integrations: Supports context editing, memory tools in the Claude API, and the Claude Agent SDK for building custom agents. Features like checkpoints for progress saving and a native VS Code extension enhance usability.
- Pricing: Consistent with Claude Sonnet 4 at $3/$15 per million tokens (input/output). Check Replicate for usage-based pricing details.
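The memory and agent tooling above is accessed through the Claude Messages API. The sketch below builds a minimal single-turn request body; it is an illustration only, and the model ID string (`claude-sonnet-4-5`) is an assumption you should confirm against your provider's model catalog before use.

```python
import json

# Claude Messages API endpoint; requests also need `x-api-key` and
# `anthropic-version` headers, omitted here since nothing is sent.
API_URL = "https://api.anthropic.com/v1/messages"

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build a single-turn Messages API request body."""
    return {
        "model": "claude-sonnet-4-5",  # assumed model ID; confirm in your catalog
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_request("Summarize this repository in one sentence.")
print(json.dumps(body, indent=2))
```

POSTing this body (JSON-encoded) to `API_URL` with valid credentials returns the model's reply; keeping the body construction separate from the network call makes it easy to log or test requests offline.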
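The per-token pricing above translates directly into a cost estimate for a given request. A minimal sketch (the token counts in the example are illustrative, not measured):

```python
# Pricing from the feature list: $3 per million input tokens,
# $15 per million output tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in US dollars."""
    cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK
    cost += (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK
    return cost

# Example: 12,000 input tokens and 2,000 output tokens.
# 12k * $3/M + 2k * $15/M = $0.036 + $0.030 = $0.066
print(f"${estimate_cost(12_000, 2_000):.4f}")
```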
Benchmarks
| Benchmark | Score | Notes |
|---|---|---|
| SWE-bench Verified | 77.2% (avg. of 10 trials) | 200K thinking budget, no test-time compute, 500 problems; high-compute variant reaches 82.0% |
| OSWorld | 61.4% | Official framework, 100 max steps, averaged across 4 runs |
| Terminal-Bench | Leading performance | Configuration details in Anthropic docs |
| τ2-bench | State-of-the-art | - |
| AIME | Improved math reasoning | - |
| MMMLU | Enhanced knowledge | - |
| Finance Agent | Domain-specific gains | - |
For full methodologies, refer to the Anthropic announcement.
License and Safety
This model is provided under Anthropic’s terms of use and ships with safety features such as ASL-3 protections. For ethical guidelines, refer to Anthropic’s Responsible Scaling Policy.
For more details, visit the official Anthropic documentation.