Claude Sonnet 4.5
Claude Sonnet 4.5 is Anthropic’s latest frontier AI model, designed to excel at coding, complex reasoning, agentic tasks, and computer use. It is a significant advance over previous models such as Claude Sonnet 4 and Claude Opus 4.1, with state-of-the-art results on benchmarks including SWE-bench Verified (77.2%) and OSWorld (61.4%). The model can maintain focus on multi-step tasks for more than 30 hours, making it well suited to software development, domain-specific analysis in fields such as finance, law, medicine, and STEM, and building intelligent agents.
Released under AI Safety Level 3 (ASL-3) protections, Claude Sonnet 4.5 emphasizes safety and alignment, with reduced rates of behaviors such as sycophancy and deception and stronger resistance to prompt injection.
Key Features
- Coding Excellence: Best-in-class coding capabilities, leading SWE-bench Verified (77.2%, averaged over 10 trials), with support for code execution, file creation (e.g., spreadsheets, slides, documents), and sustained focus on long-running, complex projects.
- Agentic and Computer Use: Top performance on OSWorld (61.4%), enabling tasks like browser navigation, form filling, and real-time software generation via tools like “Imagine with Claude.”
- Reasoning and Domain Knowledge: Dramatic improvements in reasoning, math, and specialized knowledge across finance, law, medicine, and STEM, outperforming older models in evaluations like AIME, MMMLU, and Finance Agent.
- Safety and Alignment: Described by Anthropic as its most aligned frontier model to date, with reduced rates of concerning behaviors and stronger defenses against prompt injection. Includes classifiers for detecting risks related to CBRN weapons.
- Product Integrations: Supports context editing, memory tools in the Claude API, and the Claude Agent SDK for building custom agents. Features like checkpoints for progress saving and a native VS Code extension enhance usability.
- Pricing: Consistent with Claude Sonnet 4 at $3/$15 per million tokens (input/output). Check Replicate for usage-based pricing details.
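The memory and agent tooling above is accessed through the Claude Messages API. The sketch below builds a minimal single-turn request body; it is an illustration only, and the model ID string (`claude-sonnet-4-5`) is an assumption you should confirm against your provider's model catalog before use.

```python
import json

# Claude Messages API endpoint; requests also need `x-api-key` and
# `anthropic-version` headers, omitted here since nothing is sent.
API_URL = "https://api.anthropic.com/v1/messages"

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build a single-turn Messages API request body."""
    return {
        "model": "claude-sonnet-4-5",  # assumed model ID; confirm in your catalog
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_request("Summarize this repository in one sentence.")
print(json.dumps(body, indent=2))
```

POSTing this body (JSON-encoded) to `API_URL` with valid credentials returns the model's reply; keeping the body construction separate from the network call makes it easy to log or test requests offline.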
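The per-token pricing above translates directly into a cost estimate for a given request. A minimal sketch (the token counts in the example are illustrative, not measured):

```python
# Pricing from the feature list: $3 per million input tokens,
# $15 per million output tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in US dollars."""
    cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK
    cost += (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK
    return cost

# Example: 12,000 input tokens and 2,000 output tokens.
# 12k * $3/M + 2k * $15/M = $0.036 + $0.030 = $0.066
print(f"${estimate_cost(12_000, 2_000):.4f}")
```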
Benchmarks
| Benchmark | Score | Notes |
|---|---|---|
| SWE-bench Verified | 77.2% (avg. of 10 trials) | 200K thinking budget, no test-time compute, 500 problems; high-compute variant reaches 82.0% |
| OSWorld | 61.4% | Official framework, 100 max steps, averaged across 4 runs |
| Terminal-Bench | Leading performance | Configuration details in Anthropic docs |
| τ2-bench | State-of-the-art | - |
| AIME | Improved math reasoning | - |
| MMMLU | Enhanced knowledge | - |
| Finance Agent | Domain-specific gains | - |
For full methodologies, refer to the Anthropic announcement.
License and Safety
This model is provided under Anthropic’s terms of use and ships with safety features such as ASL-3 protections. For ethical guidelines, refer to Anthropic’s Responsible Scaling Policy.
For more details, visit the official Anthropic documentation.