Claude Sonnet 4
by Anthropic·USA·Released
The workhorse Claude tier — extended thinking at a fraction of Opus pricing.
About this model
Sonnet 4 is the workhorse tier of the Claude 4 family — released alongside Opus 4 in May 2025 and priced at one-fifth the cost ($3/M input, $15/M output). On Anthropic's evaluation Sonnet 4 actually matches or slightly exceeds Opus 4 on SWE-bench Verified at 72.7%, which makes it the surprising default choice for many coding workloads.
Sonnet 4 shares the same extended-thinking mechanism and MCP tool-call format as Opus 4. The main quality gap shows up on harder GPQA-style scientific reasoning where Opus 4's longer thinking budget pays off. For typical chat, coding, and tool-use workloads Sonnet 4 is essentially indistinguishable from Opus at a fraction of the cost.
Strengths
- •Sonnet 4 matches Opus 4 on SWE-bench Verified at 1/5 the price
- •Same MCP tool-call ergonomics as Opus — swap models without code changes
- •Extended thinking available as opt-in
- •Default tier in Cursor, Claude Code, and most production AI products
Limitations
- •Trails Opus 4 on hardest scientific / mathematical reasoning
- •Same 200K context — smaller than Gemini 2.5 Pro's 1M
- •Closed weights
When to use it
- →Default tier for production coding agents
- →High-volume customer-facing chatbots needing tool use
- →RAG pipelines at scale
- →Two-model architectures with auto-escalation to Opus 4 on confidence drop
Architecture & training
Anthropic has confirmed Sonnet 4 shares the Opus 4 pretraining corpus and post-training pipeline (Constitutional AI + RLHF) with a smaller activated-parameter count. The fact that Sonnet 4 matches Opus on coding benchmarks while being substantially cheaper has prompted broader industry questions about whether 'flagship' tiers are still worth their price premium for many workloads.
Benchmarks
| Benchmark | Score | Bar |
|---|---|---|
| GPQA | 75.4 | |
| MMLU | 88.3 | |
| SWE-bench Verified | 72.7 |