Claude Opus 4.6: Features, Benchmarks, and Pricing Guide
Claude Opus 4.6 brings 1M token context, adaptive thinking, and 128K output. Complete guide to benchmarks, pricing, API changes, and enterprise features.
Key Takeaways
Anthropic released Claude Opus 4.6 on February 5, 2026, marking a significant upgrade to its flagship model line. The release introduces adaptive thinking, a 1M token context window in beta, 128K max output tokens, and the highest agentic coding scores Anthropic has achieved to date. For developers and agencies working with Claude's API, this update also includes several breaking changes that require immediate attention.
This follows weeks of speculation after the Claude Sonnet 5 "Fennec" leak via Vertex AI in early February. While Fennec remains unconfirmed, Opus 4.6 delivers on many of the capabilities the AI community has been anticipating — and introduces a few surprises, including the removal of response prefilling and a new compaction API for infinite conversations.
The new model ID is claude-opus-4-6 — note the simplified naming without a date suffix. Opus 4.6 is available now on the Anthropic API, AWS Bedrock, Google Vertex AI, and Microsoft Foundry.
What's New in Claude Opus 4.6
Opus 4.6 is Anthropic's most capable model, positioned as the successor to Opus 4.5 released in November 2025. The update focuses on three areas: reasoning depth through adaptive thinking, context capacity with the 1M token beta, and agentic task execution where it sets new industry benchmarks.
Opus 4.5 vs Opus 4.6 at a Glance
| Feature | Opus 4.5 | Opus 4.6 |
|---|---|---|
| Context Window | 200K tokens | 200K standard / 1M beta |
| Max Output | 64K tokens | 128K tokens |
| Thinking Mode | Extended thinking (budget_tokens) | Adaptive thinking (effort levels) |
| Prefilling | Supported | Removed (400 error) |
| Pricing (Input/Output) | $5 / $25 per MTok | $5 / $25 per MTok |
| GDPval-AA Elo | Baseline | +190 Elo over 4.5 |
| Compaction API | Not available | Beta |
The pricing parity with Opus 4.5 is notable — Anthropic is delivering substantially more capability at the same cost, with the exception of the 1M context window which carries a premium tier above 200K tokens.
Adaptive Thinking Mode Explained
Adaptive thinking replaces extended thinking as the recommended reasoning mode for Opus 4.6. Instead of manually setting a budget_tokens parameter, Claude now dynamically decides when and how much to reason based on the complexity of each request.
Four Effort Levels
- low: Skips thinking entirely for straightforward tasks. Ideal for simple classification, extraction, or formatting.
- medium: Moderate reasoning for tasks that benefit from some deliberation. Good balance of speed and quality.
- high (default): Claude almost always thinks at this level. Recommended for most production workloads requiring reliability.
- max: Maximum capability for the hardest problems. New to Opus 4.6. Higher latency but peak reasoning depth.
Migration from Extended Thinking
```javascript
// Before (Opus 4.5 — deprecated on 4.6)
const response = await anthropic.messages.create({
  model: "claude-opus-4-5-20251101",
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 10000 },
  messages: [{ role: "user", content: "Solve this problem..." }]
});
```

```javascript
// After (Opus 4.6 — recommended)
const response = await anthropic.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 16000,
  thinking: { type: "adaptive" },
  // Optional: control depth with the effort parameter
  // effort: "max", // low | medium | high (default) | max
  messages: [{ role: "user", content: "Solve this problem..." }]
});
```

Adaptive thinking also automatically enables interleaved thinking, which means Claude can reason between tool calls without the previously required interleaved-thinking-2025-05-14 beta header. If you're still passing that header, it will be safely ignored but should be removed.
Benchmark Performance Analysis
Opus 4.6 leads on several key benchmarks, particularly in agentic coding and economically valuable knowledge work. For a broader competitive landscape, see our Claude vs GPT-5.2 vs Gemini 3 Pro comparison.
Agentic and Coding Benchmarks
| Benchmark | Opus 4.6 | Opus 4.5 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|---|
| Terminal-Bench 2.0 | 65.4% | 59.8% | 64.7% (Codex CLI) | 56.2% |
| SWE-bench Verified | 80.8% | 80.9% | 80.0% | 76.2% |
| OSWorld | 72.7% | 66.3% | — | — |
| BrowseComp | 84.0% | 67.8% | 77.9% (Pro) | 59.2% |
| Finance Agent | 60.7% | 55.9% | 56.6% | 44.1% |
| OpenRCA | 34.9% | 26.9% | — | — |
Reasoning and Knowledge Benchmarks
| Benchmark | Opus 4.6 | Opus 4.5 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|---|
| GDPval-AA (Elo) | 1606 | 1416 | 1462 | 1195 |
| HLE (with tools) | 53.1% | 43.4% | 50.0% (Pro) | 45.8% |
| ARC AGI 2 | 68.8% | 37.6% | 54.2% (Pro) | 45.1% |
| GPQA Diamond | 91.3% | 87.0% | 93.2% (Pro) | 91.9% |
| BigLaw Bench | 90.2% | — | — | — |
| MMMLU | 91.1% | 90.8% | 89.6% | 91.8% |
A notable result: Opus 4.5 slightly edges out Opus 4.6 on SWE-bench Verified (80.9% vs 80.8%), while GPT-5.2 with Codex CLI reaches 64.7% on Terminal-Bench 2.0 — just 0.7 percentage points behind Opus 4.6. The ARC AGI 2 result is particularly striking, with Opus 4.6 nearly doubling Opus 4.5's score (68.8% vs 37.6%).
Other standout results:
- Software Failure Diagnosis: 34.9% on OpenRCA vs Opus 4.5's 26.9% and Sonnet 4.5's 12.9% — a roughly 30% relative improvement over the previous generation
- Life Sciences: Nearly 2x improvement over Opus 4.5 in computational biology, structural biology, organic chemistry, and phylogenetics
- Novel Problem-Solving: 68.8% on ARC AGI 2, nearly doubling Opus 4.5's 37.6% and exceeding GPT-5.2 Pro's 54.2%
- Long-term Coherence: Earns $3,050.53 more than Opus 4.5 on Vending-Bench 2
1M Context Window and Compaction API
The 1M token context window is the headline expansion for Opus 4.6. Currently in beta for organizations in usage tier 4 or those with custom rate limits, it represents a 5x increase over the standard 200K window.
Long-Context Retrieval Performance
On the MRCR v2 benchmark (8-needle), Opus 4.6 scores 93% at 256K context and 76% at 1M context. For comparison, Sonnet 4.5 manages only 10.8% at 256K and 18.5% at 1M — making Opus 4.6 roughly 4-9x more reliable at retrieving information from deep in long contexts. Anthropic describes this as a qualitative shift in reducing context rot.
Compaction API for Infinite Conversations
The Compaction API (beta) provides server-side context summarization. When a conversation approaches the context window limit, the API automatically summarizes older parts of the conversation. This enables effectively infinite conversations without manual context management, sliding window hacks, or truncation strategies.
For agent-based workflows that involve many tool calls and long chains of reasoning, compaction can significantly reduce the overhead of maintaining conversation state. The summarization happens server-side, so there are no additional API calls required from your application.
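Anthropic's exact request shape for compaction isn't spelled out above, so the sketch below is purely illustrative: the `context_management` field is a hypothetical name standing in for whatever the beta actually exposes, and only the model ID comes from this article. Check the official docs for the real parameter names before relying on it.

```javascript
// HYPOTHETICAL sketch of opting in to server-side compaction.
// `context_management` and its shape are illustrative assumptions,
// not confirmed API surface.
const conversationSoFar = [
  { role: "user", content: "Start a long-running agent session..." },
  // ...many more turns and tool results accumulate here...
];

const request = {
  model: "claude-opus-4-6",
  max_tokens: 16000,
  // Illustrative: ask the server to summarize older turns as the
  // conversation approaches the context window limit.
  context_management: { type: "compaction" },
  // Send the full history; no client-side truncation or sliding window.
  messages: conversationSoFar,
};
```

The key design point is that the client keeps appending turns as usual; summarization happens on the server, so the application never manages a window itself.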
Pricing and Availability
Opus 4.6 maintains the same base pricing as Opus 4.5, making it a direct upgrade with no cost increase for existing workloads under 200K tokens.
| Tier | Input | Output | Notes |
|---|---|---|---|
| Standard | $5 / MTok | $25 / MTok | Up to 200K context |
| Long Context (200K+) | $10 / MTok | $37.50 / MTok | 1M beta, all tokens at premium rate |
| US-Only Inference | 1.1x standard | 1.1x standard | Via inference_geo parameter |
| Batch Processing | $2.50 / MTok | $12.50 / MTok | 50% discount, asynchronous |
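As a sanity check on the table, here's a small helper that hard-codes the published per-MTok rates. It assumes all tokens in a request bill at a single tier, which matches the "all tokens at premium rate" note for long context; the function name and structure are just for illustration.

```javascript
// Per-million-token rates from the pricing table above.
const RATES = {
  standard: { input: 5, output: 25 },        // up to 200K context
  longContext: { input: 10, output: 37.5 },  // 1M beta, all tokens at premium rate
  batch: { input: 2.5, output: 12.5 },       // 50% discount, asynchronous
};

// Estimate request cost in USD. usOnly applies the 1.1x
// US-only inference multiplier from the table.
function estimateCost(tier, inputTokens, outputTokens, usOnly = false) {
  const rate = RATES[tier];
  const base =
    (inputTokens / 1_000_000) * rate.input +
    (outputTokens / 1_000_000) * rate.output;
  return usOnly ? base * 1.1 : base;
}

// 200K input + 40K output at standard rates: $1.00 + $1.00
console.log(estimateCost("standard", 200_000, 40_000)); // 2
```

The same call with `"batch"` halves both components, which is where high-volume asynchronous workloads recover most of their spend.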
Platform Availability
- Anthropic API: Available now with model ID claude-opus-4-6
- AWS Bedrock: Available at launch with regional and global endpoints
- Google Vertex AI: Available at launch
- Microsoft Foundry: Available at launch
- claude.ai: Available to Pro and Team subscribers
API Changes and Migration Guide
Opus 4.6 introduces both deprecations and a notable breaking change. If you're upgrading from Opus 4.5 or Sonnet 4, review these carefully before switching your model ID.
Breaking: Prefilling Disabled
Prefilling — where you start the assistant's response with specific text like {"role": "assistant", "content": "Here is the JSON:"} — is no longer supported. Alternatives include:
- Structured outputs: Use output_config.format with json_schema for guaranteed JSON output
- System prompt instructions: Guide response style and format through the system message
- JSON output mode: Use output_config.format for general JSON responses
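A minimal sketch of the structured-outputs replacement for a JSON prefill. The article names output_config.format and json_schema; the exact nesting and the schema fields below are assumptions for illustration, so verify them against the API reference.

```javascript
// Replacing an assistant prefill ("Here is the JSON:") with a
// schema-constrained output. Field nesting is an illustrative
// assumption based on the output_config.format naming above.
const request = {
  model: "claude-opus-4-6",
  max_tokens: 1024,
  output_config: {
    format: {
      type: "json_schema",
      schema: {
        type: "object",
        properties: {
          sentiment: {
            type: "string",
            enum: ["positive", "negative", "neutral"],
          },
          confidence: { type: "number" },
        },
        required: ["sentiment", "confidence"],
      },
    },
  },
  messages: [{ role: "user", content: "Classify: 'Great release!'" }],
};
```

Unlike prefilling, the schema is enforced by the API rather than nudged by a partial response, which is why it's the recommended migration path.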
Deprecations
- Extended thinking (type: "enabled" + budget_tokens): Still functional but deprecated. Migrate to adaptive thinking.
- Interleaved thinking beta header: Safely ignored on Opus 4.6. Remove from requests.
- output_format parameter: Moved to output_config.format. The old parameter still works but is deprecated.
New Features
- Effort parameter (GA): No longer requires a beta header. Combine with adaptive thinking for cost-quality tradeoffs.
- Fine-grained tool streaming (GA): Now generally available on all models and platforms without a beta header.
- inference_geo parameter: Request US-only inference with a 1.1x pricing multiplier for data residency compliance.
- 128K output tokens: Double the previous 64K limit. SDKs require streaming for large max_tokens to avoid HTTP timeouts.
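Since SDKs require streaming for large max_tokens, a raw request near the new ceiling might look like the sketch below. The stream flag is the standard Messages API streaming switch; the surrounding fields are illustrative.

```javascript
// Requests near the new 128K output ceiling should stream to avoid
// HTTP timeouts on long generations.
const request = {
  model: "claude-opus-4-6",
  max_tokens: 128000, // doubled from the previous 64K limit
  stream: true,       // SDKs require streaming at large max_tokens
  messages: [
    { role: "user", content: "Draft the full migration report..." },
  ],
};
```

With the official SDK you would typically reach for its streaming helper instead of setting the flag by hand, accumulating text deltas as they arrive.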
Claude Code and Agent Teams
Alongside the model release, Anthropic launched several product updates that leverage Opus 4.6's improved agentic capabilities. For a deeper look at Claude Code's impact on development workflows, see our Claude Code development guide.
Agent Teams (Research Preview)
Claude Code now supports agent teams — multiple autonomous agents that coordinate in parallel on complex tasks. This enables workflows where one agent handles frontend changes, another tackles backend logic, and a third manages tests, all working simultaneously under supervisory control.
For organizations exploring how Anthropic's enterprise tools fit into larger workflows, our Cowork plugins enterprise guide covers the broader plugin ecosystem.
Office Integrations
- Pre-planning capability for complex operations
- Unstructured data ingestion with auto-structuring
- Full-deck generation from descriptions
- Preserves design systems, fonts, and slide masters
Enterprise and Safety Features
Opus 4.6 introduces data residency controls and maintains what Anthropic describes as the lowest over-refusal rates among recent Claude models. For industries with strict compliance requirements, these updates address key adoption barriers.
Data Residency Controls
The new inference_geo parameter lets you specify where model inference runs on a per-request basis. Setting it to "us" guarantees US-only processing at a 1.1x pricing multiplier. The default "global" routing uses standard pricing.
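A minimal sketch of pinning a request to US inference, plus the arithmetic behind the premium rate. The inference_geo parameter and its "us"/"global" values come from the section above; the other request fields are illustrative.

```javascript
// Per-request US-only routing via inference_geo (1.1x pricing).
const request = {
  model: "claude-opus-4-6",
  max_tokens: 4096,
  inference_geo: "us", // default is "global" at standard pricing
  messages: [{ role: "user", content: "Summarize this contract..." }],
};

// The 1.1x multiplier applied to the standard $5/MTok input rate:
const usInputRate = 5 * 1.1; // ≈ $5.50 per MTok
```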
Safety Profile
- Lowest misalignment score: Opus 4.6 scores approximately 1.8 out of 10 on overall misaligned behavior, the lowest of any Claude model tested — compared to Opus 4.5 (~1.9), Haiku 4.5 (~2.2), Sonnet 4.5 (~2.7), and Opus 4.1 (~4.3)
- Lowest over-refusal rates: Among recent Claude models, Opus 4.6 is less likely to refuse legitimate requests while maintaining appropriate safety boundaries
- Cybersecurity: Six new probes developed during the safety evaluation process, with top results in 38 of 40 blind-ranked investigations
For legal and compliance teams, the 90.2% BigLaw Bench score is particularly relevant. For a detailed look at Claude's legal capabilities, see our Claude legal plugin analysis.
Opus 4.6 vs the Competition
Here's how Opus 4.6 compares to its closest competitors across key criteria as of February 2026:
| Feature | Claude Opus 4.6 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|
| Context Window | 200K / 1M (beta) | 128K | 2M |
| Max Output | 128K tokens | 32K tokens | 8K tokens |
| Terminal-Bench 2.0 | 65.4% | 64.7% (Codex CLI) | 56.2% |
| SWE-bench Verified | 80.8% | 80.0% | 76.2% |
| GDPval-AA (Elo) | 1606 | 1462 | 1195 |
| HLE (with tools) | 53.1% | 50.0% (Pro) | 45.8% |
| GPQA Diamond | 91.3% | 93.2% (Pro) | 91.9% |
| Input Pricing | $5 / MTok | Varies by tier | Varies by tier |
| Output Pricing | $25 / MTok | Varies by tier | Varies by tier |
| Thinking Mode | Adaptive (4 levels) | Chain-of-thought | Deep Think mode |
Opus 4.6 leads in knowledge work (GDPval-AA), agentic search (BrowseComp), and legal reasoning (BigLaw). GPT-5.2 edges ahead on graduate-level reasoning (GPQA Diamond) and is within 0.7 points on Terminal-Bench 2.0. Gemini 3 Pro retains the largest native context window at 2M tokens and leads on visual reasoning (MMMU Pro). The right model depends on your specific use case, latency requirements, and budget.
What This Means for Your AI Strategy
Claude Opus 4.6 represents a meaningful step forward in reasoning depth, context handling, and agentic task execution. The adaptive thinking mode simplifies the developer experience, the compaction API opens the door to persistent agent conversations, and the benchmark results suggest Anthropic is pulling ahead in coding and knowledge work tasks.
The most immediate action item for existing Claude API users is the prefilling removal — check your codebase for assistant message prefills and migrate to structured outputs before switching to the new model ID. For teams evaluating Claude for the first time, the pricing parity with Opus 4.5 means there's no cost reason to start with the older model.
Ready to Build with Claude Opus 4.6?
From agentic workflows to enterprise AI integration, our team helps you leverage the latest AI capabilities for real business impact.