AI Development

Claude Opus 4.6: Features, Benchmarks, and Pricing Guide

Claude Opus 4.6 brings 1M token context, adaptive thinking, and 128K output. Complete guide to benchmarks, pricing, API changes, and enterprise features.

Digital Applied Team
February 5, 2026
11 min read

  • 1M token context (beta)
  • 128K max output tokens
  • 65.4% on Terminal-Bench 2.0
  • 80.8% on SWE-bench Verified

Key Takeaways

  • 1M token context window enters beta: Opus 4.6 scores 76% on MRCR v2 (8-needle, 1M context) compared to Sonnet 4.5's 18.5%, representing a qualitative shift in long-context reliability
  • Adaptive thinking replaces extended thinking: Four effort levels (low, medium, high, max) let Claude dynamically decide when deeper reasoning helps, with high as the default
  • Highest agentic coding scores to date: 65.4% on Terminal-Bench 2.0, 80.8% on SWE-bench Verified, and 72.7% on OSWorld for agentic computer use
  • Compaction API enables infinite conversations: Server-side context summarization automatically compresses older messages when approaching the context limit
  • Breaking change: prefilling disabled: Assistant message prefilling returns a 400 error on Opus 4.6 — migrate to structured outputs or system prompt instructions

Anthropic released Claude Opus 4.6 on February 5, 2026, marking a significant upgrade to its flagship model line. The release introduces adaptive thinking, a 1M token context window in beta, 128K max output tokens, and the highest agentic coding scores Anthropic has achieved to date. For developers and agencies working with Claude's API, this update also includes several breaking changes that require immediate attention.

This follows weeks of speculation after the Claude Sonnet 5 "Fennec" leak via Vertex AI in early February. While Fennec remains unconfirmed, Opus 4.6 delivers on many of the capabilities the AI community has been anticipating — and introduces a few surprises, including the removal of response prefilling and a new compaction API for infinite conversations.

What's New in Claude Opus 4.6

Opus 4.6 is Anthropic's most capable model, positioned as the successor to Opus 4.5 released in November 2025. The update focuses on three areas: reasoning depth through adaptive thinking, context capacity with the 1M token beta, and agentic task execution where it sets new industry benchmarks.

Opus 4.5 vs Opus 4.6 at a Glance

| Feature | Opus 4.5 | Opus 4.6 |
|---|---|---|
| Context Window | 200K tokens | 200K standard / 1M beta |
| Max Output | 64K tokens | 128K tokens |
| Thinking Mode | Extended thinking (budget_tokens) | Adaptive thinking (effort levels) |
| Prefilling | Supported | Removed (400 error) |
| Pricing (Input/Output) | $5 / $25 per MTok | $5 / $25 per MTok |
| GDPval-AA Elo | Baseline | +190 Elo over 4.5 |
| Compaction API | Not available | Beta |

The pricing parity with Opus 4.5 is notable — Anthropic is delivering substantially more capability at the same cost, with the exception of the 1M context window, which is billed at a premium rate above 200K tokens.

Adaptive Thinking Mode Explained

Adaptive thinking replaces extended thinking as the recommended reasoning mode for Opus 4.6. Instead of manually setting a budget_tokens parameter, Claude now dynamically decides when and how much to reason based on the complexity of each request.

Four Effort Levels

Low

Skips thinking entirely for straightforward tasks. Ideal for simple classification, extraction, or formatting.

Medium

Moderate reasoning for tasks that benefit from some deliberation. Good balance of speed and quality.

High (Default)

Claude almost always thinks at this level. Recommended for most production workloads requiring reliability.

Max

Maximum capability for the hardest problems. New to Opus 4.6. Higher latency but peak reasoning depth.

Migration from Extended Thinking

// Before (Opus 4.5 — deprecated on 4.6)
const response = await anthropic.messages.create({
  model: "claude-opus-4-5-20251101",
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 10000 },
  messages: [{ role: "user", content: "Solve this problem..." }]
});

// After (Opus 4.6 — recommended)
const response = await anthropic.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 16000,
  thinking: { type: "adaptive" },
  // Optional: control depth with effort parameter
  // effort: "max",  // low | medium | high (default) | max
  messages: [{ role: "user", content: "Solve this problem..." }]
});

Adaptive thinking also automatically enables interleaved thinking, which means Claude can reason between tool calls without the previously required interleaved-thinking-2025-05-14 beta header. If you're still passing that header, it will be safely ignored but should be removed.
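
If your requests still include that header, one common place it lives is the TypeScript SDK's betas option on the beta messages surface. A minimal sketch of the pattern to delete, reusing the anthropic client from the examples above:

// The interleaved-thinking beta flag is ignored on Opus 4.6 and can simply be deleted;
// adaptive thinking interleaves reasoning between tool calls on its own.
const response = await anthropic.beta.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 16000,
  thinking: { type: "adaptive" },
  betas: ["interleaved-thinking-2025-05-14"],  // remove this line (and the .beta surface if no other betas remain)
  messages: [{ role: "user", content: "Plan and execute these tool calls..." }]
});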

Benchmark Performance Analysis

Opus 4.6 leads on several key benchmarks, particularly in agentic coding and economically valuable knowledge work. For a broader competitive landscape, see our Claude vs GPT-5.2 vs Gemini 3 Pro comparison.

Agentic and Coding Benchmarks

| Benchmark | Opus 4.6 | Opus 4.5 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|---|
| Terminal-Bench 2.0 | 65.4% | 59.8% | 64.7% (Codex CLI) | 56.2% |
| SWE-bench Verified | 80.8% | 80.9% | 80.0% | 76.2% |
| OSWorld | 72.7% | 66.3% | — | — |
| BrowseComp | 84.0% | 67.8% | 77.9% (Pro) | 59.2% |
| Finance Agent | 60.7% | 55.9% | 56.6% | 44.1% |
| OpenRCA | 34.9% | 26.9% | — | — |

Reasoning and Knowledge Benchmarks

| Benchmark | Opus 4.6 | Opus 4.5 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|---|
| GDPval-AA (Elo) | 1606 | 1416 | 1462 | 1195 |
| HLE (with tools) | 53.1% | 43.4% | 50.0% (Pro) | 45.8% |
| ARC AGI 2 | 68.8% | 37.6% | 54.2% (Pro) | 45.1% |
| GPQA Diamond | 91.3% | 87.0% | 93.2% (Pro) | 91.9% |
| BigLaw Bench | 90.2% | — | — | — |
| MMMLU | 91.1% | 90.8% | 89.6% | 91.8% |

A notable result: Opus 4.5 slightly edges out Opus 4.6 on SWE-bench Verified (80.9% vs 80.8%), while GPT-5.2 with Codex CLI reaches 64.7% on Terminal-Bench 2.0 — just 0.7 percentage points behind Opus 4.6. The ARC AGI 2 result is particularly striking, with Opus 4.6 nearly doubling Opus 4.5's score (68.8% vs 37.6%).

Notable Domain-Specific Gains
  • Software Failure Diagnosis: 34.9% on OpenRCA vs Opus 4.5's 26.9% and Sonnet 4.5's 12.9% — a 30% improvement over the previous generation
  • Life Sciences: Nearly 2x improvement over Opus 4.5 in computational biology, structural biology, organic chemistry, and phylogenetics
  • Novel Problem-Solving: 68.8% on ARC AGI 2, nearly doubling Opus 4.5's 37.6% and exceeding GPT-5.2 Pro's 54.2%
  • Long-term Coherence: Earns $3,050.53 more than Opus 4.5 on Vending-Bench 2

1M Context Window and Compaction API

The 1M token context window is the headline expansion for Opus 4.6. Currently in beta for organizations in usage tier 4 or those with custom rate limits, it represents a 5x increase over the standard 200K window.

Long-Context Retrieval Performance

On the MRCR v2 benchmark (8-needle), Opus 4.6 scores 93% at 256K context and 76% at 1M context. For comparison, Sonnet 4.5 manages only 10.8% at 256K and 18.5% at 1M — making Opus 4.6 roughly 4-9x more reliable at retrieving information from deep in long contexts. Anthropic describes this as a qualitative shift in reducing context rot.

Compaction API for Infinite Conversations

The Compaction API (beta) provides server-side context summarization. When a conversation approaches the context window limit, the API automatically summarizes older parts of the conversation. This enables effectively infinite conversations without manual context management, sliding window hacks, or truncation strategies.

For agent-based workflows that involve many tool calls and long chains of reasoning, compaction can significantly reduce the overhead of maintaining conversation state. The summarization happens server-side, so there are no additional API calls required from your application.
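
Anthropic hasn't spelled out the final request fields here, so treat the following as a hypothetical sketch rather than the real parameter names; the point is that compaction is expected to be a per-request opt-in, not extra orchestration code on your side.

// Hypothetical sketch: the context_management field and its values are assumptions,
// not confirmed API names. Check the compaction beta docs for the real shape.
const conversationHistory = [
  // ...all prior turns, unmodified; no client-side truncation needed
  { role: "user", content: "Continue the analysis from where we left off..." }
];

const response = await anthropic.beta.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 16000,
  context_management: { compaction: { type: "enabled" } },  // assumed opt-in flag
  // Pass the full history; the server summarizes older turns as the limit approaches.
  messages: conversationHistory
});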

Pricing and Availability

Opus 4.6 maintains the same base pricing as Opus 4.5, making it a direct upgrade with no cost increase for existing workloads under 200K tokens.

| Tier | Input | Output | Notes |
|---|---|---|---|
| Standard | $5 / MTok | $25 / MTok | Up to 200K context |
| Long Context (200K+) | $10 / MTok | $37.50 / MTok | 1M beta, all tokens at premium rate |
| US-Only Inference | 1.1x standard | 1.1x standard | Via inference_geo parameter |
| Batch Processing | $2.50 / MTok | $12.50 / MTok | 50% discount, asynchronous |

Platform Availability

  • Anthropic API: Available now with model ID claude-opus-4-6
  • AWS Bedrock: Available at launch with regional and global endpoints
  • Google Vertex AI: Available at launch
  • Microsoft Foundry: Available at launch
  • claude.ai: Available to Pro and Team subscribers

API Changes and Migration Guide

Opus 4.6 introduces both deprecations and a notable breaking change. If you're upgrading from Opus 4.5 or Sonnet 4, review these carefully before switching your model ID.

Breaking: Prefilling Disabled

Prefilling — where you start the assistant's response with specific text like {"role": "assistant", "content": "Here is the JSON:"} — is no longer supported. Alternatives include:

  • Structured outputs: Use output_config.format with json_schema for guaranteed JSON output (see the sketch after this list)
  • System prompt instructions: Guide response style and format through the system message
  • JSON output mode: Use output_config.format for general JSON responses
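
For the structured-outputs route, the request declares a schema instead of seeding the assistant's reply. A minimal sketch, reusing the anthropic client from the earlier examples: output_config.format is the parameter named above, while the nested schema envelope shown here is illustrative and may differ from the final documentation.

// Replaces an assistant prefill such as {"role": "assistant", "content": "{"} with a declared schema.
// The nested field names under output_config.format are illustrative, not confirmed.
const response = await anthropic.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  output_config: {
    format: {
      type: "json_schema",
      schema: {
        type: "object",
        properties: {
          sentiment: { type: "string", enum: ["positive", "neutral", "negative"] },
          confidence: { type: "number" }
        },
        required: ["sentiment", "confidence"],
        additionalProperties: false
      }
    }
  },
  messages: [{ role: "user", content: "Classify the sentiment of this customer review: ..." }]
});

// The returned text block should parse cleanly against the declared schema.
const result = JSON.parse(response.content[0].text);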

Deprecations

  • Extended thinking (type: "enabled" + budget_tokens): Still functional but deprecated. Migrate to adaptive thinking.
  • Interleaved thinking beta header: Safely ignored on Opus 4.6. Remove from requests.
  • output_format parameter: Moved to output_config.format. Old parameter still works but is deprecated.

New Features

  • Effort parameter (GA): No longer requires a beta header. Combine with adaptive thinking for cost-quality tradeoffs.
  • Fine-grained tool streaming (GA): Now generally available on all models and platforms without a beta header.
  • inference_geo parameter: Request US-only inference with a 1.1x pricing multiplier for data residency compliance.
  • 128K output tokens: Double the previous 64K limit. SDKs require streaming for large max_tokens values to avoid HTTP timeouts (see the streaming sketch below).
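
A minimal sketch of the streaming requirement, using the TypeScript SDK's messages.stream helper; only the 128K max_tokens value is specific to Opus 4.6 here.

// Streaming keeps the connection alive while the model produces a very long response;
// the SDKs require it at this max_tokens size to avoid HTTP timeouts.
const stream = anthropic.messages.stream({
  model: "claude-opus-4-6",
  max_tokens: 128000,  // new ceiling, up from 64K
  thinking: { type: "adaptive" },
  messages: [{ role: "user", content: "Generate the complete migration report..." }]
});

stream.on("text", (text) => process.stdout.write(text));
const finalMessage = await stream.finalMessage();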

Claude Code and Agent Teams

Alongside the model release, Anthropic launched several product updates that leverage Opus 4.6's improved agentic capabilities. For a deeper look at Claude Code's impact on development workflows, see our Claude Code development guide.

Agent Teams (Research Preview)

Claude Code now supports agent teams — multiple autonomous agents that coordinate in parallel on complex tasks. This enables workflows where one agent handles frontend changes, another tackles backend logic, and a third manages tests, all working simultaneously under supervisory control.

For organizations exploring how Anthropic's enterprise tools fit into larger workflows, our Cowork plugins enterprise guide covers the broader plugin ecosystem.

Office Integrations

Claude in Excel (enhanced for long-running tasks)
  • Pre-planning capability for complex operations
  • Unstructured data ingestion with auto-structuring

Claude in PowerPoint (research preview)
  • Full-deck generation from descriptions
  • Preserves design systems, fonts, and slide masters

Enterprise and Safety Features

Opus 4.6 introduces data residency controls and maintains what Anthropic describes as the lowest over-refusal rates among recent Claude models. For industries with strict compliance requirements, these updates address key adoption barriers.

Data Residency Controls

The new inference_geo parameter lets you specify where model inference runs on a per-request basis. Setting it to "us" guarantees US-only processing at a 1.1x pricing multiplier. The default "global" routing uses standard pricing.
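
A minimal sketch of a US-pinned request: the article names the inference_geo parameter, and its placement as a top-level request field here is an assumption to confirm against the API reference.

// "us" pins inference to US infrastructure at a 1.1x pricing multiplier;
// omitting the field (or passing "global") uses standard routing and pricing.
// Top-level placement is an assumption, not a confirmed request shape.
const response = await anthropic.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 4096,
  inference_geo: "us",
  messages: [{ role: "user", content: "Summarize this patient intake form..." }]
});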

Safety Profile

  • Lowest misalignment score: Opus 4.6 scores approximately 1.8 out of 10 on overall misaligned behavior, the lowest of any Claude model tested — compared to Opus 4.5 (~1.9), Haiku 4.5 (~2.2), Sonnet 4.5 (~2.7), and Opus 4.1 (~4.3)
  • Lowest over-refusal rates: Among recent Claude models, Opus 4.6 is less likely to refuse legitimate requests while maintaining appropriate safety boundaries
  • Cybersecurity: Six new probes developed during the safety evaluation process, with top results in 38 of 40 blind-ranked investigations

For legal and compliance teams, the 90.2% BigLaw Bench score is particularly relevant. For a detailed look at Claude's legal capabilities, see our Claude legal plugin analysis.

Opus 4.6 vs the Competition

Here's how Opus 4.6 compares to its closest competitors across key criteria as of February 2026:

| Feature | Claude Opus 4.6 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|
| Context Window | 200K / 1M (beta) | 128K | 2M |
| Max Output | 128K tokens | 32K tokens | 8K tokens |
| Terminal-Bench 2.0 | 65.4% | 64.7% (Codex CLI) | 56.2% |
| SWE-bench Verified | 80.8% | 80.0% | 76.2% |
| GDPval-AA (Elo) | 1606 | 1462 | 1195 |
| HLE (with tools) | 53.1% | 50.0% (Pro) | 45.8% |
| GPQA Diamond | 91.3% | 93.2% (Pro) | 91.9% |
| Input Pricing | $5 / MTok | Varies by tier | Varies by tier |
| Output Pricing | $25 / MTok | Varies by tier | Varies by tier |
| Thinking Mode | Adaptive (4 levels) | Chain-of-thought | Deep Think mode |

Opus 4.6 leads in knowledge work (GDPval-AA), agentic search (BrowseComp), and legal reasoning (BigLaw). GPT-5.2 edges ahead on graduate-level reasoning (GPQA Diamond) and is within 0.7 points on Terminal-Bench 2.0. Gemini 3 Pro retains the largest native context window at 2M tokens and leads on visual reasoning (MMMU Pro). The right model depends on your specific use case, latency requirements, and budget.

What This Means for Your AI Strategy

Claude Opus 4.6 represents a meaningful step forward in reasoning depth, context handling, and agentic task execution. The adaptive thinking mode simplifies the developer experience, the compaction API opens the door to persistent agent conversations, and the benchmark results suggest Anthropic is pulling ahead in coding and knowledge work tasks.

The most immediate action item for existing Claude API users is the prefilling removal — check your codebase for assistant message prefills and migrate to structured outputs before switching to the new model ID. For teams evaluating Claude for the first time, the pricing parity with Opus 4.5 means there's no cost reason to start with the older model.

Ready to Build with Claude Opus 4.6?

From agentic workflows to enterprise AI integration, our team helps you leverage the latest AI capabilities for real business impact.

Free consultation
Expert guidance
Tailored solutions
