AI Development

Claude Opus 4.6: Features, Benchmarks, and Pricing Guide

Claude Opus 4.6 brings 1M token context, adaptive thinking, and 128K output. Complete guide to benchmarks, pricing, API changes, and enterprise features.

Digital Applied Team
February 5, 2026
11 min read

  • 1M token context (beta)
  • 128K max output tokens
  • 65.4% on Terminal-Bench 2.0
  • 80.8% on SWE-bench Verified

Key Takeaways

  • 1M token context window enters beta: Opus 4.6 scores 76% on MRCR v2 (8-needle, 1M context) compared to Sonnet 4.5's 18.5%, representing a qualitative shift in long-context reliability
  • Adaptive thinking replaces extended thinking: Four effort levels (low, medium, high, max) let Claude dynamically decide when deeper reasoning helps, with high as the default
  • Highest agentic coding scores to date: 65.4% on Terminal-Bench 2.0, 80.8% on SWE-bench Verified, and 72.7% on OSWorld for agentic computer use
  • Compaction API enables infinite conversations: Server-side context summarization automatically compresses older messages when approaching the context limit
  • Breaking change: prefilling disabled: Assistant message prefilling returns a 400 error on Opus 4.6 — migrate to structured outputs or system prompt instructions

Anthropic released Claude Opus 4.6 on February 5, 2026, marking a significant upgrade to its flagship model line. The release introduces adaptive thinking, a 1M token context window in beta, 128K max output tokens, and the highest agentic coding scores Anthropic has achieved to date. For developers and agencies working with Claude's API, this update also includes several breaking changes that require immediate attention.

This follows weeks of speculation after the Claude Sonnet 5 "Fennec" leak via Vertex AI in early February. While Fennec remains unconfirmed, Opus 4.6 delivers on many of the capabilities the AI community has been anticipating — and introduces a few surprises, including the removal of response prefilling and a new compaction API for infinite conversations.

What's New in Claude Opus 4.6

Opus 4.6 is Anthropic's most capable model, positioned as the successor to Opus 4.5 released in November 2025. The update focuses on three areas: reasoning depth through adaptive thinking, context capacity with the 1M token beta, and agentic task execution where it sets new industry benchmarks.

Opus 4.5 vs Opus 4.6 at a Glance

| Feature | Opus 4.5 | Opus 4.6 |
|---|---|---|
| Context Window | 200K tokens | 200K standard / 1M beta |
| Max Output | 64K tokens | 128K tokens |
| Thinking Mode | Extended thinking (budget_tokens) | Adaptive thinking (effort levels) |
| Prefilling | Supported | Removed (400 error) |
| Pricing (Input/Output) | $5 / $25 per MTok | $5 / $25 per MTok |
| GDPval-AA Elo | Baseline | +190 Elo over 4.5 |
| Compaction API | Not available | Beta |

The pricing parity with Opus 4.5 is notable — Anthropic is delivering substantially more capability at the same cost, with the exception of the 1M context window, which is billed at a premium rate above 200K tokens.

Adaptive Thinking Mode Explained

Adaptive thinking replaces extended thinking as the recommended reasoning mode for Opus 4.6. Instead of manually setting a budget_tokens parameter, Claude now dynamically decides when and how much to reason based on the complexity of each request.

Four Effort Levels

Low

Skips thinking entirely for straightforward tasks. Ideal for simple classification, extraction, or formatting.

Medium

Moderate reasoning for tasks that benefit from some deliberation. Good balance of speed and quality.

High (Default)

Claude almost always thinks at this level. Recommended for most production workloads requiring reliability.

Max

Maximum capability for the hardest problems. New to Opus 4.6. Higher latency but peak reasoning depth.

Migration from Extended Thinking

// Before (Opus 4.5 — deprecated on 4.6)
const response = await anthropic.messages.create({
  model: "claude-opus-4-5-20251101",
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 10000 },
  messages: [{ role: "user", content: "Solve this problem..." }]
});

// After (Opus 4.6 — recommended)
const response = await anthropic.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 16000,
  thinking: { type: "adaptive" },
  // Optional: control depth with effort parameter
  // effort: "max",  // low | medium | high (default) | max
  messages: [{ role: "user", content: "Solve this problem..." }]
});

Adaptive thinking also automatically enables interleaved thinking, which means Claude can reason between tool calls without the previously required interleaved-thinking-2025-05-14 beta header. If you're still passing that header, it will be safely ignored but should be removed.
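
If your requests still include that header, one common place it lives is the TypeScript SDK's betas option on the beta messages surface. A minimal sketch of the pattern to delete, reusing the anthropic client from the examples above:

// The interleaved-thinking beta flag is ignored on Opus 4.6 and can simply be deleted;
// adaptive thinking interleaves reasoning between tool calls on its own.
const response = await anthropic.beta.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 16000,
  thinking: { type: "adaptive" },
  betas: ["interleaved-thinking-2025-05-14"],  // remove this line (and the .beta surface if no other betas remain)
  messages: [{ role: "user", content: "Plan and execute these tool calls..." }]
});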

Benchmark Performance Analysis

Opus 4.6 leads on several key benchmarks, particularly in agentic coding and economically valuable knowledge work. For a broader competitive landscape, see our Claude vs GPT-5.2 vs Gemini 3 Pro comparison.

Agentic and Coding Benchmarks

| Benchmark | Opus 4.6 | Opus 4.5 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|---|
| Terminal-Bench 2.0 | 65.4% | 59.8% | 64.7% (Codex CLI) | 56.2% |
| SWE-bench Verified | 80.8% | 80.9% | 80.0% | 76.2% |
| OSWorld | 72.7% | 66.3% | — | — |
| BrowseComp | 84.0% | 67.8% | 77.9% (Pro) | 59.2% |
| Finance Agent | 60.7% | 55.9% | 56.6% | 44.1% |
| OpenRCA | 34.9% | 26.9% | — | — |

Reasoning and Knowledge Benchmarks

| Benchmark | Opus 4.6 | Opus 4.5 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|---|
| GDPval-AA (Elo) | 1606 | 1416 | 1462 | 1195 |
| HLE (with tools) | 53.1% | 43.4% | 50.0% (Pro) | 45.8% |
| ARC AGI 2 | 68.8% | 37.6% | 54.2% (Pro) | 45.1% |
| GPQA Diamond | 91.3% | 87.0% | 93.2% (Pro) | 91.9% |
| BigLaw Bench | 90.2% | — | — | — |
| MMMLU | 91.1% | 90.8% | 89.6% | 91.8% |

A notable result: Opus 4.5 slightly edges out Opus 4.6 on SWE-bench Verified (80.9% vs 80.8%), while GPT-5.2 with Codex CLI reaches 64.7% on Terminal-Bench 2.0 — just 0.7 percentage points behind Opus 4.6. The ARC AGI 2 result is particularly striking, with Opus 4.6 nearly doubling Opus 4.5's score (68.8% vs 37.6%).

Notable Domain-Specific Gains
  • Software Failure Diagnosis: 34.9% on OpenRCA vs Opus 4.5's 26.9% and Sonnet 4.5's 12.9% — a 30% improvement over the previous generation
  • Life Sciences: Nearly 2x improvement over Opus 4.5 in computational biology, structural biology, organic chemistry, and phylogenetics
  • Novel Problem-Solving: 68.8% on ARC AGI 2, nearly doubling Opus 4.5's 37.6% and exceeding GPT-5.2 Pro's 54.2%
  • Long-term Coherence: Earns $3,050.53 more than Opus 4.5 on Vending-Bench 2

1M Context Window and Compaction API

The 1M token context window is the headline expansion for Opus 4.6. Currently in beta for organizations in usage tier 4 or those with custom rate limits, it represents a 5x increase over the standard 200K window.

Long-Context Retrieval Performance

On the MRCR v2 benchmark (8-needle), Opus 4.6 scores 93% at 256K context and 76% at 1M context. For comparison, Sonnet 4.5 manages only 10.8% at 256K and 18.5% at 1M — making Opus 4.6 roughly 4-9x more reliable at retrieving information from deep in long contexts. Anthropic describes this as a qualitative shift in reducing context rot.

Compaction API for Infinite Conversations

The Compaction API (beta) provides server-side context summarization. When a conversation approaches the context window limit, the API automatically summarizes older parts of the conversation. This enables effectively infinite conversations without manual context management, sliding window hacks, or truncation strategies.

For agent-based workflows that involve many tool calls and long chains of reasoning, compaction can significantly reduce the overhead of maintaining conversation state. The summarization happens server-side, so there are no additional API calls required from your application.
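
Anthropic hasn't spelled out the final request fields here, so treat the following as a hypothetical sketch rather than the real parameter names; the point is that compaction is expected to be a per-request opt-in, not extra orchestration code on your side.

// Hypothetical sketch: the context_management field and its values are assumptions,
// not confirmed API names. Check the compaction beta docs for the real shape.
const conversationHistory = [
  // ...all prior turns, unmodified; no client-side truncation needed
  { role: "user", content: "Continue the analysis from where we left off..." }
];

const response = await anthropic.beta.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 16000,
  context_management: { compaction: { type: "enabled" } },  // assumed opt-in flag
  // Pass the full history; the server summarizes older turns as the limit approaches.
  messages: conversationHistory
});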

Pricing and Availability

Opus 4.6 maintains the same base pricing as Opus 4.5, making it a direct upgrade with no cost increase for existing workloads under 200K tokens.

| Tier | Input | Output | Notes |
|---|---|---|---|
| Standard | $5 / MTok | $25 / MTok | Up to 200K context |
| Long Context (200K+) | $10 / MTok | $37.50 / MTok | 1M beta, all tokens at premium rate |
| US-Only Inference | 1.1x standard | 1.1x standard | Via inference_geo parameter |
| Batch Processing | $2.50 / MTok | $12.50 / MTok | 50% discount, asynchronous |

Platform Availability

  • Anthropic API: Available now with model ID claude-opus-4-6
  • AWS Bedrock: Available at launch with regional and global endpoints
  • Google Vertex AI: Available at launch
  • Microsoft Foundry: Available at launch
  • claude.ai: Available to Pro and Team subscribers

API Changes and Migration Guide

Opus 4.6 introduces both deprecations and a notable breaking change. If you're upgrading from Opus 4.5 or Sonnet 4, review these carefully before switching your model ID.

Breaking: Prefilling Disabled

Prefilling — where you start the assistant's response with specific text like {"role": "assistant", "content": "Here is the JSON:"} — is no longer supported. Alternatives include:

  • Structured outputs: Use output_config.format with json_schema for guaranteed JSON output (see the sketch after this list)
  • System prompt instructions: Guide response style and format through the system message
  • JSON output mode: Use output_config.format for general JSON responses
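
For the structured-outputs route, the request declares a schema instead of seeding the assistant's reply. A minimal sketch, reusing the anthropic client from the earlier examples: output_config.format is the parameter named above, while the nested schema envelope shown here is illustrative and may differ from the final documentation.

// Replaces an assistant prefill such as {"role": "assistant", "content": "{"} with a declared schema.
// The nested field names under output_config.format are illustrative, not confirmed.
const response = await anthropic.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  output_config: {
    format: {
      type: "json_schema",
      schema: {
        type: "object",
        properties: {
          sentiment: { type: "string", enum: ["positive", "neutral", "negative"] },
          confidence: { type: "number" }
        },
        required: ["sentiment", "confidence"],
        additionalProperties: false
      }
    }
  },
  messages: [{ role: "user", content: "Classify the sentiment of this customer review: ..." }]
});

// The returned text block should parse cleanly against the declared schema.
const result = JSON.parse(response.content[0].text);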

Deprecations

  • Extended thinking (type: "enabled" + budget_tokens): Still functional but deprecated. Migrate to adaptive thinking.
  • Interleaved thinking beta header: Safely ignored on Opus 4.6. Remove from requests.
  • output_format parameter: Moved to output_config.format. Old parameter still works but is deprecated.

New Features

  • Effort parameter (GA): No longer requires a beta header. Combine with adaptive thinking for cost-quality tradeoffs.
  • Fine-grained tool streaming (GA): Now generally available on all models and platforms without a beta header.
  • inference_geo parameter: Request US-only inference with a 1.1x pricing multiplier for data residency compliance.
  • 128K output tokens: Double the previous 64K limit. SDKs require streaming for large max_tokens values to avoid HTTP timeouts (see the streaming sketch below).
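
A minimal sketch of the streaming requirement, using the TypeScript SDK's messages.stream helper; only the 128K max_tokens value is specific to Opus 4.6 here.

// Streaming keeps the connection alive while the model produces a very long response;
// the SDKs require it at this max_tokens size to avoid HTTP timeouts.
const stream = anthropic.messages.stream({
  model: "claude-opus-4-6",
  max_tokens: 128000,  // new ceiling, up from 64K
  thinking: { type: "adaptive" },
  messages: [{ role: "user", content: "Generate the complete migration report..." }]
});

stream.on("text", (text) => process.stdout.write(text));
const finalMessage = await stream.finalMessage();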

Claude Code and Agent Teams

Alongside the model release, Anthropic launched several product updates that leverage Opus 4.6's improved agentic capabilities. For a deeper look at Claude Code's impact on development workflows, see our Claude Code development guide.

Agent Teams (Research Preview)

Claude Code now supports agent teams — multiple autonomous agents that coordinate in parallel on complex tasks. This enables workflows where one agent handles frontend changes, another tackles backend logic, and a third manages tests, all working simultaneously under supervisory control.

For organizations exploring how Anthropic's enterprise tools fit into larger workflows, our Cowork plugins enterprise guide covers the broader plugin ecosystem.

Office Integrations

Claude in Excel (enhanced for long-running tasks)
  • Pre-planning capability for complex operations
  • Unstructured data ingestion with auto-structuring

Claude in PowerPoint (research preview)
  • Full-deck generation from descriptions
  • Preserves design systems, fonts, and slide masters

Enterprise and Safety Features

Opus 4.6 introduces data residency controls and maintains what Anthropic describes as the lowest over-refusal rates among recent Claude models. For industries with strict compliance requirements, these updates address key adoption barriers.

Data Residency Controls

The new inference_geo parameter lets you specify where model inference runs on a per-request basis. Setting it to "us" guarantees US-only processing at a 1.1x pricing multiplier. The default "global" routing uses standard pricing.
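
A minimal sketch of a US-pinned request: the article names the inference_geo parameter, and its placement as a top-level request field here is an assumption to confirm against the API reference.

// "us" pins inference to US infrastructure at a 1.1x pricing multiplier;
// omitting the field (or passing "global") uses standard routing and pricing.
// Top-level placement is an assumption, not a confirmed request shape.
const response = await anthropic.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 4096,
  inference_geo: "us",
  messages: [{ role: "user", content: "Summarize this patient intake form..." }]
});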

Safety Profile

  • Lowest misalignment score: Opus 4.6 scores approximately 1.8 out of 10 on overall misaligned behavior, the lowest of any Claude model tested — compared to Opus 4.5 (~1.9), Haiku 4.5 (~2.2), Sonnet 4.5 (~2.7), and Opus 4.1 (~4.3)
  • Lowest over-refusal rates: Among recent Claude models, Opus 4.6 is less likely to refuse legitimate requests while maintaining appropriate safety boundaries
  • Cybersecurity: Six new probes developed during the safety evaluation process, with top results in 38 of 40 blind-ranked investigations

For legal and compliance teams, the 90.2% BigLaw Bench score is particularly relevant. For a detailed look at Claude's legal capabilities, see our Claude legal plugin analysis.

Opus 4.6 vs the Competition

Here's how Opus 4.6 compares to its closest competitors across key criteria as of February 2026:

| Feature | Claude Opus 4.6 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|
| Context Window | 200K / 1M (beta) | 128K | 2M |
| Max Output | 128K tokens | 32K tokens | 8K tokens |
| Terminal-Bench 2.0 | 65.4% | 64.7% (Codex CLI) | 56.2% |
| SWE-bench Verified | 80.8% | 80.0% | 76.2% |
| GDPval-AA (Elo) | 1606 | 1462 | 1195 |
| HLE (with tools) | 53.1% | 50.0% (Pro) | 45.8% |
| GPQA Diamond | 91.3% | 93.2% (Pro) | 91.9% |
| Input Pricing | $5 / MTok | Varies by tier | Varies by tier |
| Output Pricing | $25 / MTok | Varies by tier | Varies by tier |
| Thinking Mode | Adaptive (4 levels) | Chain-of-thought | Deep Think mode |

Opus 4.6 leads in knowledge work (GDPval-AA), agentic search (BrowseComp), and legal reasoning (BigLaw). GPT-5.2 edges ahead on graduate-level reasoning (GPQA Diamond) and is within 0.7 points on Terminal-Bench 2.0. Gemini 3 Pro retains the largest native context window at 2M tokens and leads on visual reasoning (MMMU Pro). The right model depends on your specific use case, latency requirements, and budget.

What This Means for Your AI Strategy

Claude Opus 4.6 represents a meaningful step forward in reasoning depth, context handling, and agentic task execution. The adaptive thinking mode simplifies the developer experience, the compaction API opens the door to persistent agent conversations, and the benchmark results suggest Anthropic is pulling ahead in coding and knowledge work tasks.

The most immediate action item for existing Claude API users is the prefilling removal — check your codebase for assistant message prefills and migrate to structured outputs before switching to the new model ID. For teams evaluating Claude for the first time, the pricing parity with Opus 4.5 means there's no cost reason to start with the older model.

Ready to Build with Claude Opus 4.6?

From agentic workflows to enterprise AI integration, our team helps you leverage the latest AI capabilities for real business impact.

Free consultation
Expert guidance
Tailored solutions
