AI Development · 14 min read · Featured Guide

Claude Sonnet 4.5 Complete Guide: Code 2.0 + Agent SDK Features

Discover how Anthropic's September 29, 2025 triple launch—Claude Sonnet 4.5 (77.2% SWE-bench), Claude Code 2.0 (checkpoints + subagents), and the Claude Agent SDK—transforms AI-assisted development with state-of-the-art coding performance, autonomous workflows, and production-ready agent infrastructure.

Digital Applied Team
October 1, 2025
  • 77.2%: SWE-bench Verified
  • 30+ hrs: Autonomous Operation
  • 61.4%: OSWorld Performance
  • 400%: Average ROI

Key Takeaways

  • State-of-the-Art Coding Performance: Claude Sonnet 4.5 achieves 77.2% on SWE-bench Verified while maintaining the same $3/$15 pricing as Sonnet 4
  • Safe Experimentation with Checkpoints: Claude Code 2.0 introduces automatic checkpoints with instant rollback (double ESC or /rewind) for ambitious development tasks
  • Parallel Development Workflows: Subagents and hooks enable parallel development and automated quality checks throughout the process
  • Production-Ready Agent SDK: The Claude Agent SDK provides the same infrastructure powering Claude Code for building custom agents across any domain
  • 39% Performance Improvement: Context management features (memory tool + context editing) deliver measurable gains on long-running agentic tasks

Introduction: A Triple Launch Reshaping AI Development

On September 29, 2025, Anthropic unveiled what may be the most significant advancement in AI-assisted development to date—a coordinated release of three interconnected products that work together to transform how developers build software. Claude Sonnet 4.5 achieves state-of-the-art coding performance, Claude Code 2.0 introduces autonomous development features, and the Claude Agent SDK provides the infrastructure to build production-grade AI agents across any domain.

This is more than an incremental update: it changes how AI can participate in the software development lifecycle. Claude Sonnet 4.5 scores 77.2% on SWE-bench Verified (a benchmark built from real GitHub issues), can maintain focus for 30+ hours on complex tasks, and delivers these gains at the same price as Sonnet 4: $3 per million input tokens and $15 per million output tokens.

What makes this release particularly noteworthy is how these three components complement each other: Sonnet 4.5 provides the reasoning capabilities, Code 2.0 delivers the developer experience with checkpoints and subagents, and the Agent SDK offers the infrastructure to build custom solutions. Whether you're a solo developer exploring new frameworks, an engineering team tackling legacy modernization, or an enterprise building specialized AI agents, this guide will show you exactly how to leverage these tools in your workflow.

Claude Sonnet 4.5: Frontier Coding Intelligence

Performance Benchmarks: The Best Coding Model in the World

Claude Sonnet 4.5 didn't just improve on its predecessor—it leapfrogged the competition to become the industry leader in coding performance. Here's what the numbers reveal:

  • SWE-bench Verified: 77.2% – Anthropic reports this score using a simple scaffold with bash and file editing via string replacements, averaged over 10 trials with no test-time compute. With high-compute configurations (parallel attempts, rejection sampling, and internal scoring), the model reaches 82.0%. For context, this benchmark tests real-world software engineering abilities by requiring models to solve GitHub issues in actual repositories.
  • OSWorld Computer Use: 61.4% – A dramatic jump from Sonnet 4's 42.2% just four months prior. OSWorld tests AI models on real-world computer tasks, and Sonnet 4.5's performance demonstrates sophisticated understanding of operating system interactions, application navigation, and multi-step workflows.
  • Enhanced Reasoning & Math – Substantial improvements across AIME (mathematical problem-solving), Terminal-Bench, τ2-bench (airline and telecom agent policies), and MMMLU (multilingual understanding across 14 languages).
  • Finance Agent Benchmark – Leading performance on Vals AI's public leaderboard with extended thinking enabled, demonstrating superior capabilities for complex financial analysis involving risk assessment, structured products, and portfolio screening.

Perhaps most impressive: experts in finance, law, medicine, and STEM found Sonnet 4.5 shows dramatically better domain-specific knowledge and reasoning compared to older models, including Opus 4.1.

  • 77.2%: SWE-bench Score
  • 61.4%: OSWorld Tasks
  • 30+ hrs: Autonomous Focus
  • 82%: High-compute Config

What Leading Development Tools Are Saying

The impact of Sonnet 4.5 is best understood through the companies already building with it:

Cursor

"We're seeing state-of-the-art coding performance from Claude Sonnet 4.5, with significant improvements on longer horizon tasks. It reinforces why many developers using Cursor choose Claude for solving their most complex problems."

GitHub Copilot

"Claude Sonnet 4.5 amplifies GitHub Copilot's core strengths. Our initial evals show significant improvements in multi-step reasoning and code comprehension—enabling Copilot's agentic experiences to handle complex, codebase-spanning tasks better."

Replit

"Claude Sonnet 4.5's edit capabilities are exceptional—we went from 9% error rate on Sonnet 4 to 0% on our internal code editing benchmark. Higher tool success at lower cost is a major leap for agentic coding."

Devin

"For Devin, Claude Sonnet 4.5 increased planning performance by 18% and end-to-end eval scores by 12%—the biggest jump we've seen since the release of Claude Sonnet 3.6. It excels at testing its own code."

Canva

"Claude Sonnet 4.5 delivers impressive gains on our most complex, long-context tasks—from engineering in our codebase to in-product features and research. Helping us push what 240M+ users can design with Canva."

Key Capabilities: What Makes Sonnet 4.5 Different

Extended Autonomous Operation

Claude Sonnet 4.5 can work independently for hours while maintaining clarity and focus on incremental progress. Unlike previous models that might attempt everything at once and lose coherence, Sonnet 4.5 makes steady advances on a few tasks at a time. It provides fact-based progress updates that accurately reflect what has been accomplished—a crucial trait for trust in long-running autonomous tasks.

Multiple customers report observing the model maintaining focus for more than 30 hours on complex, multi-step tasks. This extended operational capability is what enables use cases like comprehensive codebase refactors, architectural migrations, and extensive security reviews that previously required constant human supervision.

Context Awareness

Sonnet 4.5 introduces built-in context awareness—it tracks token usage throughout conversations, receiving updates after each tool call. This self-awareness helps prevent premature task abandonment and enables more effective execution on long-running workflows. The model understands when it's approaching context limits and can make intelligent decisions about what information to preserve, what to summarize, and what to store externally using the memory tool.

Enhanced Tool Usage

The model demonstrates significantly improved tool coordination, effectively using parallel tool calls to fire off multiple speculative searches simultaneously during research or reading several files at once to build context faster. This parallel execution capability is particularly evident in complex debugging sessions where the model might simultaneously check logs, examine configuration files, review recent commits, and search documentation—all in a single turn.

Advanced Context Management

Sonnet 4.5 maintains exceptional state tracking in external files, preserving goal-orientation across sessions. Combined with more effective context window usage and new API features like the memory tool and context editing, the model optimally handles information across extended sessions to maintain coherence over time. This is what enables truly long-running agents that can pause, resume, and build on previous work days or weeks later.

Communication Style

The model features a refined communication approach that is concise, direct, and natural. It provides fact-based progress updates and may skip verbose summaries after tool calls to maintain workflow momentum (though this can be adjusted with prompting). This change reflects a more mature understanding of developer workflows—you want to know what happened and what's next, not read a paragraph explaining obvious actions.

Safety & Alignment: The Most Aligned Frontier Model

Claude Sonnet 4.5 isn't just more capable—it's also safer. Anthropic reports this as their most aligned frontier model yet, with large improvements across several alignment areas:

  • Reduced Misaligned Behaviors: Substantial decreases in concerning behaviors like sycophancy (telling you what you want to hear rather than the truth), deception, power-seeking, and the tendency to encourage delusional thinking.
  • Prompt Injection Defenses: Considerable progress on defending against prompt injection attacks for agentic and computer use capabilities—one of the most serious risks for users of these features.
  • ASL-3 Protections: Released under AI Safety Level 3 safeguards that match model capabilities with appropriate protections, including classifiers to detect potentially dangerous inputs/outputs related to CBRN (chemical, biological, radiological, nuclear) weapons.
  • Mechanistic Interpretability: For the first time, the system card includes tests using techniques from mechanistic interpretability—a research field focused on understanding what's happening inside AI models at a technical level.

The CBRN classifiers have been significantly improved, with false positives reduced by a factor of ten since originally introduced and by a factor of two since Opus 4 was released in May 2025. Anthropic's framework allows interrupted conversations to continue with Sonnet 4 (which poses lower CBRN risk) if classifiers inadvertently flag normal content.

Pricing & Availability

Claude Sonnet 4.5 maintains the same pricing as Sonnet 4, making this a pure capability upgrade with no cost increase:

  • Input tokens: $3 per million
  • Output tokens: $15 per million
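
To make the pricing concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration at the list prices quoted above; the function name and the example workload are hypothetical, and the estimate ignores anything such as caching or batch discounts.

```python
# Per-million-token list prices quoted in this guide (USD).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a given token volume."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return round(input_cost + output_cost, 6)


# Hypothetical workload: 2M input tokens and 400K output tokens.
print(estimate_cost(2_000_000, 400_000))  # 12.0
```

Because output tokens cost five times as much as input tokens, verbose responses tend to dominate the bill on generation-heavy workloads.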

The model is available everywhere today:

  • Claude API
    claude-sonnet-4-5-20250929
  • Amazon Bedrock
    anthropic.claude-sonnet-4-5-20250929-v1:0
  • Google Cloud Vertex AI
    claude-sonnet-4-5@20250929
  • Claude.ai: Available to all users
  • Claude Code: Default model

For optimal performance on coding tasks, Anthropic recommends enabling extended thinking, though developers should be aware this impacts prompt caching efficiency. Extended thinking is disabled by default but can dramatically improve results on complex coding work.
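
As a rough sketch of what enabling extended thinking looks like, the helper below builds the keyword arguments for a Messages API call. The helper name and the default budget are assumptions chosen for illustration; the `thinking` parameter shape follows Anthropic's Messages API, and the two checks reflect the documented constraints that the budget is at least 1,024 tokens and smaller than `max_tokens`.

```python
def build_thinking_request(prompt: str, budget_tokens: int = 8000,
                           max_tokens: int = 16000) -> dict:
    """Build Messages API keyword arguments with extended thinking enabled."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "model": "claude-sonnet-4-5-20250929",
        "max_tokens": max_tokens,
        # Extended thinking is off unless this block is present.
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }


request = build_thinking_request("Refactor the payment module for testability.")
print(request["thinking"])
# With the official SDK installed, the request could then be sent with:
#   anthropic.Anthropic().messages.create(**request)
```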

Claude Code 2.0: Autonomous Development Features

New Development Surfaces

Native VS Code Extension (Beta)

Claude Code 2.0 introduces a native VS Code extension that brings the AI directly into your IDE. The extension features a dedicated sidebar panel that displays Claude's changes in real-time through inline diffs, providing a richer, graphical experience compared to the terminal interface.

This IDE integration offers several advantages: you can see proposed changes highlighted within your familiar VS Code environment, accept or reject modifications file by file with visual clarity, and maintain your existing keyboard shortcuts and workflow patterns. The extension is available in beta from the VS Code Extension Marketplace and represents Anthropic's commitment to meeting developers where they already work.

Enhanced Terminal Interface (Version 2)

For developers who prefer the terminal (or work in environments where an IDE isn't practical), Claude Code's terminal interface has been completely refreshed. The updated interface features:

  • Improved Status Visibility: Clearer indicators of what Claude is doing, what tools it's using, and what state the conversation is in
  • Searchable Prompt History: Access previous prompts with Ctrl+r (similar to reverse-i-search in bash), allowing you to quickly reuse or edit earlier commands
  • Better UX for Long-Running Tasks: Visual improvements that make it easier to follow progress during extended autonomous operations

The terminal experience remains powerful for scripting, CI/CD integration, remote development, and scenarios where a lightweight interface is preferable.

Checkpoints: Your Safety Net for Ambitious Development

The most requested feature from Claude Code users is now here: checkpoints. This system automatically saves your code state before each change Claude makes, creating recovery points you can return to instantly.

How Checkpoints Work

  • Automatic Creation

    Claude creates a checkpoint before each modification to your codebase

  • Instant Rollback

    Tap ESC twice or use the /rewind command

  • Flexible Recovery

    Choose to restore the code, conversation, or both

  • Scope

    Applies to Claude's edits, not user edits or bash commands
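
The scope rule is easier to see in a toy model. The sketch below is not how Claude Code implements checkpoints; it is a hypothetical in-memory illustration in which only agent edits are recorded, so a rewind restores those files and leaves user edits alone.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyCheckpoints:
    """Toy workspace: agent edits are checkpointed, user edits are not."""
    files: dict = field(default_factory=dict)
    _undo_log: list = field(default_factory=list, repr=False)

    def agent_edit(self, path: str, content: str) -> None:
        # Record the previous content (None if the file is new) before editing.
        previous: Optional[str] = self.files.get(path)
        self._undo_log.append((path, previous))
        self.files[path] = content

    def user_edit(self, path: str, content: str) -> None:
        # User edits bypass the checkpoint log entirely.
        self.files[path] = content

    def rewind(self, steps: int = 1) -> None:
        if not 1 <= steps <= len(self._undo_log):
            raise ValueError("not enough checkpoints to rewind")
        for _ in range(steps):
            path, previous = self._undo_log.pop()
            if previous is None:
                self.files.pop(path, None)   # file did not exist before
            else:
                self.files[path] = previous  # restore earlier content


ws = ToyCheckpoints({"app.py": "v1"})
ws.agent_edit("app.py", "v2")
ws.user_edit("notes.md", "mine")
ws.agent_edit("util.py", "new helper")
ws.rewind(2)
print(ws.files)  # {'app.py': 'v1', 'notes.md': 'mine'}
```

The user's `notes.md` survives the rewind, which mirrors why checkpoints complement version control rather than replace it.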

Why Checkpoints Matter

Checkpoints fundamentally change the risk/reward calculation for ambitious tasks. Previously, delegating large-scale refactors or exploratory feature development to AI carried significant risk—if the approach didn't work out, you might spend hours manually reverting changes or resurrecting code from git history.

With checkpoints, you can pursue more experimental and wide-scale work knowing you can always return to a previous state instantly. This is especially powerful when combined with version control: checkpoints provide fine-grained undo for Claude's exploration, while git provides broader project history and collaboration capabilities.

Practical Checkpoint Workflows

Exploratory Refactoring

Ask Claude to try several different architectural approaches, reverting after each attempt to compare alternatives

Safe Debugging

Let Claude experiment with different fixes, rolling back failed attempts without polluting git history

Learning & Experimentation

Try new frameworks or patterns with a safety net, understanding you can always rewind if the approach doesn't suit your needs

Iterative Refinement

Make a series of progressively larger changes, reverting to intermediate states when a particular direction proves suboptimal

Subagents: Parallel Development Workflows

Subagents represent a fundamental shift in how AI can participate in complex development projects. Rather than a single agent trying to juggle multiple responsibilities, Claude Code 2.0 can delegate specialized tasks to subagents that work in parallel with the main orchestrator.

Why Subagents Exist

Subagents solve two critical challenges:

1. Parallelization

Spin up multiple subagents to work on different tasks simultaneously, dramatically reducing overall completion time for complex projects

2. Context Management

Subagents use isolated context windows and only send relevant information back to the orchestrator, preventing context pollution from exploratory work
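
Both ideas can be sketched with plain `asyncio`. This is a conceptual toy, not the Claude Code or Agent SDK API: `run_subagent` and `orchestrate` are hypothetical names, and the `asyncio.sleep(0)` call stands in for real model and tool calls. Each subagent keeps its scratch work private and returns only a short summary.

```python
import asyncio


async def run_subagent(name: str, task: str) -> dict:
    """Do exploratory work in a private context and return only a summary."""
    scratch = [f"{name}: step {i} of '{task}'" for i in range(3)]  # never returned
    await asyncio.sleep(0)  # stand-in for model and tool calls
    return {"agent": name, "summary": f"{task}: done after {len(scratch)} steps"}


async def orchestrate(tasks: dict) -> list:
    """Run all subagents concurrently and collect their summaries in order."""
    results = await asyncio.gather(
        *(run_subagent(name, task) for name, task in tasks.items())
    )
    return list(results)


reports = asyncio.run(orchestrate({
    "backend": "define API routes",
    "docs": "update README",
}))
for report in reports:
    print(report["agent"], "->", report["summary"])
```

The orchestrator only ever sees the two summary dictionaries, which is the context-isolation property described above.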

Subagent Use Cases

  • Frontend + Backend Development: The main agent builds a React interface while a backend subagent sets up API endpoints, database schemas, and authentication middleware
  • Research & Implementation: A search subagent explores documentation and Stack Overflow while the main agent implements based on findings, creating a continuous research → code → test loop
  • Multi-Service Architecture: Subagents work on different microservices simultaneously, each maintaining context about its specific service while coordinating through the main agent
  • Testing & Documentation: While you work with the main agent on feature development, subagents generate comprehensive test suites and update documentation in parallel

Subagent Best Practices

  • Narrow Mandates: Define subagents with specific, limited responsibilities rather than broad capabilities
  • Clear Communication: Establish protocols for how subagents report back to the orchestrator
  • Appropriate Permissions: Grant subagents only the tools and access they need for their specific tasks
  • Logging & Auditability: Track subagent actions for debugging and understanding complex multi-agent interactions

Traditional Solo Development

  • Sequential task completion
  • Manual rollback through git
  • Context switching overhead
  • Limited exploration capacity
  • Manual quality checks

Claude Code 2.0 Development

  • Parallel subagent workflows
  • Instant checkpoint rollback
  • Isolated context management
  • Safe ambitious experiments
  • Automated hook validation

Hooks: Automated Actions at Critical Points

Hooks in Claude Code 2.0 enable automated actions to trigger at specific points in your development workflow. This automation reduces manual intervention and helps maintain code quality consistently.

Common Hook Patterns

  • Post-Edit Testing: Automatically run relevant test suites after code changes, catching regressions immediately
  • Pre-Commit Linting: Run linters and formatters before committing, ensuring code style consistency
  • Security Scanning: Trigger dependency vulnerability checks when package files change
  • Build Verification: Compile the project after significant changes to catch build-breaking modifications early
  • Documentation Generation: Update API docs or README files when public interfaces change

Hook Configuration

Hooks are configured in ./.claude/settings.json and respond to tool events. For example:

  • Post-edit hook: npm test
  • Pre-commit hook: eslint --fix
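
For a sense of what the settings file might contain, the sketch below generates a hooks section from Python. The `PostToolUse` event name, the `Edit|Write` matcher, and the nested `type`/`command` shape are assumptions based on the Claude Code hooks format and should be checked against the current hooks documentation; `build_hooks_settings` is a hypothetical helper. Only the post-edit case is modeled here.

```python
import json


def build_hooks_settings(post_edit_command: str) -> dict:
    """Return a settings fragment that runs a shell command after file edits."""
    return {
        "hooks": {
            "PostToolUse": [
                {
                    # Assumed matcher: fire after the Edit or Write tools run.
                    "matcher": "Edit|Write",
                    "hooks": [{"type": "command", "command": post_edit_command}],
                }
            ]
        }
    }


settings = build_hooks_settings("npm test")
# The printed JSON is what would be saved as ./.claude/settings.json.
print(json.dumps(settings, indent=2))
```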

Background Tasks: Continuous Operations

Background tasks allow Claude Code to run long-running processes like development servers without blocking the main workflow. This seemingly simple capability has profound implications for developer productivity:

  • Dev Server Management: Start npm run dev, python manage.py runserver, or similar commands and let them run while Claude continues working on other tasks
  • Build Processes: Kick off long compilation or bundling operations in the background, checking results when they're done
  • Test Suites: Run comprehensive test batteries that take minutes or hours without waiting idly
  • Database Migrations: Apply schema changes or seed data while development continues on other components

The combination of background tasks, hooks, and subagents creates a development environment where multiple aspects of a project advance simultaneously—much like how experienced developers mentally juggle various concerns while writing code.

Claude Agent SDK: Build Production Agents

From Claude Code SDK to Claude Agent SDK

The infrastructure powering Claude Code—previously called the Claude Code SDK—has been renamed to the Claude Agent SDK to reflect its broader applicability. This isn't just a branding change; it represents Anthropic's recognition that the same tools enabling world-class coding assistance are equally valuable for countless other agent use cases.

At Anthropic, teams use Claude Code (and by extension, the Agent SDK) for deep research, video creation, note-taking, financial analysis, and many other non-coding applications. The SDK gives developers access to the same core tools, context management systems, and permission frameworks that power Claude Code itself—now yours to build with.

Why Use the Claude Agent SDK?

The Agent SDK provides production-ready infrastructure for building autonomous agents:

Context Management

Automatic compaction to ensure agents don't run out of context during long-running tasks

Rich Tool Ecosystem

File operations, code execution, web search, and extensibility through MCP (Model Context Protocol)

Advanced Permissions

Fine-grained control over what agents can access and modify

Production Essentials

Built-in error handling, session management, and monitoring capabilities

Optimized Claude Integration

Automatic prompt caching and performance optimizations

The Agent Loop Pattern

The foundational design pattern for agents built with the SDK follows a simple loop: gather context → take action → verify work → repeat. Understanding this loop is key to building effective agents.

1. Gather Context

Agents need more than just a prompt—they need mechanisms to fetch and update their own context dynamically:

  • Agentic Search & File System: The file system represents information that could be pulled into the model's context. When Claude encounters large files, it decides how to load them using tools like grep and tail. The folder and file structure becomes a form of context engineering.
  • Semantic Search: Semantic search is faster than agentic search but less accurate and less transparent. It involves chunking the relevant context, embedding the chunks as vectors, and retrieving matches by querying those vectors. Anthropic recommends starting with agentic search and only adding semantic search if you need faster results.
  • Subagents: Enable parallelization and context isolation. Spin up multiple subagents to work simultaneously, each with isolated context windows, returning only relevant information to the orchestrator.
  • Compaction: When agents run for long periods, context maintenance becomes critical. The SDK's compact feature automatically summarizes previous messages when approaching context limits.

2. Take Action

Once context is gathered, agents need flexible ways to execute:

  • Tools: The primary building blocks of execution. Tools are prominent in Claude's context window, making them the primary actions the model considers. Design tools to maximize context efficiency—see Anthropic's post on writing effective tools for agents.
  • Bash & Scripts: Bash is a general-purpose tool that lets the agent work flexibly on a computer. An email agent might use bash to download PDFs, convert them to text, and search across attachments.
  • Code Generation: Code is precise, composable, and infinitely reusable—an ideal output for agents performing complex operations reliably. Anthropic's file creation feature in Claude.ai relies entirely on code generation for Excel, PowerPoint, and Word documents.
  • MCPs (Model Context Protocol): Standardized integrations to external services, handling authentication and API calls automatically. Connect to Slack, GitHub, Google Drive, Asana, and more without writing custom integration code.

3. Verify Work

Agents that can check and improve their own output are fundamentally more reliable:

  • Defining Rules: The best feedback comes from clearly defined rules, with output that explains which rules failed and why. Code linting is an excellent example, and TypeScript adds a further layer of feedback beyond plain JavaScript through type checking.
  • Visual Feedback: For visual tasks like UI generation or testing, screenshots or renders provide helpful verification. The model checks whether layout, styling, content hierarchy, and responsiveness match requirements.
  • LLM as a Judge: Have another language model judge the output against fuzzy rules. This approach is not very robust and adds latency, but it can boost performance in critical applications.
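
The three phases above can be wired together in a few lines. The sketch below is a minimal illustration of the gather → act → verify loop, not the Agent SDK's implementation: `run_agent_loop` and the three toy callables are hypothetical, and a real agent would call a model and tools where the toy functions return canned values.

```python
from typing import Callable


def run_agent_loop(
    gather: Callable[[list], dict],
    act: Callable[[dict], str],
    verify: Callable[[str], list],
    max_iterations: int = 5,
) -> dict:
    """Repeat gather -> act -> verify until verification reports no problems."""
    feedback: list = []
    for attempt in range(1, max_iterations + 1):
        context = gather(feedback)   # 1. gather context (including past feedback)
        result = act(context)        # 2. take action
        problems = verify(result)    # 3. verify work
        if not problems:
            return {"result": result, "attempts": attempt}
        feedback = problems          # feed rule failures into the next round
    raise RuntimeError(f"no verified result after {max_iterations} iterations")


# Toy stand-ins for the three phases.
def gather(feedback: list) -> dict:
    return {"task": "write an add function", "feedback": feedback}


def act(context: dict) -> str:
    needs_doc = "missing docstring" in context["feedback"]
    docstring = '    """Add two numbers."""\n' if needs_doc else ""
    return f"def add(a, b):\n{docstring}    return a + b\n"


def verify(result: str) -> list:
    # A rule-based check that names the rule that failed.
    return [] if '"""' in result else ["missing docstring"]


outcome = run_agent_loop(gather, act, verify)
print(outcome["attempts"])  # 2
```

The first draft fails the docstring rule, the failure is passed back as context, and the second attempt passes, which is the self-correction behavior the verify step is meant to enable.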

Agent Types You Can Build

The Agent SDK enables a wide variety of autonomous agent applications:

Finance Agents

Understand portfolios and goals, evaluate investments by accessing external APIs, store data, and run code to make calculations

Personal Assistant Agents

Book travel, manage calendars, schedule appointments, and put together briefs by connecting to internal data sources and tracking context across applications

Customer Support Agents

Handle high-ambiguity user requests like service tickets by collecting/reviewing user data, connecting to external APIs, messaging users back, and escalating to humans when needed

Deep Research Agents

Conduct comprehensive research across large document collections by searching file systems and by analyzing and synthesizing information from multiple sources

Code Debugging Agents

Analyze stack traces, reproduce issues, propose fixes, and verify solutions across complex codebases

Security Review Agents

Scan for vulnerabilities, analyze dependencies, check compliance with security policies, and generate remediation guidance

New API Features for Agent Builders

Memory Tool (Beta)

The memory tool enables Claude to store and retrieve information outside the context window through a file-based system you control. To enable it, send the context-management-2025-06-27 beta header and add the following tool definition to your API calls:

Memory Tool Configuration
tools=[
  {
    "type": "memory_20250818",
    "name": "memory"
  }
]

This allows agents to:

  • ✓ Build knowledge bases over time
  • ✓ Maintain project state across sessions
  • ✓ Preserve effectively unlimited context through file-based storage

Context Editing

Context editing automatically removes older tool calls and results when approaching token limits:

Context Editing Configuration
Python SDK example
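import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

response = client.beta.messages.create(
    betas=["context-management-2025-06-27"],  # beta header for context management
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    messages=[{"role": "user", "content": "..."}],
    context_management={
        "edits": [
            {
                "type": "clear_tool_uses_20250919",
                # Start clearing once the prompt exceeds this many input tokens.
                "trigger": {"type": "input_tokens", "value": 500},
                # Always keep the most recent tool use/result pairs.
                "keep": {"type": "tool_uses", "value": 2},
                # Clear at least this many tokens each time the edit runs.
                "clear_at_least": {"type": "input_tokens", "value": 100},
            }
        ]
    },
    tools=[...],  # replace with your tool definitions
)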

This feature enables agents to run longer without manual intervention by automatically managing context in long-running agent sessions.

Enhanced Stop Reasons

Sonnet 4.5 introduces a new model_context_window_exceeded stop reason that explicitly indicates when generation stopped due to hitting the context window limit (rather than the requested max_tokens limit):

{
  "stop_reason": "model_context_window_exceeded",
  "usage": {
    "input_tokens": 150000,
    "output_tokens": 49950
  }
}

This makes it easier to handle context window limits in application logic.
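
A minimal sketch of that application logic might look like the following. The `stop_reason` values are the ones returned by the Messages API; the action names returned by `next_step` are hypothetical labels for whatever your application does next.

```python
def next_step(response: dict) -> str:
    """Map a response's stop_reason to a (hypothetical) follow-up action."""
    reason = response.get("stop_reason")
    if reason == "model_context_window_exceeded":
        # The context window is full: trim or summarize history before retrying.
        return "compact_and_retry"
    if reason == "max_tokens":
        # Only the requested output cap was hit: ask the model to continue.
        return "continue_generation"
    if reason == "tool_use":
        return "run_tools"
    return "done"


example = {
    "stop_reason": "model_context_window_exceeded",
    "usage": {"input_tokens": 150000, "output_tokens": 49950},
}
print(next_step(example))  # compact_and_retry
```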

Token Count Optimizations

Claude Sonnet 4.5 includes automatic optimizations to improve model performance. These optimizations may add small amounts of tokens to requests, but you are not billed for these system-added tokens.

SDK Availability

The Claude Agent SDK is available in two languages:

TypeScript SDK
npm install @anthropic-ai/claude-agent-sdk
Python SDK
pip install claude-agent-sdk

Both provide access to the same features: context management, tool ecosystem, permissions, subagents, hooks, and all Claude Code capabilities through programmatic APIs.

Context Management: The Secret Sauce

The Problem: Context Windows Have Limits

As production agents handle more complex tasks and generate more tool results, they often exhaust their effective context windows. This leaves developers stuck choosing between cutting agent transcripts (losing important conversation flow) or degrading performance (as the model struggles with irrelevant information).

The challenge is particularly acute for long-running agents: researching across hundreds of documents, processing entire codebases, or maintaining extensive tool interaction histories. Even with large context windows (Claude supports up to 200K tokens), production workflows can exceed these limits—and performance often degrades well before hard limits are reached as stale content accumulates.

The Solution: Intelligent Context Management

Anthropic's approach to context management combines three complementary capabilities:

1. Context Editing

Context editing automatically clears stale tool calls and results from within the context window when approaching token limits. As your agent executes tasks and accumulates tool results, context editing removes old content while preserving conversation flow.

Think of context editing as smart garbage collection: it identifies which tool results are no longer relevant to the current task, removes them, but keeps the conversation coherent by maintaining references and summaries. This effectively extends how long agents can run without manual intervention and increases model performance by focusing only on relevant context.
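
The core idea can be illustrated with a toy function. This is not the API's implementation, which runs server-side; it is a simplified sketch that assumes a flat transcript where tool results carry a hypothetical `"tool"` role, and it mimics a `keep` setting by blanking all but the most recent tool results.

```python
PLACEHOLDER = "[tool result cleared]"


def clear_old_tool_results(messages: list, keep: int) -> list:
    """Replace all but the last `keep` tool results with a short placeholder."""
    tool_positions = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    # Everything except the most recent `keep` tool results is considered stale.
    stale = set(tool_positions[:-keep]) if keep > 0 else set(tool_positions)
    return [
        {**m, "content": PLACEHOLDER} if i in stale else m
        for i, m in enumerate(messages)
    ]


transcript = [
    {"role": "user", "content": "Find the failing test."},
    {"role": "tool", "content": "ls output ..."},
    {"role": "tool", "content": "pytest output ..."},
    {"role": "tool", "content": "git log output ..."},
    {"role": "assistant", "content": "test_auth fails after the last commit."},
]
trimmed = clear_old_tool_results(transcript, keep=2)
print([m["content"] for m in trimmed if m["role"] == "tool"])
# ['[tool result cleared]', 'pytest output ...', 'git log output ...']
```

User and assistant turns are untouched, so the conversation flow is preserved while the bulky, stale tool output is dropped.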

2. Memory Tool

The memory tool enables Claude to store and consult information outside the context window through a file-based system stored in your infrastructure. Claude can create, read, update, and delete files in a dedicated memory directory that persists across conversations.

This allows agents to build up knowledge bases over time, maintain project state across sessions, and reference previous learnings without keeping everything in context. For example, a debugging agent might store architectural insights and known issues in memory, consulting them days later when working on related problems.
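
Because the files live in your infrastructure, your application is responsible for the storage side. The sketch below shows one way such a store could look; `MemoryStore` and its method names are hypothetical and do not mirror the memory tool's actual command set. The one design point worth copying is the path check, which keeps every read and write inside the dedicated memory directory.

```python
import tempfile
from pathlib import Path


class MemoryStore:
    """File-backed notes confined to one directory (hypothetical helper)."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, name: str) -> Path:
        # Reject names that would escape the memory directory, e.g. "../secrets".
        path = (self.root / name).resolve()
        if path == self.root or not path.is_relative_to(self.root):
            raise ValueError(f"path outside memory directory: {name}")
        return path

    def write(self, name: str, text: str) -> None:
        path = self._resolve(name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")

    def read(self, name: str) -> str:
        return self._resolve(name).read_text(encoding="utf-8")

    def delete(self, name: str) -> None:
        self._resolve(name).unlink()

    def names(self) -> list:
        return sorted(
            p.relative_to(self.root).as_posix()
            for p in self.root.rglob("*") if p.is_file()
        )


with tempfile.TemporaryDirectory() as tmp:
    store = MemoryStore(Path(tmp) / "memories")
    store.write("project/known_issues.md", "Auth tests are flaky on CI.")
    print(store.names())
    print(store.read("project/known_issues.md"))
```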

3. Context Awareness (Sonnet 4.5)

Claude Sonnet 4.5 introduces built-in context awareness—it tracks token usage throughout conversations, receiving updates after each tool call. This self-awareness helps the model make intelligent decisions about what to keep in context, what to summarize, and what to store externally using the memory tool.

Performance Impact: The Numbers Don't Lie

Anthropic tested context management on an internal evaluation set for agentic search, measuring performance on complex, multi-step tasks:

  • Memory + Context Editing: 39% improvement over baseline
  • Context Editing Alone: 29% improvement over baseline
  • Token Reduction: 84% in a 100-turn web search evaluation

Perhaps most importantly, context editing enabled agents to complete workflows that would otherwise fail due to context exhaustion—transforming tasks from impossible to routine.

Real-World Context Management Use Cases

  • Coding: Context editing clears old file reads and test results while memory preserves debugging insights and architectural decisions, enabling agents to work on large codebases without losing progress
  • Research: Memory stores key findings while context editing removes old search results, building knowledge bases that improve agent performance over time
  • Data Processing: Agents store intermediate results in memory while context editing clears raw data, handling workflows that would otherwise exceed token limits
  • Customer Support: Memory maintains customer history and preferences across sessions while context editing removes resolved ticket details, enabling coherent long-term relationships

Getting Started: Your Path to AI-Assisted Development

Getting Started with Claude Sonnet 4.5

Upgrading to Sonnet 4.5 is straightforward for existing Claude users:

1. Update Model Identifier

  • Claude API: claude-sonnet-4-5-20250929
  • Amazon Bedrock: anthropic.claude-sonnet-4-5-20250929-v1:0
  • Google Vertex AI: claude-sonnet-4-5@20250929

2. API Compatibility

Existing API calls will work with one exception: Sonnet 4.5 does not allow both temperature and top_p parameters simultaneously. Use only one.

3. Enable Extended Thinking

For complex coding tasks, enable extended thinking for optimal performance (though be aware this impacts prompt caching efficiency)

4. Consider New Features

  • Add the memory tool for long-running agents (requires beta header: context-management-2025-06-27)
  • Configure context editing for better context management
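
The temperature/top_p restriction from the API compatibility step is easy to guard against on the client side. The helper below is a hypothetical sketch that fails fast before a request is sent, rather than waiting for the API to reject it.

```python
from typing import Optional


def sampling_params(temperature: Optional[float] = None,
                    top_p: Optional[float] = None) -> dict:
    """Return sampling kwargs, refusing to set temperature and top_p together."""
    if temperature is not None and top_p is not None:
        raise ValueError("Sonnet 4.5 accepts temperature or top_p, not both")
    params = {}
    if temperature is not None:
        params["temperature"] = temperature
    if top_p is not None:
        params["top_p"] = top_p
    return params


print(sampling_params(temperature=0.2))  # {'temperature': 0.2}
```

The returned dictionary can be unpacked into an existing request, so code that previously passed both parameters only needs to drop one of them.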

Getting Started with Claude Code 2.0

For Individual Developers

  • VS Code Extension: Install from the VS Code Extension Marketplace (search for "Claude Code" or visit the marketplace page)
  • Terminal Installation: Update your local Claude Code installation to get Terminal v2 improvements and checkpoints
  • Model Selection: Use /model command to switch between Sonnet 4.5 (default), Opus 4.1, and Haiku 3.5
  • Checkpoint Usage: Double-tap ESC or use /rewind to roll back changes

Pricing Tiers

Pro
$17-20/month

Included with your Pro plan, perfect for short coding sprints

Max 5x
$100/month

Great value for everyday use with access to both Sonnet 4.5 and Opus 4.1

Max 20x
$200/month

For power users with the most access to Opus 4.1

Team
$150/month per person (5+ members)

Includes self-serve seat management and additional usage at standard API rates

Enterprise
Contact sales

Everything in Team plus advanced security, data, and user management

Claude API
Pay-as-you-go

Standard pricing, deploy to unlimited developers with no per-seat fee

Getting Started with the Claude Agent SDK

Installation

Choose your language and install:

# TypeScript/JavaScript
npm install @anthropic-ai/claude-agent-sdk

# Python
pip install claude-agent-sdk

Authentication

Set up authentication via:

Claude API

Retrieve an API key from the Claude Console and set the environment variable:

export ANTHROPIC_API_KEY=your_api_key

Amazon Bedrock

export CLAUDE_CODE_USE_BEDROCK=1

Then configure AWS credentials.

Google Vertex AI

export CLAUDE_CODE_USE_VERTEX=1

Then configure Google Cloud credentials.
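
Before starting an agent, it can help to confirm which of these options the current shell actually selects. The following is a minimal sketch using only the three environment variable names shown above. The function name `detect_auth_provider` is hypothetical, and the precedence order (Bedrock, then Vertex AI, then API key) is an assumption made for the example, not documented SDK behavior.

```python
import os
from typing import Mapping, Optional


def detect_auth_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return which provider the environment variables select.

    Hypothetical helper: the precedence order used here is an assumption.
    """
    env = os.environ if env is None else env
    if env.get("CLAUDE_CODE_USE_BEDROCK") == "1":
        return "bedrock"
    if env.get("CLAUDE_CODE_USE_VERTEX") == "1":
        return "vertex"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    raise RuntimeError(
        "No credentials configured: set ANTHROPIC_API_KEY or a provider flag"
    )
```

Passing a plain dictionary instead of reading `os.environ` keeps the check easy to test without touching real credentials.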

First Steps

  1. Review the Agent SDK overview documentation
  2. Explore examples in the TypeScript GitHub repo or Python GitHub repo
  3. Start with a simple agent following the gather context → take action → verify work loop
  4. Add tools incrementally as you identify needs
  5. Implement permissions, logging, and error handling for production use
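
The gather context → take action → verify work loop from step 3 can be pictured as ordinary control flow. The sketch below is schematic and does not use the Agent SDK: `run_agent_loop` and its three callables are hypothetical stand-ins for work that, in a real agent, the model and its tools would perform.

```python
from typing import Callable


def run_agent_loop(
    gather_context: Callable[[str], str],
    take_action: Callable[[str, str], str],
    verify_work: Callable[[str], bool],
    task: str,
    max_iterations: int = 3,
) -> tuple[str, int]:
    """Repeat gather -> act -> verify until verification passes.

    Returns the accepted result and the number of iterations used.
    """
    feedback = ""
    for iteration in range(1, max_iterations + 1):
        # Failed attempts are fed back in so the next pass can use them.
        context = gather_context(task + feedback)
        result = take_action(task, context)
        if verify_work(result):
            return result, iteration
        feedback = f"\nPrevious attempt failed verification: {result!r}"
    raise RuntimeError(f"No verified result after {max_iterations} iterations")
```

Bounding the loop with `max_iterations` is one simple way to keep a misbehaving agent from running indefinitely, which matters once permissions and error handling are added for production use.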

Best Practices for Success

Maximizing Sonnet 4.5 Performance

  • Enable extended thinking for complex coding tasks requiring deep reasoning
  • Use parallel tool calls to speed up information gathering and context building
  • Leverage context awareness by letting the model manage its own context intelligently
  • Understand the new communication style (concise and direct) and adjust prompts if you need more explanation

Claude Code Workflow Tips

  • Use checkpoints before risky or experimental changes—they're your safety net
  • Combine checkpoints with version control for comprehensive history management
  • Design subagents with narrow, well-defined mandates rather than broad capabilities
  • Set up hooks for automated testing, linting, and quality checks
  • Utilize background tasks for dev servers and long-running processes
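
As an illustration of the hooks tip, a hook can be a small script that inspects what the tool just did and runs a linter on the affected file. The sketch below makes several assumptions that should be checked against the Claude Code hooks documentation: that the hook receives a JSON payload on stdin, that the payload carries `tool_name` and `tool_input.file_path` fields, that the file-editing tools are named `Edit` and `Write`, and that exit code 2 signals a blocking failure. The choice of `ruff check` as the linter is only an example.

```python
import json
import subprocess
import sys
from typing import Any, Optional

LINTED_TOOLS = {"Edit", "Write"}  # assumed names of file-modifying tools


def lint_command_for(payload: dict[str, Any]) -> Optional[list[str]]:
    """Return the lint command to run for a hook payload, or None to skip."""
    if payload.get("tool_name") not in LINTED_TOOLS:
        return None
    file_path = payload.get("tool_input", {}).get("file_path", "")
    if not file_path.endswith(".py"):
        return None
    return ["ruff", "check", file_path]


def main() -> int:
    """Read the hook payload from stdin and run the linter if one applies."""
    command = lint_command_for(json.load(sys.stdin))
    if command is None:
        return 0
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        # Print lint output to stderr so the failure is visible to the caller.
        print(completed.stdout + completed.stderr, file=sys.stderr)
        return 2  # assumed to mean "blocking failure" in this sketch
    return 0
```

To use this as a standalone script, end the file with `sys.exit(main())`. Keeping the decision logic in `lint_command_for` separates it from the I/O, so the rule for which files get linted can be unit tested on its own.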

Agent Development Principles

  • Start with agentic search over file systems; add semantic search only if you need speed improvements
  • Define clear, narrow tool responsibilities to maximize context efficiency
  • Use rules-based feedback (like linting) when possible—it's more reliable than LLM-as-judge
  • Build representative test sets based on actual usage patterns
  • Log permissions and actions for auditability and debugging
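
To make the rules-based feedback principle concrete, here is a minimal sketch of a deterministic checker an agent could run on Python code it has generated. It uses only the standard library `ast` module. The function name `rule_based_feedback` and the two rules it applies (no bare `except`, docstrings on public functions) are hypothetical choices for illustration, not rules prescribed by the Agent SDK.

```python
import ast


def rule_based_feedback(source: str) -> list[str]:
    """Return deterministic findings for a piece of Python source code."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"line {exc.lineno}: syntax error: {exc.msg}"]

    findings = []
    for node in ast.walk(tree):
        # A bare `except:` catches every exception, including KeyboardInterrupt.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append(f"line {node.lineno}: bare except clause")
        # Public functions without a docstring are flagged as a style rule.
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            if ast.get_docstring(node) is None:
                findings.append(
                    f"line {node.lineno}: function '{node.name}' lacks a docstring"
                )
    return findings
```

Because the same input always yields the same findings, the output can be fed straight back to the agent as a verification signal, and an empty list is an unambiguous pass.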

Real-World Use Cases: How Companies Are Using These Tools

Enterprise Development Tools

Leading development platforms have integrated Sonnet 4.5 with impressive results:

  • Cursor reports state-of-the-art coding performance with significant improvements on longer horizon tasks, reinforcing why many developers choose Claude for their most complex problems
  • GitHub Copilot uses Sonnet 4.5 for multi-step reasoning and code comprehension, enabling agentic experiences that handle complex, codebase-spanning tasks better
  • Replit achieved a remarkable improvement: from 9% error rate on Sonnet 4 to 0% on their internal code editing benchmark with Sonnet 4.5
  • Devin saw an 18% increase in planning performance and 12% improvement in end-to-end eval scores—their biggest jump since Sonnet 3.6

Production Applications

  • Canva (240M+ users): Uses Sonnet 4.5 for complex, long-context tasks from engineering in their codebase to in-product features, calling it "noticeably more intelligent and a big leap forward"
  • Figma: Claude Sonnet 4.5 has noticeably improved Figma Make in early testing, making it easier to prompt and iterate with more functional prototypes
  • Harvey (legal AI): Uses Sonnet 4.5 for the most complex litigation tasks, like analyzing full briefing cycles and conducting research to synthesize excellent first drafts
  • HackerOne (security): Reduced average vulnerability intake time for its Hai security agents by 44% while improving accuracy by 25%

Agent SDK Applications

  • Financial Compliance Agents: Companies like Ramp build agents that analyze portfolios, evaluate investments, and perform complex calculations
  • Cybersecurity Agents: CrowdStrike uses Claude for red teaming, generating creative attack scenarios to strengthen defenses
  • Code Modernization: Enterprises use the Agent SDK to navigate legacy codebases, understand dependencies, and propose incremental refactors
  • Research Automation: Teams build deep research agents that analyze hundreds of documents, cross-reference information, and generate comprehensive reports

Conclusion: The Future of AI-Assisted Development

The September 29, 2025 launch of Claude Sonnet 4.5, Claude Code 2.0, and the Claude Agent SDK represents more than a product update—it's a fundamental shift in how AI participates in software development and knowledge work more broadly.

With Sonnet 4.5 achieving state-of-the-art coding performance (77.2% on SWE-bench Verified), maintaining focus for 30+ hours on complex tasks, and delivering these capabilities at the same pricing as Sonnet 4 ($3 per million input tokens and $15 per million output tokens), the barrier to accessing frontier AI for development has never been lower. Claude Code 2.0's checkpoints, subagents, hooks, and background tasks turn experimental features into production-ready workflows. And the Claude Agent SDK makes the infrastructure behind these capabilities available to developers building custom agents for any domain.

The synergy between these three launches is what makes them truly powerful: Sonnet 4.5 provides the intelligence, Code 2.0 delivers the developer experience, and the Agent SDK offers the extensibility. Together, they enable development workflows that were simply not possible before—from safe, ambitious refactors to autonomous multi-day coding sessions to specialized agents that understand your specific business domain.

Perhaps most exciting is what these tools suggest about the future. As context management improves, as models become more aligned and capable, and as the infrastructure for building agents matures, we're moving toward a world where AI truly collaborates with developers rather than just assisting them. The checkpoints system acknowledges that exploration and experimentation are core to development. The subagents architecture recognizes that complex problems require parallel, specialized approaches. The memory tool and context editing understand that valuable work extends beyond any single session.

Whether you're a solo developer looking to accelerate your learning, an engineering team seeking to modernize legacy systems, or an enterprise building AI-powered workflows, these tools offer a practical path forward. The question is no longer whether AI can assist development—it's how quickly you can integrate these capabilities into your workflow to stay competitive.

Frequently Asked Questions