Claude Sonnet 4.5 Complete Guide: Code 2.0 + Agent SDK Features
Discover how Anthropic's September 29, 2025 triple launch—Claude Sonnet 4.5 (77.2% SWE-bench), Claude Code 2.0 (checkpoints + subagents), and the Claude Agent SDK—transforms AI-assisted development with state-of-the-art coding performance, autonomous workflows, and production-ready agent infrastructure.
Key Takeaways
- State-of-the-Art Coding Performance: Claude Sonnet 4.5 achieves 77.2% on SWE-bench Verified while maintaining the same $3/$15 pricing as Sonnet 4
- Safe Experimentation with Checkpoints: Claude Code 2.0 introduces automatic checkpoints with instant rollback (double ESC or /rewind) for ambitious development tasks
- Parallel Development Workflows: Subagents and hooks enable parallel development and automated quality checks throughout the process
- Production-Ready Agent SDK: The Claude Agent SDK provides the same infrastructure powering Claude Code for building custom agents across any domain
- 39% Performance Improvement: Context management features (memory tool + context editing) deliver measurable gains on long-running agentic tasks
Introduction: A Triple Launch Reshaping AI Development
On September 29, 2025, Anthropic unveiled what may be the most significant advancement in AI-assisted development to date—a coordinated release of three interconnected products that work together to transform how developers build software. Claude Sonnet 4.5 achieves state-of-the-art coding performance, Claude Code 2.0 introduces autonomous development features, and the Claude Agent SDK provides the infrastructure to build production-grade AI agents across any domain.
This isn't just an incremental update; it's a fundamental shift in how AI can participate in the software development lifecycle. With Claude Sonnet 4.5 scoring 77.2% on SWE-bench Verified (the industry's most rigorous coding benchmark), maintaining focus for 30+ hours on complex tasks, and delivering these gains at the same $3/$15 per-million-token pricing as Sonnet 4, developers get frontier-level capability with no price increase.
What makes this release particularly noteworthy is how these three components complement each other: Sonnet 4.5 provides the reasoning capabilities, Code 2.0 delivers the developer experience with checkpoints and subagents, and the Agent SDK offers the infrastructure to build custom solutions. Whether you're a solo developer exploring new frameworks, an engineering team tackling legacy modernization, or an enterprise building specialized AI agents, this guide will show you exactly how to leverage these tools in your workflow.
Claude Sonnet 4.5: Frontier Coding Intelligence
Performance Benchmarks: The Best Coding Model in the World
Claude Sonnet 4.5 didn't just improve on its predecessor—it leapfrogged the competition to become the industry leader in coding performance. Here's what the numbers reveal:
- SWE-bench Verified: 77.2% – Anthropic reports this score using a simple scaffold with bash and file editing via string replacements, averaged over 10 trials with no test-time compute. With high-compute configurations (parallel attempts, rejection sampling, and internal scoring), the model reaches 82.0%. For context, this benchmark tests real-world software engineering abilities by requiring models to solve GitHub issues in actual repositories.
- OSWorld Computer Use: 61.4% – A dramatic jump from Sonnet 4's 42.2% just four months prior. OSWorld tests AI models on real-world computer tasks, and Sonnet 4.5's performance demonstrates sophisticated understanding of operating system interactions, application navigation, and multi-step workflows.
- Enhanced Reasoning & Math – Substantial improvements across AIME (mathematical problem-solving), Terminal-Bench, τ2-bench (airline and telecom agent policies), and MMMLU (multilingual understanding across 14 languages).
- Finance Agent Benchmark – Leading performance on Vals AI's public leaderboard with extended thinking enabled, demonstrating superior capabilities for complex financial analysis involving risk assessment, structured products, and portfolio screening.
Perhaps most impressive: experts in finance, law, medicine, and STEM found Sonnet 4.5 shows dramatically better domain-specific knowledge and reasoning compared to older models, including Opus 4.1.
- 77.2%: SWE-bench Score
- 61.4%: OSWorld Tasks
- 30+ hrs: Autonomous Focus
- 82%: High-compute Config
What Leading Development Tools Are Saying
The impact of Sonnet 4.5 is best understood through the companies already building with it:
Cursor
"We're seeing state-of-the-art coding performance from Claude Sonnet 4.5, with significant improvements on longer horizon tasks. It reinforces why many developers using Cursor choose Claude for solving their most complex problems."
GitHub Copilot
"Claude Sonnet 4.5 amplifies GitHub Copilot's core strengths. Our initial evals show significant improvements in multi-step reasoning and code comprehension—enabling Copilot's agentic experiences to handle complex, codebase-spanning tasks better."
Replit
"Claude Sonnet 4.5's edit capabilities are exceptional—we went from 9% error rate on Sonnet 4 to 0% on our internal code editing benchmark. Higher tool success at lower cost is a major leap for agentic coding."
Devin
"For Devin, Claude Sonnet 4.5 increased planning performance by 18% and end-to-end eval scores by 12%—the biggest jump we've seen since the release of Claude Sonnet 3.6. It excels at testing its own code."
Canva
"Claude Sonnet 4.5 delivers impressive gains on our most complex, long-context tasks—from engineering in our codebase to in-product features and research. Helping us push what 240M+ users can design with Canva."
Key Capabilities: What Makes Sonnet 4.5 Different
Extended Autonomous Operation
Claude Sonnet 4.5 can work independently for hours while maintaining clarity and focus on incremental progress. Unlike previous models that might attempt everything at once and lose coherence, Sonnet 4.5 makes steady advances on a few tasks at a time. It provides fact-based progress updates that accurately reflect what has been accomplished—a crucial trait for trust in long-running autonomous tasks.
Multiple customers report observing the model maintaining focus for more than 30 hours on complex, multi-step tasks. This extended operational capability is what enables use cases like comprehensive codebase refactors, architectural migrations, and extensive security reviews that previously required constant human supervision.
Context Awareness
Sonnet 4.5 introduces built-in context awareness—it tracks token usage throughout conversations, receiving updates after each tool call. This self-awareness helps prevent premature task abandonment and enables more effective execution on long-running workflows. The model understands when it's approaching context limits and can make intelligent decisions about what information to preserve, what to summarize, and what to store externally using the memory tool.
Enhanced Tool Usage
The model demonstrates significantly improved tool coordination, effectively using parallel tool calls to fire off multiple speculative searches simultaneously during research or reading several files at once to build context faster. This parallel execution capability is particularly evident in complex debugging sessions where the model might simultaneously check logs, examine configuration files, review recent commits, and search documentation—all in a single turn.
Advanced Context Management
Sonnet 4.5 maintains exceptional state tracking in external files, preserving goal-orientation across sessions. Combined with more effective context window usage and new API features like the memory tool and context editing, the model optimally handles information across extended sessions to maintain coherence over time. This is what enables truly long-running agents that can pause, resume, and build on previous work days or weeks later.
Communication Style
The model features a refined communication approach that is concise, direct, and natural. It provides fact-based progress updates and may skip verbose summaries after tool calls to maintain workflow momentum (though this can be adjusted with prompting). This change reflects a more mature understanding of developer workflows—you want to know what happened and what's next, not read a paragraph explaining obvious actions.
Safety & Alignment: The Most Aligned Frontier Model
Claude Sonnet 4.5 isn't just more capable—it's also safer. Anthropic reports this as their most aligned frontier model yet, with large improvements across several alignment areas:
- Reduced Misaligned Behaviors: Substantial decreases in concerning behaviors like sycophancy (telling you what you want to hear rather than the truth), deception, power-seeking, and the tendency to encourage delusional thinking.
- Prompt Injection Defenses: Considerable progress on defending against prompt injection attacks for agentic and computer use capabilities—one of the most serious risks for users of these features.
- ASL-3 Protections: Released under AI Safety Level 3 safeguards that match model capabilities with appropriate protections, including classifiers to detect potentially dangerous inputs/outputs related to CBRN (chemical, biological, radiological, nuclear) weapons.
- Mechanistic Interpretability: For the first time, the system card includes tests using techniques from mechanistic interpretability—a research field focused on understanding what's happening inside AI models at a technical level.
The CBRN classifiers have been significantly improved, with false positives reduced by a factor of ten since originally introduced and by a factor of two since Opus 4 was released in May 2025. Anthropic's framework allows interrupted conversations to continue with Sonnet 4 (which poses lower CBRN risk) if classifiers inadvertently flag normal content.
Pricing & Availability
Claude Sonnet 4.5 maintains the same pricing as Sonnet 4, making this a pure capability upgrade with no cost increase:
- Input tokens: $3 per million
- Output tokens: $15 per million
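To make the per-token pricing concrete, here is a small sketch of the cost arithmetic. The function name and the token counts are illustrative assumptions; only the $3 and $15 per-million rates come from the list above, and the sketch ignores any caching or batch discounts.

```python
# Rates quoted above, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


# Hypothetical request: 200,000 input tokens and 20,000 output tokens.
print(f"${estimate_cost_usd(200_000, 20_000):.2f}")
```

Because output tokens cost five times as much as input tokens, long generated responses tend to dominate the bill even when prompts are large.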
The model is available everywhere today:
- Claude API: claude-sonnet-4-5-20250929
- Amazon Bedrock: anthropic.claude-sonnet-4-5-20250929-v1:0
- Google Cloud Vertex AI: claude-sonnet-4-5@20250929
- Claude.ai: Available to all users
- Claude Code: Default model
For optimal performance on coding tasks, Anthropic recommends enabling extended thinking, though developers should be aware this impacts prompt caching efficiency. Extended thinking is disabled by default but can dramatically improve results on complex coding work.
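As a rough sketch of what enabling extended thinking looks like with the Python SDK, the helper below builds the request arguments separately from the network call. The helper names, the 8,000-token thinking budget, and the 16,000-token output cap are assumptions chosen for illustration; the budget has to be at least 1,024 tokens and smaller than max_tokens.

```python
def build_thinking_request(prompt: str, budget_tokens: int = 8000) -> dict:
    """Build keyword arguments for a Messages API call with extended thinking."""
    max_tokens = 16000
    if not 1024 <= budget_tokens < max_tokens:
        raise ValueError("budget_tokens must be at least 1024 and below max_tokens")
    return {
        "model": "claude-sonnet-4-5-20250929",
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }


def send(prompt: str):
    """Send the request; needs the `anthropic` package and ANTHROPIC_API_KEY."""
    import anthropic  # imported lazily so the builder above works without it

    client = anthropic.Anthropic()
    return client.messages.create(**build_thinking_request(prompt))


request = build_thinking_request("Refactor the billing module to remove duplication")
print(request["thinking"])
```

Keeping the builder separate from the call makes it easy to inspect or log exactly what is sent before spending tokens on it.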
Claude Code 2.0: Autonomous Development Features
New Development Surfaces
Native VS Code Extension (Beta)
Claude Code 2.0 introduces a native VS Code extension that brings the AI directly into your IDE. The extension features a dedicated sidebar panel that displays Claude's changes in real-time through inline diffs, providing a richer, graphical experience compared to the terminal interface.
This IDE integration offers several advantages: you can see proposed changes highlighted within your familiar VS Code environment, accept or reject modifications file by file with visual clarity, and maintain your existing keyboard shortcuts and workflow patterns. The extension is available in beta from the VS Code Extension Marketplace and represents Anthropic's commitment to meeting developers where they already work.
Enhanced Terminal Interface (Version 2)
For developers who prefer the terminal (or work in environments where an IDE isn't practical), Claude Code's terminal interface has been completely refreshed. The updated interface features:
- Improved Status Visibility: Clearer indicators of what Claude is doing, what tools it's using, and what state the conversation is in
- Searchable Prompt History: Access previous prompts with Ctrl+r (similar to reverse-i-search in bash), allowing you to quickly reuse or edit earlier commands
- Better UX for Long-Running Tasks: Visual improvements that make it easier to follow progress during extended autonomous operations
The terminal experience remains powerful for scripting, CI/CD integration, remote development, and scenarios where a lightweight interface is preferable.
Checkpoints: Your Safety Net for Ambitious Development
The most requested feature from Claude Code users is now here: checkpoints. This system automatically saves your code state before each change Claude makes, creating recovery points you can return to instantly.
How Checkpoints Work
- Automatic Creation: Claude creates a checkpoint before each modification to your codebase
- Instant Rollback: Tap ESC twice or use the /rewind command
- Flexible Recovery: Choose to restore the code, the conversation, or both
- Scope: Applies to Claude's edits, not user edits or bash commands
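The behavior listed above can be pictured with a toy model. This is not how Claude Code stores checkpoints internally; the CheckpointStore class and its method names are hypothetical, and the sketch only illustrates the idea of snapshotting state before each agent edit and rewinding to it later.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Toy model: snapshot file contents before every agent edit."""

    files: dict[str, str] = field(default_factory=dict)
    _snapshots: list[dict[str, str]] = field(default_factory=list)

    def agent_edit(self, path: str, new_text: str) -> int:
        """Record a checkpoint, apply the edit, and return the checkpoint id."""
        self._snapshots.append(dict(self.files))  # copy of the state before the edit
        self.files[path] = new_text
        return len(self._snapshots) - 1

    def rewind(self, checkpoint_id: int) -> None:
        """Restore the state captured before that edit and drop later checkpoints."""
        self.files = dict(self._snapshots[checkpoint_id])
        del self._snapshots[checkpoint_id:]


store = CheckpointStore(files={"app.py": "print('v1')"})
first = store.agent_edit("app.py", "print('v2')")
store.agent_edit("util.py", "HELPER = True")
store.rewind(first)  # undo both edits in one step
print(store.files)
```

Note that only edits routed through agent_edit are captured, which mirrors the scope limitation above: changes made by the user or by shell commands fall outside the checkpoint history.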
Why Checkpoints Matter
Checkpoints fundamentally change the risk/reward calculation for ambitious tasks. Previously, delegating large-scale refactors or exploratory feature development to AI carried significant risk—if the approach didn't work out, you might spend hours manually reverting changes or resurrecting code from git history.
With checkpoints, you can pursue more experimental and wide-scale work knowing you can always return to a previous state instantly. This is especially powerful when combined with version control: checkpoints provide fine-grained undo for Claude's exploration, while git provides broader project history and collaboration capabilities.
Practical Checkpoint Workflows
Exploratory Refactoring
Ask Claude to try several different architectural approaches, reverting after each attempt to compare alternatives
Safe Debugging
Let Claude experiment with different fixes, rolling back failed attempts without polluting git history
Learning & Experimentation
Try new frameworks or patterns with a safety net, understanding you can always rewind if the approach doesn't suit your needs
Iterative Refinement
Make a series of progressively larger changes, reverting to intermediate states when a particular direction proves suboptimal
Subagents: Parallel Development Workflows
Subagents represent a fundamental shift in how AI can participate in complex development projects. Rather than a single agent trying to juggle multiple responsibilities, Claude Code 2.0 can delegate specialized tasks to subagents that work in parallel with the main orchestrator.
Why Subagents Exist
Subagents solve two critical challenges:
Parallelization
Spin up multiple subagents to work on different tasks simultaneously, dramatically reducing overall completion time for complex projects
Context Management
Subagents use isolated context windows and only send relevant information back to the orchestrator, preventing context pollution from exploratory work
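The two benefits above, parallel execution and isolated context, can be illustrated with a small sketch. It uses a thread pool and a placeholder run_subagent function, both of which are assumptions for illustration rather than a description of how Claude Code schedules subagents.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    """Stand-in for a subagent: works in its own context, returns only a summary."""
    # This list plays the role of the subagent's private context window.
    private_context = [f"step {i} of {task}" for i in range(100)]
    return f"{task}: done in {len(private_context)} steps"


def orchestrate(tasks: list[str]) -> list[str]:
    """Run subagents in parallel and collect only their summaries, in task order."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        return list(pool.map(run_subagent, tasks))


summaries = orchestrate(["write API tests", "update README", "audit dependencies"])
for line in summaries:
    print(line)
```

The orchestrator receives three short strings rather than three hundred intermediate steps, which is the context-isolation benefit in miniature.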
Subagent Use Cases
- Frontend + Backend Development: The main agent builds a React interface while a backend subagent sets up API endpoints, database schemas, and authentication middleware
- Research & Implementation: A search subagent explores documentation and Stack Overflow while the main agent implements based on findings, creating a continuous research → code → test loop
- Multi-Service Architecture: Subagents work on different microservices simultaneously, each maintaining context about its specific service while coordinating through the main agent
- Testing & Documentation: While you work with the main agent on feature development, subagents generate comprehensive test suites and update documentation in parallel
Subagent Best Practices
- Narrow Mandates: Define subagents with specific, limited responsibilities rather than broad capabilities
- Clear Communication: Establish protocols for how subagents report back to the orchestrator
- Appropriate Permissions: Grant subagents only the tools and access they need for their specific tasks
- Logging & Auditability: Track subagent actions for debugging and understanding complex multi-agent interactions
Traditional Solo Development
- Sequential task completion
- Manual rollback through git
- Context switching overhead
- Limited exploration capacity
- Manual quality checks
Claude Code 2.0 Development
- Parallel subagent workflows
- Instant checkpoint rollback
- Isolated context management
- Safe ambitious experiments
- Automated hook validation
Hooks: Automated Actions at Critical Points
Hooks in Claude Code 2.0 enable automated actions to trigger at specific points in your development workflow. This automation reduces manual intervention and helps maintain code quality consistently.
Common Hook Patterns
- Post-Edit Testing: Automatically run relevant test suites after code changes, catching regressions immediately
- Pre-Commit Linting: Run linters and formatters before committing, ensuring code style consistency
- Security Scanning: Trigger dependency vulnerability checks when package files change
- Build Verification: Compile the project after significant changes to catch build-breaking modifications early
- Documentation Generation: Update API docs or README files when public interfaces change
Hook Configuration
Hooks are configured in ./.claude/settings.json and respond to tool events. For example, a hook can run a command such as:
npm test
eslint --fix
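As a hedged sketch of what such a configuration might contain, the script below prints a settings structure that runs the two commands above after file edits. The event name PostToolUse, the "Edit|Write" matcher, and the nesting of command hooks are assumptions about the settings schema; check the Claude Code hooks documentation before relying on them.

```python
import json

# Assumed schema: event name -> list of matchers, each with a list of command hooks.
settings = {
    "hooks": {
        "PostToolUse": [
            {
                "matcher": "Edit|Write",  # assumed: fire after file-editing tools
                "hooks": [
                    {"type": "command", "command": "npm test"},
                    {"type": "command", "command": "eslint --fix"},
                ],
            }
        ]
    }
}

# Print the JSON that would be saved as ./.claude/settings.json.
print(json.dumps(settings, indent=2))
```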
Background Tasks: Continuous Operations
Background tasks allow Claude Code to run long-running processes like development servers without blocking the main workflow. This seemingly simple capability has profound implications for developer productivity:
- Dev Server Management: Start npm run dev, python manage.py runserver, or similar commands and let them run while Claude continues working on other tasks
- Build Processes: Kick off long compilation or bundling operations in the background, checking results when they're done
- Test Suites: Run comprehensive test batteries that take minutes or hours without waiting idly
- Database Migrations: Apply schema changes or seed data while development continues on other components
The combination of background tasks, hooks, and subagents creates a development environment where multiple aspects of a project advance simultaneously—much like how experienced developers mentally juggle various concerns while writing code.
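The underlying idea, starting a process without waiting for it to finish, can be sketched in a few lines. This is only an analogy using Python's standard library, not Claude Code's mechanism; the sleeping child process is a stand-in for a real dev server.

```python
import subprocess
import sys

# Stand-in for a long-running command such as `npm run dev`.
server = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

# Popen returns immediately, so other work can continue while the process runs.
still_running = server.poll() is None
print("background process running:", still_running)

# Stop the process once it is no longer needed.
server.terminate()
server.wait(timeout=10)
```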
Claude Agent SDK: Build Production Agents
From Claude Code SDK to Claude Agent SDK
The infrastructure powering Claude Code—previously called the Claude Code SDK—has been renamed to the Claude Agent SDK to reflect its broader applicability. This isn't just a branding change; it represents Anthropic's recognition that the same tools enabling world-class coding assistance are equally valuable for countless other agent use cases.
At Anthropic, teams use Claude Code (and by extension, the Agent SDK) for deep research, video creation, note-taking, financial analysis, and many other non-coding applications. The SDK gives developers access to the same core tools, context management systems, and permission frameworks that power Claude Code itself—now yours to build with.
Why Use the Claude Agent SDK?
The Agent SDK provides production-ready infrastructure for building autonomous agents:
Context Management
Automatic compaction to ensure agents don't run out of context during long-running tasks
Rich Tool Ecosystem
File operations, code execution, web search, and extensibility through MCP (Model Context Protocol)
Advanced Permissions
Fine-grained control over what agents can access and modify
Production Essentials
Built-in error handling, session management, and monitoring capabilities
Optimized Claude Integration
Automatic prompt caching and performance optimizations
The Agent Loop Pattern
The foundational design pattern for agents built with the SDK follows a simple loop: gather context → take action → verify work → repeat. Understanding this loop is key to building effective agents.
1. Gather Context
Agents need more than just a prompt—they need mechanisms to fetch and update their own context dynamically:
- Agentic Search & File System: The file system represents information that could be pulled into the model's context. When Claude encounters large files, it decides how to load them using tools like grep and tail. The folder and file structure becomes a form of context engineering.
- Semantic Search: Faster than agentic search but less accurate and transparent. Involves chunking relevant context, embedding as vectors, and searching by querying those vectors. Anthropic recommends starting with agentic search and only adding semantic search if you need faster results.
- Subagents: Enable parallelization and context isolation. Spin up multiple subagents to work simultaneously, each with isolated context windows, returning only relevant information to the orchestrator.
- Compaction: When agents run for long periods, context maintenance becomes critical. The SDK's compact feature automatically summarizes previous messages when approaching context limits.
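To show why grep- and tail-style access keeps context small, here is a minimal sketch of both operations in Python. The function names mirror the shell tools but are hypothetical helpers, and the log file is generated purely for the example.

```python
import tempfile
from collections import deque
from pathlib import Path


def grep(path: Path, needle: str) -> list[str]:
    """Return only the lines containing `needle`, prefixed with line numbers."""
    with path.open() as handle:
        return [f"{n}: {line.rstrip()}" for n, line in enumerate(handle, 1) if needle in line]


def tail(path: Path, count: int) -> list[str]:
    """Return the last `count` lines without holding the whole file in memory."""
    with path.open() as handle:
        return [line.rstrip() for line in deque(handle, maxlen=count)]


# Hypothetical log file standing in for a large file in the agent's workspace.
log = Path(tempfile.mkdtemp()) / "server.log"
log.write_text("".join(f"INFO request {i}\n" for i in range(1000)) + "ERROR disk full\n")

print(grep(log, "ERROR"))
print(tail(log, 2))
```

Out of 1,001 lines, the agent would pull at most a handful into context, which is the practical payoff of searching the file system instead of loading whole files.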
2. Take Action
Once context is gathered, agents need flexible ways to execute:
- Tools: The primary building blocks of execution. Tools are prominent in Claude's context window, making them the primary actions the model considers. Design tools to maximize context efficiency—see Anthropic's post on writing effective tools for agents.
- Bash & Scripts: General-purpose tool allowing flexible work using a computer. An email agent might use bash to download PDFs, convert to text, and search across attachments.
- Code Generation: Code is precise, composable, and infinitely reusable—an ideal output for agents performing complex operations reliably. Anthropic's file creation feature in Claude.ai relies entirely on code generation for Excel, PowerPoint, and Word documents.
- MCPs (Model Context Protocol): Standardized integrations to external services, handling authentication and API calls automatically. Connect to Slack, GitHub, Google Drive, Asana, and more without writing custom integration code.
3. Verify Work
Agents that can check and improve their own output are fundamentally more reliable:
- Defining Rules: The best feedback is clearly defined rules explaining which ones failed and why. Code linting is an excellent example—TypeScript provides multiple layers of feedback beyond pure JavaScript.
- Visual Feedback: For visual tasks like UI generation or testing, screenshots or renders provide helpful verification. The model checks whether layout, styling, content hierarchy, and responsiveness match requirements.
- LLM as a Judge: Have another language model judge output based on fuzzy rules. Not very robust and has latency tradeoffs, but can boost performance in critical applications.
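The gather context → take action → verify work loop can be sketched as a plain function. Everything here is a simplified assumption: the three callables are toy stand-ins for real context gathering, model calls, and rule checks, and the SDK's actual loop is more involved.

```python
from typing import Callable


def agent_loop(
    gather: Callable[[list[str]], str],
    act: Callable[[str], str],
    verify: Callable[[str], list[str]],
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Repeat gather -> act -> verify until verification passes or rounds run out."""
    feedback: list[str] = []
    for round_number in range(1, max_rounds + 1):
        context = gather(feedback)  # 1. gather context, including past failures
        result = act(context)       # 2. take action
        feedback = verify(result)   # 3. verify work against explicit rules
        if not feedback:
            return result, round_number
    raise RuntimeError(f"still failing after {max_rounds} rounds: {feedback}")


# Toy stand-ins: the "agent" adds a missing docstring once the rule check complains.
def gather(feedback: list[str]) -> str:
    return "add docstring" if feedback else "write function"


def act(context: str) -> str:
    if context == "add docstring":
        return 'def f():\n    """Doc."""'
    return "def f(): pass"


def verify(code: str) -> list[str]:
    return [] if '"""' in code else ["missing docstring"]


final_code, rounds_used = agent_loop(gather, act, verify)
print(rounds_used)
```

The verify step returns a list of named rule failures rather than a bare pass/fail, which reflects the point above that feedback is most useful when it says which rule failed and why.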
Agent Types You Can Build
The Agent SDK enables a wide variety of autonomous agent applications:
- Understand portfolios and goals, evaluate investments by accessing external APIs, store data, and run code to make calculations
- Book travel, manage calendars, schedule appointments, and put together briefs by connecting to internal data sources and tracking context across applications
- Handle high-ambiguity user requests like service tickets by collecting and reviewing user data, connecting to external APIs, messaging users back, and escalating to humans when needed
- Conduct comprehensive research across large document collections by searching file systems and analyzing and synthesizing information from multiple sources
- Analyze stack traces, reproduce issues, propose fixes, and verify solutions across complex codebases
- Scan for vulnerabilities, analyze dependencies, check compliance with security policies, and generate remediation guidance
New API Features for Agent Builders
Memory Tool (Beta)
The memory tool enables Claude to store and retrieve information outside the context window through a file-based system you control. To enable it, add the following to your API calls:
tools=[
    {
        "type": "memory_20250818",
        "name": "memory"
    }
]
This allows agents to:
- ✓ Build knowledge bases over time
- ✓ Maintain project state across sessions
- ✓ Preserve effectively unlimited context through file-based storage
The memory tool requires the beta header context-management-2025-06-27.
Context Editing
Context editing automatically removes older tool calls and results when approaching token limits:
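import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

response = client.beta.messages.create(
    betas=["context-management-2025-06-27"],
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    messages=[{"role": "user", "content": "..."}],
    context_management={
        "edits": [
            {
                "type": "clear_tool_uses_20250919",
                # Start clearing once the prompt exceeds this many input tokens.
                "trigger": {"type": "input_tokens", "value": 500},
                # Always keep this many of the most recent tool uses.
                "keep": {"type": "tool_uses", "value": 2},
                # Clear at least this many tokens each time editing runs.
                "clear_at_least": {"type": "input_tokens", "value": 100}
            }
        ]
    },
    tools=[...]  # your tool definitions
)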
This feature enables agents to run longer without manual intervention by automatically managing context in long-running agent sessions.
Enhanced Stop Reasons
Sonnet 4.5 introduces a new model_context_window_exceeded stop reason that explicitly indicates when generation stopped due to hitting the context window limit (rather than the requested max_tokens limit):
{
  "stop_reason": "model_context_window_exceeded",
  "usage": {
    "input_tokens": 150000,
    "output_tokens": 49950
  }
}
This makes it easier to handle context window limits in application logic.
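One way application logic might branch on the stop reason is sketched below. The returned action names are hypothetical labels for whatever your application does next; only the stop reason strings correspond to values returned by the API.

```python
def next_step(stop_reason: str) -> str:
    """Map a Messages API stop reason to an application-level action."""
    if stop_reason == "model_context_window_exceeded":
        # The context window is full: trim or summarize history before continuing.
        return "compact_history_and_retry"
    if stop_reason == "max_tokens":
        # Only the requested output cap was hit: the response can be continued.
        return "continue_generation"
    if stop_reason == "tool_use":
        # The model is waiting for tool results.
        return "run_tools"
    return "done"


# The example response shown above, as a plain dictionary.
response = {
    "stop_reason": "model_context_window_exceeded",
    "usage": {"input_tokens": 150000, "output_tokens": 49950},
}
print(next_step(response["stop_reason"]))
```

Distinguishing the two limits matters because the remedies differ: raising max_tokens does nothing when the context window itself is exhausted.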
Token Count Optimizations
Claude Sonnet 4.5 includes automatic optimizations to improve model performance. These optimizations may add small amounts of tokens to requests, but you are not billed for these system-added tokens.
SDK Availability
The Claude Agent SDK is available in two languages:
- TypeScript/JavaScript: npm install @anthropic-ai/claude-agent-sdk
- Python: pip install claude-agent-sdk
Both provide access to the same features: context management, tool ecosystem, permissions, subagents, hooks, and all Claude Code capabilities through programmatic APIs.
Context Management: The Secret Sauce
The Problem: Context Windows Have Limits
As production agents handle more complex tasks and generate more tool results, they often exhaust their effective context windows. This leaves developers stuck choosing between cutting agent transcripts (losing important conversation flow) or degrading performance (as the model struggles with irrelevant information).
The challenge is particularly acute for long-running agents: researching across hundreds of documents, processing entire codebases, or maintaining extensive tool interaction histories. Even with large context windows (Claude supports up to 200K tokens), production workflows can exceed these limits—and performance often degrades well before hard limits are reached as stale content accumulates.
The Solution: Intelligent Context Management
Anthropic's approach to context management combines three complementary capabilities:
1. Context Editing
Context editing automatically clears stale tool calls and results from within the context window when approaching token limits. As your agent executes tasks and accumulates tool results, context editing removes old content while preserving conversation flow.
Think of context editing as garbage collection for the context window: the oldest tool results are cleared first and replaced with placeholder text so the model knows something was removed, while the conversation itself stays intact. This extends how long agents can run without manual intervention and improves model performance by keeping the context focused on recent, relevant material.
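As a rough mental model of that clean-up, the sketch below clears all but the most recent tool results from a message list. It is a local simulation under simplifying assumptions: the message shape, the "tool" role, and the placeholder text are invented for the example, and the real feature runs server-side inside the API.

```python
PLACEHOLDER = "[tool result cleared]"  # hypothetical marker text


def clear_old_tool_results(messages: list[dict], keep: int) -> list[dict]:
    """Blank out all but the `keep` most recent tool results, preserving order."""
    tool_indexes = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    stale = set(tool_indexes[:-keep]) if keep else set(tool_indexes)
    return [
        {**m, "content": PLACEHOLDER} if i in stale else m
        for i, m in enumerate(messages)
    ]


history = [
    {"role": "user", "content": "Find the failing test"},
    {"role": "tool", "content": "contents of test_a.py"},
    {"role": "tool", "content": "contents of test_b.py"},
    {"role": "tool", "content": "pytest output"},
]
trimmed = clear_old_tool_results(history, keep=2)
print([m["content"] for m in trimmed])
```

The user message and the message order are untouched; only the oldest tool output is swapped for a short marker, which is what keeps the conversation flow readable after clearing.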
2. Memory Tool
The memory tool enables Claude to store and consult information outside the context window through a file-based system stored in your infrastructure. Claude can create, read, update, and delete files in a dedicated memory directory that persists across conversations.
This allows agents to build up knowledge bases over time, maintain project state across sessions, and reference previous learnings without keeping everything in context. For example, a debugging agent might store architectural insights and known issues in memory, consulting them days later when working on related problems.
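Because the memory files live in your own infrastructure, your application is responsible for the storage layer. The class below is a minimal sketch of a file-backed store confined to a single directory; the FileMemory name and its methods are hypothetical and do not reproduce the memory tool's actual command protocol.

```python
import tempfile
from pathlib import Path


class FileMemory:
    """Minimal file-backed store confined to one directory."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, name: str) -> Path:
        path = (self.root / name).resolve()
        if not path.is_relative_to(self.root):  # block "../" escapes
            raise ValueError(f"path escapes memory directory: {name}")
        return path

    def write(self, name: str, text: str) -> None:
        self._resolve(name).write_text(text)

    def read(self, name: str) -> str:
        return self._resolve(name).read_text()

    def delete(self, name: str) -> None:
        self._resolve(name).unlink()


memory = FileMemory(Path(tempfile.mkdtemp()) / "memories")
memory.write("known_issues.md", "Cache invalidation fails when the clock skews.")
print(memory.read("known_issues.md"))
```

The path check is the important design choice: a model-supplied file name should never be able to read or write outside the dedicated memory directory.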
3. Context Awareness (Sonnet 4.5)
Claude Sonnet 4.5 introduces built-in context awareness—it tracks token usage throughout conversations, receiving updates after each tool call. This self-awareness helps the model make intelligent decisions about what to keep in context, what to summarize, and what to store externally using the memory tool.
Performance Impact: The Numbers Don't Lie
Anthropic tested context management on an internal evaluation set for agentic search, measuring performance on complex, multi-step tasks:
- Memory + Context Editing: 39% improvement over baseline
- Context Editing Alone: 29% improvement over baseline
- Token Reduction: 84% in a 100-turn web search evaluation
Perhaps most importantly, context editing enabled agents to complete workflows that would otherwise fail due to context exhaustion—transforming tasks from impossible to routine.
Real-World Context Management Use Cases
- Coding: Context editing clears old file reads and test results while memory preserves debugging insights and architectural decisions, enabling agents to work on large codebases without losing progress
- Research: Memory stores key findings while context editing removes old search results, building knowledge bases that improve agent performance over time
- Data Processing: Agents store intermediate results in memory while context editing clears raw data, handling workflows that would otherwise exceed token limits
- Customer Support: Memory maintains customer history and preferences across sessions while context editing removes resolved ticket details, enabling coherent long-term relationships
Getting Started: Your Path to AI-Assisted Development
Getting Started with Claude Sonnet 4.5
Upgrading to Sonnet 4.5 is straightforward for existing Claude users:
- Claude API: claude-sonnet-4-5-20250929
- Amazon Bedrock: anthropic.claude-sonnet-4-5-20250929-v1:0
- Google Cloud Vertex AI: claude-sonnet-4-5@20250929
Existing API calls will work with one exception: Sonnet 4.5 does not allow both temperature and top_p parameters simultaneously. Use only one.
- For complex coding tasks, enable extended thinking for optimal performance (though be aware this impacts prompt caching efficiency)
- Add the memory tool for long-running agents (requires beta header: context-management-2025-06-27)
- Configure context editing for better context management
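If existing code sets both sampling parameters, a small guard can catch the conflict before the request is sent. The helper below is a hypothetical sketch, not part of any SDK; it simply refuses to build arguments that contain both temperature and top_p.

```python
from typing import Optional


def sampling_params(temperature: Optional[float] = None, top_p: Optional[float] = None) -> dict:
    """Return sampling kwargs, refusing to set temperature and top_p together."""
    if temperature is not None and top_p is not None:
        raise ValueError("Sonnet 4.5 accepts temperature or top_p, not both")
    params = {}
    if temperature is not None:
        params["temperature"] = temperature
    if top_p is not None:
        params["top_p"] = top_p
    return params


print(sampling_params(temperature=0.2))
```

The result can be merged into a request with keyword unpacking, for example `client.messages.create(**request, **sampling_params(temperature=0.2))`, where `request` holds the rest of your arguments.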
Getting Started with Claude Code 2.0
For Individual Developers
- VS Code Extension: Install from the VS Code Extension Marketplace (search for "Claude Code" or visit the marketplace page)
- Terminal Installation: Update your local Claude Code installation to get Terminal v2 improvements and checkpoints
- Model Selection: Use the /model command to switch between Sonnet 4.5 (default), Opus 4.1, and Haiku 3.5
- Checkpoint Usage: Double-tap ESC or use /rewind to roll back changes
Pricing Tiers
- Included with your Pro plan, perfect for short coding sprints
- Great value for everyday use with access to both Sonnet 4.5 and Opus 4.1
- For power users with the most access to Opus 4.1
- Includes self-serve seat management and additional usage at standard API rates
- Everything in Team plus advanced security, data, and user management
- Standard pricing, deploy to unlimited developers with no per-seat fee
Getting Started with the Claude Agent SDK
Installation
Choose your language and install:
# TypeScript/JavaScript
npm install @anthropic-ai/claude-agent-sdk
# Python
pip install claude-agent-sdk
Authentication
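Set up authentication using one of the following:
- Claude API: Retrieve an API key from the Claude Console and set the environment variable:
  export ANTHROPIC_API_KEY=your_api_key
- Amazon Bedrock: Set the following variable, then configure AWS credentials:
  export CLAUDE_CODE_USE_BEDROCK=1
- Google Cloud Vertex AI: Set the following variable, then configure Google Cloud credentials:
  export CLAUDE_CODE_USE_VERTEX=1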
First Steps
- Review the Agent SDK overview documentation
- Explore examples in the TypeScript GitHub repo or Python GitHub repo
- Start with a simple agent following the gather context → take action → verify work loop
- Add tools incrementally as you identify needs
- Implement permissions, logging, and error handling for production use
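A first agent following these steps might look like the sketch below. It assumes the Python package exposes query and ClaudeAgentOptions from a claude_agent_sdk module and that the options accept system_prompt, allowed_tools, and max_turns; the prompt, tool list, and turn limit are illustrative values, so verify the names against the SDK documentation.

```python
import asyncio


def agent_options() -> dict:
    """Keyword arguments for ClaudeAgentOptions; the values are illustrative."""
    return {
        "system_prompt": "You are a careful code reviewer.",
        "allowed_tools": ["Read", "Grep"],  # start narrow, add tools as needs appear
        "max_turns": 5,
    }


async def run(prompt: str) -> None:
    """Stream messages from one agent run; requires the SDK and credentials."""
    from claude_agent_sdk import ClaudeAgentOptions, query  # lazy import

    options = ClaudeAgentOptions(**agent_options())
    async for message in query(prompt=prompt, options=options):
        print(message)


def main() -> None:
    """Entry point to call manually once the SDK and an API key are set up."""
    asyncio.run(run("Summarize the TODO comments in this repository."))
```

Starting with read-only tools and a low turn limit keeps the first experiments cheap and safe; permissions can be widened once the agent's behavior is understood.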
Best Practices for Success
Maximizing Sonnet 4.5 Performance
- Enable extended thinking for complex coding tasks requiring deep reasoning
- Use parallel tool calls to speed up information gathering and context building
- Leverage context awareness by letting the model manage its own context intelligently
- Understand the new communication style (concise and direct) and adjust prompts if you need more explanation
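When the model requests several tools in one turn, the client can run them concurrently instead of one by one. The sketch below assumes each request is a dict shaped like a Messages API `tool_use` block (`id`, `name`, `input`) and returns matching `tool_result` blocks in the same order. The handler functions and their names are hypothetical placeholders.

```python
import asyncio
from typing import Awaitable, Callable

ToolHandler = Callable[[dict], Awaitable[str]]


async def run_tool_calls(tool_uses: list[dict], handlers: dict[str, ToolHandler]) -> list[dict]:
    """Run every requested tool concurrently; results keep the request order."""

    async def run_one(call: dict) -> dict:
        try:
            content = await handlers[call["name"]](call["input"])
            return {"type": "tool_result", "tool_use_id": call["id"], "content": content}
        except Exception as exc:  # report failures back instead of crashing the turn
            return {
                "type": "tool_result",
                "tool_use_id": call["id"],
                "content": f"error: {exc}",
                "is_error": True,
            }

    return list(await asyncio.gather(*(run_one(call) for call in tool_uses)))


# Hypothetical handlers standing in for real file and search tools.
async def read_file(args: dict) -> str:
    await asyncio.sleep(0.01)
    return f"contents of {args['path']}"


async def search_code(args: dict) -> str:
    await asyncio.sleep(0.01)
    return f"matches for {args['pattern']}"


calls = [
    {"id": "toolu_1", "name": "read_file", "input": {"path": "README.md"}},
    {"id": "toolu_2", "name": "search_code", "input": {"pattern": "TODO"}},
]
results = asyncio.run(run_tool_calls(calls, {"read_file": read_file, "search_code": search_code}))
print([r["tool_use_id"] for r in results])
```

Because `asyncio.gather` preserves input order, the results can be sent back in a single user message without re-sorting.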
Claude Code Workflow Tips
- Use checkpoints before risky or experimental changes—they're your safety net
- Combine checkpoints with version control for comprehensive history management
- Design subagents with narrow, well-defined mandates rather than broad capabilities
- Set up hooks for automated testing, linting, and quality checks
- Utilize background tasks for dev servers and long-running processes
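One way to wire up the linting hook suggested above is a small script that decides which check to run after a file edit. This sketch assumes the hook receives a JSON payload with `tool_name` and `tool_input.file_path` fields; confirm the exact payload shape in the Claude Code hooks documentation. The linter commands and the `lint_command_for` helper are illustrative choices, not part of Claude Code.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file extension to the lint command to run.
LINTERS = {
    ".py": ["ruff", "check"],
    ".ts": ["npx", "eslint"],
    ".tsx": ["npx", "eslint"],
}
# Tool names assumed to represent file edits.
EDIT_TOOLS = {"Edit", "Write", "MultiEdit"}


def lint_command_for(payload_json: str) -> Optional[list[str]]:
    """Return the lint command for an edited file, or None if no check applies."""
    payload = json.loads(payload_json)
    if payload.get("tool_name") not in EDIT_TOOLS:
        return None
    file_path = payload.get("tool_input", {}).get("file_path")
    if not file_path:
        return None
    linter = LINTERS.get(Path(file_path).suffix)
    return [*linter, file_path] if linter else None


example = json.dumps({"tool_name": "Edit", "tool_input": {"file_path": "src/app.py"}})
print(lint_command_for(example))
```

A full hook script would read the payload from standard input, pass the returned command to `subprocess.run`, and exit with a non-zero status when the linter fails so the result is surfaced during the session.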
Agent Development Principles
- Start with agentic search over file systems; add semantic search only if you need speed improvements
- Define clear, narrow tool responsibilities to maximize context efficiency
- Use rules-based feedback (like linting) when possible—it's more reliable than LLM-as-judge
- Build representative test sets based on actual usage patterns
- Log permissions and actions for auditability and debugging
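The last principle, logging permissions and actions, can be as simple as routing every tool request through one gate that records its decision. The following is a minimal sketch under that assumption; `PermissionGate` and the tool names are hypothetical and not taken from the Agent SDK.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class PermissionGate:
    allowed_tools: set[str]
    audit_log: list[str] = field(default_factory=list)

    def check(self, tool: str, args: dict) -> bool:
        """Decide whether a tool call may run and record the decision either way."""
        allowed = tool in self.allowed_tools
        self.audit_log.append(json.dumps(
            {"ts": time.time(), "tool": tool, "args": args, "allowed": allowed},
            sort_keys=True,
        ))
        return allowed


gate = PermissionGate(allowed_tools={"read_file", "run_linter"})
gate.check("read_file", {"path": "README.md"})   # permitted
gate.check("delete_branch", {"name": "main"})    # denied, but still logged
for entry in gate.audit_log:
    print(entry)
```

Recording denied requests alongside permitted ones is a deliberate choice here: the denials are often the most useful entries when debugging why an agent stalled.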
Real-World Use Cases: How Companies Are Using These Tools
Enterprise Development Tools
Leading development platforms have integrated Sonnet 4.5 with impressive results:
- Cursor reports state-of-the-art coding performance with significant improvements on longer horizon tasks, reinforcing why many developers choose Claude for their most complex problems
- GitHub Copilot uses Sonnet 4.5 for multi-step reasoning and code comprehension, enabling agentic experiences that handle complex, codebase-spanning tasks better
- Replit reports that its error rate on an internal code editing benchmark dropped from 9% with Sonnet 4 to 0% with Sonnet 4.5
- Devin saw an 18% increase in planning performance and 12% improvement in end-to-end eval scores—their biggest jump since Sonnet 3.6
Production Applications
- Canva (240M+ users): Uses Sonnet 4.5 for complex, long-context tasks from engineering in their codebase to in-product features, calling it "noticeably more intelligent and a big leap forward"
- Figma: Claude Sonnet 4.5 has noticeably improved Figma Make in early testing, making it easier to prompt and iterate with more functional prototypes
- Harvey (legal AI): Uses Sonnet 4.5 for the most complex litigation tasks, like analyzing full briefing cycles and conducting research to synthesize excellent first drafts
- Hai (security): Reduced average vulnerability intake time by 44% while improving accuracy by 25% with their security agents
Agent SDK Applications
- Financial Compliance Agents: Companies like Ramp build agents that analyze portfolios, evaluate investments, and perform complex calculations
- Cybersecurity Agents: CrowdStrike uses Claude for red teaming, generating creative attack scenarios to strengthen defenses
- Code Modernization: Enterprises use the Agent SDK to navigate legacy codebases, understand dependencies, and propose incremental refactors
- Research Automation: Teams build deep research agents that analyze hundreds of documents, cross-reference information, and generate comprehensive reports
Conclusion: The Future of AI-Assisted Development
The September 29, 2025 launch of Claude Sonnet 4.5, Claude Code 2.0, and the Claude Agent SDK represents more than a product update—it's a fundamental shift in how AI participates in software development and knowledge work more broadly.
With Sonnet 4.5 achieving state-of-the-art coding performance (77.2% on SWE-bench Verified), maintaining focus for 30+ hours on complex tasks, and delivering these capabilities at the same $3/$15 pricing as Sonnet 4, the barrier to accessing frontier AI for development has never been lower. Claude Code 2.0's checkpoints, subagents, hooks, and background tasks turn previously experimental workflows into production-ready ones. And the Claude Agent SDK opens up the infrastructure powering these capabilities, enabling developers to build custom agents for any domain.
The synergy between these three launches is what makes them truly powerful: Sonnet 4.5 provides the intelligence, Code 2.0 delivers the developer experience, and the Agent SDK offers the extensibility. Together, they enable development workflows that were simply not possible before—from safe, ambitious refactors to autonomous multi-day coding sessions to specialized agents that understand your specific business domain.
Perhaps most exciting is what these tools suggest about the future. As context management improves, as models become more aligned and capable, and as the infrastructure for building agents matures, we're moving toward a world where AI truly collaborates with developers rather than just assisting them. The checkpoints system acknowledges that exploration and experimentation are core to development. The subagents architecture recognizes that complex problems require parallel, specialized approaches. The memory tool and context editing understand that valuable work extends beyond any single session.
Whether you're a solo developer looking to accelerate your learning, an engineering team seeking to modernize legacy systems, or an enterprise building AI-powered workflows, these tools offer a practical path forward. The question is no longer whether AI can assist development—it's how quickly you can integrate these capabilities into your workflow to stay competitive.