Anthropic Computer Use API: Desktop Automation Guide

What Is Computer Use API?

Computer Use represents a paradigm shift in AI capabilities. Rather than building specialized tools for individual tasks, Anthropic is teaching Claude general computer skills—enabling it to use the same interfaces, applications, and workflows that humans use every day.

Released in public beta on October 22, 2024, Computer Use makes Claude Sonnet 3.5 the first frontier AI model to offer autonomous desktop control. Companies like Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company are already exploring applications that require dozens or even hundreds of steps to complete.

What Makes Computer Use Different?

Traditional AI tools require custom integrations for each application. Computer Use eliminates this bottleneck by teaching Claude to interact with any software interface—web browsers, desktop applications, command-line tools—just like a human user would.

This means Claude can automate complex workflows across multiple applications without needing API access or custom integrations for each tool.

Key Capabilities

Visual Understanding: Analyze screenshots to understand UI elements, content, and context
Precise Mouse Control: Move cursor and click with pixel-perfect accuracy
Keyboard Input: Type text, use keyboard shortcuts, and navigate interfaces
Multi-Step Workflows: Chain actions together to complete complex tasks
Error Recovery: Adapt to unexpected UI changes and error conditions

Current API Version

As of January 2025, Computer Use requires the API header anthropic-beta: computer-use-2025-01-24 with the claude-sonnet-4-5 model. The API is actively evolving with regular updates to improve accuracy and reliability.

Integration Opportunity: Need help implementing AI automation workflows? Digital Applied's AI & Digital Transformation services can help you leverage Computer Use API for your business processes.

How Computer Use Works

Computer Use operates through a continuous feedback loop where Claude analyzes the current screen state, decides on actions, and observes the results—similar to how a human user interacts with a computer.

The Execution Cycle

Step 1: Screenshot Analysis

Claude captures and analyzes a screenshot of the current desktop state. Using its vision capabilities, it identifies UI elements, reads text, recognizes buttons, and understands the application context.

This visual understanding enables Claude to work with any application, even those without accessibility features or APIs.

Step 2: Action Planning

Based on the screenshot and task objective, Claude determines the next action. This might be moving the mouse to specific coordinates, clicking a button, typing text, or executing a keyboard shortcut.

The planning process considers UI patterns, common workflows, and task requirements to select optimal actions.

Step 3: Pixel Counting

Here's where Anthropic's innovation shines: Claude counts pixels from the screen edges to calculate exact cursor positions. This pixel-perfect accuracy works across any screen resolution and application layout.

Training Claude to count pixels accurately was critical to Computer Use's reliability. Without this skill, the model struggles to give precise mouse commands.

Step 4: Action Execution

Claude executes the planned action using the Computer tool's mouse and keyboard functions. The action modifies the desktop state—opening applications, filling forms, navigating menus, etc.

After execution, Claude captures a new screenshot and evaluates whether the action succeeded or requires adjustment.

Step 5: Goal Evaluation

Claude compares the new screen state against the task objective. If the goal is achieved, the workflow completes. If not, Claude plans the next action and continues the cycle.

This iterative approach enables Claude to handle unexpected UI changes, error dialogs, and multi-step workflows dynamically.

Why Pixel Counting Matters

When you ask Claude to click a button, it needs to translate "the blue submit button in the lower right" into exact pixel coordinates like (1245, 867). Traditional computer vision approaches struggle with this translation across different screen sizes and layouts.

Anthropic's solution was to train Claude to count pixels from reference points (screen edges, known UI elements) to target locations. This skill enables reliable cursor positioning regardless of screen resolution, DPI scaling, or application layout.

API Setup & Configuration

Setting up Computer Use requires the Anthropic SDK and proper configuration for desktop automation. The fastest way to get started is using Anthropic's official Docker container with a preconfigured environment.

Quick Start with Docker

The official anthropic-quickstarts/computer-use-demo repository provides a one-liner setup that spins up an Ubuntu 22.04 container with VNC server, desktop environment, and necessary tools:

docker run -d \
  --name claude-computer-use \
  -p 5900:5900 \
  -e ANTHROPIC_API_KEY=your_api_key_here \
  anthropic/computer-use-demo

Connect to the VNC server on localhost:5900 to view Claude's actions in real-time. This container includes:

Ubuntu 22.04 with XFCE desktop environment
Firefox browser for web automation
PyAutoGUI for mouse/keyboard control
Python environment with Anthropic SDK

Python SDK Installation

For custom implementations, install the required libraries:

pip install anthropic pyautogui pillow

# For screenshot capture
pip install mss

# For image processing
pip install opencv-python numpy

Basic API Configuration

import anthropic
import pyautogui
from PIL import Image
import io
import base64

# Initialize Anthropic client
client = anthropic.Anthropic(
    api_key="your_api_key_here"
)

# Configure Computer Use beta header
COMPUTER_USE_BETA = "computer-use-2025-01-24"

# Define available tools
tools = [
    {
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1920,
        "display_height_px": 1080,
        "display_number": 1
    },
    {
        "type": "text_editor_20250124",
        "name": "str_replace_editor"
    },
    {
        "type": "bash_20250124",
        "name": "bash"
    }
]

API Pricing & Limits

Computer Use runs on Claude Sonnet 4.5 with standard API pricing of $3 per million input tokens and $15 per million output tokens. Additional considerations:

System Prompt: 466 tokens for automated tool selection
Tool Definitions: 499 tokens per tool
Screenshot Images: Typically 100-200 tokens per screenshot depending on resolution

Development Services: Our Web Development team can help integrate Computer Use API into your existing applications and workflows.

Screenshot Analysis

Screenshot analysis is the foundation of Computer Use. Claude needs to understand what's currently visible on screen to decide which actions to take. Let's explore how to capture, encode, and send screenshots to the API.

Capturing Screenshots

import mss
import base64
from PIL import Image
from io import BytesIO

def capture_screenshot():
    """Capture screenshot and encode as base64"""
    with mss.mss() as sct:
        # Capture primary monitor
        monitor = sct.monitors[1]
        screenshot = sct.grab(monitor)

        # Convert to PIL Image
        img = Image.frombytes(
            'RGB',
            screenshot.size,
            screenshot.rgb
        )

        # Optimize size (reduce resolution if needed)
        # Computer Use works best with 1920x1080 or smaller
        max_width = 1920
        if img.width > max_width:
            ratio = max_width / img.width
            new_size = (max_width, int(img.height * ratio))
            img = img.resize(new_size, Image.Resampling.LANCZOS)

        # Convert to base64
        buffer = BytesIO()
        img.save(buffer, format='PNG', optimize=True)
        img_str = base64.b64encode(buffer.getvalue()).decode()

        return {
            'type': 'image',
            'source': {
                'type': 'base64',
                'media_type': 'image/png',
                'data': img_str
            }
        }

# Example usage
screenshot = capture_screenshot()

Sending Screenshots to Claude

def analyze_screen(task_description: str):
    """Send screenshot to Claude for analysis"""
    screenshot = capture_screenshot()

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        tools=tools,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": f"Task: {task_description}\n\nAnalyze this screenshot and determine the next action."
                    },
                    screenshot
                ]
            }
        ],
        betas=[COMPUTER_USE_BETA]
    )

    return response

# Example: Analyze a login screen
result = analyze_screen("Fill out the login form with username 'demo' and click submit")

Understanding Claude's Analysis

Claude's vision model analyzes screenshots to identify:

UI Elements: Buttons, text fields, dropdowns, menus, checkboxes
Text Content: Labels, instructions, error messages, form fields
Visual Context: Application state, active windows, loaded pages
Spatial Layout: Element positions, sizes, relationships

Performance Optimization

Screenshots consume significant tokens. Optimize by:

Resizing to 1920x1080 or smaller before encoding
Using PNG compression with optimize=True
Capturing only relevant screen regions when possible
Reducing screenshot frequency in repeated workflows

Mouse & Keyboard Control

The Computer tool provides mouse and keyboard functions that Claude invokes based on screenshot analysis. Let's explore how to implement and handle these control mechanisms.

Mouse Operations

Claude uses tool calls to control the mouse. Here's how to implement the handlers:

import pyautogui
import time

def handle_mouse_move(x: int, y: int):
    """Move cursor to specific coordinates"""
    pyautogui.moveTo(x, y, duration=0.2)
    time.sleep(0.1)
    return {"success": True, "action": f"moved to ({x}, {y})"}

def handle_left_click():
    """Perform left mouse click"""
    pyautogui.click()
    time.sleep(0.2)
    return {"success": True, "action": "left click"}

def handle_right_click():
    """Perform right mouse click"""
    pyautogui.rightClick()
    time.sleep(0.2)
    return {"success": True, "action": "right click"}

def handle_double_click():
    """Perform double click"""
    pyautogui.doubleClick()
    time.sleep(0.2)
    return {"success": True, "action": "double click"}

def handle_mouse_drag(start_x: int, start_y: int, end_x: int, end_y: int):
    """Drag from start to end coordinates"""
    pyautogui.moveTo(start_x, start_y)
    time.sleep(0.1)
    pyautogui.dragTo(end_x, end_y, duration=0.5)
    time.sleep(0.2)
    return {"success": True, "action": f"dragged from ({start_x}, {start_y}) to ({end_x}, {end_y})"}

Keyboard Operations

def handle_type_text(text: str):
    """Type text with natural typing speed"""
    pyautogui.write(text, interval=0.05)
    time.sleep(0.2)
    return {"success": True, "action": f"typed text: {text[:50]}..."}

def handle_key_press(key: str):
    """Press a single key or key combination"""
    pyautogui.press(key)
    time.sleep(0.1)
    return {"success": True, "action": f"pressed key: {key}"}

def handle_hotkey(*keys):
    """Press key combination (e.g., Ctrl+C)"""
    pyautogui.hotkey(*keys)
    time.sleep(0.2)
    return {"success": True, "action": f"hotkey: {'+'.join(keys)}"}

# Example keyboard shortcuts
def common_shortcuts():
    return {
        "copy": lambda: handle_hotkey('ctrl', 'c'),
        "paste": lambda: handle_hotkey('ctrl', 'v'),
        "save": lambda: handle_hotkey('ctrl', 's'),
        "undo": lambda: handle_hotkey('ctrl', 'z'),
        "select_all": lambda: handle_hotkey('ctrl', 'a'),
        "tab": lambda: handle_key_press('tab'),
        "enter": lambda: handle_key_press('enter'),
        "escape": lambda: handle_key_press('escape')
    }

Tool Call Handler

Process Claude's tool calls to execute the requested actions:

def process_tool_call(tool_use):
    """Execute tool calls from Claude's response"""
    tool_name = tool_use.name
    tool_input = tool_use.input

    # Computer tool actions
    if tool_name == "computer":
        action = tool_input.get("action")

        if action == "mouse_move":
            return handle_mouse_move(
                tool_input["coordinate"][0],
                tool_input["coordinate"][1]
            )
        elif action == "left_click":
            return handle_left_click()
        elif action == "right_click":
            return handle_right_click()
        elif action == "double_click":
            return handle_double_click()
        elif action == "type":
            return handle_type_text(tool_input["text"])
        elif action == "key":
            return handle_key_press(tool_input["text"])
        elif action == "screenshot":
            return capture_screenshot()

    # Text editor tool
    elif tool_name == "str_replace_editor":
        command = tool_input.get("command")
        if command == "view":
            with open(tool_input["path"], 'r') as f:
                return {"content": f.read()}
        elif command == "str_replace":
            # Implement file editing logic
            pass

    # Bash tool
    elif tool_name == "bash":
        import subprocess
        result = subprocess.run(
            tool_input["command"],
            shell=True,
            capture_output=True,
            text=True
        )
        return {
            "stdout": result.stdout,
            "stderr": result.stderr,
            "exit_code": result.returncode
        }

    return {"error": "Unknown tool or action"}

Challenges with UI Elements

Some UI elements are trickier for Claude to manipulate using mouse movements:

Dropdowns: May require multiple clicks or hovering
Scrollbars: Dragging can be imprecise
Sliders: Fine-tuning values is difficult

Solution: Prompt Claude to use keyboard shortcuts when available (Tab, Arrow keys, Enter) for more reliable interactions.

Workflow Automation Examples

Let's explore real-world automation workflows that demonstrate Computer Use capabilities and best practices.

Example 1: Form Automation

Automatically fill out web forms with data from a structured source:

def automate_form_filling(form_data: dict):
    """Fill web form with provided data"""

    # Initial prompt with form data
    task = f"""
    Fill out the registration form with this data:
    - First Name: {form_data['first_name']}
    - Last Name: {form_data['last_name']}
    - Email: {form_data['email']}
    - Phone: {form_data['phone']}

    Then click the Submit button.
    """

    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": task},
            capture_screenshot()
        ]
    }]

    # Automation loop
    max_iterations = 20
    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            tools=tools,
            messages=messages,
            betas=[COMPUTER_USE_BETA]
        )

        # Process tool calls
        if response.stop_reason == "tool_use":
            tool_results = []
            for content in response.content:
                if content.type == "tool_use":
                    result = process_tool_call(content)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": content.id,
                        "content": str(result)
                    })

            # Add assistant response and tool results
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})

        # Check completion
        elif response.stop_reason == "end_turn":
            print("Form submission complete")
            break

    return True

Example 2: Data Entry from Spreadsheets

Transfer data from Excel/CSV into web applications:

import pandas as pd

def bulk_data_entry(csv_path: str, app_url: str):
    """Enter data from CSV into web application"""

    # Load data
    df = pd.read_csv(csv_path)

    # Open application
    task = f"Open web browser and navigate to {app_url}"
    automate_task(task)

    # Process each row
    for index, row in df.iterrows():
        print(f"Processing row {index + 1}/{len(df)}")

        task = f"""
        Enter the following data into the form:
        - Product: {row['product_name']}
        - Quantity: {row['quantity']}
        - Price: {row['price']}
        - Category: {row['category']}

        Click Save and wait for confirmation.
        Then click 'Add Another' to continue.
        """

        automate_task(task)

        # Brief pause between entries
        time.sleep(2)

    print("Data entry complete")

Example 3: Automated Testing

Test user interfaces by simulating user interactions:

def test_checkout_flow():
    """Test e-commerce checkout process"""

    test_steps = [
        "Navigate to the product page",
        "Click 'Add to Cart' button",
        "Click the shopping cart icon",
        "Verify product appears in cart",
        "Click 'Proceed to Checkout'",
        "Fill shipping address with test data",
        "Select shipping method: Standard",
        "Enter payment info (test card)",
        "Click 'Place Order'",
        "Verify order confirmation appears"
    ]

    results = []

    for step in test_steps:
        print(f"Testing: {step}")

        task = f"""
        {step}

        After completing this action:
        1. Take a screenshot
        2. Verify the action succeeded
        3. Report any errors or unexpected behavior
        """

        result = automate_task(task)
        results.append({
            "step": step,
            "success": result.get("success", False),
            "notes": result.get("notes", "")
        })

        time.sleep(1)

    # Generate test report
    return generate_test_report(results)

Example 4: Document Processing

Extract information from documents and enter into systems:

def process_invoices(pdf_folder: str):
    """Extract data from invoices and enter into accounting system"""

    import os

    pdf_files = [f for f in os.listdir(pdf_folder) if f.endswith('.pdf')]

    for pdf_file in pdf_files:
        print(f"Processing {pdf_file}")

        # Open PDF and extract data
        task = f"""
        1. Open the PDF file at {os.path.join(pdf_folder, pdf_file)}
        2. Extract the following information:
           - Invoice number
           - Date
           - Vendor name
           - Total amount
           - Line items with descriptions and amounts
        3. Take note of all extracted data
        """

        extracted_data = automate_task(task)

        # Enter into accounting system
        entry_task = f"""
        1. Open the accounting software
        2. Click 'New Invoice Entry'
        3. Fill in the extracted data:
           {extracted_data}
        4. Attach the PDF file
        5. Click 'Save'
        6. Verify the entry was saved successfully
        """

        automate_task(entry_task)

        # Move processed file
        os.rename(
            os.path.join(pdf_folder, pdf_file),
            os.path.join(pdf_folder, 'processed', pdf_file)
        )

CRM Integration: Looking to automate CRM data entry? Explore our CRM & Automation services for custom workflow solutions.

Safety Guidelines & Best Practices

Computer Use introduces new security considerations. Following Anthropic's safety guidelines is essential for responsible deployment.

Critical Safety Requirements

Run in Isolated Environments: Always use virtual machines or containers, never on your main system
Minimal Privileges: Grant only necessary permissions and filesystem access
No Production Credentials: Never expose production API keys or passwords
Network Isolation: Restrict network access to only required services

Security Vulnerabilities

Anthropic acknowledges that Computer Use is susceptible to:

Jailbreaking: Attempts to bypass safety guidelines through adversarial prompts
Prompt Injection: Claude may follow commands found in on-screen content, potentially conflicting with user instructions
Unintended Actions: Model errors could trigger destructive operations

Development Best Practices

1. Start with Low-Risk Tasks

Begin exploration with non-critical workflows:

Data entry into test environments
Form filling with dummy data
UI testing without production access

2. Implement Human-in-the-Loop

Require confirmation for sensitive operations:

def require_confirmation(action: str) -> bool:
    """Request human confirmation for sensitive actions"""
    print(f"\nClaude wants to perform: {action}")
    response = input("Allow this action? (yes/no): ")
    return response.lower() == "yes"

# In tool handler
if action_is_sensitive(action):
    if not require_confirmation(action):
        return {"error": "Action denied by user"}

3. Monitor and Log All Actions

Maintain audit trail of all Computer Use actions:

import json
from datetime import datetime

def log_action(action_type: str, details: dict):
    """Log all Computer Use actions"""
    log_entry = {
        "timestamp": datetime.now().isoformat(),
        "action_type": action_type,
        "details": details
    }

    with open("computer_use_audit.jsonl", "a") as f:
        f.write(json.dumps(log_entry) + "\n")

# Log every tool call
log_action("mouse_click", {"x": 100, "y": 200})
log_action("type_text", {"text": "username"})

4. Set Timeouts and Iteration Limits

Prevent runaway automation loops:

def automate_with_limits(task: str, max_steps: int = 50, timeout_seconds: int = 300):
    """Run automation with safety limits"""
    start_time = time.time()

    for step in range(max_steps):
        # Check timeout
        if time.time() - start_time > timeout_seconds:
            raise TimeoutError("Automation exceeded time limit")

        # Execute step
        result = execute_step(task)

        if result.get("complete"):
            return result

    raise RuntimeError("Exceeded maximum step count")

Anthropic's Safety Classifiers

Anthropic has developed new classifiers that identify when Computer Use is being employed and whether harmful actions are occurring. These classifiers help detect:

Spam generation attempts
Misinformation creation
Fraud or malicious automation

Current Limitations

Computer Use is in public beta and has notable limitations. Understanding these constraints helps set realistic expectations and plan appropriate use cases.

Performance Challenges

Slow Execution: Significantly slower than human operation due to screenshot analysis and planning overhead
Action Errors: Mistakes are common, requiring error recovery and retries
UI Navigation Issues: Complex interfaces with many elements can confuse the model

Difficult Actions

Anthropic notes that some actions people perform effortlessly present challenges for Claude:

Scrolling: Both page scrolling and precise scrollbar manipulation
Dragging: Click-and-drag operations, especially over long distances
Zooming: Adjusting zoom levels or map navigation

Workaround: Use keyboard alternatives when available (Page Down, Arrow keys, keyboard shortcuts).

API and Model Constraints

Single Model: Only available with Claude Sonnet 4.5 (as of January 2025)
Beta Header Required: API changes may occur as the feature evolves
High Token Usage: Screenshots and tool definitions consume significant context

When NOT to Use Computer Use

Computer Use is not optimal for:

Tasks with available APIs (use API integration instead)
Real-time or time-sensitive operations
Production environments without supervision
Tasks requiring high precision or zero error tolerance
Systems with sensitive data or credentials

Future Improvements

Anthropic expects Computer Use capabilities to improve rapidly over time:

Better Accuracy: Reduced errors through improved training
Faster Execution: Optimized screenshot analysis and action planning
Advanced Actions: Better handling of complex UI interactions
Additional Models: Potential expansion to Opus and other model tiers

Your feedback during this beta period directly shapes these improvements.

Frequently Asked Questions

Which Claude models support Computer Use?

As of January 2025, only Claude Sonnet 4.5 supports Computer Use via the anthropic-beta: computer-use-2025-01-24 API header. Anthropic may expand support to other models like Opus 4 in the future based on beta feedback and development progress.

How much does Computer Use cost?

Computer Use runs on Claude Sonnet 4.5 standard API pricing:

$3 per million input tokens
$15 per million output tokens
Plus 466 tokens for system prompt and 499 tokens per tool definition

Screenshots typically consume 100-200 tokens each depending on resolution. A workflow with 20 screenshots might use 2,000-4,000 tokens for images alone.

Can I use Computer Use in production?

Computer Use is currently in public beta and is "at times cumbersome and error-prone" according to Anthropic. While technically usable in production, best practices recommend:

Starting with low-risk, non-critical workflows
Implementing human-in-the-loop for sensitive operations
Extensive testing in isolated environments first
Monitoring for errors and having fallback procedures

What's the difference between Computer Use and RPA tools?

Traditional RPA (Robotic Process Automation) tools like UiPath or Automation Anywhere require explicit programming of each action and are brittle when UIs change. Computer Use offers:

Natural Language Instructions: Describe what you want, not how to do it
Adaptability: Can handle UI changes and unexpected situations
Visual Understanding: Interprets interfaces like a human would

However, RPA tools may still be more reliable for well-defined, high-volume workflows where Computer Use's current error rates would be problematic.

How secure is Computer Use?

Computer Use introduces security risks including jailbreaking and prompt injection. Anthropic's recommendations:

Isolated Environments: Always use VMs or containers
Minimal Privileges: Restrict filesystem and network access
No Sensitive Data: Don't expose production credentials
Safety Classifiers: Anthropic detects harmful automation attempts

Never run Computer Use with access to production systems or sensitive information without proper isolation.

Can Computer Use work across multiple monitors?

The Computer tool configuration includes a display_number parameter to specify which monitor to control. However, current implementations typically work with single-monitor setups for simplicity.

For multi-monitor workflows, you would need to specify which display Claude should interact with and adjust coordinate calculations accordingly based on each monitor's resolution and position.

How do I debug Computer Use failures?

Common debugging strategies:

Review Screenshots: Save each screenshot Claude analyzes to see what it's seeing
Log All Actions: Record every tool call and result for post-analysis
Check Coordinates: Verify mouse positions match intended targets
Simplify Tasks: Break complex workflows into smaller, testable steps
Add Error Recovery: Implement retry logic and alternative approaches

Where can I find example implementations?

The official Anthropic Computer Use resources:

Quick Start Repository: github.com/anthropics/anthropic-quickstarts
Documentation: docs.claude.com - Computer Use Tool
Community Examples: Check the Anthropic Discord and GitHub discussions for shared implementations

Ready to Automate with Computer Use?

Anthropic's Computer Use API represents a breakthrough in desktop automation—enabling AI to interact with computers the way humans do. While still in beta with notable limitations, it opens unprecedented possibilities for workflow automation across any application interface.

Start with low-risk tasks in isolated environments, implement proper safety measures, and provide feedback to help shape this emerging technology. As Computer Use matures, it will transform how we automate complex, multi-application workflows.

Explore AI Automation Services

AI Development

Claude Agent Skills Framework: Build Specialized AI Agents

Master Claude Agent Skills: organized instructions, dynamic loading, domain-specific agents. Complete guide with code examples and production patterns.

11 min read

AI Development

Claude Sonnet 4.5 vs GPT-5 Pro: Complete 2025 Comparison

Compare Claude Sonnet 4.5 vs GPT-5 Pro: SWE-bench scores, pricing, coding performance, and use cases. Which AI model is best for your needs in 2025?

13 min read

AI Development

Zhipu AI GLM 4.6 vs Claude Sonnet 4.5: Open-Source Coding Model

GLM 4.6 challenges Claude Sonnet 4.5 with 200K context, 15% efficiency gains & MIT license. Complete comparison with benchmarks, pricing & deployment.

13 min read

Anthropic Computer Use API: Desktop Automation Guide

Key Takeaways

What Is Computer Use API?

Key Capabilities

Current API Version

How Computer Use Works

The Execution Cycle

Why Pixel Counting Matters

API Setup & Configuration

Quick Start with Docker

Python SDK Installation

Basic API Configuration

API Pricing & Limits

Screenshot Analysis

Capturing Screenshots

Sending Screenshots to Claude

Understanding Claude's Analysis

Mouse & Keyboard Control

Mouse Operations

Keyboard Operations

Tool Call Handler

Workflow Automation Examples

Safety Guidelines & Best Practices

Security Vulnerabilities

Development Best Practices

Anthropic's Safety Classifiers

Current Limitations

Frequently Asked Questions

Ready to Automate with Computer Use?

Related Articles