Browser-automation agents bifurcated in 2025-2026 between DOM-driven approaches (Playwright + Claude, Stagehand, Browserbase) and vision-driven approaches (Anthropic Computer Use, OpenAI CUA). The DOM-driven stacks are 12-17 percentage points more reliable on common tasks; the vision-driven stacks unlock workloads the DOM-driven stacks can't reach (canvas-only apps, image-driven UIs, anti-bot screens).
We compare five stacks across reliability, runtime locality, cost, DX, and best-fit workload. Most teams default to a DOM-driven stack (Playwright + Claude or Stagehand) for the 80% of workloads it covers, and reach for a vision-driven stack only when DOM access fails.
This post covers the 7-axis matrix, deep dives on each stack, and four reference workloads we run for clients today — data extraction, form-filling automation, QA testing, and competitive intelligence.
- 01 — DOM-driven stacks beat vision-driven stacks on common-task reliability by 12-17 points. Playwright + Claude scores 92% reliability on common browser-automation tasks; Stagehand 89%; Browserbase 90%. Anthropic Computer Use scores 78%; OpenAI CUA scores 75%. The gap is real and persists across task types — DOM access is more reliable than vision-driven inference for the 80% of tasks where DOM is available. Use vision-driven stacks only for workloads where DOM access fails.
- 02 — Playwright + Claude is the DX leader: deterministic, agentic, and cheapest at scale. Playwright is the deterministic web-automation gold standard; pairing it with Claude (or another LLM) for natural-language task definition and DOM reasoning produces the cleanest developer experience in the field. Self-hosted Playwright + LLM API costs $0.02-0.10/task (cheapest at scale). Right primary for engineering teams that own their automation infrastructure.
- 03 — Stagehand is the cleanest dev abstraction: Playwright underneath, agent ergonomics on top. Stagehand by Browserbase wraps Playwright with agent-friendly methods (act, observe, extract). The abstraction reduces boilerplate by 60-70% vs raw Playwright + LLM glue. Pairs naturally with the Browserbase managed runtime. Right pick for teams that want agent ergonomics without hand-rolling the LLM-to-Playwright glue.
- 04 — Browserbase is the managed-runtime leader: pay-per-minute browser-as-a-service. Browserbase offers managed Chromium with CDP access at $0.10-0.40/browser-minute. Right pick when self-hosting Playwright at scale becomes operational toil — managed CAPTCHA solving, anti-bot evasion, residential proxy support, session recording. Pairs natively with Stagehand. Cost ramps with scale; under 100 browser-hours/month, self-hosted Playwright wins on cost.
- 05 — Computer Use + CUA are vision-driven: for workloads where DOM fails. Anthropic Computer Use and OpenAI Computer-Using-Agent operate on screen pixels rather than DOM. They reach workloads that DOM-driven stacks can't (canvas-heavy apps, image-driven UIs, anti-bot screens that obscure DOM). Trade-offs: a 12-17 point reliability gap to DOM-driven on common tasks; 4-8x cost; cloud-only runtime for CUA. Use as fallback, not primary.
01 — The Field
The 2026 browser-agent field.
Browser-automation agents are at the intersection of three ecosystems: deterministic web-automation (Playwright, Puppeteer, Selenium), LLM-driven reasoning (Claude, GPT-5.5, Gemini 3), and cloud-runtime infrastructure (Browserbase, Apify, ScrapingBee). By 2026, the production-grade stacks combine pieces from each — the five we compare here represent the dominant combinations.
The five stacks split on two primary axes: DOM-driven vs vision-driven (which surface the agent operates on), and self-hosted vs managed runtime (where the browser actually runs). DOM-driven self-hosted (Playwright + Claude) is the cheapest and highest-reliability default; managed runtimes pay back when scale becomes operational toil; vision-driven stacks unlock workloads the DOM-driven stacks can't reach.
Playwright + Claude — DX leader
Self-hosted · DOM-driven · LLM API only
Deterministic Playwright + Claude (or any frontier LLM) for natural-language task definition and DOM reasoning. Self-hosted runtime; pay only for LLM API calls. Cheapest at scale; cleanest DX for engineering-owned infrastructure.
Engineering teams
Stagehand — agent abstraction
act/observe/extract API · Browserbase or self-hosted
Stagehand wraps Playwright with agent-friendly methods. Reduces boilerplate by 60-70% vs raw Playwright + LLM glue. Pairs naturally with Browserbase. Right pick for teams that want agent ergonomics without DIY plumbing.
DX-first abstraction
Browserbase — managed runtime
Cloud Chromium · CDP-as-a-service · $0.10-0.40/min
Pay-per-minute managed Chromium. CAPTCHA solving, anti-bot evasion, residential proxies, session recording. Right pick when self-hosting at scale becomes operational toil.
Managed runtime
Anthropic Computer Use
Vision-driven · Claude · screen control
Operates on screen pixels instead of DOM. Reaches workloads DOM-driven stacks can't (canvas apps, image UIs, anti-bot screens). 78% reliability on common tasks; runs against any browser the agent can see.
Vision-driven
OpenAI Computer-Using-Agent
Cloud-only · OpenAI · vision-driven
OpenAI's vision-driven counterpart. Cloud-only runtime; OpenAI-locked. 75% reliability on common tasks. Right pick when OpenAI lock-in is acceptable and the workload is vision-driven.
OpenAI vision
02 — Matrix
Feature matrix, five stacks.
The matrix below covers the seven capabilities that drive 2026 browser-agent decisions: reliability on common tasks, runtime locality, DOM vs vision surface, cost per task, DX (developer experience), provider posture, and best-fit workload.
Reliability on common tasks
Playwright + Claude wins (92%). Browserbase 90%, Stagehand 89%, Computer Use 78%, CUA 75%. The DOM-driven stacks lead by 12-17 percentage points on common tasks. Use vision-driven stacks only when DOM access fails.
Playwright + Claude
Runtime locality (self-hosted vs managed)
Playwright + Claude: self-hosted only. Stagehand: self-hosted or Browserbase managed. Browserbase: managed only. Computer Use: any browser the agent can see (most flexible). CUA: cloud-only OpenAI runtime.
Computer Use most flexible
DOM vs vision surface
DOM-driven (more reliable for 80% of tasks): Playwright + Claude, Stagehand, Browserbase. Vision-driven (reaches workloads DOM can't): Anthropic Computer Use, OpenAI CUA. Pick DOM-driven first; fall back to vision when DOM fails.
DOM-driven for 80%
Cost per task
Playwright + Claude $0.02-0.10/task (cheapest at scale). Stagehand $0.05-0.15. Browserbase $0.10-0.40 (browser-minute pricing). Computer Use $0.20-0.40 (vision tokens). CUA $0.20-0.50 (vision + OpenAI premium).
Playwright + Claude
Developer experience (DX)
Stagehand wins on agent abstractions (act/observe/extract reduce boilerplate 60-70%). Playwright + Claude wins on flexibility. Browserbase + Stagehand together produce the cleanest managed-runtime DX. Computer Use + CUA have minimal DX surface — they're vision-only.
Stagehand (abstraction) · Playwright (flexibility)
Provider posture (lock-in)
Playwright + Claude: provider-flexible (Claude swappable for any LLM). Stagehand: provider-flexible. Browserbase: tied to Browserbase managed runtime. Computer Use: Anthropic-only model. CUA: OpenAI-only model + cloud.
Playwright (most flexible)
Best-fit workload
Playwright + Claude: engineering-team workflows, scale-cost-sensitive. Stagehand: any team that wants agent abstractions. Browserbase: managed runtime needs (anti-bot, CAPTCHA). Computer Use: canvas/image-driven UIs. CUA: vision tasks where OpenAI lock-in is acceptable.
Match workload
03 — Playwright + Claude
Playwright + Claude — the DX leader.
Playwright is the deterministic web-automation gold standard. Pairing it with Claude (or another frontier LLM) for natural-language task definition and DOM reasoning produces the cleanest developer experience for engineering-owned automation. Self-hosted runtime; pay only for LLM API calls. The combination is the cheapest scale-out path and remains the highest-reliability default for DOM-accessible workloads.
Highest common-task reliability
Playwright's deterministic browser control combined with Claude's DOM reasoning hits 92% reliability on common automation tasks (data extraction, form filling, navigation). The gap to vision-driven stacks (75-78%) is real and persists across task families.
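The control loop underneath this pairing can be sketched with the LLM call stubbed out. Everything named below is illustrative, not a real SDK: in the actual stack, the planner would send the task plus a DOM snapshot to the Anthropic API, and the executor would drive a live Playwright page.

```python
# Sketch of the DOM-driven agent loop: an LLM planner proposes one
# action at a time; a deterministic executor applies it. The planner
# here is a stub standing in for the Claude API call.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str            # "fill" | "click" | "done"
    selector: str = ""
    value: str = ""

def plan_next_action(task: str, dom: str, history: list) -> Action:
    """Stub planner: fill the query box, submit, then stop."""
    if not history:
        return Action("fill", "input[name=q]", task)
    if history[-1].kind == "fill":
        return Action("click", "button[type=submit]")
    return Action("done")

def run_agent(task: str, dom: str, max_steps: int = 5) -> list:
    """Deterministic executor: ask the planner, apply one action, repeat."""
    history = []
    for _ in range(max_steps):
        action = plan_next_action(task, dom, history)
        history.append(action)
        if action.kind == "done":
            break
        # Real stack: page.fill(...) / page.click(...) via Playwright here.
    return history
```

The reliability advantage comes from the executor side: actions resolve against selectors deterministically, so the LLM only has to reason about *what* to do, never about pixel coordinates.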
Reliability leader
Cheapest at scale
Self-hosted Playwright + Claude API calls land at $0.02-0.10 per task. At 1000+ tasks/month, the cost gap to managed runtimes (Browserbase $0.10-0.40/task, CUA $0.20-0.50/task) compounds. Right default for high-volume automation.
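Using the per-task ranges quoted in this post, the monthly spend envelopes are easy to compare directly (the function and dictionary below are illustrative, not a billing API):

```python
# Monthly cost envelope (low, high) per stack at a given task volume,
# from the per-task USD ranges quoted in this post.
PER_TASK_USD = {
    "playwright_claude": (0.02, 0.10),
    "stagehand":         (0.05, 0.15),
    "browserbase":       (0.10, 0.40),
    "computer_use":      (0.20, 0.40),
    "openai_cua":        (0.20, 0.50),
}

def monthly_cost(stack: str, tasks_per_month: int) -> tuple:
    """Return (low, high) monthly USD cost for a stack."""
    lo, hi = PER_TASK_USD[stack]
    return (round(lo * tasks_per_month, 2), round(hi * tasks_per_month, 2))
```

At 10,000 tasks/month, Playwright + Claude lands at $200-1,000 while CUA lands at $2,000-5,000 — the 4-8x gap the post cites, compounding every month.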
Cost leader
Operational ownership at scale
Self-hosted Playwright requires browser-fleet management at scale: CAPTCHA handling, anti-bot evasion, residential proxy rotation, session recording. Pays back on cost but adds operational toil. Above 100 browser-hours/month, evaluate managed runtimes.
Self-hosted toil
"Playwright + Claude wins on reliability and cost. Stagehand wins on developer experience. Browserbase wins on operational simplicity. Pick by which axis hurts most."
— Internal browser-agent retro, March 2026
04 — Stagehand
Stagehand — the agent abstraction leader.
Stagehand by Browserbase wraps Playwright with agent-friendly methods (act, observe, extract). The abstraction reduces boilerplate by 60-70% vs raw Playwright + LLM glue. Pairs naturally with Browserbase managed runtime but works with self-hosted Playwright too. Right pick for teams that want agent ergonomics without hand-rolling the LLM-to-Playwright integration.
Boilerplate reduction
Stagehand's act/observe/extract methods reduce boilerplate by 60-70% vs raw Playwright + LLM glue. The abstraction is well-designed — high enough to remove plumbing, low enough that the underlying Playwright is still accessible when needed.
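The shape of the abstraction can be sketched as follows. This is NOT the Stagehand SDK — class and method internals are illustrative stubs — but it shows where the boilerplate reduction comes from: one natural-language call replaces selector lookup, waiting, and LLM prompt glue.

```python
# Illustrative act/observe/extract wrapper over a Playwright-style page.
class AgentPage:
    def __init__(self, page, llm):
        self.page = page  # Playwright-style page (faked below)
        self.llm = llm    # client that maps instructions to actions

    def act(self, instruction):
        """Resolve an instruction to one concrete action and perform it."""
        self.page.perform(self.llm.plan(instruction, self.page.snapshot()))

    def observe(self, instruction):
        """List elements relevant to the instruction."""
        return self.llm.find(instruction, self.page.snapshot())

    def extract(self, instruction, schema):
        """Pull structured data matching a schema out of the page."""
        return self.llm.parse(instruction, schema, self.page.snapshot())

# Minimal fakes so the sketch runs without a browser or an LLM.
class FakePage:
    def __init__(self, html): self.html, self.performed = html, []
    def snapshot(self): return self.html
    def perform(self, action): self.performed.append(action)

class FakeLLM:
    def plan(self, instruction, dom): return ("click", instruction)
    def find(self, instruction, dom): return ["button#submit"]
    def parse(self, instruction, schema, dom): return {k: None for k in schema}
```

The design point worth noting: because the wrapper holds the raw page object, nothing stops you from dropping down to plain Playwright calls when the abstraction gets in the way — which is exactly the escape hatch the "low enough that the underlying Playwright is still accessible" claim describes.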
DX abstraction
Near-Playwright reliability
89% common-task reliability — only 3 points below raw Playwright + Claude (92%). The abstraction overhead is small. Pairs naturally with Browserbase for managed runtime; works with self-hosted Playwright too.
Reliable abstraction
Younger ecosystem
Stagehand is younger than raw Playwright. Community size is smaller; debugging resources are thinner; edge cases occasionally surface where the abstraction layer adds friction. The trade-off is minor for most workloads but real for the truly unusual.
Younger ecosystem
05 — Browserbase
Browserbase — the managed runtime leader.
Browserbase offers managed Chromium with CDP access as a service. Pay-per-minute pricing ($0.10-0.40/browser-minute) covers managed CAPTCHA solving, anti-bot evasion, residential proxy support, and session recording. Right pick when self-hosting Playwright at scale becomes operational toil — typically above 100 browser-hours/month.
Managed CAPTCHA + anti-bot evasion
Browserbase handles the operational layer that breaks self-hosted Playwright at scale: CAPTCHA solving via vendor partnerships, anti-bot evasion patterns, fingerprint randomization, residential proxies. Pays back at any scale where these become recurring failures.
Operational simplicity
Pay-per-minute economics
$0.10-0.40 per browser-minute (varies by features). At 100+ browser-hours/month the cost is meaningful but pays back on operational simplicity. Below ~100 hours/month, self-hosted Playwright wins on cost. Crossover point depends on the team's ops capacity.
Scale-cost trade
Cost ramps with scale
At 1000+ browser-hours/month, managed runtime costs compound vs self-hosted Playwright + Claude. The crossover point depends on how much ops time the team has: the less engineering capacity available, the higher the volume at which Browserbase remains the better deal.
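One way to locate that crossover is to price the ops toil you would eliminate. Only the $/browser-minute range comes from this post; the ops-hours and engineer-hour inputs below are assumptions you would plug in from your own team.

```python
# Break-even browser-hours where managed runtime overtakes self-hosting.
# Below the returned value, self-hosted Playwright is cheaper; above it,
# the ops time saved no longer covers the per-minute premium.
def breakeven_browser_hours(
    managed_per_minute: float,   # $/browser-minute, e.g. 0.10-0.40 here
    ops_hours_per_month: float,  # self-hosting toil you'd eliminate
    eng_hourly_cost: float,      # loaded cost of an engineer-hour
) -> float:
    """Browser-hours/month at which the two options cost the same."""
    ops_cost = ops_hours_per_month * eng_hourly_cost
    return ops_cost / (managed_per_minute * 60)
```

At a mid-range $0.25/minute, 20 ops-hours/month saved, and a $120/hour engineer, the break-even is 160 browser-hours/month — the same order of magnitude as the ~100-hour rule of thumb this post uses.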
Scale-dependent
06 — Computer Use + CUA
Vision-driven — Computer Use and CUA.
Anthropic Computer Use and OpenAI Computer-Using-Agent operate on screen pixels rather than DOM. They reach workloads that DOM-driven stacks can't (canvas-heavy apps, image-driven UIs, anti-bot screens that obscure DOM). The trade-offs are real: 12-17 point reliability gap to DOM-driven on common tasks, 4-8x cost, and cloud-only runtime for CUA. Use as fallback, not primary.
Anthropic Computer Use — vision-driven · Claude · 78% reliability
Operates on screen pixels with Claude reasoning. Runs against any browser the agent can see — local Chrome, headless, remote VM. 78% reliability on common tasks. Pairs naturally with Claude Code or Anthropic API as the agent surface. Right when DOM fails or the workload is canvas-heavy.
Vision · flexible runtime
OpenAI CUA — cloud-only · OpenAI · 75% reliability
OpenAI's vision-driven counterpart. Cloud-only runtime — agent runs in OpenAI's managed VMs. OpenAI-locked. 75% reliability on common tasks. Right when OpenAI lock-in is acceptable and the team values managed-runtime simplicity over flexibility.
OpenAI-native vision
07 — Reference Workloads
Four reference workloads.
Below are the four browser-automation workloads we deploy most often for client engagements, with the stack recommendation that consistently wins on each. The mapping isn't absolute, but each pairing is the path of least friction.
Data extraction (structured scraping)
Pull structured data from a list of URLs. DOM-driven; tasks are short and high-volume. Cost matters most. Playwright + Claude is the default; self-hosted runtime; pay only for LLM API calls. Add Browserbase if anti-bot evasion becomes a bottleneck.
Playwright + Claude
Form-filling automation (cross-app workflows)
Fill complex forms, navigate multi-step workflows, handle error states. DOM-driven; medium duration; agent ergonomics matter. Stagehand (with Browserbase or self-hosted Playwright) wins on developer experience and reliability for these workloads.
Stagehand + Browserbase
QA testing (visual + functional)
Run end-to-end tests against a web app. DOM-driven for functional tests; vision-driven for visual regression. Playwright + Claude for functional flows; layer Computer Use for visual regression where DOM diffing isn't enough.
Playwright + Claude (+ Computer Use)
Competitive intelligence (anti-bot-heavy sites)
Pull data from sites with active anti-bot defenses (price aggregators, booking, ticketing). DOM access is intermittently blocked; managed runtime helps. Browserbase + Stagehand handles most cases; fall back to Computer Use when the DOM is fully obscured.
Browserbase + Stagehand (+ Computer Use)
08 — Conclusion
Pick by workload + ops capacity, not novelty.
There is no single best browser-agent stack. There are right defaults per workload and ops capacity.
By April 2026 the browser-automation field has consolidated to five production-grade stacks: Playwright + Claude, Stagehand, Browserbase, Anthropic Computer Use, and OpenAI CUA. Each occupies a different spot on the trade-off surface, and each wins on its home territory. There is no "best" stack in the abstract; there is the right default for the workload and the team's ops capacity.
The pattern that scales: pick the DOM-driven stack first (Playwright + Claude or Stagehand) for the 80% of workloads it covers. Add Browserbase as managed runtime when self-hosting becomes operational toil (typically above 100 browser-hours/month). Reach for vision-driven stacks (Computer Use or CUA) only when DOM access fails — canvas apps, image UIs, anti-bot screens that obscure DOM.
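That decision rule can be written down as a small router. The field names and the 100-hour threshold mirror this post; the function itself is illustrative, not a real API.

```python
# DOM-first routing rule: vision only when DOM fails, managed runtime
# only above the scale threshold, self-hosted Playwright as the default.
def pick_stack(dom_accessible: bool, browser_hours_per_month: float,
               wants_agent_ergonomics: bool = False) -> str:
    if not dom_accessible:
        return "computer-use"           # vision-driven fallback
    if browser_hours_per_month > 100:
        return "browserbase+stagehand"  # managed runtime above ~100 h/mo
    if wants_agent_ergonomics:
        return "stagehand"              # same DOM surface, less glue code
    return "playwright+claude"          # self-hosted, cheapest default
```

Note the ordering: DOM accessibility is checked first because it is the only axis that changes *which surface* the agent operates on; scale and ergonomics only change *where and how* the same DOM-driven stack runs.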
The right move for most engineering teams: standardize on Playwright + Claude as the primary; add Browserbase when scale demands it; layer Computer Use as the vision-driven fallback. The three-stack pattern covers ~95% of browser-automation workloads with disciplined cost economics and a single primary mental model.