AI Development · 9 min read

OpenAI Codex Desktop: Computer Use + 90+ App Plugins

OpenAI Codex Desktop v26.415 shipped April 16 with background computer use and 90+ plugins. Figma, Notion, GitHub. Benchmark head-to-head vs Claude Code.

Digital Applied Team
April 19, 2026
Released Apr 16 · 90+ plugins · 3M weekly users · +70% MoM growth

Key Takeaways

Codex Desktop v26.415 Is the Big Jump: April 16 release adds background computer use, 90+ plugin marketplace, in-app browser (Atlas), gpt-image-1.5, memory preview, and thread automations.
Computer Use Is Not in EEA, UK, or Switzerland: Regional carveout at launch. Computer use is macOS-only for now — Windows and Linux timelines are undated.
Codex Is Cheaper, Claude Produces Better Code: Feb 2026 benchmarks: Claude uses 3–4x more tokens per task but wins 67% of blind code-quality comparisons. Codex leads Terminal-Bench (77.3% vs 65.4%).
Plugins Bundle Skills + Apps + MCP: Single installable packages combining prompt workflows, GUI integrations, and MCP servers. Install via /plugins. Self-publish coming.
Credential Exposure Is the Real Risk: Codex drives logged-in apps with full session privileges. Mitigate with a dedicated macOS user account isolated from the operator's primary work account.

On April 16, 2026, OpenAI shipped Codex Desktop v26.415 under the title "Codex for (almost) everything." The release brings background computer use, a 90+ plugin marketplace, an in-app browser built on OpenAI's Atlas technology, gpt-image-1.5 image generation, persistent memory, and thread automations that resume across days or weeks.

The framing matters. Codex Desktop is not the Codex CLI we benchmarked against OpenClaw and Hermes on April 18 — it is a GUI application with computer-use capability that drives Figma, Notion, browsers, and any other macOS app visually. This post covers what shipped, how it compares to Claude Code (which shipped similar capabilities on March 23), which tasks run better on which tool, and the setup and security posture agencies should know before deploying it.

What Shipped on April 16

The April 16 changelog lists nine substantial feature additions:

| Feature | What it does | Regional or platform caveat |
|---|---|---|
| Background computer use | Agents click and type across desktop apps in parallel, without hijacking the foreground session | macOS only; not available in EEA, UK, Switzerland |
| In-app browser (Atlas tech) | Browse and comment on pages to instruct the agent; localhost preview support | All desktop users |
| Image generation | gpt-image-1.5 baked in for mockups, assets, concepts — no ChatGPT round-trip | All paying tiers |
| Memory (preview) | Persists preferences, tech stacks, recurring workflows across threads | Enterprise / Edu / EU / UK rollout delayed |
| Thread automations | Schedule a thread to wake up later and resume work across days or weeks | All desktop users |
| 90+ plugin marketplace | Curated packages that bundle skills, app integrations, MCP servers | All desktop users |
| GitHub PR inspection | Inline diff review without leaving Codex | All desktop users |
| SSH remote devbox | Run Codex against remote dev environments | Alpha label — expect instability |
| Intel Mac support | First-time support; previously Apple Silicon only since the Feb 2, 2026 desktop launch | macOS 13+ |

Adoption context: OpenAI reported 3 million weekly Codex developers with 70% month-over-month growth in the same announcement. The upgrade is aimed at that growth curve — every net-new feature here (computer use, plugins, memory, automations) is the kind of thing power users ask for once they're past basic coding usage.

The 90+ Plugin Marketplace

The plugin marketplace had a soft rollout on March 27, 2026; the April 16 release expanded it past 90. OpenAI's own copy says "more than 90" — TechCrunch-derived coverage cites 111. Either way, it is a broad surface.

Plugin architecture combines three things into single installable packages:

  • Skills — predefined prompt workflows for recurring tasks
  • App integrations — native actions in the target application, whether via API or GUI automation
  • MCP server configurations — Model Context Protocol servers bundled for installation in one step

Confirmed plugins across official and tier-1 coverage, grouped by category:

| Category | Agency-relevant plugins |
|---|---|
| Design | Figma (deep code-to-design integration), Adobe Creative Cloud |
| Docs / Knowledge | Notion, Box, Google Drive, Google Workspace (Gmail) |
| Dev workflow | GitHub, GitLab Issues, CircleCI, Atlassian Rovo, Jira, Linear, Sentry, CodeRabbit, Hugging Face, Render |
| Communication | Slack, Microsoft Teams, Microsoft Suite |
| Project management | Trello, Jira, Linear |
| Data | SQL database connectors |
| Scheduling | Google Calendar |

Plugins install via /plugins in the terminal. Self-published and team-wide marketplaces are supported, which means agencies can ship internal plugin bundles to client teams — useful for repeatable audit workflows (Lighthouse runs, GA4 exports, Search Console pulls) that the agency wants consistent across engagements.

How Computer Use Actually Works

This is the area with the thinnest public documentation. OpenAI's own changelog describes Codex as operating "macOS apps by seeing, clicking, and typing with its own cursor." Third-party technical explainers characterize the underlying mechanism as screenshot-plus-vision: the agent reads the screen, interprets it with vision models, and generates click-and-type actions.

The contrast with Anthropic's Claude Code is architectural and worth knowing. Anthropic's published behavior: Claude "reaches for the most precise available tool first; if a connector exists with a structured API integration, Claude uses that connector because it is faster, more reliable, and less error-prone than navigating a visual interface." Codex defaults to screen interpretation. Claude defaults to API connectors when available and falls back to screen interpretation when nothing else exists.

| Aspect | Codex Desktop | Claude Code |
|---|---|---|
| Default action method | Screenshot + vision | Structured API connector first; vision fallback |
| Execution model | Background, parallel with user | Session-based |
| Visual-acuity benchmark (Apr 2026) | Not published | 98.5% (Opus 4.7) vs 54.5% (Opus 4.6) |
| Click-to-action latency | Not published | Not published |
| Rate limits | Not published | Tier-dependent (Pro, Max, Enterprise) |
| macOS TCC permissions documented | No | No |

The practical upshot: for tasks where a clean API exists (GitHub PR actions, Linear ticket creation, Google Workspace document edits), Claude's tiered approach tends to be more reliable because the agent doesn't burn tokens interpreting a screen it didn't need to see. For tasks where no API exists (older desktop apps, niche design tools, legacy workflows), Codex's vision-first approach gets more work done simply because Claude's API-first fallback may not find a connector and degrade to vision anyway.
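The tiered action selection described above can be sketched in a few lines. This is a purely illustrative model, not Anthropic's or OpenAI's actual implementation; the `Connector` type and the stubbed actions are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Connector:
    """A structured API integration for one target app (hypothetical model)."""
    app: str
    act: Callable[[str], str]

def choose_action_path(app: str, task: str,
                       connectors: dict[str, Connector],
                       vision_fallback: Callable[[str, str], str]) -> str:
    """Claude-style tiering: prefer a structured connector, and fall back to
    screen interpretation only when no connector exists. A Codex-style agent,
    by contrast, would take the vision path for everything."""
    conn = connectors.get(app)
    if conn is not None:
        return conn.act(task)          # fast, deterministic API call
    return vision_fallback(app, task)  # screenshot + vision interpretation

# Illustrative usage with stubbed actions:
connectors = {"github": Connector("github", lambda t: f"api:{t}")}
vision = lambda app, t: f"vision:{app}:{t}"

print(choose_action_path("github", "open-pr", connectors, vision))      # api:open-pr
print(choose_action_path("legacy-app", "export", connectors, vision))   # vision:legacy-app:export
```

The design point the sketch makes concrete: the connector branch never pays the token cost of interpreting a screen, which is why the tiered default tends to be cheaper and more reliable whenever a connector exists.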

Where Claude Code Stands in April 2026

Claude Code shipped computer use on March 23, 2026 as a research preview for Pro and Max subscribers — roughly three weeks before Codex. Current April 2026 capabilities:

  • Model — Claude Opus 4.7 (launched Q1 2026 at the same $5 input / $25 output per MTok pricing as Opus 4.6). Visual-acuity benchmarks jumped from 54.5% on 4.6 to 98.5% on 4.7, which matters disproportionately for computer-use reliability.
  • Platforms — macOS, Windows, and Linux via the Claude Code CLI; desktop GUI through the Claude.ai app ("Cowork" integration).
  • Action hierarchy — structured API connectors preferred over GUI vision. Falls back to computer-use-style screen interpretation only when no connector exists.
  • Other Q1 2026 adds — Remote Control, Dispatch, Channels, Auto Mode, AutoDream, and /loop scheduled tasks. Claude has moved well past "terminal AI" in the same timeframe.

See our Claude Design analysis for the adjacent Anthropic product launched on April 17 — Opus 4.7 is the foundation for both Claude Code's computer use and Claude Design's visual prototyping capabilities.

Benchmark Head-to-Head

The most useful public comparison data is the February 2026 morphllm head-to-head (covering Codex and Claude Code against agent-coding benchmarks). Results:

| Benchmark | Codex (GPT-5.3-Codex) | Claude Opus 4.6 | Winner |
|---|---|---|---|
| SWE-bench Pro (public) | 56.8% | 55.4% | Codex +1.4pp |
| SWE-bench Verified | Not reported | 80.8% | Claude (only score reported) |
| Terminal-Bench 2.0 | 77.3% | 65.4% | Codex +11.9pp |
| Blind code-quality preference | 33% | 67% | Claude |
| Developer preference (DEV aggregation) | 30% | 70% | Claude |

Token efficiency goes strongly in Codex's favor:

| Task (morphllm test) | Codex tokens | Claude tokens | Claude cost multiple |
|---|---|---|---|
| Figma plugin build | 1,499K | 6,232K | 4.2× |
| Scheduler app | 72.5K | 234.7K | 3.2× |
| API integration | ~180K | ~650K | 3.6× |

The tradeoff is consistent across the benchmark record. Claude produces better code and wins multi-file reasoning at a 3–4x token premium. Codex wins on terminal tasks and token efficiency. Blind preference goes to Claude when developers evaluate outputs without knowing which tool produced them.
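The "Claude cost multiple" column is a straight token ratio, which you can verify from the counts in the table (these are token multiples, not dollar costs; Codex API pricing is metered separately and not part of the morphllm comparison):

```python
# Reproduce the "Claude cost multiple" column from the morphllm token counts.
tasks = {
    "Figma plugin build": (1_499_000, 6_232_000),  # (Codex tokens, Claude tokens)
    "Scheduler app":      (72_500,    234_700),
}

for name, (codex_tok, claude_tok) in tasks.items():
    print(f"{name}: {claude_tok / codex_tok:.1f}x")
# Figma plugin build: 4.2x
# Scheduler app: 3.2x
```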

Three Agency Tasks: What to Run Where

The table below translates the benchmark picture into agency-shaped workloads. No published benchmark covers these exact tasks; the recommendations apply the token-efficiency and code-quality patterns from morphllm to realistic agency use cases.

| Task | Better tool | Why |
|---|---|---|
| A. Python script: DataForSEO API → keyword CSV | Codex | Single-file API integration; Codex ~3.6x cheaper per morphllm; Claude's quality edge matters less on straightforward API wiring |
| B. Markdown content brief with competitor screenshots | Codex | In-app browser + image generation make this a one-tool workflow; Claude needs a second tool for screenshots |
| C. Node.js GA4 CSV parser with edge-case handling | Claude Code | Claude's SWE-bench Verified edge + better debugging on tricky edge cases justifies the token premium |
| Multi-file refactor across 5+ components | Claude Code | Claude wins blind code quality 67%; multi-file reasoning is its sweet spot |
| Automating recurring GUI workflow in Notion/Figma | Codex | Plugin marketplace has native integrations for both; thread automations schedule recurring runs |
| Debugging a production incident | Claude Code | Remote Control, test iteration loop, and multi-file reasoning matter more than token efficiency at 2am |
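Task A is the kind of single-file wiring where Codex's token efficiency pays off. A minimal sketch of its core, flattening a keyword-research API response into CSV: the response shape here is hypothetical, so check the DataForSEO docs for the real schema and endpoint before wiring this to live credentials.

```python
import csv
import io

def keywords_to_csv(response: dict) -> str:
    """Flatten a keyword-research API response into CSV text.
    The 'items' / 'keyword' / 'search_volume' / 'cpc' keys are assumed
    for illustration; real API schemas may differ."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["keyword", "search_volume", "cpc"])
    writer.writeheader()
    for item in response.get("items", []):
        writer.writerow({
            "keyword": item.get("keyword", ""),
            "search_volume": item.get("search_volume", 0),
            "cpc": item.get("cpc", 0.0),
        })
    return buf.getvalue()

sample = {"items": [{"keyword": "codex desktop", "search_volume": 8100, "cpc": 1.42}]}
print(keywords_to_csv(sample))
```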

macOS Setup Flow

The fastest path from zero to a usable Codex Desktop install on a new Mac. OpenAI has not published a detailed setup guide for agencies; this reflects the practical steps based on the shipped app behavior.

  1. Download from chatgpt.com/codex. Direct download — not App Store. Requires macOS 13+.
  2. Sign in with a ChatGPT Plus ($20) or Pro ($200) account. Plus works but Pro is the power-user tier with ~10x usage for a limited time.
  3. Grant macOS TCC permissions. Accessibility, Screen Recording, and Automation are the likely trio based on the computer-use mechanism. OpenAI has not published the exact list — test in a sandbox account first.
  4. Install the first plugin via /plugins. Agency starter pack: GitHub, Slack, Figma, Notion, and the Google Workspace bundle.
  5. Enable memory (preview). Not available yet for Enterprise, Edu, EU, or UK accounts — rest of the world can opt in.
  6. Run a test task. Recommended: have Codex navigate to Figma, find a specific component, and export a screenshot. This exercises the full computer-use loop without touching production data.

Pricing Across ChatGPT Tiers

| Tier | Price | Codex allowance | Best for |
|---|---|---|---|
| ChatGPT Plus | $20/mo | Basic usage included | Evaluation; light automation |
| ChatGPT Pro | $200/mo | ~10x Plus (limited-time promo) | Agency power users; daily workflows |
| Business / Enterprise | Custom | Pay-as-you-go option | Team deployments; audit trails |
| Codex API (developer) | Metered per token | Separate from ChatGPT subscriptions | Programmatic integrations |

For comparison, Claude Max sits at $200/month and Claude Pro at $20/month — the tier pricing is almost identical at the top of the funnel. The real cost question is per-task token consumption: if a client workflow hits Codex 3–4x cheaper per task than Claude, that compounds meaningfully at scale. Conversely, if the workflow is dominated by multi-file debugging where Claude's quality edge saves developer hours, the premium pays for itself.
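The "premium pays for itself" argument is easy to make concrete with back-of-envelope arithmetic. Every number below is illustrative, not measured; plug in your own per-task costs and rates.

```python
# Back-of-envelope break-even: when does Claude's 3-4x token premium pay
# for itself in saved developer hours? All inputs are illustrative.
def breakeven_ratio(codex_cost_per_task: float, claude_multiple: float,
                    dev_rate_per_hour: float, hours_saved_per_task: float) -> float:
    premium = codex_cost_per_task * (claude_multiple - 1)  # extra $ per task on Claude
    savings = dev_rate_per_hour * hours_saved_per_task     # $ of dev time saved per task
    return savings / premium  # > 1 means the premium pays for itself every task

# e.g. $0.90/task on Codex, 3.5x tokens on Claude, $120/h dev, 15 min saved:
ratio = breakeven_ratio(0.90, 3.5, 120.0, 0.25)
print(f"{ratio:.1f}x return on the token premium")  # 13.3x return on the token premium
```

With those assumptions, even a modest quality edge dominates the token premium; the calculus flips for high-volume scripting where no developer time is saved per run.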

Failure Modes and Security Posture

Four primary risks identified across Help Net Security's April 17 analysis and secondary coverage:

| Risk | Mechanism | Mitigation |
|---|---|---|
| Credential exposure | Codex inherits full session privileges from logged-in apps | Dedicated macOS user account for Codex, isolated from primary work account |
| Unintended side effects | Wrong-window click can send email, transfer funds, delete files | Human-in-the-loop verification on destructive actions; no published dry-run mode |
| Plugin supply chain | 90+ third-party plugins, no published sandbox model for plugin actions | Install only from OpenAI's curated marketplace until self-published plugins have review history |
| Screen-content retention | Vision-based computer use screenshots the user's screen; retention and transmission policy not documented | Audit OpenAI's data-processing addendum; run sensitive workloads on Enterprise tier with explicit data controls |

Pair Codex deployment with the runtime governance framework we covered in the Microsoft Agent Governance Toolkit analysis — deterministic policy enforcement at sub-millisecond latency is the right complement to an agent that can click arbitrary UI elements.

Decision Matrix: When to Use Which

Consolidating everything into a single per-use-case rubric:

| Use case | Primary pick | Reason |
|---|---|---|
| Agency running heavy GUI automation (Figma, Notion, browsers) | Codex Desktop | Plugin marketplace + in-app browser + thread automations |
| Dev team doing multi-file refactors and incident response | Claude Code | Blind code-quality edge; stronger SWE-bench Verified |
| Token-budget-sensitive scripting at volume | Codex Desktop | 3–4x cheaper per task on morphllm benchmarks |
| EU / UK / Switzerland team needing computer use today | Claude Code | Codex computer use not available in those regions at launch |
| Linux dev environment | Claude Code | Linux CLI; Codex Desktop has no Linux build |
| Marketing ops workflows with screenshots + copy | Codex Desktop | Native image generation in one tool; no round-trip |
| Security-sensitive enterprise engagements | Claude Code | Tiered API-first approach; smaller vision-driven attack surface |

Conclusion

OpenAI's April 16 Codex Desktop update was substantial — background computer use, 90+ plugin marketplace, in-app browser, native image generation, memory, and thread automations in a single release. The competitive framing is sharp: Codex is cheaper per task, faster on terminal workloads, and better positioned for GUI automation. Claude Code produces better code, wins multi-file reasoning, and has a safer default action hierarchy thanks to API-first fallback.

For agencies, the realistic answer is running both. Codex handles the automation and repeatable workflows that used to require shell scripts and Zapier glue. Claude Code handles the engineering work where quality matters more than token economics. The decision matrix above is the starting point; the right setup is workflow-specific and worth running an explicit pilot on before committing client infrastructure.

Deploy Codex and Claude Code Across Your Agency Stack

We run two-week pilots on both tools against real client workloads, establish the decision matrix, and deploy with runtime governance from day one.

