OpenAI Codex Desktop: Computer Use + 90+ App Plugins
OpenAI Codex Desktop v26.415 shipped April 16 with background computer use and 90+ plugins. Figma, Notion, GitHub. Benchmark head-to-head vs Claude Code.
Key Takeaways
On April 16, 2026, OpenAI shipped Codex Desktop v26.415 under the title "Codex for (almost) everything." The release brings background computer use, a 90+ plugin marketplace, an in-app browser built on OpenAI's Atlas technology, gpt-image-1.5 image generation, persistent memory, and thread automations that resume across days or weeks.
The framing matters. Codex Desktop is not the Codex CLI we benchmarked against OpenClaw and Hermes on April 18 — it is a GUI application with computer-use capability that drives Figma, Notion, browsers, and any other macOS app visually. This post covers what shipped, how it compares to Claude Code (which shipped similar capabilities on March 23), which tasks run better on which tool, and the setup and security posture agencies should know before deploying it.
OpenAI, verbatim: "Codex can now operate your computer alongside you, work with more of the tools and apps you use every day, generate images, remember your preferences, learn from previous actions, and take on ongoing and repeatable work."
What Shipped on April 16
The April 16 changelog lists nine substantial feature additions:
| Feature | What it does | Regional or platform caveat |
|---|---|---|
| Background computer use | Agents click and type across desktop apps in parallel, without hijacking the foreground session | macOS only; not available in EEA, UK, Switzerland |
| In-app browser (Atlas tech) | Browse and comment on pages to instruct the agent; localhost preview support | All desktop users |
| Image generation | gpt-image-1.5 baked in for mockups, assets, concepts — no ChatGPT round-trip | All paying tiers |
| Memory (preview) | Persists preferences, tech stacks, recurring workflows across threads | Enterprise / Edu / EU / UK rollout delayed |
| Thread automations | Schedule a thread to wake up later and resume work across days or weeks | All desktop users |
| 90+ plugin marketplace | Curated packages that bundle skills, app integrations, MCP servers | All desktop users |
| GitHub PR inspection | Inline diff review without leaving Codex | All desktop users |
| SSH remote devbox | Run Codex against remote dev environments | Alpha label — expect instability |
| Intel Mac support | First-time support; previously Apple Silicon only since Feb 2, 2026 desktop launch | macOS 13+ |
Adoption context: OpenAI reported 3 million weekly Codex developers with 70% month-over-month growth in the same announcement. The upgrade is aimed at that growth curve — every net-new feature here (computer use, plugins, memory, automations) is the kind of thing power users ask for once they're past basic coding usage.
The 90+ Plugin Marketplace
The plugin marketplace had a soft rollout on March 27, 2026; the April 16 release expanded it past 90. OpenAI's own copy says "more than 90" — TechCrunch-derived coverage cites 111. Either way, it is a broad surface.
Plugin architecture combines three things into single installable packages:
- Skills — predefined prompt workflows for recurring tasks
- App integrations — native actions in the target application, whether via API or GUI automation
- MCP server configurations — Model Context Protocol servers bundled for installation in one step
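OpenAI has not published a manifest format for these bundles. As a purely hypothetical sketch of how the three components might combine into one installable package (every field name and value below is invented for illustration):

```json
{
  "name": "agency-audit-pack",
  "skills": [
    { "name": "lighthouse-audit", "prompt": "Run Lighthouse against {url} and summarize regressions." }
  ],
  "integrations": ["github", "google-workspace"],
  "mcp_servers": [
    { "name": "ga4-export", "command": "npx", "args": ["-y", "ga4-export-server"] }
  ]
}
```

The useful property, whatever the real format turns out to be, is that one install step wires up prompts, app actions, and MCP servers together instead of three separate configuration passes.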
Confirmed plugins across official and tier-1 coverage, grouped by category:
| Category | Agency-relevant plugins |
|---|---|
| Design | Figma (deep code-to-design integration), Adobe Creative Cloud |
| Docs / Knowledge | Notion, Box, Google Drive, Google Workspace (Gmail) |
| Dev workflow | GitHub, GitLab Issues, CircleCI, Atlassian Rovo, Jira, Linear, Sentry, CodeRabbit, Hugging Face, Render |
| Communication | Slack, Microsoft Teams, Microsoft Suite |
| Project management | Trello, Jira, Linear |
| Data | SQL database connectors |
| Scheduling | Google Calendar |
Plugins install via /plugins in the terminal. Self-published and team-wide marketplaces are supported, which means agencies can ship internal plugin bundles to client teams — useful for repeatable audit workflows (Lighthouse runs, GA4 exports, Search Console pulls) that the agency wants consistent across engagements.
How Computer Use Actually Works
This is the area with the thinnest public documentation. OpenAI's own changelog describes Codex as operating "macOS apps by seeing, clicking, and typing with its own cursor." Third-party technical explainers characterize the underlying mechanism as screenshot-plus-vision: the agent reads the screen, interprets it with vision models, and generates click-and-type actions.
The contrast with Anthropic's Claude Code is architectural and worth knowing. Anthropic's published behavior: Claude "reaches for the most precise available tool first; if a connector exists with a structured API integration, Claude uses that connector because it is faster, more reliable, and less error-prone than navigating a visual interface." Codex defaults to screen interpretation. Claude defaults to API connectors when available and falls back to screen interpretation when nothing else exists.
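As a mental model for the screenshot-plus-vision mechanism, here is a minimal stubbed loop in Python. OpenAI has not published Codex Desktop's internals; the function names, the loop shape, and the stop condition are all assumptions for illustration:

```python
# Illustrative sketch of a screenshot-plus-vision agent loop.
# Everything here is an assumption; OpenAI has not documented the architecture.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def capture_screen() -> bytes:
    """Stub: a real agent would grab a screenshot via OS capture APIs."""
    return b"<png bytes>"

def plan_next_action(screenshot: bytes, goal: str, history: list) -> Action:
    """Stub: a real agent would send the screenshot plus goal to a vision model."""
    if len(history) >= 2:
        return Action("done")
    return Action("type", text=goal) if history else Action("click", x=120, y=340)

def run_agent(goal: str, max_steps: int = 10) -> list:
    history = []
    for _ in range(max_steps):
        action = plan_next_action(capture_screen(), goal, history)
        if action.kind == "done":
            break
        # A real agent would dispatch the click/keystroke via accessibility APIs.
        history.append(action)
    return history

actions = run_agent("Export the hero component from Figma")
print([a.kind for a in actions])  # -> ['click', 'type']
```

The key cost property of this design: every step pays for a screenshot interpretation, even when a structured API could have done the job in one call.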
| Aspect | Codex Desktop | Claude Code |
|---|---|---|
| Default action method | Screenshot + vision | Structured API connector first; vision fallback |
| Execution model | Background, parallel with user | Session-based |
| Visual-acuity benchmark (Apr 2026) | Not published | 98.5% (Opus 4.7) vs 54.5% (Opus 4.6) |
| Click-to-action latency | Not published | Not published |
| Rate limits | Not published | Tier-dependent (Pro, Max, Enterprise) |
| macOS TCC permissions documented | No | No |
The practical upshot: for tasks where a clean API exists (GitHub PR actions, Linear ticket creation, Google Workspace document edits), Claude's tiered approach tends to be more reliable because the agent doesn't burn tokens interpreting a screen it didn't need to see. For tasks where no API exists (older desktop apps, niche design tools, legacy workflows), Codex's vision-first approach gets more work done simply because Claude's API-first fallback may not find a connector and degrade to vision anyway.
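Claude's tiered approach reduces to a simple dispatch rule: try a structured connector first, fall back to vision only when none matches. The connector registry and names below are illustrative assumptions, not Anthropic's implementation:

```python
# Sketch of connector-first tool selection with a vision fallback.
# Connector names and registry shape are invented for illustration.
CONNECTORS = {
    "github": lambda task: f"API call: {task}",
    "linear": lambda task: f"API call: {task}",
}

def perform(app: str, task: str) -> str:
    connector = CONNECTORS.get(app)
    if connector is not None:
        return connector(task)          # precise, cheap, deterministic
    return f"vision fallback: {task}"   # screenshot + click/type interpretation

print(perform("github", "merge PR"))      # uses the structured connector
print(perform("sketch-app", "export PNG"))  # no connector, degrades to vision
```

The dispatch itself is trivial; the reliability difference comes entirely from how often the first branch is taken.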
Where Claude Code Stands in April 2026
Claude Code shipped computer use on March 23, 2026 as a research preview for Pro and Max subscribers — roughly three weeks before Codex. Current April 2026 capabilities:
- Model — Claude Opus 4.7 (launched Q1 2026 at the same $5 input / $25 output per MTok pricing as Opus 4.6). Visual-acuity benchmarks jumped from 54.5% on 4.6 to 98.5% on 4.7, which matters disproportionately for computer-use reliability.
- Platforms — macOS, Windows, and Linux via the Claude Code CLI; desktop GUI through the Claude.ai app ("Cowork" integration).
- Action hierarchy — structured API connectors preferred over GUI vision. Falls back to computer-use-style screen interpretation only when no connector exists.
- Other Q1 2026 adds — Remote Control, Dispatch, Channels, Auto Mode, AutoDream, and /loop scheduled tasks. Claude has moved well past "terminal AI" in the same timeframe.
See our Claude Design analysis for the adjacent Anthropic product launched on April 17 — Opus 4.7 is the foundation for both Claude Code's computer use and Claude Design's visual prototyping capabilities.
Benchmark Head-to-Head
The most useful public comparison data is the February 2026 morphllm head-to-head (covering Codex and Claude Code against agent-coding benchmarks). Results:
| Benchmark | Codex (GPT-5.3-Codex) | Claude Opus 4.6 | Winner |
|---|---|---|---|
| SWE-bench Pro (public) | 56.8% | 55.4% | Codex +1.4pp |
| SWE-bench Verified | — | 80.8% | Claude only reported |
| Terminal-Bench 2.0 | 77.3% | 65.4% | Codex +11.9pp |
| Blind code-quality preference | 33% | 67% | Claude |
| Developer preference (DEV aggregation) | 30% | 70% | Claude |
Token efficiency goes strongly in Codex's favor:
| Task (morphllm test) | Codex tokens | Claude tokens | Claude cost multiple |
|---|---|---|---|
| Figma plugin build | 1,499K | 6,232K | 4.2× |
| Scheduler app | 72.5K | 234.7K | 3.2× |
| API integration | ~180K | ~650K | 3.6× |
The tradeoff is consistent across the benchmark record. Claude produces better code and wins multi-file reasoning at a 3–4x token premium. Codex wins on terminal tasks and token efficiency. Blind preference goes to Claude when developers evaluate outputs without knowing which tool produced them.
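The "Claude cost multiple" column is simply the token ratio (assuming comparable per-token pricing across the two tools), which is easy to sanity-check:

```python
# Reproducing the "Claude cost multiple" column as a token ratio.
tasks = {
    "Figma plugin build": (1_499_000, 6_232_000),
    "Scheduler app":      (72_500,    234_700),
    "API integration":    (180_000,   650_000),   # approximate figures
}

for name, (codex_tokens, claude_tokens) in tasks.items():
    print(f"{name}: {claude_tokens / codex_tokens:.1f}x")
# Figma plugin build: 4.2x / Scheduler app: 3.2x / API integration: 3.6x
```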
Three Agency Tasks: What to Run Where
The table below translates the benchmark picture into agency-shaped workloads. No published benchmark covers these exact three tasks; the recommendations apply the token-efficiency and code-quality patterns from morphllm to realistic agency use cases.
| Task | Better tool | Why |
|---|---|---|
| A. Python script: DataForSEO API → keyword CSV | Codex | Single-file API integration; Codex ~3.6x cheaper per morphllm; Claude's quality edge matters less on straightforward API wiring |
| B. Markdown content brief with competitor screenshots | Codex | In-app browser + image generation make this a one-tool workflow; Claude needs a second tool for screenshots |
| C. Node.js GA4 CSV parser with edge-case handling | Claude Code | Claude's SWE-bench Verified edge + better debugging on tricky edge cases justifies the token premium |
| Multi-file refactor across 5+ components | Claude Code | Claude wins blind code quality 67%; multi-file reasoning is its sweet spot |
| Automating recurring GUI workflow in Notion/Figma | Codex | Plugin marketplace has native integrations for both; thread automations schedule recurring runs |
| Debugging a production incident | Claude Code | Remote Control, test iteration loop, and multi-file reasoning matter more than token efficiency at 2am |
Most agencies should run both. They're not duplicative. Our AI digital transformation engagements typically end up with Codex for automation and repeatable workflows, Claude Code for deep engineering and incident response, and an explicit decision rubric per workflow.
macOS Setup Flow
The fastest path from zero to a usable Codex Desktop install on a new Mac. OpenAI has not published a detailed setup guide for agencies; this reflects the practical steps based on the shipped app behavior.
- Download from chatgpt.com/codex. Direct download — not App Store. Requires macOS 13+.
- Sign in with a ChatGPT Plus ($20) or Pro ($200) account. Plus works but Pro is the power-user tier with ~10x usage for a limited time.
- Grant macOS TCC permissions. Accessibility, Screen Recording, and Automation are the likely trio based on the computer-use mechanism. OpenAI has not published the exact list — test in a sandbox account first.
- Install the first plugin via /plugins. Agency starter pack: GitHub, Slack, Figma, Notion, and the Google Workspace bundle.
- Enable memory (preview). Not available yet for Enterprise, Edu, EU, or UK accounts — rest of the world can opt in.
- Run a test task. Recommended: have Codex navigate to Figma, find a specific component, and export a screenshot. This exercises the full computer-use loop without touching production data.
Run Codex in a dedicated macOS user account for any agency deployment. Session-privilege inheritance is the main security risk — a wrong click in the operator's real account can hit live client data, send real email, or push to real Git branches.
Pricing Across ChatGPT Tiers
| Tier | Price | Codex allowance | Best for |
|---|---|---|---|
| ChatGPT Plus | $20/mo | Basic usage included | Evaluation; light automation |
| ChatGPT Pro | $200/mo | ~10x Plus (limited-time promo) | Agency power users; daily workflows |
| Business / Enterprise | Custom | Pay-as-you-go option | Team deployments; audit trails |
| Codex API (developer) | Metered per token | Separate from ChatGPT subscriptions | Programmatic integrations |
For comparison, Claude Max sits at $200/month and Claude Pro at $20/month — the tier pricing is almost identical at the top of the funnel. The real cost question is per-task token consumption: if a client workflow hits Codex 3–4x cheaper per task than Claude, that compounds meaningfully at scale. Conversely, if the workflow is dominated by multi-file debugging where Claude's quality edge saves developer hours, the premium pays for itself.
Failure Modes and Security Posture
Four primary risks identified across Help Net Security's April 17 analysis and secondary coverage:
| Risk | Mechanism | Mitigation |
|---|---|---|
| Credential exposure | Codex inherits full session privileges from logged-in apps | Dedicated macOS user account for Codex, isolated from primary work account |
| Unintended side effects | Wrong-window click can send email, transfer funds, delete files | Human-in-the-loop verification on destructive actions; no published dry-run mode |
| Plugin supply chain | 90+ third-party plugins, no published sandbox model for plugin actions | Install only from OpenAI curated marketplace until self-published plugins have review history |
| Screen-content retention | Vision-based computer use screenshots the user's screen; retention and transmission policy not documented | Audit OpenAI data-processing addendum; run sensitive workloads on Enterprise tier with explicit data controls |
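The human-in-the-loop mitigation can be enforced outside the agent with a simple confirmation gate. This is a pattern sketch, not a shipped Codex or Claude feature, and the action taxonomy is an assumption:

```python
# Minimal human-in-the-loop gate for destructive agent actions.
# Pattern sketch only; the DESTRUCTIVE set is an illustrative assumption.
DESTRUCTIVE = {"send_email", "delete_file", "git_push", "transfer_funds"}

def execute(action: str, payload: str, confirm=input) -> str:
    if action in DESTRUCTIVE:
        answer = confirm(f"Agent wants to {action}: {payload!r}. Allow? [y/N] ")
        if answer.strip().lower() != "y":
            return "blocked"
    return f"executed {action}"

# Inject a fake confirmer for testing instead of prompting a human:
print(execute("delete_file", "/tmp/report.csv", confirm=lambda _: "n"))  # blocked
print(execute("read_file", "/tmp/report.csv", confirm=lambda _: "n"))    # executed read_file
```

Default-deny on anything but an explicit "y" is the important detail: an agent that answers its own prompts should hit the blocked path, not the destructive one.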
Pair Codex deployment with the runtime governance framework we covered in the Microsoft Agent Governance Toolkit analysis — deterministic policy enforcement at sub-millisecond latency is the right complement to an agent that can click arbitrary UI elements.
Decision Matrix: When to Use Which
Consolidating everything into a single per-use-case rubric:
| Use case | Primary pick | Reason |
|---|---|---|
| Agency running heavy GUI automation (Figma, Notion, browsers) | Codex Desktop | Plugin marketplace + in-app browser + thread automations |
| Dev team doing multi-file refactors and incident response | Claude Code | Blind code-quality edge; stronger SWE-bench Verified |
| Token-budget-sensitive scripting at volume | Codex Desktop | 3–4x cheaper per task on morphllm benchmarks |
| EU / UK / Switzerland team needing computer use today | Claude Code | Codex computer use not available in those regions at launch |
| Linux dev environment | Claude Code | Linux CLI; Codex desktop has no Linux build |
| Marketing ops workflows with screenshots + copy | Codex Desktop | Native image generation in one tool; no round-trip |
| Security-sensitive enterprise engagements | Claude Code | Tiered API-first approach; smaller vision-driven attack surface |
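As a sketch, the rubric above reduces to a few ordered rules. The attribute names are illustrative, and the rule order encodes the table's hard constraints (region, OS) before the soft preferences:

```python
# Encoding the decision matrix as a tiny router. Rules mirror the table
# above, not any published vendor guidance; attribute names are invented.
def pick_tool(workflow: dict) -> str:
    if workflow.get("region") in {"EU", "UK", "CH"} and workflow.get("needs_computer_use"):
        return "Claude Code"    # Codex computer use unavailable in those regions at launch
    if workflow.get("os") == "linux":
        return "Claude Code"    # no Codex Desktop Linux build
    if workflow.get("multi_file") or workflow.get("incident_response"):
        return "Claude Code"    # blind code-quality and multi-file reasoning edge
    if workflow.get("gui_automation") or workflow.get("token_sensitive"):
        return "Codex Desktop"  # plugins, automations, 3-4x cheaper per task
    return "pilot both"

print(pick_tool({"gui_automation": True}))                      # Codex Desktop
print(pick_tool({"multi_file": True}))                          # Claude Code
print(pick_tool({"region": "EU", "needs_computer_use": True}))  # Claude Code
```

The "pilot both" default is deliberate: workflows that match no rule are exactly the ones worth benchmarking before committing.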
Conclusion
OpenAI's April 16 Codex Desktop update was substantial — background computer use, 90+ plugin marketplace, in-app browser, native image generation, memory, and thread automations in a single release. The competitive framing is sharp: Codex is cheaper per task, faster on terminal workloads, and better positioned for GUI automation. Claude Code produces better code, wins multi-file reasoning, and has a safer default action hierarchy thanks to API-first fallback.
For agencies, the realistic answer is running both. Codex handles the automation and repeatable workflows that used to require shell scripts and Zapier glue. Claude Code handles the engineering work where quality matters more than token economics. The decision matrix above is the starting point; the right setup is workflow-specific and worth running an explicit pilot on before committing client infrastructure.
Deploy Codex and Claude Code Across Your Agency Stack
We run two-week pilots on both tools against real client workloads, establish the decision matrix, and deploy with runtime governance from day one.
Frequently Asked Questions
Related Guides
More on coding agents, AI tool selection, and agency-side deployment playbooks.