OpenAI Codex Desktop: Computer Use + 90+ App Plugins
OpenAI Codex Desktop v26.415 shipped April 16 with background computer use and 90+ plugins. Figma, Notion, GitHub. Benchmark head-to-head vs Claude Code.
Key Takeaways
On April 16, 2026, OpenAI shipped Codex Desktop v26.415 under the title "Codex for (almost) everything." The release brings background computer use, a 90+ plugin marketplace, an in-app browser built on OpenAI's Atlas technology, gpt-image-1.5 image generation, persistent memory, and thread automations that resume across days or weeks.
The framing matters. Codex Desktop is not the Codex CLI we benchmarked against OpenClaw and Hermes on April 18 — it is a GUI application with computer-use capability that drives Figma, Notion, browsers, and any other macOS app visually. This post covers what shipped, how it compares to Claude Code (which shipped similar capabilities on March 23), which tasks run better on which tool, and the setup and security posture agencies should know before deploying it.
OpenAI, verbatim: "Codex can now operate your computer alongside you, work with more of the tools and apps you use every day, generate images, remember your preferences, learn from previous actions, and take on ongoing and repeatable work."
What Shipped on April 16
The April 16 changelog lists nine substantial feature additions:
| Feature | What it does | Regional or platform caveat |
|---|---|---|
| Background computer use | Agents click and type across desktop apps in parallel, without hijacking the foreground session | macOS only; not available in EEA, UK, Switzerland |
| In-app browser (Atlas tech) | Browse and comment on pages to instruct the agent; localhost preview support | All desktop users |
| Image generation | gpt-image-1.5 baked in for mockups, assets, concepts — no ChatGPT round-trip | All paying tiers |
| Memory (preview) | Persists preferences, tech stacks, recurring workflows across threads | Enterprise / Edu / EU / UK rollout delayed |
| Thread automations | Schedule a thread to wake up later and resume work across days or weeks | All desktop users |
| 90+ plugin marketplace | Curated packages that bundle skills, app integrations, MCP servers | All desktop users |
| GitHub PR inspection | Inline diff review without leaving Codex | All desktop users |
| SSH remote devbox | Run Codex against remote dev environments | Alpha label — expect instability |
| Intel Mac support | First-time support; previously Apple Silicon only since Feb 2, 2026 desktop launch | macOS 13+ |
Adoption context: OpenAI reported 3 million weekly Codex developers with 70% month-over-month growth in the same announcement. The upgrade is aimed at that growth curve — every net-new feature here (computer use, plugins, memory, automations) is the kind of thing power users ask for once they're past basic coding usage.
The 90+ Plugin Marketplace
The plugin marketplace had a soft rollout on March 27, 2026; the April 16 release expanded it past 90. OpenAI's own copy says "more than 90" — TechCrunch-derived coverage cites 111. Either way, it is a broad surface.
Plugin architecture combines three things into single installable packages:
- Skills — predefined prompt workflows for recurring tasks
- App integrations — native actions in the target application, whether via API or GUI automation
- MCP server configurations — Model Context Protocol servers bundled for installation in one step
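OpenAI has not published a manifest format for these bundles. As a purely hypothetical sketch of how the three components might combine into one installable package (every field name and value below is invented for illustration):

```json
{
  "name": "agency-audit-pack",
  "skills": [
    { "name": "lighthouse-audit", "prompt": "Run Lighthouse against {url} and summarize regressions." }
  ],
  "integrations": ["github", "google-workspace"],
  "mcp_servers": [
    { "name": "ga4-export", "command": "npx", "args": ["-y", "ga4-export-server"] }
  ]
}
```

The useful property, whatever the real format turns out to be, is that one install step wires up prompts, app actions, and MCP servers together instead of three separate configuration passes.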
Confirmed plugins across official and tier-1 coverage, grouped by category:
| Category | Agency-relevant plugins |
|---|---|
| Design | Figma (deep code-to-design integration), Adobe Creative Cloud |
| Docs / Knowledge | Notion, Box, Google Drive, Google Workspace (Gmail) |
| Dev workflow | GitHub, GitLab Issues, CircleCI, Atlassian Rovo, Jira, Linear, Sentry, CodeRabbit, Hugging Face, Render |
| Communication | Slack, Microsoft Teams, Microsoft Suite |
| Project management | Trello, Jira, Linear |
| Data | SQL database connectors |
| Scheduling | Google Calendar |
Plugins install via /plugins in the terminal. Self-published and team-wide marketplaces are supported, which means agencies can ship internal plugin bundles to client teams — useful for repeatable audit workflows (Lighthouse runs, GA4 exports, Search Console pulls) that the agency wants consistent across engagements.
How Computer Use Actually Works
This is the area with the thinnest public documentation. OpenAI's own changelog describes Codex as operating "macOS apps by seeing, clicking, and typing with its own cursor." Third-party technical explainers characterize the underlying mechanism as screenshot-plus-vision: the agent reads the screen, interprets it with vision models, and generates click-and-type actions.
The contrast with Anthropic's Claude Code is architectural and worth knowing. Anthropic's published behavior: Claude "reaches for the most precise available tool first; if a connector exists with a structured API integration, Claude uses that connector because it is faster, more reliable, and less error-prone than navigating a visual interface." Codex defaults to screen interpretation. Claude defaults to API connectors when available and falls back to screen interpretation when nothing else exists.
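As a mental model for the screenshot-plus-vision mechanism, here is a minimal stubbed loop in Python. OpenAI has not published Codex Desktop's internals; the function names, the loop shape, and the stop condition are all assumptions for illustration:

```python
# Illustrative sketch of a screenshot-plus-vision agent loop.
# Everything here is an assumption; OpenAI has not documented the architecture.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def capture_screen() -> bytes:
    """Stub: a real agent would grab a screenshot via OS capture APIs."""
    return b"<png bytes>"

def plan_next_action(screenshot: bytes, goal: str, history: list) -> Action:
    """Stub: a real agent would send the screenshot plus goal to a vision model."""
    if len(history) >= 2:
        return Action("done")
    return Action("type", text=goal) if history else Action("click", x=120, y=340)

def run_agent(goal: str, max_steps: int = 10) -> list:
    history = []
    for _ in range(max_steps):
        action = plan_next_action(capture_screen(), goal, history)
        if action.kind == "done":
            break
        # A real agent would dispatch the click/keystroke via accessibility APIs.
        history.append(action)
    return history

actions = run_agent("Export the hero component from Figma")
print([a.kind for a in actions])  # -> ['click', 'type']
```

The key cost property of this design: every step pays for a screenshot interpretation, even when a structured API could have done the job in one call.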
| Aspect | Codex Desktop | Claude Code |
|---|---|---|
| Default action method | Screenshot + vision | Structured API connector first; vision fallback |
| Execution model | Background, parallel with user | Session-based |
| Visual-acuity benchmark (Apr 2026) | Not published | 98.5% (Opus 4.7) vs 54.5% (Opus 4.6) |
| Click-to-action latency | Not published | Not published |
| Rate limits | Not published | Tier-dependent (Pro, Max, Enterprise) |
| macOS TCC permissions documented | No | No |
The practical upshot: for tasks where a clean API exists (GitHub PR actions, Linear ticket creation, Google Workspace document edits), Claude's tiered approach tends to be more reliable because the agent doesn't burn tokens interpreting a screen it didn't need to see. For tasks where no API exists (older desktop apps, niche design tools, legacy workflows), Codex's vision-first approach gets more work done simply because Claude's API-first fallback may not find a connector and degrade to vision anyway.
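Claude's tiered approach reduces to a simple dispatch rule: try a structured connector first, fall back to vision only when none matches. The connector registry and names below are illustrative assumptions, not Anthropic's implementation:

```python
# Sketch of connector-first tool selection with a vision fallback.
# Connector names and registry shape are invented for illustration.
CONNECTORS = {
    "github": lambda task: f"API call: {task}",
    "linear": lambda task: f"API call: {task}",
}

def perform(app: str, task: str) -> str:
    connector = CONNECTORS.get(app)
    if connector is not None:
        return connector(task)          # precise, cheap, deterministic
    return f"vision fallback: {task}"   # screenshot + click/type interpretation

print(perform("github", "merge PR"))      # uses the structured connector
print(perform("sketch-app", "export PNG"))  # no connector, degrades to vision
```

The dispatch itself is trivial; the reliability difference comes entirely from how often the first branch is taken.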
Where Claude Code Stands in April 2026
Claude Code shipped computer use on March 23, 2026 as a research preview for Pro and Max subscribers — roughly three weeks before Codex. Current April 2026 capabilities:
- Model — Claude Opus 4.7 (launched Q1 2026 at the same $5 input / $25 output per MTok pricing as Opus 4.6). Visual-acuity benchmarks jumped from 54.5% on 4.6 to 98.5% on 4.7, which matters disproportionately for computer-use reliability.
- Platforms — macOS, Windows, and Linux via the Claude Code CLI; desktop GUI through the Claude.ai app ("Cowork" integration).
- Action hierarchy — structured API connectors preferred over GUI vision. Falls back to computer-use-style screen interpretation only when no connector exists.
- Other Q1 2026 adds — Remote Control, Dispatch, Channels, Auto Mode, AutoDream, and /loop scheduled tasks. Claude has moved well past "terminal AI" in the same timeframe.
See our Claude Design analysis for the adjacent Anthropic product launched on April 17 — Opus 4.7 is the foundation for both Claude Code's computer use and Claude Design's visual prototyping capabilities.
Benchmark Head-to-Head
The most useful public comparison data is the February 2026 morphllm head-to-head (covering Codex and Claude Code against agent-coding benchmarks). Results:
| Benchmark | Codex (GPT-5.3-Codex) | Claude Opus 4.6 | Winner |
|---|---|---|---|
| SWE-bench Pro (public) | 56.8% | 55.4% | Codex +1.4pp |
| SWE-bench Verified | — | 80.8% | Claude only reported |
| Terminal-Bench 2.0 | 77.3% | 65.4% | Codex +11.9pp |
| Blind code-quality preference | 33% | 67% | Claude |
| Developer preference (DEV aggregation) | 30% | 70% | Claude |
Token efficiency goes strongly in Codex's favor:
| Task (morphllm test) | Codex tokens | Claude tokens | Claude cost multiple |
|---|---|---|---|
| Figma plugin build | 1,499K | 6,232K | 4.2× |
| Scheduler app | 72.5K | 234.7K | 3.2× |
| API integration | ~180K | ~650K | 3.6× |
The tradeoff is consistent across the benchmark record. Claude produces better code and wins multi-file reasoning at a 3–4x token premium. Codex wins on terminal tasks and token efficiency. Blind preference goes to Claude when developers evaluate outputs without knowing which tool produced them.
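The "Claude cost multiple" column is simply the token ratio (assuming comparable per-token pricing across the two tools), which is easy to sanity-check:

```python
# Reproducing the "Claude cost multiple" column as a token ratio.
tasks = {
    "Figma plugin build": (1_499_000, 6_232_000),
    "Scheduler app":      (72_500,    234_700),
    "API integration":    (180_000,   650_000),   # approximate figures
}

for name, (codex_tokens, claude_tokens) in tasks.items():
    print(f"{name}: {claude_tokens / codex_tokens:.1f}x")
# Figma plugin build: 4.2x / Scheduler app: 3.2x / API integration: 3.6x
```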
Three Agency Tasks: What to Run Where
The table below translates the benchmark picture into agency-shaped workloads. No published benchmark covers these exact three tasks; the recommendations apply the token-efficiency and code-quality patterns from morphllm to realistic agency use cases.
| Task | Better tool | Why |
|---|---|---|
| A. Python script: DataForSEO API → keyword CSV | Codex | Single-file API integration; Codex ~3.6x cheaper per morphllm; Claude's quality edge matters less on straightforward API wiring |
| B. Markdown content brief with competitor screenshots | Codex | In-app browser + image generation make this a one-tool workflow; Claude needs a second tool for screenshots |
| C. Node.js GA4 CSV parser with edge-case handling | Claude Code | Claude's SWE-bench Verified edge + better debugging on tricky edge cases justifies the token premium |
| Multi-file refactor across 5+ components | Claude Code | Claude wins blind code quality 67%; multi-file reasoning is its sweet spot |
| Automating recurring GUI workflow in Notion/Figma | Codex | Plugin marketplace has native integrations for both; thread automations schedule recurring runs |
| Debugging a production incident | Claude Code | Remote Control, test iteration loop, and multi-file reasoning matter more than token efficiency at 2am |
Most agencies should run both. They're not duplicative. Our AI digital transformation engagements typically end up with Codex for automation and repeatable workflows, Claude Code for deep engineering and incident response, and an explicit decision rubric per workflow.
macOS Setup Flow
The fastest path from zero to a usable Codex Desktop install on a new Mac. OpenAI has not published a detailed setup guide for agencies; this reflects the practical steps based on the shipped app behavior.
- Download from chatgpt.com/codex. Direct download — not App Store. Requires macOS 13+.
- Sign in with a ChatGPT Plus ($20) or Pro ($200) account. Plus works but Pro is the power-user tier with ~10x usage for a limited time.
- Grant macOS TCC permissions. Accessibility, Screen Recording, and Automation are the likely trio based on the computer-use mechanism. OpenAI has not published the exact list — test in a sandbox account first.
- Install the first plugin via /plugins. Agency starter pack: GitHub, Slack, Figma, Notion, and the Google Workspace bundle.
- Enable memory (preview). Not available yet for Enterprise, Edu, EU, or UK accounts — rest of the world can opt in.
- Run a test task. Recommended: have Codex navigate to Figma, find a specific component, and export a screenshot. This exercises the full computer-use loop without touching production data.
Run Codex in a dedicated macOS user account for any agency deployment. Session-privilege inheritance is the main security risk — a wrong click in the operator's real account can hit live client data, send real email, or push to real Git branches.
Pricing Across ChatGPT Tiers
| Tier | Price | Codex allowance | Best for |
|---|---|---|---|
| ChatGPT Plus | $20/mo | Basic usage included | Evaluation; light automation |
| ChatGPT Pro | $200/mo | ~10x Plus (limited-time promo) | Agency power users; daily workflows |
| Business / Enterprise | Custom | Pay-as-you-go option | Team deployments; audit trails |
| Codex API (developer) | Metered per token | Separate from ChatGPT subscriptions | Programmatic integrations |
For comparison, Claude Max sits at $200/month and Claude Pro at $20/month — the tier pricing is almost identical at the top of the funnel. The real cost question is per-task token consumption: if a client workflow hits Codex 3–4x cheaper per task than Claude, that compounds meaningfully at scale. Conversely, if the workflow is dominated by multi-file debugging where Claude's quality edge saves developer hours, the premium pays for itself.
Failure Modes and Security Posture
Four primary risks identified across Help Net Security's April 17 analysis and secondary coverage:
| Risk | Mechanism | Mitigation |
|---|---|---|
| Credential exposure | Codex inherits full session privileges from logged-in apps | Dedicated macOS user account for Codex, isolated from primary work account |
| Unintended side effects | Wrong-window click can send email, transfer funds, delete files | Human-in-the-loop verification on destructive actions; no published dry-run mode |
| Plugin supply chain | 90+ third-party plugins, no published sandbox model for plugin actions | Install only from OpenAI curated marketplace until self-published plugins have review history |
| Screen-content retention | Vision-based computer use screenshots the user's screen; retention and transmission policy not documented | Audit OpenAI data-processing addendum; run sensitive workloads on Enterprise tier with explicit data controls |
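The human-in-the-loop mitigation can be enforced outside the agent with a simple confirmation gate. This is a pattern sketch, not a shipped Codex or Claude feature, and the action taxonomy is an assumption:

```python
# Minimal human-in-the-loop gate for destructive agent actions.
# Pattern sketch only; the DESTRUCTIVE set is an illustrative assumption.
DESTRUCTIVE = {"send_email", "delete_file", "git_push", "transfer_funds"}

def execute(action: str, payload: str, confirm=input) -> str:
    if action in DESTRUCTIVE:
        answer = confirm(f"Agent wants to {action}: {payload!r}. Allow? [y/N] ")
        if answer.strip().lower() != "y":
            return "blocked"
    return f"executed {action}"

# Inject a fake confirmer for testing instead of prompting a human:
print(execute("delete_file", "/tmp/report.csv", confirm=lambda _: "n"))  # blocked
print(execute("read_file", "/tmp/report.csv", confirm=lambda _: "n"))    # executed read_file
```

Default-deny on anything but an explicit "y" is the important detail: an agent that answers its own prompts should hit the blocked path, not the destructive one.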
Pair Codex deployment with the runtime governance framework we covered in the Microsoft Agent Governance Toolkit analysis — deterministic policy enforcement at sub-millisecond latency is the right complement to an agent that can click arbitrary UI elements.
Decision Matrix: When to Use Which
Consolidating everything into a single per-use-case rubric:
| Use case | Primary pick | Reason |
|---|---|---|
| Agency running heavy GUI automation (Figma, Notion, browsers) | Codex Desktop | Plugin marketplace + in-app browser + thread automations |
| Dev team doing multi-file refactors and incident response | Claude Code | Blind code-quality edge; stronger SWE-bench Verified |
| Token-budget-sensitive scripting at volume | Codex Desktop | 3–4x cheaper per task on morphllm benchmarks |
| EU / UK / Switzerland team needing computer use today | Claude Code | Codex computer use not available in those regions at launch |
| Linux dev environment | Claude Code | Linux CLI; Codex desktop has no Linux build |
| Marketing ops workflows with screenshots + copy | Codex Desktop | Native image generation in one tool; no round-trip |
| Security-sensitive enterprise engagements | Claude Code | Tiered API-first approach; smaller vision-driven attack surface |
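As a sketch, the rubric above reduces to a few ordered rules. The attribute names are illustrative, and the rule order encodes the table's hard constraints (region, OS) before the soft preferences:

```python
# Encoding the decision matrix as a tiny router. Rules mirror the table
# above, not any published vendor guidance; attribute names are invented.
def pick_tool(workflow: dict) -> str:
    if workflow.get("region") in {"EU", "UK", "CH"} and workflow.get("needs_computer_use"):
        return "Claude Code"    # Codex computer use unavailable in those regions at launch
    if workflow.get("os") == "linux":
        return "Claude Code"    # no Codex Desktop Linux build
    if workflow.get("multi_file") or workflow.get("incident_response"):
        return "Claude Code"    # blind code-quality and multi-file reasoning edge
    if workflow.get("gui_automation") or workflow.get("token_sensitive"):
        return "Codex Desktop"  # plugins, automations, 3-4x cheaper per task
    return "pilot both"

print(pick_tool({"gui_automation": True}))                      # Codex Desktop
print(pick_tool({"multi_file": True}))                          # Claude Code
print(pick_tool({"region": "EU", "needs_computer_use": True}))  # Claude Code
```

The "pilot both" default is deliberate: workflows that match no rule are exactly the ones worth benchmarking before committing.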
Conclusion
OpenAI's April 16 Codex Desktop update was substantial — background computer use, 90+ plugin marketplace, in-app browser, native image generation, memory, and thread automations in a single release. The competitive framing is sharp: Codex is cheaper per task, faster on terminal workloads, and better positioned for GUI automation. Claude Code produces better code, wins multi-file reasoning, and has a safer default action hierarchy thanks to API-first fallback.
For agencies, the realistic answer is running both. Codex handles the automation and repeatable workflows that used to require shell scripts and Zapier glue. Claude Code handles the engineering work where quality matters more than token economics. The decision matrix above is the starting point; the right setup is workflow-specific and worth running an explicit pilot on before committing client infrastructure.
Deploy Codex and Claude Code Across Your Agency Stack
We run two-week pilots on both tools against real client workloads, establish the decision matrix, and deploy with runtime governance from day one.
Frequently Asked Questions
Related Guides
More on coding agents, AI tool selection, and agency-side deployment playbooks.