GPT-5.5-Cyber is OpenAI’s most capable security model yet, and on June 22, 2026 the company released it in full as the centrepiece of an expanded Daybreak initiative — pairing a vendor-stated 85.6% on the CyberGym benchmark with Patch the Planet, a program that ships AI-discovered fixes to more than 30 open-source projects under Trail of Bits review.
The headline number is real but heavily caveated: OpenAI scored its own model, and the most capable tier is not available to the public. The more durable story sits underneath the benchmarks. OpenAI argues the security bottleneck has inverted — for years the hard part was finding vulnerabilities; now defenders are buried in findings and the hard part is patching them fast enough. Daybreak is built around that inversion.
This guide separates what actually shipped from what is marketing, recomputes the benchmark deltas, maps the three-tier access model so you can see where your organisation sits, and translates the SMB implication that most coverage skips: for small and mid-sized businesses, this capability arrives through your existing security vendor, not a direct OpenAI key.
- 01 Daybreak expanded on June 22, 2026. OpenAI shipped four additions: the full release of GPT-5.5-Cyber (beyond its earlier permissive-only preview), an updated Codex Security plugin, the Daybreak Cyber Partner Program, and Patch the Planet. The initiative first launched May 12, 2026.
- 02 The bottleneck moved from finding to patching. OpenAI’s framing is that AI now surfaces vulnerabilities faster than teams can remediate them, so the constraint is no longer discovery but landing the fix. That reframing, not the raw benchmark, is the argument worth taking seriously.
- 03 Every benchmark figure here is OpenAI’s own. The 85.6% CyberGym, 39.5% ExploitGym, and 69.8% SEC-bench Pro scores are self-reported and not independently audited. CyberGym is a UC Berkeley benchmark, but OpenAI ran the evaluation. Treat all figures as vendor-stated.
- 04 The most capable models are gated, not public. GPT-5.5-Cyber sits behind Trusted Access for Cyber, and OpenAI’s comparison places Anthropic’s Mythos 5 (also restricted) close behind. The open frontier defenders can actually reach is base GPT-5.5 at a vendor-stated 81.8%.
- 05 SMBs reach Daybreak through their security vendor. Direct model access stays with the 28 named partners, including CrowdStrike, Sophos, SentinelOne, and Cloudflare. Smaller businesses encounter these capabilities embedded in the products they already run, not via a direct OpenAI API.
01 — What Shipped
A May launch, a June expansion.
Daybreak first launched on May 12, 2026 as an initiative combining GPT-5.5, Codex Security, and Trusted Access for Cyber (TAC) to help organisations find and patch vulnerabilities before attackers exploit them. By May, OpenAI said hundreds of organisations and thousands of individual defenders were already enrolled in the TAC program.
June 22, 2026 was the substantive expansion. OpenAI shipped four things at once: the full release of GPT-5.5-Cyber — moving beyond the initial “permissive-only preview” — an updated Codex Security plugin, the Daybreak Cyber Partner Program with 28 named corporate partners, and Patch the Planet, an open-source patching effort run with Trail of Bits.
GPT-5.5-Cyber
OpenAI's most capable security model, now released in full beyond the earlier permissive-only preview. Distributed through continued limited release to trusted defenders — not the general public.
Codex Security plugin
Understands a team's code and threat model, flags plausible vulnerabilities, checks reachability, develops a targeted patch, and verifies the result — with humans deciding what to investigate and apply.
Cyber Partner Program
Accenture, Akamai, Cisco, Cloudflare, CrowdStrike, Darktrace, IBM, Okta, Palo Alto Networks, SentinelOne, Sophos, Wiz, Zscaler and more embed GPT-5.5 with Trusted Access for Cyber inside their own security products.
Patch the Planet
Co-founded with Trail of Bits, alongside HackerOne and Calif, to move open-source maintainers from findings to fixes. Expert human review precedes every finding that reaches a maintainer.
02 — The Inversion
The bottleneck moved from finding to patching.
The single most useful idea in the whole announcement is not a number. It is a reframing of where security work actually gets stuck. For most of the last decade, the scarce skill was finding vulnerabilities — fuzzing harnesses, manual review, bug bounties. AI has changed the slope of that curve sharply enough that, in OpenAI’s telling, defenders are now drowning in findings and the real constraint has shifted downstream to remediation.
That claim is plausible and partly self-serving — a company selling patching tooling has every reason to declare patching the new frontier. But the supporting scale numbers are genuinely large. Since the Codex Security research preview in March 2026, OpenAI says the plugin has scanned more than 30 million commits across over 30,000 codebases. Human reviewers manually marked more than 70,000 findings as fixed, and a further 500,000-plus findings were automatically determined to be fixed. Whatever discount you apply for vendor framing, that is a volume of findings no human triage queue absorbs without help.
The forward-looking question this raises is uncomfortable for defenders. If AI keeps compounding the discovery rate while patch velocity stays human-paced, the gap between “known vulnerable” and “actually fixed” widens rather than closes — and that window is exactly where attackers operate. The bet embedded in Daybreak is that the same models can be pointed at the fix side fast enough to keep that window from blowing open. It is a bet, not a settled result.
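The widening-gap argument can be made concrete with a toy backlog model. This is a minimal sketch under invented assumptions: the function name, the weekly rates, and the 10% growth figure are all hypothetical and do not come from OpenAI's data. It only shows how a compounding discovery rate outruns a flat, human-paced fix rate.

```python
def open_backlog(weeks, found_per_week, discovery_growth, fixed_per_week):
    """Return the open-finding count at the end of each week.

    All inputs are hypothetical illustration values, not measured figures.
    discovery_growth is the weekly compounding rate of discovery (0.10 = 10%).
    """
    backlog = 0.0
    history = []
    rate = found_per_week
    for _ in range(weeks):
        # New findings arrive, a fixed number get patched; backlog cannot go negative.
        backlog = max(0.0, backlog + rate - fixed_per_week)
        history.append(round(backlog, 1))
        # Discovery compounds while patch velocity stays flat.
        rate *= 1 + discovery_growth
    return history


# Hypothetical team that starts balanced: 100 found and 100 fixed per week.
print(open_backlog(4, 100, 0.10, 100))
print(open_backlog(4, 100, 0.0, 100))
```

With discovery growing 10% a week against a flat fix rate, the first call shows the backlog climbing every week, while the second (no growth) stays at zero. Raising `fixed_per_week` over time is the lever Daybreak is betting on.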
"AI is already good and about to get super good at cybersecurity." — Sam Altman, CEO, OpenAI
03 — Benchmarks
The numbers, and why every one is vendor-stated.
OpenAI reports three benchmark results for GPT-5.5-Cyber, all measured against its own base model. On CyberGym — a UC Berkeley benchmark that tests whether an agent can reproduce 1,507 known software vulnerabilities from 188 open-source projects — the cyber variant posts a vendor-stated 85.6% against 81.8% for base GPT-5.5. On ExploitGym, which tests turning known vulnerabilities into working exploits, it reaches 39.5% versus 25.95%. On SEC-bench Pro, covering long-horizon vulnerability discovery and proof-of-concept generation, it scores 69.8% versus 63.1%.
Figure: CyberGym score, single-model, vendor-stated. Source: OpenAI (vendor-stated); Mythos 5 / Opus 4.7 figures per OpenAI's comparison, not independently audited.
The table below isolates the apples-to-apples comparison: cyber variant versus base, both scored by OpenAI on the same benchmarks. It recomputes the uplift in percentage points so the gains are not inflated by relative-percentage framing.
| Benchmark | GPT-5.5 (base) | GPT-5.5-Cyber | Uplift (pp) | What it measures |
|---|---|---|---|---|
| CyberGym | 81.8% | 85.6% | +3.8 | Reproducing known vulnerabilities |
| ExploitGym | 25.95% | 39.5% | +13.55 | Turning vulnerabilities into working exploits |
| SEC-bench Pro | 63.1% | 69.8% | +6.7 | Long-horizon discovery and PoC generation |
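The uplift column can be checked with a few lines of arithmetic. The sketch below uses only the vendor-stated scores from the table; the function name `uplift` is an illustrative choice. It prints the percentage-point gain next to the relative gain, showing why the two framings look so different.

```python
# Vendor-stated scores from the table above, in percent: (base GPT-5.5, GPT-5.5-Cyber).
SCORES = {
    "CyberGym": (81.8, 85.6),
    "ExploitGym": (25.95, 39.5),
    "SEC-bench Pro": (63.1, 69.8),
}


def uplift(base: float, cyber: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    pp = round(cyber - base, 2)
    relative = round((cyber - base) / base * 100, 1)
    return pp, relative


for name, (base, cyber) in SCORES.items():
    pp, rel = uplift(base, cyber)
    print(f"{name}: +{pp} pp, +{rel}% relative")
```

On ExploitGym the same 13.55-point gain reads as roughly a 52% relative improvement because the base score is low, which is the inflation the percentage-point column avoids.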
04 — The Field
Beating Mythos 5, but both models stay gated.
OpenAI is making a public comparison: per its own CyberGym numbers, GPT-5.5-Cyber’s 85.6% edges Anthropic’s Claude Mythos 5 at 83.8%, while base GPT-5.5 (81.8%) and Claude Opus 4.7 (73.1%) sit below. Two cautions matter here. First, these cross-vendor figures are OpenAI-stated, not independently audited — the Mythos 5 and Opus 4.7 scores come from OpenAI’s comparison, not from Anthropic. Second, Mythos is not generally available; it is restricted to a small number of organisations under Anthropic’s rival Project Glasswing.
| Model | CyberGym | ExploitGym | SEC-bench Pro | Public access | Access path |
|---|---|---|---|---|---|
| GPT-5.5-Cyber | 85.6%* | 39.5%* | 69.8%* | No | Trusted Access for Cyber — verified defenders only |
| Claude Mythos 5 | 83.8%* | N/A | N/A | No | Project Glasswing — small set of cyber orgs |
| GPT-5.5 (baseline) | 81.8%* | 25.95%* | 63.1%* | Yes (API) | Standard OpenAI access |
| Claude Opus 4.7 | 73.1%* | N/A | N/A | Yes (API) | Standard Anthropic access |
* All figures OpenAI-stated; Mythos 5 and Opus 4.7 scores come from OpenAI’s comparison, not independently audited. ExploitGym and SEC-bench Pro results for the Anthropic models have not been published, so those cells are left as N/A rather than estimated.
This produces a genuine paradox. The two most capable security models on the board are both gated — GPT-5.5-Cyber behind Trusted Access for Cyber, Mythos behind Project Glasswing — converging on the same restricted-access philosophy. The frontier defenders and attackers can actually reach today is base GPT-5.5 at 81.8% and Opus 4.7 at 73.1%. Not everyone is convinced the gated tier changes the underlying picture. As SpecterOps CTO Jared Atkinson put it, AI will accelerate offensive security operations, but it does not fundamentally change the underlying problems defenders face. The capability is moving fast; the structural problems of patching, ownership, and coordination are not.
05 — Access Tiers
Three tiers of access: who gets what.
OpenAI describes the access model across several pages, but never as a single map. There are three tiers, and the distinction that trips people up is that GPT-5.5-Cyber is not the same thing as GPT-5.5 with Trusted Access for Cyber. The cyber variant is the most permissive, most tightly gated tier; TAC is the middle tier for standard enterprise defensive work.
| Tier | Gate | Who it’s for | SMB access path |
|---|---|---|---|
| GPT-5.5 (default) | None — standard OpenAI account | All developers — secure coding, review, triage, patch validation | Direct, via the Codex Security plugin |
| GPT-5.5 + Trusted Access | Application + identity verification; phishing-resistant auth required from June 1, 2026 | Cyber teams, security vendors, integrators, DevSecOps | Indirect — through partner vendor products |
| GPT-5.5-Cyber | Stricter verification, scoping, logging, ongoing review | Authorized red teams and penetration testers | Not typically available — enterprise/government path only |
OpenAI’s framing of why it gates at all is worth quoting in its own words: the company says it does not think it is practical or appropriate to centrally decide who gets to defend themselves. The tiering is the attempt to square broad defensive access with limiting the most offense-capable behaviour to verified, scoped, logged users.
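One way to keep the three tiers straight is to write the table down as a lookup. The sketch below is purely illustrative: `Tier`, `WORKLOAD_TIER`, and `route` are hypothetical names, the workload labels are paraphrased from the tier descriptions in this guide, and nothing here calls or reflects an actual OpenAI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    gate: str
    smb_path: str


# Tier details paraphrased from the table above.
DEFAULT = Tier("GPT-5.5 (default)", "none",
               "direct, via the Codex Security plugin")
TRUSTED = Tier("GPT-5.5 + Trusted Access", "application + identity verification",
               "indirect, through partner vendor products")
CYBER = Tier("GPT-5.5-Cyber", "stricter verification, scoping, logging, ongoing review",
             "not typically available")

# Hypothetical workload labels mapped to the tier this guide associates with them.
WORKLOAD_TIER = {
    "secure coding": DEFAULT,
    "code review": DEFAULT,
    "patch validation": DEFAULT,
    "malware analysis": TRUSTED,
    "detection engineering": TRUSTED,
    "incident response": TRUSTED,
    "authorized red teaming": CYBER,
    "penetration testing": CYBER,
}


def route(workload: str) -> Tier:
    """Return the access tier for a workload label, or raise if it is unmapped."""
    try:
        return WORKLOAD_TIER[workload.strip().lower()]
    except KeyError:
        raise ValueError(f"unmapped workload: {workload!r}") from None


tier = route("Incident response")
print(f"{tier.name} | gate: {tier.gate} | SMB path: {tier.smb_path}")
```

Raising on an unmapped workload is a deliberate choice in this sketch: a workload nobody has classified should prompt a decision rather than silently fall into the default tier.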
06 — Patch the Planet
From findings to fixes, with humans in front.
Patch the Planet is the part of the announcement with the most concrete, checkable detail — and the design choice that distinguishes it. Co-founded with Trail of Bits, in collaboration with HackerOne and Calif, the program helps open-source maintainers move from findings to fixes. More than 30 projects have committed, including cURL, Go, Python, Sigstore, NATS Server, aiohttp, and python.org. The deliberate design principle: expert human review precedes every finding that reaches a maintainer. Trail of Bits engineers manually deduplicate, correct severity, and filter false positives before anything is submitted — the opposite of a raw AI bug-dump that floods maintainers faster than they can respond.
That matters because the people on the receiving end are stretched thin. OpenAI cites Harvard and Linux Foundation research finding that 94% of widely used open-source projects studied had fewer than ten developers responsible for more than 90% of the code added in a year. An AI that surfaces hundreds of issues into a one-maintainer project is a denial-of-service on attention unless something filters first.
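The review gate described above (deduplicate, correct severity, filter false positives, and send nothing unreviewed) can be sketched as a small filter. This is an assumption-laden illustration, not Trail of Bits' actual tooling: the `Finding` and `Review` types, their fields, and the example data are all hypothetical, and the human verdicts are simply passed in as input.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Finding:
    project: str
    location: str   # hypothetical "file:function" identifier
    bug_class: str
    severity: str   # severity as assigned by the model


@dataclass(frozen=True)
class Review:
    is_false_positive: bool
    corrected_severity: Optional[str] = None


def triage(findings, reviews):
    """Return only the findings a human reviewer has cleared for a maintainer."""
    seen = set()
    cleared = []
    for finding in findings:
        key = (finding.project, finding.location, finding.bug_class)
        if key in seen:  # deduplicate repeated reports of the same issue
            continue
        seen.add(key)
        review = reviews.get(key)
        if review is None or review.is_false_positive:
            continue  # unreviewed or rejected findings never reach the maintainer
        if review.corrected_severity:
            finding = replace(finding, severity=review.corrected_severity)
        cleared.append(finding)
    return cleared


# Invented example data: three raw findings, one a duplicate, one a false positive.
raw = [
    Finding("example-lib", "parse.c:read_header", "out-of-bounds read", "critical"),
    Finding("example-lib", "parse.c:read_header", "out-of-bounds read", "critical"),
    Finding("example-lib", "log.c:write_line", "format string", "high"),
]
verdicts = {
    ("example-lib", "parse.c:read_header", "out-of-bounds read"): Review(False, "medium"),
    ("example-lib", "log.c:write_line", "format string"): Review(True),
}
for finding in triage(raw, verdicts):
    print(finding)
```

Three raw findings collapse to one report with a corrected severity, which is the difference between a filtered pipeline and a raw bug-dump on a small maintainer team.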
24 LPE exploits + 8 leak PoCs
GPT-5.5-Cyber analyzed 30M+ lines of kernel code, generating 24 local privilege-escalation exploits and 8 pointer information-leak proof-of-concepts from hundreds of flagged potential issues.
34 vulnerabilities confirmed
OpenAI researchers confirmed 34 FreeBSD vulnerabilities and produced 7 local privilege-escalation PoCs, with CVE disclosures documented on freebsd.org.
5 exploitable bugs reported
Five exploitable vulnerabilities found in Chrome's V8 JavaScript engine; three were identified and remediated within days of being introduced into the codebase.
10+ bugs in roughly a week
More than ten exploitable WebKit vulnerabilities found and reported during roughly one week of focused work — a pace that is the whole point of the patching-bottleneck argument.
The single most vivid example does not require parsing a benchmark. During its own safety evaluations, GPT-5.5 — the base model, not even the cyber variant — identified a WebAssembly vulnerability in Firefox, recorded as CVE-2026-8390. Mozilla patched it two days before Pwn2Own Berlin. Five of the six registered Firefox competition entries withdrew, and no Firefox exploit was successfully demonstrated at the event. A model found a real, exploitable browser bug during routine testing and quietly took an entire competition track off the board.
Other findings round out the picture. AI models identified a 23-year-old use-after-free in OpenBSD’s kernel implementation of System V semaphores, confirmed to let an unprivileged local user escalate to root. Calif used Codex to discover an HTTP/2 denial-of-service technique affecting major server software including NGINX, Apache, IIS, and Pingora; its analysis estimated more than 880,000 internet-facing websites were running affected software with HTTP/2 enabled. And Codex Security independently identified vulnerable patterns corresponding to four of the six dnsmasq CVEs fixed in release 2.92rel2. Trail of Bits also reported building a complete fuzzing lab in less than a day using GPT-5.5-Cyber — work it estimates would ordinarily take at least several weeks by hand.
07 — Implications
What it means for SMBs and agencies.
The practical takeaway most coverage skips: for small and mid-sized businesses, Daybreak is not something you buy directly. The Cyber Partner Program is explicitly architectural — OpenAI routes capabilities through 28 partners who embed them inside their own products, keeping direct model access in the hands of those partners. If you run CrowdStrike, Sophos, SentinelOne, Cloudflare, or one of the other named vendors, you will encounter this AI as a feature of tools you already pay for, not as an API key you provision. That is the same build-vs-buy decision for AI-assisted workflows playing out in security: the realistic path for most teams is buy, through the stack they already operate.
Start with the default tier
OpenAI's own guidance is that GPT-5.5 with Trusted Access for Cyber and Codex Security is the right starting point for most defenders. The base tier underneath it needs no special gate, so secure coding, review, and patch validation are available now.
Access arrives through your security vendor
Direct model access stays with the 28 partners. If you use CrowdStrike, Sophos, or SentinelOne, the capability reaches you embedded in those products — not via a direct OpenAI key. Ask your vendor what they have integrated.
Apply for Trusted Access for Cyber
Advanced defensive work — triage, malware analysis, detection engineering, incident response — needs the middle tier, which requires application, identity verification, and phishing-resistant auth from June 1, 2026.
GPT-5.5-Cyber is gated tightest
The cyber variant is reserved for authorized red teams and penetration testers under stricter verification, scoping, and logging. There is no self-serve path; this is an enterprise and government channel.
For agencies and engineering teams, the strategic read is that defensive tooling is becoming a model-routing question rather than a single-vendor choice. The same discipline we bring to agentic security risks applies here: decide which workloads justify gated access, which are well served by base GPT-5.5, and which belong inside a partner product you already trust. Pair that with hands-on hygiene — the kind of account security audit practices that close the gaps no model patches for you. If you are weighing where AI-assisted security fits in your own builds, our secure web development engagements and AI digital transformation programs start with exactly this kind of routing and governance decision. The named partners — including Accenture’s AI security partnerships — show how the integration layer is already forming.
08 — Conclusion
The capability is here; the distribution is the question.
Finding bugs got cheap. Landing the fix is the new frontier.
GPT-5.5-Cyber and the expanded Daybreak initiative are a real step in AI-assisted security — but the durable insight is the reframing, not the leaderboard. When models surface vulnerabilities faster than teams can remediate them, the constraint moves to patching, and Patch the Planet is OpenAI’s attempt to put a human-reviewed pipeline around that shift.
Hold the benchmarks at arm’s length. The 85.6% CyberGym, 39.5% ExploitGym, and 69.8% SEC-bench Pro figures are OpenAI’s own, unaudited, and the cross-vendor comparisons against Mythos 5 and Opus 4.7 come from OpenAI rather than Anthropic. Even taken at face value, a 39.5% exploit-generation score means the model fails most of those tasks. This is a force multiplier for defenders, in Cisco’s framing — not an autonomous patching machine.
The forward signal is about distribution, not capability. The two strongest security models on the board are both gated, the open frontier defenders can reach is base GPT-5.5, and for most businesses the capability arrives indirectly through a security vendor. The winning move is not chasing the gated tier — it is deciding, workload by workload, where AI-assisted security belongs in a stack you already run, and making sure the patch actually ships.