AI Development · New Release · 11 min read · Published June 27, 2026

Vendor-stated 85.6% on CyberGym · 30+ open-source projects · verified defenders only

GPT-5.5-Cyber & Daybreak: AI That Now Patches Code

On June 22, 2026, OpenAI expanded its Daybreak security initiative with the full release of GPT-5.5-Cyber, an updated Codex Security plugin, a 28-partner program, and Patch the Planet. The reframing matters more than the benchmarks: finding bugs is no longer the hard part — shipping the fix fast enough is. Every score below is OpenAI’s own, and the most capable model stays gated.

Digital Applied Team
Senior strategists · Published Jun 27, 2026
Sources: OpenAI + independents
GPT-5.5-Cyber · CyberGym: 85.6% (OpenAI-stated SOTA, +3.8 pp vs GPT-5.5)
GPT-5.5-Cyber · ExploitGym: 39.5% (exploit generation, +52% relative vs GPT-5.5)
Patch the Planet: 30+ open-source projects
Cyber Partner Program: 28 named partners

GPT-5.5-Cyber is OpenAI’s most capable security model yet, and on June 22, 2026 the company released it in full as the centrepiece of an expanded Daybreak initiative — pairing a vendor-stated 85.6% on the CyberGym benchmark with Patch the Planet, a program that ships AI-discovered fixes to more than 30 open-source projects under Trail of Bits review.

The headline number is real but heavily caveated: OpenAI scored its own model, and the most capable tier is not available to the public. The more durable story sits underneath the benchmarks. OpenAI argues the security bottleneck has inverted — for years the hard part was finding vulnerabilities; now defenders are buried in findings and the hard part is patching them fast enough. Daybreak is built around that inversion.

This guide separates what actually shipped from what is marketing, recomputes the benchmark deltas, maps the three-tier access model so you can see where your organisation sits, and translates the SMB implication that most coverage skips: for small and mid-sized businesses, this capability arrives through your existing security vendor, not a direct OpenAI key.

Key takeaways
  1. Daybreak expanded on June 22, 2026. OpenAI shipped four additions: the full release of GPT-5.5-Cyber (beyond its earlier permissive-only preview), an updated Codex Security plugin, the Daybreak Cyber Partner Program, and Patch the Planet. The initiative first launched May 12, 2026.
  2. The bottleneck moved from finding to patching. OpenAI’s framing is that AI now surfaces vulnerabilities faster than teams can remediate them, so the constraint is no longer discovery but landing the fix. That reframing, not the raw benchmark, is the argument worth taking seriously.
  3. Every benchmark figure here is OpenAI’s own. The 85.6% CyberGym, 39.5% ExploitGym, and 69.8% SEC-bench Pro scores are self-reported and not independently audited. CyberGym is a UC Berkeley benchmark, but OpenAI ran the evaluation. Treat all figures as vendor-stated.
  4. The most capable models are gated, not public. GPT-5.5-Cyber sits behind Trusted Access for Cyber, and OpenAI’s comparison places Anthropic’s Mythos 5 (also restricted) close behind. The open frontier defenders can actually reach is base GPT-5.5 at a vendor-stated 81.8%.
  5. SMBs reach Daybreak through their security vendor. Direct model access stays with the 28 named partners (CrowdStrike, Sophos, SentinelOne, Cloudflare, and others). Smaller businesses encounter these capabilities embedded in the products they already run, not via a direct OpenAI API.

01 · What Shipped: A May launch, a June expansion.

Daybreak first launched on May 12, 2026 as an initiative combining GPT-5.5, Codex Security, and Trusted Access for Cyber (TAC) to help organisations find and patch vulnerabilities before attackers exploit them. By May, OpenAI said hundreds of organisations and thousands of individual defenders were already enrolled in the TAC program.

June 22, 2026 was the substantive expansion. OpenAI shipped four things at once: the full release of GPT-5.5-Cyber — moving beyond the initial “permissive-only preview” — an updated Codex Security plugin, the Daybreak Cyber Partner Program with 28 named corporate partners, and Patch the Planet, an open-source patching effort run with Trail of Bits.

Model · GPT-5.5-Cyber · full release · restricted access · Trusted Access for Cyber
OpenAI's most capable security model, now released in full beyond the earlier permissive-only preview. Distributed through continued limited release to trusted defenders, not the general public.

Tooling · Codex Security plugin · updated · GPT-5.5 + TAC · Human-in-control
Understands a team's code and threat model, flags plausible vulnerabilities, checks reachability, develops a targeted patch, and verifies the result, with humans deciding what to investigate and apply.

Ecosystem · Cyber Partner Program · 28 named partners · Indirect SMB path
Accenture, Akamai, Cisco, Cloudflare, CrowdStrike, Darktrace, IBM, Okta, Palo Alto Networks, SentinelOne, Sophos, Wiz, Zscaler and more embed GPT-5.5 with Trusted Access for Cyber inside their own security products.

Open source · Patch the Planet · with Trail of Bits · 30+ projects committed
Co-founded with Trail of Bits, alongside HackerOne and Calif, to move open-source maintainers from findings to fixes. Expert human review precedes every finding that reaches a maintainer.

Keep the timeline straight
Three dates anchor this story. Codex Security went to research preview in March 2026. The Daybreak initiative launched on May 12, 2026. The GPT-5.5-Cyber full release, partner program, and Patch the Planet are the June 22, 2026 expansion. They are distinct milestones, not a single launch.

02 · The Inversion: The bottleneck moved from finding to patching.

The single most useful idea in the whole announcement is not a number. It is a reframing of where security work actually gets stuck. For most of the last decade, the scarce skill was finding vulnerabilities — fuzzing harnesses, manual review, bug bounties. AI has changed the slope of that curve sharply enough that, in OpenAI’s telling, defenders are now drowning in findings and the real constraint has shifted downstream to remediation.

OpenAI’s reframing
“The bottleneck historically has been finding vulnerabilities, but now defenders are overwhelmed with the number of vulnerabilities found. Instead, the bottleneck is now patching vulnerabilities.” — OpenAI’s Daybreak announcement, June 22, 2026.

That claim is plausible and partly self-serving — a company selling patching tooling has every reason to declare patching the new frontier. But the supporting scale numbers are genuinely large. Since the Codex Security research preview in March 2026, OpenAI says the plugin has scanned more than 30 million commits across over 30,000 codebases. Human reviewers manually marked more than 70,000 findings as fixed, and a further 500,000-plus findings were automatically determined to be fixed. Whatever discount you apply for vendor framing, that is a volume of findings no human triage queue absorbs without help.

The forward-looking question this raises is uncomfortable for defenders. If AI keeps compounding the discovery rate while patch velocity stays human-paced, the gap between “known vulnerable” and “actually fixed” widens rather than closes — and that window is exactly where attackers operate. The bet embedded in Daybreak is that the same models can be pointed at the fix side fast enough to keep that window from blowing open. It is a bet, not a settled result.
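The widening-gap argument is easy to see with a toy backlog model. The sketch below is illustrative only: the function name, the weekly figures, and the 10% growth rate are assumptions chosen for the example, not numbers from OpenAI.

```python
def open_backlog(weeks: int, found_per_week: float, growth: float,
                 fixed_per_week: float) -> list[float]:
    """Track unfixed findings week by week.

    Discovery compounds by `growth` each week; remediation capacity stays flat.
    """
    backlog = 0.0
    history = []
    for _ in range(weeks):
        # The backlog cannot go below zero: spare fix capacity is simply unused.
        backlog = max(0.0, backlog + found_per_week - fixed_per_week)
        history.append(backlog)
        found_per_week *= 1 + growth
    return history


# Hypothetical inputs: 100 findings/week growing 10% weekly, 120 fixes/week.
history = open_backlog(weeks=12, found_per_week=100, growth=0.10, fixed_per_week=120)
for week, unfixed in enumerate(history, start=1):
    print(f"week {week:2d}: {unfixed:8.1f} unfixed findings")
```

Under these assumed inputs the team starts with spare capacity, yet the backlog turns positive within a few weeks and keeps growing, because only one side of the equation compounds.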

"AI is already good and about to get super good at cybersecurity."— Sam Altman, CEO, OpenAI

03 · Benchmarks: The numbers, and why every one is vendor-stated.

OpenAI reports three benchmark results for GPT-5.5-Cyber, all measured against its own base model. On CyberGym — a UC Berkeley benchmark that tests whether an agent can reproduce 1,507 known software vulnerabilities from 188 open-source projects — the cyber variant posts a vendor-stated 85.6% against 81.8% for base GPT-5.5. On ExploitGym, which tests turning known vulnerabilities into working exploits, it reaches 39.5% versus 25.95%. On SEC-bench Pro, covering long-horizon vulnerability discovery and proof-of-concept generation, it scores 69.8% versus 63.1%.

Chart: CyberGym score, single-model, vendor-stated
  • GPT-5.5-Cyber (restricted, Trusted Access for Cyber): 85.6%, OpenAI-stated SOTA
  • Claude Mythos 5 (restricted, per OpenAI's comparison): 83.8%
  • GPT-5.5 baseline (public, standard OpenAI API): 81.8%
  • Claude Opus 4.7 (public, per OpenAI's comparison): 73.1%
Source: OpenAI (vendor-stated); Mythos 5 / Opus 4.7 figures per OpenAI's comparison, not independently audited

The proprietary table below isolates the apples-to-apples comparison — cyber variant versus base, both scored by OpenAI on the same benchmarks — and recomputes the uplift in percentage points so the gains are not inflated by relative-percentage framing.

Table: GPT-5.5-Cyber versus base GPT-5.5 across CyberGym, ExploitGym, and SEC-bench Pro, with the absolute uplift in percentage points and what each benchmark measures. All figures OpenAI vendor-stated.

Benchmark     | GPT-5.5 (base) | GPT-5.5-Cyber | Uplift (pp) | What it measures
CyberGym      | 81.8%          | 85.6%         | +3.8        | Reproducing known vulnerabilities
ExploitGym    | 25.95%         | 39.5%         | +13.55      | Turning vulnerabilities into working exploits
SEC-bench Pro | 63.1%          | 69.8%         | +6.7        | Long-horizon discovery and PoC generation

Read the ExploitGym number honestly
The ExploitGym jump from 25.95% to 39.5% is the largest relative gain — roughly 52% higher than base GPT-5.5. But a 39.5% score still means the model fails about 60% of exploit-generation tasks. This is meaningful assistance, not near-autonomous exploitation. OpenAI itself notes the benchmark figures came from its own testing, which it said was continuing on real-world fixes.
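The difference between percentage points and relative percent is worth checking by hand. This short sketch recomputes both from the vendor-stated scores quoted above; the dictionary and function names are ours, chosen for illustration.

```python
# Vendor-stated scores quoted in this article: (base GPT-5.5, GPT-5.5-Cyber).
SCORES = {
    "CyberGym": (81.8, 85.6),
    "ExploitGym": (25.95, 39.5),
    "SEC-bench Pro": (63.1, 69.8),
}


def uplift(base: float, cyber: float) -> tuple[float, float]:
    """Return (absolute uplift in percentage points, relative uplift in percent)."""
    points = cyber - base
    relative = points / base * 100
    return round(points, 2), round(relative, 1)


for name, (base, cyber) in SCORES.items():
    points, relative = uplift(base, cyber)
    failure_rate = round(100 - cyber, 1)
    print(f"{name}: +{points} pp, +{relative}% relative, "
          f"fails {failure_rate}% of tasks")
```

The same ExploitGym result can be reported as +13.55 points or as roughly +52%, which is why the table sticks to percentage points and why the failure rate belongs next to any headline gain.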

04 · The Field: Beating Mythos 5, but both models stay gated.

OpenAI is making a public comparison: per its own CyberGym numbers, GPT-5.5-Cyber’s 85.6% edges Anthropic’s Claude Mythos 5 at 83.8%, while base GPT-5.5 (81.8%) and Claude Opus 4.7 (73.1%) sit below. Two cautions matter here. First, these cross-vendor figures are OpenAI-stated, not independently audited — the Mythos 5 and Opus 4.7 scores come from OpenAI’s comparison, not from Anthropic. Second, Mythos is not generally available; it is restricted to a small number of organisations under Anthropic’s rival Project Glasswing.

Table: Multi-benchmark comparison of GPT-5.5-Cyber, Claude Mythos 5, GPT-5.5 baseline, and Claude Opus 4.7 across CyberGym, ExploitGym, and SEC-bench Pro, with public-access status and intended access path. All figures OpenAI vendor-stated; cross-vendor figures per OpenAI’s comparison.

Model              | CyberGym | ExploitGym | SEC-bench Pro | Public access | Access path
GPT-5.5-Cyber      | 85.6%*   | 39.5%*     | 69.8%*        | No            | Trusted Access for Cyber (verified defenders only)
Claude Mythos 5    | 83.8%*   | N/A        | N/A           | No            | Project Glasswing (small set of cyber orgs)
GPT-5.5 (baseline) | 81.8%*   | 25.95%*    | 63.1%*        | Yes (API)     | Standard OpenAI access
Claude Opus 4.7    | 73.1%*   | N/A        | N/A           | Yes (API)     | Standard Anthropic access

* All figures OpenAI-stated; Mythos 5 and Opus 4.7 scores come from OpenAI’s comparison, not independently audited. ExploitGym and SEC-bench Pro results for the Anthropic models have not been published, so those cells are left as N/A rather than estimated.

This produces a genuine paradox. The two most capable security models on the board are both gated — GPT-5.5-Cyber behind Trusted Access for Cyber, Mythos behind Project Glasswing — converging on the same restricted-access philosophy. The frontier defenders and attackers can actually reach today is base GPT-5.5 at 81.8% and Opus 4.7 at 73.1%. Not everyone is convinced the gated tier changes the underlying picture. As SpecterOps CTO Jared Atkinson put it, AI will accelerate offensive security operations, but it does not fundamentally change the underlying problems defenders face. The capability is moving fast; the structural problems of patching, ownership, and coordination are not.

05 · Access Tiers: Three tiers of access, and who gets what.

OpenAI describes the access model across several pages, but never as a single map. There are three tiers, and the distinction that trips people up is that GPT-5.5-Cyber is not the same thing as GPT-5.5 with Trusted Access for Cyber. The cyber variant is the most permissive, most tightly gated tier; TAC is the middle tier for standard enterprise defensive work.

Table: The three Daybreak access tiers (default GPT-5.5, GPT-5.5 with Trusted Access for Cyber, and GPT-5.5-Cyber) with the gate, who it is for, and how SMBs reach each.

Tier                     | Gate                                                                                            | Who it’s for                                                      | SMB access path
GPT-5.5 (default)        | None (standard OpenAI account)                                                                  | All developers: secure coding, review, triage, patch validation  | Direct, via the Codex Security plugin
GPT-5.5 + Trusted Access | Application + identity verification; phishing-resistant auth required from June 1, 2026        | Cyber teams, security vendors, integrators, DevSecOps             | Indirect, through partner vendor products
GPT-5.5-Cyber            | Stricter verification, scoping, logging, ongoing review                                         | Authorized red teams and penetration testers                      | Not typically available; enterprise/government path only

A gate with a deadline
Trusted Access for Cyber members must enable Advanced Account Security from June 1, 2026, or attest to phishing-resistant single sign-on. OpenAI’s stated guidance: for most defenders, GPT-5.5 with Trusted Access for Cyber and Codex Security remains the right starting point — the cyber variant is not the default recommendation.

OpenAI’s framing of why it gates at all is worth quoting in its own words: the company says it does not think it is practical or appropriate to centrally decide who gets to defend themselves. The tiering is the attempt to square broad defensive access with limiting the most offense-capable behaviour to verified, scoped, logged users.
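Since OpenAI never publishes the tiers as a single map, it can help to write one down. The sketch below encodes the table above as a lookup; the tier keys, the workload labels, and the `tier_for` helper are our own illustrative names, not an OpenAI API, and the workload-to-tier mapping is our reading of the article rather than an official policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    gate: str
    smb_path: str


# Tier details paraphrase the table above; keys are hypothetical shorthand.
TIERS = {
    "default": Tier("GPT-5.5 (default)", "standard OpenAI account",
                    "direct, via the Codex Security plugin"),
    "tac": Tier("GPT-5.5 + Trusted Access", "application and identity verification",
                "indirect, through partner vendor products"),
    "cyber": Tier("GPT-5.5-Cyber", "stricter verification, scoping, logging, ongoing review",
                  "enterprise/government path only"),
}

# Illustrative workload labels, assumed for this example.
WORKLOAD_TO_TIER = {
    "secure coding": "default",
    "code review": "default",
    "patch validation": "default",
    "malware analysis": "tac",
    "detection engineering": "tac",
    "incident response": "tac",
    "authorized red teaming": "cyber",
    "penetration testing": "cyber",
}


def tier_for(workload: str) -> Tier:
    """Look up the tier this article associates with a workload."""
    key = WORKLOAD_TO_TIER.get(workload.lower())
    if key is None:
        raise ValueError(f"unmapped workload: {workload!r}")
    return TIERS[key]


print(tier_for("incident response").name)
```

Keeping unmapped workloads as an explicit error, rather than defaulting to a tier, forces a deliberate decision about where each new kind of work belongs.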

06 · Patch the Planet: From findings to fixes, with humans in front.

Patch the Planet is the part of the announcement with the most concrete, checkable detail — and the design choice that distinguishes it. Co-founded with Trail of Bits, in collaboration with HackerOne and Calif, the program helps open-source maintainers move from findings to fixes. More than 30 projects have committed, including cURL, Go, Python, Sigstore, NATS Server, aiohttp, and python.org. The deliberate design principle: expert human review precedes every finding that reaches a maintainer. Trail of Bits engineers manually deduplicate, correct severity, and filter false positives before anything is submitted — the opposite of a raw AI bug-dump that floods maintainers faster than they can respond.

That matters because the people on the receiving end are stretched thin. OpenAI cites Harvard and Linux Foundation research finding that 94% of widely used open-source projects studied had fewer than ten developers responsible for more than 90% of the code added in a year. An AI that surfaces hundreds of issues into a one-maintainer project is a denial-of-service on attention unless something filters first.
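The filter-first principle can be sketched as a small gate in front of the maintainer's inbox. This is a minimal illustration under our own assumptions: the `Finding` fields and the `ready_for_maintainer` function are hypothetical and do not describe Trail of Bits' actual tooling or process.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Finding:
    project: str
    location: str                            # hypothetical, e.g. "lib/url.c:parse"
    kind: str                                # e.g. "use-after-free"
    severity: str                            # severity as first reported
    reviewed_severity: Optional[str] = None  # filled in by a human reviewer
    false_positive: bool = False             # marked by a human reviewer


def ready_for_maintainer(findings: list[Finding]) -> list[Finding]:
    """Keep only human-reviewed, deduplicated, true-positive findings."""
    seen: set[tuple[str, str, str]] = set()
    ready = []
    for finding in findings:
        # Gate 1: nothing unreviewed or rejected reaches a maintainer.
        if finding.reviewed_severity is None or finding.false_positive:
            continue
        # Gate 2: collapse duplicates of the same issue.
        key = (finding.project, finding.location, finding.kind)
        if key in seen:
            continue
        seen.add(key)
        # The reviewer's severity overrides the original claim.
        ready.append(replace(finding, severity=finding.reviewed_severity))
    return ready
```

The ordering is the design choice: review status is checked before deduplication, so an unreviewed copy of an issue can never shadow a reviewed one.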

Linux kernel · 24 LPE exploits + 8 leak PoCs · vendor-stated
GPT-5.5-Cyber analyzed 30M+ lines of kernel code, generating 24 local privilege-escalation exploits and 8 pointer information-leak proof-of-concepts from hundreds of flagged potential issues.

FreeBSD · 34 vulnerabilities confirmed · 7 LPE PoCs
OpenAI researchers confirmed 34 FreeBSD vulnerabilities and produced 7 local privilege-escalation PoCs, with CVE disclosures documented on freebsd.org.

Chrome V8 · 5 exploitable bugs reported · 3 fixed in days
Five exploitable vulnerabilities found in Chrome's V8 JavaScript engine; three were identified and remediated within days of being introduced into the codebase.

Safari WebKit · 10+ bugs in roughly a week · vendor-stated
More than ten exploitable WebKit vulnerabilities found and reported during roughly one week of focused work, a pace that is the whole point of the patching-bottleneck argument.

The single most vivid example does not require parsing a benchmark. During its own safety evaluations, GPT-5.5 — the base model, not even the cyber variant — identified a WebAssembly vulnerability in Firefox, recorded as CVE-2026-8390. Mozilla patched it two days before Pwn2Own Berlin. Five of the six registered Firefox competition entries withdrew, and no Firefox exploit was successfully demonstrated at the event. A model found a real, exploitable browser bug during routine testing and quietly took an entire competition track off the board.

Other findings round out the picture. AI models identified a 23-year-old use-after-free in OpenBSD’s kernel implementation of System V semaphores, confirmed to let an unprivileged local user escalate to root. Calif used Codex to discover an HTTP/2 denial-of-service technique affecting major server software including NGINX, Apache, IIS, and Pingora; its analysis estimated more than 880,000 internet-facing websites were running affected software with HTTP/2 enabled. And Codex Security independently identified vulnerable patterns corresponding to four of the six dnsmasq CVEs fixed in release 2.92rel2. Trail of Bits also reported building a complete fuzzing lab in less than a day using GPT-5.5-Cyber — work it estimates would ordinarily take at least several weeks by hand.

Assistive, not autonomous
OpenAI is explicit that this is validated remediation with a human in control: the system identifies plausible vulnerabilities, checks reachability, gathers evidence, develops a targeted patch, and verifies it — but humans remain in control of which findings to investigate, which changes to apply, and what information to share. Patch the Planet is an open-source initiative run with Trail of Bits, not a self-serve enterprise patching service.

07 · Implications: What it means for SMBs and agencies.

The practical takeaway most coverage skips: for small and mid-sized businesses, Daybreak is not something you buy directly. The Cyber Partner Program is explicitly architectural — OpenAI routes capabilities through 28 partners who embed them inside their own products, keeping direct model access in the hands of those partners. If you run CrowdStrike, Sophos, SentinelOne, Cloudflare, or one of the other named vendors, you will encounter this AI as a feature of tools you already pay for, not as an API key you provision. That is the same build-vs-buy decision for AI-assisted workflows playing out in security: the realistic path for most teams is buy, through the stack they already operate.

Most developers · Start with the default tier · Use GPT-5.5 default
OpenAI's own guidance is that GPT-5.5 with Trusted Access for Cyber and Codex Security is the right starting point for most defenders. The base tier needs no special gate: secure coding, review, and patch validation are available now.

SMBs · Access arrives through your security vendor · Buy through your stack
Direct model access stays with the 28 partners. If you use CrowdStrike, Sophos, or SentinelOne, the capability reaches you embedded in those products, not via a direct OpenAI key. Ask your vendor what they have integrated.

Security teams · Apply for Trusted Access for Cyber · Apply for TAC
Advanced defensive work (triage, malware analysis, detection engineering, incident response) needs the middle tier, which requires application, identity verification, and phishing-resistant auth from June 1, 2026.

Red teams · GPT-5.5-Cyber is gated tightest · Enterprise/gov only
The cyber variant is reserved for authorized red teams and penetration testers under stricter verification, scoping, and logging. There is no self-serve path; this is an enterprise and government channel.

For agencies and engineering teams, the strategic read is that defensive tooling is becoming a model-routing question rather than a single-vendor choice. The same discipline we bring to agentic security risks applies here: decide which workloads justify gated access, which are well served by base GPT-5.5, and which belong inside a partner product you already trust. Pair that with hands-on hygiene — the kind of account security audit practices that close the gaps no model patches for you. If you are weighing where AI-assisted security fits in your own builds, our secure web development engagements and AI digital transformation programs start with exactly this kind of routing and governance decision. The named partners — including Accenture’s AI security partnerships — show how the integration layer is already forming.

08 · Conclusion: The capability is here; the distribution is the question.

The shape of defensive AI, June 2026

Finding bugs got cheap. Landing the fix is the new frontier.

GPT-5.5-Cyber and the expanded Daybreak initiative are a real step in AI-assisted security — but the durable insight is the reframing, not the leaderboard. When models surface vulnerabilities faster than teams can remediate them, the constraint moves to patching, and Patch the Planet is OpenAI’s attempt to put a human-reviewed pipeline around that shift.

Hold the benchmarks at arm’s length. The 85.6% CyberGym, 39.5% ExploitGym, and 69.8% SEC-bench Pro figures are OpenAI’s own, unaudited, and the cross-vendor comparisons against Mythos 5 and Opus 4.7 come from OpenAI rather than Anthropic. Even taken at face value, a 39.5% exploit-generation score means the model fails most of those tasks. This is a force multiplier for defenders, in Cisco’s framing — not an autonomous patching machine.

The forward signal is about distribution, not capability. The two strongest security models on the board are both gated, the open frontier defenders can reach is base GPT-5.5, and for most businesses the capability arrives indirectly through a security vendor. The winning move is not chasing the gated tier — it is deciding, workload by workload, where AI-assisted security belongs in a stack you already run, and making sure the patch actually ships.

FAQ · GPT-5.5-Cyber & Daybreak

The questions teams ask about Daybreak.

GPT-5.5-Cyber is OpenAI's most capable security-specialised model, released in full on June 22, 2026 as part of the expanded Daybreak initiative. It is tuned for advanced security work and posts higher vendor-stated scores than base GPT-5.5 on three benchmarks: 85.6% versus 81.8% on CyberGym, 39.5% versus 25.95% on ExploitGym, and 69.8% versus 63.1% on SEC-bench Pro. The crucial difference is access, not just capability. Base GPT-5.5 is available through the standard OpenAI API to any developer, while GPT-5.5-Cyber is restricted to authorized red teams and penetration testers under stricter verification, scoping, and logging. It is also distinct from GPT-5.5 with Trusted Access for Cyber, which is a separate middle tier for standard enterprise defensive work.