OpenAI began a limited preview of GPT-5.6 on June 26, 2026, and the packaging is as much the story as the capability jump. Instead of a single new model, GPT-5.6 is a family of three durable tiers: Sol, the flagship for ambitious agentic work; Terra, a balanced model for efficient everyday work; and Luna, a fast, affordable model for high-volume jobs.
The new naming convention is deliberate. In OpenAI’s framing, the number identifies the generation while Sol, Terra, and Luna identify capability tiers that can advance on their own cadence: a permanent good-better-best ladder rather than model names you relearn every few weeks. Alongside the tiers, GPT-5.6 introduces a new max reasoning effort, a multi-agent ultra mode, and more predictable prompt caching.
This guide breaks down each of the three models, the pricing and caching changes, the benchmarks read honestly (these are OpenAI’s own preview figures, not independently audited), the heavier safety stack that ships with the release, and the unusual government-coordinated rollout that keeps GPT-5.6 on the API and Codex for now rather than in ChatGPT.
- 01 · GPT-5.6 is three models, not one. Sol (flagship), Terra (balanced), and Luna (high-volume) ship together as durable capability tiers. The number is the generation; the names are the tiers, each free to advance on its own schedule.
- 02 · Sol holds GPT-5.5’s price; Terra roughly halves it. Sol is $5 input / $30 output per 1M tokens, identical to GPT-5.5, so it is effectively a free capability upgrade. Terra delivers GPT-5.5-class performance at $2.50 / $15, about half the cost. Luna is $1 / $6.
- 03 · New max effort and ultra multi-agent mode. max is a new top reasoning-effort setting that gives Sol the most time to reason. ultra goes beyond a single agent, coordinating subagents to accelerate complex, long-horizon work, and it posts the top coding score.
- 04 · The benchmarks are strong but OpenAI-reported. Sol in ultra mode reaches a reported 91.9% on Terminal-Bench 2.1, with standard Sol around 88.8%, edging Claude Mythos 5 at ~88%. A recurring theme is efficiency: matching rivals while spending far fewer tokens.
- 05 · It is a gated preview, not a public launch. GPT-5.6 is available only via the API and Codex to a small set of trusted partners, coordinated with the U.S. government under a new frontier-model framework. General availability across ChatGPT, Codex, and the API is expected in the coming weeks.
01 — What Shipped: One generation, three durable tiers.
GPT-5.6 splits a single release into a tiered family. OpenAI describes Sol as its strongest model yet, Terra as competitive with GPT-5.5 at roughly half the cost, and Luna as strong capability at its lowest price. The shift matters because it turns model choice into a deliberate design decision: route heavy reasoning to Sol, steady production work to Terra, and high-volume tasks to Luna.
The naming change is the durable part. Previously each release was a single point on a line; now the generation number and the capability tier are separated. Sol, Terra, and Luna are meant to persist and improve independently, which means a routing decision you make today (send this workload to Terra, that one to Luna) should stay valid as each tier advances. Our complete GPT-5.5 guide covers the baseline the whole family is priced and benchmarked against.
02 — The Three Tiers: What each model is for.
The tiers are positioned by job, not just by size. Sol is the model you reach for when the task is long-horizon and the answer has to be right; Terra is the everyday workhorse where GPT-5.5-class quality at half the price changes the math; Luna is for the high-throughput jobs you run thousands of times, where unit economics matter more than peak reasoning.
Sol: the flagship
OpenAI’s strongest model yet, tuned for long-horizon agentic work: multi-step coding, deep research, and vulnerability analysis. It takes the headline benchmarks and is paired with the heaviest safeguards. Same price as GPT-5.5.
Terra: the balanced tier
Pitched at the broad middle of production work. OpenAI says Terra is competitive with GPT-5.5 at roughly half the price, which makes it the obvious migration target if you run GPT-5.5 today purely for its quality.
Luna: the volume tier
The fast, low-cost option for jobs run at scale: classification, extraction, routing, first-pass drafting. Strong capability at the lowest cost, and even Luna gains on cyber-reasoning tasks as you raise its reasoning budget.
03 — Pricing & Caching: The numbers that move a decision.
Two things stand out in the price list. Sol holds the line at GPT-5.5’s exact rate ($5 input / $30 output per 1M tokens), so for existing GPT-5.5 workloads it is effectively a free capability upgrade. Terra is the value play — GPT-5.5-class quality at $2.50 / $15, roughly half the cost — and Luna sits at the floor for high-throughput pipelines. The cached column below is the 90%-discounted cache-read rate.
| Model | Tier | Input ($/1M) | Cached ($/1M) | Output ($/1M) |
|---|---|---|---|---|
| GPT-5.6 Sol | Flagship · ambitious agentic work | $5.00 | $0.50 | $30.00 |
| GPT-5.6 Terra | Balanced · efficient everyday work | $2.50 | $0.25 | $15.00 |
| GPT-5.6 Luna | Fast · high-volume work | $1.00 | $0.10 | $6.00 |
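To make the table concrete, here is a minimal sketch of a per-request cost estimate. The prices are copied from the table above; the function name, the tier keys, and the example workload (20k input tokens, 2k output tokens) are illustrative assumptions, not part of any OpenAI SDK.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "sol":   {"input": 5.00, "cached": 0.50, "output": 30.00},
    "terra": {"input": 2.50, "cached": 0.25, "output": 15.00},
    "luna":  {"input": 1.00, "cached": 0.10, "output": 6.00},
}


def request_cost(tier, input_tokens, output_tokens, cached_tokens=0):
    """Estimated USD cost of one request.

    cached_tokens is the portion of input_tokens served from cache;
    it is billed at the cached rate instead of the full input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    p = PRICES[tier]
    uncached = input_tokens - cached_tokens
    total = (uncached * p["input"]
             + cached_tokens * p["cached"]
             + output_tokens * p["output"])
    return total / 1_000_000


# Hypothetical workload: 20k input tokens and 2k output tokens per request.
for tier in PRICES:
    print(f"{tier}: ${request_cost(tier, 20_000, 2_000):.4f}")
```

For this assumed workload the estimate comes to $0.1600 on Sol, $0.0800 on Terra, and $0.0320 on Luna per request, which is the 5 : 2.5 : 1 ratio the price list implies. Cache-write premiums are left out here to keep the sketch small.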
GPT-5.6 also reworks prompt caching, which matters more than it sounds for agentic loops that re-send a large stable context — a codebase, a long system prompt, a tool schema — on every step. You now get explicit cache breakpoints and a 30-minute minimum cache life, cache reads keep the steep 90% discount, and cache writes are billed at 1.25x the uncached input rate. For a deeper look at how price and capability trade off across the current field, our performance-versus-price analysis frames where each tier lands.
Cached-input discount
Cache reads keep the 90% discount on input — the basis for the cached rates in the table above ($0.50 on Sol, $0.25 on Terra, $0.10 on Luna). Repeated context gets cheap to re-send.
Write premium
For GPT-5.6 and later, the first write into the cache is billed at 1.25x the uncached input rate. You pay a small premium once to make every subsequent read 90% cheaper.
Minimum retention
A guaranteed 30-minute minimum cache life plus explicit breakpoints make the cost of a long agentic session far more predictable than implicit, short-lived caching.
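The write premium and read discount above imply a quick break-even. The sketch below works it through, assuming the first call pays the 1.25x write rate on the stable prefix and every later call is a cache hit inside the cache lifetime. The function names and the 100k-token, 40-step example are hypothetical; real billing may differ in details the preview notes do not spell out.

```python
WRITE_MULTIPLIER = 1.25  # cache write, relative to the uncached input rate
READ_MULTIPLIER = 0.10   # cache read, i.e. the 90% discount


def prefix_cost(input_rate, prefix_tokens, calls, cached):
    """USD spent on a stable prefix across `calls` requests.

    input_rate is the uncached input price in USD per 1M tokens.
    """
    base = input_rate * prefix_tokens / 1_000_000
    if not cached or calls == 0:
        return base * calls
    # Assumption: the first call pays the write premium and every later
    # call is a cache hit (all calls land inside the cache lifetime).
    return base * (WRITE_MULTIPLIER + READ_MULTIPLIER * (calls - 1))


def break_even_calls():
    """Smallest number of calls at which caching is strictly cheaper."""
    calls = 1
    while WRITE_MULTIPLIER + READ_MULTIPLIER * (calls - 1) >= calls:
        calls += 1
    return calls


# Hypothetical agent loop on Sol: 100k-token prefix re-sent over 40 steps.
uncached = prefix_cost(5.00, 100_000, 40, cached=False)
cached = prefix_cost(5.00, 100_000, 40, cached=True)
print(f"uncached ${uncached:.3f}, cached ${cached:.3f}, "
      f"break-even at {break_even_calls()} calls")
```

Under these assumptions caching pays for itself from the second call onward, and the 40-step example drops from about $20.00 to about $2.58 of prefix spend. A single isolated call is the one case where the write premium costs more than it saves.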
04 — What's New: max effort and an ultra multi-agent mode.
Beyond the tiers, GPT-5.6 introduces two new ways to spend compute at inference time. A new max reasoning effort sits above the existing levels and gives Sol the most time to reason through hard problems. The more interesting addition is ultra, which goes beyond a single agent by coordinating subagents to divide and accelerate complex, long-horizon work — and it is the configuration that posts the top coding score in OpenAI’s charts.
That second feature is the clearest signal of where the frontier is heading: not just larger single models, but models that natively spin up and manage their own helpers. We have covered the same pattern arriving in OpenAI’s coding tools in our guide to Codex subagents and multi-agent autonomous coding, and ultra brings it into the core model API.
max: deeper single-agent reasoning
A new reasoning-effort setting above the existing levels, giving Sol the most time to reason deeply on a single hard problem. Reach for it on the tasks where correctness matters more than latency.
ultra: subagent orchestration
ultra coordinates subagents to split and accelerate long-horizon work, the orchestration pattern teams used to hand-build, now built into the model. It drives the headline Terminal-Bench 2.1 result.
05 — Benchmarks: The numbers, read honestly.
OpenAI shared a focused set of preview evaluations, with a fuller suite promised at general availability. The coding headline is Terminal-Bench 2.1, which tests command-line workflows requiring planning, iteration, and tool coordination: Sol in ultra mode reaches a reported 91.9%, with standard Sol around 88.8%, edging Anthropic’s Claude Mythos 5 at about 88%. One honest caveat governs every figure here — these are OpenAI’s own reported results, not independently audited, so read them as vendor claims on a common axis.
A version caveat matters too: GPT-5.5’s widely cited 82.7% was measured on Terminal-Bench 2.0, a different benchmark version, so it should not be placed on the same axis as these 2.1 scores. We chart only the 2.1 figures OpenAI published for this release below.
Terminal-Bench 2.1 · OpenAI-reported coding scores
Source: OpenAI GPT-5.6 preview evaluations (Terminal-Bench 2.1); vendor-reported, not independently audited. OpenAI says a fuller cross-model suite will follow at general availability.
The pattern across the rest of OpenAI’s preview evaluations is efficiency, not just higher peaks. On Agent’s Last Exam, which spans 55 professional domains of long-running work, OpenAI reports Sol as the only model to clear 50% (about 50.9% in code mode) while using fewer tokens than prior architectures. On GeneBench v1, a long-horizon genomics benchmark, Sol beats GPT-5.5 while consuming fewer tokens. And on the security side, OpenAI says Sol is competitive with a Mythos preview on ExploitBench using roughly one-third the output tokens.
06 — Cyber & Safety: Stronger cyber capability, stronger safeguards.
The same strength that makes Sol useful to defenders makes it sensitive, so OpenAI wrapped the release in what it calls its most robust safety stack to date. The design is explicitly layered, with configurations matched to each model’s capabilities, and OpenAI’s own framing is that no single safeguard is sufficient against determined misuse.
Importantly, OpenAI states Sol does not cross the “Cyber Critical” threshold in its Preparedness Framework. In tests against the Chromium and Firefox codebases it found bugs and exploitation primitives but did not autonomously assemble a working full-chain exploit — and the company frames the model as better at helping people find and fix vulnerabilities than at carrying out end-to-end attacks.
Model-level refusals
Trained to refuse prohibited cyber assistance, including when users disguise intent or attempt jailbreaks — the first boundary around what the model will and will not help with.
Real-time classifiers
Classifiers evaluate output as it is generated. On higher-risk cases, generation can pause while a larger reasoning model reviews the full context and withholds disallowed output before it reaches the user.
Account-level review
Flagged activity can trigger review across a user’s conversations and risk signals, helping separate persistent malicious behavior from legitimate dual-use security work where the same concepts appear.
Automated red-teaming
OpenAI dedicated over 700,000 A100-equivalent GPU hours to automated red-teaming aimed at universal jailbreaks, plus weeks of third-party human red-teaming, to harden the stack against adaptive attacks.
07 — The Gated Rollout: Why you can’t use it yet.
GPT-5.6 breaks from the usual launch playbook. During the preview it is available only through the API and Codex, to a small group of vetted partners — reported to be around 20 organizations — whose identities were shared with the U.S. government. According to Reuters, the backdrop is an executive order signed earlier this month that sets up a voluntary framework for developers to offer “covered frontier models” to the government for up to 30 days before releasing them to trusted partners.
OpenAI complied but pushed back publicly. In its own announcement, the company framed the gated preview as a short-term step toward broad availability “in the coming weeks,” while objecting to the process becoming permanent. For teams, the practical lesson echoes the export-control disruptions earlier this year — concentration risk is real, and our second-source resilience playbook covers how to avoid being stranded when access to a single model is gated overnight.
“We don’t believe this kind of government access process should become the long-term default. It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them.”
OpenAI, Previewing GPT-5.6 Sol, June 26, 2026
There is an infrastructure note worth flagging alongside the access story: OpenAI plans to launch Sol on Cerebras hardware in July at up to 750 tokens per second, initially for select customers — a bet on frontier-grade reasoning at very low latency. We have tracked the same Cerebras-on-OpenAI pattern before in our look at real-time coding on Cerebras, and GPT-5.6 extends it to the new flagship.
08 — What It Means: How a team should route across the tiers.
The tiered family turns model selection into an architecture decision rather than a default. The practical move is to match each workload to the cheapest tier that clears its quality bar, and — because the tier names are durable — to encode that routing once and let it hold as each tier improves. The matrix below is where we start that conversation with clients.
Hard, long-horizon work
Route to Sol: multi-step coding, deep research, security analysis, and anything else where correctness justifies the flagship price and the new max or ultra modes earn their keep. Same per-token cost as GPT-5.5, more capability.
Today’s GPT-5.5 workloads
If you run GPT-5.5 mainly for its quality, model Terra first: OpenAI positions it as competitive performance at roughly half the cost. Validate on your own evals, then move the steady-state production traffic.
High-volume, lower-stakes jobs
Classification, extraction, routing, and first-pass drafting that run thousands of times a day belong on the cheapest tier. Reserve Luna for work where throughput and unit cost matter more than peak reasoning.
Access and concentration risk
GPT-5.6 is API/Codex-only behind a limited preview, with dates that are soft. Prototype now, but keep a second-source fallback so a gated or delayed model can’t stall a production workflow.
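One way to encode the matrix above is a small routing table with a second source attached to each entry. This is a sketch only: the workload class names and every model identifier below (including the `gpt-5.6-*` strings and the `second-source-*` placeholders) are hypothetical, since the preview does not publish API model names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    primary: str   # tier to try first
    fallback: str  # second source if the primary is gated or unavailable


# Hypothetical workload classes and model identifiers, for illustration only.
ROUTES = {
    "long_horizon": Route("gpt-5.6-sol", "second-source-flagship"),
    "everyday": Route("gpt-5.6-terra", "second-source-balanced"),
    "high_volume": Route("gpt-5.6-luna", "second-source-small"),
}


def pick_model(workload, available):
    """Return the primary model if reachable, else the fallback."""
    route = ROUTES[workload]
    if route.primary in available:
        return route.primary
    if route.fallback in available:
        return route.fallback
    raise RuntimeError(f"no model available for workload {workload!r}")


# During a gated preview the flagship may be missing from `available`.
print(pick_model("long_horizon", {"second-source-flagship"}))
print(pick_model("everyday", {"gpt-5.6-terra", "second-source-balanced"}))
```

Keeping the table keyed by workload rather than by model name is the point: when a tier advances or access changes, only the table values move, not the calling code.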
The honest sequencing is the same one we use on every model bump: pin the workloads to evals you control, migrate the cost-sensitive traffic to the cheapest tier that passes, and keep a fallback for anything that depends on a single gated model. That scoping — which workloads, which tier, which guardrails — is exactly where our agentic AI transformation engagements begin, before any model commitment.
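The "cheapest tier that passes" rule can be sketched in a few lines. The input prices come from the pricing table and are used only to order the tiers; the eval scores and the quality bars are made-up numbers standing in for results from an eval suite you control.

```python
# Input price in USD per 1M tokens, from the pricing table; used for ordering.
INPUT_PRICE = {"luna": 1.00, "terra": 2.50, "sol": 5.00}


def cheapest_passing_tier(eval_scores, quality_bar):
    """Return the cheapest tier whose eval score meets the bar, or None."""
    for tier in sorted(INPUT_PRICE, key=INPUT_PRICE.get):
        if eval_scores.get(tier, 0.0) >= quality_bar:
            return tier
    return None  # nothing passes: keep the current model or the fallback


# Hypothetical scores from an in-house eval suite (fraction of cases passed).
scores = {"luna": 0.81, "terra": 0.93, "sol": 0.97}

print(cheapest_passing_tier(scores, 0.90))  # workload with a 90% bar
print(cheapest_passing_tier(scores, 0.95))  # stricter workload
```

With these assumed scores the 90% bar lands on Terra and the 95% bar on Sol. A tier with no recorded score is treated as failing, so an unevaluated model is never selected by default.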
09 — Conclusion: A range, not a single model.
OpenAI is productising a range and pushing multi-agent reasoning into the mainstream.
GPT-5.6 is best read as OpenAI shipping a family rather than a model. Sol holds GPT-5.5’s price while raising the ceiling, Terra resets the cost math on everyday work, and Luna anchors the high-volume floor — and the new max and ultra modes make native multi-agent reasoning a default expectation, not a hand-built add-on.
Keep the framing precise. The benchmarks are strong but OpenAI-reported, not audited; the headline efficiency gains are as much about tokens-per-result as raw scores; and the whole thing ships behind a government-coordinated preview that even OpenAI says should not become the norm. Dates are soft, access is gated, and the safety stack will occasionally get in the way of legitimate work.
The durable takeaway is the tiered family itself. When intelligence, speed, and cost become an explicit three-way choice you encode once, the teams that win are the ones that route deliberately — flagship where it matters, cheapest tier where it does not, and a second source behind anything that can be gated. That, more than any single benchmark point, is what this release changes.