OpenAI began a limited preview of GPT-5.6 on June 26, 2026, and the packaging is as much the story as the capability jump. Instead of a single new model, GPT-5.6 is a family of three durable tiers: Sol, the flagship for ambitious agentic work; Terra, a balanced model for efficient everyday work; and Luna, a fast, affordable model for high-volume jobs.

The new naming convention is deliberate. In OpenAI’s framing, the number identifies the generation while Sol, Terra, and Luna identify capability tiers that can advance on their own cadence — a permanent good, better, best ladder rather than model names you relearn every few weeks. Alongside the tiers, GPT-5.6 introduces a new max reasoning effort, a multi-agent ultra mode, and more predictable prompt caching.

This guide breaks down each of the three models, the pricing and caching changes, the benchmarks read honestly (these are OpenAI’s own preview figures, not independently audited), the heavier safety stack that ships with the release, and the unusual government-coordinated rollout that keeps GPT-5.6 on the API and Codex for now rather than in ChatGPT.

Key takeaways

01
GPT-5.6 is three models, not one. Sol (flagship), Terra (balanced), and Luna (high-volume) ship together as durable capability tiers. The number is the generation; the names are the tiers, each free to advance on its own schedule.
02
Sol holds GPT-5.5’s price; Terra roughly halves it. Sol is $5 input / $30 output per 1M tokens, identical to GPT-5.5, so it is effectively a free capability upgrade. Terra delivers GPT-5.5-class performance at $2.50 / $15, about half the cost. Luna is $1 / $6.
03
New max effort and ultra multi-agent mode. max is a new top reasoning-effort setting that gives Sol the most time to reason. ultra goes beyond a single agent, coordinating subagents to accelerate complex, long-horizon work, and it posts the top coding score.
04
The benchmarks are strong but OpenAI-reported. Sol in ultra mode reaches a reported 91.9% on Terminal-Bench 2.1, with standard Sol at about 88.8%, edging Claude Mythos 5 at about 88%. A recurring theme is efficiency: matching rivals while spending far fewer tokens.
05
It is a gated preview, not a public launch. GPT-5.6 is available only via the API and Codex to a small set of trusted partners, coordinated with the U.S. government under a new frontier-model framework. General availability across ChatGPT, Codex, and the API is expected in the coming weeks.

01 — What Shipped
One generation, three durable tiers.

GPT-5.6 splits a single release into a tiered family. OpenAI describes Sol as its strongest model yet, Terra as competitive with GPT-5.5 at roughly half the cost, and Luna as strong capability at its lowest price. The shift matters because it turns model choice into a deliberate design decision: route heavy reasoning to Sol, steady production work to Terra, and high-volume tasks to Luna.

The naming change is the durable part. Previously each release was a single point on a line; now the generation number and the capability tier are separated. Sol, Terra, and Luna are meant to persist and improve independently, which means a routing decision you make today (send this workload to Terra, that one to Luna) should stay valid as each tier advances. For the GPT-5.5 baseline the whole family is priced and benchmarked against, see our complete GPT-5.5 guide.

OpenAI, in its own words

OpenAI frames the release plainly: Sol is “our flagship model,” Terra “has competitive performance to GPT-5.5 while being 2x cheaper,” and Luna “brings strong capability at our lowest cost.” The company adds that it plans to make all three “generally available in the coming weeks” after the preview period — so treat current access, pricing, and model IDs as provisional until that wider launch.

02 — The Three Tiers
What each model is for.

The tiers are positioned by job, not just by size. Sol is the model you reach for when the task is long-horizon and the answer has to be right; Terra is the everyday workhorse, where GPT-5.5-class quality at half the price changes the math; Luna is for the high-throughput jobs you run thousands of times, where unit economics matter more than peak reasoning.

Sol

The flagship

$5 / $30 per 1M

OpenAI’s strongest model yet, tuned for long-horizon agentic work: multi-step coding, deep research, and vulnerability analysis. It takes the headline benchmarks and pairs with the heaviest safeguards. Same price as GPT-5.5.

Ambitious agentic work

Terra

The balanced tier

$2.50 / $15 per 1M

Pitched at the broad middle of production work. OpenAI says it is competitive with GPT-5.5 while being 2x cheaper — the obvious migration target if you run GPT-5.5 today purely for its quality.

Efficient everyday work

Luna

The volume tier

$1 / $6 per 1M

The fast, low-cost option for jobs run at scale — classification, extraction, routing, first-pass drafting. Strong capability at the lowest cost, and even Luna gains on cyber-reasoning tasks as you raise its reasoning budget.

High-volume work

03 — Pricing & Caching
The numbers that move a decision.

Two things stand out in the price list. Sol holds the line at GPT-5.5’s exact rate ($5 input / $30 output per 1M tokens), so for existing GPT-5.5 workloads it is effectively a free capability upgrade. Terra is the value play — GPT-5.5-class quality at $2.50 / $15, roughly half the cost — and Luna sits at the floor for high-throughput pipelines. The cached column below is the 90%-discounted cache-read rate.

GPT-5.6 per-million-token pricing across the three tiers — Sol, Terra, and Luna — with input, cached-input (cache-read), and output rates. Figures are from OpenAI’s GPT-5.6 preview announcement and pricing card dated June 26, 2026. Sol matches GPT-5.5’s $5 input and $30 output.
Model	Tier	Input ($/1M)	Cached ($/1M)	Output ($/1M)
GPT-5.6 Sol	Flagship · ambitious agentic work	$5.00	$0.50	$30.00
GPT-5.6 Terra	Balanced · efficient everyday work	$2.50	$0.25	$15.00
GPT-5.6 Luna	Fast · high-volume work	$1.00	$0.10	$6.00
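To make the table concrete, here is a minimal sketch of a per-request cost estimate built only from the preview rates above. The tier keys ("sol", "terra", "luna") are labels chosen for this example, not confirmed API model IDs, and the token counts in the usage line are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPrice:
    """Preview prices in USD per 1M tokens, copied from the table above."""
    input_per_m: float
    cached_per_m: float
    output_per_m: float


# Keys are labels for this sketch only, not confirmed model IDs.
PRICES = {
    "sol": TierPrice(5.00, 0.50, 30.00),
    "terra": TierPrice(2.50, 0.25, 15.00),
    "luna": TierPrice(1.00, 0.10, 6.00),
}


def request_cost(tier: str, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Return the USD cost of one request.

    cached_tokens is the part of input_tokens billed at the cache-read rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES[tier]
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * price.input_per_m
             + cached_tokens * price.cached_per_m
             + output_tokens * price.output_per_m)
    return total / 1_000_000


# Illustrative request: 20,000 input tokens (16,000 from cache), 2,000 output.
print(f"{request_cost('sol', 20_000, 2_000, cached_tokens=16_000):.3f}")  # 0.088
```

Running the same request through all three tiers is a quick way to see how much of a bill is driven by output tokens, which are priced at six times the input rate on every tier.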

GPT-5.6 also reworks prompt caching, which matters more than it sounds for agentic loops that re-send a large stable context — a codebase, a long system prompt, a tool schema — on every step. You now get explicit cache breakpoints and a 30-minute minimum cache life, cache reads keep the steep 90% discount, and cache writes are billed at 1.25x the uncached input rate. For a deeper look at how price and capability trade off across the current field, our performance-versus-price analysis frames where each tier lands.

Cache reads

Cached-input discount

90% off

Cache reads keep the 90% discount on input — the basis for the cached rates in the table above ($0.50 on Sol, $0.25 on Terra, $0.10 on Luna). Repeated context gets cheap to re-send.

Repeat-context savings

Cache writes

Write premium

1.25×

For GPT-5.6 and later, the first write into the cache is billed at 1.25x the uncached input rate. You pay a small premium once to make every subsequent read 90% cheaper.

Pay once, read cheap

Cache life

Minimum retention

30 min

A guaranteed 30-minute minimum cache life plus explicit breakpoints make the cost of a long agentic session far more predictable than implicit, short-lived caching.

Predictable spend
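The write premium and read discount can be combined into a simple break-even check. The sketch below assumes the 1.25x premium is paid once for the stable context, that every later send is a cache read at 10% of the input rate, and that all sends land inside the cache life; a session with gaps longer than the retention window would pay the write again, which this sketch does not model.

```python
WRITE_MULTIPLIER = 1.25  # first write into the cache, relative to uncached input
READ_MULTIPLIER = 0.10   # cache read after the 90% discount


def context_cost(tokens: int, sends: int, input_per_m: float, cached: bool) -> float:
    """USD cost of sending the same stable context `sends` times.

    Assumes one cache write followed by (sends - 1) cache reads when cached.
    """
    if sends < 1:
        raise ValueError("sends must be at least 1")
    one_uncached_send = tokens * input_per_m / 1_000_000
    if not cached:
        return one_uncached_send * sends
    return one_uncached_send * (WRITE_MULTIPLIER + READ_MULTIPLIER * (sends - 1))


# Illustrative agent loop: a 100,000-token context re-sent on 40 steps at Sol's $5 rate.
print(f"{context_cost(100_000, 40, 5.00, cached=False):.3f}")  # 20.000
print(f"{context_cost(100_000, 40, 5.00, cached=True):.3f}")   # 2.575
```

Under these assumptions caching costs more only when the context is sent once (1.25 versus 1.00 units); from the second send onward it is cheaper (1.35 versus 2.00 units).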

04 — What's New
max effort and an ultra multi-agent mode.

Beyond the tiers, GPT-5.6 introduces two new ways to spend compute at inference time. A new max reasoning effort sits above the existing levels and gives Sol the most time to reason through hard problems. The more interesting addition is ultra, which goes beyond a single agent by coordinating subagents to divide and accelerate complex, long-horizon work — and it is the configuration that posts the top coding score in OpenAI’s charts.

That second feature is the clearest signal of where the frontier is heading: not just larger single models, but models that natively spin up and manage their own helpers. We have covered the same pattern arriving in OpenAI’s coding tools in our guide to Codex subagents and multi-agent autonomous coding, and ultra brings it into the core model API.

max

Deeper single-agent reasoning

new top effort level

A new reasoning-effort setting above the existing tiers, giving Sol the most time to reason deeply on a single hard problem. Reach for it on the tasks where correctness matters more than latency.

One agent, more thinking

ultra

Subagent orchestration

beyond a single agent

ultra coordinates subagents to split and accelerate long-horizon work — the orchestration pattern teams used to hand-build, now baked into the model. It drives the headline Terminal-Bench 2.1 result.

Multi-agent, in the model

05 — Benchmarks
The numbers, read honestly.

OpenAI shared a focused set of preview evaluations, with a fuller suite promised at general availability. The coding headline is Terminal-Bench 2.1, which tests command-line workflows requiring planning, iteration, and tool coordination: Sol in ultra mode reaches a reported 91.9%, with standard Sol around 88.8%, edging Anthropic’s Claude Mythos 5 at about 88%. One honest caveat governs every figure here — these are OpenAI’s own reported results, not independently audited, so read them as vendor claims on a common axis.

A version caveat matters too: GPT-5.5’s widely cited 82.7% was measured on Terminal-Bench 2.0, a different benchmark version, so it should not be placed on the same axis as these 2.1 scores. We chart only the 2.1 figures OpenAI published for this release below.

Terminal-Bench 2.1 · OpenAI-reported coding scores

Source: OpenAI GPT-5.6 preview evaluations (Terminal-Bench 2.1) — vendor-reported, not independently audited. OpenAI says a fuller cross-model suite will follow at general availability.

Model	Source	Terminal-Bench 2.1 (%)
GPT-5.6 Sol · ultra	OpenAI-reported · new state of the art	91.9
GPT-5.6 Sol	OpenAI-reported · standard flagship	88.8
Claude Mythos 5	Self-reported · closest charted rival	88.0

The pattern across the rest of OpenAI’s preview evaluations is efficiency, not just higher peaks. On Agent’s Last Exam, which spans 55 professional domains of long-running work, OpenAI reports Sol as the only model to clear 50% (about 50.9% in code mode) while using fewer tokens than prior architectures. On GeneBench v1, a long-horizon genomics benchmark, Sol beats GPT-5.5 while consuming fewer tokens. And on the security side, OpenAI says Sol is competitive with a Mythos preview on ExploitBench using roughly one-third the output tokens.

The real story is tokens per result

More than any single score, GPT-5.6’s gains are about capability per token. Matching or beating rivals while spending a fraction of the tokens is a cost story dressed as a benchmark story — and for production agents that run a workflow thousands of times a month, tokens-per-result is the number that actually shows up on the invoice.
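A rough worked example shows why token count dominates at volume. The token budgets and run count below are invented for illustration, not benchmark data; the only figure taken from this article is Sol's $30 per 1M output rate, applied to both budgets so that the comparison isolates tokens-per-result.

```python
def monthly_output_cost(output_tokens_per_run: int, runs_per_month: int,
                        output_per_m: float) -> float:
    """USD spent on output tokens per month for one workflow."""
    return output_tokens_per_run * runs_per_month * output_per_m / 1_000_000


# Hypothetical workflow run 5,000 times a month at $30 per 1M output tokens.
heavy = monthly_output_cost(90_000, 5_000, 30.00)  # illustrative token budget
lean = monthly_output_cost(30_000, 5_000, 30.00)   # one-third the tokens
print(f"{heavy:,.0f} vs {lean:,.0f}")  # 13,500 vs 4,500
```

At an identical per-token price, a result produced with one-third the output tokens costs one-third as much, which is the arithmetic behind reading the efficiency claims as a cost story.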

06 — Cyber & Safety
Stronger cyber capability, stronger safeguards.

The same strength that makes Sol useful to defenders makes it sensitive, so OpenAI wrapped the release in what it calls its most robust safety stack to date. The design is explicitly layered, with configurations matched to each model’s capabilities, and OpenAI’s own framing is that no single safeguard is sufficient against determined misuse.

Importantly, OpenAI states Sol does not cross the “Cyber Critical” threshold in its Preparedness Framework. In tests against the Chromium and Firefox codebases it found bugs and exploitation primitives but did not autonomously assemble a working full-chain exploit — and the company frames the model as better at helping people find and fix vulnerabilities than at carrying out end-to-end attacks.

Layer 1

Model-level refusals

trained into the weights

Trained to refuse prohibited cyber assistance, including when users disguise intent or attempt jailbreaks — the first boundary around what the model will and will not help with.

First boundary

Layer 2

Real-time classifiers

cyber + bio, during output

Classifiers evaluate output as it is generated. On higher-risk cases, generation can pause while a larger reasoning model reviews the full context and withholds disallowed output before it reaches the user.

Intervene mid-stream

Layer 3

Account-level review

across conversations

Flagged activity can trigger review across a user’s conversations and risk signals, helping separate persistent malicious behavior from legitimate dual-use security work where the same concepts appear.

Pattern over time

Layer 4

Automated red-teaming

700K+ A100-equiv GPU hours

OpenAI dedicated over 700,000 A100-equivalent GPU hours to automated red-teaming aimed at universal jailbreaks, plus weeks of third-party human red-teaming, to harden the stack against adaptive attacks.

Hardening at scale

A caveat to plan around during the preview

Because legitimate defensive work uses the same building blocks as offensive work, OpenAI warns the classifiers may produce false positives: during the preview, expect occasional refusals, paused generations, or added latency on dual-use security tasks. Reducing that friction is part of what the preview is explicitly designed to test, so treat it as expected behaviour rather than a defect.

07 — The Gated Rollout
Why you can’t use it yet.

GPT-5.6 breaks from the usual launch playbook. During the preview it is available only through the API and Codex, to a small group of vetted partners — reported to be around 20 organizations — whose identities were shared with the U.S. government. According to Reuters, the backdrop is an executive order signed earlier this month that sets up a voluntary framework for developers to offer “covered frontier models” to the government for up to 30 days before releasing them to trusted partners.

OpenAI complied but pushed back publicly. In its own announcement, the company framed the gated preview as a short-term step toward broad availability “in the coming weeks,” while objecting to the process becoming permanent. For teams, the practical lesson echoes the export-control disruptions earlier this year — concentration risk is real, and our second-source resilience playbook covers how to avoid being stranded when access to a single model is gated overnight.

“We don’t believe this kind of government access process should become the long-term default. It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them.”
Source: OpenAI, Previewing GPT-5.6 Sol, June 26, 2026

There is an infrastructure note worth flagging alongside the access story: OpenAI plans to launch Sol on Cerebras hardware in July at up to 750 tokens per second, initially for select customers — a bet on frontier-grade reasoning at very low latency. We have tracked the same Cerebras-on-OpenAI pattern before in our look at real-time coding on Cerebras, and GPT-5.6 extends it to the new flagship.

08 — What It Means
How a team should route across the tiers.

The tiered family turns model selection into an architecture decision rather than a default. The practical move is to match each workload to the cheapest tier that clears its quality bar, and — because the tier names are durable — to encode that routing once and let it hold as each tier improves. The matrix below is where we start that conversation with clients.

Route to Sol

Hard, long-horizon work

Multi-step coding, deep research, security analysis — anything where correctness justifies the flagship price and the new max or ultra modes earn their keep. Same per-token cost as GPT-5.5, more capability.

Reasoning-critical tasks

Migrate to Terra

Today’s GPT-5.5 workloads

If you run GPT-5.5 mainly for its quality, model Terra first: OpenAI positions it as competitive performance at roughly half the cost. Validate on your own evals, then move the steady-state production traffic.

Cost-down, quality-flat

Scale on Luna

High-volume, lower-stakes jobs

Classification, extraction, routing, and first-pass drafting that run thousands of times a day belong on the cheapest tier. Reserve Luna for work where throughput and unit cost matter more than peak reasoning.

Throughput economics

Plan around the gate

Access and concentration risk

GPT-5.6 is API/Codex-only behind a limited preview, with dates that are soft. Prototype now, but keep a second-source fallback so a gated or delayed model can’t stall a production workflow.

Don’t single-source
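The matrix reduces to one rule: pick the cheapest tier whose score on your own evals clears the workload's quality bar. A minimal sketch of that rule follows; the tier labels, eval scores, and quality bar are hypothetical placeholders, and the cheapest-first ordering follows the preview price table.

```python
from typing import Mapping, Optional

# Ordered cheapest to most expensive, per the preview price table.
# Labels are for this sketch only, not confirmed model IDs.
TIERS_CHEAPEST_FIRST = ("luna", "terra", "sol")


def pick_tier(eval_scores: Mapping[str, float], quality_bar: float) -> Optional[str]:
    """Return the cheapest tier whose eval score meets the bar, or None."""
    for tier in TIERS_CHEAPEST_FIRST:
        score = eval_scores.get(tier)
        if score is not None and score >= quality_bar:
            return tier
    return None


# Hypothetical scores from an in-house eval suite for one workload.
scores = {"luna": 0.81, "terra": 0.93, "sol": 0.97}
print(pick_tier(scores, quality_bar=0.90))  # terra
```

Returning None rather than defaulting to the flagship keeps the failure visible: a workload that no tier passes needs a different approach, not a silent upgrade. Because the tier names are meant to be durable, re-running the evals after each tier update is enough to keep the routing table current.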

The honest sequencing is the same one we use on every model bump: pin the workloads to evals you control, migrate the cost-sensitive traffic to the cheapest tier that passes, and keep a fallback for anything that depends on a single gated model. That scoping — which workloads, which tier, which guardrails — is exactly where our agentic AI transformation engagements begin, before any model commitment.
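The fallback part of that sequencing can be sketched as an ordered list of providers tried in turn. The provider callables here are stand-ins you would supply yourself (a wrapper around the gated model, then a second source); nothing in this sketch is a real client library call.

```python
from typing import Callable, List, Sequence


def call_with_fallback(prompt: str,
                       providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order and return the first successful response."""
    errors: List[Exception] = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # broad on purpose: any failure moves to the next source
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


# Hypothetical stand-ins for a gated primary model and a second source.
def gated_primary(prompt: str) -> str:
    raise TimeoutError("preview access unavailable")


def second_source(prompt: str) -> str:
    return f"fallback answer for: {prompt}"


print(call_with_fallback("summarise this ticket", [gated_primary, second_source]))
```

In production you would likely narrow the caught exceptions, add timeouts, and log which provider answered, so that a quiet shift of traffic to the fallback shows up in your metrics.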

09 — Conclusion
A range, not a single model.

The shape of GPT-5.6, June 2026

OpenAI is productising a range and pushing multi-agent reasoning into the mainstream.

GPT-5.6 is best read as OpenAI shipping a family rather than a model. Sol holds GPT-5.5’s price while raising the ceiling, Terra resets the cost math on everyday work, and Luna anchors the high-volume floor. The new max effort deepens single-agent reasoning, and ultra makes native multi-agent orchestration a default expectation, not a hand-built add-on.

Keep the framing precise. The benchmarks are strong but OpenAI-reported, not audited; the headline efficiency gains are as much about tokens-per-result as raw scores; and the whole thing ships behind a government-coordinated preview that even OpenAI says should not become the norm. Dates are soft, access is gated, and the safety stack will occasionally get in the way of legitimate work.

The durable takeaway is the tiered family itself. When intelligence, speed, and cost become an explicit three-way choice you encode once, the teams that win are the ones that route deliberately — flagship where it matters, cheapest tier where it does not, and a second source behind anything that can be gated. That, more than any single benchmark point, is what this release changes.
