AI Model Efficient Frontier Q2 2026: Performance vs Price
Q2 2026 efficient-frontier analysis — Pareto scatter plots mapping speed, quality, and cost across 20 frontier models. Identifies the dominant strategies.
Key Takeaways
The efficient frontier says every dominated model should be discarded. Half of the top 10 on OpenRouter are dominated on every dimension by a cheaper sibling. Pareto optimality is a business decision — and most agencies get it wrong.
This post is a Q2 2026 snapshot of the AI model landscape viewed through a single lens: which models are not beaten on every axis at once by a competitor, and which are. It uses real pricing from the Q2 2026 LLM pricing index, real throughput numbers from Artificial Analysis and OpenRouter, and real quality scores from the Intelligence Index. No invented benchmarks. No vendor-curated charts.
Methodology note: This analysis is a snapshot as of April 12, 2026. Model releases in Q2 will shift the scatter. Treat the frontier as a routing map, not a ranking — the right model for your workload is not necessarily on the frontier if migration cost, data residency, or procurement rules dominate.
What an efficient frontier actually is
An efficient frontier, in the Pareto sense, is the set of options where you cannot improve one dimension without getting worse on another. In portfolio theory the axes are expected return and variance. In AI model selection the axes are quality, cost per token, and throughput. A model is on the frontier if no competitor strictly beats it on every axis simultaneously. A model is dominated if some alternative is at least as good on every axis and strictly better on at least one.
The interesting property of the frontier is its sparsity. With 20 models across three axes, a naive expectation is that most of them sit somewhere on the curve. Reality is the opposite — the frontier collapses to a handful of dominant points and a long tail of models beaten on every dimension. The frontier is not a line; it is a cluster.
Claude Opus 4.5 (Reasoning) scores 49.7 on the Intelligence Index. It costs roughly $15/$75 per 1M tokens on Anthropic's direct API. Claude Sonnet 4.6 (Max Effort) scores 51.7 on the same index at $3/$15 per 1M tokens. Sonnet 4.6 is better on quality and cheaper on price — so Opus 4.5 is strictly dominated. There is no axis on which Opus 4.5 wins.
That does not mean nobody should use Opus 4.5 — migration cost is real, existing pipelines are tuned to it, and the newer Opus 4.6 already replaced it at the top of the Anthropic stack. But in a greenfield deployment as of Q2 2026, choosing Opus 4.5 over Sonnet 4.6 is not a defensible choice on any axis.
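The dominance rule in the Opus 4.5 vs Sonnet 4.6 example can be written down directly. Here is a minimal sketch in Python, using the blended prices and Intelligence Index scores quoted above; it covers two axes, and throughput extends the tuple the same way:

```python
def dominates(a, b):
    """True if model `a` Pareto-dominates model `b`: at least as good on
    every axis, strictly better on at least one.
    Each model is (quality, blended $/1M): quality up, cost down."""
    q_a, c_a = a
    q_b, c_b = b
    at_least_as_good = q_a >= q_b and c_a <= c_b
    strictly_better = q_a > q_b or c_a < c_b
    return at_least_as_good and strictly_better

# The worked example from the text: Sonnet 4.6 at (51.7, $6.00 blended)
# vs Opus 4.5 at (49.7, $30.00 blended).
sonnet_46 = (51.7, 6.00)
opus_45 = (49.7, 30.00)
print(dominates(sonnet_46, opus_45))  # True  -- Opus 4.5 is dominated
print(dominates(opus_45, sonnet_46))  # False
```

Note that two models tied on every axis do not dominate each other; the "strictly better on one" clause is what keeps ties on the frontier.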
The three axes: speed, quality, cost
Every Pareto analysis depends on its axes. For this one we use three measurable, comparable quantities. Each has caveats.
- Quality: Artificial Analysis composite score (0-100) blending MMLU-Pro, GPQA Diamond, AIME, LiveCodeBench, SciCode, HumanEval, MATH-500, and IFEval. Reflects general reasoning ability; domain-specific rankings differ.
- Cost: blended input+output price per 1M tokens at a 3:1 input-to-output ratio. OpenRouter pricing for open-weight models, direct API pricing for frontier closed models. Excludes cached-input discounts.
- Speed: median output throughput measured on the fastest generally available inference provider per model. Cerebras, Groq, and SambaNova reshape the numbers for open-weight models; closed frontier models are capped by their vendor.
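The blended-price convention behind the cost axis is simple arithmetic worth making explicit. A small sketch, using list prices quoted elsewhere in this post as check values:

```python
def blended_price(input_per_1m, output_per_1m, input_ratio=3, output_ratio=1):
    """Blended $/1M tokens at a given input:output token ratio (3:1 here)."""
    total = input_ratio + output_ratio
    return (input_ratio * input_per_1m + output_ratio * output_per_1m) / total

# Check values against figures quoted in the post:
print(blended_price(3.00, 15.00))    # 6.0   -> Claude Sonnet 4.6 ($3/$15)
print(blended_price(15.00, 75.00))   # 30.0  -> Claude Opus 4.5 ($15/$75)
print(blended_price(30.00, 180.00))  # 67.5  -> GPT-5.4 Pro ($30/$180)
```

A different input:output ratio shifts every blended figure, which is why the 3:1 assumption is stated up front; a summarization-heavy workload (long inputs, short outputs) makes expensive-output models look cheaper than this table suggests.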
These three axes are correlated but not collinear. Cheap models often score lower on quality, but not always. Fast hardware improves throughput without changing quality. Expensive reasoning models are slow by design because they generate more tokens per answer. The correlations are loose enough that the three-axis Pareto analysis produces a genuinely short list of dominant models.
Pick the right axis for your workload. A customer support chatbot weights latency higher than raw IQ. A legal contract reviewer weights quality higher than cost. Our AI digital transformation engagements begin with a workload audit that names the axes before selecting a model stack.
Q2 2026 scatter: cost vs quality
Twenty frontier and near-frontier models plotted against blended cost and the Artificial Analysis Intelligence Index. Prices are quoted on a $/1M-token basis at a 3:1 input-to-output blend. The final column marks whether the model is on the cost-vs-quality Pareto frontier for Q2 2026.
| Model | Provider | Blended $/1M | Quality score | Frontier? |
|---|---|---|---|---|
| GPT-5.4 (xhigh) | OpenAI | $5.63 | 57.2 | Yes |
| Gemini 3.1 Pro Preview | Google | $4.50 | 57.2 | Yes |
| Claude Opus 4.6 (Max Effort) | Anthropic | $10.00 | 53.0 | Yes |
| Claude Sonnet 4.6 (Max Effort) | Anthropic | $6.00 | 51.7 | Yes |
| GPT-5.2 (xhigh) | OpenAI | $6.50 | 51.3 | No — dominated by Sonnet 4.6 |
| GLM-5 (Reasoning) | Z.ai | $1.24 | 49.8 | No — dominated by MiniMax M2.7 |
| Claude Opus 4.5 (Reasoning) | Anthropic | $30.00 | 49.7 | No — dominated by Sonnet 4.6 |
| MiniMax M2.7 | MiniMax | $0.53 | 49.6 | Yes |
| MiMo V2 Pro | Xiaomi | $1.50 | 49.2 | Yes |
| GPT-5.4 Mini | OpenAI | $1.69 | ~46 | No — dominated by MiMo V2 Pro |
| Nemotron 3 Super 120B | NVIDIA | $0.10 | ~45 | Yes |
| Kimi K2.5 | Moonshot | $0.72 | ~44 | No — dominated by Nemotron 3 Super |
| Qwen 3.6 Plus Preview | Alibaba | Free | ~44 | Yes |
| Qwen 3 Max Thinking | Alibaba | $1.56 | ~43 | No — dominated by MiMo V2 Pro |
| MiniMax M2.5 | MiniMax | $0.34 | ~42 | No — dominated by Nemotron 3 Super |
| DeepSeek V3.2 | DeepSeek | $0.42 | ~41 | No — dominated by Nemotron 3 Super |
| Gemini 3 Flash Preview | Google | $0.56 | ~40 | No — dominated by Qwen 3.6 Plus (free) |
| Grok 4.20 | xAI | $3.00 | ~47 | No — dominated by MiMo V2 Pro |
| Step 3.5 Flash | StepFun | Free tier | ~38 | No — dominated by Qwen 3.6 Plus on quality |
| GPT-5.4 Pro | OpenAI | $67.50 | ~58 | Yes (top) |
Reading the table: nine models sit on the cost-vs-quality frontier. GPT-5.4 Pro holds the absolute top; GPT-5.4 and Gemini 3.1 Pro sit tied below it; Opus 4.6 and Sonnet 4.6 hold the premium reasoning band; MiniMax M2.7 and MiMo V2 Pro the mid-quality, mid-cost band; Nemotron 3 Super the open-weight band; and Qwen 3.6 Plus anchors the free tier. Every other model in the top 20 is beaten on both axes at once by one of those nine.
Quality scores marked with ~ are Digital Applied estimates triangulated from domain benchmarks where an Intelligence Index value was not published. Treat them as order-of-magnitude signals, not precise rankings.
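The frontier column can be recomputed mechanically from price and quality alone. Here is a sketch over a hand-picked subset of the rows above (tilde estimates rounded); it is illustrative, not a reproduction of the full table:

```python
# (name, blended $/1M, quality score) -- a subset of the scatter above.
models = [
    ("GPT-5.4 Pro", 67.50, 58.0),
    ("GPT-5.4 (xhigh)", 5.63, 57.2),
    ("Claude Opus 4.5", 30.00, 49.7),
    ("MiniMax M2.7", 0.53, 49.6),
    ("Nemotron 3 Super 120B", 0.10, 45.0),
    ("Qwen 3.6 Plus Preview", 0.00, 44.0),
]

def pareto_frontier(points):
    """Keep only models not strictly dominated on (cost down, quality up)."""
    frontier = []
    for name, cost, quality in points:
        dominated = any(
            c <= cost and q >= quality and (c < cost or q > quality)
            for n, c, q in points if n != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Claude Opus 4.5 is the only dominated point in this subset.
print(pareto_frontier(models))
```

The quadratic scan is fine at 20 models; sorting by cost and sweeping a running quality maximum does the same job in O(n log n) if the candidate list ever grows large.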
Q2 2026 scatter: cost vs speed
The second scatter plots cost against throughput, measured as output tokens per second on the fastest generally available provider for each model. Cerebras, Groq, SambaNova, and DeepInfra matter here — they reshape the frontier for open-weight models by offering hardware-differentiated speed at low prices.
| Model | Fastest on | Throughput | Blended $/1M | Frontier? |
|---|---|---|---|---|
| gpt-oss-120b | Cerebras | 920 tok/s | $0.35 | Yes (fastest) |
| gpt-oss-20b | Groq | 714 tok/s | $0.07 | Yes |
| Qwen 3 32B | Groq | 423 tok/s | $0.29 | Yes |
| Llama 3.1 8B | Groq | 274 tok/s | $0.05 | Yes (cheapest fast) |
| Kimi K2.5 | ModelRun | 198 tok/s | $0.55 | No — dominated by Qwen 3 32B |
| MiniMax M2.5 | SambaNova | 175 tok/s | $0.30 | No — dominated by Qwen 3 32B |
| Nemotron 3 Super | DeepInfra | 174 tok/s | $0.10 | Yes |
| Trinity Mini | Clarifai | 170 tok/s | $0.04 | Yes |
| R1 0528 | Nebius | 156 tok/s | $2.00 | No — dominated by Nemotron 3 Super |
| Claude Sonnet 4.6 | Anthropic direct | ~75 tok/s | $6.00 | No — dominated on cost and speed |
| Claude Opus 4.6 | Anthropic direct | ~60 tok/s | $10.00 | No on speed axis — Yes on quality |
| GPT-5.4 | OpenAI direct | ~110 tok/s | $5.63 | No — dominated by gpt-oss-120b on cost and speed |
The cost-vs-speed frontier is owned by hardware rather than model architecture. Cerebras wins the top of the curve with gpt-oss-120b at 920 tok/s. Groq dominates the mid-curve with gpt-oss-20b at 714 tok/s and $0.07/1M. Anthropic's and OpenAI's frontier models are absent here; they sit on the quality frontier, not the speed one.
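Throughput translates directly into the wall-clock decode time a latency-sensitive surface actually feels. A rough sketch using throughput figures from the table above, ignoring network overhead and time-to-first-token:

```python
def generation_seconds(output_tokens, tokens_per_second):
    """Wall-clock decode time for a response, decode-only approximation."""
    return output_tokens / tokens_per_second

# A 1,000-token answer at throughputs from the table above:
for model, tps in [("gpt-oss-120b @ Cerebras", 920),
                   ("gpt-oss-20b @ Groq", 714),
                   ("Claude Sonnet 4.6 direct", 75)]:
    print(f"{model}: {generation_seconds(1000, tps):.1f}s")
# gpt-oss-120b @ Cerebras: 1.1s
# gpt-oss-20b @ Groq: 1.4s
# Claude Sonnet 4.6 direct: 13.3s
```

That order-of-magnitude gap, not the per-token price, is why interactive and voice surfaces route to Cerebras- and Groq-hosted open weights in the rules later in this post.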
Dominant models on the frontier
Six models survive both scatters as Q2 2026 Pareto-dominant choices. Each owns a distinct position.
Qwen 3.6 Plus Preview (free)
The free-tier anchor. Alibaba's preview release is priced at zero on OpenRouter with a 1M-token context window and always-on chain-of-thought. Quality sits in the ~44 range on the Intelligence Index. Because it is free, nothing with a quality score below 44 can justify a per-token price. The entire bottom of the scatter collapses against this point.
Best for: bulk classification, synthetic data generation, prompt exploration, and any workload where quality above 44 is not required.
Nemotron 3 Super 120B
The open-weight frontier. NVIDIA's 120B/12B-active Mamba-Transformer MoE scores 60.47% on SWE-Bench Verified and runs at 174 tok/s on DeepInfra for $0.10 blended. On the cost-vs-quality scatter it sits at the inflection where open-weight economics meet reasonable quality. No open-weight model beats it on every axis at once.
Best for: self-hosted deployments, regulated environments requiring on-prem inference, and workloads where data residency rules prohibit closed-weight APIs.
MiniMax M2.7
The self-evolving mid-curve. Ten billion active parameters, 56.22% on SWE-Bench Pro, 57.0% on Terminal-Bench 2, and $0.30/$1.20 per 1M. Sits at the sweet spot where a 49.6 quality score meets $0.53 blended cost, a position Opus 4.5 priced at roughly 57x more. The Opus-class-quality-at-MiniMax-prices story.
Best for: agent workloads, code generation at scale, and any budget-conscious deployment that needs near-premium reasoning.
MiMo V2 Pro
The volume champion. Xiaomi's 1T+ parameter model at $1/$3 per 1M carries 4.79T weekly tokens on OpenRouter — a 3x lead over second place and 25.5% of all coding tokens. Quality sits at 49.2 on the Intelligence Index. Not the best at anything; on the frontier because of sheer cost-per-quality efficiency at its tier.
Best for: high-volume coding assistants, content generation pipelines, and any workload where stability at scale matters more than peak quality.
Claude Sonnet 4.6
The premium workhorse. $3/$15 per 1M, 79.6% on SWE-Bench Verified, Intelligence Index 51.7 at Max Effort, 200K context with 1M beta. At one-fifth of Opus 4.5 pricing with near-Opus quality, Sonnet 4.6 dominates Opus 4.5 outright and competes with GPT-5.4 on every axis except raw speed. The default reasoning model for most production workloads.
Best for: production RAG pipelines, agentic workflows, and any workload where consistency matters more than cost-per-token.
Claude Opus 4.6
The quality ceiling for reasoning-heavy tasks. $5/$25 per 1M on OpenRouter, Intelligence Index 53.0 at Max Effort, 200K context with 1M beta. Tops the Anthropic stack; on the quality axis only GPT-5.4 Pro (at roughly 7x the blended cost), GPT-5.4, and Gemini 3.1 Pro score higher.
Best for: irreducibly hard reasoning, long-horizon agents, legal analysis, and any workload where the cost of a wrong answer exceeds the cost of the token spend.
Dominated models off the frontier
The honest list. These models still ship real production traffic, but in a greenfield Q2 2026 deployment they are beaten on every axis at once by one of the six frontier models above. Domination does not mean "bad" — it means "not the right default."
| Model | Dominated by | Why |
|---|---|---|
| Claude Opus 4.5 | Sonnet 4.6 | Lower quality, higher price, slower. |
| GPT-5.2 xhigh | Sonnet 4.6 | Lower quality at a higher blended price. |
| Qwen 3 Max Thinking | MiMo V2 Pro | Similar quality tier at higher blended cost. |
| Kimi K2.5 | MiniMax M2.7 / Nemotron 3 Super | Higher cost at comparable quality, slower than Nemotron. |
| MiniMax M2.5 | MiniMax M2.7 | Older sibling, superseded on both price and quality. |
| DeepSeek V3.2 | Nemotron 3 Super | Higher cost at lower SWE-Bench, slower inference. |
| Gemini 3 Flash Preview | Qwen 3.6 Plus (free) | Paid tier at quality the free model matches. |
| Step 3.5 Flash | Qwen 3.6 Plus (free) | Both free; Qwen 3.6 scores higher on quality. |
| GPT-5.4 Mini | MiMo V2 Pro | Similar quality tier at higher blended cost. |
| Grok 4.20 | MiMo V2 Pro | Higher cost at lower quality; smaller tool ecosystem. |
The dominated list matters because procurement inertia is strongest for exactly these models. Grok 4.20 ships because xAI has distribution. GPT-5.4 Mini ships because OpenAI has name recognition. DeepSeek V3.2 ships because it was the right answer six months ago. None of them are the right Pareto answer now.
The free-tier floor
Q2 2026 is the first period where genuinely capable free models exist in meaningful numbers. Qwen 3.6 Plus Preview is free during its preview window. Step 3.5 Flash has a free tier. Nemotron 3 Super has a free tier on NVIDIA's own inference endpoints. OpenAI gpt-oss-120b is Apache 2.0. The floor of the scatter is compressing against zero.
The business implication is counterintuitive. A paid model priced at $0.20 per 1M tokens does not compete with other $0.20 models — it competes with free. If it does not beat the best free model on quality, it has no niche. That condition is strict; it eliminates most of the sub-$0.50 paid tier from serious consideration on greenfield deployments. What a paid tier still buys, beyond what the quality axis shows:
- Throughput guarantees: paid tiers usually unlock higher RPS limits and priority queues.
- Data handling: paid tiers often promise zero-retention, no-training, or region-specific inference.
- SLAs: paid tiers carry uptime commitments that free tiers explicitly disclaim.
- Larger context windows: some free tiers cap context at 32K while the paid version hits 1M.
- Tool-use and file uploads: advanced features (function calling, vision, file attachments) are often gated to paid.
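Set those perks aside and the free-floor rule reduces to a one-line filter: a paid model survives only if it clears the best free model's quality score. A sketch with candidate tuples drawn from the scatter above:

```python
FREE_FLOOR = 44.0  # Qwen 3.6 Plus Preview, best free model on the scatter

# (name, blended $/1M, quality score) -- values from the tables above.
candidates = [
    ("Gemini 3 Flash Preview", 0.56, 40.0),
    ("MiniMax M2.5", 0.34, 42.0),
    ("MiniMax M2.7", 0.53, 49.6),
]

# A paid model earns a niche only if it beats the free quality floor;
# SLAs, data handling, and context windows are deliberately out of scope.
viable = [name for name, price, quality in candidates
          if price == 0 or quality > FREE_FLOOR]
print(viable)  # ['MiniMax M2.7']
```

The floor is a moving target: each new free-tier release re-runs this filter against every paid model below it.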
Don't confuse free with risk-free. Preview tiers change pricing with little notice, and "free during preview" is not a procurement-grade SLA. Our CRM automation stacks treat free-tier models as a fallback layer, not the primary, for workloads that touch customer data.
Strategic positioning per provider
The scatter reveals each provider's Q2 2026 bet. Every major lab has made a choice about which frontier quadrant they intend to own. Those choices are legible.
Anthropic
Opus at $5/$25, Sonnet at $3/$15. Premium pricing, premium quality, no attempt to compete at the low end. The bet is that enterprise reasoning workloads are price-inelastic above a quality threshold. Provider share on OpenRouter is 10.9% — below Xiaomi, Alibaba, Google — but concentrated in high-margin workloads.
Xiaomi
MiMo V2 Pro at $1/$3 per 1M and 4.79T weekly tokens. The bet is that a mid-quality model at scale produces more revenue than a premium model at enterprise scale. 21.1% OpenRouter share — the largest of any provider. Detailed in Anthropic cost vs Xiaomi volume.
Alibaba
Qwen 3.6 Plus free during preview, Qwen 3.5 Flash at $0.065/$0.26. The bet is that the free tier pulls developers onto Alibaba Cloud infrastructure where the paid upsell happens. 13.9% share and climbing.
OpenAI
GPT-5.4 Pro at $30/$180. GPT-5.4 at $2.50/$15. Even Nano at $0.20/$1.25. The bet is that brand pricing holds even as Chinese models match on quality. 7.5% OpenRouter share signals that the bet is under pressure but profitable.
NVIDIA
Nemotron 3 Super at $0.10 blended, open weights, fast on DeepInfra. The bet is that open-weight models anchor developer mindshare and drive H100/B200 hardware demand. The model is not the product; the inference hardware is.
MiniMax
M2.7 at $0.30/$1.20 with a self-evolving training loop. 56% on SWE-Bench Pro and 57% on Terminal-Bench 2. The bet is that a continuously improving model at one-tenth of Opus pricing wins agentic workloads where quality-per-dollar dominates. 8.1% OpenRouter share.
Digital Applied model-routing rules
The right application of a Pareto frontier is not model selection — it is model routing. Each workload type maps to a specific model on the frontier. Our production rules as of Q2 2026:
| Workload | Route to | Why |
|---|---|---|
| Irreducible judgment | Claude Opus 4.6 | Top of quality axis; error cost exceeds token cost. |
| Production RAG + agents | Claude Sonnet 4.6 | Near-Opus quality at 1/5 cost; default reasoning model. |
| High-volume code generation | MiMo V2 Pro | Best cost-per-quality at scale; 1M context. |
| Budget agent workloads | MiniMax M2.7 | Opus-class reasoning at $0.53 blended. |
| Bulk classification | Qwen 3.6 Plus (free) | Free floor dominates below the quality-44 band. |
| Interactive UX / voice | gpt-oss-120b on Cerebras | 920 tok/s owns latency-sensitive surface. |
| On-prem / regulated | Nemotron 3 Super 120B | Best open-weight quality; self-hostable. |
A production stack running those seven rules costs less than a single-model stack pinned to any frontier model and produces better unit quality because each workload is served by the model that dominates its axis. The cheapest correct answer is structurally a routing decision, not a shopping decision. For the measurement infrastructure behind this analysis, our analytics and insights engagements track cost-per-correct-answer per route against cost-per-token baselines.
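The routing table above can be sketched as a plain lookup. The workload labels and model identifiers here are our own shorthand for illustration, not provider API names:

```python
# Q2 2026 routing rules from the table above; keys and values are
# illustrative shorthand, not real API model strings.
ROUTES = {
    "irreducible_judgment": "claude-opus-4.6",
    "production_rag_agents": "claude-sonnet-4.6",
    "high_volume_codegen": "mimo-v2-pro",
    "budget_agents": "minimax-m2.7",
    "bulk_classification": "qwen-3.6-plus",
    "interactive_ux": "gpt-oss-120b@cerebras",
    "on_prem_regulated": "nemotron-3-super-120b",
}

def route(workload: str) -> str:
    """Resolve a workload label to its frontier model, failing loudly
    rather than silently falling back to an expensive default."""
    try:
        return ROUTES[workload]
    except KeyError:
        raise ValueError(f"no routing rule for workload {workload!r}")

print(route("bulk_classification"))  # qwen-3.6-plus
```

Failing on an unknown workload is deliberate: a silent fallback to the premium model is exactly the cost leak the routing layer exists to prevent.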
Related analysis. The frontier release velocity index tracks how often the frontier shifts, open-weight vs closed-source dissects the Nemotron / Qwen / gpt-oss story, the context-window arms race covers the 1M-10M-token battlefield, and the OpenRouter April 2026 rankings ground this scatter in live usage data.
Conclusion
Twenty frontier and near-frontier models reduce to six Pareto-dominant choices in Q2 2026. The other fourteen are each beaten on every axis at once by a sibling, usually a cheaper one. That is a strong claim, and it changes fast — the frontier will look different in Q3. The methodology is what matters: define the axes for your workload, place every candidate on the scatter, discard the dominated points, then route workloads against the surviving frontier.
The cheapest correct answer for a production AI stack is almost never a single model. It is a routing rule that sends each workload to the point on the frontier that dominates its axis — Opus for judgment, Sonnet for reasoning, MiMo for volume, MiniMax for budget agents, Qwen for free-tier classification, Cerebras-hosted gpt-oss for interactive UX, Nemotron for on-prem. That is the Pareto-rational default.
Build a model stack that actually dominates
Most production AI stacks pay a premium for a dominated model and call it a day. We build Pareto-rational routing layers that send each workload to the frontier model that fits its axis — and measure cost-per-correct-answer, not cost-per-token.