AI Development · Pricing Tracker · 9 min read · Published June 4, 2026

Five models in ten days · 1M context at every tier · $0.20 to $5.00 / 1M input

OpenRouter June 2026: New Models, Pricing and Rankings

In the ten days from May 27 to June 4, 2026, OpenRouter listed five major new models — Claude Opus 4.8, Step 3.7 Flash, MiniMax M3, Qwen3.7-Plus, and NVIDIA Nemotron 3 Ultra. Read together, the wave spans roughly a 25x price range and shows that every tier now has a capable million-token-context option. Here is the pricing, the context windows, and the usage rankings — with the vendor-claim caveats kept honest.

Digital Applied Team
Senior strategists · Published June 4, 2026
Sources: 10 primary + vendor pages
  • New models listed: 5 (May 27 – Jun 4, 2026)
  • MiniMax M3 input: $0.30/M promo, about 6% of Opus 4.8's $5.00
  • OpenRouter weekly tokens: 25T, up 5x from 5T six months prior
  • Chinese-model share: 45%+ of traffic by token volume, from <2% in 2024

OpenRouter added five major models in the ten days from May 27 to June 4, 2026 — a single window dense enough to read as a market signal rather than a scattered news cycle. The list spans frontier closed weights and open-weight challengers, from Claude Opus 4.8 at $5 per million input tokens down to MiniMax M3 at $0.30 on a launch promotion.

What makes the wave worth cataloging is not any single launch. It is the shape of the whole. Every tier of the price ladder now has a credible million-token-context option, and the cheapest capable long-context models cost a small fraction of the most expensive frontier ones. That is a structural change in how teams should think about model selection, not a temporary dislocation.

This guide catalogs the five additions with their OpenRouter pricing and context windows, builds a price-tier ladder for the platform as it stood in early June, separates real-time usage rankings from quality benchmarks, and lays out a practical routing posture. Vendor self-reported numbers are marked as such throughout — a launch claim is not an independent benchmark.

Key takeaways
  1. Five major models listed in ten days. Claude Opus 4.8 and Opus 4.8 Fast (May 27), Step 3.7 Flash (May 28), MiniMax M3 (May 31 / June 1), Qwen3.7-Plus (June 3), and NVIDIA Nemotron 3 Ultra (June 4). Closed-frontier and open-weight tiers both moved.
  2. MiniMax M3 is the headline price story. 1M-token context with frontier-aimed coding at a $0.30/M input promo rate, roughly 6% of Opus 4.8's $5.00. Weights were promised on Hugging Face within about ten days of the June 1 launch; it is API-available now, not yet downloadable.
  3. Every price tier now has a 1M-context option. Across the wave alone, input prices span roughly 25x, from $0.20 (Step 3.7 Flash) to $5.00 (Claude Opus 4.8), and the broader ladder reaches down to sub-$0.10 reference rates. Long-context capability is no longer the preserve of the most expensive tier.
  4. Usage rankings are not quality rankings. OpenRouter's programming collection ranks by real-time token volume, not benchmark quality. MiniMax M3 ranking near the top reflects adoption and price, not a verified quality endorsement.
  5. The market has split into dollars and tokens. Chinese open-weight models account for more than 45% of OpenRouter traffic by token volume, while Anthropic holds about 12.3% of tokens but a much higher dollar share via premium pricing. Commodity and premium inference now coexist as distinct lanes.

01 · The Wave · Five launches in a single ten-day window.

Treat the calendar as the story. Claude Opus 4.8 and its Opus 4.8 Fast variant landed on OpenRouter on May 27. Step 3.7 Flash followed on May 28. MiniMax M3 reached the API on May 31 with broader availability on June 1. Qwen3.7-Plus listed on June 3, and NVIDIA launched Nemotron 3 Ultra on June 4, the day this guide publishes, after announcing it at Computex in Taipei on June 1.

Most coverage treats each of these in isolation. Stacked into one window, they describe a market where new frontier-aimed capability arrives roughly every other day and where open-weight challengers now ship alongside closed-frontier releases rather than trailing them by months. The practical consequence for any team running inference at scale is that a model-selection decision made in early May was already stale by early June.

May 27 · Anthropic
Claude Opus 4.8
$5 / $25 per Mtok · 1M context · 128K max output

The frontier closed-weight anchor of the wave. A Fast variant lists at $10 / $50 per Mtok — double the cost for higher throughput speed, same capabilities. Developers strongly favor the standard-speed version on OpenRouter's usage figures.

openrouter.ai/anthropic/claude-opus-4.8

May 28 · StepFun
Step 3.7 Flash
$0.20 / $1.15 per Mtok · 256K context

A 196B-parameter multimodal MoE with roughly 11B active per token, native image and video input, and selectable reasoning levels. The cheapest input rate in the wave. Full technical breakdown in our dedicated Step 3.7 Flash post.

openrouter.ai/stepfun/step-3.7-flash

Jun 3 · Alibaba
Qwen3.7-Plus
$0.40 / $1.60 per Mtok · 1M context · closed

A multimodal sibling to Qwen3.7-Max, adding text, image, and video input for visual scene understanding, screen reading, and GUI interaction. Closed and proprietary — not an open-weight release, despite arriving in the same wave as several open models.

openrouter.ai/qwen/qwen3.7-plus

Pricing discipline
Two figures in this window are deliberately left soft. MiniMax M3 has not disclosed a total parameter count, so this guide does not cite one. NVIDIA Nemotron 3 Ultra launched on June 4 with its OpenRouter per-token price unconfirmed at publication — we reference the earlier Nemotron 3 Super at $0.09 / $0.45 per Mtok as the family bracket rather than invent an Ultra number.

02 · The Platform · Why these launches converge on OpenRouter.

The wave is not a coincidence of timing alone; it reflects where inference demand now concentrates. OpenRouter raised a $113M Series B on May 26, 2026, led by CapitalG with participation from NVentures, ServiceNow Ventures, and others alongside existing investors Andreessen Horowitz and Menlo Ventures. The round valued the company at roughly $1.3 billion, more than double its valuation a year earlier.

The usage numbers explain the investor interest. Weekly token throughput has grown from about 5 trillion to 25 trillion over six months — a 5x increase — with the platform reporting more than 8 million developers building across 400-plus models. For a model provider, listing on OpenRouter is now one of the fastest routes to distribution, which is precisely why frontier and challenger labs alike push new releases there on day one.
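To put the six-month multiple on a monthly scale, assume (our simplification, not an OpenRouter figure) that weekly throughput compounded at a constant rate g per month between the two endpoints:

```latex
(1 + g)^{6} = \frac{25\,\text{T}}{5\,\text{T}} = 5
\quad\Longrightarrow\quad
g = 5^{1/6} - 1 \approx 0.308
```

Real growth is lumpier than that, so treat roughly 31% a month as the average implied by the two endpoints, not a measured rate.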

"Running inference at scale is fundamentally a multimodel problem. The era of picking a single model is over. Success now depends on continuously routing across a changing market." (Alex Atallah, Co-founder & CEO, OpenRouter, Series B announcement)
Enterprise context
Deloitte data cited around the Series B reports that 67% of enterprise companies already process close to one billion tokens monthly, with consumption expected to accelerate as multimodal AI agents proliferate. That demand profile — high-volume, multimodal, cost-sensitive — is exactly what a routing layer is built to serve, and it shapes which models providers prioritize.

03 · The Wave Table · The five models, side by side.

The table below collects every model in the wave with its OpenRouter list date, pricing, context window, and open-versus-closed status — the canonical reference for the June window. Prices are per million tokens, read from OpenRouter model pages on June 4, 2026. Where a figure is a launch promotion or unconfirmed, the cell says so.

| Model · listed | Pricing & context | Open / multimodal |
| --- | --- | --- |
| Claude Opus 4.8 · May 27 | $5 / $25 per Mtok · 1M ctx · 128K out | Closed weights · text. Frontier anchor; the Fast variant doubles price for throughput speed. |
| Step 3.7 Flash · May 28 | $0.20 / $1.15 per Mtok · 256K ctx | Open weights · multimodal (image + video). 196B MoE, ~11B active. Cheapest input in the wave. |
| MiniMax M3 · May 31 / Jun 1 | $0.30 / $1.20 promo (list $0.60 / $2.40) · 1M ctx · 512K out | Open-weight committed (HF pending ~Jun 11) · multimodal. Frontier-aimed coding at a sub-dollar rate. |
| Qwen3.7-Plus · Jun 3 | $0.40 / $1.60 per Mtok · 1M ctx · 65.5K out | Closed / proprietary · multimodal (image + video). Vision-capable sibling to text-only Qwen3.7-Max. |
| Nemotron 3 Ultra · Jun 4 | OpenRouter price TBC · 1M ctx · 550B / 55B active | Open weights (Hugging Face) · agentic focus. Hybrid Mamba-2 + Transformer + MoE. Super tier brackets pricing at $0.09 / $0.45. |

Read the last two columns together. Three of the five additions are open-weight or open-weight-committed releases, and four of the five carry a million-token context window. The most expensive model in the wave is roughly 25 times the input price of the cheapest, and the cheapest is still a capable multimodal model. That spread, at this density of launches, is the structural argument of this post.

04 · MiniMax M3 · The launch that compresses the price floor.

MiniMax M3 is the model that gives the wave its narrative. MiniMax positions it as the first open-weight model to combine frontier-level coding, a 1M-token context window, and native multimodal input across text, image, and video. Its architecture, MiniMax Sparse Attention, replaces full attention with KV-block selection — the same family of long-context efficiency tricks the wider open-weight field has converged on. For the full technical treatment, see our MiniMax M3 deep-dive.
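MiniMax's MSA algorithm is not specified in the material cited here, so the following is only a generic illustration of the KV-block selection idea, not MiniMax's implementation. It assumes a simple scoring rule (the query's dot product with each block's mean key) and a fixed budget of `top_k` blocks; `attention`, `block_sparse_attention`, `block_size`, and `top_k` are hypothetical names. The query attends only to keys and values inside the selected blocks, so per-query cost scales with `top_k * block_size` instead of the full sequence length.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    peak = max(scores)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for d, x in enumerate(v):
            out[d] += (w / total) * x
    return out


def block_sparse_attention(q: Vector, keys: List[Vector], values: List[Vector],
                           block_size: int, top_k: int) -> Vector:
    """Attend only to the top_k KV blocks whose mean key scores highest against q."""
    n = len(keys)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    def block_score(block: range) -> float:
        # Hypothetical scoring rule: dot product of q with the block's mean key.
        mean_key = [sum(keys[i][d] for i in block) / len(block) for d in range(len(q))]
        return dot(q, mean_key)

    chosen = sorted(blocks, key=block_score, reverse=True)[:top_k]
    idx = sorted(i for block in chosen for i in block)  # restore token order
    return attention(q, [keys[i] for i in idx], [values[i] for i in idx])


# Toy example: four tokens in two blocks of two; keep only the best block.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
full = attention(q, keys, values)
sparse = block_sparse_attention(q, keys, values, block_size=2, top_k=1)
```

Setting `top_k` to the number of blocks recovers full attention exactly, which makes a useful sanity check. Production long-context kernels make this selection per head and on accelerators, which this sketch ignores.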

Two cautions belong up front. M3 is API-available now, but its open weights were promised on Hugging Face and GitHub within roughly ten days of the June 1 launch — they are not downloadable as of this writing. And its benchmark figures are vendor self-reported: MiniMax ran them on its own infrastructure with its own scaffolding. Treat the headline scores as a launch claim, not an independent result.

Input price (promo): $0.30/M · roughly 6% of Opus 4.8

At the $0.30 / $1.20 promotional rate, a task consuming about 500K input plus 100K output tokens costs roughly $0.27, versus about $5.00 on Opus 4.8 standard, a gap of roughly 18x. At MiniMax's standard $0.60 / $2.40 rate the gap narrows to around 9x. Promo expires; verify the live rate.

List: $0.60 / $2.40
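The per-task figures above are plain per-million-token arithmetic. A minimal sketch that reproduces them, using the rates quoted in this guide; `RATES` and `task_cost` are illustrative names, and the constants are a June 4 snapshot rather than live prices (the promo rate will expire).

```python
# Rates in USD per 1M tokens (input, output), as quoted in this guide on June 4, 2026.
RATES = {
    "MiniMax M3 (promo)": (0.30, 1.20),
    "MiniMax M3 (list)": (0.60, 2.40),
    "Claude Opus 4.8": (5.00, 25.00),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task at the model's per-million-token rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    opus = task_cost("Claude Opus 4.8", 500_000, 100_000)
    for name in RATES:
        cost = task_cost(name, 500_000, 100_000)
        print(f"{name}: ${cost:.2f} per task; Opus 4.8 costs {opus / cost:.1f}x as much")
```

For the 500K-in / 100K-out task this gives $0.27 at the promo rate, $0.54 at list, and $5.00 on Opus 4.8, which is where the roughly 18x and 9x gaps come from. Swap in your own token counts per workload, since output-heavy tasks shift the ratio.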
Vendor SWE-Bench Pro: 59.0% · self-reported, with a caveat

MiniMax claims 59.0% on SWE-Bench Pro — ahead of GPT-5.5's 58.6% by its own measure, behind Claude Opus 4.7's 64.3%. All scores were run on MiniMax infrastructure with MiniMax scaffolding, so read them as vendor claims pending independent replication.

⚠️ Vendor-reported
Parameter count: undisclosed by MiniMax

MiniMax has not published a total parameter count for M3, and the MSA efficiency material does not state one. We do not cite a number; any total-parameter figure for M3 circulating elsewhere is not sourced from MiniMax.

Not published

The honest counterweight to the vendor numbers comes from independent testing. In an independent multi-model benchmark published June 1, 2026, MiniMax M3 scored 78 out of 100 — a solid Tier B — while Claude Opus 4.8 landed at 95 (Tier A) and GPT-5.5 and GPT-5.4 sat at 96 to 97. M3 is genuinely strong and genuinely cheap; it is not, on independent evidence, on par with Opus 4.8 for quality. Both things are true at once, and the routing implication is to match the model to the task rather than to the headline price.

"With Opus at $1.10 delivering Tier A out of the gate, the cheaper models only win on paper." (Flávio Copes, independent June 2026 LLM benchmark)

05 · Price Ladder · A full price ladder with capability at every rung.

Zoom out from the wave to OpenRouter's broader price ladder as it stood in early June. The ladder below lists input price per million tokens across representative models, cheapest first, from sub-$0.10 reference rates up to the $5.00 frontier tier. The point is not the ordering; it is that capable, long-context options now exist at nearly every rung.

Input price per 1M tokens · representative OpenRouter models
Source: OpenRouter model pages, retrieved June 4, 2026

  • Nemotron 3 Super · 120B / 12B active · 1M ctx · reference bracket: $0.09
  • DeepSeek V4 Flash · open weights · long-context efficient: $0.098
  • Step 3.7 Flash · 196B MoE · 256K ctx · multimodal: $0.20
  • MiniMax M3 (promo) · 1M ctx · open-weight committed · multimodal: $0.30
  • Qwen3.7-Plus · 1M ctx · closed · multimodal: $0.40
  • Qwen3.7-Max · text-only deep reasoning · closed: $2.50
  • Claude Opus 4.8 · 1M ctx · frontier closed weights: $5.00

The takeaway is structural. A year ago, the cheap end of the ladder meant short context windows and modest capability; the long-context frontier meant premium pricing. The June ladder breaks that coupling. Step 3.7 Flash, MiniMax M3, Qwen3.7-Plus, and Nemotron 3 Super all sit at or below $0.40 per million input tokens while carrying 256K-to-1M context windows. The decision a team faces is no longer "can we afford long context" but "which long-context model fits this specific workload at this price." Our AI transformation engagements start with exactly that comparative eval.
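One way to start on that question is to filter the ladder by a minimum context window and take the cheapest survivor. The sketch below hard-codes the input prices and context windows stated in this guide; DeepSeek V4 Flash and Qwen3.7-Max are left out because no context window is given for them here, and `Rung`, `LADDER`, and `cheapest_with_context` are illustrative names, not an OpenRouter API. Input price alone ignores output price and quality, so treat the result as a shortlist, not a routing decision.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rung:
    model: str
    input_usd_per_m: float  # input price per 1M tokens
    context_tokens: int     # nominal context window as listed ("256K", "1M")


# Input prices and context windows as stated in this guide (June 4, 2026 snapshot).
LADDER: List[Rung] = [
    Rung("Nemotron 3 Super", 0.09, 1_000_000),
    Rung("Step 3.7 Flash", 0.20, 256_000),
    Rung("MiniMax M3 (promo)", 0.30, 1_000_000),
    Rung("Qwen3.7-Plus", 0.40, 1_000_000),
    Rung("Claude Opus 4.8", 5.00, 1_000_000),
]


def cheapest_with_context(ladder: List[Rung], min_context: int) -> Optional[Rung]:
    """Return the lowest-input-price rung whose context window is large enough."""
    eligible = [r for r in ladder if r.context_tokens >= min_context]
    return min(eligible, key=lambda r: r.input_usd_per_m, default=None)


if __name__ == "__main__":
    # Excluding the Nemotron reference bracket, which model is cheapest at 500K context?
    pick = cheapest_with_context(LADDER[1:], 500_000)
    print(pick.model if pick else "no eligible model")
```

Run as written, this picks MiniMax M3 (promo): Step 3.7 Flash is cheaper but its 256K window fails the 500K requirement.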

06 · Usage Rankings · What the programming rankings do and don't measure.

OpenRouter publishes a programming collection ranked by usage. In early June, its top coding models by volume ran roughly: MiMo V2.5, MiniMax M3, DeepSeek V4 Flash, MiMo V2.5 Pro, Hy3 Preview, DeepSeek V4 Pro, Claude Opus 4.7, Owl Alpha, Step 3.7 Flash, and Claude Sonnet 4.6. The critical caveat: these are real-time usage rankings, not quality benchmarks. MiniMax M3 ranking near the top reflects adoption and price, not a verified quality endorsement.

Usage rankings also move fast, and history is instructive. Tencent's Hy3 preview held OpenRouter's number-one overall usage ranking from April 27 to May 11, accumulating about 7.7 trillion tokens in 19 days at $0.063 per million input. By early June it had slipped down the programming collection. Independent analysis read the steady usage after Hy3 moved to a paid tier as organic demand from a single large application rather than promotional uptake — a reminder that a ranking can reflect one heavy consumer as much as broad preference.

Read rankings carefully
A high usage rank on OpenRouter means a model is being used a lot right now — often because it is cheap, recently launched, or favored by one high-volume app. It is not evidence that the model is the highest quality for your task. For that, run your own eval on your own prompts. For market-share and baseline-volume context, our OpenRouter April 2026 rankings set the pre-wave baseline.

07 · The Split · Dollars versus tokens: a permanent bifurcation.

The most important pattern in the data is not any single model. It is the split between the volume market and the dollar market. Chinese-origin models — from Xiaomi, Alibaba, MiniMax, DeepSeek, and Moonshot — now account for more than 45% of all OpenRouter traffic by token volume, up from below 2% a year ago. Anthropic, by contrast, holds around 12.3% of token share but retains a much higher dollar share through premium pricing. OpenRouter now routes both commodity inference and premium inference as distinct lanes.
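The gap between token share and dollar share is price weighting and nothing more. Writing t_i for provider i's share of tokens (so the shares sum to 1) and p_i for its average effective price per token (both our notation, not figures from the sources), the dollar share is:

```latex
s_i^{\$} = \frac{t_i\,p_i}{\sum_j t_j\,p_j} = t_i \cdot \frac{p_i}{\bar{p}},
\qquad \bar{p} = \sum_j t_j\,p_j
```

So a provider's dollar share exceeds its token share exactly when p_i > p̄, that is, when its average price sits above the market's token-weighted average price, as premium-priced Anthropic models do. The guide has no average-price figure for Anthropic, so this explains the direction of the gap, not its size.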

Independent market analysis frames the dynamic bluntly: the market is following price, not quality. Sub-dollar blended pricing from open-weight challengers has captured disproportionate token volume, while premium closed models dominate dollar spend despite lower token share. Inference volume across the tracked market grew roughly 11.1x year over year. This is the interpretive core of the wave — not a temporary dislocation but a durable structure that should change how developers reason about model selection.

The dollar-vs-token split · OpenRouter traffic, mid-2026
Source: CodeSOTA market trends; TechTimes, May 2026

  • Chinese-origin models, share of OpenRouter traffic by token volume: >45%
  • Anthropic token share (with a much higher dollar share): 12.3%
  • Inference volume growth, year over year, tracked market through Apr 2026: 11.1x

Projecting forward, the bifurcation looks structural rather than cyclical. If sub-dollar long-context models keep arriving every few days — as the ten-day wave suggests they will — the volume lane will keep commoditizing while the dollar lane consolidates around the handful of frontier models worth a premium for the hardest work. The strategic move for most teams is to stop treating "which model" as a single decision and start treating it as a routing policy that differs by task class. That is exactly the multi-model thesis behind OpenRouter Fusion, the synthesis layer that makes multi-model routing practical.

08 · How To Route · A practical routing posture for the June wave.

The wave does not crown a single winner; it rewards deliberate routing. Below is a workload-by-workload posture grounded only in the evidence above — independent benchmarks where they exist, vendor claims marked as such, and price read live from OpenRouter.

Highest-quality reasoning · hard, high-stakes tasks

Independent benchmarks put Opus 4.8 (Tier A) and the GPT-5.5 / 5.4 line at the top of the wave. When a wrong answer is expensive, the premium tier earns its dollar share. Verify on your own prompts before committing defaults.

Pick frontier closed

Cost-sensitive long context · high-volume long-context work

MiniMax M3 at $0.30/M promo offers 1M context and strong coding for a fraction of frontier cost. Sound for high-volume, lower-stakes workloads — but it is Tier B on independent testing, so keep a quality fallback for hard cases.

Pick MiniMax M3 (with fallback)

Multimodal & vision · image, video, GUI tasks

Step 3.7 Flash and Qwen3.7-Plus both add native vision at sub-$0.40 input. Qwen3.7-Plus is closed; Step 3.7 Flash ships open weights. Choose on the open-vs-closed axis your deployment requires, not on headline price alone.

Route by open/closed need

Sovereign / on-prem agentic · open-weight autonomous agents

NVIDIA Nemotron 3 Ultra ships open weights with an explicitly agentic training focus and 1M context. OpenRouter pricing was unconfirmed at launch — bracket cost via Nemotron 3 Super and confirm before production.

Evaluate Nemotron 3 Ultra
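The four postures above amount to a lookup from task class to a primary model plus an optional fallback. A minimal sketch of that policy as data, under stated assumptions: the task-class keys, the `Route` type, choosing Claude Opus 4.8 as the frontier pick and as MiniMax M3's quality fallback, and preferring Qwen3.7-Plus when closed weights are acceptable are all our illustrative choices. Model names are display names, not verified OpenRouter slugs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Route:
    primary: str
    fallback: Optional[str] = None  # retried when the primary's answer fails your check


# Hypothetical task-class keys; model names are display names, not verified slugs.
POLICY: Dict[str, Route] = {
    "hard_reasoning": Route("Claude Opus 4.8"),
    "long_context_bulk": Route("MiniMax M3", fallback="Claude Opus 4.8"),
    # Evaluate only: OpenRouter pricing was unconfirmed at launch.
    "sovereign_agentic": Route("Nemotron 3 Ultra"),
}


def route(task_class: str, needs_open_weights: bool = False) -> Route:
    """Map a task class to a route; vision work splits on the open/closed axis."""
    if task_class == "vision":
        if needs_open_weights:
            return Route("Step 3.7 Flash")
        # Assumption: with no open-weights requirement, try Qwen3.7-Plus first;
        # the posture above leaves this choice to your own eval.
        return Route("Qwen3.7-Plus", fallback="Step 3.7 Flash")
    return POLICY[task_class]


if __name__ == "__main__":
    print(route("long_context_bulk"))
    print(route("vision", needs_open_weights=True))
```

Keeping the policy as data rather than scattered `if` statements means a price change or a new release is a one-line edit to `POLICY`.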

For most agencies and engineering teams, the starting point is not a switch but a measurement: benchmark two or three candidates on your own prompts, measure token spend and latency, and decide per workload. The wave makes that discipline more valuable, not less, because the cheapest capable option changes month to month. Treat the price ladder as a menu and your routing policy as the thing you actually maintain. Sibling references for the individual models (Step 3.7 Flash, Qwen3.7-Plus, and Nemotron 3 Super) carry the per-model detail this roundup keeps brief.
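The measure-then-decide step reduces to one rule per workload: among the candidates you have evaluated, take the cheapest one that clears the quality bar and fits the latency budget. In the sketch below, `EvalResult`, `pick_model`, the model names `model-a` to `model-c`, and every number are hypothetical placeholders that show the mechanics only; they are not benchmark results, and you would fill them from your own eval runs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EvalResult:
    model: str
    score: float          # pass rate on your own prompt set, 0.0 to 1.0
    usd_per_task: float   # measured average spend per task
    p95_latency_s: float  # measured 95th-percentile latency in seconds


def pick_model(results: List[EvalResult], min_score: float,
               max_latency_s: float = float("inf")) -> Optional[EvalResult]:
    """Cheapest model that clears the quality bar and latency budget, else None."""
    passing = [r for r in results
               if r.score >= min_score and r.p95_latency_s <= max_latency_s]
    return min(passing, key=lambda r: r.usd_per_task, default=None)


# Hypothetical placeholder measurements, not real benchmark data.
results = [
    EvalResult("model-a", score=0.95, usd_per_task=3.00, p95_latency_s=40.0),
    EvalResult("model-b", score=0.80, usd_per_task=0.40, p95_latency_s=25.0),
    EvalResult("model-c", score=0.60, usd_per_task=0.10, p95_latency_s=12.0),
]

print(pick_model(results, min_score=0.75).model)  # bulk work: model-b is cheapest above 0.75
print(pick_model(results, min_score=0.90).model)  # hard cases: only model-a clears 0.90
```

Because the quality bar differs by task class, the same eval results can legitimately select different models for different workloads; re-running the evals when prices or releases change is what keeps the policy current.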

09 · Conclusion · The week the price floor moved.

The shape of the model market, June 2026

Model selection is now a routing policy, not a single decision.

Five major models in ten days is not noise. Read together, the OpenRouter wave of May 27 to June 4 shows a market where frontier-aimed capability arrives every other day, where open-weight challengers ship alongside closed-frontier releases rather than behind them, and where every price tier now carries a credible million-token-context option.

The honest framing keeps two facts in view at once. MiniMax M3 is genuinely cheap and genuinely capable at $0.30 per million input tokens — and it is Tier B on independent testing, not a drop-in replacement for the Tier A frontier. The cheapest capable model and the highest-quality model are increasingly different models, and the gap between them is now a routing decision rather than a budget constraint.

The durable signal is the split itself: a commodity lane of high-volume open-weight inference and a premium lane of low-volume frontier work, both routed through the same platform. The teams that win the next year will not pick a model; they will maintain a policy — matching task class to the cheapest model that clears the quality bar, and re-checking that match as the price floor keeps moving.

Build a model-routing policy that pays for itself

Stop picking one model. Build a policy that routes by task and price.

Our team helps businesses evaluate, benchmark, and route across the model market — open-weight and closed frontier alike — so you pay for quality only where it earns its premium and commoditize everything else.

Free consultation · Expert guidance · Tailored solutions
What we work on

Multi-model routing engagements

  • Benchmarking new releases on your own prompts and corpus
  • Cost-vs-quality routing policy across open + closed models
  • Long-context RAG on sub-dollar open-weight models
  • Vendor-claim verification against independent benchmarks
  • Spend governance for high-volume inference workloads
FAQ · OpenRouter June 2026

The questions we get every week.

What new models did OpenRouter list between May 27 and June 4, 2026?

Five major models listed in that ten-day window. Claude Opus 4.8 and its Opus 4.8 Fast variant arrived on May 27. Step 3.7 Flash followed on May 28. MiniMax M3 reached the API on May 31 with broader availability June 1. Qwen3.7-Plus listed on June 3, and NVIDIA Nemotron 3 Ultra launched on June 4 after a Computex announcement on June 1. The set spans both closed-frontier and open-weight tiers, and four of the five carry a million-token context window; that is why the window reads as a single market signal rather than five unrelated launches.