OpenRouter added five major models in the ten days from May 27 to June 4, 2026, a single window dense enough to read as a market signal rather than a scattered news cycle. The list spans closed-weight frontier models and open-weight challengers, from Claude Opus 4.8 at $5 per million input tokens down to Step 3.7 Flash at $0.20, with MiniMax M3 at $0.30 on a launch promotion.
What makes the wave worth cataloging is not any single launch. It is the shape of the whole. Every tier of the price ladder now has a credible million-token-context option, and the cheapest capable long-context models cost a small fraction of the most expensive frontier ones. That is a structural change in how teams should think about model selection, not a temporary dislocation.
This guide catalogs the five additions with their OpenRouter pricing and context windows, builds a price-tier ladder for the platform as it stood in early June, separates real-time usage rankings from quality benchmarks, and lays out a practical routing posture. Vendor self-reported numbers are marked as such throughout — a launch claim is not an independent benchmark.
- 01. Five major models listed in ten days. Claude Opus 4.8 and Opus 4.8 Fast (May 27), Step 3.7 Flash (May 28), MiniMax M3 (May 31 / June 1), Qwen3.7-Plus (June 3), and NVIDIA Nemotron 3 Ultra (June 4). Closed-frontier and open-weight tiers both moved.
- 02. MiniMax M3 is the headline price story. 1M-token context with frontier-aimed coding at a $0.30/M input promo rate, roughly 6% of Opus 4.8's $5.00. Weights were promised on Hugging Face within about ten days of the June 1 launch; it is API-available now, not yet downloadable.
- 03. Every price tier now has a 1M-context option. Within the wave, input prices run from $0.20 (Step 3.7 Flash) to $5.00 (Claude Opus 4.8), a roughly 25x range, and sub-$0.10 reference rates sit lower still on the broader ladder. Long-context capability is no longer the preserve of the most expensive tier.
- 04. Usage rankings are not quality rankings. OpenRouter's programming collection ranks by real-time token volume, not benchmark quality. MiniMax M3 ranking near the top reflects adoption and price, not a verified quality endorsement.
- 05. The market has split into dollars and tokens. Chinese open-weight models account for more than 45% of OpenRouter traffic by token volume, while Anthropic holds about 12.3% of tokens but a much higher dollar share via premium pricing. Commodity and premium inference now coexist as distinct lanes.
01 · The Wave
Five launches in a single ten-day window.
Treat the calendar as the story. Claude Opus 4.8 and its Opus 4.8 Fast variant landed on OpenRouter on May 27. Step 3.7 Flash followed on May 28. MiniMax M3 reached the API on May 31 with broader availability on June 1. Qwen3.7-Plus listed on June 3, and NVIDIA launched Nemotron 3 Ultra on June 4, the day this guide publishes, after announcing it at Computex in Taipei on June 1.
Most coverage treats each of these in isolation. Stacked into one window, they describe a market where new frontier-aimed capability arrives roughly every other day and where open-weight challengers now ship alongside closed-frontier releases rather than trailing them by months. The practical consequence for any team running inference at scale is that a model-selection decision made in early May was already stale by early June.
Claude Opus 4.8
The frontier closed-weight anchor of the wave. A Fast variant lists at $10 / $50 per Mtok: double the cost for higher throughput, with the same capabilities. OpenRouter's usage figures show developers strongly favoring the standard-speed version.
Step 3.7 Flash
A 196B-parameter multimodal MoE with roughly 11B active per token, native image and video input, and selectable reasoning levels. The cheapest input rate in the wave. Full technical breakdown in our dedicated Step 3.7 Flash post.
Qwen3.7-Plus
A multimodal sibling to Qwen3.7-Max, adding text, image, and video input for visual scene understanding, screen reading, and GUI interaction. Closed and proprietary — not an open-weight release, despite arriving in the same wave as several open models.
02 · The Platform
Why these launches converge on OpenRouter.
The wave is not a coincidence of timing alone; it reflects where inference demand now concentrates. OpenRouter raised a $113M Series B on May 26, 2026, led by CapitalG with participation from NVentures, ServiceNow Ventures, and others alongside existing investors Andreessen Horowitz and Menlo Ventures. The round more than doubled the company's valuation to roughly $1.3 billion in a single year.
The usage numbers explain the investor interest. Weekly token throughput has grown from about 5 trillion to 25 trillion over six months — a 5x increase — with the platform reporting more than 8 million developers building across 400-plus models. For a model provider, listing on OpenRouter is now one of the fastest routes to distribution, which is precisely why frontier and challenger labs alike push new releases there on day one.
"Running inference at scale is fundamentally a multimodel problem. The era of picking a single model is over. Success now depends on continuously routing across a changing market."
Alex Atallah, Co-founder & CEO, OpenRouter (Series B announcement)
03 · The Wave Table
The five models, side by side.
The table below collects every model in the wave with its OpenRouter list date, pricing, context window, and open-versus-closed status — the canonical reference for the June window. Prices are per million tokens, read from OpenRouter model pages on June 4, 2026. Where a figure is a launch promotion or unconfirmed, the cell says so.
| Model (listed) | Pricing & context | Open / multimodal |
|---|---|---|
| Claude Opus 4.8 (May 27) | $5 / $25 per Mtok · 1M ctx · 128K out | Closed weights · text. Frontier anchor; the Fast variant doubles price for throughput speed. |
| Step 3.7 Flash (May 28) | $0.20 / $1.15 per Mtok · 256K ctx | Open weights · multimodal (image + video). 196B MoE, ~11B active. Cheapest input in the wave. |
| MiniMax M3 (May 31 / Jun 1) | $0.30 / $1.20 promo (list $0.60 / $2.40) · 1M ctx · 512K out | Open-weight committed (HF pending ~Jun 11) · multimodal. Frontier-aimed coding at a sub-dollar rate. |
| Qwen3.7-Plus (Jun 3) | $0.40 / $1.60 per Mtok · 1M ctx · 65.5K out | Closed / proprietary · multimodal (image + video). Vision-capable sibling to text-only Qwen3.7-Max. |
| Nemotron 3 Ultra (Jun 4) | OpenRouter price TBC · 1M ctx · 550B / 55B active | Open weights (Hugging Face) · agentic focus. Hybrid Mamba-2 + Transformer + MoE. Super tier brackets pricing at $0.09 / $0.45. |
Read the third column carefully. Three of the five additions are open-weight or open-weight-committed releases, and four of the five carry a million-token context window. The most expensive model in the wave is roughly 25 times the input price of the cheapest — and the cheapest is still a capable multimodal model. That spread, at this density of launches, is the structural argument of this post.
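The counts and the 25x spread quoted above can be re-derived from the table rows. The snippet below is a minimal sketch, not an OpenRouter client: the `WaveModel` class and `summarize` helper are hypothetical names introduced for illustration, "256K" is assumed to mean 256,000 tokens, and MiniMax M3 is entered at its promotional rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WaveModel:
    name: str
    input_usd_per_mtok: Optional[float]  # None where the OpenRouter price was unconfirmed
    context_tokens: int
    open_weights: bool  # True for open-weight or open-weight-committed releases


# Figures copied from the wave table (read June 4, 2026); MiniMax M3 at promo rate.
WAVE = [
    WaveModel("Claude Opus 4.8", 5.00, 1_000_000, False),
    WaveModel("Step 3.7 Flash", 0.20, 256_000, True),
    WaveModel("MiniMax M3", 0.30, 1_000_000, True),
    WaveModel("Qwen3.7-Plus", 0.40, 1_000_000, False),
    WaveModel("Nemotron 3 Ultra", None, 1_000_000, True),
]


def summarize(models):
    """Count open-weight and 1M-context models and compute the input-price spread."""
    priced = [m.input_usd_per_mtok for m in models if m.input_usd_per_mtok is not None]
    return {
        "open_weight_count": sum(m.open_weights for m in models),
        "million_context_count": sum(m.context_tokens >= 1_000_000 for m in models),
        "input_price_spread": max(priced) / min(priced),
    }


summary = summarize(WAVE)
print(summary)
```

Nemotron 3 Ultra is excluded from the spread because its OpenRouter price was unconfirmed; once a live rate is listed, the spread should be recomputed.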
04 · MiniMax M3
The launch that compresses the price floor.
MiniMax M3 is the model that gives the wave its narrative. MiniMax positions it as the first open-weight model to combine frontier-level coding, a 1M-token context window, and native multimodal input across text, image, and video. Its architecture, MiniMax Sparse Attention, replaces full attention with KV-block selection — the same family of long-context efficiency tricks the wider open-weight field has converged on. For the full technical treatment, see our MiniMax M3 deep-dive.
Two cautions belong up front. M3 is API-available now, but its open weights were promised on Hugging Face and GitHub within roughly ten days of the June 1 launch — they are not downloadable as of this writing. And its benchmark figures are vendor self-reported: MiniMax ran them on its own infrastructure with its own scaffolding. Treat the headline scores as a launch claim, not an independent result.
Roughly 6% of Opus 4.8
At the $0.30 / $1.20 promotional rate, a task consuming about 500K input plus 100K output tokens costs roughly $0.27 (0.5 × $0.30 + 0.1 × $1.20), versus about $5.00 on Opus 4.8 standard (0.5 × $5 + 0.1 × $25), a gap of roughly 18x. At MiniMax's standard $0.60 / $2.40 rate the same task costs about $0.54 and the gap narrows to around 9x. The promotion will expire; verify the live rate.
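The per-task arithmetic can be reproduced in a few lines. This is a minimal sketch using only the rates quoted in this guide; the `task_cost_usd` helper and the `RATES` dictionary are illustrative names, not part of any OpenRouter SDK.

```python
def task_cost_usd(input_tokens, output_tokens, input_rate, output_rate):
    """Cost in USD for one task, given rates in USD per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# (input rate, output rate) in USD per million tokens, as quoted in this guide.
RATES = {
    "MiniMax M3 (promo)": (0.30, 1.20),
    "MiniMax M3 (list)": (0.60, 2.40),
    "Claude Opus 4.8": (5.00, 25.00),
}

# The example task from the text: about 500K input tokens plus 100K output tokens.
costs = {name: task_cost_usd(500_000, 100_000, *rate) for name, rate in RATES.items()}

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
# MiniMax M3 (promo): $0.27
# MiniMax M3 (list): $0.54
# Claude Opus 4.8: $5.00
```

Swapping in your own token counts matters more than the headline ratio: output-heavy workloads shift the comparison, because output rates differ by a different factor than input rates.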
Self-reported, with a caveat
MiniMax claims 59.0% on SWE-Bench Pro — ahead of GPT-5.5's 58.6% by its own measure, behind Claude Opus 4.7's 64.3%. All scores were run on MiniMax infrastructure with MiniMax scaffolding, so read them as vendor claims pending independent replication.
Undisclosed by MiniMax
MiniMax has not published a total parameter count for M3, and the MSA efficiency material does not state one. We do not cite a number; any total-parameter figure for M3 circulating elsewhere is not sourced from MiniMax.
The honest counterweight to the vendor numbers comes from independent testing. In an independent multi-model benchmark published June 1, 2026, MiniMax M3 scored 78 out of 100 — a solid Tier B — while Claude Opus 4.8 landed at 95 (Tier A) and GPT-5.5 and GPT-5.4 sat at 96 to 97. M3 is genuinely strong and genuinely cheap; it is not, on independent evidence, on par with Opus 4.8 for quality. Both things are true at once, and the routing implication is to match the model to the task rather than to the headline price.
"With Opus at $1.10 delivering Tier A out of the gate, the cheaper models only win on paper."
Flávio Copes, independent June 2026 LLM benchmark
05 · Price Ladder
A full price ladder with capability at every rung.
Zoom out from the wave to OpenRouter's broader price ladder as it stood in early June. The bars below chart input price per million tokens across representative models, from sub-$0.10 reference rates up to the $5.00 frontier tier. Longer bars are more expensive. The point is not the ordering — it is that capable, long-context options now exist at nearly every rung.
Figure: Input price per 1M tokens · representative OpenRouter models. Source: OpenRouter model pages, retrieved June 4, 2026.
The takeaway is structural. A year ago, the cheap end of the ladder meant short context windows and modest capability; the long-context frontier meant premium pricing. The June ladder breaks that coupling. Step 3.7 Flash, MiniMax M3, Qwen3.7-Plus, and the Nemotron family all sit at or below $0.40 per million input tokens while carrying 256K-to-1M context windows. The decision a team faces is no longer "can we afford long context" but "which long-context model fits this specific workload at this price." Our AI transformation engagements start with exactly that comparative eval.
06 · Usage Rankings
What the programming rankings do and don't measure.
OpenRouter publishes a programming collection ranked by usage. In early June, its top coding models by volume ran roughly: MiMo V2.5, MiniMax M3, DeepSeek V4 Flash, MiMo V2.5 Pro, Hy3 Preview, DeepSeek V4 Pro, Claude Opus 4.7, Owl Alpha, Step 3.7 Flash, and Claude Sonnet 4.6. The critical caveat: these are real-time usage rankings, not quality benchmarks. MiniMax M3 ranking near the top reflects adoption and price, not a verified quality endorsement.
Usage rankings also move fast, and history is instructive. Tencent's Hy3 preview held OpenRouter's number-one overall usage ranking from April 27 to May 11, accumulating about 7.7 trillion tokens in 19 days at $0.063 per million input. By early June it had slipped down the programming collection. Independent analysis read the steady usage after Hy3 moved to a paid tier as organic demand from a single large application rather than promotional uptake — a reminder that a ranking can reflect one heavy consumer as much as broad preference.
07 · The Split
Dollars versus tokens: a permanent bifurcation.
The most important pattern in the data is not any single model. It is the split between the volume market and the dollar market. Chinese-origin models — from Xiaomi, Alibaba, MiniMax, DeepSeek, and Moonshot — now account for more than 45% of all OpenRouter traffic by token volume, up from below 2% a year ago. Anthropic, by contrast, holds around 12.3% of token share but retains a much higher dollar share through premium pricing. OpenRouter now routes both commodity inference and premium inference as distinct lanes.
Independent market analysis frames the dynamic bluntly: the market is following price, not quality. Sub-dollar blended pricing from open-weight challengers has captured disproportionate token volume, while premium closed models dominate dollar spend despite lower token share. Inference volume across the tracked market grew roughly 11.1x year over year. This is the interpretive core of the wave — not a temporary dislocation but a durable structure that should change how developers reason about model selection.
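The mechanics of the split are simple: dollar share is token share weighted by blended price. The sketch below shows how a small token share can become a large dollar share. The token shares come from the figures above (with the remainder lumped into one bucket), but the blended prices are HYPOTHETICAL placeholders chosen only to illustrate the arithmetic; they are not measured OpenRouter data.

```python
def dollar_shares(token_shares, blended_usd_per_mtok):
    """Convert token-volume shares into spend shares using a blended price per lane."""
    spend = {lane: token_shares[lane] * blended_usd_per_mtok[lane] for lane in token_shares}
    total = sum(spend.values())
    return {lane: amount / total for lane, amount in spend.items()}


# Token shares quoted in the text; "everyone_else" is simply the remainder.
TOKEN_SHARES = {
    "chinese_open_weight": 0.45,
    "anthropic": 0.123,
    "everyone_else": 0.427,
}

# HYPOTHETICAL blended prices (USD per million tokens), for illustration only.
ASSUMED_BLENDED_PRICE = {
    "chinese_open_weight": 0.50,
    "anthropic": 10.00,
    "everyone_else": 2.00,
}

shares = dollar_shares(TOKEN_SHARES, ASSUMED_BLENDED_PRICE)
for lane, share in shares.items():
    print(f"{lane}: {TOKEN_SHARES[lane]:.1%} of tokens, {share:.1%} of dollars")
```

Under these made-up prices, the 12.3% token share works out to a bit over half of spend while the 45% token share falls below 10%. The exact figures depend entirely on the assumed prices; only the direction of the effect is the point.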
Figure: The dollar-vs-token split · OpenRouter traffic, mid-2026. Source: CodeSOTA market trends; TechTimes, May 2026.
Projecting forward, the bifurcation looks structural rather than cyclical. If sub-dollar long-context models keep arriving every few days, as the ten-day wave suggests they will, the volume lane will keep commoditizing while the dollar lane consolidates around the handful of frontier models worth a premium for the hardest work. The strategic move for most teams is to stop treating "which model" as a single decision and start treating it as a routing policy that differs by task class. That is exactly the multi-model thesis behind OpenRouter Fusion, the synthesis layer that makes multi-model routing practical.
08 · How To Route
A practical routing posture for the June wave.
The wave does not crown a single winner; it rewards deliberate routing. Below is a workload-by-workload posture grounded only in the evidence above — independent benchmarks where they exist, vendor claims marked as such, and price read live from OpenRouter.
Hard, high-stakes tasks
Independent benchmarks put Opus 4.8 (Tier A) and the GPT-5.5 / 5.4 line at the top of the wave. When a wrong answer is expensive, the premium tier earns its dollar share. Verify on your own prompts before committing defaults.
High-volume long-context work
MiniMax M3 at $0.30/M promo offers 1M context and strong coding for a fraction of frontier cost. Sound for high-volume, lower-stakes workloads — but it is Tier B on independent testing, so keep a quality fallback for hard cases.
Image, video, GUI tasks
Step 3.7 Flash and Qwen3.7-Plus both add native vision at or below $0.40 input ($0.20 and $0.40 respectively). Qwen3.7-Plus is closed; Step 3.7 Flash ships open weights. Choose on the open-vs-closed axis your deployment requires, not on headline price alone.
Open-weight autonomous agents
NVIDIA Nemotron 3 Ultra ships open weights with an explicitly agentic training focus and 1M context. OpenRouter pricing was unconfirmed at launch — bracket cost via Nemotron 3 Super and confirm before production.
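The four postures above can be written down as a small routing table that a team maintains alongside its evals. This is a minimal sketch under stated assumptions: the task-class keys, the `Route` type, and `choose_model` are hypothetical names; the model strings are display names rather than API identifiers; and using Opus 4.8 as the quality fallback for hard cases is one reasonable reading of the guidance, not a prescribed default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    primary: str
    fallback: Optional[str]  # escalation target for hard cases, if any
    caveat: str


# Hypothetical task-class labels mapped to the postures described in this section.
ROUTING_POLICY = {
    "high_stakes": Route("Claude Opus 4.8", None, "Tier A on independent testing; verify on your own prompts"),
    "high_volume_long_context": Route("MiniMax M3", "Claude Opus 4.8", "Tier B on independent testing; promo price may expire"),
    "vision_open_weights": Route("Step 3.7 Flash", None, "open weights; 256K context"),
    "vision_closed_ok": Route("Qwen3.7-Plus", None, "closed / proprietary; 1M context"),
    "open_weight_agents": Route("Nemotron 3 Ultra", None, "OpenRouter price unconfirmed at launch"),
}


def choose_model(task_class: str, hard_case: bool = False) -> str:
    """Return the model for a task class, escalating to the fallback on hard cases."""
    try:
        route = ROUTING_POLICY[task_class]
    except KeyError:
        raise ValueError(f"no route defined for task class {task_class!r}") from None
    if hard_case and route.fallback is not None:
        return route.fallback
    return route.primary


print(choose_model("high_volume_long_context"))                  # MiniMax M3
print(choose_model("high_volume_long_context", hard_case=True))  # Claude Opus 4.8
```

Keeping the policy as plain data makes the re-check cheap: when a price or benchmark changes, the edit is one line in the table rather than a change scattered through application code.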
For most agencies and engineering teams, the starting point is not a switch but a measurement: benchmark two or three candidates on your own prompts, measure token spend and latency, and decide per workload. The wave makes that discipline more valuable, not less, because the cheapest capable option changes month to month. Treat the price ladder as a menu and your routing policy as the thing you actually maintain. Sibling references for the individual models (Step 3.7 Flash, Qwen3.7-Plus, and Nemotron 3 Super) carry the per-model detail this roundup keeps brief.
09 · Conclusion
The week the price floor moved.
Model selection is now a routing policy, not a single decision.
Five major models in ten days is not noise. Read together, the OpenRouter wave of May 27 to June 4 shows a market where frontier-aimed capability arrives every other day, where open-weight challengers ship alongside closed-frontier releases rather than behind them, and where every price tier now carries a credible million-token-context option.
The honest framing keeps two facts in view at once. MiniMax M3 is genuinely cheap and genuinely capable at $0.30 per million input tokens — and it is Tier B on independent testing, not a drop-in replacement for the Tier A frontier. The cheapest capable model and the highest-quality model are increasingly different models, and the gap between them is now a routing decision rather than a budget constraint.
The durable signal is the split itself: a commodity lane of high-volume open-weight inference and a premium lane of low-volume frontier work, both routed through the same platform. The teams that win the next year will not pick a model; they will maintain a policy — matching task class to the cheapest model that clears the quality bar, and re-checking that match as the price floor keeps moving.