Frontier Model Release Velocity Index 2026 Q2 Report
The Frontier Model Release Velocity Index tracks new-model launch rates per provider (OpenAI, Anthropic, Google, Alibaba, Zhipu), with trajectory data for Q2 2026.
Key Takeaways
The frontier model release rate doubled in Q1 2026 versus Q4 2025. If that pace holds through Q2, agencies are procuring on a 4-week cycle instead of a 6-month one. The Frontier Model Release Velocity Index (FMRVI) is Digital Applied's original framework for measuring what that actually means for planning, budgeting, and client delivery.
Between January 23 and April 2, 2026, labs shipped at least twelve substantive frontier models: Alibaba alone released seven Qwen variants, Xiaomi shipped three MiMo V2 models, MiniMax put out two M-series updates, Anthropic released Claude Sonnet 4.6, and NVIDIA pushed Nemotron 3 Super 120B to open weights. The practical result is that the top-ranked model on OpenRouter changed twice inside a single quarter, and the fastest-growing model by usage (MiMo V2 Pro) did not exist before mid-March.
FMRVI in one line: an index that counts substantive public frontier releases per week per lab and tracks how that compresses the procurement cycle agencies can reasonably run.
What FMRVI Measures
FMRVI is a rolling index. It counts substantive frontier releases per week across a tracked roster of labs and reports a weekly average and quarterly total. The point is not to pick a winner. The point is to tell agencies how often the ground is moving so they can size their evaluation budget and cycle accordingly.
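The counting arithmetic is simple enough to sketch. The snippet below is our own illustrative encoding, not a published FMRVI implementation; the release dates are a subset of the Q1 timeline used purely for demonstration.

```python
# Illustrative FMRVI arithmetic: quarterly total plus rolling weekly average.
from datetime import date

def fmrvi(release_dates: list[date], quarter_start: date, quarter_end: date) -> dict:
    """Count substantive releases inside the quarter and average them per week."""
    in_quarter = [d for d in release_dates if quarter_start <= d <= quarter_end]
    weeks = (quarter_end - quarter_start).days / 7  # fractional weeks in the window
    return {
        "quarterly_total": len(in_quarter),
        "weekly_average": round(len(in_quarter) / weeks, 2),
    }

# Four Q1 dates from the timeline below, for illustration only.
q1 = fmrvi(
    [date(2026, 1, 23), date(2026, 2, 3), date(2026, 2, 12), date(2026, 2, 17)],
    quarter_start=date(2026, 1, 1),
    quarter_end=date(2026, 3, 31),
)
```

The full index simply runs this over the complete tracked-lab roster each week.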
Substantive Release Rules
A model counts as a substantive release if it clears at least one of four thresholds:
- Benchmark leadership: takes #1 or ties #1 on a widely used benchmark (SWE-bench family, MMMLU, Humanity's Last Exam, LMSYS Arena) for its capability class.
- Pricing-tier shift: establishes a new cost floor for a given capability band, for example Qwen 3.5 Flash landing at $0.065 / $0.26 per 1M for 1M-context use.
- Modality expansion: adds a meaningful new modality or context window tier (omnimodal, 1M context, vision-only variant).
- Production safeguard change: ships a new alignment, cyber, or agentic-autonomy capability that changes what the model can be deployed for at enterprise scale.
Incremental version bumps, UI changes, and product-layer launches (Claude for Excel, Computer Use on macOS, Channels) do not count as model releases even when they ship the same week. We track them separately on a product-velocity ledger because they affect client onboarding costs rather than procurement math.
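The four thresholds plus the product-layer exclusion reduce to a small predicate. The field names below are our own labels for the rules above, not an official schema:

```python
# Illustrative encoding of the substantive-release rules.
from dataclasses import dataclass

@dataclass
class ReleaseSignals:
    benchmark_lead: bool = False      # #1 or tied #1 on a tracked benchmark
    new_price_floor: bool = False     # new cost floor for its capability band
    modality_expansion: bool = False  # new modality or context-window tier
    safeguard_change: bool = False    # deployment-relevant safety capability
    product_layer_only: bool = False  # UI/product launch, not a model release

def is_substantive(s: ReleaseSignals) -> bool:
    """Counts if it is a model release clearing at least one of the four thresholds."""
    if s.product_layer_only:
        return False  # tracked on the product-velocity ledger instead
    return any([s.benchmark_lead, s.new_price_floor,
                s.modality_expansion, s.safeguard_change])
```

Borderline releases get every flag set conservatively before this check runs.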
We built FMRVI for our own client planning: agencies, marketplaces, and platform teams that run production AI workloads need a shared vocabulary for the release cadence without getting pulled into benchmark-of-the-week hype. The index is public so others can cite it, replicate the rules, and push back on our counts. We recount and re-publish every quarter.
Provider Cadence: Top 5 Fastest Shippers Q1 2026
The top five labs by Q1 2026 release count, ranked:
| Rank | Provider | Q1 Releases | Notable Models | Cadence Signal |
|---|---|---|---|---|
| 1 | Alibaba | 7 | Qwen 3 Max Thinking (Jan 23), Qwen 3 Coder Next (Feb 3), Qwen 3.5 Flash (Feb 24), Qwen 3.5 small series (Mar 2-3), Qwen 3.5-Omni (Mar 30), Qwen 3.6 Plus (Apr 2) | One substantive release every ~10 days |
| 2 | Xiaomi | 3 | MiMo V2 Flash (Dec 2025), MiMo V2 Pro (Mar 18), MiMo V2 Omni (Mar 18) | Zero to 21.1% OpenRouter share in four months |
| 3 | MiniMax | 2 | MiniMax M2.5 (Feb 12), MiniMax M2.7 (Mar 18) | Sub-quarter major-version pace on M series |
| 4 | Anthropic | 1 | Claude Sonnet 4.6 (Feb 17) | One release, but near-Opus quality at 1/5 price |
| 5 | NVIDIA | 1 | Nemotron 3 Super 120B (Mar 10-11) | Open-weight 120B/12B-active, 60.47% SWE-Bench Verified |
Two patterns stand out. First, Chinese labs dominate the cadence column: Alibaba, Xiaomi, and MiniMax together account for 12 of the top-5 table's 14 Q1 releases. Second, Anthropic and OpenAI appear lean on this axis but compensate with product-layer velocity. FMRVI is a model-cadence index, so those do not count here.
Need help mapping release cadence to client workloads? Cadence plus capability plus cost is the right procurement triangle. Explore our AI Digital Transformation service to build a tiered model portfolio instead of a single vendor bet.
Q1 2026 Release Timeline
A month-by-month view of the substantive frontier releases captured by FMRVI, plus three late-2025 entries kept as anchors for cadence measurement. All dates are public launch dates, not announce dates.
| Date | Provider | Model | Why It Qualifies |
|---|---|---|---|
| Oct 23, 2025 | MiniMax | MiniMax M2 (baseline) | Anchor release for the M-series cadence measurement |
| Dec 2025 | DeepSeek | DeepSeek V3.2 | 685B, IMO gold medal reasoning result |
| Dec 2025 | Xiaomi | MiMo V2 Flash | $0.09 / $0.29 per 1M, 262K context, open-source claim |
| Jan 23, 2026 | Alibaba | Qwen 3 Max Thinking | Reasoning variant, $0.78 / $3.90 per 1M, 262K context |
| Feb 2, 2026 | StepFun | Step 3.5 Flash | 196B MoE (11B active), free tier, high tool-call share |
| Feb 3, 2026 | Alibaba | Qwen 3 Coder Next | Coding-specific, $0.12 / $0.75 per 1M, 256K context |
| Feb 12, 2026 | MiniMax | MiniMax M2.5 | 80.2% SWE-Bench, near-frontier coding at open pricing |
| Feb 17, 2026 | Anthropic | Claude Sonnet 4.6 | Near-Opus quality at $3 / $15 per 1M |
| Feb 24, 2026 | Alibaba | Qwen 3.5 Flash | $0.065 / $0.26 per 1M at 1M context, pricing floor |
| Mar 2-3, 2026 | Alibaba | Qwen 3.5 small series | 0.8B-9B variants, on-device deployment class |
| Mar 10-11, 2026 | NVIDIA | Nemotron 3 Super 120B | 120B/12B active, 60.47% SWE-Bench Verified, open source |
| Mar 18, 2026 | Xiaomi | MiMo V2 Pro | 1T+ params, 42B active, $1 / $3 per 1M, #1 OpenRouter |
| Mar 18, 2026 | Xiaomi | MiMo V2 Omni | Omni-modal (image/video/audio), 262K unified context |
| Mar 18, 2026 | MiniMax | MiniMax M2.7 | Self-evolving, 10B active, 56.22% SWE-Pro, ~50x cheaper |
| Mar 30, 2026 | Alibaba | Qwen 3.5-Omni | Native omnimodal, 256K context, 113 languages |
| Apr 2, 2026 | Alibaba | Qwen 3.6 Plus | 1M context, 65K output, always-on CoT, function calling |
The density of mid-to-late March is the single most important pattern here. Three labs (NVIDIA, Xiaomi, and MiniMax) shipped four substantive releases in a nine-day window (March 10 through March 18), with Alibaba following twice inside the next two weeks. That is the point where "frontier release" stopped being a monthly headline and became a weekly one.
For a Chinese-provider-specific cut of this data, see our Chinese AI models market share report for Q2 2026.
Release-Cycle Compression
From 2023 into mid-2025, frontier model releases followed a roughly 6-month cadence per lab. OpenAI shipped GPT-4 in March 2023 and the next major capability release (GPT-4 Turbo) in November. Anthropic followed a similar pattern. Agencies could plan model evaluations on a half-year horizon with confidence that the primary vendor choice would hold.
That pattern broke during H2 2025 and collapsed outright in Q1 2026. Four forces drove the compression:
- Alibaba, Xiaomi, MiniMax, DeepSeek, and Z.ai are all running roughly monthly release cadences on their flagship lines. The competitive floor is now 4 to 6 weeks, not 6 months.
- MiniMax M2.7, at roughly 50x cheaper than Opus for comparable agentic coding tasks, forced every lab to refresh its pricing tiers. A price cut on a capability band effectively obsoletes older releases, so new releases follow faster.
- OpenAI's GPT-5.4 launch on March 5, 2026, with native computer use and 1M context, triggered response releases across Anthropic, Google, and the Chinese labs within three weeks.
- Xiaomi's December-to-March push from new entrant to #1 OpenRouter model demonstrated that hardware-integrated distribution can bypass traditional onboarding friction, creating new pressure to match its cadence.
For agencies, the practical trajectory looks like this: 6-month cycle in 2024, 3-month cycle through 2025, 4-week cycle as of Q2 2026. There is no floor yet. Daily release cadence is not coming (training cycles remain the binding constraint), but sub-4-week cadence is entirely plausible by Q4 2026.
Procurement Implications
A 4-week cycle changes the economics of agency AI procurement in concrete ways. Three shifts matter most.
1. Evaluation Cost Becomes a Structural Line Item
At 6-month cadence, model evaluation was an occasional project. At 4-week cadence, it becomes a continuous operation: you need a standing eval pipeline running against every substantive release, or you forfeit the ability to respond to pricing and capability shifts in time to matter. Budget 3 to 5 percent of total AI spend for eval infrastructure and the engineering time to run it.
2. Vendor Contracts Need Swap-In Clauses
Long-term vendor lock on a specific model is now a liability. When negotiating with cloud providers, API aggregators, or enterprise model vendors, insist on swap-in clauses that let you rotate to materially better options within 30 to 60 days without contract penalty. OpenRouter-style aggregation helps but is not a complete substitute for negotiated flexibility.
3. Spend Reserves Beat Long Forecasts
Hold 10 to 15 percent of annual AI spend as an uncommitted reserve. When a release like MiMo V2 Pro or MiniMax M2.7 resets the efficient frontier mid-quarter, the agencies that can move budget in two weeks outperform those on rigid annual plans.
For deeper data on which models currently lead the price-capability tradeoff, our efficient-frontier analysis for Q2 2026 maps the current Pareto frontier across coding, reasoning, and agentic benchmarks. For a pricing-only view, see our LLM API pricing index for Q2 2026.
Need the analytics layer to support this? Tracking eval cost, token spend, and per-workload quality drift at 4-week cadence is a data problem before it is a procurement problem. Explore our Analytics & Insights service to build the dashboards and alerting that keep portfolio decisions grounded.
FMRVI Projections for Q2
Projecting the rest of Q2 2026 is a matter of combining the known backlog with observed cadence. Our downside, base, and upside cases:
| Scenario | Q2 Release Count | Weekly Average | Key Drivers |
|---|---|---|---|
| Downside | 10-12 | ~1 per week | DeepSeek V4 slips, Chinese cadence softens after Apr push |
| Base case | 14-18 | ~1.3 per week | DeepSeek V4 ships, Anthropic Mythos release, Gemini 3.5 line |
| Upside | 20+ | ~1.7+ per week | All Chinese labs ship major, Meta re-enters frontier, new omnimodal wave |
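The weekly-average column is just the release-count range spread over a 13-week quarter. A back-of-envelope check (our own arithmetic, with the scenario counts taken from the table above):

```python
# Convert a scenario's release-count range into a weekly-average range.
WEEKS_IN_QUARTER = 13  # a standard 91-day quarter

def weekly_range(lo: int, hi: int) -> tuple[float, float]:
    """Low and high weekly averages implied by a quarterly release-count range."""
    return (round(lo / WEEKS_IN_QUARTER, 2), round(hi / WEEKS_IN_QUARTER, 2))

downside = weekly_range(10, 12)   # downside scenario count range
base_case = weekly_range(14, 18)  # base-case count range
```

Running the same arithmetic against any revised count keeps the table honest mid-quarter.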
Seasonality Signals
Two seasonal patterns show up in the historical FMRVI data. First, releases cluster around major conference dates (Google I/O in May, OpenAI DevDay timing). Second, release density drops in the final two weeks of each quarter as labs hold back to influence quarterly earnings commentary. For Q2 specifically, expect the hottest weeks to fall between mid-May and mid-June.
Named Candidates on the Backlog
- DeepSeek V4: expected April, ~1T params, 1M context, reportedly Huawei-trained. The single most consequential pending release for open-weight benchmarking.
- Anthropic Mythos-class release: Project Glasswing framework implies a Mythos-tier production model (plausibly Claude Opus 4.7) lands in Q2 with cyber safeguards shipped.
- Gemini 3.5 line: Google's counter to GPT-5.4 native computer use is overdue; a Flash/Pro refresh pair is likely before Google I/O.
- Qwen 3.6 full line: Plus shipped April 2, with Max and Coder variants normally following within 4 weeks.
How to Build an Eval Pipeline That Keeps Up
At 4-week cadence, ad-hoc evaluation fails: by the time a senior engineer finishes benchmarking one release, two more have shipped. The eval pipeline we run for Digital Applied clients follows a specific shape.
1. A Canonical Task Set
Maintain a versioned corpus of 30 to 50 representative client tasks covering the workloads your agency actually bills for: content briefs, SEO tasks, code reviews, agentic customer workflows, data extraction, dashboard generation. Each task has a rubric, a reference answer, and a token budget. This is the single largest investment in the pipeline and pays back every time a new model ships.
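One workable shape for a task record, sketched below. The field names and the sample task are illustrative, not a standard schema; the structural point is that every task carries its rubric, reference answer, and token budget so scoring needs no human in the loop.

```python
# Illustrative structure for one entry in a versioned canonical eval corpus.
from dataclasses import dataclass

@dataclass(frozen=True)
class EvalTask:
    task_id: str           # stable ID so scores are comparable across model runs
    workload: str          # e.g. "seo", "code_review", "data_extraction"
    prompt: str            # the task as a client would actually pose it
    rubric: str            # what a passing answer must contain
    reference_answer: str  # gold answer for automated scoring
    token_budget: int      # max tokens a production run may spend on this task

CORPUS_VERSION = "2026.04"  # version the corpus so score deltas are attributable

corpus = [
    EvalTask("seo-001", "seo",
             "Write a meta description for a local bakery's landing page.",
             "Under 160 characters, includes the primary keyword",
             "Fresh-baked sourdough and pastries daily in downtown Austin.",
             token_budget=400),
]
```

Freezing the records (`frozen=True`) guards against a run silently mutating the corpus mid-evaluation.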
2. Automated Trigger on Release Events
Subscribe to model-announcement feeds from the top ten labs, OpenRouter's new-model webhook, and Hugging Face trending releases. Any substantive release triggers a run of the canonical task set within 24 hours. No human in the loop for the initial score; humans only review the delta.
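The trigger itself is a thin handler. The payload shape and event names below are assumptions for illustration (the actual OpenRouter and Hugging Face feeds will differ); what matters is the control flow, which puts no human gate before the initial score:

```python
# Hypothetical release-event handler: any new-model event starts a full
# canonical-eval run; everything else is ignored.
def handle_release_event(payload: dict, run_evals) -> bool:
    """Kick off the canonical eval run for a new-model event within the 24h SLA."""
    if payload.get("event") != "model.released":
        return False  # product-layer or other events don't trigger evals
    run_evals(model_id=payload["model_id"])  # no human in the loop here
    return True
```

In production `run_evals` would enqueue a job against the versioned corpus; humans only see the resulting score delta.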
3. Three-Dimensional Scoring
Rate each model on quality, cost, and latency per task, then aggregate into a per-workload Pareto chart. A model that beats the current default on quality at higher cost may still lose for bulk workloads and win for high-stakes work. One score is never enough.
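The Pareto screen behind that chart is a standard dominance check: a model is dominated if some other model is at least as good on every axis and strictly better on one. A minimal sketch with made-up scores:

```python
# Per-workload Pareto screen over quality (higher better), cost and
# latency (lower better). Scores are illustrative.
def dominates(a: dict, b: dict) -> bool:
    """True if model a is no worse than b on every axis and strictly better on one."""
    no_worse = (a["quality"] >= b["quality"] and a["cost"] <= b["cost"]
                and a["latency"] <= b["latency"])
    strictly = (a["quality"] > b["quality"] or a["cost"] < b["cost"]
                or a["latency"] < b["latency"])
    return no_worse and strictly

def pareto_frontier(models: list[dict]) -> list[str]:
    """Names of models not dominated by any other model."""
    return [m["name"] for m in models
            if not any(dominates(o, m) for o in models if o is not m)]

models = [
    {"name": "A", "quality": 0.9, "cost": 3.0, "latency": 2.0},  # premium tier
    {"name": "B", "quality": 0.7, "cost": 1.0, "latency": 1.0},  # bulk tier
    {"name": "C", "quality": 0.6, "cost": 2.0, "latency": 3.0},  # dominated by B
]
frontier = pareto_frontier(models)
```

Both A and B survive here, which is exactly the "one score is never enough" point: A wins high-stakes work, B wins bulk workloads.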
4. Production Shadow Mode
When a candidate clears the canonical evals, shadow-deploy it against 1 to 5 percent of live traffic for 48 hours before making any routing decisions. Production behavior often diverges from benchmark behavior in ways that only real traffic exposes.
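A deterministic hash on the request ID is one simple way to carve out the shadow slice, so the same request always lands in the same bucket across retries. The 2% rate and the function name are illustrative:

```python
# Stable hash-based shadow sampling for a candidate model.
import hashlib

def in_shadow_slice(request_id: str, rate: float = 0.02) -> bool:
    """Deterministic per-request decision: same ID, same bucket, every time."""
    h = int(hashlib.sha256(request_id.encode()).hexdigest(), 16)
    return (h % 10_000) / 10_000 < rate
```

Requests in the slice get a second, logged-but-unreturned call to the candidate; after 48 hours you compare the logged outputs against the incumbent's.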
The eval output needs to feed into the same systems that quote client work. If your agency bills per AI-assisted deliverable, the per-workload cost number from the pipeline should update project pricing automatically. Connecting the eval pipeline to our CRM automation tooling closes that loop.
FMRVI vs Spend and Volume
Does release velocity correlate with adoption? The short answer is not in the way you might expect. FMRVI rank, intelligence rank, and usage rank diverge sharply.
Taking April 2026 OpenRouter data at face value: MiMo V2 Pro is #1 in weekly tokens (4.79T) and #1 in coding tokens (2.05T) but #10 in the Artificial Analysis Intelligence Index. GPT-5.4 is #1 in intelligence but far from #1 in usage. Claude Opus 4.6 is #6-#7 in usage but a clear leader in production agentic coding. The three rankings measure different things, and an agency optimizing on the wrong one picks the wrong model.
What FMRVI Correlates With
- OpenRouter share gains: high-cadence labs (Xiaomi, Alibaba) are gaining share fastest. Correlation is strong because rapid iteration lets them match price and capability moves quickly.
- Enterprise adoption: release cadence correlated negatively with enterprise share in Q1 2026. Enterprise buyers want stability, and a model line that ships a successor every 6 weeks looks risky.
- Coding workload share: modest positive correlation. Coding-tool vendors (Cursor, Windsurf, Claude Code) move fast and benefit from new options.
- Intelligence-benchmark rank: no meaningful correlation. Many high-cadence labs skip expensive benchmarking, and many high-intelligence models release slowly.
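The correlations above are ordinary rank correlations. For readers who want to replicate them against their own data, here is Spearman's rho for tie-free rankings; the two rank lists are made-up placeholders, not our measured data:

```python
# Spearman rank correlation for two rankings of the same labs, no ties:
# rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))
def spearman(rank_a: list[int], rank_b: list[int]) -> float:
    """Rho in [-1, 1]: 1 = identical rankings, -1 = fully reversed."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Feeding it FMRVI rank against usage rank, intelligence rank, and enterprise-share rank reproduces the pattern of strong, absent, and negative correlations described above.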
For a granular view of the current OpenRouter rankings driving the adoption side of this picture, see our OpenRouter rankings for April 2026. For a deeper look at how MiniMax M2.7 reshaped the agentic coding tier in particular, see our MiniMax M2.7 agentic coding release guide.
What FMRVI Does Not Capture
Every index has blind spots, and FMRVI is no exception. Four limitations to understand before citing the number.
Index caveat: FMRVI is a cadence index, not a capability index. A quarter with 20 releases is not inherently better than a quarter with 10 releases if the 10 were more transformative.
1. Closed-Source Partner Pilots
Labs routinely ship pre-GA partner builds to selected enterprise customers weeks before public release. Those pilots influence the market (Cursor Composer 2 was built on Kimi K2.5 before K2.5 was widely available, for example) but do not count in FMRVI until they hit public availability.
2. In-Product Model Swaps
ChatGPT, Gemini, Copilot, and Claude consumer products frequently route requests to different models behind the scenes. A silent backend swap that moves all ChatGPT traffic to GPT-5.4 is commercially enormous but is not a release event.
3. Soft or Incremental Updates
Gemini 3.1 Flash-Lite (Mar 3) and Grok 4.20 beta (Feb 17) are real releases but borderline against our substantive threshold. We apply the four rules conservatively and include such releases only when one clearly triggers.
4. Region-Locked Launches
Some releases are initially available in one region (China-first, EU-first) before rolling out globally. FMRVI uses the first-region release date, but that can understate the effective cadence for any particular market.
Conclusion
The Frontier Model Release Velocity Index makes one uncomfortable fact concrete: the agencies that were planning AI procurement on a 6-month horizon in 2025 are now shipping into a 4-week market. Our Q1 2026 count of 12+ substantive frontier releases is double the Q4 2025 figure, and our base case for Q2 is 14 to 18 more.
The response is not to chase every release. It is to build the evaluation, procurement, and portfolio infrastructure that can absorb new releases without blowing up client delivery plans. A canonical eval corpus, automated release triggers, three-dimensional scoring, and shadow deployment are the minimum. For related analysis on the market structure driving these releases, our open-weight vs closed-source AI models analysis for Q2 2026 covers the strategic split that makes this cadence sustainable.
Build a Procurement Stack That Keeps Up
Agencies that treat AI model selection as a quarterly decision are losing ground. We help clients build the eval pipelines, vendor contracts, and portfolio management needed to ship at frontier-model cadence.