Open-weight models in H1 2026 have stopped being a single story. Three families — DeepSeek, Qwen, Llama — diverged sharply through May 16. Qwen ran the most active release cadence with the Qwen 3.5 family in February and Qwen 3.6 in April. DeepSeek shipped a single architectural reset, V4 Preview, on April 24. Meta shipped no new open-weight Llama by May 16 — Scout and Maverick continued from April 2025, Behemoth remained in training, and Meta's frontier attention shifted to the closed Muse line with Muse Spark.
What changed isn't a uniform cadence. It's three vendor postures. DeepSeek closed out V3 in late 2025 (V3.2 final on December 1, 2025) and pivoted to a hybrid-attention reset with V4 Preview on April 24, 2026. Alibaba/Qwen shipped Qwen 3.5 on February 16, a small-models drop on March 2, and Qwen 3.6 in April, with the 35B-A3B open-weight checkpoint on April 16 and the 27B checkpoint following on April 22. Meta's last open-weight Llama release was Llama 4 Scout and Maverick on April 5, 2025, with Behemoth still unreleased as of May 16, 2026.
This retrospective compiles the verified H1 2026 release inventory to date, the benchmark closure picture against closed frontier, the four trend lines defining the half, and an explicit forecast for H2. Every release date below is anchored to a vendor changelog, Hugging Face card, or first-party announcement; the open-versus-closed gap is real and we don't paper over it.
- Three families, three very different postures. Qwen ran the most active cadence (Qwen 3.5 family Feb 16, size-variant and small-model drops Feb 24 and Mar 2, Qwen 3.6 open-weight checkpoints on Apr 16 and Apr 22). DeepSeek shipped one architectural reset: V4 Preview on April 24. Meta released no new open-weight Llama through May 16, 2026; Scout and Maverick continued from April 2025 and Behemoth remained in training.
- DeepSeek V4 was the architectural event of the half. V4-Pro at 1.6T total / 49B active uses roughly 27% of V3.2's single-token inference FLOPs and 10% of its KV cache at 1M context, the result of a hybrid attention stack interleaving Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA). V4-Flash at 284B total / 13B active sits beside it for cost-sensitive deployments.
- Benchmark closure on code and reasoning; gaps persist elsewhere. On LiveCodeBench, Codeforces, and Putnam-style formal reasoning, the strongest open-weight model is now within roughly 5% of the strongest closed-frontier model, and in several cases ahead. General-knowledge benchmarks (MMLU-Pro), graduate-level science (GPQA Diamond), and very-long-context retrieval (MRCR 1M) still trail by 5–18 percentage points.
- Meta's quiet half is itself a signal. The Llama 4 Community License has been in force since the April 2025 Scout/Maverick release. Through May 16, 2026, Meta shipped no new open-weight Llama, launched the closed Muse Spark line, and left Behemoth in training. A new open-weight release from Meta in H2 is now a low-probability catch-up scenario rather than the central case.
- Routing, not single-model selection, is the production default. By May the buyer question shifted from 'is open-weight good enough' to 'which open-weight per workload, which hosting stack, which sovereign-deployment shape.' DeepSeek V4 leads on code-heavy and long-context workloads; the Qwen 3.5 / 3.6 family leads on cadence-driven and on-device workloads; Llama 4 (Apr 2025) remains the integration-bound default until Meta ships again.
01 — Why Open-Weight in H1
The half open-weight became enterprise-grade.
The story of open-weight in H1 2026 is less about a uniform cadence and more about three vendors running three different plays. Qwen ran a high-cadence variant strategy. DeepSeek bet on a single architectural reset. Meta paused. Underneath those plays, two cross-cutting shifts compounded: benchmark closure on code and reasoning, and hosting-cost reductions on vLLM, SGLang, and Cerebras. Together they flipped the total-cost-of-ownership math for code-automation, long-context retrieval, and sovereign-deployment workloads.
What enterprise buyers asked about in January was "can we use open-weight for anything serious?" What they asked about in May was "which open-weight family for which workload, and which hosting partner." That is the entire shift, summarized in one sentence. The rest of this retrospective is the data behind it.
A note on methodology. The release inventory counts first-party production releases with a Hugging Face model card or vendor-published changelog entry between January 1 and the second week of May 2026 (DeepSeek changelog, Qwen release history, Llama 4 Community License). Pre-H1-2026 releases that continued shipping in production (Llama 4 Scout/Maverick from April 2025, DeepSeek V3.2 final from December 1, 2025) are noted but not counted toward the H1 total. Benchmark closure compares the strongest mode of each family's flagship against the strongest publicly evaluated mode of closed-frontier models. We treat all numbers as directionally accurate rather than precise to the basis point.
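The counting rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the window closes on May 16, 2026 (the cutoff used elsewhere in this retrospective) and that V4-Pro and V4-Flash count as one release event because they shipped together. The inventory is transcribed from the dates in this piece; the function and constant names are hypothetical.

```python
from datetime import date

# Release events transcribed from this retrospective (one entry per first-party event).
# V4-Pro and V4-Flash shipped together, so they count as a single DeepSeek event.
RELEASES = [
    ("DeepSeek", "V3.2 final", date(2025, 12, 1)),
    ("DeepSeek", "V4 Preview (V4-Pro + V4-Flash)", date(2026, 4, 24)),
    ("Qwen", "Qwen 3.5 launch", date(2026, 2, 16)),
    ("Qwen", "Qwen 3.5 size variants", date(2026, 2, 24)),
    ("Qwen", "Qwen 3.5 small models (0.8B to 9B)", date(2026, 3, 2)),
    ("Qwen", "Qwen 3.6-35B-A3B", date(2026, 4, 16)),
    ("Qwen", "Qwen 3.6-27B", date(2026, 4, 22)),
    ("Meta", "Llama 4 Scout + Maverick", date(2025, 4, 5)),
]

# Assumed H1-to-date window, inclusive on both ends.
WINDOW_START = date(2026, 1, 1)
WINDOW_END = date(2026, 5, 16)


def h1_release_counts(releases, start=WINDOW_START, end=WINDOW_END):
    """Count in-window release events per family.

    Pre-window releases still register the family (with a count of zero),
    so a quiet vendor shows up as 0 instead of disappearing from the table.
    """
    counts = {}
    for family, _name, released in releases:
        counts.setdefault(family, 0)
        if start <= released <= end:
            counts[family] += 1
    return counts


print(h1_release_counts(RELEASES))
```

Run as written, it reports one event for DeepSeek, five for Qwen, and zero for Meta, consistent with the "four-plus" Qwen count once the April 22 follow-up is included.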
02 — DeepSeek V4 Reset
One H1 release, one architectural reset.
DeepSeek shipped exactly one production release in H1 2026 — V4 Preview on April 24, with V4-Pro and V4-Flash both available simultaneously (DeepSeek V4 announcement). The H1 cadence is the opposite of Qwen's: a single, architecturally heavy release rather than a stream of variants. The V3 line closed out before H1 — V3.2-Exp shipped September 29, 2025 and V3.2 final on December 1, 2025 — and carried most production traffic through Q1 2026 while teams evaluated V4 Preview.
The signature contribution is efficiency at long context. Per the V4-Pro model card, the 1.6T-total / 49B-active model uses roughly 27% of V3.2's single-token inference FLOPs and 10% of its KV cache at 1M-token context — the result of a hybrid attention stack that interleaves Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA) across layers. V4-Flash sits beside it at 284B total / 13B active for cost-sensitive deployments. For the full V4 architecture and benchmark breakdown, see our DeepSeek V4 Preview launch analysis.
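To make the total/active split concrete, here is a minimal Python sketch. It assumes the common rule of thumb of about 2 FLOPs per active parameter per generated token, which is an approximation and not DeepSeek's own accounting. The point it illustrates: MoE compute tracks active rather than total parameters, and this weight term is constant in context length, so the context-dependent attention term is what a hybrid stack like CSA + HCA targets at 1M tokens. Parameter counts come from this retrospective; the helper names are hypothetical.

```python
# Total and active parameter counts (in billions) as stated in this retrospective.
MOE_MODELS = {
    "DeepSeek V4-Pro": (1600, 49),
    "DeepSeek V4-Flash": (284, 13),
    "Qwen 3.6-35B-A3B": (35, 3),
}


def active_fraction(total_b, active_b):
    """Share of weights the router activates for each token."""
    return active_b / total_b


def approx_param_gflops_per_token(active_b):
    """Rule-of-thumb forward cost: ~2 FLOPs per active parameter per token.

    Billions of parameters times 2 gives GFLOPs. This deliberately excludes
    attention-score compute, which grows with context length.
    """
    return 2 * active_b


for name, (total_b, active_b) in MOE_MODELS.items():
    print(f"{name}: {active_fraction(total_b, active_b):.1%} of weights active, "
          f"~{approx_param_gflops_per_token(active_b)} GFLOPs/token from weights")
```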
V3.2 Final
DeepSeek Sparse Attention v1 production line. Carried most production traffic through Q1 2026 while teams evaluated V4. Not counted toward the H1 release total — included for context.
V4 Pro
Hybrid attention (CSA + HCA) plus Manifold-Constrained Hyper-Connections and the Muon optimizer. 27% of V3.2's single-token FLOPs and 10% of its KV cache at 1M context. The architectural reset.
V4 Flash
Cost-sensitive sibling to V4-Pro, shipped the same day. Same hybrid-attention stack at smaller scale — fits stricter latency and price-per-token envelopes while keeping the 1M-context window.
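In serving terms, the 10% KV-cache figure acts mostly as a concurrency lever. The back-of-envelope sketch below uses hypothetical memory figures, not measurements of either model, and assumes the KV cache is the binding constraint (it ignores paging, prefix sharing, and KV quantization). Only the 10% ratio comes from the model card as cited above.

```python
import math


def max_concurrent_sequences(kv_budget_gb, kv_per_sequence_gb):
    """How many full-length sequences fit in a fixed KV-cache memory budget."""
    return math.floor(kv_budget_gb / kv_per_sequence_gb)


# Hypothetical figures for illustration only; substitute measured values.
KV_BUDGET_GB = 400            # accelerator memory left for KV cache after weights
BASELINE_KV_PER_SEQ_GB = 100  # hypothetical V3.2-style cache for one 1M-token sequence
V4_KV_RATIO = 0.10            # "10% of V3.2's KV cache at 1M context"

baseline = max_concurrent_sequences(KV_BUDGET_GB, BASELINE_KV_PER_SEQ_GB)
v4 = max_concurrent_sequences(KV_BUDGET_GB, BASELINE_KV_PER_SEQ_GB * V4_KV_RATIO)
print(f"1M-token sequences in budget: baseline={baseline}, V4-style={v4}")
```

Under these assumptions the same memory holds roughly ten times as many 1M-token sequences, which is why the figure matters for long-document RAG economics as much as for raw context length.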
The cadence shape matters as much as the architecture. DeepSeek spent H1 in single-release mode: V3.2 final from December 2025 carried Q1 traffic, V4 Preview reset the architecture in late April, and no intermediate V3.x checkpoint shipped in the window. That restraint reads as a strategic choice. V4 Preview is materially harder to host than V3.2, and keeping the V3.2 production line stable let teams defer the migration to V4 until they were ready.
On the V4-specific numbers: V4-Pro-Max in Think Max mode hits 93.5 on LiveCodeBench (Pass@1), a 3206 Codeforces rating, and a proof-perfect 120/120 on Putnam-2025. It trails Gemini-3.1-Pro on MMLU-Pro and GPQA Diamond, and trails Claude Opus 4.6 on MRCR 1M. DeepSeek's own framing, "3 to 6 months behind absolute frontier," is the honest version and the one we recommend using when scoping eval work.
03 — Qwen 3.5 + 3.6 Cadence
Two minor versions, four distinct release events.
Alibaba's Qwen was the cadence story of H1 2026. Qwen 3.5 launched on February 16, 2026 as Qwen 3.5 (open weights) and Qwen 3.5-Plus (proprietary). Additional size variants followed on February 24, and a separate small-models family (0.8B / 2B / 4B / 9B) landed on March 2. Qwen 3.6 then arrived on April 16 with the 35B-A3B MoE checkpoint, followed by Qwen 3.6-27B on April 22; both are Apache-2.0 licensed and shipped with SGLang and vLLM deployment instructions out of the box.
What distinguishes the Qwen 3.5 / 3.6 cadence isn't any single checkpoint; it's the breadth. Where DeepSeek concentrated on one architectural reset, Qwen shipped a frontier-class MoE (3.6-35B-A3B), a flagship-class dense model (3.6-27B), and a full small-models lineup for on-device deployment within the same window. That structure simplifies the procurement conversation: pick the variant that fits the workload, with the same family-level licence and tooling behind every choice.
Feb 16 · Feb 24 · Mar 2 · Apr 16
Qwen 3.5 launch (Feb 16), size-variant drops (Feb 24), small-models family 0.8B–9B (Mar 2), Qwen 3.6-35B-A3B (Apr 16). A 27B Qwen 3.6 follow-up shipped Apr 22. Four-plus first-party release events in three months.
MoE + dense + small
Qwen 3.6-35B-A3B (MoE, agentic-coding focus) and Qwen 3.6-27B (flagship-class dense coding) anchor the top of the family; the 3.5 small-models line (0.8B / 2B / 4B / 9B) covers edge and on-device deployment.
Procurement-friendly
Apache 2.0 across the open-weight Qwen 3.5 / 3.6 line means standard enterprise procurement language usually fits without bespoke negotiation; redistribution, fine-tuning, and on-prem deployment are all permitted.
The Qwen 3.6-35B-A3B (MoE) release deserves a specific call-out. Positioned by the QwenLM team for "agentic coding power," it lands directly in the band where many enterprise code-automation teams have been running DeepSeek V3.2. Community evaluations through late April reported it as a cost-efficient alternative to DeepSeek V4-Pro for workloads where V4-Pro is over-provisioned. Benchmark on your own repositories before defaulting either way — the right family per workload is increasingly task-class-specific.
On the small-models side, the 0.8B–9B Qwen 3.5 line opened a new on-device deployment lane for open-weight in H1. Both Instruct and Base versions are available on Hugging Face and ModelScope, with the 0.8B and 2B variants targeting edge and IoT and the 4B and 9B variants targeting lightweight agent and reasoning workloads. The on-device deployment story is one of the more meaningful open-vs-closed differentiators that emerged in H1.
"Qwen turned open-weight from a one-flagship story into a per-workload variant tree under Apache 2.0 — that's the procurement-friendly form."— Our reading of enterprise adoption patterns, Q2 2026
04 — Meta's Quiet Half
No new Llama, an organisational reset instead.
Meta shipped no new open-weight Llama family through May 16, 2026. Llama 4 Scout and Maverick were released on April 5, 2025 — "the first open-weight natively multimodal models with unprecedented context length support" — and continued to be the production Llama options through the H1-to-date window. Llama 4 Behemoth was announced as still in training in that same April 2025 post and, as of May 16, 2026, has still not been released. The Llama 4 Community License has been in force since the April 2025 release (license text).
What did change in H1 2026 was Meta's frontier posture: the April 8 Muse Spark launch from Meta Superintelligence Labs signaled a closed-model scaling lane while open-weight Llama paused. The frontier-Llama roadmap for the rest of 2026 is genuinely uncertain from outside the company, and a near-term Llama 5 release is a low-probability scenario rather than the central case. Enterprises that standardized on Llama 4 Scout or Maverick during 2025 continued to run them in production — the licence, model card, and hyperscaler integrations were already in place — but the cadence-driven competitive pressure that defined Qwen and the architectural-reset framing that defined DeepSeek were both absent from Meta through May 16.
Scout + Maverick
Llama 4 Scout and Maverick remain the production Llama options. No new open-weight Llama family shipped between January 1 and the second week of May 2026 — the H1 release count for Meta is zero.
Still in flight
Meta described Llama 4 Behemoth as in training in the original April 2025 announcement and has not since shipped weights. The largest size class remained unreleased as of May 16, 2026.
AI group reorganised
Meta's April 2026 Muse Spark launch shifted the frontier story toward a closed Meta Superintelligence Labs line. The open-weight Llama roadmap for the rest of the year is uncertain from outside the company; treat a near-term Llama 5 as low probability rather than central case.
Code automation · Pick DeepSeek V4 or Qwen 3.6
Strongest open-weight signal on LiveCodeBench and Codeforces sits with DeepSeek V4-Pro. Qwen 3.6-35B-A3B (MoE) is the cost-efficient alternative for agentic-coding workloads. Llama 4 trails on competitive-programming benchmarks — viable but not the lead choice.
Long-context RAG · Pick DeepSeek V4-Pro for on-prem
V4's hybrid attention (CSA + HCA) plus 1M context makes it the strongest open-weight candidate for on-prem long-document RAG. Llama 4 Scout's 10M context is the easier procurement story for buyers already standardized on Llama; pick on workload weight versus procurement weight.
Managed hyperscaler deployment · Pick Llama 4 (existing) or Qwen 3.6 (Apache)
Llama 4 has hyperscaler-managed integrations in place from 2025 — smoothest procurement path for buyers already standardized on Llama. Qwen 3.6 under Apache 2.0 is the cleaner net-new pick. Both work; the choice depends on existing procurement language.
On-device and edge · Pick Qwen 3.5 small models
The Qwen 3.5 small-models family (0.8B / 2B / 4B / 9B, released Mar 2) is the strongest open-weight on-device option to emerge in H1. Llama 4 doesn't have a comparable small-model line; DeepSeek V4 starts at 13B active and isn't tuned for edge.
The takeaway for buyers is that open-weight in H1 2026 is no longer a single-model decision. It's a per-workload routing decision where the three families occupy distinctly different positions — DeepSeek leading on architecturally heavy long-context and code workloads, Qwen leading on cadence-driven variant fit and on-device deployments, and Llama 4 holding ground for buyers already standardized on the April-2025 integrations while Meta's next move remains unknown.
05 — Benchmark Closure
Within 5% of closed frontier on code and reasoning.
The chart below summarizes where the strongest open-weight mode in each category lands against the strongest closed-frontier mode on the same benchmark. Bars are normalized to the leading model's score; the value column shows the open-weight absolute score. This is the closure picture, not a head-to-head winner table.

[Chart: Open-weight vs closed frontier · benchmark closure picture. Source: aggregated from vendor reports + community evaluations, May 2026.]

The pattern in that chart is the H1 2026 story compressed into one view. On code generation, competitive programming, and formal reasoning, the strongest open-weight model is now within roughly 5% of, and in some cases ahead of, the strongest closed-frontier model. On general knowledge, graduate-level science, and very-long-context retrieval, the closed-frontier lead persists at roughly 5–18 percentage points.
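The normalization behind the chart (bars scaled to the leading model's score) and the "within roughly 5%" test can be sketched as follows. The scores are hypothetical placeholders, not values from the chart, and reading "5%" as a relative margin against the closed-frontier score is our assumption; the gap in percentage points is simply the score difference.

```python
# Assumed reading of "within roughly 5%": open-weight score >= 95% of closed score.
CLOSURE_THRESHOLD = 0.95


def closure_row(benchmark, open_score, closed_score):
    """Normalize both scores to the leader (assumed positive) and report the gap."""
    leader = max(open_score, closed_score)
    return {
        "benchmark": benchmark,
        "open_bar": open_score / leader,          # 1.0 marks the leader
        "closed_bar": closed_score / leader,      # 1.0 marks the leader
        "gap_points": closed_score - open_score,  # negative: open-weight is ahead
        "closed_gap": open_score >= CLOSURE_THRESHOLD * closed_score,
    }


# Hypothetical scores for illustration only; not the chart's data.
for row in (closure_row("code-style benchmark", 90.0, 88.0),
            closure_row("knowledge-style benchmark", 70.0, 85.0)):
    print(f"{row['benchmark']}: open={row['open_bar']:.2f}, "
          f"closed={row['closed_bar']:.2f}, gap={row['gap_points']:+.1f} pts, "
          f"closed the gap: {row['closed_gap']}")
```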
The implication for production buyers is that benchmark closure isn't uniform; it's task-class-specific. For workloads in the code and formal-reasoning band, open-weight is now a credible default. For workloads in the general-knowledge and hardest-retrieval band, closed frontier still leads by enough margin that switching purely on cost is premature. As of May, the right architectural pattern is a routing layer that picks per task class.
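A per-task-class routing layer can start as little more than a lookup table with fallbacks. The sketch below is hypothetical throughout: the task classes, model labels, and preference order mirror this retrospective's recommendations but are not real endpoint names or any particular router's API. It assumes the task class is already known; the classification step itself is left out.

```python
from enum import Enum


class TaskClass(Enum):
    CODE = "code"
    FORMAL_REASONING = "formal_reasoning"
    LONG_CONTEXT_RAG = "long_context_rag"
    ON_DEVICE = "on_device"
    GENERAL_KNOWLEDGE = "general_knowledge"


# Hypothetical route table, in preference order per task class.
# Labels are placeholders; map them to your own deployments.
ROUTES = {
    TaskClass.CODE: ["deepseek-v4-pro", "qwen3.6-35b-a3b"],
    TaskClass.FORMAL_REASONING: ["deepseek-v4-pro"],
    TaskClass.LONG_CONTEXT_RAG: ["deepseek-v4-pro", "llama-4-scout"],
    TaskClass.ON_DEVICE: ["qwen3.5-4b", "qwen3.5-0.8b"],
    TaskClass.GENERAL_KNOWLEDGE: ["closed-frontier"],
}


def route(task, available):
    """Return the first preferred model for the task class that is deployed."""
    for model in ROUTES[task]:
        if model in available:
            return model
    raise LookupError(f"no deployed model for {task.value}")


deployed = {"qwen3.6-35b-a3b", "closed-frontier"}
print(route(TaskClass.CODE, deployed))
```

Keeping the table as data rather than code makes it easy to re-order preferences per task class as your own evaluation results change.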
One methodological caveat is worth stating explicitly. Vendor self-reports tend to flatter; community evaluations tend to normalize. Where vendor numbers and community numbers diverge, we've preferred the lower of the two and noted the difference. The 5% "closure" framing is a directional claim, not a precise margin — your corpus may move the picture in either direction.
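The "lower of the two" rule is simple to encode, which makes it easy to apply consistently across a benchmark table. A minimal sketch, assuming both sources report on the same scale; the function name and sample scores are hypothetical.

```python
def reconcile(vendor_score, community_score):
    """Prefer the lower of two reported scores and record how far they diverge.

    Either score may be None when only one source reports the benchmark;
    divergence is then None because there is nothing to compare.
    """
    reported = [s for s in (vendor_score, community_score) if s is not None]
    if not reported:
        raise ValueError("no score reported for this benchmark")
    used = min(reported)
    divergence = abs(vendor_score - community_score) if len(reported) == 2 else None
    return used, divergence


# Hypothetical scores: vendor self-report vs community re-run.
print(reconcile(80.0, 77.5))
print(reconcile(80.0, None))
```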
"Closure isn't uniform — it's task-class-specific. Code and reasoning have closed. General knowledge and hardest retrieval have not."— Digital Applied, H1 2026 retrospective
06 — Four Trends
What changed underneath the headlines.
Below the release inventory and the benchmark chart sit four trend lines that define how open-weight actually moved in the half. None of them is a single-model story; each is a structural shift in the way open-weight gets produced, hosted, and bought.
Cadence divergence
The three families ran three different release strategies: Qwen at four-plus first-party events in three months, DeepSeek at one architectural reset, Meta at zero new open-weight releases. Cadence is no longer a uniform open-weight signal.
Hosting cost collapse
Inference cost on the leading open-weight stacks reportedly dropped by roughly an order of magnitude versus H2 2025. vLLM and SGLang absorbed most production volume; Cerebras pushed latency-sensitive workloads into a new price band.
Sovereign-cloud standardization
Sovereign-cloud deployment patterns consolidated around three shapes: on-prem with vLLM, sovereign-hyperscaler-managed Llama 4 (existing integrations from 2025), or air-gapped clusters with quantized DeepSeek V4 or Qwen 3.6. The pattern, not the model, unlocks procurement.
The fourth trend is less a technical shift than a buyer-side one: enterprise adoption velocity. Through H1 we saw open-weight move from a small number of early-mover engineering teams into a much larger set of procurement-bound enterprises across finance, healthcare, public sector, and mid-market manufacturing. The mechanism is the combination of the prior three trends: cadence gives buyers confidence the family will keep up; hosting cost collapse changes the TCO conversation; sovereign-cloud patterns unblock procurement.
Adoption is still concentrated. Code-automation pipelines, long-context document agents, and specific internal-knowledge-retrieval workloads dominate the production deployments we've observed. Wholesale replacement of closed frontier with open-weight for general-purpose agents remains rare; the routing pattern is more common, and we think it stays the dominant pattern through H2.
07 — H2 Projection
Where open-weight likely goes next.
Forecasts in a market moving this fast should be hedged. The shape of H1 was hard to predict in late 2025; the shape of H2 is harder still. Treat the projections below as base-case directional calls, not point forecasts.
Base-case calls for H2 2026
- DeepSeek V4 GA — the Preview becomes a full V4 GA in Q3, with at least one efficiency-focused revision before year-end. V4-Flash is already in the family; expect further optimisation for the long-context-RAG workload class.
- Qwen 3.6 → Qwen 4 transition (forecast) — the variant tree continues to expand rather than consolidate. Expect an embedding refresh and additional small-model sizes; treat a Qwen 4 base in late Q3 or Q4 as a forecast scenario, not a shipped model.
- Meta's next Llama move (uncertain) — given the H1-to-date silence and Muse pivot, the Meta H2 open-weight roadmap is genuinely uncertain. A catch-up open-weight release in late Q3 / early Q4 would re-open the routing question, but the safer planning assumption is continued pause. Do not bet capacity on Behemoth GA or any forecast-only Llama 5 scenario.
- General-knowledge gap narrows — open-weight MMLU-Pro and GPQA Diamond gaps versus closed frontier reduce from the current 5–10 percentage points to roughly 3–6. Closed frontier still leads on the hardest retrieval workloads.
- Hosting cost stabilizes — the order-of-magnitude cost collapse of H1 won't repeat; expect another roughly 2–3× reduction on the leading stacks before plateauing as hardware utilization saturates.
- Routing becomes the production default — the clean "pick one model" architecture loses ground to routing layers that pick per task class. Expect at least one major open-source routing framework to emerge as a category standard.
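The two hosting-cost calls compound multiplicatively rather than additively. A quick arithmetic check, treating "roughly an order of magnitude" as exactly 10× purely for illustration, shows what the base case implies cumulatively versus H2 2025. The helper name is hypothetical.

```python
import math


def cumulative_reduction(*factors):
    """Compound successive cost-reduction factors (10 means cost fell to one tenth)."""
    return math.prod(factors)


H1_FACTOR = 10       # "roughly an order of magnitude" vs H2 2025, treated as exactly 10
H2_FACTORS = (2, 3)  # base-case forecast range for H2 2026

for h2 in H2_FACTORS:
    total = cumulative_reduction(H1_FACTOR, h2)
    print(f"H2 at {h2}x: ~{total}x below H2 2025, "
          f"or {1 / total:.1%} of the baseline price")
```

Under those assumptions the base case lands at roughly a 20–30× cumulative reduction, or about 3–5% of the H2 2025 price per token.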
Two harder-to-call shifts could change the picture meaningfully. First, a new entrant — a fourth family at the DeepSeek / Qwen / Llama tier — would compress competitive timelines further; we assign it moderate probability over six months. Second, a major closed-frontier vendor opening weights for a flagship would re-shape the open-versus-closed framing entirely; we assign it low probability but high impact if it happens.
For teams planning H2 capacity, the practical move is to design the deployment for routing rather than for any single family — and to invest in a benchmark harness that runs against your own corpus, not just published evaluations. That's where our AI digital transformation engagements typically start: per-workload eval, hosting-stack picks, and routing-layer design calibrated to actual workload volumes rather than vendor-pitched headline numbers. For the self-hosting cost side specifically, our companion analysis on self-hosting frontier-model TCO covers the hardware, utilization, and break-even math.
08 — Conclusion
The half open-weight became enterprise-grade.
By May 2026, open-weight had become enterprise-grade.
Three families, three very different postures. Qwen ran the most active cadence: Qwen 3.5 family in February, small models in March, Qwen 3.6 in April. DeepSeek shipped one architectural reset (V4 Preview, April 24). Meta shipped no new open-weight Llama and pivoted frontier attention toward the closed Muse Spark line. Underneath those vendor stories, benchmark closure on code and reasoning, a reported order-of-magnitude hosting-cost reduction on vLLM/SGLang/Cerebras, and three procurement-ready sovereign-cloud patterns compounded into something an enterprise procurement team can buy.
The honest framing on the gap is the right one. Open-weight has closed with closed frontier on code, competitive programming, and formal reasoning. It hasn't closed on general knowledge, graduate-level science, or the hardest long-context retrieval, which still trail by roughly 5 to 18 percentage points (about 3 to 6 months behind the absolute frontier, in DeepSeek's own framing for V4). That gap is real, and the right response is per-workload routing rather than wholesale replacement.
The broader signal for H2 is that the question changes again. H1's question was "is open-weight good enough." H2's question is "which routing pattern, which hosting stack, which sovereign-deployment shape." That's a more specific, more buyable, more enterprise-grade conversation — and it's what makes the H1-to-date window matter.