Open-Source AI Landscape April 2026: Complete Guide
Complete open-source AI model landscape for April 2026. Gemma 4, Qwen 3.6 Plus, Llama 4, Mistral Small 4, gpt-oss, and GLM-5 ecosystem mapped and compared.
Major Labs Shipping Open Models
Largest Open Context (Scout)
Largest Open Model (GLM-5)
Most Permissive License Tier
Key Takeaways
The open-source AI landscape in April 2026 bears little resemblance to a year ago. Where 2025 was defined by Llama 3 dominance and a handful of Chinese alternatives, 2026 features six distinct model families from six different organizations, each with genuine competitive advantages in specific dimensions. Google, Alibaba, Meta, Mistral, OpenAI, and Zhipu AI are all shipping models that match or exceed proprietary alternatives on targeted benchmarks — and doing so under increasingly permissive licenses.
This guide maps the complete open-source AI ecosystem as of April 2026: model architectures, benchmark comparisons, licensing terms, deployment costs, and a selection framework for choosing the right model for your workload. For teams evaluating how open-weight models fit into their AI and digital transformation strategy, understanding this landscape is essential for making build-vs-buy decisions that hold up over the next 12 months.
The State of Open-Source AI in April 2026
Three structural shifts define the current open-source AI moment. First, mixture-of-experts (MoE) has become the default architecture, enabling models with hundreds of billions of total parameters to run on a single GPU by activating only a fraction per token. Second, licensing has genuinely liberalized — Apache 2.0 and MIT licenses now cover the majority of leading open models, eliminating the legal ambiguity that previously made enterprise adoption risky. Third, the gap between open and proprietary models has narrowed to the point where open-weight models are the correct default for many production workloads.
5 of 6 major open models use mixture-of-experts. Active parameter counts range from 5.1B (gpt-oss) to 40B (GLM-5), enabling single-GPU inference for models with 100B-744B total parameters.
Apache 2.0 and MIT licenses cover Gemma 4, Qwen 3.6, Mistral Small 4, gpt-oss-120b, and GLM-5. Only Llama 4 retains a custom community license with usage thresholds.
Chinese labs (Alibaba, Zhipu) now lead on specific benchmarks. GLM-5 was trained entirely on Huawei chips with zero NVIDIA dependency — a milestone for hardware independence.
The practical implication for businesses is clear: open-weight models are no longer a research curiosity or a cost-saving compromise. They are production-grade alternatives that offer control, privacy, and cost advantages that proprietary APIs cannot match. The question is no longer “should we consider open-weight?” but “which open-weight model matches our specific requirements?”
Complete Model Comparison Table
The following table compares all major open-weight models available in April 2026. For context window-specific analysis including pricing at full context, see our AI Context Window Comparison 2026 reference table.
| Model | Organization | Total Params | Active Params | Context | License | Architecture |
|---|---|---|---|---|---|---|
| GLM-5 | Zhipu AI | 744B | 40B | 200K | MIT | MoE |
| Llama 4 Maverick | Meta | 400B | 17B | 1M | Llama 4 Community | 128-expert MoE |
| Mistral Small 4 | Mistral AI | 119B | 6.5B | 256K | Apache 2.0 | 128-expert MoE |
| gpt-oss-120b | OpenAI | 117B | 5.1B | 128K | Apache 2.0 | MoE (MXFP4 quant) |
| Llama 4 Scout | Meta | 109B | 17B | 10M | Llama 4 Community | 16-expert MoE |
| Qwen 3.6 Plus | Alibaba | TBD* | TBD* | 1M | Apache 2.0 | Hybrid MoE + linear attention |
| Gemma 4 31B | 31B | 31B | 256K | Apache 2.0 | Dense | |
| Gemma 4 26B-A4B | 26B | 4B | 256K | Apache 2.0 | MoE | |
| Gemma 4 E4B | 4B | 4B | 128K | Apache 2.0 | Dense (edge) | |
| Gemma 4 E2B | 2B | 2B | 128K | Apache 2.0 | Dense (edge) |
* Qwen 3.6 Plus is in preview; full architecture details including exact parameter counts have not been officially published.
Gemma 4: Google's Apache 2.0 Breakthrough
Google DeepMind released Gemma 4 on April 2, 2026 — the first Gemma generation under an OSI-approved Apache 2.0 license. This is a strategic shift: previous Gemma releases used Google's custom terms, which created legal uncertainty for enterprise adopters. Apache 2.0 eliminates that friction entirely.
Gemma 4 Model Variants
Gemma 4 31B Dense
Largest variant. 256K context. 89.2% AIME 2026, 84.3% GPQA Diamond. Frontier-level reasoning.
Gemma 4 26B-A4B MoE
26B total, 4B active. 256K context. Near-31B quality at a fraction of the compute. Efficient server deployment.
Gemma 4 E4B (Edge)
4B dense. 128K context. Native audio input. Designed for on-device and mobile deployment.
Gemma 4 E2B (Edge)
2B dense. 128K context. Native audio input. The smallest variant for resource-constrained environments.
Gemma 4's differentiator is breadth: four variants covering edge (2B, 4B) to server (26B MoE, 31B Dense), all natively multimodal with vision support across all sizes and audio input on the edge models. The 31B Dense model matches or exceeds models with significantly more parameters — scoring 89.2% on AIME 2026 and 80.0% on LiveCodeBench v6. This positions Gemma 4 as the best-in-class option for teams that need a single model family spanning phone to datacenter, all under the same permissive license.
One notable limitation: Gemma 4's largest model is 31B parameters — significantly smaller than Llama 4 Maverick (400B) or GLM-5 (744B). For tasks where raw parameter scale correlates with quality (complex multi-step reasoning, very large knowledge retrieval), Gemma 4 trades absolute peak performance for deployment efficiency. The benchmarks suggest this trade is favorable for most tasks, but teams with extreme reasoning requirements should evaluate carefully. For a detailed head-to-head, see our Gemma 4 vs Llama 4 vs Mistral Small 4 comparison.
NVIDIA optimization: NVIDIA has already released optimized inference configurations for Gemma 4 on RTX hardware, enabling local agentic AI workflows on consumer GPUs. This makes Gemma 4 E4B particularly attractive for developer workstations and edge AI prototyping.
Qwen 3.6 Plus: 1M Context and Hybrid Architecture
Alibaba's Qwen family has quietly become the most commercially deployed open-weight model family globally, driven by Apache 2.0 licensing, competitive benchmarks, and the widest range of model sizes available from any single lab. Qwen 3.6 Plus, released on OpenRouter on March 31, 2026, extends this lead with a 1M-token context window and a novel hybrid architecture.
Qwen 3.6 Plus combines linear attention mechanisms with sparse MoE. Linear attention reduces the computational cost of processing long sequences (the quadratic bottleneck in standard attention), while sparse MoE activates only the necessary experts per token. The result is a model that handles 1M tokens with significantly less compute than a standard dense-attention model at comparable scale.
The broader Qwen 3.5 family leads on coding benchmarks — LiveCodeBench and SWE-bench show clear margins over Llama 4 and Gemma 4. Qwen 3.6 Plus continues this strength with always-on chain-of-thought reasoning that reaches conclusions faster and uses fewer output tokens on straightforward tasks, reducing both latency and cost.
The free preview period is strategically aggressive — Alibaba is using zero-cost access to drive adoption and ecosystem lock-in before pricing kicks in. The broader Qwen family already delivers competitive performance at 10-17x lower cost than Claude or GPT for comparable tasks. Expect post-preview pricing to be highly competitive.
The Qwen ecosystem's strength is range. From Qwen 3.5 0.8B for edge devices to Qwen 3.5 397B-A17B for frontier tasks to Qwen 3.6 Plus for 1M-context workloads, the family covers more deployment scenarios under a single license (Apache 2.0) than any competitor. For teams building products that need to scale from mobile to cloud, this consistency reduces the engineering cost of multi-model strategies.
Llama 4 Scout and Maverick: Meta's MoE Gambit
Meta's Llama 4 release represents two strategic bets: Scout bets on extreme context length (10M tokens), while Maverick bets on extreme expert count (128 experts, 400B total parameters). Both use MoE with 17B active parameters per token, maintaining the inference efficiency that made Llama the default choice for self-hosted deployments.
16-expert MoE trained on ~40 trillion tokens. The 10M context was achieved through length generalization from 256K pre-training. Fits on a single H100 GPU. Natively multimodal with early fusion for text and image. Best for retrieval-oriented tasks across very large document sets.
128-expert MoE with the highest MMLU (85.5%) among open models. 1M context via Instruct fine-tuning from 256K pre-training. Requires 4-8 H100s for serving due to the 128-expert routing overhead. Best for general-purpose reasoning where absolute benchmark scores matter.
The Llama 4 Community License remains Meta's licensing approach: free for commercial use up to 700 million monthly active users. For the vast majority of businesses, this threshold is irrelevant — only a handful of companies globally operate above it. However, the non-standard license does create friction for enterprise legal teams that prefer OSI-approved licenses, and it prevents derivative model distribution without explicit permission. Teams with strict licensing requirements should consider the Apache 2.0 alternatives.
Mistral Small 4: The Unified 119B MoE
Mistral Small 4 takes a different strategic approach: instead of specializing models for different tasks, it unifies instruction following, reasoning, multimodal understanding, and agentic coding into a single 119B-parameter MoE model. Where other labs ship separate models for instruct, reasoning, and coding, Mistral ships one model with adjustable reasoning effort at inference time.
Mistral Small 4 Specifications
Architecture
119B total, 6.5B active. 128 experts, 4 active per token.
Context Window
256K tokens. Text and image input, text output.
Reasoning Control
Adjustable from “none” (fast instruct) to “high” (deliberate step-by-step reasoning).
Efficiency Gains
40% faster completion time. 3x more requests/second vs. Mistral Small 3.
The unification strategy is compelling for teams that want to minimize model management overhead. Instead of maintaining separate instruct, reasoning, and coding models with different APIs, configurations, and deployment pipelines, teams deploy Mistral Small 4 once and adjust reasoning_effort per request. With reasoning_effort="none" it produces fast responses comparable to Mistral Small 3.2; with reasoning_effort="high" it matches earlier Magistral reasoning models. For a deeper technical analysis, see our Mistral Small 4 complete guide.
On benchmarks, Mistral Small 4 with reasoning matches or surpasses gpt-oss-120b while generating significantly shorter outputs — on LiveCodeBench it outperforms gpt-oss while producing 20% less output. This output efficiency translates directly to lower cost in production, where output tokens are typically 2-3x more expensive than input tokens.
gpt-oss-120b: OpenAI Enters Open-Weight
OpenAI's first open-weight model is a strategic move that acknowledges the inevitability of open-weight competition. Released under Apache 2.0 with both a 120B and 20B variant, gpt-oss is explicitly positioned in the efficiency tier — not as a GPT-5.4 competitor, but as a production-ready model for general-purpose tasks that fits on a single GPU.
117B total parameters with only 5.1B active per token. Uses alternating dense and locally banded sparse attention patterns, grouped multi-query attention (group size 8), and RoPE positional encoding. MoE projection weights are quantized to MXFP4 (4.25 bits per parameter), enabling the full model to fit on a single 80GB GPU (H100 or AMD MI300X).
Trained using reinforcement learning informed by OpenAI's most advanced internal models including o3 and other frontier systems. This distillation approach means gpt-oss benefits from capabilities developed for much larger models without requiring the same training compute. The dataset is mostly English, text-only, with a focus on STEM, coding, and general knowledge.
The strategic significance of gpt-oss extends beyond the model itself. OpenAI releasing under Apache 2.0 signals that the competitive dynamics of AI have permanently shifted: even the most commercially oriented lab in the industry now considers open-weight models a necessary part of its strategy. For enterprises that have historically been OpenAI-only shops, gpt-oss provides a familiar ecosystem entry point into self-hosted deployment — using the same o200k_harmony tokenizer and architectural patterns familiar from the GPT family.
Key limitation: gpt-oss is text-only. It does not support image, audio, or video input. For multimodal open-weight needs, use Gemma 4, Llama 4, or Mistral Small 4. Also, its 128K context window is the smallest among the major open models released in 2026.
GLM-5: Trained Entirely on Huawei Silicon
Zhipu AI's GLM-5 is the largest open-weight model in the current landscape at 744B total parameters (40B active), and it carries a claim that extends beyond AI capability: it was trained entirely on Huawei Ascend chips using the MindSpore framework, with zero dependency on NVIDIA hardware. This makes GLM-5 the first frontier-class model to demonstrate that Chinese hardware can support production-scale AI training — a development with significant geopolitical and supply-chain implications.
GLM-5 Key Specifications
Parameters
744B total, 40B active per token. 2x scale-up from GLM-4.5.
Training Data
28.5 trillion tokens (up from 23T in GLM-4.5).
License
MIT license — the most permissive available. No restrictions on use, modification, or distribution.
Context / Attention
200K tokens. DeepSeek Sparse Attention (DSA) for efficient long-context handling.
Pricing (API)
$1.00/$3.20 per MTok input/output — ~10x cheaper than Claude Opus 4.6.
Benchmarks
50.4% on Humanity's Last Exam (exceeding Claude Opus 4.5). Industry-leading low hallucination rate.
GLM-5's 50.4% on Humanity's Last Exam is a notable benchmark, as is its claimed industry-lowest hallucination rate. The model uses a novel asynchronous reinforcement learning infrastructure called Slime, developed by Zhipu specifically for GLM-5 training, which improves RL throughput and enables more fine-grained post-training iterations. For a deeper analysis, see our GLM-5 release analysis.
The practical implication for Western businesses is twofold. First, GLM-5 under MIT license is a genuinely competitive frontier model with no usage restrictions. Second, the Huawei training demonstration suggests that future open models from Chinese labs will not be constrained by US export controls on NVIDIA hardware — the competitive pressure on Western open-source efforts will only intensify.
Licensing Deep Dive: Apache 2.0 vs. Community Licenses
License selection is a first-order decision for any team deploying open-weight models. The wrong license can block product launches, prevent model distribution, or create regulatory risk. The current landscape has consolidated into three tiers.
| License | Models | Commercial Use | Derivative Models | Restrictions |
|---|---|---|---|---|
| Apache 2.0 | Gemma 4, Qwen 3.6, Mistral Small 4, gpt-oss | Unrestricted | Permitted | None |
| MIT | GLM-5 | Unrestricted | Permitted | None |
| Llama 4 Community | Llama 4 Scout, Maverick | Free under 700M MAU | Requires attribution | 700M MAU threshold; separate license above |
For enterprise deployment, the practical recommendation is clear: Apache 2.0 and MIT models require no legal review beyond standard open-source compliance. Llama 4's community license requires legal review of the specific terms, particularly around derivative model distribution and the 700M MAU threshold. While the threshold is irrelevant for most businesses, the non-standard nature of the license creates overhead that Apache 2.0 alternatives eliminate entirely.
The licensing shift toward Apache 2.0 across five of six major model families is the most strategically significant development in this landscape. It transforms open-weight models from “available with conditions” to “available unconditionally” — which removes the last major barrier to enterprise adoption beyond technical capability.
Deployment Cost Analysis: Self-Hosted vs. API
The economic case for open-weight models depends on deployment volume. At low volumes, API services are cheaper (no infrastructure overhead). At high volumes, self-hosting is 3-10x cheaper per token. The crossover point varies by model and workload.
| Model | Minimum Hardware | Cloud Cost/Hour | Est. $/MTok (Self-Hosted) | API $/MTok Input |
|---|---|---|---|---|
| Llama 4 Scout | 1x H100 80GB | ~$2.50 | ~$0.20-0.40 | $0.15-0.35* |
| gpt-oss-120b | 1x H100 80GB | ~$2.50 | ~$0.20-0.40 | $0.20-0.40* |
| Mistral Small 4 | 1x H100 80GB | ~$2.50 | ~$0.15-0.30 | $0.10 |
| Gemma 4 31B | 1x A100 40GB | ~$1.50 | ~$0.08-0.15 | Varies* |
| Llama 4 Maverick | 4-8x H100 80GB | ~$10-20 | ~$0.40-0.80 | $0.35-0.60* |
* API pricing for open-weight models varies by provider (Together, Fireworks, Groq, AWS Bedrock, etc.). Figures reflect typical ranges across major providers.
The cost crossover for self-hosting typically occurs at approximately 50-100 million tokens per month. Below that threshold, API services are more cost-effective due to the fixed cost of GPU rental. Above it, self-hosting on reserved instances or dedicated hardware delivers significant savings. Teams processing over 1 billion tokens per month can expect 5-10x cost reductions from self-hosting, though this requires investment in deployment infrastructure, monitoring, auto-scaling, and model versioning pipelines.
For teams evaluating the full stack of AI infrastructure decisions — from model selection to deployment strategy to cost optimization — our AI and digital transformation consulting provides hands-on guidance for building production AI systems that balance capability, cost, and operational complexity.
Selection Framework: Choosing the Right Open Model
With six competitive model families, choosing the right open-weight model requires a structured decision process. The following framework maps common priorities to recommended models.
| Priority | Best Choice | Runner-Up | Rationale |
|---|---|---|---|
| Coding / development | Qwen 3.5/3.6 | Mistral Small 4 | Leads LiveCodeBench and SWE-bench |
| Maximum context window | Llama 4 Scout | Qwen 3.6 Plus | 10M tokens; runner-up at 1M |
| Edge / mobile deployment | Gemma 4 E2B/E4B | Qwen 3.5 0.8B | Native multimodal at 2-4B params |
| Strictest licensing | Gemma 4 / gpt-oss | GLM-5 (MIT) | Apache 2.0 / MIT; no conditions |
| Highest benchmark scores | Llama 4 Maverick | GLM-5 | 85.5% MMLU; runner-up 50.4% HLE |
| Unified model (no multi-model) | Mistral Small 4 | Qwen 3.6 Plus | Instruct + reasoning + vision + coding |
| Single-GPU deployment | gpt-oss-120b | Llama 4 Scout | 5.1B active; fits H100 with MXFP4 |
| Multimodal (vision + audio) | Gemma 4 | Llama 4 (vision only) | All 4 variants support vision; edge models add audio |
| Fine-tuning flexibility | Qwen 3.5 family | Gemma 4 | Widest size range (0.8B-397B) under Apache 2.0 |
No single model dominates all dimensions. The competitive landscape is genuinely differentiated, which is healthy for the ecosystem but requires more deliberate model selection than the Llama-default era of 2024-2025. Teams that invest in structured evaluation against their specific task distributions — rather than relying on generic benchmarks — will make significantly better deployment decisions.
Conclusion
The open-source AI landscape in April 2026 is the most competitive and accessible it has ever been. Six model families from six organizations offer genuine alternatives to proprietary APIs across virtually every use case. The MoE revolution has made frontier-class inference possible on single GPUs. Licensing has liberalized to the point where five of six families use Apache 2.0 or MIT. And the gap between open and proprietary has narrowed to where open-weight models are the rational default for many production workloads.
For businesses, the strategic implication is straightforward: the cost of not evaluating open-weight models is now higher than the cost of evaluating them. Teams locked into proprietary-only API strategies are paying 3-10x premiums at scale, accepting single vendor dependency, and forgoing the privacy and customization advantages of self-hosted deployment. The models covered in this guide — and the context window analysis in our companion AI Context Window Comparison 2026 — provide the reference material needed to start that evaluation with confidence.
The pace of open-weight releases shows no sign of slowing. Expect continued convergence with proprietary frontier models, broader multimodal coverage, and longer context windows across all families through 2026. Build abstraction, benchmark rigorously, and stay ready to swap.
Build With Open-Source AI
Navigating six model families, three license types, and dozens of deployment options requires systematic evaluation. Our team helps businesses select, deploy, and optimize open-weight models for production workloads.
Related Articles
Continue exploring with these related guides