Chinese AI Models Q2 2026: 10-Provider Landscape Report
Q2 2026 market-share report on Chinese AI providers — Xiaomi MiMo, Qwen, GLM, DeepSeek, Kimi, MiniMax, Baichuan, and Yi. Usage data, licensing, and enterprise adoption.
Key Takeaways
Chinese AI providers now serve over 45% of all OpenRouter traffic, up from less than 2% a year ago. Xiaomi alone holds nearly three times OpenAI's share. This is the Q2 2026 landscape report: who the ten meaningful providers are, what they ship, how they price it, and how to evaluate them for production workloads.
The shift is not a benchmark story. Chinese models do not yet lead the Artificial Analysis Intelligence Index — MiMo-V2-Pro ranks #10 despite being the #1 model by usage. The shift is a cost, availability, and developer-choice story. Free-preview access, 1M token context windows, and per-token prices three to ten times below US frontier models have moved the default backend for AI coding IDEs, agent platforms, and cost-sensitive production workloads east.
Data source: OpenRouter Rankings retrieved April 3, 2026. All market-share figures reflect weekly token volume across the OpenRouter API, which is the most public and auditable usage dataset for hosted AI models. For the full ranking breakdown see our April 2026 OpenRouter rankings analysis.
Q2 2026 Landscape at a Glance
Ten providers cover essentially all meaningful Chinese AI output in Q2 2026. The field has consolidated sharply since 2024, when dozens of labs published competing checkpoints. Today the volume flows through Xiaomi, Alibaba, Z.ai (Zhipu), DeepSeek, Moonshot AI, MiniMax, StepFun, ByteDance, Baidu, and Tencent, with Baichuan, Yi, Xunfei, and KwaiKAT operating as second-tier niche players.
| Rank | Provider | Weekly tokens | Share | Flagship model |
|---|---|---|---|---|
| 1 | Xiaomi | 4.21T | 21.1% | MiMo-V2-Pro |
| 2 | Alibaba (Qwen) | 2.77T | 13.9% | Qwen 3.6 Plus |
| 3 | MiniMax | 1.62T | 8.1% | MiniMax M2.7 |
| 4 | Z.ai (Zhipu) | 1.12T | 5.6% | GLM-5 / GLM-5 Turbo |
| 5 | DeepSeek | 1.11T | 5.6% | DeepSeek V3.2 |
| 6 | StepFun | 1.07T | 5.3% | Step 3.5 Flash |
| 7 | Moonshot AI | Sub-rank | Tracked | Kimi K2.5 |
| 8 | ByteDance | Sub-rank | Tracked | Seed 2.0 (Doubao) |
| 9 | Baidu | Domestic | Domestic | ERNIE 5.0 |
| 10 | Tencent | Domestic | Domestic | Hunyuan (internal) |
Share figures reflect OpenRouter weekly token volume at the provider level. Baidu, Tencent, and Moonshot concentrate usage on domestic Chinese surfaces and partner ecosystems rather than OpenRouter, so their global ranking understates their home-market presence. For Western buyers evaluating these providers, the OpenRouter data is still the most defensible apples-to-apples benchmark.
Mapping models to your stack? Model selection is rarely a single-benchmark decision. Explore our AI Digital Transformation service to translate this landscape into a production-ready architecture.
Xiaomi: The Phone Company Dominating AI Volume
Xiaomi is the story of this report. A consumer electronics company best known for smartphones and smart-home hardware holds 21.1% of OpenRouter weekly tokens, nearly three times OpenAI's 7.5%. Xiaomi's AI lab shipped three frontier checkpoints between December 2025 and March 2026 under the MiMo brand, and each variant carved out a clear slot in the pricing curve.
- MiMo-V2-Pro — 1.04M context, $1 input / $3 output per million tokens. The #1 model on OpenRouter at 4.79T weekly tokens and 25.5% of all coding traffic.
- MiMo-V2-Omni — 262K context, $0.40 input / $2 output per million tokens. Unified image, video, and audio architecture in a single checkpoint.
- MiMo-V2-Flash — 262K context, $0.09 input / $0.29 output per million tokens. Top open-source claim on general reasoning at this price point.
For deeper technical coverage, see our MiMo-V2-Pro trillion-parameter release guide and the MiMo-V2-Omni omnimodal release guide.
Why Xiaomi won volume
Three decisions drove the rankings. First, MiMo-V2-Pro shipped on OpenRouter with a free preview tier that AI coding IDEs adopted as a default backend within weeks. Second, the 1.04M context window matched Qwen 3.6 Plus and exceeded most US frontier context ceilings, making MiMo the obvious choice for whole-repo refactors. Third, the pricing gap versus Claude Opus 4.6 is roughly 5x at input and 8x at output, which matters when agent frameworks expand token consumption by an order of magnitude.
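That order-of-magnitude point is easy to make concrete. The sketch below uses the per-million-token list prices cited in this report; the 10x agent amplification factor and the 200K-input / 20K-output task shape are illustrative assumptions, not measured workloads:

```python
# Illustrative cost comparison at the list prices cited in this report.
# The 10x agent amplification factor is an assumption for illustration.
PRICES = {  # model -> (input $/1M tokens, output $/1M tokens)
    "mimo-v2-pro": (1.00, 3.00),
    "claude-opus-4.6": (5.00, 25.00),
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single run at list prices."""
    p_in, p_out = PRICES[model]
    return input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out

def agent_cost(model: str, input_tokens: int, output_tokens: int,
               amplification: int = 10) -> float:
    """Agent frameworks replay context and chain calls, multiplying tokens."""
    return run_cost(model, input_tokens * amplification,
                    output_tokens * amplification)

# A 200K-in / 20K-out coding task, amplified 10x by an agent loop:
mimo = agent_cost("mimo-v2-pro", 200_000, 20_000)      # $2.60
opus = agent_cost("claude-opus-4.6", 200_000, 20_000)  # $15.00
```

At these list prices the amplified gap is roughly 5.8x per task, which is why free previews and sub-dollar input pricing moved default backends so quickly.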
Alibaba Qwen: The Scale Leader
Alibaba's Qwen family is the second-ranked provider at 13.9% share and 2.77T weekly tokens, but it is the broadest product line in Chinese AI. Between January 23 and April 2, 2026, Alibaba shipped six named Qwen releases covering coding, reasoning, ultra-cheap inference, omnimodal, and flagship tiers.
- Qwen 3.6 Plus (Apr 2, 2026) — Flagship. 1M context, 65K output, always-on chain-of-thought, native function calling. Free during preview. Currently #2 on OpenRouter with 1.64T weekly tokens.
- Qwen 3.5-Omni (Mar 30, 2026) — Native omnimodal. 256K context, 113 languages for speech recognition, Thinker-Talker architecture. Mostly closed-source.
- Qwen 3.5 Flash (Feb 24, 2026) — Ultra-cheap high context. 1M tokens, $0.065 input / $0.26 output per million. Top pick for cost-sensitive batch workloads.
- Qwen 3 Coder Next (Feb 3, 2026) — Coding-specific. 256K context, $0.12 input / $0.75 output per million. Purpose-built for IDE integrations.
- Qwen 3 Max Thinking (Jan 23, 2026) — Reasoning variant. 262K context, $0.78 input / $3.90 output per million. Positioned against GPT-5.4 Pro and Opus 4.6 on hard problems.
- Qwen 3.5 small series (Mar 2-3, 2026) — On-device. 0.8B to 9B parameters. The 9B variant beats several closed US models on GPQA Diamond.
For current flagship detail, see our Qwen 3.6 Plus 1M-context release guide.
Alibaba is also the provider most actively pushing Chinese AI into cross-border commerce and enterprise workflows through Alibaba Cloud International. For Western teams evaluating Chinese models under compliance constraints, Qwen's licensing terms and cloud footprint are typically the least painful starting point.
Zhipu GLM: Enterprise Enablement
Z.ai — the international brand for Zhipu AI — holds 5.6% OpenRouter share but commands an outsized share of Chinese domestic enterprise procurement. GLM-5 launched February 11, 2026 with 744B total parameters, 44B active per token, a Mixture-of-Experts architecture with 256 experts, and an MIT license. The model trains and serves end-to-end on Huawei Ascend silicon, making it the cleanest answer for Chinese state-owned and export-control-sensitive buyers.
- 744B total parameters, 44B active, 200K context window.
- 77.8% SWE-bench Verified, competitive with Claude Sonnet 4.6.
- $0.80 input / $2.56 output per million tokens on direct API, $1.20 / $4 on the GLM-5 Turbo faster variant.
- MIT licensed weights — among the most permissive in the Chinese frontier tier.
- GLM-5V-Turbo (April 1, 2026) extends the 744B base with multimodal vision and posts competitive agentic-browsing benchmark results.
For the full architectural and benchmark breakdown, see our Zhipu GLM-5 744B MoE release analysis.
Zhipu's positioning is the clearest example of the bifurcation in Chinese AI. While Xiaomi and Alibaba optimize for global developer volume, Zhipu optimizes for Chinese enterprise deployment, domestic-hardware independence, and permissive licensing. For Western self-hosters and researchers, GLM-5's MIT license is often the deciding factor over the Qwen and Kimi licensing frameworks.
DeepSeek: Open-Weight Economics
DeepSeek holds 5.6% OpenRouter share through DeepSeek V3.2, a 685B parameter MoE model released December 2025 with DeepSeek Sparse Attention, gold-medal performance on the 2025 IMO and IOI, and Thinking-in-Tool-Use behavior. The Speciale variant reportedly surpasses GPT-5 on specific reasoning benchmarks, though direct head-to-head evaluations remain hard to replicate.
For a detailed walkthrough of the V3.2 release, see our DeepSeek V3.2 and Speciale complete guide.
DeepSeek V4 status: As of April 2026, V4 is expected but not released. Public reporting points to ~1T total parameters, 1M context, and Huawei Ascend as the primary hardware stack, with projected pricing in the $0.10-$0.30 per million input range. Treat any V4 timeline as unconfirmed until DeepSeek publishes an official release.
DeepSeek's strategic value is open-weight economics. V3.2 is available for self-hosting, fine-tuning, and quantization with commercial license terms that many Western teams find acceptable. When the production decision comes down to "run the model inside my own cloud region or not," DeepSeek is usually on the shortlist alongside GLM-5 and Kimi K2.5.
Moonshot Kimi: Long-Context Agents
Moonshot AI's Kimi K2.5 launched January 27, 2026 with 1 trillion total parameters, 32B active per request, a 262K context window, and $0.38 input / $1.72 output per million tokens. The standout feature is Agent Swarm technology — the ability to coordinate up to 100 agents simultaneously inside a single inference loop. Moonshot claims K2.5 beats Claude Opus 4.5 on agentic benchmarks, a claim that holds up on several public harnesses.
Kimi K2.5 is also notable as the base model powering Cursor Composer 2, which scored 73.7% on SWE-bench Multilingual at launch. That makes K2.5 one of the few Chinese models actively embedded in a Western developer tool's production stack, rather than served only as a standalone API.
For the full agent swarm architecture breakdown, see our Kimi K2.5 agent swarm open-source guide.
MiniMax: Self-Evolving Agentic Workflows
MiniMax holds 8.1% OpenRouter share and is the provider most associated with agentic self-evolution. Its M-series shipped four numbered releases in under six months: M2, M2.1, M2.5 (February 12, 2026, 80.2% SWE-Bench Verified), and M2.7 (March 18, 2026, 56.22% SWE-Pro, 10B active parameters, roughly 50x cheaper than Claude Opus per comparable workload). MiniMax M2.7 is currently #4 on OpenRouter at 1.34T weekly tokens with +24% week-over-week growth.
For deeper coverage on the M2.7 release, see our MiniMax M2.7 agentic coding release guide.
MiniMax's positioning is the inverse of Xiaomi's. Rather than leading with raw context or headline pricing, MiniMax leans into self-evolving architectures — models that update internal representations across multi-step runs, letting agents refine strategy within a single task instead of waiting on human feedback cycles. For agentic workloads with well-defined reward signals, M2.7 is often the best price-to-capability option in the OpenRouter top ten.
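For context, the external version of this pattern is a refine-until-reward loop. The sketch below is a generic outer loop, not MiniMax's internal mechanism (M2.7 internalizes the refinement rather than relying on an orchestrator); `call_model` and `score` are caller-supplied placeholders:

```python
from typing import Callable, Tuple

def refine(task: str,
           call_model: Callable[[str, str], str],
           score: Callable[[str], float],
           max_steps: int = 5,
           target: float = 1.0) -> Tuple[str, float]:
    """Run a model repeatedly against a reward signal, feeding the score back.

    Generic outer-loop sketch: self-evolving models internalize this
    refinement, while conventional models need an external loop like this one.
    """
    best, best_score = "", float("-inf")
    feedback = ""
    for _ in range(max_steps):
        candidate = call_model(task, feedback)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
        if best_score >= target:
            break  # reward signal satisfied
        feedback = f"Last attempt scored {s:.2f}. Improve it."
    return best, best_score
```

The loop only pays off when `score` is a genuine reward signal (tests passing, schema validation), which is exactly the "well-defined reward" caveat above.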
StepFun and Second-Tier Providers
Step 3.5 Flash from StepFun (February 2, 2026) is the surprise performer outside the top four: a 196B MoE with 11B active parameters per token, 262K context, and $0.10 input / $0.30 output per million tokens in the paid tier, free on OpenRouter preview. It sits at #3 on the OpenRouter free-model leaderboard at 1.38T weekly tokens. StepFun trained the model on NVIDIA Hopper rather than Huawei Ascend and serves inference at up to 350 tokens per second.
Below StepFun, a handful of second-tier providers serve specific niches:
- ByteDance Seed 2.0 (Doubao) — China's most-used consumer AI app with 155M weekly active users. Pro variant matches GPT-5.2 at ~10x lower cost. Seed 2.0 Lite and Mini extend the family into ultra-cheap tiers ($0.10-$0.25 per million input).
- Baidu ERNIE 5.0 — 2.4T-parameter omnimodal flagship, trained on Baidu Kunlun silicon. Integrated with Baidu's search engine for the dominant Chinese discovery surface.
- KwaiKAT KAT-Coder-Pro V2 — Coding-specific Kuaishou model released March 27, 2026. 256K context, $0.30 / $1.20 per million, competitive on Chinese-language coding.
- Baichuan and Yi — Active open-source and enterprise tiers in China, limited OpenRouter presence, positioned against Qwen and GLM for domestic buyers.
- Xunfei (iFlytek) — Spark series, strong on speech, education, and public-sector workflows. Limited global-market ranking.
- Tencent Hunyuan — Internal enterprise flagship, integrated across WeChat and Tencent Cloud. Minimal OpenRouter footprint by design.
Note that NVIDIA's Nemotron 3 Super 120B (released March 10-11, 2026, 60.47% SWE-Bench Verified, open source, 262K context) often appears in "Chinese AI" conversations because of its pricing profile, but it is an NVIDIA model trained and served outside China. Do not count it toward Chinese market share.
Pricing Comparison Matrix
The table below covers thirteen current flagship and volume models from eight of the ten providers, sorted by provider. All figures are OpenRouter list prices as of April 2026 and cover input, output, and context for the default API variant.
| Provider | Model | Input $/1M | Output $/1M | Context |
|---|---|---|---|---|
| Xiaomi | MiMo V2 Pro | $1.00 | $3.00 | 1.04M |
| Xiaomi | MiMo V2 Flash | $0.09 | $0.29 | 262K |
| Alibaba | Qwen 3.6 Plus | Free (preview) | Free (preview) | 1M |
| Alibaba | Qwen 3.5 Flash | $0.065 | $0.26 | 1M |
| Alibaba | Qwen 3 Max Thinking | $0.78 | $3.90 | 262K |
| Z.ai (Zhipu) | GLM-5 | $0.80 | $2.56 | 200K |
| Z.ai (Zhipu) | GLM-5 Turbo | $1.20 | $4.00 | 203K |
| DeepSeek | DeepSeek V3.2 | Varies by host | Varies by host | Varies by host |
| Moonshot | Kimi K2.5 | $0.38 | $1.72 | 262K |
| MiniMax | MiniMax M2.7 | $0.30 | $1.20 | 205K |
| MiniMax | MiniMax M2.5 | $0.12 | $0.99 | 197K |
| StepFun | Step 3.5 Flash | $0.10 | $0.30 | 262K |
| ByteDance | Seed 2.0 Lite | $0.25 | $2.00 | 262K |
For reference, OpenAI GPT-5.4 sits at $2.50 / $15.00 with 1.05M context, Claude Sonnet 4.6 at $3.00 / $15.00 with 1M, and Claude Opus 4.6 at $5.00 / $25.00 with 1M. Against those anchors, the Chinese flagship tier undercuts US pricing by roughly 2.5-5x at input and 4-8x at output on comparable context lengths.
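Turning the matrix into a selection rule is straightforward: filter by context window, then rank by blended cost. The sketch below mirrors a subset of the table (free previews modeled at $0); the model keys are shorthand and `include_free` is a convenience flag of this sketch, not an OpenRouter feature:

```python
# Subset of the pricing matrix above: list $/1M tokens, context in tokens.
# Free-preview rows are modeled at $0; preview pricing is temporary.
MATRIX = {
    "mimo-v2-pro":    {"in": 1.00,  "out": 3.00, "ctx": 1_040_000},
    "qwen-3.6-plus":  {"in": 0.00,  "out": 0.00, "ctx": 1_000_000},
    "qwen-3.5-flash": {"in": 0.065, "out": 0.26, "ctx": 1_000_000},
    "glm-5":          {"in": 0.80,  "out": 2.56, "ctx": 200_000},
    "kimi-k2.5":      {"in": 0.38,  "out": 1.72, "ctx": 262_000},
    "minimax-m2.7":   {"in": 0.30,  "out": 1.20, "ctx": 205_000},
    "step-3.5-flash": {"in": 0.10,  "out": 0.30, "ctx": 262_000},
}

def cheapest(min_ctx: int, in_tok: int, out_tok: int,
             include_free: bool = True) -> str:
    """Cheapest model whose context window fits the job.

    Set include_free=False to ignore temporary free-preview pricing
    when planning steady-state costs.
    """
    fits = {m: p for m, p in MATRIX.items()
            if p["ctx"] >= min_ctx and (include_free or p["in"] + p["out"] > 0)}
    return min(fits, key=lambda m: in_tok / 1e6 * fits[m]["in"]
                                   + out_tok / 1e6 * fits[m]["out"])
```

Running it with `include_free=False` is the more honest planning mode, since preview pricing can end without notice.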
Capability Comparison Matrix
The capability matrix below scores flagship models across five workloads: coding, reasoning, tool use, multimodal inputs, and multilingual handling. Scores reflect the reference benchmarks cited in our provider breakdowns earlier in this report.
| Model | Coding | Reasoning | Tool use | Multimodal | Multilingual |
|---|---|---|---|---|---|
| MiMo V2 Pro (Xiaomi) | Strong (25.5% coding share) | Solid | #1 OpenRouter tool calls | Text only | Chinese + English |
| MiMo V2 Omni (Xiaomi) | Solid | Solid | Strong | Image + video + audio | Chinese + English |
| Qwen 3.6 Plus (Alibaba) | Strong (23.5% coding share) | Always-on CoT | Native function calling | Text + limited image | Broad |
| Qwen 3.5-Omni (Alibaba) | Solid | Solid | Strong | Full omnimodal | 113 languages speech |
| GLM-5 (Zhipu) | 77.8% SWE-Verified | Strong | Solid | GLM-5V-Turbo variant | Chinese + English |
| DeepSeek V3.2 | Strong | IMO/IOI gold | Thinking-in-Tool-Use | Text only | Chinese + English |
| Kimi K2.5 (Moonshot) | Cursor Composer 2 base | Strong | 100-agent swarm | Multimodal MoE | Chinese + English |
| MiniMax M2.7 | 56.22% SWE-Pro | Self-evolving | Strong | Text primary | Chinese + English |
| Step 3.5 Flash (StepFun) | Solid | Solid | Solid | Text primary | Chinese + English |
| ERNIE 5.0 (Baidu) | Solid | Strong | Baidu search-native | Full omnimodal | Chinese + limited English |
Three patterns stand out. First, the coding leaders are MiMo-V2-Pro and Qwen 3.6 Plus — combined they capture roughly 49% of all coding tokens on OpenRouter. Second, Qwen 3.5-Omni and ERNIE 5.0 are the most genuinely omnimodal flagships, with MiMo-V2-Omni close behind. Third, Chinese-language strength is universal but English-language tone quality varies more than headline benchmarks suggest — an important consideration for consumer-facing deployments targeting US audiences.
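One way to operationalize a qualitative matrix like this is a weighted scorecard. The 1-5 scores below are a subjective reading of the table above, and the weights encode one hypothetical team's priorities; both inputs would change per buyer, which is the point of making them explicit:

```python
# Illustrative weighted scorecard. Scores (1-5) are a subjective reading of
# the capability matrix above; weights encode one hypothetical team's
# coding-heavy priorities. Neither is an official benchmark.
WEIGHTS = {"coding": 0.4, "reasoning": 0.2, "tool_use": 0.2,
           "multimodal": 0.1, "multilingual": 0.1}

SCORES = {
    "mimo-v2-pro":   {"coding": 5, "reasoning": 3, "tool_use": 5,
                      "multimodal": 1, "multilingual": 3},
    "qwen-3.6-plus": {"coding": 5, "reasoning": 4, "tool_use": 4,
                      "multimodal": 2, "multilingual": 5},
    "glm-5":         {"coding": 4, "reasoning": 4, "tool_use": 3,
                      "multimodal": 3, "multilingual": 3},
}

def weighted(model: str) -> float:
    """Weighted average score for one model across all dimensions."""
    return sum(WEIGHTS[d] * SCORES[model][d] for d in WEIGHTS)

ranked = sorted(SCORES, key=weighted, reverse=True)
```

The value of the exercise is less the ranking than the argument it forces: a team that cares about English-language consumer output would weight `multilingual` and re-rank immediately.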
Enterprise Readiness and Export Controls
Pricing and capability tell only part of the procurement story. The enterprise-readiness matrix below covers compliance posture, US availability, primary hosting geography, and hardware stack. These are the dimensions that typically decide whether a model passes internal review at a US or EU-based buyer.
| Provider | SOC 2 | GDPR posture | US API access | Primary geography | Hardware |
|---|---|---|---|---|---|
| Xiaomi | Not published | Limited | Via OpenRouter | China | Mixed |
| Alibaba | Alibaba Cloud certs | EU regions available | Direct + OpenRouter | China + international | NVIDIA + mixed |
| Z.ai (Zhipu) | Enterprise program | Self-host path | Via OpenRouter + self-host | China | Huawei Ascend |
| DeepSeek | Not published | Self-host path | Via OpenRouter + self-host | China | NVIDIA today, Ascend (V4) |
| Moonshot | Not published | Self-host path | Via OpenRouter + self-host | China | Mixed |
| MiniMax | Not published | Limited | Via OpenRouter | China | Mixed |
| StepFun | Not published | Limited | Via OpenRouter | China | NVIDIA Hopper |
| ByteDance | Volcano Engine certs | EU tenants available | Direct + OpenRouter | China + international | Mixed |
| Baidu | Domestic certs | Limited | Baidu AI Cloud | China | Baidu Kunlun |
| Tencent | Tencent Cloud certs | EU tenants available | Tencent Cloud | China + international | Mixed |
For regulated industries, the safer path is self-hosting open-weight models (GLM-5, Kimi K2.5, Qwen 3.5 small series) inside your own cloud region with documented data processing controls, rather than hitting Chinese-hosted APIs directly.
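On the self-hosting path, most open-weight serving stacks expose an OpenAI-compatible chat endpoint, so a thin client can stay provider-agnostic across GLM-5, Kimi, or Qwen deployments. The URL, port, and model identifier below are placeholders for illustration, not documented defaults of any specific server:

```python
import json

def chat_payload(model: str, prompt: str,
                 base_url: str = "http://localhost:8000/v1") -> dict:
    """Build a request for an OpenAI-compatible chat endpoint.

    Assumption: the self-hosted server speaks the de facto OpenAI chat
    schema. base_url and model name are placeholders, not real defaults.
    """
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "model": model,  # e.g. a locally served GLM-5 checkpoint
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

# Example request against a hypothetical in-region GLM-5 deployment:
req = chat_payload("glm-5", "Summarize this diff.")
```

Keeping the client this thin means swapping the hosted API for an in-region deployment is a one-line `base_url` change, which is most of the compliance argument for open weights.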
Huawei Ascend-trained models (GLM-5, and DeepSeek V4 when released) are the procurement story for Chinese state-owned buyers operating under US export controls. For Western buyers, Ascend lineage is neutral to slightly negative, given the smaller tooling ecosystem around non-NVIDIA serving stacks.
Running a model evaluation? Production selection benefits from a structured scoring process across cost, capability, compliance, and workload fit. Our Analytics Insights and CRM Automation practices translate rankings like these into measurable production wins.
Conclusion: The Q2 2026 Playbook
Chinese AI crossed 45% of OpenRouter traffic because ten providers converged on a consistent playbook: ship flagship models with large context windows, price aggressively below US frontier, offer a free preview that AI IDE platforms adopt as a default backend, and keep a clear open-weight path for enterprise self-hosting. Xiaomi's 21.1% share is the most dramatic data point, but the pattern runs across Alibaba, Zhipu, MiniMax, DeepSeek, and StepFun.
The actionable takeaway for Western buyers is narrower than the headlines suggest. Chinese flagship models are genuinely cheaper and competitive on coding and long-context workloads. They lag on compliance posture, English-language tone, and enterprise tooling integrations. The sensible strategy in most production stacks is a multi-model architecture — US frontier for customer-facing English output and compliance-sensitive workflows, Chinese models for internal coding, batch processing, and cost-sensitive agent workloads, all routed through a cost-aware orchestration layer.
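That multi-model strategy reduces to a routing policy. The sketch below is a minimal illustration; the model names are examples drawn from this report and the rules encode one possible policy, not a recommendation:

```python
from dataclasses import dataclass

@dataclass
class Job:
    workload: str                 # "coding" | "batch" | anything else
    customer_facing: bool
    compliance_sensitive: bool

def route(job: Job) -> str:
    """Cost-aware routing policy sketched from the strategy above.

    Illustrative only: model names come from this report, and the
    rules are one possible policy, not a recommendation.
    """
    if job.customer_facing or job.compliance_sensitive:
        return "us-frontier"      # e.g. Claude Opus 4.6 / GPT-5.4
    if job.workload == "coding":
        return "mimo-v2-pro"      # cheap, strong coding tier
    if job.workload == "batch":
        return "qwen-3.5-flash"   # ultra-cheap 1M-context batch
    return "minimax-m2.7"         # default agentic workhorse
```

In production this lives in an orchestration layer with fallbacks and per-route cost tracking, but the decision logic rarely gets more complicated than a handful of guards like these.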
Translate the 2026 AI landscape into production wins
Picking the right mix of frontier and Chinese models across cost, compliance, and capability is where strategy meets engineering. We help teams route the right workload to the right provider.