GLM-5 Released: 744B MoE Model vs GPT-5.2 & Claude Opus 4.5
Zhipu AI launches GLM-5 with 744B parameters, 200K context, and agentic intelligence — trained entirely on Huawei Ascend chips. Full technical analysis.
- Total Parameters: 744B
- Active Parameters: 40B
- Pre-training Data: 28.5T tokens
- Context Window: 200K tokens
Key Takeaways
Zhipu AI — the Tsinghua University spinoff that rebranded to Z.AI in 2025 and completed a landmark Hong Kong IPO in January 2026 — has officially released GLM-5, its fifth-generation large language model. With 744 billion total parameters (40B active), a 200K-token context window, and built-in agentic intelligence, GLM-5 is positioned as a direct challenger to OpenAI's GPT-5.2 and Anthropic's Claude Opus 4.5.
What makes this release strategically significant goes beyond raw capability numbers: GLM-5 was trained entirely on Huawei Ascend chips using the MindSpore framework, achieving full independence from US-manufactured semiconductor hardware. This is both a technical milestone and a geopolitical statement about the viability of China's domestic AI compute stack at frontier scale.
What Is GLM-5?
GLM-5 is the fifth generation of Zhipu AI's General Language Model series, representing a generational leap from the previous GLM-4.7 (released December 2025). The model is engineered for five core domains: creative writing, coding, advanced reasoning, agentic intelligence, and long-context processing.
Zhipu AI, founded in 2019, has rapidly established itself as a leader in open-source AI. The company's Hong Kong IPO on January 8, 2026, raised approximately HKD 4.35 billion (USD $558 million) — making it the first publicly traded foundation model company globally. That capital has directly accelerated GLM-5's development.
Technical Architecture
GLM-5 employs a Mixture of Experts (MoE) architecture, scaling from GLM-4.5's 355B params (32B active) to 744B (40B active), with pre-training data growing from 23T to 28.5T tokens:
| Spec | Value | Notes |
|---|---|---|
| Total Parameters | 744 billion | Up from GLM-4.5's 355B (2.1× scale) |
| Active Parameters | 40 billion | Per inference (up from 32B in GLM-4.5) |
| Expert Architecture | 256 experts | 8 activated per token |
| Attention Mechanism | DSA (sparse) | DeepSeek's sparse attention for long context |
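The expert arithmetic above (256 experts, 8 active per token) is why only ~40B of the 744B parameters fire per inference. A toy top-k router makes the mechanism concrete; this is an illustrative sketch, not Zhipu AI's implementation, and every dimension other than the 256/8 expert counts is invented:

```python
import numpy as np

# Toy top-k MoE router: each token activates K of E experts.
# Illustrative only -- not Zhipu AI's actual routing code.
E, K, D = 256, 8, 16               # experts, active per token, toy hidden dim

rng = np.random.default_rng(0)
router = rng.standard_normal((D, E))   # router projection
token = rng.standard_normal(D)         # one token's hidden state

logits = token @ router                # score every expert
topk = np.argsort(logits)[-K:]         # keep the 8 best-scoring experts
gates = np.exp(logits[topk])
gates /= gates.sum()                   # softmax over the selected experts only

print("active experts:", sorted(topk.tolist()))
print("fraction of experts used per token:", K / E)
```

Only the 8 gated experts run their feed-forward pass, so compute per token scales with the active parameter count rather than the total.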
The model incorporates DeepSeek Sparse Attention (DSA) for efficient long-context handling, enabling GLM-5 to process sequences of up to 200,000 tokens without the computational overhead of traditional dense attention. Maximum output reaches 131,000 tokens, among the highest in the industry.
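The core idea behind sparse attention can be illustrated in a few lines: each query attends to only its k highest-scoring keys rather than all of them, cutting per-query work from O(L) toward O(k). This is a generic top-k sketch of the idea, not DeepSeek's actual DSA algorithm, and all dimensions here are invented for illustration:

```python
import numpy as np

# Toy top-k sparse attention: each query keeps only its k best keys.
# Generic illustration of the idea -- not DeepSeek's actual DSA kernel.
L, D, k = 64, 8, 4

rng = np.random.default_rng(1)
Q, K_, V = (rng.standard_normal((L, D)) for _ in range(3))

scores = Q @ K_.T / np.sqrt(D)                  # L x L attention scores
keep = np.argsort(scores, axis=-1)[:, -k:]      # top-k key indices per query
masked = np.full_like(scores, -np.inf)          # drop everything else
np.put_along_axis(masked, keep, np.take_along_axis(scores, keep, -1), -1)

weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over kept keys only
out = weights @ V                               # sparse attention output

print("keys attended per query:", int((weights > 0).sum(axis=1)[0]))
```

In a production kernel the top-k selection is done without materializing the dense score matrix, which is where the long-context savings actually come from.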
The GLM-5 family also includes specialized variants: GLM-Image for high-fidelity image generation using a hybrid auto-regressive and diffusion approach, and GLM-4.6V/4.5V for advanced multimodal reasoning that combines vision and language understanding.
Core Capabilities
Creative Writing
GLM-5 generates high-quality, nuanced creative content with stylistic versatility — from long-form narrative and technical documentation to marketing copy and academic prose. This is a noted improvement area over GLM-4.7.
Coding
A leap from vibe coding to agentic engineering. GLM-5 excels at systems engineering and full-stack development, scoring 77.8% on SWE-bench Verified (approaching Claude Opus 4.5's 80.9%). On CC-Bench-V2, GLM-5 hits 98% frontend build success rate and 74.8% end-to-end correctness.
Advanced Reasoning
Frontier-level multi-step logical reasoning with significantly reduced hallucinations. GLM-5 scores 50.4 on Humanity's Last Exam (with tools), 89.7 on τ²-Bench, and 75.9 on BrowseComp — #1 among all models tested on the latter.
Agentic Intelligence
A deep evolution from reasoning to delivery. GLM-5's Agent Mode (Beta) moves beyond conversation to a delivery-first paradigm — automatically decomposing tasks, orchestrating tools, and executing workflows to produce ready-to-use results.
- Data Insights: Upload data, get instant charts (bar, line, pie) and analysis. Export as xlsx/csv/png.
- Smart Writing: From outline to final draft with step-by-step control. Direct PDF/Word export.
- Full-Stack Development: Enhanced instruction understanding and multi-step task execution for complex engineering.
Long-Context Processing
200K-token context window handles massive documents, entire codebases, research paper collections, and video transcripts in a single session. The 131K maximum output is among the industry's highest.
Huawei Ascend: Hardware Independence
GLM-5's training on Huawei Ascend chips is arguably as significant as the model itself. In the context of US export controls restricting advanced NVIDIA GPUs to China, Zhipu AI has demonstrated that frontier-scale AI training is achievable on domestic hardware.
- Hardware: Huawei Ascend 910 series chips
- Framework: MindSpore — Huawei's open-source deep learning framework
- Scale: 744B parameters trained end-to-end without US hardware
- Significance: First frontier-scale MoE model fully trained on non-NVIDIA hardware
- Implication: Validates China's AI compute independence strategy
This aligns with China's broader push for semiconductor self-sufficiency, targeting substantial independence in data center chips by 2027. For the global AI industry, it signals that hardware diversity in AI training is not just possible — it's happening at frontier scale.
Benchmark Performance
GLM-5 has been evaluated across 8 major agentic, reasoning, and coding benchmarks. It ranks as the #1 open-source model globally — and is competitive with closed-source frontier models from OpenAI, Anthropic, and Google.
| Benchmark | GLM-5 | Claude Opus 4.5 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|---|
| Humanity's Last Exam | 50.4 (w/ Tools) | 43.4 (w/ Tools) | 45.8 (w/ Tools) | 45.5 (w/ Tools) |
| SWE-bench Verified | 77.8% | 80.9% | 76.2% | 80.0% |
| SWE-bench Multilingual | 73.3% | 77.5% | 65.0% | 72.0% |
| Terminal-Bench 2.0 | 56.2% | 59.3% | 54.2% | 54.0% |
| BrowseComp | 75.9 🥇 | 67.8 | 59.2 | 65.8 |
| MCP-Atlas | 67.8 | 65.2 | 66.6 | 68.0 |
| τ²-Bench | 89.7 | 91.6 | 90.7 | 85.5 |
| Vending Bench 2 | $4,432 🥇 OS | $4,967 | $5,478 | $3,591 |
🥇 = highest score among all models. 🥇 OS = highest among open-source models. Source: Zhipu AI official benchmarks.
CC-Bench-V2 breakdown (GLM-5 vs Claude Opus 4.5):

| Category | Metric | GLM-5 | Claude Opus 4.5 |
|---|---|---|---|
| Frontend | Build Success | **98.0%** | 93.0% |
| Frontend | E2E Correctness | 74.8% | **75.7%** |
| Backend | E2E Correctness | 25.8% | **26.9%** |
| Long-horizon | Large Repo | **65.6%** | 64.5% |
| Long-horizon | Multi-Step | 52.3% | **61.6%** |

Bold = higher score. GLM-5 narrows the gap with Claude Opus 4.5 significantly, especially in frontend (+26% build success rate improvement over GLM-4.7).
The model previously appeared on OpenRouter as "Pony Alpha" in early February 2026 — a stealth test that the AI community quickly identified through benchmark analysis and GitHub pull requests. Zhipu AI has since confirmed the connection, and GLM-5 is now officially listed on OpenRouter at openrouter.ai/z-ai/glm-5.
GLM-5 vs GPT-5.2 vs Claude Opus 4.5
| Dimension | GLM-5 | GPT-5.2 | Claude Opus 4.5 |
|---|---|---|---|
| Architecture | 744B MoE, 40B active | Undisclosed, dense transformer | Undisclosed, dense transformer |
| Context Window | 200K tokens | 400K tokens | 200K (1M beta) |
| Pricing (Input) | ~$0.11/M tokens (est.) | $1.75/M tokens | $5.00/M tokens |
| Open Source | MIT license (weights on Hugging Face) | Closed source | Closed source |
| Training Hardware | Huawei Ascend (fully domestic) | NVIDIA H100/B200 | NVIDIA/Google TPU |
GLM-5's competitive advantages center on pricing and openness: it offers frontier-level capability at a fraction of GPT-5.2's cost, with open weights already available under MIT license. It leads all open-source models on Vending Bench 2 ($4,432) and BrowseComp (75.9), while approaching Claude Opus 4.5 on coding benchmarks like SWE-bench Verified (77.8% vs 80.9%). For a broader look at how Chinese AI labs are competing at the frontier, see our Chinese AI models comparison.
Pricing and Access
Cost efficiency has been a consistent advantage of the GLM series. GLM-5 is available through multiple channels — from direct chat to open weights to a dedicated coding subscription plan.
| Model | Input Price | Output Price | Open Weights |
|---|---|---|---|
| GLM-4.5 (current) | $0.35/M | $1.55/M | Yes (Hugging Face) |
| GLM-5 | ~$0.11/M | TBD | Yes (Hugging Face) |
| GPT-5.2 | $1.75/M | $14.00/M | No |
| Claude Opus 4.5 | $5.00/M | $25.00/M | No |
| Claude Sonnet 4.5 | $3.00/M | $15.00/M | No |
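As a quick sanity check on these figures, the per-request input cost and the relative price multiples fall out of simple division. The rates below come from the table above (GLM-5's being Zhipu's estimated figure, so treat the results as approximate):

```python
# Input-price comparison using the per-million-token rates from the table
# above. GLM-5's rate is an estimate, so the multiples are approximate.
PRICES = {          # USD per 1M input tokens
    "GLM-5": 0.11,
    "GPT-5.2": 1.75,
    "Claude Opus 4.5": 5.00,
}

tokens = 150_000    # e.g. a large codebase dump near the context limit

for model, rate in PRICES.items():
    cost = rate * tokens / 1_000_000
    multiple = rate / PRICES["GLM-5"]
    print(f"{model}: ${cost:.4f} per request ({multiple:.0f}x GLM-5's rate)")
# GPT-5.2 works out to ~16x and Claude Opus 4.5 to ~45x GLM-5's input rate,
# matching the multiples quoted elsewhere in this piece.
```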
Access Channels
- Chat Interface: Try GLM-5 directly at chat.z.ai
- Open Weights: Download from huggingface.co/zai-org/GLM-5 (MIT license)
- OpenRouter: Available at openrouter.ai/z-ai/glm-5 (previously listed as "Pony Alpha")
- GLM Coding Plan: Subscription-based access through coding tools (see below)
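Calling GLM-5 through OpenRouter uses its standard OpenAI-compatible chat endpoint. A minimal sketch follows; the `z-ai/glm-5` slug comes from the listing above, and the request shape follows OpenRouter's documented API, but verify both against the current docs before relying on them:

```python
import json
import os
import urllib.request

# Minimal OpenRouter request sketch for GLM-5. The model slug follows the
# openrouter.ai/z-ai/glm-5 listing mentioned above; confirm against the
# current OpenRouter docs before use.
payload = {
    "model": "z-ai/glm-5",
    "messages": [
        {"role": "user", "content": "Summarize this repo's build steps."}
    ],
}

def send(body: dict) -> dict:
    """POST a chat completion request; needs OPENROUTER_API_KEY set."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__" and os.environ.get("OPENROUTER_API_KEY"):
    print(send(payload)["choices"][0]["message"]["content"])
```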
GLM Coding Plan
Z.AI offers a dedicated GLM Coding Plan — a subscription package designed specifically for AI-powered coding. It provides access to GLM models across mainstream coding tools at a fraction of standard API pricing.
Lite: 3× the usage of the Claude Pro plan
- For lightweight workloads
- Supports only GLM-4.7 and earlier text models
- Compatible with 20+ coding tools including Claude Code, Cursor, Cline, Kilo Code

Pro: 5× Lite plan usage
- For complex workloads
- All Lite plan benefits
- 40–60% faster than Lite
- Access to Vision Analyze, Web Search, Web Reader, and Zread MCP

Max: 4× Pro plan usage
- For high-volume workloads and advanced developers
- All Pro plan benefits
- Supports the latest flagship GLM-5
- Guaranteed peak-hour performance
- Early access to new features
Quarterly billing saves 10%. Yearly billing saves 30%. Each prompt typically allows 15–20 model calls, giving a total monthly allowance of tens of billions of tokens — all at ~1% of standard API pricing.
Default model mapping (Claude Code):
```
ANTHROPIC_DEFAULT_OPUS_MODEL: GLM-4.7
ANTHROPIC_DEFAULT_SONNET_MODEL: GLM-4.7
ANTHROPIC_DEFAULT_HAIKU_MODEL: GLM-4.5-Air
```
Modify ~/.claude/settings.json to switch to GLM-5 or other models. GLM-5 access requires Max plan.
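Based on the mapping above, switching the Opus and Sonnet slots to GLM-5 might look like the fragment below. This is a sketch: the `env` block and variable names assume the current Claude Code settings format, so confirm against your installed version before editing.

```json
{
  "env": {
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "GLM-5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "GLM-5",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "GLM-4.5-Air"
  }
}
```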
- Speed: Over 55 tokens per second for real-time interaction
- No restrictions: No network barriers or account bans — just smooth, uninterrupted coding
- Expanded capabilities: All plans support Vision Understanding, Web Search MCP, and Web Reader MCP
- Data privacy: All Z.AI services are based in Singapore. No user content is stored — text prompts, images, and input data are never retained
Industry Implications
GLM-5's release carries significant implications across several dimensions:
- Pricing pressure: GLM-5's open-weight pricing puts enormous pressure on OpenAI and Anthropic. At ~$0.11/M input tokens, it undercuts GPT-5.2 by 16x and Claude Opus 4.5 by 45x.
- Hardware diversification: proof that frontier AI training does not require NVIDIA hardware, opening the door for other chip manufacturers and reducing single-vendor dependency risk.
- Open-source acceleration: GLM-5 is available under MIT license on Hugging Face, a frontier-scale MoE model that accelerates open-source AI research and lets smaller organizations deploy competitive AI capabilities.
- Geopolitics: China demonstrating frontier AI capability on domestic hardware reshapes the global AI power balance and bears directly on the effectiveness of export control policy.
Conclusion
GLM-5 is not just another model release — it is a statement about the decentralization of frontier AI capability. A 744-billion-parameter MoE model, trained entirely on domestic hardware, scoring #1 among open-source models on multiple benchmarks, and released under MIT license on Hugging Face. Whether or not GLM-5 is the "GPT-5 killer" its positioning suggests, it demonstrates that the era of frontier AI being exclusively a US-company capability is decisively over.
For developers and businesses evaluating LLM options, GLM-5 deserves serious attention — especially for cost-sensitive applications, agentic workflows, and organizations that prefer open-weight models they can host and fine-tune independently. For a comprehensive overview of how today's frontier models compare, explore our GPT vs Claude vs Gemini vs Grok comparison.