AI Development

Zhipu GLM-5.1: 94% of Claude Opus 4.6 Coding Performance

Zhipu AI releases GLM-5.1 scoring 45.3 on coding benchmarks — 94.6% of Claude Opus 4.6. Huawei-trained, open-source promised. Full analysis and comparison.

Digital Applied Team
March 27, 2026
11 min read
45.3 Coding Score · 94.6% of Claude Opus 4.6 · 744B Parameters · $3/mo Promo Price

Key Takeaways

GLM-5.1 scores 45.3 on coding benchmarks, 94.6% of Claude Opus 4.6: Zhipu AI (Z.ai) released GLM-5.1 on March 27, 2026, claiming a coding evaluation score of 45.3 compared to Claude Opus 4.6's 47.9 when measured using Claude Code as the evaluation harness. This represents a 28% improvement over the GLM-5 baseline score of 35.4. These benchmarks are self-reported by Z.ai and have not yet been independently verified.
Built on a 744B parameter Mixture-of-Experts architecture: GLM-5.1 inherits the GLM-5 base architecture: 744 billion total parameters with 256 experts and 8 active per token, resulting in 40 billion active parameters per inference. The model supports a 200K context window and 131,072 max output tokens, using DeepSeek Sparse Attention for efficient long-context processing.
Trained entirely on 100,000 Huawei Ascend 910B chips with zero Nvidia hardware: Zhipu AI trained the full GLM-5 family exclusively on Huawei Ascend 910B accelerators, making it one of the most prominent demonstrations that competitive frontier models can be built without access to Nvidia GPUs. This is particularly significant given Zhipu's placement on the US Entity List since January 2025.
Aggressive pricing undercuts Western frontier models significantly: The GLM Coding Plan starts at a promotional price of $3/month for 120 prompts, with the standalone GLM-5 API priced at $1.00 per million input tokens and $3.20 per million output tokens. This pricing positions Z.ai well below comparable Western frontier model pricing tiers.
Benchmarks are self-reported with no independent verification yet: As of the March 27, 2026 release date, all coding benchmark figures come exclusively from Z.ai's own documentation. No third-party evaluation labs or independent researchers have published corroborating results for GLM-5.1. Developers should treat these numbers as preliminary until external validation is available.

Zhipu AI, operating internationally under the Z.ai brand, released GLM-5.1 on March 27, 2026 as an incremental upgrade to its flagship GLM-5 model. The headline claim is striking: a coding evaluation score of 45.3, placing it at 94.6% of Claude Opus 4.6's score of 47.9 on the same benchmark. If the numbers hold under independent scrutiny, GLM-5.1 represents the closest any Chinese-developed model has come to matching the coding performance of Anthropic's leading model.

The context matters as much as the numbers. GLM-5.1 was trained entirely on Huawei Ascend hardware without any Nvidia chips, the company has been on the US Entity List since January 2025, and Zhipu recently became a publicly traded company on the Hong Kong Stock Exchange. This is not just a model release but a data point in the broader story of how AI and digital transformation capabilities are developing outside the Western ecosystem. For a deeper analysis of the GLM-5 base model that GLM-5.1 builds upon, see our comprehensive GLM-5 analysis.

What Is GLM-5.1

GLM-5.1 is the latest iteration in Zhipu AI's GLM model family, positioned as an incremental upgrade to the GLM-5 base model that launched on February 11, 2026. The naming convention follows the pattern established by the GLM-4.x series: major versions introduce architectural changes, while point releases refine post-training, alignment, and task-specific performance without fundamentally altering the base architecture.

The model family has iterated rapidly over the past nine months: GLM-4.5 in July 2025, GLM-4.6 in September, GLM-4.7 in December, GLM-5 in February 2026, GLM-5-Turbo on March 15, and now GLM-5.1 on March 27. That cadence, roughly one significant release per month in 2026 alone, reflects both the competitive pressure in the Chinese AI market and Zhipu's position as the country's number three AI player according to IDC.

Incremental Upgrade

GLM-5.1 builds on the GLM-5 base architecture with refined post-training, particularly targeting coding performance. No full technical report has been released for the specific architectural changes.

28% Coding Improvement

The coding evaluation score improved from 35.4 (GLM-5) to 45.3 (GLM-5.1), a 28% gain that Z.ai attributes to enhanced post-training optimization for code generation and reasoning tasks.

Z.ai Branding

Since rebranding from "Zhipu AI" to "Z.ai" in July 2025, the company has pushed its international presence with English-language documentation, global API access, and dollar-denominated pricing.

Zhipu AI itself is a significant player in the broader AI landscape. Founded in 2019 as a Tsinghua University spin-out, the company IPO'd on the Hong Kong Stock Exchange on January 8, 2026 under ticker 2513, reaching a market capitalization of approximately $31.3 billion. Its investor roster reads like a who's who of Chinese technology: Alibaba, Tencent, Meituan, Ant Group, Xiaomi, and Saudi Aramco's Prosperity7 venture arm.

Coding Benchmark Results and Transparency

The central claim of the GLM-5.1 release is its coding performance. Z.ai reports the following scores using Claude Code as the evaluation harness, as documented on their official developer documentation at docs.z.ai/devpack/using5.1:

Coding Evaluation Scores (Claude Code Harness)
Model               Score   Relative Performance
Claude Opus 4.6     47.9    100% (baseline)
GLM-5.1             45.3    94.6% of Claude Opus 4.6
GLM-5 (baseline)    35.4    73.9% of Claude Opus 4.6

The 28% improvement from GLM-5 (35.4) to GLM-5.1 (45.3) is substantial for a point release. For context, GLM-5 already posted strong numbers on established benchmarks: 77.8 on SWE-bench Verified and 56.2 on Terminal Bench 2.0. The GLM-5.1 coding evaluation uses a different harness (Claude Code), which makes direct comparison with those earlier benchmarks difficult.
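The relative-performance and improvement figures follow directly from the three reported scores, and are easy to reproduce:

```python
# Arithmetic behind Z.ai's self-reported figures (scores from their docs).
opus = 47.9    # Claude Opus 4.6, Claude Code harness
glm_51 = 45.3  # GLM-5.1
glm_5 = 35.4   # GLM-5 baseline

relative = glm_51 / opus                 # fraction of the Opus score
improvement = (glm_51 - glm_5) / glm_5   # point-release gain over GLM-5

print(f"{relative:.1%}")     # → 94.6%
print(f"{improvement:.0%}")  # → 28%
```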

The choice to benchmark against Claude Opus 4.6 specifically is a deliberate strategic signal. Claude Opus 4.6 is widely regarded as the strongest coding model available from a Western provider, particularly in agentic coding workflows. By positioning GLM-5.1 as reaching 94.6% of that benchmark, Z.ai is making a clear competitive statement aimed at international developers who might consider GLM as an alternative or supplementary model for code generation workloads.

Technical Architecture of GLM-5

Since Z.ai has not yet published a technical report detailing GLM-5.1's specific architectural changes, the best available reference is the GLM-5 base architecture. GLM-5.1 almost certainly shares the same foundational design with modifications concentrated in post-training and alignment rather than the base model structure.

GLM-5 Architecture Specifications
Total Parameters:      744 billion
Architecture:          Mixture-of-Experts (MoE)
Active Parameters:     40 billion per token
Expert Configuration:  256 experts, 8 active per token
Training Tokens:       28.5 trillion
Context Window:        200K (204,800 tokens)
Max Output:            131,072 tokens
Attention Mechanism:   DeepSeek Sparse Attention (DSA)
License:               Apache-2.0 (open source)

The Mixture-of-Experts design is a key efficiency decision. While the model contains 744 billion parameters total, only 40 billion are active during any given forward pass. This means GLM-5 can match the knowledge capacity of much larger dense models while keeping inference costs comparable to a 40B dense model. The 256-expert, 8-active configuration is more aggressive than what most Western MoE models use, suggesting Zhipu's confidence in their expert routing mechanisms.
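The routing idea behind that 256-expert, 8-active configuration can be sketched in a few lines. This is a generic top-k gate for illustration, not Zhipu's actual router; the gating function and its normalization are assumptions:

```python
import math
import random

NUM_EXPERTS = 256  # per the GLM-5 spec
TOP_K = 8          # experts active per token

def route_token(gate_logits):
    """Pick the top-k experts for one token and softmax-normalize
    their gate weights, as a generic MoE router would."""
    top = sorted(range(NUM_EXPERTS), key=lambda i: gate_logits[i], reverse=True)[:TOP_K]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
chosen = route_token(logits)
print(len(chosen))  # 8 experts handle this token; the other 248 stay idle
```

Only the 8 selected experts run a forward pass for that token, which is why inference cost tracks the 40B active parameters rather than the 744B total.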

The use of DeepSeek Sparse Attention is notable as a cross-pollination between Chinese AI labs. DeepSeek developed this attention mechanism for efficient long-context processing, and Zhipu's adoption of it in GLM-5 reflects the collaborative (and competitive) dynamics within China's AI ecosystem. The 200K context window enabled by DSA places GLM-5 in the same tier as Claude's context length while keeping attention computation costs manageable.

Post-training for GLM-5 uses Zhipu's proprietary "Slime" asynchronous reinforcement learning infrastructure. While the details are sparse, the name suggests an approach to parallelizing RL training across their Ascend compute cluster. The 28% coding score improvement in GLM-5.1 likely reflects refinements to this RL pipeline rather than changes to the base architecture.

The Huawei Ascend Training Story

Perhaps the most geopolitically significant aspect of the GLM-5 family is its training infrastructure. Zhipu AI trained the entire model on 100,000 Huawei Ascend 910B accelerator chips with zero Nvidia hardware involved. This is not a marketing choice; it is a necessity. Zhipu has been on the US Entity List since January 2025, which severely restricts the company's ability to acquire American-made AI accelerators.

Hardware Independence

100,000 Huawei Ascend 910B chips running the entire training pipeline. This represents one of the largest non-Nvidia training clusters in the world and demonstrates that competitive frontier models can be produced outside the Nvidia ecosystem.

Entity List Implications

Zhipu's Entity List designation since January 2025 makes the Ascend-only training stack a strategic imperative rather than a preference. The fact that the resulting model approaches frontier performance challenges assumptions about export control effectiveness.

The Ascend 910B is Huawei's answer to Nvidia's A100 and H100 series. While individual chip-to-chip benchmarks generally favor Nvidia, Zhipu's approach compensates through scale (100,000 chips) and custom software optimization including the Slime RL infrastructure. The fact that GLM-5 and GLM-5.1 achieve competitive benchmark scores suggests the Ascend ecosystem has matured significantly, even if it has not yet reached Nvidia-level per-chip efficiency.

For the global AI industry, this has implications beyond Zhipu itself. It demonstrates a viable path to frontier-class model training without dependence on American semiconductor supply chains. Whether other Chinese labs, or eventually non-Chinese organizations seeking hardware diversification, adopt similar Ascend-based approaches will be one of the key infrastructure stories to watch through 2026 and beyond. As we discussed in our 2026 AI predictions and trends forecast, the diversification of AI training hardware is accelerating faster than many analysts expected.

Pricing and Developer Access

Z.ai has structured GLM-5.1 access through its GLM Coding Plan, a subscription model with three tiers designed for developers who want to use the model through an integrated coding interface rather than raw API access.

Lite: $3/mo promotional ($10/month regular). 120 prompts per month. Suitable for evaluation and occasional use.

Pro: $15/mo promotional ($30/month regular). 600 prompts per month. Designed for active development workflows.

Max: higher tier, pricing not yet disclosed. For power users and teams requiring higher-volume access to GLM-5.1.

For developers who prefer raw API access, the standalone GLM-5 API is priced at $1.00 per million input tokens and $3.20 per million output tokens. The API uses an OpenAI-compatible interface at https://api.z.ai/api/coding/paas/v4, meaning existing tools and libraries built for the OpenAI API can work with GLM-5 by changing the base URL and API key. It is not yet confirmed whether GLM-5.1 specifically is available through the standalone API or only through the Coding Plan tiers.
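In practice, "OpenAI-compatible" means a standard chat-completions request against the Z.ai base URL. A minimal stdlib sketch that builds (but does not send) such a request; the `glm-5` model identifier and the `/chat/completions` path are assumptions based on the OpenAI convention, so check Z.ai's docs for the exact values:

```python
import json
import urllib.request

BASE_URL = "https://api.z.ai/api/coding/paas/v4"  # from Z.ai's documentation

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat-completions request object."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("sk-...", "glm-5", "Reverse a string in Python.")
print(req.full_url)
```

Existing OpenAI SDK clients should work the same way: point the client's base URL at the Z.ai endpoint and swap the API key.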

The promotional pricing is aggressive. At $3/month for the Lite tier, the barrier to evaluation is essentially zero for professional developers. The regular pricing of $10/month and $30/month is also well below what Claude Pro or ChatGPT Plus charge, though the prompt limits (120 and 600 respectively) mean heavy users may find the per-prompt economics less favorable than they initially appear.
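The per-prompt arithmetic makes the tradeoff concrete (prices and prompt limits from the plan tiers above):

```python
# Effective cost per prompt at each Coding Plan tier.
tiers = {
    "Lite (promo)":   (3.00, 120),
    "Lite (regular)": (10.00, 120),
    "Pro (promo)":    (15.00, 600),
    "Pro (regular)":  (30.00, 600),
}
for name, (price, prompts) in tiers.items():
    print(f"{name}: ${price / prompts:.3f} per prompt")
```

At regular pricing, Lite works out to roughly 8 cents per prompt versus 5 cents on Pro, so the cheapest tier is not the cheapest per unit of use.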

Competitive Landscape

GLM-5.1's claimed coding performance places it in a conversation with the top-tier models from Western and Chinese providers. The competitive picture is complex and changes rapidly. For a broader comparison of how the major AI assistants stack up across multiple dimensions, our ChatGPT vs Claude vs Gemini vs Grok comparison covers the landscape in detail.

Competitive Positioning (Coding Focus)
1. Claude Opus 4.6 (Anthropic): The current benchmark leader in agentic coding. GLM-5.1 claims 94.6% of its score. Strongest in multi-file reasoning and complex refactoring.

2. GPT-4.5/o3 (OpenAI): Strong coding performance with broad tool ecosystem integration. Competes primarily on developer tooling and API reliability rather than raw benchmark scores.

3. Gemini 2.5 Pro (Google): Strong across general reasoning and coding with massive context windows. Google's infrastructure advantage enables competitive pricing.

4. DeepSeek V3/R1 (DeepSeek): Fellow Chinese lab with strong open-source models. DeepSeek Sparse Attention, used in GLM-5 itself, originated here. Competes on efficiency and open availability.

5. GLM-5.1 (Z.ai): Claims near-parity with Claude Opus 4.6 on coding. Differentiates on price, open-source licensing, and non-Nvidia training infrastructure.

GLM-5.1's competitive differentiation rests on three pillars: price (significantly cheaper than Western alternatives), open-source licensing (Apache-2.0 for the base model), and hardware independence (no Nvidia dependency). These advantages matter most to developers and organizations operating in cost-sensitive environments, those requiring full model transparency, or those in jurisdictions where access to Western AI providers is restricted or uncertain.

The weakness is trust and verification. Western frontier models benefit from extensive third-party evaluation, public benchmarking leaderboards, and large communities of developers testing real-world performance. GLM-5.1 is still in the early hours of its public life, and the self-reported nature of its benchmarks means the competitive positioning could shift significantly once independent evaluations are published.

The Open Source Promise

One of the most consequential aspects of the GLM-5 family is its open-source stance. The base GLM-5 model is released under the Apache-2.0 license, one of the most permissive open-source licenses available. This means any developer or organization can download, modify, fine-tune, and deploy GLM-5 for any purpose, including commercial applications, without royalty obligations.

For GLM-5.1 specifically, Li Zixuan, Zhipu AI's Global Head, has publicly stated on X that the model will also be released as open source. However, no specific timeline has been provided. As of the March 27, 2026 release date, GLM-5.1 is accessible only through the paid GLM Coding Plan tiers. The gap between the announcement and the open-source release is worth monitoring. Previous GLM models have eventually been released as promised, but delays are not uncommon in this space.

Already Open Source
  • GLM-5 base model (Apache-2.0)
  • Full model weights available
  • Commercial use permitted
Promised, Not Yet Available
  • GLM-5.1 weights (announced, no timeline)
  • Currently available only via Coding Plan
  • No technical report published yet

If GLM-5.1 does achieve near-Claude-Opus-level coding performance and is released under Apache-2.0, the implications for the open-source AI ecosystem are significant. It would mean a 744B-parameter frontier model with strong coding capabilities is available for anyone to run, fine-tune, and build upon. For organizations with the compute resources to host it, this eliminates API dependency and per-token costs entirely.

Business Implications for Developers

For development teams and businesses evaluating AI coding tools, the GLM-5.1 release adds another credible option to an already crowded field. The practical question is not whether GLM-5.1 is better or worse than Claude Opus 4.6, but whether it offers sufficient performance for specific use cases at a price point that justifies integration.

Cost Arbitrage Opportunity

At $1.00/M input tokens vs $15/M for Claude Opus, GLM-5 offers 15x cheaper API pricing. If GLM-5.1 performance holds at 94.6% of Opus quality, the cost-per-quality ratio favors GLM heavily for high-volume, cost-sensitive workloads like automated code review, test generation, and documentation.

Self-Hosting for Data Sovereignty

The Apache-2.0 license on GLM-5 (and potentially GLM-5.1) enables organizations to run the model on their own infrastructure. This matters for enterprises in regulated industries where sending proprietary code to external APIs raises compliance concerns.

Multi-Provider Diversification

Relying on a single AI provider creates vendor lock-in risk. Adding GLM to a model routing strategy alongside Claude, GPT, and Gemini provides resilience against provider outages, pricing changes, and policy shifts.

Verification Before Adoption

Given the self-reported nature of the benchmarks, prudent development teams should wait for independent evaluations or conduct their own testing before making GLM-5.1 a core part of their development workflow. The $3/month Lite plan makes evaluation low-risk.

The broader trend is unmistakable: the gap between the absolute best coding models and the next tier is compressing rapidly. When a Chinese-developed model trained on non-Nvidia hardware can credibly claim 94.6% of the best Western model's coding score, the era of clear single-provider dominance in AI coding assistance is ending. For businesses building AI strategies, this means the focus should shift from picking winners to building flexible architectures that can incorporate the best model for each specific task.
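A flexible multi-model architecture can start as simply as a routing table keyed on task profile: cheap, high-volume work goes to the low-cost model, complex agentic work to the frontier leader. The model names and routing rules below are illustrative assumptions, not a recommendation:

```python
# Hypothetical task-based model router. Route names and model choices
# are examples only; real routing should be driven by your own evals.
ROUTES = {
    "docstring_generation": "glm-5.1",         # high volume, cost-sensitive
    "test_generation": "glm-5.1",
    "multi_file_refactor": "claude-opus-4.6",  # complex agentic coding
}

def pick_model(task: str, default: str = "claude-opus-4.6") -> str:
    """Return the configured model for a task, or a safe default."""
    return ROUTES.get(task, default)

print(pick_model("test_generation"))  # routed to the cheaper model
print(pick_model("security_review"))  # unknown task falls back to the default
```

The default matters: unrecognized tasks should fall back to the model you trust most, so routing mistakes degrade cost, not quality.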

Conclusion

GLM-5.1 is a noteworthy release that deserves attention but also skepticism in equal measure. The claimed 94.6% coding performance relative to Claude Opus 4.6 is a remarkable number, and the 28% improvement over GLM-5 in a point release suggests effective post-training optimization. The Huawei Ascend training story, the aggressive pricing, and the open-source licensing all add competitive differentiation that matters in different segments of the market.

The essential caveat remains: these are self-reported benchmarks from a model that launched today. The AI community will need weeks to months to produce independent evaluations, real-world performance comparisons, and detailed analyses of where GLM-5.1 excels and where it falls short. The smart approach for developers is to experiment at the $3/month entry point, wait for third-party validation, and avoid making major infrastructure decisions based on day-one claims from any provider.

Navigate the AI Model Landscape

With frontier models emerging from multiple providers and geographies, choosing the right AI tools requires strategic evaluation beyond headline benchmarks. Our team helps businesses design AI-powered workflows that deliver measurable results regardless of which model leads the charts this month.
