GLM-4.7 Guide: Z.ai's Open-Source AI Coding Model
GLM-4.7 scores 73.8% on SWE-bench Verified and 87.4% on τ²-Bench, and introduces Preserved Thinking. A complete developer guide to the $3/month Claude Code alternative.
355B Total Parameters · 32B Active Parameters · 200K Context Window · 73.8% SWE-bench Verified
What Is GLM-4.7?
GLM-4.7 is Z.ai's flagship open-source coding model, released on December 22, 2025. Unlike previous models that focused primarily on chat capabilities, GLM-4.7 is engineered specifically for agentic coding—the ability to autonomously complete complex programming tasks across multiple files and turns.
The model represents a significant milestone: it's the first open-source LLM to approach proprietary model performance on real-world coding benchmarks while being available at a fraction of the cost. Z.ai (formerly Zhipu AI), a Tsinghua University spinoff valued at approximately $3-4 billion, has positioned GLM-4.7 as a direct alternative to Claude and GPT for developers who need capable coding assistance without enterprise pricing.
- Agent-first: designed from the ground up for terminal-based workflows, and works natively with Claude Code, Cline, Roo Code, and Kilo Code.
- Open weights: fully open-source with commercial use permitted; weights are available on HuggingFace and ModelScope for local deployment.
Technical Specifications
GLM-4.7 uses a Mixture-of-Experts (MoE) architecture with 355 billion total parameters, but only 32 billion are active per forward pass. This design enables frontier-level capabilities while maintaining reasonable inference costs.
| Specification | GLM-4.7 | GLM-4.6 |
|---|---|---|
| Total Parameters | 355B (MoE) | 355B (MoE) |
| Active Parameters | 32B | 32B |
| Context Length | 200K tokens | 128K tokens |
| Max Output | 128K tokens | 32K tokens |
| License | MIT (Open-Source) | MIT |
| Knowledge Cutoff | Mid-Late 2024 | Earlier 2024 |
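To make the total-versus-active distinction concrete, here is a toy top-k expert-routing layer in PyTorch. It is a minimal sketch of the general MoE idea, not GLM-4.7's actual router (the real expert count, gating function, and any shared-expert layout are not documented in this guide):
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Toy top-k MoE layer: every expert is stored (total parameters),
    but each token only passes through k of them (active parameters)."""
    def __init__(self, dim: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)  # router: scores every expert per token
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.gate(x)                            # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)       # keep the k best experts per token
        weights = weights.softmax(dim=-1)                # normalize their mixing weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
With 8 experts and k = 2, only a quarter of the expert weights do work for any given token; scaled up, the same principle lets GLM-4.7 store 355B parameters while activating only 32B per forward pass.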
Thinking Modes: The Innovation
GLM-4.7's most significant innovation is its three-tier thinking architecture. This addresses the "context collapse" problem where AI coding assistants lose track of earlier decisions during long sessions.
1. Thinking before every action. The model reasons before every response and every tool call, which prevents "hallucinated code" by verifying logic before generating output. Think of it as the model pausing to check its work at each step.
2. Preserved Thinking. Unlike models that restart their thought process from scratch each turn, GLM-4.7 retains its "thinking blocks" across the entire conversation, much like a human developer who remembers why they made an architectural decision three hours ago.
Benefits:
- Reduces information loss in multi-turn sessions
- Improves cache hit rates, lowering costs
- Maintains consistency during complex refactors
3. Per-turn control. Enable or disable thinking on a per-turn basis within a session: disable it for simple syntax questions to reduce latency and cost, enable it for complex debugging to maximize accuracy.
To turn thinking on, pass "thinking": {"type": "enabled"} in your API request; for Preserved Thinking, set "clear_thinking": false.
Benchmark Performance
GLM-4.7 demonstrates significant improvements across coding, reasoning, and agent benchmarks. Here's how it compares to leading proprietary models:
| Benchmark | GLM-4.7 | Claude Sonnet 4.5 | GPT-5.1 High | DeepSeek-V3.2 |
|---|---|---|---|---|
| SWE-bench Verified | 73.8% | 77.2% | 76.3% | 73.1% |
| LiveCodeBench v6 | 84.9% | 64.0% | 87.0% | 83.3% |
| τ²-Bench (Tools) | 87.4% | 87.2% | 82.7% | 85.3% |
| Terminal Bench 2.0 | 41.0% | 42.8% | 47.6% | 46.4% |
| HLE (w/ Tools) | 42.8% | 32.0% | 42.7% | 40.8% |
| BrowseComp | 52.0% | 24.1% | 50.8% | 51.4% |
| AIME 2025 | 95.7% | 87.0% | 94.0% | 93.1% |
Strengths:
- LiveCodeBench v6: 84.9%, well ahead of Claude Sonnet 4.5's 64.0%
- τ²-Bench: best-in-class tool use at 87.4%
- HLE with tools: 42.8%, effectively matching GPT-5.1 High's 42.7%
- BrowseComp: roughly double Claude at 52.0% vs 24.1%

Weaknesses:
- SWE-bench Verified: about 3 points behind Claude Sonnet 4.5 (73.8% vs 77.2%)
- Terminal Bench 2.0: trails Gemini 3.0 Pro (54%)
- Edge cases: may need more explicit prompting for simple tasks
Vibe Coding & UI Generation
Z.ai uses the term "vibe coding" to describe GLM-4.7's improved aesthetic output. Beyond functional code, the model now generates visually appealing UI layouts, presentations, and designs.
- Web pages: cleaner, more modern layouts with improved color harmony, typography, and component styling, significantly reducing manual fine-tuning time.
- Slides: 16:9 layout compatibility improved from 52% to 91%, so generated decks are essentially ready to use without manual adjustment.
- Creative coding: interactive demos, particle effects, 3D visualizations, and other creative projects with improved aesthetic quality.
Pricing & Access
GLM-4.7 offers multiple access options, from a budget-friendly subscription to pay-per-token API access and free local deployment.
| Model/Plan | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| GLM Coding Plan | $3/month (quota-based) | — | 3× Claude quota, resets every 5 hours |
| GLM-4.7 API (Z.ai) | $0.60 | $2.20 | Direct API access |
| GLM-4.7 (OpenRouter) | $0.40 | $1.50 | Third-party provider |
| Claude Sonnet 4.5 | $3.00 | $15.00 | For comparison |
| DeepSeek V3.2 | $0.28 | $0.42 | Lower price point |
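As a back-of-envelope comparison using the list prices above (the token counts are illustrative, not a benchmark):
# Hypothetical coding session: 2M input tokens, 500K output tokens
inp, out = 2.0, 0.5  # in millions of tokens
glm_api  = inp * 0.60 + out * 2.20   # = $2.30 via Z.ai
claude   = inp * 3.00 + out * 15.00  # = $13.50 via Anthropic
deepseek = inp * 0.28 + out * 0.42   # = $0.77
print(glm_api, claude, deepseek)     # GLM-4.7 is roughly 6x cheaper than Claude here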
Getting Started
Claude Code Integration
The easiest way to use GLM-4.7 is through Claude Code with a GLM Coding Plan subscription:
# Install Claude Code
npm install -g @anthropic-ai/claude-code
# Configure for GLM-4.7
export ANTHROPIC_AUTH_TOKEN=your-zai-api-key
export ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
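With those variables exported, launch Claude Code as usual; requests should now route through Z.ai's Anthropic-compatible endpoint and count against your GLM Coding Plan quota rather than an Anthropic bill:
# Start a session; prompts are now served by GLM-4.7
claude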
API Quick Start (Python)
from zai import ZaiClient  # pip install zai-sdk

client = ZaiClient(api_key="your-api-key")

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "Write a React component for a todo list"}
    ],
    thinking={"type": "enabled"},  # reason step-by-step before answering
    max_tokens=4096,
)
print(response.choices[0].message.content)
Local Deployment
For local deployment, GLM-4.7 supports vLLM, SGLang, and Ollama:
# Via Ollama (easiest)
ollama run glm-4.7
# Via HuggingFace + vLLM
pip install vllm
python -m vllm.entrypoints.openai.api_server \
--model zai-org/GLM-4.7 --tensor-parallel-size 8
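Once the vLLM server is up, it exposes an OpenAI-compatible endpoint, so the standard openai client can talk to it. A minimal sketch, assuming the default port 8000 and no auth:
# Query the local vLLM server with the stock OpenAI client
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="zai-org/GLM-4.7",
    messages=[{"role": "user", "content": "Refactor this loop into a list comprehension: ..."}],
)
print(response.choices[0].message.content)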
Full Model (355B)
- BF16: 16× H100 (80GB)
- FP8: 8× H100 or 4× H200
Quantized (Consumer)
- 2-bit: 24GB GPU + 128GB RAM
- Speed: ~5 tokens/second
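These requirements follow from simple weight-size arithmetic (weights only; KV cache and activations need extra headroom on top):
# Back-of-envelope memory math for 355B parameters
params = 355e9
print(f"BF16:  {params * 2 / 1e9:.0f} GB")     # ~710 GB -> 16x 80 GB H100
print(f"FP8:   {params * 1 / 1e9:.0f} GB")     # ~355 GB -> 8x H100 or 4x H200
print(f"2-bit: {params * 0.25 / 1e9:.0f} GB")  # ~89 GB -> offloaded across 24 GB GPU + 128 GB RAM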
When to Use GLM-4.7
Choose GLM-4.7 if:
- You need Claude-level coding at roughly 1/7th the cost
- You run long coding sessions where context preservation matters
- Your workflows are tool-heavy (it leads τ²-Bench and BrowseComp)
- You work in multilingual codebases (66.7% on SWE-bench Multilingual)
- You want an open-source, self-hostable model under the MIT license

Look elsewhere if:
- You need the absolute best SWE-bench scores (Claude Sonnet 4.5: 77.2%)
- Your workflows are terminal-heavy (Gemini 3.0 Pro leads Terminal Bench at 54%)
- Your use case is chat-first and needs nuanced emotional handling
- You need local deployment but lack enterprise-grade GPU infrastructure
- Absolute lowest cost is the priority (DeepSeek V3.2 is cheaper)
Conclusion
GLM-4.7 represents a significant milestone in the democratization of AI coding. For the first time, an open-source model genuinely competes with Claude and GPT on real-world coding benchmarks—and does so at a fraction of the cost.
The Preserved Thinking innovation addresses a real pain point: maintaining coherent reasoning across long coding sessions. Combined with best-in-class tool use performance and a $3/month pricing tier, GLM-4.7 makes frontier-level coding assistance accessible to individual developers and small teams.
While it doesn't beat Claude or GPT on every benchmark, the gap has closed substantially. For developers who want Claude-like capabilities without Claude-like pricing, GLM-4.7 is worth serious consideration.
Ready to Implement GLM-4.7?
Digital Applied helps businesses integrate cutting-edge AI models into professional development workflows. From model selection to deployment optimization, we ensure your team maximizes value from open-source coding tools like GLM-4.7.