GLM-4.7 Guide: Z.ai's Open-Source AI Coding Model
GLM-4.7 scores 73.8% on SWE-bench Verified and 87.4% on τ²-Bench, and introduces Preserved Thinking. A complete developer guide to the $3/month Claude Code alternative.
355B Total Parameters · 32B Active Parameters · 200K Context Window · 73.8% SWE-bench Verified
What Is GLM-4.7?
GLM-4.7 is Z.ai's flagship open-source coding model, released on December 22, 2025. Unlike previous models that focused primarily on chat capabilities, GLM-4.7 is engineered specifically for agentic coding—the ability to autonomously complete complex programming tasks across multiple files and turns.
The model represents a significant milestone: it's the first open-source LLM to approach proprietary model performance on real-world coding benchmarks while being available at a fraction of the cost. Z.ai (formerly Zhipu AI), a Tsinghua University spinoff valued at approximately $3-4 billion, has positioned GLM-4.7 as a direct alternative to Claude and GPT for developers who need capable coding assistance without enterprise pricing.
- Agent-first: designed from the ground up for terminal-based workflows, and works natively with Claude Code, Cline, Roo Code, and Kilo Code.
- Open weights: fully open-source with commercial use permitted; weights are available on HuggingFace and ModelScope for local deployment.
Technical Specifications
GLM-4.7 uses a Mixture-of-Experts (MoE) architecture with 355 billion total parameters, but only 32 billion are active per forward pass. This design enables frontier-level capabilities while maintaining reasonable inference costs.
| Specification | GLM-4.7 | GLM-4.6 |
|---|---|---|
| Total Parameters | 355B (MoE) | 355B (MoE) |
| Active Parameters | 32B | 32B |
| Context Length | 200K tokens | 128K tokens |
| Max Output | 128K tokens | 32K tokens |
| License | MIT (Open-Source) | MIT |
| Knowledge Cutoff | Mid-Late 2024 | Earlier 2024 |
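To make the total-versus-active distinction concrete, here is a toy top-k expert-routing layer in PyTorch. It is a minimal sketch of the general MoE idea, not GLM-4.7's actual router (the real expert count, gating function, and any shared-expert layout are not documented in this guide):
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Toy top-k MoE layer: every expert is stored (total parameters),
    but each token only passes through k of them (active parameters)."""
    def __init__(self, dim: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)  # router: scores every expert per token
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.gate(x)                            # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)       # keep the k best experts per token
        weights = weights.softmax(dim=-1)                # normalize their mixing weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
With 8 experts and k = 2, only a quarter of the expert weights do work for any given token; scaled up, the same principle lets GLM-4.7 store 355B parameters while activating only 32B per forward pass.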
Thinking Modes: The Innovation
GLM-4.7's most significant innovation is its three-tier thinking architecture. This addresses the "context collapse" problem where AI coding assistants lose track of earlier decisions during long sessions.
1. Thinking before every action. The model reasons before every response and every tool call, which prevents "hallucinated code" by verifying logic before generating output. Think of it as the model pausing to check its work at each step.
2. Preserved Thinking. Unlike models that restart their thought process from scratch each turn, GLM-4.7 retains its "thinking blocks" across the entire conversation, much like a human developer who remembers why they made an architectural decision three hours ago.
Benefits:
- Reduces information loss in multi-turn sessions
- Improves cache hit rates, lowering costs
- Maintains consistency during complex refactors
3. Per-turn control. Enable or disable thinking on a per-turn basis within a session: disable it for simple syntax questions to reduce latency and cost, enable it for complex debugging to maximize accuracy.
To turn thinking on, pass "thinking": {"type": "enabled"} in your API request; for Preserved Thinking, set "clear_thinking": false.
Benchmark Performance
GLM-4.7 demonstrates significant improvements across coding, reasoning, and agent benchmarks. Here's how it compares to leading proprietary models:
| Benchmark | GLM-4.7 | Claude Sonnet 4.5 | GPT-5.1 High | DeepSeek-V3.2 |
|---|---|---|---|---|
| SWE-bench Verified | 73.8% | 77.2% | 76.3% | 73.1% |
| LiveCodeBench v6 | 84.9% | 64.0% | 87.0% | 83.3% |
| τ²-Bench (Tools) | 87.4% | 87.2% | 82.7% | 85.3% |
| Terminal Bench 2.0 | 41.0% | 42.8% | 47.6% | 46.4% |
| HLE (w/ Tools) | 42.8% | 32.0% | 42.7% | 40.8% |
| BrowseComp | 52.0% | 24.1% | 50.8% | 51.4% |
| AIME 2025 | 95.7% | 87.0% | 94.0% | 93.1% |
Strengths:
- LiveCodeBench v6: 84.9%, well ahead of Claude Sonnet 4.5's 64.0%
- τ²-Bench: best-in-class tool use at 87.4%
- HLE with tools: 42.8%, effectively matching GPT-5.1 High's 42.7%
- BrowseComp: roughly double Claude at 52.0% vs 24.1%

Weaknesses:
- SWE-bench Verified: about 3 points behind Claude Sonnet 4.5 (73.8% vs 77.2%)
- Terminal Bench 2.0: trails Gemini 3.0 Pro (54%)
- Edge cases: may need more explicit prompting for simple tasks
Vibe Coding & UI Generation
Z.ai uses the term "vibe coding" to describe GLM-4.7's improved aesthetic output. Beyond functional code, the model now generates visually appealing UI layouts, presentations, and designs.
- Web pages: cleaner, more modern layouts with improved color harmony, typography, and component styling, significantly reducing manual fine-tuning time.
- Slides: 16:9 layout compatibility improved from 52% to 91%, so generated decks are essentially ready to use without manual adjustment.
- Creative coding: interactive demos, particle effects, 3D visualizations, and other creative projects with improved aesthetic quality.
Pricing & Access
GLM-4.7 offers multiple access options, from a budget-friendly subscription to pay-per-token API access and free local deployment.
| Model/Plan | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| GLM Coding Plan | $3/month (quota-based) | — | 3× Claude quota, resets every 5 hours |
| GLM-4.7 API (Z.ai) | $0.60 | $2.20 | Direct API access |
| GLM-4.7 (OpenRouter) | $0.40 | $1.50 | Third-party provider |
| Claude Sonnet 4.5 | $3.00 | $15.00 | For comparison |
| DeepSeek V3.2 | $0.28 | $0.42 | Lower price point |
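As a back-of-envelope comparison using the list prices above (the token counts are illustrative, not a benchmark):
# Hypothetical coding session: 2M input tokens, 500K output tokens
inp, out = 2.0, 0.5  # in millions of tokens
glm_api  = inp * 0.60 + out * 2.20   # = $2.30 via Z.ai
claude   = inp * 3.00 + out * 15.00  # = $13.50 via Anthropic
deepseek = inp * 0.28 + out * 0.42   # = $0.77
print(glm_api, claude, deepseek)     # GLM-4.7 is roughly 6x cheaper than Claude here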
Getting Started
Claude Code Integration
The easiest way to use GLM-4.7 is through Claude Code with a GLM Coding Plan subscription:
# Install Claude Code
npm install -g @anthropic-ai/claude-code
# Configure for GLM-4.7
export ANTHROPIC_AUTH_TOKEN=your-zai-api-key
export ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
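With those variables exported, launch Claude Code as usual; requests should now route through Z.ai's Anthropic-compatible endpoint and count against your GLM Coding Plan quota rather than an Anthropic bill:
# Start a session; prompts are now served by GLM-4.7
claude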
API Quick Start (Python)
from zai import ZaiClient  # pip install zai-sdk

client = ZaiClient(api_key="your-api-key")

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "Write a React component for a todo list"}
    ],
    thinking={"type": "enabled"},  # reason step-by-step before answering
    max_tokens=4096,
)
print(response.choices[0].message.content)
Local Deployment
For local deployment, GLM-4.7 supports vLLM, SGLang, and Ollama:
# Via Ollama (easiest)
ollama run glm-4.7
# Via HuggingFace + vLLM
pip install vllm
python -m vllm.entrypoints.openai.api_server \
--model zai-org/GLM-4.7 --tensor-parallel-size 8
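Once the vLLM server is up, it exposes an OpenAI-compatible endpoint, so the standard openai client can talk to it. A minimal sketch, assuming the default port 8000 and no auth:
# Query the local vLLM server with the stock OpenAI client
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="zai-org/GLM-4.7",
    messages=[{"role": "user", "content": "Refactor this loop into a list comprehension: ..."}],
)
print(response.choices[0].message.content)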
Full Model (355B)
- BF16: 16× H100 (80GB)
- FP8: 8× H100 or 4× H200
Quantized (Consumer)
- 2-bit: 24GB GPU + 128GB RAM
- Speed: ~5 tokens/second
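These requirements follow from simple weight-size arithmetic (weights only; KV cache and activations need extra headroom on top):
# Back-of-envelope memory math for 355B parameters
params = 355e9
print(f"BF16:  {params * 2 / 1e9:.0f} GB")     # ~710 GB -> 16x 80 GB H100
print(f"FP8:   {params * 1 / 1e9:.0f} GB")     # ~355 GB -> 8x H100 or 4x H200
print(f"2-bit: {params * 0.25 / 1e9:.0f} GB")  # ~89 GB -> offloaded across 24 GB GPU + 128 GB RAM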
When to Use GLM-4.7
Choose GLM-4.7 if:
- You need Claude-level coding at roughly 1/7th the cost
- You run long coding sessions where context preservation matters
- Your workflows are tool-heavy (it leads τ²-Bench and BrowseComp)
- You work in multilingual codebases (66.7% on SWE-bench Multilingual)
- You want an open-source, self-hostable model under the MIT license

Look elsewhere if:
- You need the absolute best SWE-bench scores (Claude Sonnet 4.5: 77.2%)
- Your workflows are terminal-heavy (Gemini 3.0 Pro leads Terminal Bench at 54%)
- Your use case is chat-first and needs nuanced emotional handling
- You need local deployment but lack enterprise-grade GPU infrastructure
- Absolute lowest cost is the priority (DeepSeek V3.2 is cheaper)
Conclusion
GLM-4.7 represents a significant milestone in the democratization of AI coding. For the first time, an open-source model genuinely competes with Claude and GPT on real-world coding benchmarks—and does so at a fraction of the cost.
The Preserved Thinking innovation addresses a real pain point: maintaining coherent reasoning across long coding sessions. Combined with best-in-class tool use performance and a $3/month pricing tier, GLM-4.7 makes frontier-level coding assistance accessible to individual developers and small teams.
While it doesn't beat Claude or GPT on every benchmark, the gap has closed substantially. For developers who want Claude-like capabilities without Claude-like pricing, GLM-4.7 is worth serious consideration.
Ready to Implement GLM-4.7?
Digital Applied helps businesses integrate cutting-edge AI models into professional development workflows. From model selection to deployment optimization, we ensure your team maximizes value from open-source coding tools like GLM-4.7.