
GLM-4.7 Guide: Z.ai's Open-Source AI Coding Model

GLM-4.7 achieves 73.8% on SWE-bench Verified and 87.4% on τ²-Bench with Preserved Thinking. A complete developer guide to the $3/month Claude Code alternative.

Digital Applied Team
December 23, 2025
5 min read
  • 355B total parameters
  • 32B active parameters
  • 200K context window
  • 73.8% SWE-bench Verified

Key Takeaways

Open-Source Claude Alternative: GLM-4.7 is a 355B parameter MIT-licensed model achieving 73.8% SWE-bench—competitive with Claude Sonnet 4.5 at a fraction of the cost.
Preserved Thinking Innovation: Unlike models that restart reasoning each turn, GLM-4.7 retains thinking blocks across conversations, maintaining context in long coding sessions.
$3/Month Coding Plan: The GLM Coding Plan offers Claude-level coding at 1/7th the price with 3x usage quota, working directly with Claude Code, Cline, and Roo Code.
Best-in-Class Tool Use: Achieves 87.4% on τ²-Bench and 84.9% on LiveCodeBench, outperforming Claude Sonnet 4.5 on multiple agent and coding benchmarks.
Production-Ready for Agents: Built specifically for terminal-based agentic workflows rather than chat, with native support for multi-turn stability in coding agents.

What Is GLM-4.7?

GLM-4.7 is Z.ai's flagship open-source coding model, released on December 22, 2025. Unlike previous models that focused primarily on chat capabilities, GLM-4.7 is engineered specifically for agentic coding—the ability to autonomously complete complex programming tasks across multiple files and turns.

The model represents a significant milestone: it's the first open-source LLM to approach proprietary model performance on real-world coding benchmarks while being available at a fraction of the cost. Z.ai (formerly Zhipu AI), a Tsinghua University spinoff valued at approximately $3-4 billion, has positioned GLM-4.7 as a direct alternative to Claude and GPT for developers who need capable coding assistance without enterprise pricing.

Built for Agents

Designed from the ground up for terminal-based workflows. Works natively with Claude Code, Cline, Roo Code, and Kilo Code.

MIT Licensed

Fully open-source with commercial use permitted. Weights available on HuggingFace and ModelScope for local deployment.

Technical Specifications

GLM-4.7 uses a Mixture-of-Experts (MoE) architecture with 355 billion total parameters, but only 32 billion are active per forward pass. This design enables frontier-level capabilities while maintaining reasonable inference costs.
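
To make the MoE trade-off concrete, here is a back-of-envelope sketch of the weight-memory footprint at different precisions. This is rough arithmetic only: it assumes weights dominate memory and ignores KV cache, activations, and framework overhead.

```python
# Back-of-envelope weight-memory estimate for a 355B-parameter MoE model.
# Assumption: weights dominate; KV cache, activations, and overhead ignored.
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

total, active = 355, 32

print(f"BF16 weights:  ~{weight_memory_gb(total, 16):.0f} GB")  # ~710 GB
print(f"FP8 weights:   ~{weight_memory_gb(total, 8):.0f} GB")   # ~355 GB
print(f"2-bit weights: ~{weight_memory_gb(total, 2):.0f} GB")   # ~89 GB

# Only ~32B parameters are active per forward pass, so per-token compute
# scales with the active count even though all weights must stay resident.
print(f"Active fraction: {active / total:.0%}")                 # 9%
```

This is why full-precision serving needs multi-GPU nodes while aggressive quantization brings the weights within reach of a workstation with plenty of system RAM.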

Specification        GLM-4.7              GLM-4.6
Total Parameters     355B (MoE)           Similar
Active Parameters    32B                  32B
Context Length       200K tokens          128K tokens
Max Output           128K tokens          32K tokens
License              MIT (open-source)    MIT
Knowledge Cutoff     Mid-to-late 2024     Earlier 2024

Thinking Modes: The Innovation

GLM-4.7's most significant innovation is its three-tier thinking architecture. This addresses the "context collapse" problem where AI coding assistants lose track of earlier decisions during long sessions.

Interleaved Thinking
Active by default

The model reasons before every response and every tool call. This prevents "hallucinated code" by verifying logic before generating output. Think of it as the model pausing to check its work at each step.

Preserved Thinking (Game Changer)
Enabled by default on GLM Coding Plan

Unlike models that restart their thought process from scratch each turn, GLM-4.7 retains its "thinking blocks" across the entire conversation. This is analogous to a human developer who remembers why they made an architectural decision three hours ago.

Benefits:

  • Reduces information loss in multi-turn sessions
  • Improves cache hit rates, lowering costs
  • Maintains consistency during complex refactors
Turn-Level Thinking Control
Developer-controllable per request

Enable or disable thinking on a per-turn basis within a session. Disable for simple syntax questions to reduce latency and costs; enable for complex debugging to maximize accuracy.
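
A minimal sketch of what per-turn control can look like in request terms. The thinking={"type": "enabled"} form appears in the API quick-start later in this guide; "disabled" as the off-value is an assumption, so verify the exact field against the official API docs.

```python
# Sketch: toggling thinking per request. The thinking={"type": ...} field
# follows the API quick-start in this guide; "disabled" is an assumed value.
def build_request(prompt: str, think: bool, model: str = "glm-4.7") -> dict:
    """Build per-turn request kwargs, enabling thinking only when needed."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": "enabled" if think else "disabled"},
    }

# Cheap, low-latency turn for a simple syntax question:
quick = build_request("What's the dict comprehension syntax?", think=False)

# Full reasoning for a hard debugging turn:
deep = build_request("Why does this async handler deadlock?", think=True)

# client.chat.completions.create(**deep)  # send with the zai client
```

The useful pattern is deciding think=True/False at the call site, per turn, rather than fixing one mode for the whole session.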

Benchmark Performance

GLM-4.7 demonstrates significant improvements across coding, reasoning, and agent benchmarks. Here's how it compares to leading proprietary models:

Benchmark             GLM-4.7   Claude Sonnet 4.5   GPT-5.1 High   DeepSeek-V3.2
SWE-bench Verified    73.8%     77.2%               76.3%          73.1%
LiveCodeBench v6      84.9%     64.0%               87.0%          83.3%
τ²-Bench (Tools)      87.4%     87.2%               82.7%          85.3%
Terminal Bench 2.0    41.0%     42.8%               47.6%          46.4%
HLE (w/ Tools)        42.8%     32.0%               42.7%          40.8%
BrowseComp            52.0%     24.1%               50.8%          51.4%
AIME 2025             95.7%     87.0%               94.0%          93.1%

Where GLM-4.7 Wins

  • LiveCodeBench v6: 84.9% vs Claude's 64.0%
  • τ²-Bench: best-in-class tool use at 87.4%
  • HLE with Tools: edges out GPT-5.1 at 42.8% vs 42.7%
  • BrowseComp: more than doubles Claude's score, 52.0% vs 24.1%

Honest Assessment

  • SWE-bench Verified: ~3 points behind Claude Sonnet 4.5 (73.8% vs 77.2%)
  • Terminal Bench 2.0: trails Gemini 3.0 Pro (reported at 54%)
  • Edge cases: may need more explicit prompting for simple tasks

Vibe Coding & UI Generation

Z.ai uses the term "vibe coding" to describe GLM-4.7's improved aesthetic output. Beyond functional code, the model now generates visually appealing UI layouts, presentations, and designs.

UI Generation

Cleaner, more modern webpage layouts with improved color harmony, typography, and component styling, significantly reducing manual touch-up after generation.

PPT Compatibility: 91%

16:9 layout compatibility improved from 52% to 91%. Generated slides are now essentially "ready to use" without manual adjustments.

Visual Artifacts

Generates interactive demos, particle effects, 3D visualizations, and creative coding projects with improved aesthetic quality.

Pricing & Access

GLM-4.7 offers multiple access options, from a budget-friendly subscription to pay-per-token API access and free local deployment.

Model/Plan             Input (per 1M tokens)   Output (per 1M tokens)   Notes
GLM Coding Plan        $3/month flat (quota-based)                      3x Claude quota, resets every 5 hours
GLM-4.7 API (Z.ai)     $0.60                   $2.20                    Direct API access
GLM-4.7 (OpenRouter)   $0.40                   $1.50                    Third-party provider
Claude Sonnet 4.5      ~$3-4                   ~$15                     For comparison
DeepSeek V3.2          $0.28                   $0.42                    Lower price point
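
Using the per-1M-token rates above, a quick sketch comparing monthly API spend. The 50M-input/10M-output workload is hypothetical, and the Claude figure uses the low end of its ~$3-4 input range.

```python
# Sketch: monthly API cost at the listed per-1M-token rates.
# The 50M/10M token workload below is a hypothetical example.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "GLM-4.7 (Z.ai)": (0.60, 2.20),
    "GLM-4.7 (OpenRouter)": (0.40, 1.50),
    "DeepSeek V3.2": (0.28, 0.42),
    "Claude Sonnet 4.5": (3.00, 15.00),  # low end of ~$3-4 / ~$15
}

def monthly_cost(in_millions: float, out_millions: float, rates) -> float:
    """Total cost for a month's token volume at (input, output) rates."""
    in_rate, out_rate = rates
    return in_millions * in_rate + out_millions * out_rate

# Hypothetical month: 50M input tokens, 10M output tokens.
for name, rates in PRICES.items():
    print(f"{name}: ${monthly_cost(50, 10, rates):.2f}")
```

At this volume the pay-per-token gap versus Claude is several-fold, which is what the quota-based $3/month plan is trading on.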

Getting Started

Claude Code Integration

The easiest way to use GLM-4.7 is through Claude Code with a GLM Coding Plan subscription:

# Install Claude Code
npm install -g @anthropic-ai/claude-code

# Configure for GLM-4.7
export ANTHROPIC_AUTH_TOKEN=your-zai-api-key
export ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic

API Quick Start (Python)

from zai import ZaiClient

client = ZaiClient(api_key="your-api-key")

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "Write a React component for a todo list"}
    ],
    thinking={"type": "enabled"},
    max_tokens=4096
)

print(response.choices[0].message.content)
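
Given the model's tool-use benchmark results, a tool-call request is the other pattern worth sketching. The OpenAI-style function schema below is an assumption about the SDK's tools format, so check the official Z.ai docs before relying on it; the run_tests tool itself is hypothetical.

```python
# Sketch: a tool-call request with the same client. The OpenAI-style
# "tools" schema is an assumption; run_tests is a hypothetical tool.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return failures",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

request = {
    "model": "glm-4.7",
    "messages": [{"role": "user", "content": "Fix the failing tests in src/"}],
    "tools": tools,
    "max_tokens": 4096,
}
# response = client.chat.completions.create(**request)
```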

Local Deployment

For local deployment, GLM-4.7 supports vLLM, SGLang, and Ollama:

# Via Ollama (easiest)
ollama run glm-4.7

# Via HuggingFace + vLLM
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model zai-org/GLM-4.7 --tensor-parallel-size 8
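
Once the server is up, vLLM exposes an OpenAI-compatible endpoint (http://localhost:8000/v1 by default). A stdlib-only sketch of querying it; the model name mirrors the --model flag above:

```python
# Sketch: querying the local vLLM server's OpenAI-compatible endpoint.
# vLLM serves on http://localhost:8000/v1 by default; stdlib only.
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    """Chat-completions payload; model name matches the vLLM --model flag."""
    return {
        "model": "zai-org/GLM-4.7",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }

def query_local(prompt: str,
                url: str = "http://localhost:8000/v1/chat/completions") -> str:
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(query_local("Write a binary search in Python."))
```

Any OpenAI-compatible SDK works the same way by pointing its base URL at the local server.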

Hardware Requirements

Full Model (355B)

  • BF16: 16× H100 (80GB)
  • FP8: 8× H100 or 4× H200

Quantized (Consumer)

  • 2-bit: 24GB GPU + 128GB RAM
  • Speed: ~5 tokens/second

When to Use GLM-4.7

Choose GLM-4.7 When

You need Claude-level coding at 1/7th the cost

Long coding sessions where context preservation matters

Tool-heavy workflows (τ²-Bench, BrowseComp)

Multilingual codebases (66.7% SWE-bench Multilingual)

You want open-source/self-hostable with MIT license

Consider Alternatives When

You need absolute best SWE-bench scores (Claude 77.2%)

Terminal-heavy workflows (Gemini 3.0 Pro leads at 54%)

Chat-first use cases requiring nuanced emotional handling

Local deployment without enterprise GPU infrastructure

Absolute lowest cost is priority (DeepSeek V3.2 cheaper)

Conclusion

GLM-4.7 represents a significant milestone in the democratization of AI coding. For the first time, an open-source model genuinely competes with Claude and GPT on real-world coding benchmarks—and does so at a fraction of the cost.

The Preserved Thinking innovation addresses a real pain point: maintaining coherent reasoning across long coding sessions. Combined with best-in-class tool use performance and a $3/month pricing tier, GLM-4.7 makes frontier-level coding assistance accessible to individual developers and small teams.

While it doesn't beat Claude or GPT on every benchmark, the gap has closed substantially. For developers who want Claude-like capabilities without Claude-like pricing, GLM-4.7 is worth serious consideration.

Ready to Implement GLM-4.7?

Digital Applied helps businesses integrate cutting-edge AI models into professional development workflows. From model selection to deployment optimization, we ensure your team maximizes value from open-source coding tools like GLM-4.7.

Explore AI Services

Frequently Asked Questions
