AI DevelopmentPlaybook10 min readPublished July 3, 2026

Two env vars · one exact model ID · honest routing rules

Run GLM-5.2 Inside Claude Code: The Full Setup Guide

Z.ai’s official docs give three ways to pipe GLM-5.2 into the Claude Code terminal you already use: an npx wizard, a shell script, or manual settings.json. Two env vars are mandatory, the documented model ID is glm-5.2[1m] — brackets included — and the honest move is routing by task, not swapping wholesale.

DA
Digital Applied Team
Senior strategists · Published Jul 3, 2026
PublishedJul 3, 2026
Read time10 min
SourcesZ.ai docs · vendor benchmarks
Mandatory env vars
2
AUTH_TOKEN + BASE_URL
3 setup paths
Context window
1M
tokens — only with glm-5.2[1m]
Output rate vs Opus slot
5.7×
cheaper · $4.40 vs $25 per Mtok
input ~3.6×
SWE-Marathon score
13.0
vs Opus 4.8’s 26.0 · vendor-stated
long-horizon gap

To run GLM-5.2 inside Claude Code, you need exactly two environment variables — an auth token and a base URL — plus one detail most setup guides get wrong: the documented model ID is glm-5.2[1m], brackets included, not bare glm-5.2. Drop the suffix and everything still appears to work, but your sessions quietly run against a smaller default context instead of the model’s full 1M-token window.

The reason this setup is worth thirty minutes of your time is the economics. GLM-5.2, announced June 13, 2026 with MIT-licensed open weights following on June 16, lands near-frontier on many single-shot coding benchmarks at a fraction of Claude’s API cost — while still trailing Claude Opus 4.8 on sustained long-horizon agent work, by the vendor’s own published numbers. That combination makes it a strong second engine inside the harness you already use, not a wholesale replacement.

This guide covers how the redirect actually works, all three official setup paths, the exact config block from Z.ai’s docs, the troubleshooting steps that fix nearly every failed install, an evidence-based task-routing table, and the honest cost math — by slot, not just the flattering Opus comparison.

Key takeaways
  1. 01
    Two env vars flip the backend.ANTHROPIC_AUTH_TOKEN carries your Z.ai key; ANTHROPIC_BASE_URL points at https://api.z.ai/api/anthropic. Three official setup paths write them for you: an npx wizard, a shell script, or manual settings.json.
  2. 02
    The model ID is glm-5.2[1m], not glm-5.2.Z.ai’s docs put the bracketed value in both the Opus and Sonnet slots — the suffix unlocks the full 1M-token context. Dropping it silently shrinks your window; it is a top-two reported setup mistake.
  3. 03
    A new terminal window is required.Config edits do not reach an already-open session. Close every Claude Code window and launch fresh — the single most common reason a correct config appears not to work.
  4. 04
    Route by task, not by headline.Vendor-published scores show near-parity on single-shot work (Terminal-Bench 81.0 vs 85.0) but real gaps on long-horizon tasks (NL2Repo 48.9 vs 69.7; SWE-Marathon 13.0 vs 26.0). Keep multi-step agent runs on Claude.
  5. 05
    Savings depend on which slot you swap.Against Opus 4.8 list rates, GLM-5.2 is ~3.6× cheaper on input and ~5.7× on output. Against Sonnet 5’s intro pricing the gap is only ~1.4× / ~2.3× — most posts quote the flashier number.

01How It WorksOne base URL redirects the whole session.

Claude Code reads a handful of environment variables at launch. Set ANTHROPIC_BASE_URL to https://api.z.ai/api/anthropic and every request the harness makes — plans, edits, tool calls — routes to Z.ai’s Anthropic-compatible endpoint instead of Anthropic’s API. ANTHROPIC_AUTH_TOKEN carries your Z.ai key. Three more variables remap the model slots Claude Code exposes: the Opus and Sonnet slots to glm-5.2[1m], and the lightweight Haiku slot to glm-4.7, per the official Claude Code setup page.

Be clear about what this is: a whole-session swap, not per-prompt routing. Once the base URL points at Z.ai, there is no native way to send one prompt to GLM-5.2 and the next to a Claude model inside the same running session. The pattern several independent guides converge on is per-task instead — keep your native Claude setup untouched as the default, and add a small shell function that launches a GLM-backed session on demand:

# ~/.zshrc — native Claude stays default; launch GLM on demand
glm() {
  ANTHROPIC_AUTH_TOKEN="your-zai-api-key" \
  ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic" \
  ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.2[1m]" \
  ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5.2[1m]" \
  ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.7" \
  claude "$@"
}

One more distinction before you generate a key. A GLM Coding Plan key bills as a flat monthly fee with prompt-based quota and is restricted to officially supported coding tools. A standard pay-as-you-go key works with any application and bills per token at list rates. Both plug into the same ANTHROPIC_AUTH_TOKEN slot — the difference is which kind of key you generate on Z.ai’s side, not a different Claude Code config. If you plan to run GLM-5.2 as an executor under a Claude orchestrator, our guide to pairing Fable 5 as orchestrator with GLM-5.2 as executor builds on exactly this per-task launch pattern.

02Three Ways InWizard, script, or manual — a real decision.

Most walkthroughs show exactly one method. Z.ai documents three, and they suit genuinely different situations — this is a decision, not a list of synonyms.

Path A
npx wizard
npx @z_ai/coding-helper

Interactive eight-step wizard: pick a language, choose the Global or China plan, enter your key, select tools, auto-install anything missing, and optionally wire MCP services. Manages Claude Code, OpenCode, Crush, and Factory Droid from one config at ~/.chelper/config.yaml.

Best for first-time setup · multi-tool users
Path B
Shell script
curl … claude_code_zai_env.sh

One command downloads and runs Z.ai’s script, which edits ~/.claude/settings.json for you. macOS and Linux only — no Windows support. The natural choice when you are scripting a repeatable machine setup.

Best for scripted setups · macOS / Linux
Path C
Manual settings.json
~/.claude/settings.json

Edit the file yourself with the exact env block in Section 03. Full control, works everywhere including Windows, easiest to audit and to undo — and easiest to break with a stray comma. Validate the JSON before relaunching.

Best for full control · Windows

Paths A and B in full:

# Path A — interactive wizard
npx @z_ai/coding-helper

# Path B — automated script (macOS / Linux only)
curl -O "https://cdn.bigmodel.cn/install/claude_code_zai_env.sh"
bash ./claude_code_zai_env.sh

The wizard’s npm package describes itself as “A CLI helper for GLM Coding Plan Users to manage coding tools like claude-code.” It sits at v0.0.7 across 37 published versions and predates GLM-5.2 itself — it is a general-purpose Coding Plan manager, not a model-specific installer. Beyond the wizard, the CLI carries a few commands worth knowing: chelper auth for interactive key entry, chelper auth reload claude to push updated plan info into your Claude Code config without re-running the whole flow, chelper auth revoke to delete a saved key, and chelper doctor for a health check.

03The Exact ConfigThe env block, with the [1m] suffix intact.

This is the manual-configuration block as documented on Z.ai’s Claude Code page, retrieved July 3, 2026. Copy the model strings exactly — quoted, brackets included:

// ~/.claude/settings.json
{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "your-zai-api-key",
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.2[1m]",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.2[1m]",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.7"
  }
}

The same docs page recommends three optional variables. The long timeout matters because GLM’s serving can be slower at peak; the auto-compact window should match the 1M context so Claude Code’s compaction doesn’t trigger prematurely; the third trims telemetry and other nonessential requests:

// Optional — recommended on the same docs page
"API_TIMEOUT_MS": "3000000",
"CLAUDE_CODE_AUTO_COMPACT_WINDOW": "1000000",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
The string that silently breaks the value
Many guides show bare glm-5.2 in the model slots. The officially documented identifier is glm-5.2[1m] — the bracket suffix is what unlocks the full 1M-token context, and community troubleshooting threads rank the missing suffix among the two most common setup mistakes. Note also that the Haiku-slot default has already changed once: launch-week guides showed glm-4.5-air; the current documented value is glm-4.7. Treat any config snippet older than a few weeks as suspect and re-check the live docs.

04First RunLaunch, confirm, and the four fixes that solve almost everything.

Whichever path you took, the next step is the same and it is not optional: close every Claude Code window and open a new terminal. The docs state plainly that a new window is required for the changes to take effect — environment variables load at launch, so an already-open session never sees your edits. Then run claude. On first contact with the non-Anthropic base URL, Claude Code asks whether you want to use this API key — select Yes, exactly as the docs instruct.

Version hygiene matters more than it looks. Z.ai’s docs state: “We have verified compatibility with Claude Code 2.0.14 and other versions.” Check yours with claude --version and run claude update if it is stale — an outdated binary can surface a model-doesn’t-exist error that looks like a GLM problem but isn’t. If the config still doesn’t take, Z.ai’s own FAQ escalation is: open a fresh terminal again; if that fails, delete ~/.claude/settings.json entirely and reconfigure — Claude Code regenerates the file automatically — and check your JSON for missing or extra commas. An independent walkthrough published June 30, 2026 on the AI Maker newsletter confirms the same sequence works end to end, two weeks after launch.

Mistake 01
Dropping the [1m] suffix
glm-5.2 ≠ glm-5.2[1m]

Everything connects, but sessions run on a smaller default context and compaction kicks in far earlier than expected. Copy the exact documented string into both the Opus and Sonnet slots.

Fix: paste the bracketed ID verbatim
Mistake 02
Reusing an open terminal
env vars load at launch

Config edits never reach an already-running session. Close all Claude Code windows, open a fresh terminal, and run claude again — the docs call this step out explicitly.

Fix: new window, relaunch
Mistake 03
A stale Claude Code binary
claude --version

Outdated installs can throw a model-doesn’t-exist error that reads like a Z.ai outage. Z.ai says it has verified compatibility with Claude Code 2.0.14 and other versions — update first, debug second.

Fix: claude update
Mistake 04
Mixing up the two endpoints
/api/anthropic vs /api/paas/v4/

The Anthropic-compatible base URL is for Claude Code. The OpenAI-compatible /api/paas/v4/chat/completions endpoint is for direct API integrations. They are different base URLs for different modes — swapping them breaks the connection.

Fix: Claude Code uses /api/anthropic

05Honest RoutingNear-frontier single-shot, real gaps long-horizon.

The setup is the easy half; the routing decision is where teams actually win or lose money. The honest frame, supported by the vendor’s own published numbers: GLM-5.2 lands near-frontier on many single-shot coding benchmarks at a fraction of the cost, and trails Opus 4.8 on sustained long-horizon agent work. The chart below is drawn from Z.ai’s release blog on Hugging Face — vendor-stated scores, cited as such.

GLM-5.2 vs Claude Opus 4.8 · higher is better

Source: Z.ai release blog, Hugging Face, June 2026 — vendor-published scores
Terminal-Bench 2.1GLM-5.2 81.0 · Opus 4.8 85.0
81.0
FrontierSWEGLM-5.2 74.4 · Opus 4.8 75.1
74.4
SWE-bench ProGLM-5.2 62.1 · Opus 4.8 69.2
62.1
NL2RepoGLM-5.2 48.9 · Opus 4.8 69.7
48.9
SWE-MarathonGLM-5.2 13.0 · Opus 4.8 26.0
13.0
The vendor’s own framing
Z.ai’s release notes are unusually candid about where the gap lives: “On SWE-Marathon, an ultra-long-horizon software engineering benchmark covering tasks such as building compilers, optimizing kernels, and developing production-grade services, GLM-5.2 still has room to grow, trailing Opus 4.8 by 13% while remaining second only to the Opus series.” Read that as a routing instruction: the long-horizon tier belongs to Claude.

The pattern in those five rows is not random — it is the trend that should shape your routing. On short-feedback-loop tasks, where the model proposes and a human or test suite verifies within minutes, open-weight models have effectively converged with frontier. Where the gap survives is compounding autonomy: every extra unsupervised step multiplies small quality differences, which is why the deficit grows from four points on Terminal-Bench to a factor of two on SWE-Marathon. The moat has moved from single answers to sustained judgment. For the deeper score-by-score analysis, see the full benchmark breakdown.

Task-type routing between GLM-5.2 and Claude models inside Claude Code. Benchmark scores are vendor-published by Z.ai (Hugging Face release blog, June 2026); grouping and rationale are Digital Applied’s synthesis, July 2026.
Task typeRoute toEvidence (GLM vs Opus)Why
Route to GLM-5.2 — single-shot, iterative, high-volume
Single-shot code generationGLM-5.2Terminal-Bench 2.1: 81.0 vs 85.0Near-parity on terminal-style tasks at a fraction of the token cost.
Iterative refactors in a known codebaseGLM-5.2FrontierSWE: 74.4 vs 75.1Sub-point gap, and a tight review loop catches misses cheaply.
High-volume test and docs generationGLM-5.2Output rate: $4.40 vs $25 per MtokVolume work where output tokens dominate the bill, not capability.
Keep on Claude — long-horizon, multi-step
Multi-file feature buildsOpus 4.8 / Fable 5SWE-bench Pro: 62.1 vs 69.2A seven-point gap compounds across dependent edits.
Whole-repo builds from a specOpus 4.8 / Fable 5NL2Repo: 48.9 vs 69.7The largest gap anywhere in the vendor’s own published table.
Long autonomous agent runsOpus 4.8 / Fable 5SWE-Marathon: 13.0 vs 26.0Half the score on the benchmark built for exactly this workload.
Production deploys and high-stakes changesOpus 4.8 / Fable 5Judgment call — no single benchmarkWhen rework costs more than tokens, capability wins the trade.

Community enthusiasm for the swap is real, and worth hearing in the original voice — with the caveat that it describes exactly the short-loop work the table routes to GLM:

"you can pipe GLM 5.2 straight into Claude Code (yes, the same terminal you already use for Opus) and get a coding agent that performs between Opus 4.7 and 4.8 capabilities—for pennies."— Gencay (LearnAIWithMe), guest post on the AI Maker newsletter, June 30, 2026

Treat that as one practitioner’s read on short-loop tasks, not a blanket equivalence — the long-horizon rows above are where the characterization stops holding. If you want this thinking as a general discipline rather than a one-model decision, a broader model-routing framework covers scoring, fallbacks, and evals across providers — and our AI transformation engagements typically start by building exactly this routing layer against a client’s own repos.

06Cost MathWhat the swap saves, by slot.

Every comparison post quotes the Opus number, because it is the flattering one. But if your daily driver inside Claude Code is the Sonnet slot — and for most developers it is — the honest multiple is much smaller. GLM-5.2’s list pricing is $1.40 per million input tokens and $4.40 per million output tokens, with cached input at $0.26 (the pricing page currently marks cached-token storage as limited-time free — vendor-stated). Here is what that means per slot:

Cost multiples from swapping each Claude Code model slot to GLM-5.2, computed from Z.ai list pricing (July 2026) and Anthropic list pricing (verified July 2026). Multiples are the ratio of native rate to GLM-5.2 rate for input and output tokens.
Slot you swapNative $/Mtok (in / out)GLM-5.2 $/Mtok (in / out)Input multipleOutput multiple
Opus slot (Opus 4.8)$5.00 / $25.00$1.40 / $4.40~3.6× cheaper~5.7× cheaper
Sonnet slot (Sonnet 5, intro pricing through Aug 31, 2026)$2.00 / $10.00$1.40 / $4.40~1.4× cheaper~2.3× cheaper
Sonnet slot (from Sep 1, 2026)$3.00 / $15.00$1.40 / $4.40~2.1× cheaper~3.4× cheaper

Two readings follow. First, if you mostly run the Sonnet slot, the savings today are modest — roughly 1.4× on input and 2.3× on output during Sonnet 5’s intro window — and a Coding Plan subscription may beat pay-as-you-go entirely for steady daily use. Second, the picture shifts on September 1, 2026, when Sonnet 5’s announced list rates step up to $3 / $15 and the same swap widens to roughly 2.1× and 3.4×. Prices on both sides of this table have already moved once this year; treat the multiples as a July 2026 snapshot and re-run the division before making a platform bet. If you’d rather buy GLM-5.2 tokens somewhere other than Z.ai first-party, our companion guide compares where to buy GLM-5.2 API access across hosts.

07Plans & QuotasThe Coding Plan tiers, and the fine print that matters.

For sustained daily coding, most people will run this through a GLM Coding Plan rather than per-token billing. Three tiers, at list pricing:

Entry tier
Lite — list price
$18/mo

Roughly 80 prompts per rolling five-hour window and about 400 per week, per Z.ai’s published approximations. Enough for evenings-and-weekends work and for testing whether the routing split suits you.

~80 prompts / 5h
5× usage
Pro — list price
$72/mo

Around 400 prompts per five-hour window and about 2,000 per week (approximate, vendor-published). The realistic tier for a full-time developer running GLM as the high-volume engine.

~400 prompts / 5h
20× usage
Max — list price
$160/mo

Approximately 1,600 prompts per window and about 8,000 weekly. Sized for heavy agentic use or small teams sharing a routing setup across parallel sessions.

~1,600 prompts / 5h

The quota mechanics reward timing. Per the devpack overview, GLM-5.2 prompts draw down quota at a 3× multiplier during peak hours — defined as 14:00–18:00 China Standard Time (UTC+8), which is 06:00–10:00 UTC, early morning in Europe and overnight on the US East Coast — and 2× off-peak, with Z.ai advertising a temporary 1× off-peak promotion through the end of September 2026 (vendor-stated). Western working hours mostly fall off-peak, which quietly stretches every tier. Z.ai’s subscribe page also lists billing-cycle discounts — deeper for quarterly and yearly commitments, with a yearly Lite working out near $12.60 a month — and some Z.ai surfaces display already-discounted monthly prices, so treat $18 / $72 / $160 as the list anchor. For a full usage-pattern breakdown of whether the $18 plan is worth it for your usage pattern, we ran the numbers separately.

If the tier math fits your workload, you can subscribe through our GLM Coding Plan link — per Z.ai’s campaign rules, the discount applies to a new, never-paid account’s first subscription order only, doesn’t stack with other first-order campaigns, and requires completing payment within 72 hours of clicking through.

Referral link: we earn Z.ai platform credits if you subscribe, and new Z.ai accounts get 10% off their first subscription order.

08ConclusionA second engine, not a replacement.

The setup, distilled

Two env vars, one exact model ID, and honest routing rules.

The mechanics are genuinely simple: point ANTHROPIC_BASE_URL at Z.ai’s Anthropic-compatible endpoint, supply your key, and put glm-5.2[1m] — brackets included — in the Opus and Sonnet slots. Pick the wizard for convenience, the script for repeatability, or manual JSON for control. Then open a new terminal, because no config change lands in an old one.

The judgment call sits above the config. Vendor-published numbers put GLM-5.2 near-frontier on single-shot and short-loop coding at a fraction of the cost, and meaningfully behind Opus 4.8 on long-horizon, whole-repo autonomy — the vendor says as much in its own release notes. The winning pattern is not choosing a side; it is a shell function and a routing table: GLM for volume, Claude for sustained judgment.

And because every number in this post is a July 2026 snapshot of a fast-moving market — a Haiku-slot default that already changed once, intro pricing that steps up September 1, promos with expiration dates — build the re-check into your routine. The teams that win this cycle are not the ones that picked the perfect model; they are the ones that made switching cheap.

Cost-aware model routing in production

Pay frontier prices only where frontier capability actually matters.

We design and operate multi-model coding stacks — native Claude for long-horizon agent work, GLM-class models for high-volume tasks — with routing rules, spend guardrails, and evals tuned to your codebase.

Free consultationExpert guidanceTailored solutions
What we work on

Model-routing engagements

  • Claude Code multi-backend setup and hardening
  • Task-routing tables grounded in your own eval runs
  • Token-spend audits across Opus / Sonnet / GLM slots
  • Quota and plan-tier sizing for team rollouts
  • Guardrails for long-horizon agent workloads
FAQ · GLM-5.2 in Claude Code

Setup questions, answered.

Three things: a working Claude Code install, a Z.ai API key (either a GLM Coding Plan key or a pay-as-you-go key), and two mandatory environment variables — ANTHROPIC_AUTH_TOKEN set to your key and ANTHROPIC_BASE_URL set to https://api.z.ai/api/anthropic. The official docs also override the model slots: glm-5.2[1m] for both the Opus and Sonnet slots and glm-4.7 for the Haiku slot, plus three optional variables covering the request timeout, the auto-compact window, and nonessential traffic. Z.ai states it has verified compatibility with Claude Code 2.0.14 and other versions, and recommends running claude update before you start.
Related dispatches

Keep tuning your model stack.