To run GLM-5.2 inside Claude Code, you need exactly two environment variables — an auth token and a base URL — plus one detail most setup guides get wrong: the documented model ID is glm-5.2[1m], brackets included, not bare glm-5.2. Drop the suffix and everything still appears to work, but your sessions quietly run against a smaller default context instead of the model’s full 1M-token window.
The reason this setup is worth thirty minutes of your time is the economics. GLM-5.2, announced June 13, 2026 with MIT-licensed open weights following on June 16, lands near-frontier on many single-shot coding benchmarks at a fraction of Claude’s API cost — while still trailing Claude Opus 4.8 on sustained long-horizon agent work, by the vendor’s own published numbers. That combination makes it a strong second engine inside the harness you already use, not a wholesale replacement.
This guide covers how the redirect actually works, all three official setup paths, the exact config block from Z.ai’s docs, the troubleshooting steps that fix nearly every failed install, an evidence-based task-routing table, and the honest cost math — by slot, not just the flattering Opus comparison.
- 01Two env vars flip the backend.ANTHROPIC_AUTH_TOKEN carries your Z.ai key; ANTHROPIC_BASE_URL points at https://api.z.ai/api/anthropic. Three official setup paths write them for you: an npx wizard, a shell script, or manual settings.json.
- 02The model ID is glm-5.2[1m], not glm-5.2.Z.ai’s docs put the bracketed value in both the Opus and Sonnet slots — the suffix unlocks the full 1M-token context. Dropping it silently shrinks your window; it is a top-two reported setup mistake.
- 03A new terminal window is required.Config edits do not reach an already-open session. Close every Claude Code window and launch fresh — the single most common reason a correct config appears not to work.
- 04Route by task, not by headline.Vendor-published scores show near-parity on single-shot work (Terminal-Bench 81.0 vs 85.0) but real gaps on long-horizon tasks (NL2Repo 48.9 vs 69.7; SWE-Marathon 13.0 vs 26.0). Keep multi-step agent runs on Claude.
- 05Savings depend on which slot you swap.Against Opus 4.8 list rates, GLM-5.2 is ~3.6× cheaper on input and ~5.7× on output. Against Sonnet 5’s intro pricing the gap is only ~1.4× / ~2.3× — most posts quote the flashier number.
01 — How It WorksOne base URL redirects the whole session.
Claude Code reads a handful of environment variables at launch. Set ANTHROPIC_BASE_URL to https://api.z.ai/api/anthropic and every request the harness makes — plans, edits, tool calls — routes to Z.ai’s Anthropic-compatible endpoint instead of Anthropic’s API. ANTHROPIC_AUTH_TOKEN carries your Z.ai key. Three more variables remap the model slots Claude Code exposes: the Opus and Sonnet slots to glm-5.2[1m], and the lightweight Haiku slot to glm-4.7, per the official Claude Code setup page.
Be clear about what this is: a whole-session swap, not per-prompt routing. Once the base URL points at Z.ai, there is no native way to send one prompt to GLM-5.2 and the next to a Claude model inside the same running session. The pattern several independent guides converge on is per-task instead — keep your native Claude setup untouched as the default, and add a small shell function that launches a GLM-backed session on demand:
# ~/.zshrc — native Claude stays default; launch GLM on demand
glm() {
ANTHROPIC_AUTH_TOKEN="your-zai-api-key" \
ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic" \
ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.2[1m]" \
ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5.2[1m]" \
ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.7" \
claude "$@"
}One more distinction before you generate a key. A GLM Coding Plan key bills as a flat monthly fee with prompt-based quota and is restricted to officially supported coding tools. A standard pay-as-you-go key works with any application and bills per token at list rates. Both plug into the same ANTHROPIC_AUTH_TOKEN slot — the difference is which kind of key you generate on Z.ai’s side, not a different Claude Code config. If you plan to run GLM-5.2 as an executor under a Claude orchestrator, our guide to pairing Fable 5 as orchestrator with GLM-5.2 as executor builds on exactly this per-task launch pattern.
02 — Three Ways InWizard, script, or manual — a real decision.
Most walkthroughs show exactly one method. Z.ai documents three, and they suit genuinely different situations — this is a decision, not a list of synonyms.
npx wizard
Interactive eight-step wizard: pick a language, choose the Global or China plan, enter your key, select tools, auto-install anything missing, and optionally wire MCP services. Manages Claude Code, OpenCode, Crush, and Factory Droid from one config at ~/.chelper/config.yaml.
Shell script
One command downloads and runs Z.ai’s script, which edits ~/.claude/settings.json for you. macOS and Linux only — no Windows support. The natural choice when you are scripting a repeatable machine setup.
Manual settings.json
Edit the file yourself with the exact env block in Section 03. Full control, works everywhere including Windows, easiest to audit and to undo — and easiest to break with a stray comma. Validate the JSON before relaunching.
Paths A and B in full:
# Path A — interactive wizard
npx @z_ai/coding-helper
# Path B — automated script (macOS / Linux only)
curl -O "https://cdn.bigmodel.cn/install/claude_code_zai_env.sh"
bash ./claude_code_zai_env.shThe wizard’s npm package describes itself as “A CLI helper for GLM Coding Plan Users to manage coding tools like claude-code.” It sits at v0.0.7 across 37 published versions and predates GLM-5.2 itself — it is a general-purpose Coding Plan manager, not a model-specific installer. Beyond the wizard, the CLI carries a few commands worth knowing: chelper auth for interactive key entry, chelper auth reload claude to push updated plan info into your Claude Code config without re-running the whole flow, chelper auth revoke to delete a saved key, and chelper doctor for a health check.
03 — The Exact ConfigThe env block, with the [1m] suffix intact.
This is the manual-configuration block as documented on Z.ai’s Claude Code page, retrieved July 3, 2026. Copy the model strings exactly — quoted, brackets included:
// ~/.claude/settings.json
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "your-zai-api-key",
"ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.2[1m]",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.2[1m]",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.7"
}
}The same docs page recommends three optional variables. The long timeout matters because GLM’s serving can be slower at peak; the auto-compact window should match the 1M context so Claude Code’s compaction doesn’t trigger prematurely; the third trims telemetry and other nonessential requests:
// Optional — recommended on the same docs page
"API_TIMEOUT_MS": "3000000",
"CLAUDE_CODE_AUTO_COMPACT_WINDOW": "1000000",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"glm-5.2 in the model slots. The officially documented identifier is glm-5.2[1m] — the bracket suffix is what unlocks the full 1M-token context, and community troubleshooting threads rank the missing suffix among the two most common setup mistakes. Note also that the Haiku-slot default has already changed once: launch-week guides showed glm-4.5-air; the current documented value is glm-4.7. Treat any config snippet older than a few weeks as suspect and re-check the live docs.04 — First RunLaunch, confirm, and the four fixes that solve almost everything.
Whichever path you took, the next step is the same and it is not optional: close every Claude Code window and open a new terminal. The docs state plainly that a new window is required for the changes to take effect — environment variables load at launch, so an already-open session never sees your edits. Then run claude. On first contact with the non-Anthropic base URL, Claude Code asks whether you want to use this API key — select Yes, exactly as the docs instruct.
Version hygiene matters more than it looks. Z.ai’s docs state: “We have verified compatibility with Claude Code 2.0.14 and other versions.” Check yours with claude --version and run claude update if it is stale — an outdated binary can surface a model-doesn’t-exist error that looks like a GLM problem but isn’t. If the config still doesn’t take, Z.ai’s own FAQ escalation is: open a fresh terminal again; if that fails, delete ~/.claude/settings.json entirely and reconfigure — Claude Code regenerates the file automatically — and check your JSON for missing or extra commas. An independent walkthrough published June 30, 2026 on the AI Maker newsletter confirms the same sequence works end to end, two weeks after launch.
Dropping the [1m] suffix
Everything connects, but sessions run on a smaller default context and compaction kicks in far earlier than expected. Copy the exact documented string into both the Opus and Sonnet slots.
Reusing an open terminal
Config edits never reach an already-running session. Close all Claude Code windows, open a fresh terminal, and run claude again — the docs call this step out explicitly.
A stale Claude Code binary
Outdated installs can throw a model-doesn’t-exist error that reads like a Z.ai outage. Z.ai says it has verified compatibility with Claude Code 2.0.14 and other versions — update first, debug second.
Mixing up the two endpoints
The Anthropic-compatible base URL is for Claude Code. The OpenAI-compatible /api/paas/v4/chat/completions endpoint is for direct API integrations. They are different base URLs for different modes — swapping them breaks the connection.
05 — Honest RoutingNear-frontier single-shot, real gaps long-horizon.
The setup is the easy half; the routing decision is where teams actually win or lose money. The honest frame, supported by the vendor’s own published numbers: GLM-5.2 lands near-frontier on many single-shot coding benchmarks at a fraction of the cost, and trails Opus 4.8 on sustained long-horizon agent work. The chart below is drawn from Z.ai’s release blog on Hugging Face — vendor-stated scores, cited as such.
GLM-5.2 vs Claude Opus 4.8 · higher is better
Source: Z.ai release blog, Hugging Face, June 2026 — vendor-published scoresThe pattern in those five rows is not random — it is the trend that should shape your routing. On short-feedback-loop tasks, where the model proposes and a human or test suite verifies within minutes, open-weight models have effectively converged with frontier. Where the gap survives is compounding autonomy: every extra unsupervised step multiplies small quality differences, which is why the deficit grows from four points on Terminal-Bench to a factor of two on SWE-Marathon. The moat has moved from single answers to sustained judgment. For the deeper score-by-score analysis, see the full benchmark breakdown.
| Task type | Route to | Evidence (GLM vs Opus) | Why |
|---|---|---|---|
| Route to GLM-5.2 — single-shot, iterative, high-volume | |||
| Single-shot code generation | GLM-5.2 | Terminal-Bench 2.1: 81.0 vs 85.0 | Near-parity on terminal-style tasks at a fraction of the token cost. |
| Iterative refactors in a known codebase | GLM-5.2 | FrontierSWE: 74.4 vs 75.1 | Sub-point gap, and a tight review loop catches misses cheaply. |
| High-volume test and docs generation | GLM-5.2 | Output rate: $4.40 vs $25 per Mtok | Volume work where output tokens dominate the bill, not capability. |
| Keep on Claude — long-horizon, multi-step | |||
| Multi-file feature builds | Opus 4.8 / Fable 5 | SWE-bench Pro: 62.1 vs 69.2 | A seven-point gap compounds across dependent edits. |
| Whole-repo builds from a spec | Opus 4.8 / Fable 5 | NL2Repo: 48.9 vs 69.7 | The largest gap anywhere in the vendor’s own published table. |
| Long autonomous agent runs | Opus 4.8 / Fable 5 | SWE-Marathon: 13.0 vs 26.0 | Half the score on the benchmark built for exactly this workload. |
| Production deploys and high-stakes changes | Opus 4.8 / Fable 5 | Judgment call — no single benchmark | When rework costs more than tokens, capability wins the trade. |
Community enthusiasm for the swap is real, and worth hearing in the original voice — with the caveat that it describes exactly the short-loop work the table routes to GLM:
"you can pipe GLM 5.2 straight into Claude Code (yes, the same terminal you already use for Opus) and get a coding agent that performs between Opus 4.7 and 4.8 capabilities—for pennies."— Gencay (LearnAIWithMe), guest post on the AI Maker newsletter, June 30, 2026
Treat that as one practitioner’s read on short-loop tasks, not a blanket equivalence — the long-horizon rows above are where the characterization stops holding. If you want this thinking as a general discipline rather than a one-model decision, a broader model-routing framework covers scoring, fallbacks, and evals across providers — and our AI transformation engagements typically start by building exactly this routing layer against a client’s own repos.
06 — Cost MathWhat the swap saves, by slot.
Every comparison post quotes the Opus number, because it is the flattering one. But if your daily driver inside Claude Code is the Sonnet slot — and for most developers it is — the honest multiple is much smaller. GLM-5.2’s list pricing is $1.40 per million input tokens and $4.40 per million output tokens, with cached input at $0.26 (the pricing page currently marks cached-token storage as limited-time free — vendor-stated). Here is what that means per slot:
| Slot you swap | Native $/Mtok (in / out) | GLM-5.2 $/Mtok (in / out) | Input multiple | Output multiple |
|---|---|---|---|---|
| Opus slot (Opus 4.8) | $5.00 / $25.00 | $1.40 / $4.40 | ~3.6× cheaper | ~5.7× cheaper |
| Sonnet slot (Sonnet 5, intro pricing through Aug 31, 2026) | $2.00 / $10.00 | $1.40 / $4.40 | ~1.4× cheaper | ~2.3× cheaper |
| Sonnet slot (from Sep 1, 2026) | $3.00 / $15.00 | $1.40 / $4.40 | ~2.1× cheaper | ~3.4× cheaper |
Two readings follow. First, if you mostly run the Sonnet slot, the savings today are modest — roughly 1.4× on input and 2.3× on output during Sonnet 5’s intro window — and a Coding Plan subscription may beat pay-as-you-go entirely for steady daily use. Second, the picture shifts on September 1, 2026, when Sonnet 5’s announced list rates step up to $3 / $15 and the same swap widens to roughly 2.1× and 3.4×. Prices on both sides of this table have already moved once this year; treat the multiples as a July 2026 snapshot and re-run the division before making a platform bet. If you’d rather buy GLM-5.2 tokens somewhere other than Z.ai first-party, our companion guide compares where to buy GLM-5.2 API access across hosts.
07 — Plans & QuotasThe Coding Plan tiers, and the fine print that matters.
For sustained daily coding, most people will run this through a GLM Coding Plan rather than per-token billing. Three tiers, at list pricing:
Lite — list price
Roughly 80 prompts per rolling five-hour window and about 400 per week, per Z.ai’s published approximations. Enough for evenings-and-weekends work and for testing whether the routing split suits you.
Pro — list price
Around 400 prompts per five-hour window and about 2,000 per week (approximate, vendor-published). The realistic tier for a full-time developer running GLM as the high-volume engine.
Max — list price
Approximately 1,600 prompts per window and about 8,000 weekly. Sized for heavy agentic use or small teams sharing a routing setup across parallel sessions.
The quota mechanics reward timing. Per the devpack overview, GLM-5.2 prompts draw down quota at a 3× multiplier during peak hours — defined as 14:00–18:00 China Standard Time (UTC+8), which is 06:00–10:00 UTC, early morning in Europe and overnight on the US East Coast — and 2× off-peak, with Z.ai advertising a temporary 1× off-peak promotion through the end of September 2026 (vendor-stated). Western working hours mostly fall off-peak, which quietly stretches every tier. Z.ai’s subscribe page also lists billing-cycle discounts — deeper for quarterly and yearly commitments, with a yearly Lite working out near $12.60 a month — and some Z.ai surfaces display already-discounted monthly prices, so treat $18 / $72 / $160 as the list anchor. For a full usage-pattern breakdown of whether the $18 plan is worth it for your usage pattern, we ran the numbers separately.
If the tier math fits your workload, you can subscribe through our GLM Coding Plan link — per Z.ai’s campaign rules, the discount applies to a new, never-paid account’s first subscription order only, doesn’t stack with other first-order campaigns, and requires completing payment within 72 hours of clicking through.
Referral link: we earn Z.ai platform credits if you subscribe, and new Z.ai accounts get 10% off their first subscription order.
08 — ConclusionA second engine, not a replacement.
Two env vars, one exact model ID, and honest routing rules.
The mechanics are genuinely simple: point ANTHROPIC_BASE_URL at Z.ai’s Anthropic-compatible endpoint, supply your key, and put glm-5.2[1m] — brackets included — in the Opus and Sonnet slots. Pick the wizard for convenience, the script for repeatability, or manual JSON for control. Then open a new terminal, because no config change lands in an old one.
The judgment call sits above the config. Vendor-published numbers put GLM-5.2 near-frontier on single-shot and short-loop coding at a fraction of the cost, and meaningfully behind Opus 4.8 on long-horizon, whole-repo autonomy — the vendor says as much in its own release notes. The winning pattern is not choosing a side; it is a shell function and a routing table: GLM for volume, Claude for sustained judgment.
And because every number in this post is a July 2026 snapshot of a fast-moving market — a Haiku-slot default that already changed once, intro pricing that steps up September 1, promos with expiration dates — build the re-check into your routine. The teams that win this cycle are not the ones that picked the perfect model; they are the ones that made switching cheap.