Running Claude Fable 5 as the planner brain (the model that plans and judges while cheaper models execute) is the single highest-leverage config change you can make in Hermes Agent or OpenClaw this month. Fable 5 was restored for all customers on July 1, 2026 after its June 12 export-control suspension, and Anthropic has announced that included plan usage ends July 7, with metered usage credits at standard API rates from July 8.
At $10 per million input tokens and $50 per million output — twice Opus 4.8’s $5/$25 — leaving Fable 5 as the default model for every tool call in a 24/7 agent is the expensive way to run it. The cheap way is architectural: Fable 5 touches only the plan and the review pass, and the execution loop runs on Opus 4.8 or something cheaper still. Both major open-source agent runtimes have a documented config surface for exactly this split.
This playbook covers the two frameworks’ planner/executor architectures, the exact keys in Hermes Agent’s config.yaml and OpenClaw’s openclaw.json, a side-by-side config table nobody else has published, the worked cost math, and the security hardening that has to come before you hand a frontier planner shell access.
- 01 Run Fable 5 as planner and judge only. Fable 5 lists at $10/$50 per million tokens, twice Opus 4.8’s $5/$25. Route execution to cheaper models and the expensive model’s token footprint per task shrinks while the agent keeps running unattended.
- 02 Each framework has one config surface for the split. Hermes Agent: the primary model plus fallback_provider in ~/.hermes/config.yaml. OpenClaw: agents.defaults.subagents.model in ~/.openclaw/openclaw.json, with per-agent and per-call overrides. Both are documented by the vendors.
- 03 The worked math: $4.50 vs $2.25 per task. A 200K-input / 50K-output task costs ≈$4.50 on Fable 5 and ≈$2.25 on Opus 4.8 at list rates. Prompt caching takes 90% off cached input, dropping the input share of that task toward ~$0.20 on cache hits.
- 04 Two fallbacks exist, so keep them straight. Anthropic’s safeguard classifier reroutes sensitive queries to Opus 4.8 automatically, outside your control, on under 5% of sessions. Your config-level fallback_provider or fallbacks[] is a separate, user-controlled mechanism for outages and cost.
- 05 Harden the skill marketplace before granting autonomy. An early-2026 audit of ClawHub found 341 of 2,857 published skills malicious (~12%), and CVE-2026-25253 (CVSS 8.8) was a one-click RCE against OpenClaw. Vet skills and isolate the runtime before the planner gets shell access.
01 — Why Now
The planner-brain pattern, and the July 8 reason to care.
Fable 5 shipped on June 9, 2026 as Anthropic’s flagship for long-horizon agentic work — planning across stages, delegating to sub-agents, running for hours while validating its own output. That job description is precisely why it belongs at the top of an agent stack rather than inside the execution loop: the skills that justify its price are planning, delegation, and judgment, not the ten thousand routine tool calls in between.
The economics turned urgent this week. After the June 12 export-control suspension, Fable 5 came back for all customers on July 1 — selectable again in Claude, the Anthropic API, and partner platforms including AWS Bedrock, Google Cloud Vertex AI, and Microsoft Foundry — with pricing unchanged at $10/$50 per million tokens. Anthropic’s announced schedule gives subscribers an included window through July 7 (up to 50% of weekly plan limits), after which usage meters as credits at standard API rates. The plan-by-plan detail is in our Fable 5 usage-credits pricing guide. From July 8, every token your agent burns on Fable 5 is a token you pay list price for.
The planner-brain pattern is the standing answer. Fable 5 reads the task, writes the plan, and reviews the result; a cheaper model — Opus 4.8 at half the rate, or something smaller — does the fetching, editing, testing, and retrying. The agent’s capability ceiling stays where the frontier model sets it, while the metered bill tracks the cheap model’s rate for the bulk of the tokens.
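The control flow described above can be sketched in a few lines of Python. This is a minimal illustration, not either framework's implementation: `PlannerBrain`, `fake_call`, and the prompt wording are hypothetical names introduced here, and the model call is a stub so the example runs offline. Only the two model IDs come from this guide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical stand-in for a real model call: (model_id, prompt) -> text.
ModelCall = Callable[[str, str], str]


@dataclass
class PlannerBrain:
    call: ModelCall
    planner: str = "claude-fable-5"    # plans and reviews
    executor: str = "claude-opus-4-8"  # does the routine steps
    calls_per_model: Dict[str, int] = field(default_factory=dict)

    def _ask(self, model: str, prompt: str) -> str:
        # Count calls per model so the split is visible afterwards.
        self.calls_per_model[model] = self.calls_per_model.get(model, 0) + 1
        return self.call(model, prompt)

    def run(self, task: str) -> str:
        # 1. The planner writes the plan, one step per line.
        plan = self._ask(self.planner, f"Plan this task, one step per line:\n{task}")
        steps = [s.strip() for s in plan.splitlines() if s.strip()]
        # 2. The executor handles every step.
        results = [self._ask(self.executor, f"Do this step: {s}") for s in steps]
        # 3. The planner reviews the combined result.
        return self._ask(self.planner, "Review these results:\n" + "\n".join(results))


def fake_call(model: str, prompt: str) -> str:
    """Offline stub that returns canned answers instead of calling an API."""
    if prompt.startswith("Plan"):
        return "fetch the file\nedit the file\nrun the tests"
    if prompt.startswith("Review"):
        return "approved"
    return f"done by {model}"


agent = PlannerBrain(call=fake_call)
verdict = agent.run("fix the failing test")
print(verdict, agent.calls_per_model)
```

With the stub above, the planner is called twice (plan and review) and the executor once per step, which is the shape of the split regardless of how many steps a real task needs.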
02 — Architectures
Two frameworks, two shapes of planner and executor.
Hermes Agent and OpenClaw are the two open-source runtimes where this pattern is most commonly wired up, and they model “planner plus executor” differently — which is why the config differs. MindStudio’s comparison frames it as agents built for modularity and composability versus agents built for simplicity and quick deployment. Hermes runs multi-agent orchestration: orchestrator agents spawn or call specialist agents and pass structured results between them. OpenClaw runs a single-agent plan-execute-reflect loop: it generates a task plan, executes steps, and reviews its own results before proceeding.
Hermes Agent
Orchestrator agents spawn specialist agents and pass structured results between them. The planner split is session-level: set the primary model, and use fallback_provider for cost or outage routing. Hermes reached #1 across AI applications on OpenRouter’s global rankings on May 10, 2026.
OpenClaw
A single agent generates a task plan, executes steps, and reviews its own results before proceeding. The planner split is explicit config: model.primary for the brain, agents.defaults.subagents.model for the hands.
Both are large, actively maintained projects, not fringe tools — our July 2 GitHub snapshot showed roughly 207,600 stars on hermes-agent and 381,400 on openclaw (counts move daily; treat them as a snapshot, not a fixed claim). Momentum stories abound: NetworkChuck, a YouTuber with around five million subscribers, announced in May that after a month of use he was moving all of his OpenClaw agents to Hermes. Neither project is an Anthropic product, and Anthropic endorses neither — both are third-party runtimes that support the Anthropic API as one of several providers.
Which shape you want depends on the work. If your tasks decompose into specialist roles — a researcher, a coder, a reviewer — the Hermes orchestration model maps naturally. If your tasks are linear jobs with self-review, OpenClaw’s loop is simpler to reason about, and its subagents.model key gives you the cleanest literal expression of “the model that plans” versus “the model that executes” in any config file we’ve seen.
03 — Hermes Setup
Hermes: config.yaml routing and a memory that compounds.
Hermes Agent installs with a single command — curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash — which provisions uv, Python 3.11, Node.js, ripgrep, and ffmpeg, and stores all state under ~/.hermes/. Hermes works with any OpenAI-compatible endpoint and treats Anthropic as a first-class provider, so wiring Fable 5 in as the brain is a config edit, not a plugin hunt: set the primary model to claude-fable-5 in ~/.hermes/config.yaml.
The cost-aware routing lever is the fallback_provider key in the same file. It serves two jobs: resilience — a local Ollama or vLLM model as an offline fallback if the Anthropic API is unreachable — and cost control, for example Fable 5 primary with Opus 4.8 (claude-opus-4-8) as the fallback. Note what Hermes does not document: a per-subagent model key. The routing surface is session-level, which fits its orchestration architecture — you swap the whole session’s model rather than splitting one session across two models. If you’re coming from OpenClaw, hermes claw migrate imports settings, memories, skills, and API keys in one command.
What makes the planner investment compound is Hermes’s persistence. It keeps a four-layer memory: a curated MEMORY.md for environment facts and a USER.md for preferences, both loaded into the system prompt at session start, plus a SQLite archive with full-text search and a skills directory. And after a complex task — roughly five or more tool calls — Hermes writes a reusable SKILL.md document compatible with the agentskills.io open standard, so the plan Fable 5 derived once doesn’t get re-derived (and re-billed) next time. Automation entry points: /schedule for cron-style jobs, and hermes gateway setup then hermes gateway install for Telegram, Discord, or Slack. For the full walkthrough beyond planner routing, see our complete Hermes Agent setup guide and the latest Hermes desktop release notes.
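Why persistence saves planner tokens is easier to see in code. The sketch below is a toy model of the idea only: an in-memory dict stands in for SKILL.md files, and `SkillStore`, `plan_for`, `record`, and the exact threshold check are hypothetical names and assumptions, not Hermes internals.

```python
from typing import Callable, Dict

# "Roughly five or more tool calls" per the text; the exact trigger is an assumption.
SKILL_THRESHOLD = 5


class SkillStore:
    """Toy stand-in for a skills directory: remembers plans for task kinds."""

    def __init__(self) -> None:
        self._skills: Dict[str, str] = {}
        self.planner_calls = 0

    def plan_for(self, task_kind: str, derive_plan: Callable[[str], str]) -> str:
        # Reuse a stored plan when one exists; otherwise pay for the planner.
        if task_kind in self._skills:
            return self._skills[task_kind]
        self.planner_calls += 1
        return derive_plan(task_kind)

    def record(self, task_kind: str, plan: str, tool_calls: int) -> bool:
        # Only tasks that were complex enough get written down as a skill.
        if tool_calls >= SKILL_THRESHOLD:
            self._skills[task_kind] = plan
            return True
        return False


def expensive_planner(task_kind: str) -> str:
    """Stub for a planner call that would be billed at the frontier rate."""
    return f"plan for {task_kind}"


store = SkillStore()
first = store.plan_for("release", expensive_planner)
store.record("release", first, tool_calls=7)
second = store.plan_for("release", expensive_planner)
print(store.planner_calls)
```

The second request for the same task kind is served from the store, so the planner is billed once rather than twice.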
MEMORY.md → SQLite
MEMORY.md environment facts and USER.md preferences load into the system prompt at session start; a SQLite archive with full-text search and a skills directory sit underneath. All state lives in ~/.hermes/.
Tasks become SKILL.md
After a complex task of roughly five or more tool calls, Hermes writes a reusable SKILL.md compatible with the agentskills.io standard — the planner’s approach is captured instead of re-derived.
OpenRouter, May 10, 2026
Hermes hit #1 across all AI applications on OpenRouter’s global rankings roughly 90 days after its February 2026 launch. Star counts (≈207,600 at our July 2 snapshot) move daily.
"Fable 5 supplies the reasoning. Hermes supplies the loop, memory, tools, and persistence. The combination is a self-improving agent that holds context across days of work and runs on the strongest publicly available coding model."— Lushbinary editorial team, Claude Fable 5 + Hermes Agent Setup Guide
04 — OpenClaw Setup
OpenClaw: subagents.model is the whole trick.
OpenClaw’s config lives at ~/.openclaw/openclaw.json (or ~/.clawdbot/clawdbot.json for installs from the older Clawdbot npm package), edited via openclaw config edit or at the path openclaw doctor prints. The planner/executor split is a first-class config concept: model.primary is the model that plans, and agents.defaults.subagents.model is the global default for the sub-agents that execute. Two override levels sit on top — per-agent at agents.list[].subagents.model, and per-call via the sessions_spawn model parameter — so a single deployment can run Fable 5 planning with Opus 4.8 execution as the default, and still pin one specific agent or one specific spawn to a different model.
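The precedence among those levels can be sketched as a small resolver. Treat this as an illustration under stated assumptions: the nested dict mirrors the dotted key paths named above, but the exact JSON nesting, the `id` field on `agents.list` entries, and the function name `resolve_subagent_model` are our assumptions rather than OpenClaw's documented schema. The final fallback to the primary reflects the inheritance behavior this guide describes.

```python
from typing import Any, Dict, Optional


def resolve_subagent_model(
    config: Dict[str, Any],
    agent_id: Optional[str] = None,
    spawn_model: Optional[str] = None,
) -> str:
    """Pick the executor model: per-call, then per-agent, then global, then primary."""
    # 1. Per-call override (the sessions_spawn model parameter).
    if spawn_model:
        return spawn_model
    agents = config.get("agents", {})
    # 2. Per-agent override (agents.list[].subagents.model).
    for agent in agents.get("list", []):
        if agent_id is not None and agent.get("id") == agent_id:
            model = agent.get("subagents", {}).get("model")
            if model:
                return model
    # 3. Global default (agents.defaults.subagents.model).
    model = agents.get("defaults", {}).get("subagents", {}).get("model")
    if model:
        return model
    # 4. Nothing set: sub-agents inherit the primary model.
    return config["model"]["primary"]


config = {
    "model": {"primary": "claude-fable-5"},
    "agents": {
        "defaults": {"subagents": {"model": "claude-opus-4-8"}},
        "list": [{"id": "reviewer", "subagents": {"model": "claude-fable-5"}}],
    },
}

print(resolve_subagent_model(config))
print(resolve_subagent_model(config, agent_id="reviewer"))
print(resolve_subagent_model(config, agent_id="reviewer", spawn_model="local-small"))
```

Here `reviewer` and `local-small` are placeholder names. The useful property is the last branch: with no sub-agent key set anywhere, the resolver returns the primary, which is the expensive case.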
Two Fable-5-specific behaviors from OpenClaw’s own Anthropic provider docs are worth knowing before you wire it in. First, OpenClaw omits custom temperature values on Fable 5 requests. Second, Fable 5 always uses adaptive thinking and defaults to high effort — /think off and /think minimal are remapped to low effort rather than disabling thinking, because Anthropic doesn’t allow thinking to be fully disabled on this model. Budget for reasoning tokens accordingly: the planner will think, whether or not you asked it to.
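Both behaviors amount to a small request-normalization step, sketched below. This is a hypothetical illustration: the parameter name `effort`, the function names, and the choice to pass other effort levels through unchanged are assumptions made here, not OpenClaw's actual code.

```python
from typing import Any, Dict, Optional

FABLE_5 = "claude-fable-5"
# Levels that cannot disable thinking on Fable 5 and are remapped instead.
REMAPPED_TO_LOW = {"off", "minimal"}


def effective_effort(requested: Optional[str] = None) -> str:
    """Return the effort level a Fable 5 request would actually run at."""
    if requested is None:
        return "high"  # the documented default
    level = requested.strip().lower()
    # Assumption: any other level is passed through unchanged.
    return "low" if level in REMAPPED_TO_LOW else level


def prepare_request(model: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Copy params, dropping temperature and normalizing effort for Fable 5 only."""
    cleaned = dict(params)
    if model == FABLE_5:
        cleaned.pop("temperature", None)
        cleaned["effort"] = effective_effort(cleaned.get("effort"))
    return cleaned


print(prepare_request(FABLE_5, {"temperature": 0.2, "effort": "off"}))
print(prepare_request("claude-opus-4-8", {"temperature": 0.2}))
```

The practical consequence is the one noted above: there is no setting in this sketch that yields zero reasoning tokens on the planner.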
Cost routing extends past sub-agents. A heartbeat.model key routes OpenClaw’s periodic are-you-still-there checks — default every 30 minutes — to a cheap model, and a fallbacks[] array handles outage failover. One sourcing caveat we should be transparent about: the heartbeat and fallbacks key pattern comes from VelvetShark’s multi-model routing guide, published February 2026 — before Fable 5 existed — so take the key shapes from it, not the model names or prices, and substitute claude-fable-5, claude-opus-4-8, and current rates. New to the framework entirely? Start with our full OpenClaw setup and skills walkthrough.
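A quick back-of-envelope calculation shows why the heartbeat is worth routing. In the sketch below, the 30-minute interval and the two rate pairs come from this guide; the tokens per ping (2,000 in, 100 out) are purely illustrative assumptions, and Opus 4.8 stands in for "a cheap model" only because it is the other priced model here.

```python
def heartbeat_cost_per_day(
    input_rate: float,
    output_rate: float,
    interval_minutes: int = 30,
    input_tokens: int = 2_000,  # illustrative assumption, not a measured value
    output_tokens: int = 100,   # illustrative assumption, not a measured value
) -> float:
    """Daily cost in dollars of periodic pings, with rates in $ per million tokens."""
    pings_per_day = (24 * 60) // interval_minutes
    per_ping = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return pings_per_day * per_ping


fable_daily = heartbeat_cost_per_day(10.0, 50.0)
opus_daily = heartbeat_cost_per_day(5.0, 25.0)
print(f"Fable 5: ${fable_daily:.2f}/day, Opus 4.8: ${opus_daily:.2f}/day")
```

Swap in your own measured tokens per ping; the point is that 48 pings a day adds up on whichever model answers them.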
A common pitfall is setting model.primary and assuming sub-agents inherit something sensible. They inherit the primary, which means every executor tool call bills at Fable 5’s $10/$50 rate until agents.defaults.subagents.model says otherwise. Set the split explicitly, then verify with a spot-check of per-model token usage after the first day of running.
05 — The Reference Table
The planner-brain config, side by side.
No single source puts the two frameworks’ planner/executor syntax in one place with Fable 5 as the specific model being routed — which is exactly what you need when you’re deciding where to run the pattern. The table below compiles the Hermes column from the Lushbinary setup guide and the OpenClaw column from OpenClaw’s official provider and sub-agent docs, with the heartbeat/fallbacks key pattern from VelvetShark’s routing guide, all retrieved July 2, 2026.
| Setting | Hermes Agent | OpenClaw |
|---|---|---|
| Model routing | | |
| Config file | ~/.hermes/config.yaml | ~/.openclaw/openclaw.json (legacy installs: ~/.clawdbot/clawdbot.json); edit via openclaw config edit |
| Fable 5 as planner | Set the primary model to claude-fable-5 — Hermes treats Anthropic as a first-class provider | Point model.primary at claude-fable-5 in the Anthropic provider block |
| Cheaper model for execution | No per-subagent model key is documented — routing is session-level, via the primary model plus fallback_provider | agents.defaults.subagents.model globally; per-agent via agents.list[].subagents.model; per-call via the sessions_spawn model parameter |
| Resilience and operations | | |
| Fallback on outage or cost | fallback_provider in config.yaml — e.g. Opus 4.8, or a local Ollama/vLLM model if the Anthropic API is unreachable | fallbacks[] array (config-key pattern; substitute current model IDs) |
| Background pings | No heartbeat concept documented in the sources we reviewed | heartbeat.model routes the periodic check (default every 30 minutes) to a cheap model |
| Scheduling and channels | /schedule for cron-style jobs; hermes gateway setup then hermes gateway install for Telegram/Discord/Slack | Not covered in the sources this guide draws on |
| Migration path | hermes claw migrate imports OpenClaw settings, memories, skills, and API keys in one command | n/a — OpenClaw is the migration source here |
| Security surface | | |
| Skills ecosystem | Self-written SKILL.md documents, compatible with the agentskills.io open standard | ClawHub marketplace — an early-2026 audit found 341 of 2,857 published skills malicious (~12%); vet before installing |
Read the third row twice — it’s the architectural fork. OpenClaw gives you a literal key that separates the planning model from the executing model inside one session. Hermes doesn’t, by design: its answer to “cheaper execution” is orchestration — spawn specialist agents and route between providers at the session boundary — plus fallback_provider for the cost and outage cases. If your primary goal is the tightest possible Fable-5-plans / cheap-model-executes split with minimal moving parts, OpenClaw’s config expresses it more directly. If you want the planner’s work to persist and compound across days, Hermes’s memory and skills layers are the stronger foundation.
06 — Cost Math
What the split actually saves.
Anchor the decision in one worked example, at each model’s list price. A single agentic task consuming 200,000 input tokens and 50,000 output tokens costs about $4.50 on Fable 5 and about $2.25 on Opus 4.8 — the identical task, run twice, at $10/$50 versus $5/$25 per million tokens. Every derived cell below recomputes from those rates.
| Line item | Fable 5 ($10 / $50) | Opus 4.8 ($5 / $25) | How it’s computed |
|---|---|---|---|
| Input · 200K tokens | $2.00 | $1.00 | 0.2M × $10/M vs 0.2M × $5/M |
| Output · 50K tokens | $2.50 | $1.25 | 0.05M × $50/M vs 0.05M × $25/M |
| Task total, list price | $4.50 | $2.25 | Opus 4.8 runs the identical task at half the cost |
| Input on a full cache hit | ≈$0.20 | — | 90% off cached input: 0.2M × $1/M (shown for Fable 5, the metered model) |
| Task total with cached input | ≈$2.70 | — | ≈$0.20 cached input + $2.50 output |
Figure: One agentic task · three ways to pay for it. Source: Anthropic published rates via Lushbinary setup guide, July 2026, recomputed per line.
Two levers stack on top of the routing split. Prompt caching is the first: Anthropic’s discount is 90% off cached input tokens, so in a long session that reuses a large system prompt or codebase, the input portion of that $4.50 task drops toward ~$0.20 on cache hits, or $1 per million cached-input tokens against the $10 list rate. The general technique is covered in our prompt-caching engineering guide. The second lever is the pattern itself: in a planner-brain setup, Fable 5’s share of tokens is the plan and the review pass, not the whole loop, so the $10/$50 rate applies to a small slice of the task while the executor’s cheaper rate covers the bulk. How much smaller that slice is depends entirely on your workload, and we deliberately won’t invent a universal percentage.
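The arithmetic in this section can be recomputed with a short script. It is a sketch under stated assumptions: it models only the 90% discount on cached input (no cache-write cost), and `split_cost` assumes the planner and executor divide the same token totals proportionally. The `planner_share` argument is a value you supply from your own measurements; the 0.25 used below is illustrative, not a claim about any workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    input_per_m: float   # dollars per million input tokens
    output_per_m: float  # dollars per million output tokens


FABLE_5 = Rates(10.0, 50.0)
OPUS_4_8 = Rates(5.0, 25.0)
CACHE_DISCOUNT = 0.90  # 90% off cached input tokens


def task_cost(rates: Rates, input_tokens: float, output_tokens: float,
              cached_fraction: float = 0.0) -> float:
    """Dollar cost of one task; cached_fraction is the share of input served from cache."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    input_cost = (fresh + cached * (1 - CACHE_DISCOUNT)) * rates.input_per_m / 1_000_000
    output_cost = output_tokens * rates.output_per_m / 1_000_000
    return input_cost + output_cost


def split_cost(planner_share: float, input_tokens: float, output_tokens: float) -> float:
    """Cost when planner_share of all tokens run on Fable 5 and the rest on Opus 4.8."""
    planner = task_cost(FABLE_5, input_tokens * planner_share, output_tokens * planner_share)
    executor = task_cost(OPUS_4_8, input_tokens * (1 - planner_share),
                         output_tokens * (1 - planner_share))
    return planner + executor


print(f"Fable 5, list price:      ${task_cost(FABLE_5, 200_000, 50_000):.2f}")
print(f"Opus 4.8, list price:     ${task_cost(OPUS_4_8, 200_000, 50_000):.2f}")
print(f"Fable 5, input cached:    ${task_cost(FABLE_5, 200_000, 50_000, 1.0):.2f}")
print(f"Split, planner_share=.25: ${split_cost(0.25, 200_000, 50_000):.2f}")
```

Because Opus 4.8 is exactly half the rate on both sides, the split cost moves linearly between $2.25 (no planner tokens) and $4.50 (all planner tokens) as the share rises.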
Our forward read: as metering starts July 8, expect planner/executor splits to move from enthusiast trick to default posture in agent deployments, the way multi-tier model routing already did in production API stacks through 2025. The frameworks have made the config trivial; the remaining work is measurement — per-model token accounting per task, so you can see the split paying for itself. This is the kind of cost-and-architecture decision our AI transformation engagements exist to pressure-test before it hits your invoice.
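Per-model token accounting does not need much machinery. The ledger below is a minimal sketch of the measurement step: `TokenLedger` and its methods are hypothetical names, and how you obtain token counts from your runtime is outside its scope. The rates are the list prices quoted in this guide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Dollars per million (input, output) tokens, from the list prices above.
RATES: Dict[str, Tuple[float, float]] = {
    "claude-fable-5": (10.0, 50.0),
    "claude-opus-4-8": (5.0, 25.0),
}


class TokenLedger:
    """Accumulates token usage per (task, model) and prices it at list rates."""

    def __init__(self) -> None:
        self._usage: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])

    def record(self, task_id: str, model: str, input_tokens: int, output_tokens: int) -> None:
        entry = self._usage[(task_id, model)]
        entry[0] += input_tokens
        entry[1] += output_tokens

    def cost_by_model(self, task_id: str) -> Dict[str, float]:
        costs: Dict[str, float] = {}
        for (tid, model), (inp, out) in self._usage.items():
            if tid != task_id:
                continue
            in_rate, out_rate = RATES[model]
            costs[model] = (inp * in_rate + out * out_rate) / 1_000_000
        return costs


ledger = TokenLedger()
ledger.record("task-1", "claude-fable-5", 20_000, 5_000)    # plan + review
ledger.record("task-1", "claude-opus-4-8", 100_000, 25_000)  # execution, first batch
ledger.record("task-1", "claude-opus-4-8", 80_000, 20_000)   # execution, retries
print(ledger.cost_by_model("task-1"))
```

The token counts in the usage example are made up for illustration. A ledger like this is what turns "the split should save money" into a number you can check per task.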
07 — Routing Gotcha
Two fallbacks, not one: don’t conflate them.
There are two distinct mechanisms that can hand your query to a different model, and debugging routing gets miserable if you mix them up. The first is yours: the config-level fallback_provider (Hermes) or fallbacks[] (OpenClaw) — user-controlled, for outages and cost, and it does exactly what you configured. The second is Anthropic’s: a safeguard classifier, applied server-side since the July 1 restoration, that reroutes a query to Opus 4.8 automatically when it lands in cybersecurity, biology, chemistry, or model-distillation territory. As the Lushbinary guide puts it: “For the vast majority of coding, automation, and research work the safeguard fallback never fires. If your agent operates near security or life-sciences topics, expect some responses to come from Opus 4.8, and budget for the fact that you may be paying the Fable 5 rate while receiving an Opus 4.8 answer on those specific turns.”
The classifier is automatic, outside your control, and fires on under 5% of sessions per Anthropic’s relaunch documentation. For most planner-brain deployments it’s a rounding error; for agents that touch security tooling — a pentest triage bot, a dependency CVE analyst — it’s a real line item and a real behavior change to test for. One more restoration footnote for compliance-minded teams: the mandatory 30-day data-retention requirement still applies to Fable 5 traffic — no zero-data-retention exemption for Mythos-class models — and our 30-day retention explainer covers what that means for enterprise agreements. Only the ZDR carve-out changed; this is not “everyone’s data is now retained.”
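One way to keep the two mechanisms apart in logs is to label each turn by where the model changed. The sketch below rests on a loud assumption: that your runtime lets you see both the model you requested and the model that actually answered. We have not verified how either framework or the API surfaces a safeguard reroute, so `classify_turn` and the `Route` labels are hypothetical.

```python
from enum import Enum


class Route(Enum):
    AS_REQUESTED = "as_requested"
    CONFIG_FALLBACK = "config_fallback"    # your fallback_provider / fallbacks[] fired
    UPSTREAM_REROUTE = "upstream_reroute"  # the model changed after the request left


def classify_turn(configured_primary: str, requested_model: str, served_model: str) -> Route:
    """Label one turn by where, if anywhere, the model was swapped."""
    if requested_model != configured_primary:
        # The runtime chose a different model before sending: that is your config.
        return Route.CONFIG_FALLBACK
    if served_model != requested_model:
        # The request named the primary but another model answered.
        return Route.UPSTREAM_REROUTE
    return Route.AS_REQUESTED


primary = "claude-fable-5"
print(classify_turn(primary, "claude-fable-5", "claude-fable-5"))
print(classify_turn(primary, "claude-opus-4-8", "claude-opus-4-8"))
print(classify_turn(primary, "claude-fable-5", "claude-opus-4-8"))
```

Counting turns per label over a day of traffic gives you an observed reroute rate to compare against the under-5% figure, and tells you which mechanism to look at when a response seems to come from the wrong model.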
08 — Security
Harden before you grant autonomy.
Here’s the connection the setup guides don’t make: wiring Fable 5 in as the planner means giving a frontier model shell access and letting it run unattended — on a runtime whose skill ecosystem was, months ago, measurably compromised. In early 2026 an independent audit of all 2,857 skills published on ClawHub, OpenClaw’s marketplace, found 341 confirmed malicious — roughly 12% — with about 335 traced to a single coordinated campaign tracked as ClawHavoc. Separately, CVE-2026-25253, rated CVSS 8.8, was a one-click remote-code-execution chain exploitable even against localhost-bound OpenClaw instances (patched in v2026.1.29), and scanning teams including Censys, Bitsight, and Hunt.io identified 30,000+ internet-exposed OpenClaw instances, many with no authentication.
The vendor response was real: on March 27, 2026, OpenClawd (the managed-hosting vendor) shipped automated skill vetting — static analysis plus behavioral testing before a skill activates — verified installer sourcing, and runtime sandboxing that auto-blocks skills flagged for network exfiltration, prompt injection, or credential exposure. Treat that as the floor, not the ceiling. Before your planner gets autonomy: run the latest patched version, don’t expose the instance to the internet, vet every skill as if the audit numbers were current, and isolate the runtime from your real credentials. The step-by-step version is our OpenClaw hardening guide — read it before the first unattended run, not after. To be fair to both sides of the table: these incidents are OpenClaw/ClawHub-specific; Hermes has its own agentskills.io skill ecosystem, which was not documented as compromised in the sources we reviewed.
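Part of that checklist can be automated as a preflight gate before an unattended run. The sketch below is illustrative only: the patched version number comes from the CVE note above, but `preflight`, its arguments, and the dotted-integer version format are assumptions, and you would have to supply the three values from your own deployment. It does not cover skill vetting or credential isolation.

```python
import ipaddress
from typing import List, Tuple

# First release carrying the CVE-2026-25253 fix, per the text above.
PATCHED_VERSION = (2026, 1, 29)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse 'v2026.1.29' or '2026.1.29' (assumed format) into a comparable tuple."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def is_loopback(host: str) -> bool:
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # unrecognized hostname: treat as exposed


def preflight(version: str, bind_host: str, auth_enabled: bool) -> List[str]:
    """Return a list of problems; an empty list means these three checks passed."""
    problems: List[str] = []
    if parse_version(version) < PATCHED_VERSION:
        problems.append("version predates the CVE-2026-25253 patch")
    if not is_loopback(bind_host):
        problems.append("instance is bound to a non-loopback address")
    if not auth_enabled:
        problems.append("authentication is disabled")
    return problems


print(preflight("v2026.1.29", "127.0.0.1", auth_enabled=True))
print(preflight("2026.1.28", "0.0.0.0", auth_enabled=False))
```

Note that the version check matters even for loopback-only instances, since the CVE was exploitable against localhost-bound deployments.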
"By default the agent runs as your user with access to your home directory, SSH keys, and any cloud credentials on the box. Prompt injection from a fetched web page or a file in the repo can turn a benign task into `rm -rf` or a key exfiltration attempt. Isolation is not optional for an autonomous agent."— Lushbinary editorial team, Claude Fable 5 + Hermes Agent Setup Guide
09 — Conclusion
One config key between you and a halved agent bill.
Put the expensive model where judgment lives, and meter everything else.
The pattern is small enough to ship this afternoon. In Hermes: primary model claude-fable-5, cost and outage routing via fallback_provider, and let the memory and skills layers turn each expensive plan into a reusable asset. In OpenClaw: model.primary for the brain, agents.defaults.subagents.model for the hands, and heartbeat.model so background pings never touch the metered model.
The math holds at any scale that matters: the same task at $4.50 on Fable 5 or $2.25 on Opus 4.8, with caching pulling the cached input share down by 90% — and the planner split shrinking the expensive model’s footprint to the plan and the review pass. With included usage ending July 7 and metering starting July 8, the teams that set this split now will barely notice the cliff; everyone else finds out from an invoice.
And the order of operations is non-negotiable: harden first, autonomy second. A marketplace that measured ~12% malicious, a CVSS 8.8 one-click RCE, and 30,000+ exposed instances are not reasons to skip the pattern — they’re reasons to treat isolation and skill vetting as step zero of it. The planner brain is only as trustworthy as the body you give it.