AI Development · Playbook · 14 min read · Published July 2, 2026

Fable 5 plans and judges · cheaper models execute · one config surface per framework

Fable 5 as the Planner Brain in Hermes and OpenClaw

Claude Fable 5 is back for all customers as of July 1 — and it meters at $10/$50 per million tokens once the included window closes. The cost-sane setup is to run it as the planner and judge only, and route execution to Opus 4.8 or cheaper. Here is the exact config in Hermes Agent and OpenClaw, the worked cost math, and the marketplace hardening you owe yourself first.

Digital Applied Team
Senior strategists · Published July 2, 2026
Sources: Vendor docs · Unit 42 · Reuters
  • Fable 5 list price: $10/$50 per Mtok (input / output), 2× Opus 4.8
  • Worked task (200K in / 50K out): $4.50, vs $2.25 on Opus 4.8 (−50% when routed)
  • ClawHub audit: 341 of 2,857 skills found malicious (~12%)
  • Safeguard reroutes: <5% of sessions, Anthropic-side

Running Claude Fable 5 as the planner brain (the model that plans and judges while cheaper models execute) is the single highest-leverage config change you can make in Hermes Agent or OpenClaw this month. Fable 5 was restored for all customers on July 1, 2026, after its June 12 export-control suspension, and Anthropic has announced that included plan usage ends July 7, with metered usage credits at standard API rates from July 8.

At $10 per million input tokens and $50 per million output — twice Opus 4.8’s $5/$25 — leaving Fable 5 as the default model for every tool call in a 24/7 agent is the expensive way to run it. The cheap way is architectural: Fable 5 touches only the plan and the review pass, and the execution loop runs on Opus 4.8 or something cheaper still. Both major open-source agent runtimes have a documented config surface for exactly this split.

This playbook covers the two frameworks’ planner/executor architectures, the exact keys in Hermes Agent’s config.yaml and OpenClaw’s openclaw.json, a side-by-side config table nobody else has published, the worked cost math, and the security hardening that has to come before you hand a frontier planner shell access.

Key takeaways
  1. Run Fable 5 as planner and judge only. Fable 5 lists at $10/$50 per million tokens, twice Opus 4.8’s $5/$25. Route execution to cheaper models and the expensive model’s token footprint per task shrinks while the agent keeps running unattended.
  2. Each framework has one config surface for the split. Hermes Agent: the primary model plus fallback_provider in ~/.hermes/config.yaml. OpenClaw: agents.defaults.subagents.model in ~/.openclaw/openclaw.json, with per-agent and per-call overrides. Both are documented by the vendors.
  3. The worked math: $4.50 vs $2.25 per task. A 200K-input / 50K-output task costs ≈$4.50 on Fable 5 and ≈$2.25 on Opus 4.8 at list rates. Prompt caching takes 90% off cached input, dropping the Fable 5 input share of that task from $2.00 toward ~$0.20 on cache hits.
  4. Two fallbacks exist; keep them straight. Anthropic’s safeguard classifier reroutes sensitive queries to Opus 4.8 automatically, outside your control, on under 5% of sessions. Your config-level fallback_provider or fallbacks[] is a separate, user-controlled mechanism for outages and cost.
  5. Harden the skill marketplace before granting autonomy. An early-2026 audit of ClawHub found 341 of 2,857 published skills malicious (~12%), and CVE-2026-25253 (CVSS 8.8) was a one-click RCE against OpenClaw. Vet skills and isolate the runtime before the planner gets shell access.

01 · Why Now · The planner-brain pattern, and the July 8 reason to care.

Fable 5 shipped on June 9, 2026 as Anthropic’s flagship for long-horizon agentic work — planning across stages, delegating to sub-agents, running for hours while validating its own output. That job description is precisely why it belongs at the top of an agent stack rather than inside the execution loop: the skills that justify its price are planning, delegation, and judgment, not the ten thousand routine tool calls in between.

The economics turned urgent this week. After the June 12 export-control suspension, Fable 5 came back for all customers on July 1 — selectable again in Claude, the Anthropic API, and partner platforms including AWS Bedrock, Google Cloud Vertex AI, and Microsoft Foundry — with pricing unchanged at $10/$50 per million tokens. Anthropic’s announced schedule gives subscribers an included window through July 7 (up to 50% of weekly plan limits), after which usage meters as credits at standard API rates. The plan-by-plan detail is in our Fable 5 usage-credits pricing guide. From July 8, every token your agent burns on Fable 5 is a token you pay list price for.

The planner-brain pattern is the standing answer. Fable 5 reads the task, writes the plan, and reviews the result; a cheaper model — Opus 4.8 at half the rate, or something smaller — does the fetching, editing, testing, and retrying. The agent’s capability ceiling stays where the frontier model sets it, while the metered bill tracks the cheap model’s rate for the bulk of the tokens.
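The division of labor above can be sketched as a short loop in which only the plan and review steps reach the planner model. This is a minimal illustration under stated assumptions, not Hermes or OpenClaw code: `run_task`, the `call_model` callable, and the `fake_model` stub are hypothetical stand-ins for a real provider client, and the model IDs are the ones this article uses.

```python
from typing import Callable

PLANNER_MODEL = "claude-fable-5"    # expensive: plan + review only
EXECUTOR_MODEL = "claude-opus-4-8"  # cheaper: every execution step

# A model call is any function taking (model_id, prompt) and returning text.
ModelCall = Callable[[str, str], str]


def run_task(task: str, call_model: ModelCall) -> tuple[str, list[str]]:
    """Plan with the planner, execute each step with the executor, review with the planner.

    Returns the review verdict and the model IDs used, in call order,
    so the split can be audited afterwards.
    """
    used: list[str] = []

    def call(model: str, prompt: str) -> str:
        used.append(model)
        return call_model(model, prompt)

    # 1. Planner writes the plan, one step per non-empty line.
    plan = call(PLANNER_MODEL, f"Plan the task as numbered steps:\n{task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Executor runs every step; this is where most tokens are spent.
    results = [call(EXECUTOR_MODEL, f"Do this step:\n{step}") for step in steps]

    # 3. Planner reviews the combined results once.
    verdict = call(PLANNER_MODEL, "Review these results:\n" + "\n".join(results))
    return verdict, used


def fake_model(model: str, prompt: str) -> str:
    """Hypothetical stub standing in for a real API client."""
    if prompt.startswith("Plan"):
        return "1. fetch data\n2. edit file\n3. run tests"
    if prompt.startswith("Review"):
        return "approved"
    return f"done by {model}"


verdict, used = run_task("fix the failing test", fake_model)
print(verdict, used)
```

However many steps the plan contains, the planner sees exactly two calls; extra steps add only executor calls, which is where the savings come from.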

The announced schedule
Fable 5 inclusion in paid plans runs through July 7, 2026. Metered usage credits at standard API rates begin July 8, per Anthropic’s redeployment announcements. How credits convert beyond “standard API rates” has not been published in detail — treat any specific conversion figure you see elsewhere as unconfirmed. What is confirmed: $10/$50 per million tokens, unchanged through the restoration.

02 · Architectures · Two frameworks, two shapes of planner and executor.

Hermes Agent and OpenClaw are the two open-source runtimes where this pattern is most commonly wired up, and they model “planner plus executor” differently — which is why the config differs. MindStudio’s comparison frames it as agents built for modularity and composability versus agents built for simplicity and quick deployment. Hermes runs multi-agent orchestration: orchestrator agents spawn or call specialist agents and pass structured results between them. OpenClaw runs a single-agent plan-execute-reflect loop: it generates a task plan, executes steps, and reviews its own results before proceeding.
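For contrast with a single loop, the orchestration shape can be sketched as an orchestrator that passes a structured result from one specialist to the next. Everything here is hypothetical: the three roles, the `Result` dataclass, and the hard-coded sequence are illustrations, not Hermes's API. Each specialist is tagged with the model its own session would run on, on the assumption that models are chosen at the session boundary rather than per call.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """Structured output one specialist hands to the next (illustrative shape)."""
    role: str
    model: str
    content: str
    notes: list[str] = field(default_factory=list)


def researcher(task: str) -> Result:
    return Result("researcher", "claude-opus-4-8", f"findings for: {task}")


def coder(prev: Result) -> Result:
    return Result("coder", "claude-opus-4-8", f"patch based on {prev.content!r}")


def reviewer(prev: Result) -> Result:
    # The reviewer is the judgment step, so it is tagged with the planner model.
    ok = prev.content.startswith("patch")
    return Result("reviewer", "claude-fable-5", "approved" if ok else "rejected",
                  notes=[f"reviewed {prev.role} output"])


def orchestrate(task: str) -> list[Result]:
    """Run specialists in sequence, passing each one the previous Result."""
    trail = [researcher(task)]
    trail.append(coder(trail[-1]))
    trail.append(reviewer(trail[-1]))
    return trail


for r in orchestrate("add retry to the fetch helper"):
    print(r.role, r.model, r.content)
```

In a real deployment, the hard-coded sequence inside `orchestrate` is the part a planner model would decide.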

Multi-agent orchestration
Hermes Agent
Python · MIT license · NousResearch

Orchestrator agents spawn specialist agents and pass structured results between them. The planner split is session-level: set the primary model, and use fallback_provider for cost or outage routing. Reached #1 across AI applications on OpenRouter’s global rankings on May 10, 2026.

github.com/nousresearch/hermes-agent
Plan-execute-reflect loop
OpenClaw
TypeScript · created Nov 24, 2025

A single agent generates a task plan, executes steps, and reviews its own results before proceeding. The planner split is explicit config: model.primary for the brain, agents.defaults.subagents.model for the hands.

github.com/openclaw/openclaw

Both are large, actively maintained projects, not fringe tools — our July 2 GitHub snapshot showed roughly 207,600 stars on hermes-agent and 381,400 on openclaw (counts move daily; treat them as a snapshot, not a fixed claim). Momentum stories abound: NetworkChuck, a YouTuber with around five million subscribers, announced in May that after a month of use he was moving all of his OpenClaw agents to Hermes. Neither project is an Anthropic product, and Anthropic endorses neither — both are third-party runtimes that support the Anthropic API as one of several providers.

Which shape you want depends on the work. If your tasks decompose into specialist roles — a researcher, a coder, a reviewer — the Hermes orchestration model maps naturally. If your tasks are linear jobs with self-review, OpenClaw’s loop is simpler to reason about, and its subagents.model key gives you the cleanest literal expression of “the model that plans” versus “the model that executes” in any config file we’ve seen.

03 · Hermes Setup · Hermes: config.yaml routing and a memory that compounds.

Hermes Agent installs with a single command — curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash — which provisions uv, Python 3.11, Node.js, ripgrep, and ffmpeg, and stores all state under ~/.hermes/. Hermes works with any OpenAI-compatible endpoint and treats Anthropic as a first-class provider, so wiring Fable 5 in as the brain is a config edit, not a plugin hunt: set the primary model to claude-fable-5 in ~/.hermes/config.yaml.

The cost-aware routing lever is the fallback_provider key in the same file. It serves two jobs: resilience — a local Ollama or vLLM model as an offline fallback if the Anthropic API is unreachable — and cost control, for example Fable 5 primary with Opus 4.8 (claude-opus-4-8) as the fallback. Note what Hermes does not document: a per-subagent model key. The routing surface is session-level, which fits its orchestration architecture — you swap the whole session’s model rather than splitting one session across two models. If you’re coming from OpenClaw, hermes claw migrate imports settings, memories, skills, and API keys in one command.
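Session-level fallback, as described here, means the whole session moves to the fallback model once the primary fails, rather than splitting individual calls between models. A minimal sketch of that behavior, assuming a provider call that raises `ConnectionError` on an outage; `Session`, `send`, `flaky_provider`, and the `ollama/llama3` model ID are hypothetical and are not Hermes internals.

```python
from typing import Callable

Provider = Callable[[str, str], str]  # (model_id, prompt) -> text


class Session:
    """Tracks one active model for a whole session (illustrative, not Hermes code)."""

    def __init__(self, primary: str, fallback: str, provider: Provider):
        self.primary = primary
        self.fallback = fallback
        self.active = primary
        self.provider = provider

    def send(self, prompt: str) -> str:
        try:
            return self.provider(self.active, prompt)
        except ConnectionError:
            if self.active == self.fallback:
                raise  # the fallback is down too; surface the error
            # Switch the whole session, not just this one call.
            self.active = self.fallback
            return self.provider(self.active, prompt)


def flaky_provider(model: str, prompt: str) -> str:
    """Stub: the Anthropic primary is 'unreachable', the local fallback answers."""
    if model == "claude-fable-5":
        raise ConnectionError("Anthropic API unreachable")
    return f"{model}: ok"


s = Session("claude-fable-5", "ollama/llama3", flaky_provider)
print(s.send("hello"))  # ollama/llama3: ok
print(s.active)         # ollama/llama3
```

The same shape covers the cost case: make the fallback `claude-opus-4-8` and switch on a budget signal instead of an exception.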

What makes the planner investment compound is Hermes’s persistence. It keeps a four-layer memory: a curated MEMORY.md for environment facts and a USER.md for preferences, both loaded into the system prompt at session start, plus a SQLite archive with full-text search and a skills directory. And after a complex task — roughly five or more tool calls — Hermes writes a reusable SKILL.md document compatible with the agentskills.io open standard, so the plan Fable 5 derived once doesn’t get re-derived (and re-billed) next time. Automation entry points: /schedule for cron-style jobs, and hermes gateway setup then hermes gateway install for Telegram, Discord, or Slack. For the full walkthrough beyond planner routing, see our complete Hermes Agent setup guide and the latest Hermes desktop release notes.
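The skill-capture rule can be sketched as a threshold check plus a file write. This assumes the agentskills.io layout of a `SKILL.md` with `name` and `description` in YAML front matter; the function name, the exact threshold comparison, and the numbered-step body are illustrative choices, not Hermes's implementation.

```python
import tempfile
from pathlib import Path
from typing import Optional

SKILL_THRESHOLD = 5  # "roughly five or more tool calls", per the article


def maybe_write_skill(skills_dir: Path, name: str, description: str,
                      steps: list[str], tool_calls: int) -> Optional[Path]:
    """Write skills_dir/<name>/SKILL.md when a task crossed the complexity threshold."""
    if tool_calls < SKILL_THRESHOLD:
        return None  # short tasks are cheap to re-plan; don't capture them
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    text = (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"# {name}\n\n{numbered}\n"
    )
    path = skills_dir / name / "SKILL.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    written = maybe_write_skill(Path(tmp), "rotate-logs", "Rotate and compress app logs",
                                ["find log files", "gzip files older than 7 days"],
                                tool_calls=7)
    print(written.read_text(encoding="utf-8"))
```

The point of the threshold is economic: a plan the planner derived once is stored as text, so the next run can load it instead of paying Fable 5 to re-derive it.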

Persistent memory · MEMORY.md → SQLite · 4 layers
MEMORY.md environment facts and USER.md preferences load into the system prompt at the start of every session; a SQLite archive with full-text search and a skills directory sit underneath. All state lives in ~/.hermes/.

Self-improvement · Tasks become SKILL.md · 5+ calls
After a complex task of roughly five or more tool calls, Hermes writes a reusable SKILL.md compatible with the agentskills.io standard, so the planner’s approach is captured instead of re-derived.

Traction · OpenRouter, May 10, 2026 · #1
Hermes hit #1 across all AI applications on OpenRouter’s global rankings roughly 90 days after its February 2026 launch. Star counts (≈207,600 at our July 2 snapshot) move daily.
"Fable 5 supplies the reasoning. Hermes supplies the loop, memory, tools, and persistence. The combination is a self-improving agent that holds context across days of work and runs on the strongest publicly available coding model."— Lushbinary editorial team, Claude Fable 5 + Hermes Agent Setup Guide

04 · OpenClaw Setup · OpenClaw: subagents.model is the whole trick.

OpenClaw’s config lives at ~/.openclaw/openclaw.json (or ~/.clawdbot/clawdbot.json for installs from the older Clawdbot npm package), edited via openclaw config edit or at the path openclaw doctor prints. The planner/executor split is a first-class config concept: model.primary is the model that plans, and agents.defaults.subagents.model is the global default for the sub-agents that execute. Two override levels sit on top — per-agent at agents.list[].subagents.model, and per-call via the sessions_spawn model parameter — so a single deployment can run Fable 5 planning with Opus 4.8 execution as the default, and still pin one specific agent or one specific spawn to a different model.
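The precedence described above (per-call beats per-agent beats the global default, and an unset default falls back to the primary) can be expressed as a small resolver. This is a sketch of the documented behavior, not OpenClaw's source: the nested-dict shape is inferred from the dotted key paths, looking agents up by an `id` field is an assumption, `spawn_model` stands in for the `sessions_spawn` model parameter, and `security-triage` is a hypothetical agent.

```python
from typing import Optional


def resolve_subagent_model(cfg: dict, agent_id: str,
                           spawn_model: Optional[str] = None) -> str:
    """Pick the model a spawned sub-agent runs on, most specific setting first."""
    agents = cfg.get("agents", {})
    # 1. Per-call override (the sessions_spawn model parameter).
    if spawn_model:
        return spawn_model
    # 2. Per-agent override: agents.list[].subagents.model
    for agent in agents.get("list", []):
        if agent.get("id") == agent_id:
            model = agent.get("subagents", {}).get("model")
            if model:
                return model
    # 3. Global default: agents.defaults.subagents.model
    default = agents.get("defaults", {}).get("subagents", {}).get("model")
    if default:
        return default
    # 4. Nothing set: sub-agents inherit the primary (the costly mistake).
    return cfg["model"]["primary"]


config = {
    "model": {"primary": "claude-fable-5"},
    "agents": {
        "defaults": {"subagents": {"model": "claude-opus-4-8"}},
        "list": [{"id": "security-triage", "subagents": {"model": "claude-fable-5"}}],
    },
}
print(resolve_subagent_model(config, "coder"))            # claude-opus-4-8
print(resolve_subagent_model(config, "security-triage"))  # claude-fable-5
```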

Two Fable-5-specific behaviors from OpenClaw’s own Anthropic provider docs are worth knowing before you wire it in. First, OpenClaw omits custom temperature values on Fable 5 requests. Second, Fable 5 always uses adaptive thinking and defaults to high effort — /think off and /think minimal are remapped to low effort rather than disabling thinking, because Anthropic doesn’t allow thinking to be fully disabled on this model. Budget for reasoning tokens accordingly: the planner will think, whether or not you asked it to.
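If you build requests outside OpenClaw and want the same semantics, the two behaviors can be mirrored in a request-shaping helper. The `/think` level names come from the passage above; `shape_fable5_request`, the plain-dict request, and the `effort` field name are hypothetical and are not how OpenClaw or the Anthropic API names these fields.

```python
from typing import Optional

# /think levels that would disable or minimize thinking are remapped to "low",
# because thinking cannot be fully disabled on Fable 5.
_REMAP = {"off": "low", "minimal": "low"}
DEFAULT_EFFORT = "high"


def shape_fable5_request(model: str, params: dict,
                         think: Optional[str] = None) -> dict:
    """Return a copy of params adjusted the way the article says OpenClaw treats Fable 5."""
    out = dict(params)
    if model != "claude-fable-5":
        return out  # other models pass through untouched
    out.pop("temperature", None)  # custom temperature is omitted
    level = think or DEFAULT_EFFORT
    out["effort"] = _REMAP.get(level, level)
    return out


print(shape_fable5_request("claude-fable-5", {"temperature": 0.2}, think="off"))
# {'effort': 'low'}
```

The practical consequence is the one the passage names: even the lowest setting still produces reasoning tokens, so budget for them on every planner call.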

Cost routing extends past sub-agents. A heartbeat.model key routes OpenClaw’s periodic are-you-still-there checks — default every 30 minutes — to a cheap model, and a fallbacks[] array handles outage failover. One sourcing caveat we should be transparent about: the heartbeat and fallbacks key pattern comes from VelvetShark’s multi-model routing guide, published February 2026 — before Fable 5 existed — so take the key shapes from it, not the model names or prices, and substitute claude-fable-5, claude-opus-4-8, and current rates. New to the framework entirely? Start with our full OpenClaw setup and skills walkthrough.
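Putting the keys together, a config could be generated from code so a hand edit never leaves a JSON syntax error behind. This is a sketch under assumptions: the key names come from this article and VelvetShark's pattern, but where `fallbacks` and `heartbeat` sit in the tree is inferred, and `YOUR-CHEAP-MODEL-ID` is a placeholder. Check the shape against OpenClaw's docs, then merge the output into the file `openclaw doctor` reports rather than overwriting it.

```python
import json

CHEAP_MODEL = "YOUR-CHEAP-MODEL-ID"  # placeholder: pick your own low-cost model

openclaw_config = {
    "model": {
        "primary": "claude-fable-5",       # the brain: plans and reviews
        "fallbacks": ["claude-opus-4-8"],  # outage failover (placement assumed)
    },
    "agents": {
        "defaults": {"subagents": {"model": "claude-opus-4-8"}},  # the hands
    },
    "heartbeat": {"model": CHEAP_MODEL},  # periodic checks never hit Fable 5
}

# At the default 30-minute interval, heartbeats add 48 calls per day.
HEARTBEATS_PER_DAY = 24 * 60 // 30

text = json.dumps(openclaw_config, indent=2)
print(text)
```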

Config hygiene
The most common mistake in shipped OpenClaw configs is setting only model.primary and assuming sub-agents inherit something sensible. They inherit the primary — which means every executor tool call bills at Fable 5’s $10/$50 rate until agents.defaults.subagents.model says otherwise. Set the split explicitly, then verify with a spot-check of per-model token usage after the first day of running.
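The first-day spot-check can be as simple as summing tokens and list-price cost per model, then flagging executor calls that landed on Fable 5. The record format (`model`, `role`, `input_tokens`, `output_tokens`) and the three sample records are made up for illustration; adapt the function to whatever per-call usage data your runtime can export. Rates are the list prices quoted in this article.

```python
from collections import defaultdict

# List price in USD per million tokens: (input, output), as quoted in this article.
RATES = {"claude-fable-5": (10.0, 50.0), "claude-opus-4-8": (5.0, 25.0)}


def spot_check(records: list[dict]) -> tuple[dict, list[dict]]:
    """Return per-model totals and any executor calls billed at Fable 5 rates."""
    totals: dict = defaultdict(lambda: {"input": 0, "output": 0, "usd": 0.0})
    misrouted = []
    for r in records:
        rate_in, rate_out = RATES[r["model"]]
        t = totals[r["model"]]
        t["input"] += r["input_tokens"]
        t["output"] += r["output_tokens"]
        t["usd"] += (r["input_tokens"] * rate_in + r["output_tokens"] * rate_out) / 1e6
        if r["role"] == "executor" and r["model"] == "claude-fable-5":
            misrouted.append(r)
    return dict(totals), misrouted


usage = [  # illustrative records, not real measurements
    {"model": "claude-fable-5", "role": "planner", "input_tokens": 20_000, "output_tokens": 4_000},
    {"model": "claude-fable-5", "role": "executor", "input_tokens": 30_000, "output_tokens": 2_000},
    {"model": "claude-opus-4-8", "role": "executor", "input_tokens": 150_000, "output_tokens": 40_000},
]
totals, misrouted = spot_check(usage)
for model, t in totals.items():
    print(f"{model}: {t['input']:,} in / {t['output']:,} out, ${t['usd']:.2f}")
print(f"{len(misrouted)} executor call(s) billed at Fable 5 rates")
```

Any non-empty `misrouted` list after day one is the signature of a missing or ignored `agents.defaults.subagents.model`.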

05 · The Reference Table · The planner-brain config, side by side.

No single source puts the two frameworks’ planner/executor syntax in one place with Fable 5 as the specific model being routed — which is exactly what you need when you’re deciding where to run the pattern. The table below compiles the Hermes column from the Lushbinary setup guide and the OpenClaw column from OpenClaw’s official provider and sub-agent docs, with the heartbeat/fallbacks key pattern from VelvetShark’s routing guide, all retrieved July 2, 2026.

Planner-brain configuration side by side — for each setting, the Hermes Agent syntax and the OpenClaw syntax for running Claude Fable 5 as the planner with cheaper models executing. Compiled from the Lushbinary setup guide, OpenClaw’s official docs, and VelvetShark’s routing guide, retrieved July 2, 2026.
Setting | Hermes Agent | OpenClaw
Model routing
Config file | ~/.hermes/config.yaml | ~/.openclaw/openclaw.json (legacy installs: ~/.clawdbot/clawdbot.json); edit via openclaw config edit
Fable 5 as planner | Set the primary model to claude-fable-5; Hermes treats Anthropic as a first-class provider | Point model.primary at claude-fable-5 in the Anthropic provider block
Cheaper model for execution | No per-subagent model key is documented; routing is session-level, via the primary model plus fallback_provider | agents.defaults.subagents.model globally; per-agent via agents.list[].subagents.model; per-call via the sessions_spawn model parameter
Resilience and operations
Fallback on outage or cost | fallback_provider in config.yaml, e.g. Opus 4.8, or a local Ollama/vLLM model if the Anthropic API is unreachable | fallbacks[] array (config-key pattern; substitute current model IDs)
Background pings | No heartbeat concept documented in the sources we reviewed | heartbeat.model routes the periodic check (default every 30 minutes) to a cheap model
Scheduling and channels | /schedule for cron-style jobs; hermes gateway setup then hermes gateway install for Telegram/Discord/Slack | Not covered in the sources this guide draws on
Migration path | hermes claw migrate imports OpenClaw settings, memories, skills, and API keys in one command | n/a (OpenClaw is the migration source here)
Security surface
Skills ecosystem | Self-written SKILL.md documents, compatible with the agentskills.io open standard | ClawHub marketplace; an early-2026 audit found 341 of 2,857 published skills malicious (~12%); vet before installing

Read the third row twice — it’s the architectural fork. OpenClaw gives you a literal key that separates the planning model from the executing model inside one session. Hermes doesn’t, by design: its answer to “cheaper execution” is orchestration — spawn specialist agents and route between providers at the session boundary — plus fallback_provider for the cost and outage cases. If your primary goal is the tightest possible Fable-5-plans / cheap-model-executes split with minimal moving parts, OpenClaw’s config expresses it more directly. If you want the planner’s work to persist and compound across days, Hermes’s memory and skills layers are the stronger foundation.

06 · Cost Math · What the split actually saves.

Anchor the decision in one worked example, at each model’s list price. A single agentic task consuming 200,000 input tokens and 50,000 output tokens costs about $4.50 on Fable 5 and about $2.25 on Opus 4.8 — the identical task, run twice, at $10/$50 versus $5/$25 per million tokens. Every derived cell below recomputes from those rates.
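The arithmetic behind every row can be reproduced in a few lines, which makes it easy to rerun with your own token counts. A minimal sketch using the list prices quoted in this article; `task_cost` and `cached_fraction` are illustrative names, and the model ignores the surcharge Anthropic applies when content is first written to the cache, so it describes cache hits only.

```python
# List price in USD per million tokens: (input, output), as quoted in this article.
RATES = {"claude-fable-5": (10.0, 50.0), "claude-opus-4-8": (5.0, 25.0)}
CACHE_DISCOUNT = 0.90  # 90% off cached input tokens


def task_cost(model: str, input_tokens: int, output_tokens: int,
              cached_fraction: float = 0.0) -> float:
    """USD cost of one task; cached_fraction is the share of input read from cache."""
    rate_in, rate_out = RATES[model]
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    input_usd = (fresh * rate_in + cached * rate_in * (1 - CACHE_DISCOUNT)) / 1e6
    output_usd = output_tokens * rate_out / 1e6
    return input_usd + output_usd


print(f"{task_cost('claude-fable-5', 200_000, 50_000):.2f}")       # 4.50
print(f"{task_cost('claude-opus-4-8', 200_000, 50_000):.2f}")      # 2.25
print(f"{task_cost('claude-fable-5', 200_000, 50_000, 1.0):.2f}")  # 2.70
```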

Worked cost example for one agentic task of 200,000 input and 50,000 output tokens — line items for Claude Fable 5 at $10/$50 per million tokens versus Claude Opus 4.8 at $5/$25, plus the prompt-caching variant at 90% off cached input. Rates from Anthropic’s published pricing via the Lushbinary setup guide, retrieved July 2, 2026.
Line item | Fable 5 ($10 / $50) | Opus 4.8 ($5 / $25) | How it’s computed
Input · 200K tokens | $2.00 | $1.00 | 0.2M × $10/M vs 0.2M × $5/M
Output · 50K tokens | $2.50 | $1.25 | 0.05M × $50/M vs 0.05M × $25/M
Task total, list price | $4.50 | $2.25 | Opus 4.8 runs the identical task at half the cost
Input on a full cache hit | ≈$0.20 | not shown | 90% off cached input: 0.2M × $1/M (shown for Fable 5, the metered model)
Task total with cached input | ≈$2.70 | not shown | ≈$0.20 cached input + $2.50 output

Figure: One agentic task, three ways to pay for it. Fable 5, everything (200K in / 50K out, list price): $4.50 · Fable 5 with cached input (90% off cached input tokens): ≈$2.70 · Opus 4.8, same task (half the list rate on both sides): $2.25. Source: Anthropic published rates via the Lushbinary setup guide, July 2026, recomputed per line.

Two levers stack on top of the routing split. Prompt caching is the first: Anthropic’s discount is 90% off cached input tokens, so in a long session that reuses a large system prompt or codebase, the input portion of that $4.50 task drops toward ~$0.20 on cache hits — $1 per million cached-input tokens against the $10 list rate. The general technique is covered in our prompt-caching engineering guide. The second lever is the pattern itself: in a planner-brain setup, Fable 5's share of tokens is the plan and the review pass, not the whole loop, so the $10/$50 rate applies to a small slice of the task while the executor’s cheaper rate covers the bulk. How much smaller that slice is depends entirely on your workload — we deliberately won’t invent a universal percentage.
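Because the planner's share is workload-specific, it is better treated as an input than a constant. The sketch below assumes the planner's fraction of tokens is the same on the input and output side (a simplification), applies the 90% discount to a chosen fraction of each model's input, and ignores cache-write surcharges. `split_cost` is an illustrative name, and the middle share in the loop is arbitrary; the share you plug in should come from your own per-model token accounting.

```python
# List price in USD per million tokens: (input, output) for each role.
RATES = {"planner": (10.0, 50.0),   # Fable 5
         "executor": (5.0, 25.0)}   # Opus 4.8


def split_cost(input_tokens: int, output_tokens: int,
               planner_share: float, cached_fraction: float = 0.0) -> float:
    """USD for one task when planner_share of tokens runs on the planner model.

    Assumes the same share applies to input and output tokens, and that
    cached_fraction of each model's input is billed at 10% of list price.
    """
    total = 0.0
    for role, share in (("planner", planner_share), ("executor", 1 - planner_share)):
        rate_in, rate_out = RATES[role]
        inp = input_tokens * share
        effective_in = inp * (1 - cached_fraction) + inp * cached_fraction * 0.10
        total += (effective_in * rate_in + output_tokens * share * rate_out) / 1e6
    return total


for share in (1.0, 0.5, 0.0):
    print(f"planner share {share:.0%}: ${split_cost(200_000, 50_000, share):.2f}")
```

Cost is linear in the planner share between the two list-price endpoints, so every point of share you move off Fable 5 saves the same amount.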

Our forward read: as metering starts July 8, expect planner/executor splits to move from enthusiast trick to default posture in agent deployments, the way multi-tier model routing already did in production API stacks through 2025. The frameworks have made the config trivial; the remaining work is measurement — per-model token accounting per task, so you can see the split paying for itself. This is the kind of cost-and-architecture decision our AI transformation engagements exist to pressure-test before it hits your invoice.

07 · Routing Gotcha · Two fallbacks, not one; don’t conflate them.

There are two distinct mechanisms that can hand your query to a different model, and debugging routing gets miserable if you mix them up. The first is yours: the config-level fallback_provider (Hermes) or fallbacks[] (OpenClaw) — user-controlled, for outages and cost, and it does exactly what you configured. The second is Anthropic’s: a safeguard classifier, applied server-side since the July 1 restoration, that reroutes a query to Opus 4.8 automatically when it lands in cybersecurity, biology, chemistry, or model-distillation territory. As the Lushbinary guide puts it: “For the vast majority of coding, automation, and research work the safeguard fallback never fires. If your agent operates near security or life-sciences topics, expect some responses to come from Opus 4.8, and budget for the fact that you may be paying the Fable 5 rate while receiving an Opus 4.8 answer on those specific turns.”

The classifier is automatic, outside your control, and fires on under 5% of sessions per Anthropic’s relaunch documentation. For most planner-brain deployments it’s a rounding error; for agents that touch security tooling (a pentest triage bot, a dependency CVE analyst) it’s a real line item and a real behavior change to test for. One more restoration footnote for compliance-minded teams: the mandatory 30-day data-retention requirement still applies to Fable 5 traffic, with no zero-data-retention (ZDR) exemption for Mythos-class models, and our 30-day retention explainer covers what that means for enterprise agreements. The change is narrow: the ZDR carve-out does not cover these models, which is not the same as “everyone’s data is now retained.”

Keep them straight
Config fallback: you choose when it fires (outage, budget), and logs show the model you configured. Safeguard classifier: Anthropic chooses, per-query, in sensitive domains — and per the Lushbinary guide you may pay the Fable 5 rate on turns where an Opus 4.8 answer comes back. If your agent’s outputs suddenly read differently on security-adjacent tasks, check which mechanism moved before you touch your config.
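When outputs change, a quick triage of your own logs can narrow down which mechanism fired. This sketch assumes you log, per turn, the model you requested, whether the primary call errored, and the model ID the response reports (the Anthropic Messages API returns a `model` field in each response). Whether a safeguard reroute is visible in that field is not confirmed by the sources here, so treat the safeguard branch as a lead to investigate, not proof. `which_fallback` is a hypothetical helper.

```python
from typing import Optional


def which_fallback(requested: str, served: Optional[str], primary_errored: bool,
                   configured_fallback: str) -> str:
    """Classify one logged turn (field meanings are assumptions; see lead-in)."""
    # Your own fallback only fires after a primary failure you can see.
    if primary_errored and served == configured_fallback:
        return "config fallback: your fallback_provider / fallbacks[] fired"
    if served is None or served == requested:
        return "primary: no reroute visible in this log entry"
    # A Fable 5 request answered by Opus 4.8 with no error on your side.
    if requested == "claude-fable-5" and served == "claude-opus-4-8":
        return "possible safeguard reroute: Anthropic-side, not in your config"
    return "unexplained model change: check provider and config"


print(which_fallback("claude-fable-5", "claude-opus-4-8", False, "claude-opus-4-8"))
```

The error flag is what separates the two cases when your configured fallback is also Opus 4.8; without it, the served model alone is ambiguous.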

08 · Security · Harden before you grant autonomy.

Here’s the connection the setup guides don’t make: wiring Fable 5 in as the planner means giving a frontier model shell access and letting it run unattended — on a runtime whose skill ecosystem was, months ago, measurably compromised. In early 2026 an independent audit of all 2,857 skills published on ClawHub, OpenClaw’s marketplace, found 341 confirmed malicious — roughly 12% — with about 335 traced to a single coordinated campaign tracked as ClawHavoc. Separately, CVE-2026-25253, rated CVSS 8.8, was a one-click remote-code-execution chain exploitable even against localhost-bound OpenClaw instances (patched in v2026.1.29), and scanning teams including Censys, Bitsight, and Hunt.io identified 30,000+ internet-exposed OpenClaw instances, many with no authentication.

The vendor response was real: on March 27, 2026, OpenClawd (the managed-hosting vendor) shipped automated skill vetting — static analysis plus behavioral testing before a skill activates — verified installer sourcing, and runtime sandboxing that auto-blocks skills flagged for network exfiltration, prompt injection, or credential exposure. Treat that as the floor, not the ceiling. Before your planner gets autonomy: run the latest patched version, don’t expose the instance to the internet, vet every skill as if the audit numbers were current, and isolate the runtime from your real credentials. The step-by-step version is our OpenClaw hardening guide — read it before the first unattended run, not after. To be fair to both sides of the table: these incidents are OpenClaw/ClawHub-specific; Hermes has its own agentskills.io skill ecosystem, which was not documented as compromised in the sources we reviewed.
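Two of those checks are mechanical enough to script: is the install at or past the patched release, and is it bound only to loopback. The sketch assumes a dotted `YYYY.M.D` version string like the `2026.1.29` cited above (suffixes such as `-beta` are not handled); how you read the running version and bind address depends on your deployment, and both function names are hypothetical. Passing both is necessary, not sufficient: CVE-2026-25253 was exploitable against localhost-bound instances, which is why the version check matters even on loopback.

```python
import ipaddress

PATCHED = (2026, 1, 29)  # release with the CVE-2026-25253 fix, per the article


def is_patched(version: str) -> bool:
    """Compare a dotted version like '2026.1.29' (optionally 'v'-prefixed) numerically."""
    parts = tuple(int(p) for p in version.lstrip("v").split(".")[:3])
    return parts >= PATCHED


def is_loopback_only(bind_host: str) -> bool:
    """True only for loopback addresses; 0.0.0.0 and routable IPs fail."""
    if bind_host == "localhost":
        return True
    try:
        return ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        return False  # any other hostname: verify by hand


print(is_patched("2026.1.28"), is_patched("v2026.2.3"))            # False True
print(is_loopback_only("127.0.0.1"), is_loopback_only("0.0.0.0"))  # True False
```

Comparing integer tuples avoids the classic string-comparison bug where `"2026.1.3"` sorts after `"2026.1.29"`.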

"By default the agent runs as your user with access to your home directory, SSH keys, and any cloud credentials on the box. Prompt injection from a fetched web page or a file in the repo can turn a benign task into `rm -rf` or a key exfiltration attempt. Isolation is not optional for an autonomous agent."— Lushbinary editorial team, Claude Fable 5 + Hermes Agent Setup Guide
Researcher corroboration
Unit 42, Palo Alto Networks’ threat-research arm, documented the ClawHub incident wave as an emerging AI supply-chain threat — malicious skills as the new malicious packages. The uncomfortable multiplier in a planner-brain setup is capability: the stronger the model you hand to a compromised skill, the more competently a hijacked session pursues the attacker’s goal. Marketplace hygiene isn’t adjacent to this playbook; it’s a prerequisite for it.

09 · Conclusion · One config key between you and a halved agent bill.

The planner-brain posture, July 2026

Put the expensive model where judgment lives, and meter everything else.

The pattern is small enough to ship this afternoon. In Hermes: primary model claude-fable-5, cost and outage routing via fallback_provider, and let the memory and skills layers turn each expensive plan into a reusable asset. In OpenClaw: model.primary for the brain, agents.defaults.subagents.model for the hands, and heartbeat.model so background pings never touch the metered model.

The math holds at any scale that matters: the same task at $4.50 on Fable 5 or $2.25 on Opus 4.8, with caching pulling the cached input share down by 90% — and the planner split shrinking the expensive model’s footprint to the plan and the review pass. With included usage ending July 7 and metering starting July 8, the teams that set this split now will barely notice the cliff; everyone else finds out from an invoice.

And the order of operations is non-negotiable: harden first, autonomy second. A marketplace that measured ~12% malicious, a CVSS 8.8 one-click RCE, and 30,000+ exposed instances are not reasons to skip the pattern — they’re reasons to treat isolation and skill vetting as step zero of it. The planner brain is only as trustworthy as the body you give it.

Deploy the planner-brain pattern in production

The strongest planner is only worth it when cheaper models do the running.

Our team designs, hardens, and cost-tunes agent deployments — planner/executor model routing, per-model token accounting, and security review — delivered in days, not quarters.

What we work on

Agent-stack engagements

  • Planner/executor routing — Fable 5, Opus 4.8, cheaper tiers
  • Hermes and OpenClaw deployment and migration
  • Skill-marketplace vetting and runtime isolation
  • Prompt-caching and Batch API cost engineering
  • Per-model token accounting and spend dashboards
FAQ · Planner-brain setup

The questions we get every week.

What is the Fable 5 planner-brain pattern?
It’s a model-routing architecture: Fable 5 handles only the steps where frontier judgment matters (reading the task, writing the plan, delegating, and reviewing the result) while a cheaper model such as Opus 4.8 executes the bulk of the tool calls. Fable 5 lists at $10 per million input tokens and $50 per million output, twice Opus 4.8’s $5/$25, so keeping it out of the execution loop shrinks the expensive model’s token footprint per task without lowering the agent’s capability ceiling. Both Hermes Agent and OpenClaw have documented config surfaces for the split, which is what this guide walks through key by key.