Claude Code adoption at a thirty-engineer product shop is the kind of rollout that either compounds or quietly stalls, and the difference is rarely about the tool. This case study walks through one engagement that landed cleanly — install on day zero, twenty-two shared skills by day ninety, a productivity panel the team built rather than inherited, and a 35% productivity lift that held into month four because weekly retros caught the friction before it became attrition.
The team in question is a Series-B SaaS product organisation working in a single shared monorepo, mostly Node and TypeScript with a sliver of Go. Thirty engineers across six squads, a tech lead per squad, one staff engineer carrying the rollout, and a head of engineering who agreed to the cadence but did not drive it. The engagement spanned the calendar quarter that ended in late April 2026, with the post-90 review running at month four in mid-May.
What follows is the play-by-play. Situation, approach, skill kit design, hooks and memory discipline, the productivity panel, the outcomes that landed, and the lessons that generalise. The companion piece is the 30/60/90-day rollout plan — this case study is what happens when that plan meets a real engineering organisation with real constraints.
- 01 · Phased rollout beats big-bang. Three sprints of distinct character — install, leverage, governance — outperformed a compressed one-month push the same team attempted in 2025. The cadence is what made the artefacts compound rather than ship and decay.
- 02 · Skill kit is the highest-ROI investment. Twenty-two skills by day 90 with authorship spread across nine engineers produced the bulk of the productivity gain. The library is the single artefact whose growth most reliably tracked the engagement panel's depth metrics.
- 03 · CLAUDE.md hygiene predicts depth. Squads that pruned their CLAUDE.md to under 400 lines in sprint 1 shipped hooks and subagents in sprints 2 and 3. Squads that drifted past 800 lines stalled at install-and-skills with no further depth.
- 04 · Productivity panel must be engagement-weighted. The original panel tracked vanity counts and read flat for six weeks. Re-weighting around skill invocations and subagent runs surfaced the real adoption curve and changed how leadership read the rollout.
- 05 · Weekly retros surface friction early. A thirty-minute Friday retro inside each squad caught permission gripes, skill-discovery problems, and hook noise inside the week they emerged rather than the quarter they accumulated. Without the retro, friction compounds silently.
01 — Situation
A team with seats and no rollout to show for them.
The team had Claude Code seats for the better part of a year before this engagement started. Adoption was real but shallow — roughly half the engineers used the CLI weekly, mostly for individual workflows, with no shared skills, no committed settings, and no hooks. The CLAUDE.md files that existed had been written once and never revised; two of the six squads had no project memory at all. Licence telemetry looked healthy, which had let leadership believe the tool was landing. The day-zero audit told a different story.
That gap — between licence telemetry and actual capability — is the single most common situation we walk into. The pattern is unambiguous: a team installs Claude Code, individual engineers find it useful, leadership reports the rollout as a success on paper, and the artefacts that would indicate real team adoption never accumulate. The first conversation in this engagement was re-framing the problem from "are we using Claude Code" to "what does our team know how to do with Claude Code that it could not do six months ago" — and the honest answer in March 2026 was very little.
Three constraints shaped the engagement. The team could not pause product work to focus on the rollout — every milestone had to be absorbed into the normal sprint cadence. The staff engineer running the rollout had roughly 30% of their week, no more. And the head of engineering wanted a quantitative story by day ninety that would justify the time spent. The 30/60/90-day plan was adapted to those constraints rather than imposed on them.
"We bought the seats a year ago and pretended that was the rollout. The day-zero audit was uncomfortable. It was also the moment the project actually started." — Head of engineering, day-30 retro
The baseline 50-point audit on day zero scored 19 out of 50. Install and licence (full marks), individual settings (partial), zero project-scope settings, three CLAUDE.md files of wildly varying quality, no committed skills, no hooks, no subagents. The team described that score as deflating on the day; by the day-30 retro it had become the most-cited artefact of the engagement, because it gave the rollout a quantitative spine that licence telemetry could not.
02 — Approach
Four phases, named owners, Friday retros.
The rollout was structured as three thirty-day sprints plus a sustain phase that started at day 90 and continued through the month-four review. Each sprint had a distinct character — install, leverage, governance — and the sustain phase carried forward the weekly retro and quarterly audit cadence. The four phases below are how the team referred to the work internally; the underlying plan is the 30/60/90 cadence with the post-90 sustain phase made explicit.
One named owner per phase. The staff engineer carried accountability across all four; the per-phase owners were the engineers most invested in the work for that sprint. Friday retros — thirty minutes per squad — caught the per-week friction. The cross-squad retro on the last Friday of each sprint surfaced the patterns worth lifting into the next phase.
Phase 1: Install & base layer (settings · CLAUDE.md · 3 skills · 1 hook)
Project-scope settings.json committed per repo, CLAUDE.md skeletons drafted under 400 lines, three starter skills written by the staff engineer and one tech lead, one Slack-notification Stop hook. Day-30 audit re-run captured the baseline shift. Owner: staff engineer.

Phase 2: Leverage & skill library (library to 14 skills · 5 hooks · panel v1)
Skill captains named in each squad. Library grew from three to fourteen with authorship spread across seven engineers. Five hooks shipped covering notification, audit-log, and PreToolUse-gate patterns. Productivity panel v1 went live, badly weighted, and was rebuilt in week 8. Owner: skill captains.

Phase 3: Governance & durability (3 subagents · permissions review · panel v2)
Three custom subagents shipped: code-reviewer, ticket-triager, release-notes-writer. Security partner reviewed permissions and the MCP registry and signed off in writing. Productivity panel rebuilt around engagement-weighted signals. Day-90 audit, retro, and the calendared month-four review. Owner: senior engineer + security.

Phase 4: Sustain & quarterly review (weekly retros · month-4 review)
Friday retros continued in each squad. Skill captains owned the library prune at month four. CLAUDE.md re-pruned. Productivity panel signals reviewed against the day-90 baseline. The month-four number is what told the story to leadership and the board. Owner: staff engineer.

The choice to make the sustain phase explicit rather than implicit was the biggest deviation from the textbook 30/60/90 plan. Earlier engagements had treated days 91 onward as a fall-back-to-normal phase with a vague quarterly review; this engagement put the sustain phase on the same calendar as the active sprints, with the same owner and the same cadence, and the difference in month-four metrics was substantial. Adoption that holds is a property of the cadence the team commits to, not the install they completed.
Friday retros earned their place in week three. Two squads had been hitting the same permissions friction — the project settings allowlist was missing a tool that a recurring task needed, and engineers were silently working around it by invoking with broader permissions in personal sessions. The retro surfaced it in week three; the fix landed on the Monday; the working-around stopped that week. Without the retro the pattern would have stayed in the dark for the rest of the sprint, and the productivity panel would have shown a slowly widening gap between licence usage and skill invocations with no obvious cause.
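Fixes of that shape are small once the allowlist lives in the committed project settings rather than in personal sessions. A minimal sketch of the file's shape, with illustrative tool patterns rather than this team's actual entries:

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run lint:*)",
      "Bash(npm run test:*)",
      "Bash(npx tsc --noEmit)"
    ]
  }
}
```

Because the file is committed to the repo, the Monday fix in the story above is a one-line PR that every engineer inherits, instead of thirty private workarounds.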
03 — Skill Kit
Twenty-two skills, nine authors, scoped tight.
The skill library is the single artefact whose presence most reliably predicted productivity gains in this engagement and across the broader 2026 cohort. Twenty-two skills landed by day 90 with authorship spread across nine engineers — meaning the library was not a single-author project that decays if one engineer leaves. The captains in each squad enforced two disciplines: one job per skill, and documentation written before the skill is merged.
The skills clustered into four categories. Workflow skills wrapped recurring multi-step tasks (commit, code-review, scaffold-route, scaffold-component, run-migration). Quality skills enforced standards (lint-fix, test-gen, dep-audit). Knowledge skills surfaced team conventions (explain-this-pattern, find-owner, history-of-this-file). Operational skills moved artefacts between systems (release-notes, changelog-update, deploy-staging). The breakdown is illustrative rather than prescriptive — every team's mix is different — but the principle of distinct categories with explicit naming helped discovery once the library passed roughly ten entries.
Workflow: recurring multi-step tasks (most-invoked: commit · 247x/wk)
Commit, code-review, scaffold-route, scaffold-component, run-migration, write-feature-flag, spawn-subagent-team, doc-update. The most-invoked category, with authorship spread across five engineers in three squads.

Quality: standards enforcement (highest satisfaction · 9.1/10)
Lint-fix, test-gen, dep-audit, accessibility-check, type-safety-sweep. Quality skills had the lowest invocation count but the highest reported satisfaction — engineers felt the standards work was less drudgery than before.

Knowledge: convention discovery (new-hire favourite)
Explain-this-pattern, find-owner, history-of-this-file, find-similar, why-this-decision. The skills new hires invoked most in their first two weeks; they emerged organically from onboarding pain points.

Operational: cross-system orchestration (highest value-per-use)
Release-notes-from-PRs, changelog-update, deploy-staging, incident-triage. Lower invocation count but very high value per invocation — typically 20-45 minutes saved per task. Owner: platform-squad captain.

One skill in the workflow category — commit — accounted for roughly a third of all invocations by month four. That kind of skewed distribution is normal in mature libraries and not a sign of poor design; it tells the captains where to invest in iteration, and it tells leadership which workflow is actually being changed by the rollout. The companion piece on the 50-point adoption scorecard covers the scoring rubric for library breadth, depth, and invocation distribution.
The discipline that earned its place was the one-job-per-skill rule. In week six, two skills bloated into overlapping multi-step workflows — most visibly a ship-feature skill that duplicated commit, code-review, and deploy-staging. The captain in that squad decomposed it back into three smaller skills in week seven, and invocation counts rose afterwards because the smaller skills were easier to remember and reach for. Big skills look efficient on paper; small composable skills win in practice.
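The one-job shape is visible in the skill file itself. A hedged sketch of what a tightly scoped skill like commit might look like on disk (the frontmatter fields and steps here are illustrative, not this team's actual file):

```markdown
---
name: commit
description: Turn the current working-tree changes into one well-formed commit. Use when asked to commit staged or unstaged work.
---

# commit

One job: produce a single conventional commit from the current changes.

1. Run `git status` and `git diff` to see what changed.
2. Group related changes; if the diff spans unrelated concerns, stop and ask.
3. Draft a conventional-commit message (type, scope, imperative summary).
4. Run the pre-commit checks before committing.

Out of scope: pushing, opening PRs, deploying. Those are separate skills.
```

The "out of scope" line is the one-job rule made explicit; it is what keeps a commit skill from growing back into ship-feature.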
04 — Hooks + Memory
Eleven hooks, six squads, tight CLAUDE.md files.
Hook design was the sprint-two milestone with the widest variance across squads. Three squads embraced hooks enthusiastically and shipped four each; two squads shipped one or two; one squad shipped none until sprint three. The squads that engaged tended to be the ones where the tech lead personally wrote the first hook in sprint one — once that existence proof landed, the rest of the squad found patterns to wrap. Where the tech lead waited for someone else to write the first hook, the wait usually outlasted the sprint.
The eleven hooks broke down across three archetypes. Five notification hooks (Slack pings on long-running tasks completing, audit summaries to the on-call channel). Four audit-log hooks (write-to-disk records of bash invocations, file writes outside the repo, MCP server tool calls). Two gate hooks (PreToolUse hooks that vetoed bash commands matching destructive patterns, and write operations to specific protected paths). The gate hooks were the most interesting in retrospect — they emerged in week six from a near-miss where an engineer almost shipped a destructive migration through Claude Code. The hook turned the near-miss into a documented policy in the same sprint.
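The gate archetype is small in practice. A sketch of the destructive-pattern check such a PreToolUse hook might run — the surrounding hook script would receive the proposed tool call as JSON on stdin and veto it with a blocking exit status when the predicate matches. The patterns below are illustrative, not this team's actual list:

```typescript
// Sketch of a PreToolUse gate predicate. Only the check itself is shown;
// the wrapping hook script reads the tool-call JSON from stdin and exits
// with a blocking status when this returns true.
// Patterns are illustrative, not this team's actual policy.
const DESTRUCTIVE_PATTERNS: RegExp[] = [
  /\brm\s+-rf\s+/,                 // recursive force deletes
  /\bDROP\s+(TABLE|DATABASE)\b/i,  // raw SQL drops (the week-six near-miss class)
  /\bgit\s+push\s+(-f|--force)\b/, // force-pushes to shared history
];

function isDestructive(command: string): boolean {
  return DESTRUCTIVE_PATTERNS.some((pattern) => pattern.test(command));
}
```

Keeping the predicate a pure function makes the policy testable in the same PR that ships the hook, which is how a near-miss becomes documented policy within one sprint.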
Stay under 400 lines (hard cap, enforced in PR)
Every squad started above 600 lines because engineers reached for completeness. The 400-line cap was a discipline, not a target. Tight memory in sprint 1 paid back through every interaction in sprints 2 and 3 — the squads that held the line had better skill-invocation rates and faster time-to-first-meaningful-output.

Loaded on relevance (link, do not restate)
Long-form content moved to .claude/docs/ as sub-docs, with one-line summaries and links in the main file. Stack details, runbook references, convention deep-dives: all in linked sub-docs. The main CLAUDE.md stayed orientation-shaped: who, what, where to look.

Calendared, not opportunistic (calendar entry, day 0)
The month-four prune was on the calendar from day 30. Without the calendar entry the prune does not happen — engineers always have more urgent work. Treating the prune as a quarterly milestone makes it survive the quarter; treating it as a nice-to-have means it does not.

Notification · audit · gate (document the three, ship the four)
Three patterns covered every hook this team shipped. Notification hooks for visibility, audit hooks for compliance and after-the-fact debugging, gate hooks for risk reduction. Teaching the archetypes shortened the time from idea to shipped hook from days to hours.

The CLAUDE.md hygiene rule had a measurable effect on engagement depth. The two squads that drifted past 800 lines in sprint two had visibly lower skill-invocation rates than the four that stayed under 400 — the panel showed a roughly 40% gap by the end of sprint two. The hypothesised mechanism, validated in two follow-up retros: bloated memory caused the model to skim the file rather than internalise it, which produced more generic responses, which made engineers reach for the tool less often, which lowered invocation rates further. The cycle is self-reinforcing in either direction.
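The "enforced in PR" part of the cap need not rely on reviewer vigilance. A sketch of a CI check that could back it, assuming a Node-based check script; the cap value and the blank-line policy are assumptions, not this team's actual tooling:

```typescript
// Hypothetical CI check for the CLAUDE.md line cap.
import { readFileSync } from "node:fs";

const CAP = 400; // the hard cap this team enforced

// Count non-empty lines, so blank separators do not count against the cap.
function lineCount(text: string): number {
  return text.split("\n").filter((line) => line.trim().length > 0).length;
}

// Return false (and let CI fail) when a CLAUDE.md exceeds the cap.
// Wire this into CI for any PR that touches a CLAUDE.md file.
function checkClaudeMd(path: string): boolean {
  const n = lineCount(readFileSync(path, "utf8"));
  if (n > CAP) {
    console.error(`${path} is ${n} lines; the cap is ${CAP}. Prune before merging.`);
    return false;
  }
  return true;
}
```

A check of this shape turns the cap from a norm into a gate, which is what "discipline, not a target" looks like in the pipeline.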
The prune at month four cut the average CLAUDE.md line count from 520 to 340. Three sub-docs were created during the prune in the typical squad — usually one for stack details, one for conventions, one for a runbook reference — and the corresponding sections in the main file collapsed to one-line summaries with links. The prune itself took roughly an hour per squad. The productivity-panel response in the two weeks after the prune was a 12% lift in skill-invocation rate across the team, consistent with the hypothesis.
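The post-prune shape of a section can be sketched; paths and wording here are illustrative, not this team's actual file:

```markdown
## Stack
Node + TypeScript monorepo with a sliver of Go. Details: .claude/docs/stack.md

## Conventions
One-line rule: follow the nearest existing pattern. Full list: .claude/docs/conventions.md

## Incident runbook
Summary only; the full runbook lives at .claude/docs/runbook.md
```

Each heading keeps one orientation line in the main file and pushes the rest to a linked sub-doc, which is what lets the main CLAUDE.md stay orientation-shaped.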
05 — Productivity Panel
Engagement-weighted signals, not vanity counts.
The first version of the productivity panel was wrong, and that mistake taught the rollout more than any single milestone after it. Version one tracked the vanity metrics that licence telemetry already covered — total sessions opened per week, total tokens spent, average session length. The panel read flat for six weeks even as the underlying capability of the team was visibly shifting in retros and PRs. The signals it tracked were uncorrelated with adoption depth, and so it could not see the rollout that was happening.
Version two, rebuilt in week eight, weighted around engagement signals — skill invocations split by skill, subagent runs per week, time-to-first-meaningful-output for new contributors, audit-score deltas from the quarterly scorecard, and one team-specific signal (PR-to-merge time on Claude-Code-assisted PRs versus baseline). The pattern of shipped artefacts and retro reports suddenly tracked the panel rather than diverging from it. The shape below is what the panel looked like by month four, with the relative weight each signal carried in the engagement composite.
[Chart: Productivity panel · engagement-weighted signals. Source: Engagement panel v2, month-4 snapshot.]

The headline productivity number quoted to leadership and the board — a 35% productivity lift sustained at month four — was an engagement-weighted composite of the five signals above, not a single measurement. The composite is defensible because each component signal correlates with adoption depth in a different way: skill invocations measure breadth of tool use, subagent runs measure depth, time-to-first-meaningful-output measures onboarding effect, PR-to-merge time measures workflow impact, and audit-score delta measures the structural artefacts that make the others sustainable.
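The composite can be stated precisely. The weights below are assumptions (the engagement does not publish the team's actual weighting); only the shape reflects the description above: normalise each signal against its panel-v2 baseline, invert the lower-is-better signals, weight, and sum.

```typescript
// Engagement-weighted composite, sketched. Weights are illustrative.
interface PanelSnapshot {
  skillInvocationsPerWeek: number;
  subagentRunsPerWeek: number;
  timeToFirstOutputHours: number; // lower is better
  prToMergeHours: number;         // lower is better
  auditScore: number;             // 0-50 scorecard score
}

function compositeLiftPercent(current: PanelSnapshot, baseline: PanelSnapshot): number {
  // Higher-is-better signals: ratio to baseline. Lower-is-better: inverted ratio.
  const up = (c: number, b: number) => c / b;
  const down = (c: number, b: number) => b / c;
  const weighted =
    0.3 * up(current.skillInvocationsPerWeek, baseline.skillInvocationsPerWeek) +
    0.2 * up(current.subagentRunsPerWeek, baseline.subagentRunsPerWeek) +
    0.15 * down(current.timeToFirstOutputHours, baseline.timeToFirstOutputHours) +
    0.15 * down(current.prToMergeHours, baseline.prToMergeHours) +
    0.2 * up(current.auditScore, baseline.auditScore);
  return (weighted - 1) * 100; // percent lift over the panel-v2 baseline
}
```

A composite of this shape moves only when the engagement signals move together, which is why it resists the single-metric gaming that makes vanity panels read flat or misleading.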
The 35% number was deliberately not framed as "engineers ship 35% more code." That framing invites scepticism because it is not what the panel measures, and it is not how the productivity lift actually compounds. The honest framing — used in the month-four memo to leadership — is that the team's measured engagement with Claude Code increased 35% from the panel-v2 baseline, and that this engagement increase corresponds to the shipped artefacts (the 22 skills, 11 hooks, 3 subagents), the audit-score delta, and the retro-reported satisfaction shift. That triangulation is what made the number stick rather than be debated.
06 — Outcomes
What landed by day ninety and held at month four.
The outcomes that landed by day 90 fell into three buckets. Artefacts that the team built and now maintains. Metrics that quantify the capability shift. And cultural shifts that do not appear in either category but were the most-cited in the day-90 retro and the month-four review. All three held into month four with no measurable decay; the quarterly review cadence is what kept them in place.
The audit-score progression is the cleanest quantitative signal. The day-zero baseline of 19 out of 50 moved to 41 by day 90 — a 22-point shift, taking the team from late Stage 1 / early Stage 2 to mid-Stage 3 Leveraged on the scorecard rubric. Roughly ten percent of audited teams reach Stage 4 Optimised in their first quarter; this team did not, and the day-90 retro explicitly named the next quarter as a push from Stage 3 to Stage 4 through deeper subagent governance and a security cadence that catches drift before it accumulates.
Shipped & maintained (22 skills · 11 hooks · 3 subagents)
Library spread across 9 authors. Hooks across 3 archetypes (notification, audit, gate). Subagents — code-reviewer, ticket-triager, release-notes-writer — each scoped to one job and documented with a SKILL-style README. All artefacts live in the monorepo and are reviewed in PR like any other code, committed by day 90.

Quantified shift (audit 19 → 41 · productivity +35%)
The 50-point scorecard moved from 19 to 41 over the quarter (Stage 1 to mid-Stage 3). Engagement-weighted productivity composite: +35% from the panel-v2 baseline. Time-to-first-meaningful-output for new hires: −45%. Day-90 anonymous satisfaction: 8.4/10. Triangulated and holding at month 4.

What does not appear in panels (retro themes, day-90 + month-4)
The most-cited shifts were qualitative, all retro-reported. Engineers reached for the tool sooner in new tasks. Skill-writing became a recognised activity rather than a hobby. The Friday retro felt like a real meeting rather than a chore. Onboarding got noticeably calmer. None of these show up in a dashboard; all of them show up in retention.

One outcome worth singling out is the new-hire effect. Three engineers joined the team during the engagement; the third was hired in late sprint two and onboarded entirely against the new artefact layer. Their week-two time-to-first-meaningful-output was 45% faster than the baseline established by the previous five new hires under the old onboarding pattern. The skills they used most in the first fortnight — find-owner, explain-this-pattern, history-of-this-file — were specifically the knowledge-category skills that emerged from sprint-two retros as the artefacts most likely to compound the team's institutional knowledge into the onboarding flow.
The month-four hold is the harder result to attribute. By month four, plenty of three-month rollouts in the broader cohort have started to drift — skill libraries stagnate, CLAUDE.md files bloat again, hook noise rises. This team's month-four numbers were essentially flat against day-90 with a small upward shift on PR-to-merge time, which the team attributed to the new-hire effect compounding. The difference between holding and drifting was almost entirely the Friday retro cadence — the artefacts stayed live because the team kept talking about them.
07 — Lessons
What replicates and what was specific.
The five lessons below generalise — they have shown up across several engagements in the 2026 cohort, not just this one. The team-specific details (the particular skills, the squad shapes, the monorepo structure) do not, and any other team replicating this case study should expect their library and hook mix to look meaningfully different. The cadence and the shape of the artefacts replicate; the contents adapt.
One non-obvious lesson: the staff engineer carrying the rollout was the right shape of role, but it could have been done by any senior IC or EM-grade engineer with the same time allocation. The role shape that matters is not the title but the combination of three things — enough seniority to enforce milestone discipline across squads, enough engineering depth to actually write skills and hooks rather than just steward them, and enough calendar discipline to keep the cadence intact when product work pushes back. Title is a proxy for these, not a substitute.
Name the rollout lead on day zero (single accountable owner)
Implicit ownership collapses to no ownership inside a fortnight. The staff engineer in this engagement had 30% of their week explicitly carved out. Less than that and the calendar slips. The role is the single point of accountability for the rollout landing, not a stewardship hat that someone wears over their normal work.

Skill captains, not skill culture (name 1-2 per squad)
Two captains in the second sprint outperformed any amount of culture-of-sharing exhortation. The captains were not the most senior engineers; they were the engineers most personally invested in the tool. Name them explicitly and give them air-cover to write skills on company time.

Build the panel around engagement (engagement composite)
Vanity counts that licence telemetry already covers will read flat regardless of what is happening underneath. Engagement-weighted signals — skill invocations, subagent runs, time-to-output, audit deltas — track adoption depth. Build the panel around those and rebuild it when the early version is wrong.

Friday retros are non-negotiable (30 min/squad/week)
Thirty minutes per squad per week is the cheapest part of the rollout and the most often skipped. The friction surfaced in week three of this engagement would have compounded silently for the rest of the sprint without the retro. Calendar them on day zero and do not let them slide.

Sustain phase is the rollout (calendar the quarterly review)
The 30/60/90 plan describes how to land the artefacts; the sustain phase is what keeps them alive. Quarterly review cadence, library prune, CLAUDE.md prune, permissions audit. Without the sustain calendar, adoption drifts toward install-only inside two quarters regardless of how strong day 90 looked.

For teams replicating this case study, the operational order matters. Run the day-zero audit before naming the rollout lead — the audit is what makes the case for the engagement internally, and the lead role is easier to fill once the score is visible. Name the skill captains in sprint two, not sprint one — they emerge from the work rather than being appointed. Ship a first-pass productivity panel in sprint two knowing it will be wrong, and rebuild it in sprint three; the version-two rebuild is structural rather than incremental, and trying to ship the right panel on the first try usually delays the cadence rather than improving the signals.
The companion 50-point adoption scorecard is the right starting artefact for any team considering this replication. Run it on day zero, treat the score as the baseline, and re-run quarterly. The structural shape of the scorecard maps cleanly onto the four-phase plan above and gives leadership a defensible quantitative spine for the rollout that licence telemetry cannot provide. Our AI transformation engagements run this exact pattern with client engineering teams — the phasing, the audit, the panel, the cadence.
Claude Code adoption compounds with skills, hooks, and weekly retros.
The rollout described here is the textbook 30/60/90 plan adapted to a real engineering organisation with real constraints — limited rollout-lead time, no pause in product work, and leadership wanting a quantitative story by day ninety. The artefacts that landed (22 skills, 11 hooks, 3 subagents) compounded over the quarter because the cadence stayed intact, and the cadence stayed intact because the Friday retros surfaced friction inside the week it emerged rather than the quarter it accumulated.
The 35% productivity lift sustained at month four is the headline number, but the more durable signal is the audit score progression from 19 to 41 over the quarter. That twenty-two-point shift maps cleanly onto Stage 1 to mid-Stage 3 on the 50-point rubric, which is the realistic ceiling for a single-quarter rollout. The push from Stage 3 Leveraged to Stage 4 Optimised is a second-quarter project — deeper subagent governance, security cadence that catches drift, and a productivity panel that has been iterated against a full quarter of data.
The pattern that generalises beyond this engagement is the sequence of artefacts and the cadence around them, not the specific contents. Run the day-zero audit. Name the rollout lead and the skill captains. Build the engagement-weighted productivity panel. Hold the Friday retros. Calendar the quarterly review before day 90 closes. The contents of the skill library, the shape of the hooks, the design of the subagents will all differ by team. The cadence and the artefacts that hold the cadence are what consistently separate rollouts that compound from rollouts that decay.