A Claude Code team-adoption audit is the difference between knowing your engineers have the tool installed and knowing what they actually do with it. Engineering leaders can pull a licence-usage report from Anthropic in a single click; what that report cannot tell you is whether the team has skills in .claude/skills, whether CLAUDE.md files are tight enough to be read in full on every turn, whether anyone has wired Stop hooks, and whether the security policy is keeping up with the shape of the tool today versus six months ago.
The 50-point scorecard below is the audit framework we use on client engagements. Five axes — configuration, hooks, skills, memory, security — at ten points each. Every point is checked against an artefact in the repo or in the user's local Claude Code state, not against survey self-reports. The output is a stage on a four-step maturity model and a 60-day uplift roadmap with the highest-ROI gaps prioritised.
This guide walks through what the audit looks for on each axis, how the maturity model maps an aggregate score to a development stage, and a worked example from a 30-engineer shop we audited in Q1 2026. Everything below is the operational method, not the sales pitch — the section at the end on engagement work is the place for the latter.
- 01 — Engagement is the trailing indicator that matters. Daily-active subagent usage and skill-invocation counts predict productivity gains; licence counts only predict spend. Audit the inputs that produce engagement, not the seats.
- 02 — Skills are the highest-ROI investment. One well-scoped skill saved per developer per week compounds into hours back inside a quarter. Most audited teams have zero shared skills — that is the lowest-hanging point.
- 03 — CLAUDE.md hygiene predicts adoption depth. Bloated memory files correlate with surface-level use because the model skims rather than reads. Tight, well-pruned memory correlates with the teams that have wired hooks and skills too.
- 04 — Security policy lags adoption by quarters. Teams adopt new tool surfaces (skills, MCP servers, hooks) faster than policy gets written. The audit catches the drift before an incident does.
- 05 — Quarterly re-audit cadence works. Monthly is too noisy — the artefacts only shift on the quarter scale. Annual misses the policy-drift window entirely. Quarterly is the rhythm that has held across the client base.
01 — Adoption vs Engagement
Installed is not adopted.
The first failure mode in any AI-coding-tool rollout is treating licence telemetry as the success metric. Anthropic's dashboard can confirm a seat is provisioned, a session opened, a token spent. None of those signals describe whether the engineer is using Claude Code as a thin chat interface or as a delegated agentic workflow — and the gap between those two postures is roughly an order of magnitude in productivity terms.
We separate the two with a deliberate vocabulary. Adoption is the floor: the tool is installed, the user has logged in, a session has been started this week. Engagement is the ceiling: subagents are being invoked, skills are being run, hooks are firing, CLAUDE.md is referenced on every turn, memory entries are accumulating with durable value. Adoption is binary and easy. Engagement is a spectrum and is what the audit measures.
Three signals separate engaged users from adopted-but-idle ones. They are also the signals managers most often miss because they sit outside the vendor dashboard:
Daily-active subagent count — .claude/agents/ · /agents
Engaged teams invoke 2-5 subagents per active session. Adopted-but-idle teams invoke zero. Subagent count is the single strongest predictor of week-over-week productivity gains in the audited cohort.

Skill invocations per week — .claude/skills/
Skills are user-defined slash commands. Teams that have written and use even three or four skills are visibly faster on routine tasks (commits, reviews, scaffolding). Teams with zero skills are still hand-prompting every interaction.

Stop / SubagentStop hooks wired — settings.json hooks
Hooks fire on lifecycle events — Stop, SubagentStop, PreToolUse. They are how teams integrate Claude Code into their existing workflow (notifications, audit logs, automated reviews). Zero hooks usually means zero workflow integration.

Licence telemetry sees none of the three. A user who opens a session daily and copy-pastes generated code into their editor scores identically to a user running parallel review subagents with hook-driven CI integration — yet they are doing fundamentally different work. The audit's purpose is to surface that gap, attribute it to the right axis, and convert it into a roadmap rather than a number.
For a deeper look at how subagents change the workflow specifically, our walkthrough on building a Claude Code custom subagent is the operational layer underneath this axis — the audit checks for the presence and quality of those agents; that piece teaches how to build them.
02 — Five Axes
Config, hooks, skills, memory, security.
The scorecard is organised around five axes, ten points each, for a fifty-point total. We picked these axes by working backwards from the engagement signals above — every axis maps to a class of artefact that engaged teams produce and idle teams do not. They are listed in the order we score them, because the earlier axes tend to be prerequisites for the later ones (skills are hard to land without good configuration; security is hard to audit without skills being present at all).
Configuration — settings.json + permissions · 10 points
settings.json hygiene, model defaults, permissions allowlist, MCP server registrations, environment variables, default tool gating. The base layer — every later axis assumes it.

Hooks — lifecycle integration · 10 points
Stop / SubagentStop / PreToolUse / Notification hooks. Hooks turn Claude Code from a chat tool into a workflow component. Zero hooks is the single most common gap.

Skills — slash-command library · 10 points
.claude/skills/ directory. Number of skills, scope quality, documentation, sharing across the team. The leverage axis — the one with the biggest productivity multiplier.

Memory — CLAUDE.md hygiene · 10 points
Length, structure, signal density, cross-references, freshness. Bloated CLAUDE.md correlates with surface-level engagement; tight memory correlates with deep agentic use.

Security — policy + productivity signals · 10 points
Permissions discipline, secret handling, MCP-server review, audit logging, plus the productivity signals (engagement metrics, time-to-value, subagent uptake). The guardrails axis.

Two axes — configuration and hooks — form the base layer. They make the tool behave consistently across the team and create the hooks (literally) that later workflow integration depends on. Two more — skills and memory — form the leverage layer. They are what make Claude Code feel like a senior teammate rather than an autocomplete. The fifth — security — is the cross-cutting concern: it has to keep up with everything else because every new surface area is a new policy gap.
The next three sections cover each layer in turn — what we score, what good looks like, what failure looks like, and what the common remediation pattern is for the gaps we find most often.
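Every one of those checks is an artefact test, which means the first pass can be scripted. Below is a minimal sketch of the inventory step — the paths follow Claude Code's on-disk conventions, but the script itself (function names, output shape) is ours for illustration, not official tooling:

```python
#!/usr/bin/env python3
"""Rough artefact inventory for a Claude Code adoption audit.

Illustrative sketch: the paths follow Claude Code conventions, but the
checks and output shape are this article's scorecard, not anything the
CLI itself produces.
"""
import json
import sys
from pathlib import Path


def inventory(repo: Path) -> dict:
    settings_path = repo / ".claude" / "settings.json"
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text())

    claude_md = repo / "CLAUDE.md"
    skills_dir = repo / ".claude" / "skills"
    agents_dir = repo / ".claude" / "agents"

    return {
        # Base layer: shared settings, permissions, MCP registry, hooks
        "project_settings": settings_path.exists(),
        "permissions_allowlist": bool(settings.get("permissions", {}).get("allow")),
        "hooks_wired": sorted(settings.get("hooks", {})),  # e.g. ["Stop", "SubagentStop"]
        "mcp_registry": (repo / ".mcp.json").exists(),
        # Leverage layer: skills, shared subagents, memory size
        "skill_count": len(list(skills_dir.glob("*/SKILL.md"))) if skills_dir.is_dir() else 0,
        "shared_subagents": len(list(agents_dir.glob("*.md"))) if agents_dir.is_dir() else 0,
        "claude_md_lines": len(claude_md.read_text().splitlines()) if claude_md.exists() else 0,
    }


if __name__ == "__main__":
    repo = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    print(json.dumps(inventory(repo), indent=2))
```

The scoring on top of this inventory is where auditor judgment lives; the inventory itself should be boring and reproducible.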
03 — Config + Hooks
Twenty points on the base layer.
Configuration is the easiest axis to score because the artefacts are deterministic — either ~/.claude/settings.json has an entry or it does not. We check six artefact groups, weighted by impact. A well-configured engineer scores eight or nine; an unconfigured one scores zero to two.
Project + user settings — .claude/settings.json + ~/.claude/settings.json · 2 points · structural
Separate files for repo-specific and user-global settings. Project settings committed to git so the team shares a baseline. User settings stay personal. Many teams have neither — every engineer runs out-of-box defaults.

Permissions allowlist — settings.json permissions · 2 points · safety
Explicit allow / deny patterns for Bash, file paths, network. Teams without an allowlist either over-prompt for permission (friction) or run unrestricted (risk). Two points for explicit lists; one for partial.

MCP server registry — .mcp.json or settings.json mcp · 2 points · integration
Shared MCP servers committed to the repo. Common picks: GitHub, Linear, Slack, project-specific data. The audit checks both presence and freshness — stale registrations are a red flag.

Model + tool defaults — model, autoCompactEnabled, etc. · 2 points · ergonomics
Sensible defaults set at user or project scope. Sonnet for routine work, Opus for hard reasoning, Haiku for fast classification. Teams without defaults pay the configuration tax on every session.

Stop + SubagentStop hooks — settings.json hooks · 6 points · workflow
Lifecycle integration. Notifications, audit logging, automated summaries. Six points across the lifecycle events. Teams with zero hooks score zero across this block — the most common pattern in the audited cohort.

PreToolUse / PostToolUse — fine-grained workflow gates · 4 points · advanced
Hooks firing on specific tool invocations. Used for things like vetting bash commands before execution, logging file edits, or running pre-commit-style checks. Advanced — usually only seen on the most engaged teams.

The most common base-layer gap is the no-hooks profile: solid settings, sensible model defaults, sometimes even a tidy MCP registry — but no lifecycle hooks at all. That team is using Claude Code as a smart text editor rather than as an agentic component of their workflow. The remediation is small and high-ROI: ship a Stop hook that posts to Slack or appends to an audit log, watch engagement rise inside a week as the team starts thinking of Claude Code as a participant in the shared workflow rather than a private assistant.
Configuration gaps, by contrast, are usually a one-afternoon fix — generate a shared .claude/settings.json template, commit it, document the user-scope additions individuals are expected to make, and move on.
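A starting-point sketch for that shared template, assuming a Node-flavoured repo — the model alias and the specific allow / deny patterns are examples to edit for your stack, not a recommended policy:

```json
{
  "model": "sonnet",
  "permissions": {
    "allow": [
      "Bash(npm run lint)",
      "Bash(npm run test:*)",
      "Edit(src/**)"
    ],
    "deny": [
      "Read(./.env)",
      "Read(./secrets/**)",
      "Bash(curl:*)"
    ]
  },
  "env": {
    "NODE_ENV": "development"
  }
}
```

Commit this as .claude/settings.json; personal overrides belong in .claude/settings.local.json or the user-scope ~/.claude/settings.json.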
"Show me your hooks and I will tell you whether Claude Code is integrated into your workflow or sitting beside it."— Field note · Q1 2026 audit cohort
04 — Skills + Memory
Twenty points on the leverage layer.
If the base layer makes Claude Code behave consistently, the leverage layer is what turns it into a multiplier. Skills are the productised leverage — user-defined slash commands that wrap a repeated workflow into one keystroke. Memory is the sustained leverage — the file the model reads on every turn, which means tight memory pays dividends on every interaction.
Skills score ten points across five sub-criteria, two points each. Memory scores another ten with the same structure. The audit is opinionated about what good looks like on both — opinionated enough that scoring is reproducible across auditors, not interpretive.
Library size + quality · 2 pts
Number of skills shipped in .claude/skills/. Six to ten well-scoped skills is the sweet spot — covers the routine tasks (commits, reviews, scaffolding, doc updates) without overlap. Two points for library size; quality is scored on the next four criteria.

Scope + docs + sharing + freshness · 8 pts
Each skill scoped to one job (not five). Each documented inside the skill file. Shared via the repo so the whole team benefits. Updated as the underlying workflow evolves. Eight points total — most audited teams score zero here because the library is empty.

CLAUDE.md size + structure · 4 pts
Lean is better. Most production CLAUDE.md files we recommend land at 200-400 lines. Bigger is a code smell — either pruning is overdue or content belongs in skills, docs, or per-subagent prompts instead. Structured with clear section headers the model can use as anchors.

Signal density + cross-refs · 4 pts
Every section earns its line count. Cross-references point to authoritative artefacts (sub-docs, skills, agents) rather than restating them inline. Auto-generated memory files score badly here — they pad every section with boilerplate the model has to skim past.

Freshness + drift · 2 pts
Memory updated within the last quarter. A stale CLAUDE.md tells the model things that are no longer true, which is worse than no memory at all. Two points: dated entries, recent edits, no obvious drift between memory and current repo state.

The leverage layer is where the audit usually finds the biggest point gap in absolute terms. A team scoring 14 out of 20 on the base layer is doing fine — there is room to improve but the tool is broadly functional. A team scoring 2 out of 20 on the leverage layer (the median for first-time audits in 2026) is leaving most of the value on the table. The remediation is straightforward — start with one skill per engineer per month, prune CLAUDE.md to roughly 300 lines, re-measure the quarter after.
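What a pruned memory file tends to look like in outline. The project details below are invented for illustration; the shape — short sections, runnable commands, pointers instead of inlined content — is the point:

```markdown
# CLAUDE.md — acme-app

## Stack
Next.js front end, Python services, pnpm workspaces.

## Commands
- `pnpm test` — unit tests; run before every commit
- `pnpm lint:fix` — autofix style; CI rejects unformatted code

## Conventions
- Server components by default; mark client components explicitly.
- API handlers live in src/app/api/ — conventions in docs/api-conventions.md, not restated here.

## Pointers (do not inline these)
- Release process: docs/releasing.md
- Review checklist: .claude/skills/code-review/SKILL.md

<!-- Reviewed quarterly; prune anything the repo no longer backs up. -->
```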
For the technique of building a single high-quality skill from scratch, our walkthrough on building a Claude skill from scratch covers the SKILL.md format, trigger phrasing, and the scoping discipline that separates a skill that gets invoked from one that sits idle in the directory.
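For orientation before that walkthrough, here is the shape of a single minimal skill, living at .claude/skills/commit/SKILL.md. The frontmatter fields (name, description) are the ones Claude Code reads; the body is an illustrative sketch, not a prescribed workflow:

```markdown
---
name: commit
description: Stage and commit the current changes with a conventional-commit message. Use when the user asks to commit their work.
---

1. Run `git status` and `git diff` to see what changed.
2. Draft a conventional-commit message (feat / fix / chore) scoped to one change.
3. Stage only the files that belong to that change — never `git add -A` blindly.
4. Commit, then show `git log -1` so the user can verify.
```

The description field doubles as the trigger: it tells the model when to invoke the skill, which is why scoping it to one job matters.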
05 — Security + Signals
Ten points on the guardrails and the productivity signals.
The fifth axis is the smallest in nominal point count but the most cross-cutting. Six points cover security policy — the disciplines that have to keep up with every other axis as it grows. Four cover the productivity signals — the metrics that tell you whether the other forty points are paying off.
Permissions discipline — deny patterns audited · 2 points
Permissions allowlist reviewed against the current tool surface. New tools (skills, MCP servers, hooks) audited as they are added. Forgotten allowlists drift into over-permission inside a quarter.

Secret + PII handling — no secrets in CLAUDE.md / skills / hooks · 2 points
Memory files, skill files, and hook scripts checked for accidentally committed secrets or PII. Cross-referenced with git history. The most common finding: API keys pasted into a skill for convenience.

MCP server review — registered MCPs vetted · 2 points
Every MCP server registered in the project verified — source, permissions, last-update date. Stale or unknown servers are a supply-chain risk. Two points for an active review process; zero for set-and-forget.

Engagement metrics + cadence — weekly subagent + skill counts · 4 points
Daily-active subagent count, weekly skill invocations, time-to-value on new hires, audit cadence in place. Four points for the panel; without metrics the previous forty-six points are unmeasured.

Security drift is the audit's most-cited finding in retros. Teams adopt new tool surfaces — a new MCP server, a new hook, a new skill that runs bash — faster than the security policy catches up. The drift is rarely catastrophic but accumulates: by the time it shows up in a real incident the remediation is much bigger than the quarterly review would have been.
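The MCP review is easiest when the registry is a single committed file. A sketch of a reviewable .mcp.json — the server shown is the reference GitHub MCP server, and the token arrives via environment-variable expansion rather than being pasted inline (inline tokens being exactly the secret-handling finding above):

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}
```

With the registry in git, the review process has something concrete to diff: who added a server, when, and with what permissions.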
The productivity signals close the loop. Without them the other forty-six points are advisory rather than measurable. With them — even simple weekly subagent counts, skill-invocation totals, time-to-first-meaningful-output on new hires — the engagement curve is visible and the audit's next-quarter roadmap is grounded in evidence rather than assertion.
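If the Stop / SubagentStop hooks from the base-layer section are appending their payloads to a JSONL log, the panel can start as a script. A sketch under those assumptions — the log path and aggregation are ours, not built-in Claude Code telemetry, and since the payload carries no timestamp the simplest cadence is to rotate the log file weekly:

```python
#!/usr/bin/env python3
"""Weekly engagement counts from the hook-driven audit log.

Assumes Stop/SubagentStop hooks append their stdin payload to
~/.claude/audit-log.jsonl and that each record carries
hook_event_name and session_id fields; the file and the aggregation
are this article's sketch, not built-in telemetry.
"""
import json
from collections import Counter
from pathlib import Path

LOG = Path.home() / ".claude" / "audit-log.jsonl"


def weekly_counts() -> Counter:
    counts: Counter = Counter()
    sessions = set()
    if not LOG.exists():
        return counts
    for line in LOG.read_text().splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a partially written last line
        counts[event.get("hook_event_name", "unknown")] += 1
        sid = event.get("session_id")
        if sid:
            sessions.add(sid)
    counts["distinct_sessions"] = len(sessions)
    return counts


if __name__ == "__main__":
    for name, n in sorted(weekly_counts().items()):
        print(f"{name:20} {n}")
```

SubagentStop counts approximate subagent runs; counting skill invocations would need an additional hook on the relevant tool events, which is the next increment once the panel exists.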
06 — Maturity
Four-stage maturity model — Ad-Hoc → Optimised.
The aggregate score is plotted against a four-stage maturity model. The stages are deliberately broad — buckets of roughly twelve points rather than fine-grained tiers — because the goal of the model is to set a next-quarter direction, not to fine-tune the current quarter. Most teams should expect to move one stage per quarter if the roadmap is taken seriously.
Stage 1 · Ad-Hoc
Licences in place, individual usage, no shared artefacts. CLAUDE.md is auto-generated. Zero skills, zero hooks, zero shared subagents. Adoption proven, engagement absent. The starting position for most first-audit clients.
Roadmap: ship the base layer

Stage 2 · Configured
Project + user settings.json in place, permissions allowlist, MCP registry. Maybe one or two hooks. Still no skills, CLAUDE.md still under-pruned. The team is taking the tool seriously but has not yet invested in leverage.
Roadmap: ship the leverage layer

Stage 3 · Leveraged
Skills library with six to ten entries, CLAUDE.md pruned and structured, hooks firing on lifecycle events, subagents in regular use. Security policy needs the most work. The most common audit-outcome stage.
Roadmap: ship the guardrails

Stage 4 · Optimised
All five axes scoring strongly. Productivity signals tracked weekly. Quarterly re-audit cadence in place. The team is shipping new skills as workflows emerge and pruning CLAUDE.md as roles evolve. Rare — under 10% of audited teams reach this stage in 2026.
Roadmap: maintain + optimise

One nuance worth stating. The stages are not strictly sequential — it is possible to land at Configured with a partial leverage layer or to skip from Ad-Hoc straight to Leveraged on the strength of a single motivated tech lead shipping skills before settings.json is even committed. The stage label is a summary, not a constraint. The roadmap a given team gets out of the audit is the concrete remediation list — the stage just sets the expectation for cadence and next-quarter ambition.
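The bucket arithmetic is the only mechanical part. Here is one consistent reading of "roughly twelve points per stage" — the exact boundaries are ours for illustration, not a standard, though they do agree with the worked example in the next section (34/50 landing in Leveraged):

```python
def maturity_stage(score: int) -> str:
    """Map a 0-50 aggregate audit score to a stage label.

    Boundaries are illustrative (~12-13 points per bucket); the article
    treats the stage as a summary, not a hard threshold.
    """
    if not 0 <= score <= 50:
        raise ValueError("score must be within 0-50")
    if score <= 12:
        return "Stage 1 · Ad-Hoc"
    if score <= 25:
        return "Stage 2 · Configured"
    if score <= 38:
        return "Stage 3 · Leveraged"
    return "Stage 4 · Optimised"


assert maturity_stage(34) == "Stage 3 · Leveraged"  # matches the worked example
```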
07 — Worked Example
A 30-engineer shop, audited.
One representative engagement, anonymised. A 30-engineer product team, Next.js + Python stack, Claude Code rolled out seven months before the audit. Self-reported adoption was strong — every engineer had used Claude Code in the last fortnight, the eng-leadership team had a positive narrative internally. The audit told a more textured story.
Adoption scorecard · 30-engineer shop · pre-remediation — anonymised client engagement, Q1 2026

The headline finding: the team was Configured on paper but barely past Ad-Hoc in practice. Strong licence-usage telemetry, a passable base layer, almost no leverage layer, and zero infrastructure for measuring whether things were improving. The single highest-ROI remediation was the leverage axis — at 5 out of 20, the gap was the biggest absolute deficit on the scorecard.
The 60-day roadmap that came out of the audit had three workstreams. One: prune CLAUDE.md from 1,100 lines to roughly 350, split the rest into linked sub-docs, set up a quarterly edit cadence. Two: ship six initial skills — commit, code-review, scaffold-new-route, write-tests, update-changelog, spawn-subagent-team. Three: stand up a productivity panel — weekly subagent count, weekly skill invocations, time-to-first-meaningful-output for new hires. The security and hooks gaps were scheduled for the following quarter.
Re-audit at day 90 produced a score of 34 out of 50 — Stage 3 Leveraged, up from Stage 2 Configured, with the leverage axis now the strongest rather than the weakest. The productivity panel showed weekly subagent invocations roughly tripling and skill invocations climbing from a near-zero baseline. The absolute numbers are less interesting than the shape — the team moved from spending time prompting Claude Code to spending time orchestrating it.
If you want the same calibration applied to your team, our AI transformation engagements ship Claude Code adoption audits with a 60-day uplift roadmap scoped to your stack, your team size, and the gaps the scorecard identifies on the first pass.
Adoption depth is the difference between Claude Code as a tool and Claude Code as a platform.
The fifty-point scorecard is deliberately mundane on each individual point — a settings.json check, a CLAUDE.md line count, a skill directory inventory. What it produces in aggregate is the one thing licence telemetry cannot: a credible picture of how the tool is actually being used, which axis is the biggest drag on engagement, and where the highest-ROI remediation lives.
The next-quarter milestones we expect to see across the client base are visible in the scorecard distribution. Most teams move from Configured to Leveraged inside one quarter once the skills library and the memory prune ship — those two interventions together are the most consistent high-ROI step. The harder transition is from Leveraged to Optimised, where the gating factor is usually security policy and signal infrastructure rather than tooling — both of which require a quarterly cadence rather than a one-off project.
The signal underneath all of this: adoption depth is a property of the artefacts a team produces, not of the seats it buys. Audit the artefacts, build the leverage layer, ship the guardrails, and the productivity gains follow. Skip the audit and the seats sit idle while the dashboard says everything is fine.