A vibe-coding policy audit scores your team's AI-assisted coding governance across fifty points spanning review gates, accountability, intellectual-property exposure, secrets redaction, and approved tool catalogs. The audit exists because almost every engineering org now has Claude Code, Copilot, or Cursor running on production repos — and almost none have a written policy that survives a security review.
The gap matters because policy gaps become incident postmortems. When the first AI-authored regression ships, when a prompt leaks a customer record, when a junior engineer pastes a proprietary algorithm into a tool the company never sanctioned — the question stops being whether to write policy and becomes why the policy was written reactively, in the middle of a postmortem, by people who would rather be doing anything else.
This guide walks through the fifty points across five axes, the four-stage maturity model that scores each one, the common failure patterns we see in audits, and a worked example for a 30-engineer shop. The output of an audit is not a number — it's a prioritized punch list of policy gaps with severity, owner, and remediation pattern attached to each.
- 01 — Policy gaps become incident postmortems. Audit prevents reactive policy. Most teams write AI-coding policy in the week after their first AI-attributable incident, when the team is exhausted and the lessons are partial. Doing it during a calm quarter is meaningfully cheaper.
- 02 — Accountability for AI-authored code lives with the human author. Document it explicitly. The reviewer signs off on the diff regardless of who or what produced it. 'AI did it' is not a valid postmortem position. The policy needs to state this in writing or it becomes negotiable during the first incident.
- 03 — IP exposure is the under-discussed risk. Code-in-prompts moves data across boundaries. Every prompt sent to a cloud-hosted AI vendor is a data transfer. If the prompt includes proprietary algorithms, customer records, or regulated content, the transfer needs the same scrutiny as any other third-party data flow.
- 04 — Secrets redaction belongs in the IDE, not in policy alone. Tooling plus policy. A policy that says 'never paste secrets' is necessary but not sufficient — the policy needs to be backed by IDE-side redaction (pre-commit hooks, prompt-scrubbing extensions, scanners on the agent context) that makes the right thing the easy thing.
- 05 — Approved tool catalogs prevent shadow AI. Make the right path the easy path. When the sanctioned catalog is up to date, well-supported, and procurement-friendly, engineers use it. When it's stale or hostile, engineers shadow-install whatever works and the security team finds out from the audit log six months later.
01 — Policy vs Vibe
AI-assisted coding needs a contract — most teams have none.
"Vibe coding" — the practice of leaning heavily on an AI assistant to author, refactor, or review code with relatively little hand-typed input — has gone from edge case to default workflow in roughly eighteen months. The tooling improved faster than the governance did. Most engineering orgs we audit have Claude Code, Copilot, or Cursor running across the team with no written policy describing what's allowed, who is accountable, what data may be sent to the vendor, or which tools are approved for production use.
The absence of policy is not a neutral state. It means the decisions are still being made — by individual engineers, in the moment, based on vibe rather than written contract. Some of those decisions are fine. Some are catastrophic. The audit exists to identify which axes are operating on vibe and to convert each one into an explicit, written, enforced rule before an incident does it for you.
The five axes below are the ones that consistently surface in postmortems. They are not the only axes that matter — model evaluation, deterministic-output requirements, and observability matter too — but these are the five where the absence of policy maps most directly to recoverable customer harm. Score each point on a four-stage maturity model: absent (0), ad-hoc (1), documented (2), enforced (3). Five axes of ten points each, with every point scored up to three, give a possible 150. Anything below 75 means the policy is reactive — the next incident writes the rules.
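A minimal sketch of the scoring arithmetic, assuming the four maturity stages above; the enum, axis keys, and helper below are illustrative conveniences, not part of the audit deliverable:

```python
from enum import IntEnum

class Maturity(IntEnum):
    ABSENT = 0      # no rule exists; decisions are made on vibe
    AD_HOC = 1      # a norm exists, but only in chat threads and tribal memory
    DOCUMENTED = 2  # written policy, signed off, discoverable
    ENFORCED = 3    # policy backed by tooling or a gate that makes violations hard

# Five axes, ten points per axis, each point scored 0-3.
AXES = ["review_gates", "accountability", "ip_redaction", "secrets", "approved_tools"]
MAX_SCORE = len(AXES) * 10 * max(Maturity)  # 5 * 10 * 3 = 150
REACTIVE_THRESHOLD = 75                     # below this, the next incident writes the rules

def audit_score(scores: dict[str, list[Maturity]]) -> tuple[int, dict[str, int]]:
    """Return (total, per-axis subtotals) for a completed fifty-point audit."""
    per_axis = {axis: sum(points) for axis, points in scores.items()}
    return sum(per_axis.values()), per_axis

# Usage: total, per_axis = audit_score(scores); total < REACTIVE_THRESHOLD flags a reactive policy.
```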
Four-stage maturity model · scored per point
Source: Digital Applied audit framework, internal field data 2025-2026
02 — Review Gates
Ten checks on what humans review.
Review gates are the most-audited axis because they map most directly to known software practices — branch protection, required reviewers, CI checks. The AI-specific overlay asks a sharper question: when an AI authored the diff, does the review change? In most teams the answer is no, and that's usually the wrong answer. AI-authored diffs have different failure modes than human-authored diffs (confident hallucinations, plausible-but-wrong type signatures, subtle API misuse) and the review should weight those modes more heavily.
The ten review-gate points cover branch protection, required human review on AI-authored diffs, scope discipline (one concern per PR even when an agent generated the patch), test coverage requirements specific to AI-authored code, CI gates, structured review checklists, escalation paths for high-risk changes, retroactive sampling of merged AI-authored code, and review-time disclosure (did the author use AI?). The grid below summarizes the four highest-leverage points.
Is AI authorship disclosed at PR time?
Reviewers benefit from knowing the diff was AI-generated — it shifts what they scrutinize. A single PR template field ('AI tools used: none / Copilot / Claude Code / other') is the cheapest enforcement. Score 3 when the field is required and reviewed.
PR template field
Required tests for AI-authored code
Policy can require that AI-authored code includes tests for the changed paths even when human-authored code does not. The asymmetry is intentional — AI is faster at writing tests than humans are at noticing missing ones.
Asymmetric coverage
One concern per PR — even with agents
Agents are willing to refactor ten files when asked to fix one bug. Policy should require AI-authored PRs to stay scoped to the originating task and reject sprawl. Reviewers enforce this; it's also a CI check via file-count thresholds.
Sprawl gate
Retroactive review of merged AI code
Sample 5-10% of merged AI-authored PRs each quarter for retroactive deep review. Surfaces patterns no individual PR review would catch — accumulating tech debt, inconsistent style, security smells that became normal because nobody flagged the first instance.
Quarterly sample
The pattern we see most often: review gates score 2 or 3 on the human-authored axis (branch protection, required reviewers) and 0 or 1 on the AI-specific overlay. That gap is the one to close first because it's the cheapest — most of the infrastructure already exists, the policy work is mostly about layering AI-specific checks onto gates the team already respects.
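As an illustration of how cheap that layering can be, here is a minimal CI-side sketch of two gates from the grid above, the disclosure field and the sprawl gate. The PR_BODY environment variable, the field wording, the base branch, and the file-count threshold are assumptions to adapt to your CI and PR template rather than a standard.

```python
import os
import subprocess
import sys

# Both values are illustrative; tune the wording and threshold to your own template.
DISCLOSURE_FIELD = "AI tools used:"  # required line in the PR description
MAX_CHANGED_FILES = 15               # sprawl gate: agent-authored PRs stay scoped to one concern

def changed_file_count(base_ref: str = "origin/main") -> int:
    """Count files touched relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base_ref}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return len([line for line in out.stdout.splitlines() if line.strip()])

def main() -> int:
    pr_body = os.environ.get("PR_BODY", "")  # exported by the CI job from the PR description
    failures = []
    if DISCLOSURE_FIELD.lower() not in pr_body.lower():
        failures.append(f"PR description is missing the '{DISCLOSURE_FIELD}' disclosure field")
    if changed_file_count() > MAX_CHANGED_FILES:
        failures.append(f"PR touches more than {MAX_CHANGED_FILES} files; split it per concern")
    for failure in failures:
        print(f"review-gate check failed: {failure}")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```

Wired in as a required status check, both rules move from reviewer memory to a gate the merge button respects.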
03 — Accountability
Ten checks on who owns AI-authored code.
Accountability is the axis that breaks under stress. In the calm before an incident, every team agrees that the human author owns the code regardless of who or what produced it. In the postmortem after a bad incident, that agreement gets renegotiated in real time, and the renegotiation goes badly unless the rule was written down and signed off in advance. The accountability points exist to make the rule non-negotiable before it's tested.
The ten points cover author-of-record assignment, reviewer accountability for AI-authored diffs, the rule that "AI did it" is not a postmortem position, incident-response playbooks that name AI tools as part of the root-cause taxonomy, on-call escalation paths when an AI tool is implicated, license and IP indemnity language in contracts, training requirements before staff get tool access, and the documented chain of responsibility from keystroke to merge.
Human author owns the diff
The engineer who pressed merge is accountable for the code regardless of who or what produced it. AI assistance does not transfer accountability — it changes the workflow but not the ownership. Write this rule. Sign-off required.
Default rule
Reviewers sign off on AI diffs
Approving a PR is approving the code. Reviewers cannot exempt themselves on grounds the code was AI-authored. The policy needs to state that review is the same gate regardless of authorship, otherwise approval becomes a procedural rubber-stamp on agent output.
No exemption"AI did it" is not a valid root cause
The root cause is the human decision that allowed the AI output to ship. Was the review insufficient? Was the test gate missing? Was the agent given too much scope? Postmortems that stop at "the model hallucinated" produce no preventive action and erode trust in the tooling.
Decision-based root cause
Tool access requires onboarding
Hands-on enablement before tool access. Cover the team policy, the review gates, the escalation paths, the redaction tooling. A 30-minute session plus a quiz is plenty. Skip it and your weakest user becomes your incident surface.
Gated access"The human author owns the diff regardless of who or what produced it. AI assistance changes the workflow, not the accountability."— Accountability rule · Digital Applied policy template
Two operational notes worth surfacing. First, the policy needs to explicitly cover hand-off cases: when an engineer asks an agent to refactor a colleague's code, who owns the resulting diff at merge? The default is the engineer who initiated the agent run, but it needs to be written down. Second, the policy needs to cover external contractors. If you allow contractors to use AI tooling, the contract needs to extend the accountability rules to them in writing. Otherwise you have a policy gap exactly where the trust boundary is weakest.
04 — IP + Redaction
Ten checks on intellectual-property exposure.
Every prompt sent to a cloud-hosted AI vendor is a data transfer. That sentence should be the opening line of the IP section in your policy. Once it's framed as a transfer, the standard third-party data-flow controls apply: what categories of data may cross the boundary, what categories may not, what the contractual protections are on the receiving end, what the audit trail looks like, and what the incident-response procedure is when the rule is broken.
The ten IP and redaction points cover categories of data forbidden from prompts (customer PII, regulated content, unfiled patent material, security-sensitive code), categories permitted with vendor controls, vendor data-retention and training-opt-out posture, contractual indemnity for AI-generated output, license review on training-set adjacency, and the redaction tooling that makes the right thing the easy thing.
Never permitted in prompts (Tier A)
Customer PII · Patient records · Production secrets
Hard-stop category. No prompt may contain customer-identifying information, regulated health or financial records, or live production credentials. This is enforced by IDE-side scrubbing plus policy. Score 3 only when both layers exist.
Hard stop · tooling enforced
Permitted with controls (Tier B)
Internal code · Architecture · Test fixtures
Permitted on approved tools with vendor data-retention and training-opt-out configured. The audit trail shows which tools meet the bar and which categories of data each tool is approved for.
Vendor-controlled tier
Freely permitted (Tier C)
Public docs · Stack-Overflow-equivalent · OSS
Anything already public or that would be acceptable to paste into Stack Overflow. The bulk of vibe-coding traffic. The policy should explicitly permit this tier so the tighter rules on A and B don't get ignored as overreach.
Default-permit tier
The redaction question is where policy meets engineering. A policy that forbids customer PII in prompts is necessary but not sufficient — the IDE needs to scrub the data before the prompt leaves the boundary. Pre-commit hooks that scan the agent context, prompt-scrubbing extensions, and outbound proxy rules are the standard primitives. Score 3 on this point only when the tooling layer exists alongside the written rule.
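A minimal sketch of that scrubbing primitive, assuming regex-detectable categories only; the patterns are illustrative, and a real deployment pairs them with a dedicated secrets scanner and an outbound proxy rule.

```python
import re

# Illustrative hard-stop patterns for the never-permitted tier.
# A real deployment maintains these centrally and adds customer- and domain-specific patterns.
HARD_STOP_PATTERNS = {
    "email":       re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "aws_key_id":  re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "jwt":         re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),
}

def scrub_prompt(prompt: str) -> tuple[str, list[str]]:
    """Redact hard-stop matches in outbound prompt text and report which categories were hit."""
    hits = []
    for name, pattern in HARD_STOP_PATTERNS.items():
        if pattern.search(prompt):
            hits.append(name)
            prompt = pattern.sub(f"[REDACTED:{name}]", prompt)
    return prompt, hits

if __name__ == "__main__":
    clean, hits = scrub_prompt("debug this: client.auth(key_id='AKIAABCDEFGHIJKLMNOP')")
    print(hits)   # ['aws_key_id']
    print(clean)  # debug this: client.auth(key_id='[REDACTED:aws_key_id]')
```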
License-adjacency is the under-discussed sub-axis. Some AI coding tools were trained on permissively-licensed code; some were trained on a broader corpus. Your policy should describe which vendors are approved for code generation in license-sensitive contexts and what review happens before AI-generated code is committed to a repo with strict license requirements. If you ship under a permissive license but integrate copyleft code by accident, the cleanup is expensive.
05 — Secrets
Ten checks on prompts and context.
Secrets are the simplest axis to write policy for and the hardest to enforce. Every engineering team agrees that production credentials should never appear in a prompt; the failure modes are about how easily they slip in anyway. Environment variables loaded by mistake, .env files opened in the editor while an agent reads the workspace, log files with embedded tokens, screenshots pasted as context. The policy and the tooling have to cover all of those paths.
The ten secrets points cover the written prohibition on secrets-in-prompts, IDE-side scrubbing, pre-commit hooks that block agent context containing high-entropy strings, workspace-level rules that exclude .env from agent indexing, screen-share / pair-programming guidance, rotated-credentials policy after a suspected exposure, and the response procedure when a leak is detected.
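The high-entropy check mentioned above reduces to a short heuristic; the threshold and minimum token length below are assumptions to tune against your false-positive rate, not standard values.

```python
import math
import re
from collections import Counter

ENTROPY_THRESHOLD = 4.0  # bits per character; ordinary prose and identifiers sit well below this
MIN_TOKEN_LENGTH = 20    # short tokens cross the threshold too easily to be useful signals

def shannon_entropy(token: str) -> float:
    """Bits of entropy per character of a single token."""
    counts = Counter(token)
    total = len(token)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def suspicious_tokens(text: str) -> list[str]:
    """Return tokens that look like credentials: long and high-entropy."""
    tokens = re.findall(rf"[A-Za-z0-9+/_=-]{{{MIN_TOKEN_LENGTH},}}", text)
    return [t for t in tokens if shannon_entropy(t) >= ENTROPY_THRESHOLD]

# Wire this into the pre-commit hook or the agent-context filter: block the commit
# (or drop the offending line from the context) whenever suspicious_tokens() is non-empty.
```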
Secrets-protection maturity · tiered enforcement model
Source: Field-audit maturity tiers, Digital Applied 2025-2026
06 — Approved Tools
Ten checks on catalog and procurement.
The approved-tools axis is where policy meets procurement. If the sanctioned catalog is up to date, well-supported, and procurement-friendly, engineers use it. If the catalog is stale, hostile to new entries, or missing the tools engineers actually want, shadow AI starts immediately and the security team finds out from the audit log six months later. Make the right path the easy path or accept that the policy is decorative.
The ten approved-tools points cover the catalog itself (who owns it, where it lives, how it's versioned), the procurement path for adding a new tool, the security review applied before a tool joins the catalog, the per-tool data classification (which IP tier the tool is approved for), offboarding when a tool exits the catalog, shadow-AI detection via egress logging, vendor risk re-assessment cadence, and the explicit list of tools forbidden for production use.
The catalog exists and is up to date
Single source of truth for which AI coding tools are approved. Lives where staff already look (engineering wiki, not a buried Confluence page). Reviewed monthly. Stale catalogs are worse than no catalog — they erode trust in the whole policy.
Monthly review
Procurement path for new tools
When an engineer wants a new tool, there is a written path from request to approved-or-rejected in under two weeks. Long paths produce shadow installs. Short paths produce policy adherence. The procurement SLA is itself a policy artifact.
Time-bound SLA
Tool-to-IP-tier mapping
Each approved tool is mapped to the IP tiers it's approved for. Tier A (PII / regulated) usually means on-prem or air-gapped only. Tier B (internal code) means vendor-controlled with training-opt-out. Tier C (public-equivalent) is broad. Make the map explicit.
Data-classification grid
Shadow-AI detection via egress logging
Outbound network egress to known AI vendor domains is logged and reviewed. Detects shadow installs of tools not in the catalog. The detection is the enforcement — staff know egress is reviewed, which keeps the catalog conversation honest.
Detection layer
One operational note. The catalog should explicitly list the tools forbidden for production use, not just the tools approved. Engineers reading the policy need to see that the forbidden list is curated and current — otherwise the default interpretation is "anything not approved is forbidden," which is technically correct and operationally hostile. List both, refresh both monthly, and the policy lands better.
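Two of the points above, the tool-to-tier map and the egress detection, reduce to small data structures a weekly review job can consume. A sketch under assumed names; the vendor domains are illustrative placeholders, so confirm real endpoints against each vendor's documentation before relying on them.

```python
# Illustrative approved-tools catalog: tool -> (vendor domains, highest approved IP tier).
# The authoritative copy lives in the engineering wiki and is reviewed monthly.
APPROVED_CATALOG = {
    "claude-code": ({"api.anthropic.com"}, "B"),
    "copilot":     ({"api.githubcopilot.com"}, "B"),
}

# Domains treated as "an AI coding vendor", approved or not; extend as new vendors appear.
KNOWN_AI_VENDOR_DOMAINS = {
    "api.anthropic.com",
    "api.openai.com",
    "api.githubcopilot.com",
    "api.cursor.example",  # placeholder for any vendor not yet in the catalog
}

APPROVED_DOMAINS = {d for domains, _tier in APPROVED_CATALOG.values() for d in domains}

def shadow_ai_hits(egress_records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (host, source) pairs that reached an AI vendor outside the approved catalog."""
    return [
        (host, source)
        for host, source in egress_records
        if host in KNOWN_AI_VENDOR_DOMAINS and host not in APPROVED_DOMAINS
    ]

# Example: a weekly job feeds last week's egress log and files one ticket per hit.
print(shadow_ai_hits([("api.cursor.example", "laptop-042"), ("api.anthropic.com", "laptop-017")]))
# -> [('api.cursor.example', 'laptop-042')]
```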
For context on how to think about AI tool adoption at the team level — the broader audit pattern that complements this policy work — our Claude Code team adoption audit scores the adoption side of the same coin. And once policy and adoption are in place, the next leverage is building the custom subagents that encode your review gates and accountability rules directly into the agentic workflow.
07 — Worked Example
A 30-engineer shop, audited.
The shop: a 30-engineer Series B with three product squads, shipping a B2B SaaS product. Heavy Claude Code adoption, scattered Copilot use, two engineers running Cursor for personal preference, no written AI policy. The audit ran roughly three hours including stakeholder interviews and tool-usage sampling.
The headline result: 64 out of 150. Two axes scored above the half-mark (review gates and secrets, both around 18 out of 30) because the team had inherited strong general engineering hygiene. Three axes scored below half (accountability at 8, IP at 10, approved tools at 11) because none of them had been touched by anyone with an AI-specific lens.
Worked example · 30-engineer shop · per-axis scores
Source: Anonymized field audit, Digital Applied 2026
The remediation plan that came out of the audit was a six-week rollout in three phases. Weeks one and two: write the accountability rules and the data-tier framework, get engineering and legal sign-off. Weeks three and four: stand up the approved-tools catalog with the existing tools in it, add the procurement SLA, configure vendor opt-out on every approved tool. Weeks five and six: add the AI-specific review gates (PR template field, AI-attribution disclosure, required tests on AI-authored code), configure IDE-side redaction, ship the training session for tool access.
The re-audit at week eight came in at 118 out of 150 — well above the 75 threshold for "policy is no longer reactive." None of the work was conceptually hard. The leverage was entirely in doing it during a calm quarter rather than during the postmortem after the first incident. If you want this run on your team, our AI transformation engagements include the policy-audit deliverable plus the templates and tooling to remediate the gaps surfaced.
Coherent AI-coding policy is the cheapest insurance an engineering org can buy.
The audit costs roughly three hours and produces a punch list. The remediation is six weeks of work that almost entirely consists of writing things down, configuring existing tooling, and running a training session. The alternative is writing the same policy during the postmortem after your first AI-attributable incident, with half the team exhausted and the lessons partial.
The five axes — review gates, accountability, IP and redaction, secrets, approved tools — are not the only axes that matter for AI governance. Model evaluation, deterministic-output requirements, observability, and agent-permission scoping all earn their own audit checklists. The five above are the ones where the absence of policy maps most directly to recoverable customer harm, which is why they're the starting point and not the finish line.
Practical next step: schedule a calm afternoon, walk the fifty points with the engineering lead and one security-adjacent stakeholder, score each one honestly. Take the lowest-scoring axis and write its rules first. Within a month you'll have a policy that survives a security review and an incident postmortem — and a team that knows the rules well enough to follow them without being asked.