Agentic AI SOC2 controls mapping is the discipline of taking the five Trust Services Criteria — security, availability, processing integrity, confidentiality, and privacy — and translating each of the sixty underlying controls into something that an agent, a retrieval system, a tool-calling layer, and a model-update process can actually enforce. Done well, the same audit that already covers your platform absorbs the AI surface area. Done badly, the audit either ignores the agents (so the program is fictitiously compliant) or stalls at evidence collection because nobody instrumented the right artefacts when the system was built.
The temptation when SOC2 first meets agentic AI is to invent a new compliance regime — an "AI controls framework" that sits beside SOC2 and produces its own evidence stream. Resist that. SOC2 is a mature framework with audit-trained reviewers, a Type II observation discipline that already matches the operating cadence agents need, and a control taxonomy that maps cleanly to almost every agent concern. The work is mapping, not invention; the artefact is a translation table, not a parallel regime.
This framework walks the five TSC categories in order, names the controls that need explicit agentic-AI translation, prescribes the evidence the auditor expects, and lays out the quarterly cadence that keeps a Type II window populated without a heroic month-before-audit scramble. It is written for the team that has to be audit-ready in six months, not the team writing a policy white paper.
This guide covers SOC2 Type II — the observation-window audit that proves controls operated effectively over time. Type I (point-in-time design) is a stepping-stone; the agentic AI mapping work pays off in Type II because that is where evidence-cadence discipline shows. The companion piece on governance templates pairs naturally with this one: Stage 8 governance kit provides the operating loop; this framework provides the audit translation.
- 01 · SOC2 maps cleanly to agentic AI — do the mapping, not a rewrite. The Trust Services Criteria were written to be framework-agnostic, and almost every control has a clean agentic-AI translation. The temptation to invent a parallel AI compliance regime produces evidence sprawl and audit confusion. The right move is a translation table that names the agentic-AI behaviour each existing control covers and the evidence artefact that proves it.
- 02 · Evidence-first design beats post-hoc collection by an order of magnitude. Teams that wire evidence collection into the system at build time clear Type II audits with weeks of preparation. Teams that try to reconstruct evidence at audit time spend months hunting for logs that were never retained, eval runs that were never archived, and access records that rotated out of the retention window. Build the evidence pipeline alongside the agent — not afterwards.
- 03 · CC6 (logical access) is the highest-friction mapping for agentic systems. Agents act on behalf of users, services act on behalf of agents, and tools act on behalf of services. The chain of delegated authority does not map neatly onto SOC2's user-centric access-control vocabulary. The mapping has to name the agent identity, the tool-call boundary, the credential scope, and the audit-trail discipline that makes the chain reviewable.
- 04 · Processing integrity needs eval coverage, not just unit tests. Traditional SOC2 processing-integrity controls assume deterministic transformations. Agentic systems are probabilistic by design. The mapping replaces unit-test coverage with eval coverage on representative inputs — task-specific evals, safety evals, bias probes — run on every model swap and archived for the observation window.
- 05 · Quarterly cadence matches Type II observation windows naturally. SOC2 Type II observes over a window — typically six or twelve months. The Stage 8 governance cadence (weekly health, monthly committee, quarterly framework review) feeds the observation window with already-archived evidence. Teams that run the cadence walk into the audit with the evidence file mostly built; teams that do not have to backfill, which is where audits stall.
01 — Why Mapping · Agentic AI is novel; SOC2 is not — bridge them deliberately.
The instinct to write a new compliance framework for agentic AI is understandable. The systems are genuinely novel — non-deterministic outputs, tool chains that act on behalf of users, model versions that ship every few weeks, retrieval layers that change the effective behaviour of the system without a code change. SOC2 was authored for an era where software changed quarterly, data flowed through deterministic pipelines, and access control was a user-to-resource decision. None of those assumptions hold cleanly for an agent.
And yet the answer is mapping, not invention. The Trust Services Criteria themselves are written at a level of abstraction that survives the translation: "the entity restricts logical access to information assets" (CC6.1) is just as true a requirement when the entity is an agent as when it is a human user. What changes is the implementation — and the evidence. The framework here is the explicit translation between the existing control language and the agentic-AI implementation that satisfies it.
The cost of inventing a parallel regime is real. Auditors trained on SOC2 have to be re-trained; evidence streams have to be duplicated; the governance committee has to manage two control registers; and customers asking for an attestation get a bespoke document that requires their security team to evaluate from scratch. The cost of mapping is small and one-time. The cost of invention compounds quarterly.
Mapping maturity · four tiers · the gap is evidence automation, not control authorship
Source: Digital Applied audit-readiness tiers, 2026 field engagements
02 — Security (CC) · Logical access, change management, monitoring.
The Common Criteria (CC) — the security category — carries the majority of the sixty controls and the majority of the agentic-AI translation work. Five control families inside CC absorb the bulk of the mapping effort: CC6 (logical access), CC7 (system operations), CC8 (change management for production code), CC2 (communication with stakeholders about controls), and CC4 (monitoring of controls). Each one has a tacit assumption that needs explicit translation when the system in scope is an agent rather than a deterministic service.
CC6 is the highest-friction family. Logical access in a traditional SaaS model is a user-to-resource decision: a named human authenticates, a policy evaluates, the resource grants or denies. In an agentic system the chain is longer — a user triggers an agent, the agent assumes a service identity, the service calls a tool, the tool acts on a resource. Each link in the chain has its own identity, scope, and audit trail, and the SOC2 auditor reasonably expects every link to be reviewable.
CC7 and CC8 cover system operations and change management, the families where agentic AI most visibly diverges from traditional SaaS. Model versions ship every few weeks, not every quarter; prompt changes modify behaviour without a code change; retrieval-index updates shift the effective system behaviour silently. The control translation has to name model swaps, prompt revisions, eval-set updates, and retrieval-index versioning as change-management events with the same gating discipline that production code already enjoys.
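To make the gating discipline concrete, here is a minimal sketch of a change-event record whose blocking gates must all clear before a model swap, prompt revision, eval-set update, or retrieval-index change ships. The field and gate names are illustrative assumptions rather than a standard schema; the archived record itself is the CC7/CC8 evidence artefact.

```python
# Illustrative sketch only: field and gate names are assumptions, not a
# standard schema. The point is that a model swap, prompt revision, eval-set
# update, or retrieval-index change is recorded as a change event whose gates
# must all clear before it ships, and the record becomes CC7/CC8 evidence.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ChangeEvent:
    change_type: str            # "model_swap" | "prompt_revision" | "eval_set" | "retrieval_index"
    description: str
    requested_by: str
    reviewer: str
    gates: dict = field(default_factory=lambda: {
        "eval": False,           # blocking: eval suite re-run and passed
        "canary": False,         # blocking: canary traffic within thresholds
        "rollback": False,       # blocking: rollback path documented and tested
        "communication": False,  # blocking: stakeholders notified where material
    })
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def approved(self) -> bool:
        """A change may ship only when every blocking gate has cleared."""
        return all(self.gates.values())

event = ChangeEvent(
    change_type="model_swap",
    description="Upgrade summarisation agent to the provider's next model version",
    requested_by="ml-platform",
    reviewer="governance-committee",
)
event.gates["eval"] = True   # set by the eval pipeline, not by hand
assert not event.approved()  # ships only when all four gates are cleared
```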
Agent RBAC with delegated authority
Every agent has its own identity (not a shared service account); every tool call carries a scoped credential; the user-to-agent-to-tool chain is auditable end-to-end. The evidence artefact is a per-request audit log showing the user, the agent identity, the tool invoked, and the credential scope. Auditors test by sampling a request and walking the chain backward.
Per-agent identity

Model swaps as change events
Model identifiers, prompt templates, eval sets, and retrieval indexes are version-controlled and treated as change events with explicit approval gates. The evidence artefact is the model-update review log (see Stage 8 governance kit) showing the eval gate, canary gate, rollback gate, and communication gate all cleared.
Versioned model surface

Prompts in code review, not in chat
Prompt revisions go through the same code-review process as production code. Eval-set additions go through review. Retrieval-index updates have a documented approval. The evidence artefact is the change-management ticket trail showing reviewer, eval impact, and rollback plan for each change. Slack-only prompt edits are the canonical anti-pattern.
Prompts-as-code discipline

Customer-facing AI disclosure
Customers and stakeholders are informed of material AI capabilities in their service surface — what is automated, what is human-reviewed, what data flows into model providers. The evidence artefact is the customer-facing AI notice plus the internal record showing it was reviewed by legal and product before publication.
Explicit AI disclosure

Eval drift as a control signal
Eval pass rate, latency, cost, and safety-eval signals are continuously monitored and alert when thresholds are crossed. The evidence artefact is the monitoring dashboard archive plus the alert-response log. Auditors test by sampling alerts and walking the response — who saw it, what they did, how it resolved.
Continuous eval monitoring

One subtlety on CC6 worth pulling out: the agent identity should be distinct from the service identity, and both should be distinct from the human user identity at the top of the chain. Collapsing the chain into a single "ai-service-account" credential is the canonical CC6 failure mode in agentic systems — it makes the audit trail unreviewable and makes scoped revocation impossible. The right model is one identity per agent persona, with separate credentials per tool the agent can call.
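A minimal sketch of what that per-request evidence can look like, assuming hypothetical identifier and scope conventions: one record per tool call, carrying the user, the agent persona, the tool, and the credential scope, so an auditor can walk any sampled request back up the chain.

```python
# Minimal sketch of the delegated-authority chain; identifiers and scope strings
# are hypothetical examples, not a prescribed naming scheme.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ToolCallAuditRecord:
    request_id: str
    user_id: str           # the human at the top of the chain
    agent_id: str          # one identity per agent persona, never a shared account
    tool: str              # the tool invoked on the agent's behalf
    credential_scope: str  # scope of the credential used for this specific tool
    timestamp: str

def record_tool_call(request_id: str, user_id: str, agent_id: str,
                     tool: str, credential_scope: str) -> ToolCallAuditRecord:
    """Emit one record per tool call so an auditor can walk the chain backward
    from any sampled request to the user who triggered it."""
    return ToolCallAuditRecord(
        request_id=request_id,
        user_id=user_id,
        agent_id=agent_id,
        tool=tool,
        credential_scope=credential_scope,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

# Example: a support agent calls the ticketing tool with a narrowly scoped credential.
entry = record_tool_call(
    request_id="req-8f2a",
    user_id="user:alice@example.com",
    agent_id="agent:support-triage",
    tool="ticketing.update_ticket",
    credential_scope="tickets:write:tenant-a",
)
```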
On CC7 and CC8, the asymmetry between forward gates and rollback authority matters as much in the compliance frame as it does in the governance frame. Auditors are comfortable with broad rollback authority as long as the post-fact ratification is documented; what auditors reject is forward changes that bypass the review process. The mapping has to make clear which gates are blocking-forward and which roles can pull rollback without committee re-approval.
03 — Availability · SLAs, capacity, incident response.
The Availability category looks deceptively similar to its traditional-SaaS equivalent — define an SLA, monitor against it, plan capacity, respond to incidents. The agentic-AI twist is that the dependency surface is wider and the failure modes are qualitatively different. A model provider deprecating a version, a vector database hitting a quota, a tool API rate-limiting agent calls — each of these is an availability event that the traditional SaaS model does not have an obvious analogue for.
The mapping work for Availability has three pillars. First, define the SLA in a way that names the agentic surface — not just "the API is up" but "the agent completes a representative task within the latency envelope with the expected quality." Second, capacity-plan against the provider chain — model provider, vector store, retrieval embeddings, tool APIs — not just internal compute. Third, the incident response runbook has to cover provider-side outages with the same rigour as internal outages, because the customer experience does not distinguish between them.
Auditors look for evidence that availability targets are defined, monitored, reported on, and that incidents were responded to within the runbook's targets. For agentic systems they additionally probe how degraded-mode behaviour is defined — when the primary model is unavailable, does the agent fail loudly, degrade to a fallback, or queue? Which behaviour you choose matters less than the fact that a behaviour was deliberately chosen; the choice has to be documented and tested.
Availability mapping · four pillars · provider-chain capacity is the most-missed pillar
Source: Availability mapping pillars, Digital Applied 2026
One operational note: SLAs defined purely in terms of HTTP availability are insufficient for agentic systems. The agent can return a 200 with a useless answer, and from the customer's perspective the service is down. The mapping has to elevate the SLA from transport-layer success to task-layer success, with eval-based monitoring as the primary signal and HTTP availability as a secondary one. Auditors increasingly understand this distinction; framing the SLA in transport-only terms invites a finding.
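One way to operationalise a task-layer SLA, sketched under the assumption of a hypothetical run_agent entry point and score_output eval scorer, with illustrative thresholds: the probe records the service as unavailable when a representative task is slow or low quality, even if the transport succeeded.

```python
# Sketch of a task-layer availability probe, assuming a hypothetical run_agent()
# entry point and score_output() eval scorer; the thresholds are illustrative.
import time

QUALITY_THRESHOLD = 0.8   # minimum eval score for the answer to count
LATENCY_BUDGET_S = 10.0   # latency envelope for the representative task

def task_layer_probe(run_agent, score_output, task: dict) -> dict:
    """Count the service as available only if a representative task completes
    within the latency envelope AND the output clears the quality threshold.
    An HTTP 200 with a useless answer is recorded as unavailable."""
    start = time.monotonic()
    try:
        output = run_agent(task["input"])
    except Exception as exc:
        return {"available": False, "reason": f"error: {exc}"}
    latency = time.monotonic() - start
    score = score_output(output, task["expected"])
    available = latency <= LATENCY_BUDGET_S and score >= QUALITY_THRESHOLD
    return {"available": available, "latency_s": round(latency, 2), "score": score}
```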
The other recurring pattern: capacity plans that assume provider quotas are unlimited. Model providers ration capacity during peak demand, vector stores throttle at index-size thresholds, tool APIs rate-limit aggressive callers. A capacity plan that does not name each external quota and track headroom against it will eventually meet the quota at an inconvenient moment, and the post-incident review will find that the capacity discipline was the gap.
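A small headroom tracker makes the discipline concrete; the quota names and figures below are placeholders, and real limits would come from each provider's dashboard or API rather than a hard-coded table.

```python
# Sketch of a provider-chain headroom check; quota figures and names are
# placeholders, and real numbers would come from provider dashboards or APIs.
QUOTAS = {
    "model_provider_tokens_per_min": 2_000_000,
    "vector_store_index_size_gb": 500,
    "tool_api_requests_per_min": 5_000,
}

HEADROOM_ALERT = 0.20  # alert when less than 20% headroom remains

def headroom_report(current_usage: dict) -> list[str]:
    """Return a list of quota alerts; an empty list means every external quota
    still has acceptable headroom."""
    alerts = []
    for name, limit in QUOTAS.items():
        used = current_usage.get(name, 0)
        headroom = 1 - (used / limit)
        if headroom < HEADROOM_ALERT:
            alerts.append(f"{name}: {headroom:.0%} headroom remaining (used {used} of {limit})")
    return alerts

print(headroom_report({"model_provider_tokens_per_min": 1_750_000}))
# ['model_provider_tokens_per_min: 12% headroom remaining (used 1750000 of 2000000)']
```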
"An agent that returns a 200 with a useless answer is not available — it is just polite. Task-layer SLAs survive the audit; transport-layer SLAs invite the finding."— SOC2 Availability mapping rule · Digital Applied framework
04 — Processing Integrity · Eval coverage, drift detection, faithfulness.
Processing Integrity is the TSC category that traditional SaaS audits handle with a straight face and agentic AI audits trip over hardest. The traditional control is some form of "processing is complete, valid, accurate, timely, and authorised." In a deterministic system, unit tests and referential integrity constraints carry most of that weight. In an agentic system, the outputs are probabilistic by design, and the question is not "was the calculation correct" but "was the answer faithful to the inputs within an acceptable tolerance."
The translation is eval coverage. Instead of unit-test coverage, the auditor expects eval coverage on representative inputs spanning the task surface — happy-path tasks, edge cases, adversarial inputs, bias-probe slices. The eval suite is versioned, the pass rate is monitored, and every model swap re-runs the suite as a blocking gate. Drift detection sits on top — if eval pass rate trends downward over the observation window, the change-management process has to catch it.
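A minimal sketch of such a blocking gate, with an illustrative suite format and thresholds: the gate fails when either the overall pass rate or any protected-slice pass rate drops below its bar (anticipating the fairness check described further down), and the archived result is what the auditor later matches against the change-management log.

```python
# Minimal sketch of a blocking eval gate run on every model swap. The suite
# format, thresholds, and slice names are assumptions for illustration; the
# archived result is the Processing Integrity evidence artefact.
import json
from collections import defaultdict
from datetime import datetime, timezone

OVERALL_THRESHOLD = 0.90
SLICE_THRESHOLD = 0.85   # each protected slice must also clear its own bar

def run_eval_gate(results: list[dict], archive_path: str) -> bool:
    """results: one dict per eval case, e.g. {"slice": "en", "passed": True}.
    Returns True only if the overall rate and every per-slice rate clear threshold."""
    by_slice = defaultdict(list)
    for case in results:
        by_slice[case["slice"]].append(case["passed"])

    overall = sum(c["passed"] for c in results) / len(results)
    slice_rates = {s: sum(v) / len(v) for s, v in by_slice.items()}
    passed = overall >= OVERALL_THRESHOLD and all(
        rate >= SLICE_THRESHOLD for rate in slice_rates.values()
    )

    # Archive the run so the auditor can match it against the change-management log.
    with open(archive_path, "w") as f:
        json.dump({
            "run_at": datetime.now(timezone.utc).isoformat(),
            "overall_pass_rate": overall,
            "per_slice_pass_rate": slice_rates,
            "gate_passed": passed,
        }, f, indent=2)
    return passed
```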
Faithfulness is the controversial part. Auditors are increasingly asking whether agent outputs are faithful to their inputs — does the summary reflect the source, does the extraction match the underlying document, does the agent hallucinate facts. The honest answer is that faithfulness is measurable but not always perfectly so; the mapping prescribes a faithfulness eval as part of the suite, with a documented threshold and a known residual risk.
Versioned suite covering the task surface
Eval suite is version-controlled, covers happy path, edge cases, adversarial inputs, and bias-probe slices. Suite is re-run on every model swap and archived. Auditors sample suite versions and verify that the pass rates referenced in the change-management log match the archive.
Versioned eval suite

Pass-rate monitoring across the window
Eval pass rate is monitored continuously and alerts on threshold breach. Drift over the observation window is tracked and explained — model swap, prompt change, retrieval-index update, or unexplained drift that warrants investigation. The drift log is part of the evidence file.
Continuous drift tracking

Faithfulness eval as a first-class control
Faithfulness — output faithful to input — is measured via dedicated eval (judge model, structured comparison, human review on a sample). Documented threshold, documented residual risk, archived results. Auditors will probe how often the threshold was breached and what the response was.
Faithfulness threshold

Protected-slice regression checks
Eval pass rate is broken out by protected slices on every model swap to catch fairness regressions. The evidence artefact is the per-slice eval report archived alongside the overall suite. Skipping this step is the most-cited PI failure in audits that engage seriously with agentic AI behaviour.
Per-slice eval coverage

The eval suite is the artefact that does the most work in this category. Treat it as production infrastructure — version control, code review, change management, archival — not as a developer side-project. The auditor will ask to see the suite from twelve months ago and the suite from yesterday; if the answer is "it lived in a notebook on someone's laptop," the audit will struggle even if the suite was technically excellent. Promote the suite to a repository, give it owners, and treat suite changes as governed events.
Faithfulness deserves a longer note. The market is still settling on what "measurably faithful" means for different task surfaces — summarisation, extraction, agentic tool-use with feedback. The framework here is honest about the residual: a faithfulness eval with a documented threshold and a known false-negative rate beats no eval, and it beats an over-specified eval that the team cannot actually run on cadence. Auditors reward documented thresholds and residuals more than they reward unbounded ambition.
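As an illustration of a documented-threshold faithfulness eval, here is a sketch that assumes a hypothetical judge_model callable; the threshold and human-review sampling rate are placeholder choices to be documented, not recommendations.

```python
# Sketch of a faithfulness eval using a judge model; judge_model() is a
# hypothetical callable, and the threshold and sampling rate are illustrative,
# documented choices rather than recommendations.
FAITHFULNESS_THRESHOLD = 0.85  # documented threshold; breaches feed the drift log
HUMAN_REVIEW_SAMPLE = 0.05     # fraction of passing cases still escalated to human review

def faithfulness_check(judge_model, source: str, output: str) -> dict:
    """Ask a judge model whether the output is supported by the source. The
    residual risk (judge false negatives) is documented, not hidden."""
    verdict = judge_model(
        f"Score from 0 to 1 how faithfully the OUTPUT sticks to the SOURCE.\n"
        f"SOURCE:\n{source}\n\nOUTPUT:\n{output}\n\nReturn only the number."
    )
    score = float(verdict)
    return {
        "score": score,
        "passed": score >= FAITHFULNESS_THRESHOLD,
        "needs_human_review": score < FAITHFULNESS_THRESHOLD,
    }
```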
05 — Confidentiality + Privacy · Tenant isolation, PII redaction, retention.
Confidentiality and Privacy are separate TSC categories but they share enough surface area in the agentic-AI translation that the mapping treats them together. Confidentiality covers non-personal sensitive data — IP, financial data, customer commercial information — while Privacy covers personal data specifically. Both are stressed by the same agentic-AI patterns: cross-tenant retrieval, prompt-injection exfiltration, retention drift in vector stores, and inadvertent inclusion of sensitive data in model-provider telemetry.
The mapping has four pillars. Tenant isolation — agents operating on tenant A's data never see tenant B's, and the evidence has to demonstrate it. PII redaction — inputs to and outputs from the model are scrubbed for PII at the boundary, with documented patterns and exception handling. Retention — vector stores, eval logs, and audit trails follow documented retention windows that match the customer contract, not the model provider's defaults. Sub-processor disclosure — model providers, retrieval vendors, and tool API providers are disclosed and their data-handling terms reviewed.
The pattern auditors increasingly probe is cross-tenant isolation evidence. It is one thing to assert that the agent cannot see tenant B's data when serving tenant A. It is another to demonstrate it with logs, with retrieval-query scoping, with deliberately adversarial test cases. The evidence artefact is the cross-tenant isolation test suite plus the audit log showing that scoped retrieval refused cross-tenant queries.
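A sketch of index-level scoping with refusal logging, assuming a hypothetical search function rather than any specific vector-store client: the tenant filter is applied at the index level and every refusal lands in the audit log that the adversarial test suite later exercises.

```python
# Sketch of index-level tenant scoping; the search() callable is a hypothetical
# stand-in for whatever vector store is in use, not a real client API.
import logging

audit_log = logging.getLogger("retrieval.audit")

class CrossTenantQueryError(Exception):
    pass

def scoped_search(search, query: str, agent_tenant: str, requested_tenant: str, top_k: int = 5):
    """Refuse any query whose requested tenant differs from the tenant the agent
    identity is bound to, and log both the refusal and the scope on every call."""
    if requested_tenant != agent_tenant:
        audit_log.warning("refused cross-tenant query: agent_tenant=%s requested=%s",
                          agent_tenant, requested_tenant)
        raise CrossTenantQueryError(f"agent bound to {agent_tenant} queried {requested_tenant}")

    audit_log.info("retrieval: tenant=%s top_k=%d", agent_tenant, top_k)
    # The tenant filter is applied at the index level, not left to the caller.
    return search(query=query, filter={"tenant_id": agent_tenant}, top_k=top_k)
```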
Tenant isolation end-to-end
Retrieval scoping · agent identity · audit log
Every retrieval query is tenant-scoped at the index level (not just at the application level); every agent identity is tenant-bound; every tool call carries the tenant context. Evidence artefact is the adversarial cross-tenant test suite plus the per-request log showing the tenant scope on every retrieval. Auditors test by attempting cross-tenant queries via a test harness.
Cross-tenant adversarial tests

PII scrubbing at the boundary
Inbound and outbound · documented patterns · exception path
Inputs to the model are scrubbed for PII patterns at the boundary; outputs are scrubbed before persistence. Patterns are documented, the exception path (when redaction would break the task) is documented, and the redaction log is archived. Evidence artefact is the pattern library plus a sampled audit log showing redaction events.
Boundary redaction
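A sketch of boundary redaction against a documented pattern library; the patterns below are illustrative and deliberately incomplete, and a production library would be reviewed, versioned, and governed rather than hard-coded.

```python
# Sketch of boundary redaction with a documented pattern library; the patterns
# shown are illustrative and far from exhaustive.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\+?\d[\d\s().-]{8,}\d\b"),
}

def redact(text: str, logbook: list) -> str:
    """Scrub known PII patterns before the text crosses the model boundary,
    and record each redaction event for the archived redaction log."""
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            logbook.append({"pattern": label, "count": count})
    return text

events = []
print(redact("Contact alice@example.com or 555-123-4567", events))
# Contact [REDACTED:email] or [REDACTED:phone]
```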
Vector-store and log retention windows
Per-tenant retention · documented · enforced
Vector stores follow the customer-contracted retention window, not the vendor default. Audit logs follow the audit-retention window. Eval archives follow the regulatory window. Each retention window is documented per data type and enforced via automated purge. Evidence artefact is the retention-policy document plus the purge-job audit log.
Automated retention enforcement

Model and tool provider disclosure
List maintained · DPA reviewed · customer-notified
Model providers, retrieval-embedding vendors, and tool API providers are listed as sub-processors with DPAs reviewed by legal. Customer-facing sub-processor list is maintained and updated within the contractually-required notification window when a new sub-processor is added. Evidence artefact is the sub-processor register plus the customer notification log.
Sub-processor register

One pattern worth flagging: model-provider telemetry leakage is the most-missed Confidentiality control. Default settings on several major model providers send inputs and outputs to the provider for training or analytics purposes. The mapping requires explicit opt-out on every provider, documented in the sub-processor register, and verified during the audit. Teams that have not explicitly opted out are usually surprised to discover what their default telemetry posture actually is.
For Privacy specifically, agentic AI raises the bar on data-subject-rights handling. A request for deletion has to reach not only the application database but also the vector store, the eval archive (if the data subject's inputs seeded eval examples), and the model-provider cache where applicable. The mapping has to name each location and the deletion mechanism for it. The auditor will sample a deletion request and walk the chain.
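A sketch of what walking that chain can look like in code, with hypothetical store interfaces standing in for whatever application database, vector store, and eval archive are actually in use; each step is logged so the auditor can sample a deletion request and verify every location was reached.

```python
# Sketch of walking a deletion request across every store that may hold the
# data subject's inputs; the store interfaces here are hypothetical and the
# real chain depends on which vendors are in use. Each step is logged so the
# auditor can sample a request and walk the chain.
def process_deletion_request(subject_id: str, app_db, vector_store, eval_archive, logbook):
    steps = [
        ("application_database", lambda: app_db.delete_subject(subject_id)),
        ("vector_store",         lambda: vector_store.delete_by_metadata({"subject_id": subject_id})),
        ("eval_archive",         lambda: eval_archive.redact_subject(subject_id)),
        # Model-provider caches are covered contractually (DPA terms), not by API,
        # and that residual is documented rather than silently skipped.
    ]
    for location, action in steps:
        action()
        logbook.append({"subject_id": subject_id, "location": location, "status": "deleted"})
    return logbook
```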
06 — Evidence · What the auditor wants to see.
Evidence collection is where most agentic-AI SOC2 programs either pay off the design discipline or pay the cost of having skipped it. The auditor's job is to test the controls' operating effectiveness over the observation window. That testing produces a request list — sample requests, sample logs, sample change-management tickets, sample eval runs, sample incident postmortems. Teams that instrumented evidence during build hand over the list in a week; teams that did not spend two months reconstructing artefacts.
The Stage 8 governance kit produces most of the evidence the SOC2 mapping needs as a side effect. The risk register walk-through minutes feed CC4. The model-update review log feeds CC7 and CC8. The incident runbook rehearsals feed CC6 and Availability. The ethics-forum decisions feed CC2. The quarterly framework review feeds the policy-and-procedure evidence across categories. Pair the governance kit with the evidence-collection pipeline and the audit becomes extraction rather than reconstruction.
The specific artefacts an auditor will ask for fall into a recognisable taxonomy. Documents — charter, policies, procedures. Records — change tickets, access grants, incident postmortems, eval runs. Logs — per-request audit logs, retrieval-query logs, alert response logs. System configurations — IAM policies, retention settings, telemetry opt-outs. Sampling — the auditor picks a sample from each and tests the control against it.
Build these eight evidence streams into the system at build time, not at audit time (a minimal writer sketch follows this list):
- per-request audit log with user, agent, tool, scope
- change-management tickets covering model, prompt, eval, retrieval index
- versioned eval-suite archive with per-run pass rates
- cross-tenant isolation test results on a documented cadence
- redaction event log
- retention purge-job log
- incident postmortem archive
- quarterly governance review document archive
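The sketch below shows one way to wire those streams into a single append-only writer; the stream names and the JSONL-per-day layout are assumptions, not a required design, and the point is simply that evidence is emitted as a side effect of normal operation.

```python
# Sketch of a single append-only writer shared by all eight evidence streams;
# the stream names and JSONL layout are assumptions, not a required design.
# Evidence is emitted at build time as a side effect of normal operation,
# not reconstructed at audit time.
import json
from datetime import datetime, timezone
from pathlib import Path

EVIDENCE_STREAMS = {
    "request_audit", "change_ticket", "eval_run", "isolation_test",
    "redaction_event", "retention_purge", "incident_postmortem", "quarterly_review",
}

def emit_evidence(stream: str, payload: dict, root: Path = Path("evidence")) -> Path:
    """Append one evidence record to the stream's dated JSONL file."""
    if stream not in EVIDENCE_STREAMS:
        raise ValueError(f"unknown evidence stream: {stream}")
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    path = root / stream / f"{day}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": datetime.now(timezone.utc).isoformat(), "stream": stream, **payload}
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return path

emit_evidence("eval_run", {"suite_version": "v12", "pass_rate": 0.93})
```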
One operational note: the evidence pipeline is the artefact that pays the longest dividend. Once the pipeline runs continuously, the marginal cost of an additional audit cycle is small — Type II re-attestations, customer-driven security reviews, ISO 27001 mapping, EU AI Act conformity documentation all draw from the same evidence pool. Teams that invest in the pipeline once amortise the investment across the full compliance surface; teams that treat each audit as a separate evidence-collection exercise pay the cost each time.
For teams that want a starting point, our AI transformation engagements include the SOC2 mapping work and the evidence-pipeline wiring so the team inherits a working translation table and the instrumentation that keeps the table populated. The pipeline pays for itself across the first two audit cycles.
07 — Cadence · Type II observation windows.
SOC2 Type II audits observe controls over a window — six months is common for first-cycle attestations, twelve months once the program is established. Across the window the auditor expects controls to have operated continuously, not just at audit time. That continuity is what the Stage 8 quarterly governance cadence produces as a side effect, and it is the reason the cadence work and the mapping work pair so naturally.
The cadence is straightforward once the mapping is in place. Weekly: production-health review surfaces eval drift, latency issues, cost trends, and operational anomalies — feeding CC4 monitoring evidence. Monthly: committee walks the risk register, reviews open incidents, queues model updates, convenes ethics forum items — feeding CC6, CC7, CC8, and ethics-forum evidence. Quarterly: framework review touches charter fitness, mapping accuracy, evidence-pipeline health, and runbook rehearsal — feeding the policy-and-procedure evidence stream and the audit-finding remediation log.
Aligning the quarterly framework review with the Type II observation calendar is the move that compresses audit preparation. The quarterly produces the evidence pack the auditor would otherwise have to assemble. Four quarterlies cover a twelve-month Type II window with no gaps; two cover a six-month first-cycle window. The auditor still samples and tests independently — the cadence does not replace the audit — but the artefacts they request already exist in the evidence pool.
Quarterly cadence · feeds the Type II observation window naturally
Source: Quarterly cadence mapping to Type II observation, Digital Applied 2026
One operational subtlety: the quarterly review document is the single highest-leverage artefact in the entire compliance program. Auditors read it. Customers asking for security reviews read it. Internal stakeholders read it. Investing two extra hours in each quarterly review to make the document publication-quality pays back at every external touchpoint thereafter. Treat the quarterly document as a customer-facing artefact even if the audience is currently internal — the audience will broaden.
On audit timing — the right moment to engage a SOC2 auditor for a first cycle is around month four of the six-month readiness runway, with the formal observation window opening at month six and fieldwork at month nine. Engaging earlier produces a Type I attestation as a stepping stone; engaging later means the observation window runs without auditor input on evidence specifics, which sometimes produces avoidable findings.
SOC2 + agentic AI is a design problem, not a documentation problem.
The trap in SOC2 + agentic AI is treating the mapping table as the deliverable. The mapping table is a translation document. The deliverable is the evidence pipeline that the mapping enables — per-request audit logs, change-management tickets covering model and prompt revisions, versioned eval archives, cross-tenant isolation test results, redaction event logs, retention purge logs, incident postmortems, and the quarterly framework review document. The mapping table tells you what to instrument; the pipeline is what survives the Type II observation window.
Teams that treat SOC2 as a documentation exercise lose two ways. They lose at audit time, because the evidence is collected reactively and the controls cannot actually be tested over the observation window. And they lose at the customer-security-review stage, because the documentation artefacts they produce read like compliance paperwork rather than evidence of a working program. Teams that treat SOC2 as a design problem build the instrumentation alongside the agent and watch the audit collapse into evidence extraction.
Practical next step: pick the TSC category where your agentic-AI surface is most exposed and build the evidence pipeline for it first. Most teams start with CC6 logical access because the chain-of-delegation problem is the most visible. Some start with Processing Integrity because the eval-coverage discipline pays the most direct engineering dividend. A few start with Confidentiality because they have multi-tenant systems and customer questions about isolation. Pick one, build the pipeline, prove the pattern, then replicate across the remaining four categories. Six months in, the audit becomes a calendar event rather than a fire drill.