Agentic AI SOC2 controls mapping is the discipline of taking the five Trust Services Criteria — security, availability, processing integrity, confidentiality, and privacy — and translating each of the sixty underlying controls into something that an agent, a retrieval system, a tool-calling layer, and a model-update process can actually enforce. Done well, the same audit that already covers your platform absorbs the AI surface area. Done badly, the audit either ignores the agents (so the program is fictitiously compliant) or stalls at evidence collection because nobody instrumented the right artefacts when the system was built.
The temptation when SOC2 first meets agentic AI is to invent a new compliance regime — an "AI controls framework" that sits beside SOC2 and produces its own evidence stream. Resist that. SOC2 is a mature framework with audit-trained reviewers, a Type II observation discipline that already matches the operating cadence agents need, and a control taxonomy that maps cleanly to almost every agent concern. The work is mapping, not invention; the artefact is a translation table, not a parallel regime.
This framework walks the five TSC categories in order, names the controls that need explicit agentic-AI translation, prescribes the evidence the auditor expects, and lays out the quarterly cadence that keeps a Type II window populated without a heroic month-before-audit scramble. It is written for the team that has to be audit-ready in six months, not the team writing a policy white paper.
This guide covers SOC2 Type II — the observation-window audit that proves controls operated effectively over time. Type I (point-in-time design) is a stepping-stone; the agentic AI mapping work pays off in Type II because that is where evidence-cadence discipline shows. The companion piece on governance templates pairs naturally with this one: Stage 8 governance kit provides the operating loop; this framework provides the audit translation.
- 01 · SOC2 maps cleanly to agentic AI — do the mapping, not a rewrite. The Trust Services Criteria were written to be framework-agnostic, and almost every control has a clean agentic-AI translation. The temptation to invent a parallel AI compliance regime produces evidence sprawl and audit confusion. The right move is a translation table that names the agentic-AI behaviour each existing control covers and the evidence artefact that proves it.
- 02 · Evidence-first design beats post-hoc collection by an order of magnitude. Teams that wire evidence collection into the system at build time clear Type II audits with weeks of preparation. Teams that try to reconstruct evidence at audit time spend months hunting for logs that were never retained, eval runs that were never archived, and access records that rotated out of the retention window. Build the evidence pipeline alongside the agent — not afterwards.
- 03 · CC6 (logical access) is the highest-friction mapping for agentic systems. Agents act on behalf of users, services act on behalf of agents, and tools act on behalf of services. The chain of delegated authority does not map neatly onto SOC2's user-centric access-control vocabulary. The mapping has to name the agent identity, the tool-call boundary, the credential scope, and the audit-trail discipline that makes the chain reviewable.
- 04 · Processing integrity needs eval coverage, not just unit tests. Traditional SOC2 processing-integrity controls assume deterministic transformations. Agentic systems are probabilistic by design. The mapping replaces unit-test coverage with eval coverage on representative inputs — task-specific evals, safety evals, bias probes — run on every model swap and archived for the observation window.
- 05 · Quarterly cadence matches Type II observation windows naturally. SOC2 Type II observes over a window — typically six or twelve months. The Stage 8 governance cadence (weekly health, monthly committee, quarterly framework review) feeds the observation window with already-archived evidence. Teams that run the cadence walk into the audit with the evidence file mostly built; teams that do not have to backfill, which is where audits stall.
01 — Why Mapping · Agentic AI is novel; SOC2 is not — bridge them deliberately.
The instinct to write a new compliance framework for agentic AI is understandable. The systems are genuinely novel — non-deterministic outputs, tool chains that act on behalf of users, model versions that ship every few weeks, retrieval layers that change the effective behaviour of the system without a code change. SOC2 was authored for an era where software changed quarterly, data flowed through deterministic pipelines, and access control was a user-to-resource decision. None of those assumptions hold cleanly for an agent.
And yet the answer is mapping, not invention. The Trust Services Criteria themselves are written at a level of abstraction that survives the translation: "the entity restricts logical access to information assets" (CC6.1) is just as true a requirement when the entity is an agent as when it is a human user. What changes is the implementation — and the evidence. The framework here is the explicit translation between the existing control language and the agentic-AI implementation that satisfies it.
The cost of inventing a parallel regime is real. Auditors trained on SOC2 have to be re-trained; evidence streams have to be duplicated; the governance committee has to manage two control registers; and customers asking for an attestation get a bespoke document that requires their security team to evaluate from scratch. The cost of mapping is small and one-time. The cost of invention compounds quarterly.
Mapping maturity · four tiers · the gap is evidence automation, not control authorship
Source: Digital Applied audit-readiness tiers, 2026 field engagements
02 — Security (CC) · Logical access, change management, monitoring.
The Common Criteria (CC) — the security category — carries the majority of the sixty controls and the majority of the agentic-AI translation work. Five control families inside CC absorb the bulk of the mapping effort: CC6 (logical access), CC7 (system operations), CC8 (change management for production code), CC2 (communication with stakeholders about controls), and CC4 (monitoring of controls). Each one has a tacit assumption that needs explicit translation when the system in scope is an agent rather than a deterministic service.
CC6 is the highest-friction family. Logical access in a traditional SaaS model is a user-to-resource decision: a named human authenticates, a policy evaluates, the resource grants or denies. In an agentic system the chain is longer — a user triggers an agent, the agent assumes a service identity, the service calls a tool, the tool acts on a resource. Each link in the chain has its own identity, scope, and audit trail, and the SOC2 auditor reasonably expects every link to be reviewable.
CC7 and CC8 cover system operations and change management, the families where agentic AI most visibly diverges from traditional SaaS. Model versions ship every few weeks, not every quarter; prompt changes modify behaviour without a code change; retrieval-index updates shift the effective system behaviour silently. The control translation has to name model swaps, prompt revisions, eval-set updates, and retrieval-index versioning as change-management events with the same gating discipline that production code already enjoys.
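To make the gating discipline concrete, here is a minimal sketch of a change-event record whose blocking gates must all clear before a model swap, prompt revision, eval-set update, or retrieval-index change ships. The field and gate names are illustrative assumptions rather than a standard schema; the archived record itself is the CC7/CC8 evidence artefact.

```python
# Illustrative sketch only: field and gate names are assumptions, not a
# standard schema. The point is that a model swap, prompt revision, eval-set
# update, or retrieval-index change is recorded as a change event whose gates
# must all clear before it ships, and the record becomes CC7/CC8 evidence.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ChangeEvent:
    change_type: str            # "model_swap" | "prompt_revision" | "eval_set" | "retrieval_index"
    description: str
    requested_by: str
    reviewer: str
    gates: dict = field(default_factory=lambda: {
        "eval": False,           # blocking: eval suite re-run and passed
        "canary": False,         # blocking: canary traffic within thresholds
        "rollback": False,       # blocking: rollback path documented and tested
        "communication": False,  # blocking: stakeholders notified where material
    })
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def approved(self) -> bool:
        """A change may ship only when every blocking gate has cleared."""
        return all(self.gates.values())

event = ChangeEvent(
    change_type="model_swap",
    description="Upgrade summarisation agent to the provider's next model version",
    requested_by="ml-platform",
    reviewer="governance-committee",
)
event.gates["eval"] = True   # set by the eval pipeline, not by hand
assert not event.approved()  # ships only when all four gates are cleared
```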
Agent RBAC with delegated authority
Every agent has its own identity (not a shared service account); every tool call carries a scoped credential; the user-to-agent-to-tool chain is auditable end-to-end. The evidence artefact is a per-request audit log showing the user, the agent identity, the tool invoked, and the credential scope. Auditors test by sampling a request and walking the chain backward.
Per-agent identity

Model swaps as change events
Model identifiers, prompt templates, eval sets, and retrieval indexes are version-controlled and treated as change events with explicit approval gates. The evidence artefact is the model-update review log (see Stage 8 governance kit) showing the eval gate, canary gate, rollback gate, and communication gate all cleared.
Versioned model surface

Prompts in code review, not in chat
Prompt revisions go through the same code-review process as production code. Eval-set additions go through review. Retrieval-index updates have a documented approval. The evidence artefact is the change-management ticket trail showing reviewer, eval impact, and rollback plan for each change. Slack-only prompt edits are the canonical anti-pattern.
Prompts-as-code discipline

Customer-facing AI disclosure
Customers and stakeholders are informed of material AI capabilities in their service surface — what is automated, what is human-reviewed, what data flows into model providers. The evidence artefact is the customer-facing AI notice plus the internal record showing it was reviewed by legal and product before publication.
Explicit AI disclosure

Eval drift as a control signal
Eval pass rate, latency, cost, and safety-eval signals are continuously monitored and alert when thresholds are crossed. The evidence artefact is the monitoring dashboard archive plus the alert-response log. Auditors test by sampling alerts and walking the response — who saw it, what they did, how it resolved.
Continuous eval monitoring

One subtlety on CC6 worth pulling out: the agent identity should be distinct from the service identity, and both should be distinct from the human user identity at the top of the chain. Collapsing the chain into a single "ai-service-account" credential is the canonical CC6 failure mode in agentic systems — it makes the audit trail unreviewable and makes scoped revocation impossible. The right model is one identity per agent persona, with separate credentials per tool the agent can call.
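A minimal sketch of what that per-request evidence can look like, assuming hypothetical identifier and scope conventions: one record per tool call, carrying the user, the agent persona, the tool, and the credential scope, so an auditor can walk any sampled request back up the chain.

```python
# Minimal sketch of the delegated-authority chain; identifiers and scope strings
# are hypothetical examples, not a prescribed naming scheme.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ToolCallAuditRecord:
    request_id: str
    user_id: str           # the human at the top of the chain
    agent_id: str          # one identity per agent persona, never a shared account
    tool: str              # the tool invoked on the agent's behalf
    credential_scope: str  # scope of the credential used for this specific tool
    timestamp: str

def record_tool_call(request_id: str, user_id: str, agent_id: str,
                     tool: str, credential_scope: str) -> ToolCallAuditRecord:
    """Emit one record per tool call so an auditor can walk the chain backward
    from any sampled request to the user who triggered it."""
    return ToolCallAuditRecord(
        request_id=request_id,
        user_id=user_id,
        agent_id=agent_id,
        tool=tool,
        credential_scope=credential_scope,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

# Example: a support agent calls the ticketing tool with a narrowly scoped credential.
entry = record_tool_call(
    request_id="req-8f2a",
    user_id="user:alice@example.com",
    agent_id="agent:support-triage",
    tool="ticketing.update_ticket",
    credential_scope="tickets:write:tenant-a",
)
```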
On CC7 and CC8, the asymmetry between forward gates and rollback authority matters as much in the compliance frame as it does in the governance frame. Auditors are comfortable with broad rollback authority as long as the post-fact ratification is documented; what auditors reject is forward changes that bypass the review process. The mapping has to make clear which gates are blocking-forward and which roles can pull rollback without committee re-approval.
03 — Availability · SLAs, capacity, incident response.
The Availability category looks deceptively similar to its traditional-SaaS equivalent — define an SLA, monitor against it, plan capacity, respond to incidents. The agentic-AI twist is that the dependency surface is wider and the failure modes are qualitatively different. A model provider deprecating a version, a vector database hitting a quota, a tool API rate-limiting agent calls — each of these is an availability event that the traditional SaaS model does not have an obvious analogue for.
The mapping work for Availability has three pillars. First, define the SLA in a way that names the agentic surface — not just "the API is up" but "the agent completes a representative task within the latency envelope with the expected quality." Second, capacity-plan against the provider chain — model provider, vector store, retrieval embeddings, tool APIs — not just internal compute. Third, the incident response runbook has to cover provider-side outages with the same rigour as internal outages, because the customer experience does not distinguish between them.
Auditors look for evidence that availability targets are defined, monitored, reported on, and that incidents were responded to within the runbook's targets. For agentic systems they additionally probe how degraded-mode behaviour is defined — when the primary model is unavailable, does the agent fail loudly, degrade to a fallback, or queue? Which behaviour you choose matters less than the fact that a behaviour was deliberately chosen; the choice has to be documented and tested.
Availability mapping · four pillars · provider-chain capacity is the most-missed pillar
Source: Availability mapping pillars, Digital Applied 2026
One operational note: SLAs defined purely in terms of HTTP availability are insufficient for agentic systems. The agent can return a 200 with a useless answer, and from the customer's perspective the service is down. The mapping has to elevate the SLA from transport-layer success to task-layer success, with eval-based monitoring as the primary signal and HTTP availability as a secondary one. Auditors increasingly understand this distinction; framing the SLA in transport-only terms invites a finding.
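One way to operationalise a task-layer SLA, sketched under the assumption of a hypothetical run_agent entry point and score_output eval scorer, with illustrative thresholds: the probe records the service as unavailable when a representative task is slow or low quality, even if the transport succeeded.

```python
# Sketch of a task-layer availability probe, assuming a hypothetical run_agent()
# entry point and score_output() eval scorer; the thresholds are illustrative.
import time

QUALITY_THRESHOLD = 0.8   # minimum eval score for the answer to count
LATENCY_BUDGET_S = 10.0   # latency envelope for the representative task

def task_layer_probe(run_agent, score_output, task: dict) -> dict:
    """Count the service as available only if a representative task completes
    within the latency envelope AND the output clears the quality threshold.
    An HTTP 200 with a useless answer is recorded as unavailable."""
    start = time.monotonic()
    try:
        output = run_agent(task["input"])
    except Exception as exc:
        return {"available": False, "reason": f"error: {exc}"}
    latency = time.monotonic() - start
    score = score_output(output, task["expected"])
    available = latency <= LATENCY_BUDGET_S and score >= QUALITY_THRESHOLD
    return {"available": available, "latency_s": round(latency, 2), "score": score}
```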
The other recurring pattern: capacity plans that assume provider quotas are unlimited. Model providers ration capacity during peak demand, vector stores throttle at index-size thresholds, tool APIs rate-limit aggressive callers. A capacity plan that does not name each external quota and track headroom against it will eventually meet the quota at an inconvenient moment, and the post-incident review will find that the capacity discipline was the gap.
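A small headroom tracker makes the discipline concrete; the quota names and figures below are placeholders, and real limits would come from each provider's dashboard or API rather than a hard-coded table.

```python
# Sketch of a provider-chain headroom check; quota figures and names are
# placeholders, and real numbers would come from provider dashboards or APIs.
QUOTAS = {
    "model_provider_tokens_per_min": 2_000_000,
    "vector_store_index_size_gb": 500,
    "tool_api_requests_per_min": 5_000,
}

HEADROOM_ALERT = 0.20  # alert when less than 20% headroom remains

def headroom_report(current_usage: dict) -> list[str]:
    """Return a list of quota alerts; an empty list means every external quota
    still has acceptable headroom."""
    alerts = []
    for name, limit in QUOTAS.items():
        used = current_usage.get(name, 0)
        headroom = 1 - (used / limit)
        if headroom < HEADROOM_ALERT:
            alerts.append(f"{name}: {headroom:.0%} headroom remaining (used {used} of {limit})")
    return alerts

print(headroom_report({"model_provider_tokens_per_min": 1_750_000}))
# ['model_provider_tokens_per_min: 12% headroom remaining (used 1750000 of 2000000)']
```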
"An agent that returns a 200 with a useless answer is not available — it is just polite. Task-layer SLAs survive the audit; transport-layer SLAs invite the finding."— SOC2 Availability mapping rule · Digital Applied framework
04 — Processing Integrity · Eval coverage, drift detection, faithfulness.
Processing Integrity is the TSC category that traditional SaaS audits handle with a straight face and agentic AI audits trip over hardest. The traditional control is some form of "processing is complete, valid, accurate, timely, and authorised." In a deterministic system, unit tests and referential integrity constraints carry most of that weight. In an agentic system, the outputs are probabilistic by design, and the question is not "was the calculation correct" but "was the answer faithful to the inputs within an acceptable tolerance."
The translation is eval coverage. Instead of unit-test coverage, the auditor expects eval coverage on representative inputs spanning the task surface — happy-path tasks, edge cases, adversarial inputs, bias-probe slices. The eval suite is versioned, the pass rate is monitored, and every model swap re-runs the suite as a blocking gate. Drift detection sits on top — if eval pass rate trends downward over the observation window, the change-management process has to catch it.
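A minimal sketch of such a blocking gate, with an illustrative suite format and thresholds: the gate fails when either the overall pass rate or any protected-slice pass rate drops below its bar (anticipating the fairness check described further down), and the archived result is what the auditor later matches against the change-management log.

```python
# Minimal sketch of a blocking eval gate run on every model swap. The suite
# format, thresholds, and slice names are assumptions for illustration; the
# archived result is the Processing Integrity evidence artefact.
import json
from collections import defaultdict
from datetime import datetime, timezone

OVERALL_THRESHOLD = 0.90
SLICE_THRESHOLD = 0.85   # each protected slice must also clear its own bar

def run_eval_gate(results: list[dict], archive_path: str) -> bool:
    """results: one dict per eval case, e.g. {"slice": "en", "passed": True}.
    Returns True only if the overall rate and every per-slice rate clear threshold."""
    by_slice = defaultdict(list)
    for case in results:
        by_slice[case["slice"]].append(case["passed"])

    overall = sum(c["passed"] for c in results) / len(results)
    slice_rates = {s: sum(v) / len(v) for s, v in by_slice.items()}
    passed = overall >= OVERALL_THRESHOLD and all(
        rate >= SLICE_THRESHOLD for rate in slice_rates.values()
    )

    # Archive the run so the auditor can match it against the change-management log.
    with open(archive_path, "w") as f:
        json.dump({
            "run_at": datetime.now(timezone.utc).isoformat(),
            "overall_pass_rate": overall,
            "per_slice_pass_rate": slice_rates,
            "gate_passed": passed,
        }, f, indent=2)
    return passed
```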
Faithfulness is the controversial part. Auditors are increasingly asking whether agent outputs are faithful to their inputs — does the summary reflect the source, does the extraction match the underlying document, does the agent hallucinate facts. The honest answer is that faithfulness is measurable but not always perfectly so; the mapping prescribes a faithfulness eval as part of the suite, with a documented threshold and a known residual risk.
Versioned suite covering the task surface
Eval suite is version-controlled, covers happy path, edge cases, adversarial inputs, and bias-probe slices. Suite is re-run on every model swap and archived. Auditors sample suite versions and verify that the pass rates referenced in the change-management log match the archive.
Versioned eval suite

Pass-rate monitoring across the window
Eval pass rate is monitored continuously and alerts on threshold breach. Drift over the observation window is tracked and explained — model swap, prompt change, retrieval-index update, or unexplained drift that warrants investigation. The drift log is part of the evidence file.
Continuous drift tracking

Faithfulness eval as a first-class control
Faithfulness — output faithful to input — is measured via dedicated eval (judge model, structured comparison, human review on a sample). Documented threshold, documented residual risk, archived results. Auditors will probe how often the threshold was breached and what the response was.
Faithfulness threshold

Protected-slice regression checks
Eval pass rate is broken out by protected slices on every model swap to catch fairness regressions. The evidence artefact is the per-slice eval report archived alongside the overall suite. Skipping this step is the most-cited PI failure in audits that engage seriously with agentic AI behaviour.
Per-slice eval coverage

The eval suite is the artefact that does the most work in this category. Treat it as production infrastructure — version control, code review, change management, archival — not as a developer side-project. The auditor will ask to see the suite from twelve months ago and the suite from yesterday; if the answer is "it lived in a notebook on someone's laptop," the audit will struggle even if the suite was technically excellent. Promote the suite to a repository, give it owners, and treat suite changes as governed events.
Faithfulness deserves a longer note. The market is still settling on what "measurably faithful" means for different task surfaces — summarisation, extraction, agentic tool-use with feedback. The framework here is honest about the residual: a faithfulness eval with a documented threshold and a known false-negative rate beats no eval, and it beats an over-specified eval that the team cannot actually run on cadence. Auditors reward documented thresholds and residuals more than they reward unbounded ambition.
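As an illustration of a documented-threshold faithfulness eval, here is a sketch that assumes a hypothetical judge_model callable; the threshold and human-review sampling rate are placeholder choices to be documented, not recommendations.

```python
# Sketch of a faithfulness eval using a judge model; judge_model() is a
# hypothetical callable, and the threshold and sampling rate are illustrative,
# documented choices rather than recommendations.
FAITHFULNESS_THRESHOLD = 0.85  # documented threshold; breaches feed the drift log
HUMAN_REVIEW_SAMPLE = 0.05     # fraction of passing cases still escalated to human review

def faithfulness_check(judge_model, source: str, output: str) -> dict:
    """Ask a judge model whether the output is supported by the source. The
    residual risk (judge false negatives) is documented, not hidden."""
    verdict = judge_model(
        f"Score from 0 to 1 how faithfully the OUTPUT sticks to the SOURCE.\n"
        f"SOURCE:\n{source}\n\nOUTPUT:\n{output}\n\nReturn only the number."
    )
    score = float(verdict)
    return {
        "score": score,
        "passed": score >= FAITHFULNESS_THRESHOLD,
        "needs_human_review": score < FAITHFULNESS_THRESHOLD,
    }
```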
05 — Confidentiality + Privacy · Tenant isolation, PII redaction, retention.
Confidentiality and Privacy are separate TSC categories but they share enough surface area in the agentic-AI translation that the mapping treats them together. Confidentiality covers non-personal sensitive data — IP, financial data, customer commercial information — while Privacy covers personal data specifically. Both are stressed by the same agentic-AI patterns: cross-tenant retrieval, prompt-injection exfiltration, retention drift in vector stores, and inadvertent inclusion of sensitive data in model-provider telemetry.
The mapping has four pillars. Tenant isolation — agents operating on tenant A's data never see tenant B's, and the evidence has to demonstrate it. PII redaction — inputs to and outputs from the model are scrubbed for PII at the boundary, with documented patterns and exception handling. Retention — vector stores, eval logs, and audit trails follow documented retention windows that match the customer contract, not the model provider's defaults. Sub-processor disclosure — model providers, retrieval vendors, and tool API providers are disclosed and their data-handling terms reviewed.
The pattern auditors increasingly probe is cross-tenant isolation evidence. It is one thing to assert that the agent cannot see tenant B's data when serving tenant A. It is another to demonstrate it with logs, with retrieval-query scoping, with deliberately adversarial test cases. The evidence artefact is the cross-tenant isolation test suite plus the audit log showing that scoped retrieval refused cross-tenant queries.
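A sketch of index-level scoping with refusal logging, assuming a hypothetical search function rather than any specific vector-store client: the tenant filter is applied at the index level and every refusal lands in the audit log that the adversarial test suite later exercises.

```python
# Sketch of index-level tenant scoping; the search() callable is a hypothetical
# stand-in for whatever vector store is in use, not a real client API.
import logging

audit_log = logging.getLogger("retrieval.audit")

class CrossTenantQueryError(Exception):
    pass

def scoped_search(search, query: str, agent_tenant: str, requested_tenant: str, top_k: int = 5):
    """Refuse any query whose requested tenant differs from the tenant the agent
    identity is bound to, and log both the refusal and the scope on every call."""
    if requested_tenant != agent_tenant:
        audit_log.warning("refused cross-tenant query: agent_tenant=%s requested=%s",
                          agent_tenant, requested_tenant)
        raise CrossTenantQueryError(f"agent bound to {agent_tenant} queried {requested_tenant}")

    audit_log.info("retrieval: tenant=%s top_k=%d", agent_tenant, top_k)
    # The tenant filter is applied at the index level, not left to the caller.
    return search(query=query, filter={"tenant_id": agent_tenant}, top_k=top_k)
```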
Tenant isolation end-to-end
Retrieval scoping · agent identity · audit log
Every retrieval query is tenant-scoped at the index level (not just at the application level); every agent identity is tenant-bound; every tool call carries the tenant context. Evidence artefact is the adversarial cross-tenant test suite plus the per-request log showing the tenant scope on every retrieval. Auditors test by attempting cross-tenant queries via a test harness.
Cross-tenant adversarial tests

PII scrubbing at the boundary
Inbound and outbound · documented patterns · exception path
Inputs to the model are scrubbed for PII patterns at the boundary; outputs are scrubbed before persistence. Patterns are documented, the exception path (when redaction would break the task) is documented, and the redaction log is archived. Evidence artefact is the pattern library plus a sampled audit log showing redaction events.
Boundary redaction
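A sketch of boundary redaction against a documented pattern library; the patterns below are illustrative and deliberately incomplete, and a production library would be reviewed, versioned, and governed rather than hard-coded.

```python
# Sketch of boundary redaction with a documented pattern library; the patterns
# shown are illustrative and far from exhaustive.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\+?\d[\d\s().-]{8,}\d\b"),
}

def redact(text: str, logbook: list) -> str:
    """Scrub known PII patterns before the text crosses the model boundary,
    and record each redaction event for the archived redaction log."""
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            logbook.append({"pattern": label, "count": count})
    return text

events = []
print(redact("Contact alice@example.com or 555-123-4567", events))
# Contact [REDACTED:email] or [REDACTED:phone]
```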
Vector-store and log retention windows
Per-tenant retention · documented · enforced
Vector stores follow the customer-contracted retention window, not the vendor default. Audit logs follow the audit-retention window. Eval archives follow the regulatory window. Each retention window is documented per data type and enforced via automated purge. Evidence artefact is the retention-policy document plus the purge-job audit log.
Automated retention enforcement

Model and tool provider disclosure
List maintained · DPA reviewed · customer-notified
Model providers, retrieval-embedding vendors, and tool API providers are listed as sub-processors with DPAs reviewed by legal. Customer-facing sub-processor list is maintained and updated within the contractually-required notification window when a new sub-processor is added. Evidence artefact is the sub-processor register plus the customer notification log.
Sub-processor register

One pattern worth flagging: model-provider telemetry leakage is the most-missed Confidentiality control. Default settings on several major model providers send inputs and outputs to the provider for training or analytics purposes. The mapping requires explicit opt-out on every provider, documented in the sub-processor register, and verified during the audit. Teams that have not explicitly opted out are usually surprised to discover what their default telemetry posture actually is.
For Privacy specifically, agentic AI raises the bar on data-subject-rights handling. A request for deletion has to reach not only the application database but also the vector store, the eval archive (if the data subject's inputs seeded eval examples), and the model-provider cache where applicable. The mapping has to name each location and the deletion mechanism for it. The auditor will sample a deletion request and walk the chain.
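A sketch of what walking that chain can look like in code, with hypothetical store interfaces standing in for whatever application database, vector store, and eval archive are actually in use; each step is logged so the auditor can sample a deletion request and verify every location was reached.

```python
# Sketch of walking a deletion request across every store that may hold the
# data subject's inputs; the store interfaces here are hypothetical and the
# real chain depends on which vendors are in use. Each step is logged so the
# auditor can sample a request and walk the chain.
def process_deletion_request(subject_id: str, app_db, vector_store, eval_archive, logbook):
    steps = [
        ("application_database", lambda: app_db.delete_subject(subject_id)),
        ("vector_store",         lambda: vector_store.delete_by_metadata({"subject_id": subject_id})),
        ("eval_archive",         lambda: eval_archive.redact_subject(subject_id)),
        # Model-provider caches are covered contractually (DPA terms), not by API,
        # and that residual is documented rather than silently skipped.
    ]
    for location, action in steps:
        action()
        logbook.append({"subject_id": subject_id, "location": location, "status": "deleted"})
    return logbook
```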
06 — Evidence · What the auditor wants to see.
Evidence collection is where most agentic-AI SOC2 programs either pay off the design discipline or pay the cost of having skipped it. The auditor's job is to test the controls' operating effectiveness over the observation window. That testing produces a request list — sample requests, sample logs, sample change-management tickets, sample eval runs, sample incident postmortems. Teams that instrumented evidence during build hand over the list in a week; teams that did not spend two months reconstructing artefacts.
The Stage 8 governance kit produces most of the evidence the SOC2 mapping needs as a side effect. The risk register walk-through minutes feed CC4. The model-update review log feeds CC7 and CC8. The incident runbook rehearsals feed CC6 and Availability. The ethics-forum decisions feed CC2. The quarterly framework review feeds the policy-and-procedure evidence across categories. Pair the governance kit with the evidence-collection pipeline and the audit becomes extraction rather than reconstruction.
The specific artefacts an auditor will ask for fall into a recognisable taxonomy. Documents — charter, policies, procedures. Records — change tickets, access grants, incident postmortems, eval runs. Logs — per-request audit logs, retrieval-query logs, alert response logs. System configurations — IAM policies, retention settings, telemetry opt-outs. Sampling — the auditor picks a sample from each and tests the control against it.
Build these eight evidence streams into the system at build time, not at audit time (a minimal writer sketch follows this list):
- per-request audit log with user, agent, tool, scope
- change-management tickets covering model, prompt, eval, retrieval index
- versioned eval-suite archive with per-run pass rates
- cross-tenant isolation test results on a documented cadence
- redaction event log
- retention purge-job log
- incident postmortem archive
- quarterly governance review document archive
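The sketch below shows one way to wire those streams into a single append-only writer; the stream names and the JSONL-per-day layout are assumptions, not a required design, and the point is simply that evidence is emitted as a side effect of normal operation.

```python
# Sketch of a single append-only writer shared by all eight evidence streams;
# the stream names and JSONL layout are assumptions, not a required design.
# Evidence is emitted at build time as a side effect of normal operation,
# not reconstructed at audit time.
import json
from datetime import datetime, timezone
from pathlib import Path

EVIDENCE_STREAMS = {
    "request_audit", "change_ticket", "eval_run", "isolation_test",
    "redaction_event", "retention_purge", "incident_postmortem", "quarterly_review",
}

def emit_evidence(stream: str, payload: dict, root: Path = Path("evidence")) -> Path:
    """Append one evidence record to the stream's dated JSONL file."""
    if stream not in EVIDENCE_STREAMS:
        raise ValueError(f"unknown evidence stream: {stream}")
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    path = root / stream / f"{day}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": datetime.now(timezone.utc).isoformat(), "stream": stream, **payload}
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return path

emit_evidence("eval_run", {"suite_version": "v12", "pass_rate": 0.93})
```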
One operational note: the evidence pipeline is the artefact that pays the longest dividend. Once the pipeline runs continuously, the marginal cost of an additional audit cycle is small — Type II re-attestations, customer-driven security reviews, ISO 27001 mapping, EU AI Act conformity documentation all draw from the same evidence pool. Teams that invest in the pipeline once amortise the investment across the full compliance surface; teams that treat each audit as a separate evidence-collection exercise pay the cost each time.
For teams that want a starting point, our AI transformation engagements include the SOC2 mapping work and the evidence-pipeline wiring so the team inherits a working translation table and the instrumentation that keeps the table populated. The pipeline pays for itself across the first two audit cycles.
07 — Cadence · Type II observation windows.
SOC2 Type II audits observe controls over a window — six months is common for first-cycle attestations, twelve months once the program is established. Across the window the auditor expects controls to have operated continuously, not just at audit time. That continuity is what the Stage 8 quarterly governance cadence produces as a side effect, and it is the reason the cadence work and the mapping work pair so naturally.
The cadence is straightforward once the mapping is in place. Weekly: production-health review surfaces eval drift, latency issues, cost trends, and operational anomalies — feeding CC4 monitoring evidence. Monthly: committee walks the risk register, reviews open incidents, queues model updates, convenes ethics forum items — feeding CC6, CC7, CC8, and ethics-forum evidence. Quarterly: framework review touches charter fitness, mapping accuracy, evidence-pipeline health, and runbook rehearsal — feeding the policy-and-procedure evidence stream and the audit-finding remediation log.
Aligning the quarterly framework review with the Type II observation calendar is the move that compresses audit preparation. The quarterly produces the evidence pack the auditor would otherwise have to assemble. Four quarterlies cover a twelve-month Type II window with no gaps; two cover a six-month first-cycle window. The auditor still samples and tests independently — the cadence does not replace the audit — but the artefacts they request already exist in the evidence pool.
Quarterly cadence · feeds the Type II observation window naturally
Source: Quarterly cadence mapping to Type II observation, Digital Applied 2026
One operational subtlety: the quarterly review document is the single highest-leverage artefact in the entire compliance program. Auditors read it. Customers asking for security reviews read it. Internal stakeholders read it. Investing two extra hours in each quarterly review to make the document publication-quality pays back at every external touchpoint thereafter. Treat the quarterly document as a customer-facing artefact even if the audience is currently internal — the audience will broaden.
On audit timing — the right moment to engage a SOC2 auditor for a first cycle is around month four of the six-month readiness runway, with the formal observation window opening at month six and fieldwork at month nine. Engaging earlier produces a Type I attestation as a stepping stone; engaging later means the observation window runs without auditor input on evidence specifics, which sometimes produces avoidable findings.
SOC2 + agentic AI is a design problem, not a documentation problem.
The trap in SOC2 + agentic AI is treating the mapping table as the deliverable. The mapping table is a translation document. The deliverable is the evidence pipeline that the mapping enables — per-request audit logs, change-management tickets covering model and prompt revisions, versioned eval archives, cross-tenant isolation test results, redaction event logs, retention purge logs, incident postmortems, and the quarterly framework review document. The mapping table tells you what to instrument; the pipeline is what survives the Type II observation window.
Teams that treat SOC2 as a documentation exercise lose two ways. They lose at audit time, because the evidence is collected reactively and the controls cannot actually be tested over the observation window. And they lose at the customer-security-review stage, because the documentation artefacts they produce read like compliance paperwork rather than evidence of a working program. Teams that treat SOC2 as a design problem build the instrumentation alongside the agent and watch the audit collapse into evidence extraction.
Practical next step: pick the TSC category where your agentic-AI surface is most exposed and build the evidence pipeline for it first. Most teams start with CC6 logical access because the chain-of-delegation problem is the most visible. Some start with Processing Integrity because the eval-coverage discipline pays the most direct engineering dividend. A few start with Confidentiality because they have multi-tenant systems and customer questions about isolation. Pick one, build the pipeline, prove the pattern, then replicate across the remaining four categories. Six months in, the audit becomes a calendar event rather than a fire drill.