AI Development · Playbook · 17 min read · Published May 22, 2026

The governance layer nobody built yet — 40 enterprise guardrails in one checklist.

Enterprise Computer-Use Guardrails: The 40-Point Playbook

Computer-use agents inherited human-level access to every UI in your enterprise the day Microsoft Copilot Studio shipped to GA. The vendors document their own controls. Nobody has stitched those controls, OWASP Agentic Top 10 mitigations, and NIST AI RMF requirements into one operational playbook — until now. Forty guardrails, five sections, four vendors covered.

Digital Applied Team
Senior strategists · Published May 22, 2026
Sources: 12
Guardrails: 40 operational controls (5 sections × 8 each)
Vendors covered: 4 (Microsoft · Anthropic, plus OWASP + NIST)
HITL status: Probabilistic per Microsoft docs; not a safety guarantee
Cowork audit: Excluded as of May 2026; API differs (see §08)

Computer-use agents reached enterprise general availability on May 22, 2026 when Microsoft Copilot Studio shipped across all commercial Power Platform geographies — vision-based UI navigation supporting OpenAI Computer-Using Agent and three Anthropic Claude models, operating on any screen without APIs or platform redevelopment. The governance layer those agents require did not ship with them. This playbook synthesizes Microsoft Learn docs, Anthropic computer-use documentation, OWASP Agentic Top 10, and the Cloud Security Alliance NIST AI RMF Agentic Profile into a single 40-point enterprise guardrails checklist that operators can run today.

The urgency is not theoretical. UI-driven agents inherit human-level access to every application a user can reach. Without identity isolation, credential vaulting, action boundaries, and deterministic kill-switches, a misbehaving agent can exfiltrate data, execute irreversible transactions, or be hijacked by prompt injection in a screenshot. Two failure modes account for the majority of production incidents: the "access control is NOT egress control" gap in Copilot Studio's allowlist, and the widely misunderstood probabilistic nature of human-in-the-loop escalation.

This playbook covers both footguns in detail, provides the full 40-point checklist (five sections × 8 controls, vendor coverage mapped), walks the legacy-app use-case stack, and closes with compliance framework mapping against HIPAA, SOC 2, ISO 27001, FedRAMP, and NIST AI RMF. For the Microsoft GA announcement details, see the Copilot Studio computer-use GA deep dive. For routing between Microsoft, Anthropic, and Google, see the three-way platform routing guide.

Key takeaways
  1. Access control blocks actions, not navigation: a critical gap.
    Microsoft's own documentation states: Copilot Studio's Access control allowlist prevents the model from taking actions on off-list websites, but it does NOT stop the model from opening them. An agent can navigate to an off-list site and read its content. Egress filtering at the network layer (Intune/Edge policy for Copilot Studio; Docker --network flag for Anthropic) is a required complement, not an optional add-on.
  2. HITL is a soft heuristic, not a safety gate.
    Microsoft explicitly warns: 'Human review requests are triggered by probabilistic AI model behavior. You should not rely on human review or clarification requests as a safety fail-safe or as a guarantee that the system will always request human input before proceeding.' NIST AI RMF Agentic Profile AG-MG.1 pairs HITL with a deterministic kill-switch (pre-authorized automatic suspension) precisely because probabilistic escalation alone is insufficient.
  3. Maker credentials on shared agents create a privilege-escalation footgun.
    If a Copilot Studio agent is deployed using maker credentials and then shared with other users, every recipient runs with the maker's access level, not their own. The documentation calls this out explicitly. The fix: configure end-user credentials mode or vault-backed per-user credential injection for any shared agent that touches privileged systems.
  4. Anthropic Cowork is excluded from the compliance audit trail.
    As of May 2026, Anthropic's Cowork product is explicitly excluded from Audit Logs, the Compliance API, and Data Exports across all plan tiers, including Enterprise. For SOX, HIPAA, PCI DSS, and SOC 2 audit trail requirements, workloads must run through the Claude API (not Cowork). This is a product-level distinction, not a model-level one.
  5. Forty controls across five layers is the minimum viable governance set.
    The 40-point checklist in this playbook covers Identity & Access (8), Observability & Audit (8), Action Boundaries (8), Failure Handling (8), and Compliance Mapping (8). All 40 are traceable to primary sources: Microsoft Learn, Anthropic docs, OWASP Agentic Top 10 (Dec 9, 2025), or the CSA NIST AI RMF Agentic Profile v1. No control in this checklist is invented.

01 · Critical Failure Modes
The two footguns that end most computer-use deployments.

Hundreds of blog posts cover computer-use features. Almost none cover the two failure modes that security and compliance teams encounter first. Both are documented in primary vendor sources. Both are consistently omitted in vendor marketing summaries.

Footgun 1: Access control is NOT egress control. Copilot Studio's built-in "Access control" feature lets you configure an allowlist of websites and applications the agent is permitted to interact with. Microsoft's own documentation states verbatim: "Access control only prevents the model from taking actions on websites or applications that aren't in the allow list. It doesn't stop the model from opening them." The agent can navigate to an off-list URL, render its content, and potentially leak information from that page into its reasoning trace — even if it cannot click a button or submit a form there. Egress filtering at the network host layer (Intune policies, Edge policies, or Docker --network flags for Anthropic-direct deployments) is the complement that closes the gap. These are different technical controls. Treating the Access control allowlist as a firewall is the leading safety mistake in Copilot Studio computer-use deployments.
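The difference between an action allowlist and egress control can be illustrated with a deny-by-default host check of the kind a forward proxy or host firewall policy enforces before any request leaves the machine. This is a minimal sketch, not Copilot Studio, Intune, Edge, or Docker configuration; the host names and function names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; in practice it would mirror the network-layer policy.
ALLOWED_HOSTS = {"erp.example.internal", "salesforce.com"}


def host_allowed(url: str, allowed: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only if the URL's host is an allowed host or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if not host:
        return False  # unparseable or host-less input is denied
    return any(host == a or host.endswith("." + a) for a in allowed)


def filter_navigation(url: str) -> str:
    """Deny-by-default decision applied to navigation, not just to clicks."""
    return "allow" if host_allowed(url) else "deny"
```

The point of the sketch is that the check runs on every outbound navigation, so an off-list page is never rendered. Matching on the parsed hostname, rather than on a substring of the URL, avoids lookalike hosts such as `salesforce.com.evil.io`.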

Footgun 2: HITL is probabilistic, not deterministic. Microsoft's Human supervision documentation is unusually direct: "Human review requests are triggered by probabilistic AI model behavior… You should not rely on human review or clarification requests as a safety fail-safe or as a guarantee that the system will always request human input before proceeding." This is a significant caveat that most playbooks ignore. Operators who treat human-in-the-loop escalation as a binary gate — either an action is approved by a human or it does not happen — are operating on a false premise. The model may proceed without escalating. The correct architecture pairs HITL with a deterministic backstop: NIST AI RMF Agentic Profile AG-MG.1 calls for "pre-authorized automatic containment responses" including "automated agent suspension or kill-switch activation" for highest-severity incident patterns.

Anthropic's computer-use documentation adds a third failure mode worth naming: "Computer use is a beta feature with unique risks distinct from standard API features. These risks are heightened when interacting with the internet… In some circumstances, Claude will follow commands found in content even if it conflicts with the user's instructions." Anthropic's prompt injection classifier for computer use (active by default, opt-out requires contacting support) partially addresses this by triggering user confirmation when injections are detected in screenshots. It is a soft mitigant, not a hard block.

Microsoft Learn — Human supervision of computer use (updated May 7, 2026)

"Computer-use agents might encounter prompt injection attacks, where hidden instructions in screenshots, web pages, or other inputs attempt to influence actions in unintended ways. To minimize this risk, operate these agents within trusted, isolated environments and apply robust validation checks before executing any instructions." — Microsoft Copilot Studio documentation, learn.microsoft.com/en-us/microsoft-copilot-studio/human-supervision-computer-use

02 · Use-Case Qualification
When computer use is the right tool, and when it is not.

Computer-use agents solve a specific problem: automation of systems that have no API, no webhook, and no selector-stable UI that RPA can reliably target. They are not a universal automation layer. Before deploying one, the qualification question is whether a cheaper, more deterministic alternative exists. Industry estimates suggest RPA bot maintenance runs 30-40% of total RPA program budget — and vision-based agents can reduce that maintenance tax for UI-volatile applications, according to vendor-aligned analysis. But that reduction comes with a different cost: governance overhead.

The canonical fit is a legacy application with a GUI that predates APIs: SAP GUI screens, mainframe terminals (3270/5250), Salesforce Classic, legacy ERP order-entry forms, and industrial equipment consoles. Graebel — the global mobility and relocation company named as Microsoft's GA reference customer — runs its Service Order Agent in Copilot Studio to operate the Global Connect platform "directly through its UI — navigating screens, entering data, and completing transactions exactly as a trained human operator would, without APIs or platform redevelopment." DevOps.com reported the Graebel case study alongside Microsoft's GA announcement.

Computer use is NOT the right first choice for: systems with stable REST/GraphQL APIs (use function calling or MCP instead), applications with documented automation SDKs (Power Automate Desktop's first-party SAP GUI RPA actions cover SAP automation without vision), or workflows where the action surface changes often enough that a vision agent would re-plan every run (the model's uncertainty grows with UI volatility). Use computer use for the hard cases that nothing else can reach. For a routing decision between Microsoft Copilot Studio, Anthropic direct API, and Google computer-use options, see the three-way comparison and routing guide.

Best fit: Legacy GUI with no API surface
Examples: SAP GUI · Mainframe · ERP order-entry
Systems where no REST API, webhook, or selector-stable UI exists. Vision-based navigation is the only automation path. Governance cost is justified by the absence of alternatives.
Verdict: Primary use case

Good fit: Multi-app cross-system workflows
Examples: CRM + ERP + email in sequence
Workflows spanning 3+ apps with mixed API coverage. Computer use bridges the gaps where one leg has no API. Each app-crossing is an isolation checkpoint.
Verdict: With per-app credential isolation

Poor fit: API-reachable systems with stable schemas
Examples: Salesforce REST · HubSpot API · Stripe
If a REST API or documented SDK exists, use function calling or MCP instead. Vision agents add governance overhead without benefit when a cheaper deterministic path is available.
Verdict: Use function calling or MCP

High risk: Irreversible transactions without HITL
Examples: Purchase orders · Wire transfers · Email sends
Any action that cannot be undone. Computer use can reach these surfaces; deploying without a hard HITL gate plus a kill-switch for irreversible steps is an unacceptable risk posture.
Verdict: Requires HITL + kill-switch

03 · Enterprise Guardrails Checklist
The 40-point enterprise guardrails checklist: five sections, vendor-mapped.

The checklist below synthesizes Microsoft Learn computer-use documentation (updated May 21, 2026), Microsoft Monitor computer use docs (updated May 7, 2026), Anthropic computer-use documentation, OWASP Top 10 for Agentic Applications 2026 (published December 9, 2025, 100+ contributors), and the Cloud Security Alliance NIST AI RMF Agentic Profile v1. Every control is sourced; none is invented. See also the broader agent governance policy framework and the seven best practices for agent audit trail design.

Section 1: Identity & Access (8 controls)
Per-agent identities, vault injection, maker-credential warning, MFA.

1. Per-agent service-account identity, no shared logins. Microsoft: Entra agent identities (preview, GA-tracking). Anthropic: customer-implemented via API keys.
2. Vault-based credential injection, never in prompt. Microsoft: Azure Key Vault + Power Platform internal store. Anthropic: customer-implemented; docs recommend XML-tagged credential blocks at minimum.
3. Maker-credentials warning on shared agents (required). If maker credentials are used and the agent is shared, every recipient acts with the maker's access level.
4. End-user-credentials mode for per-session identity. Recommended for shared agents.
5. SSO / OAuth 2.0 for downstream apps. Both platforms support.
6. Role-based maker permissions. Environment Admin / System Customizer required to deploy computer-use agents in Copilot Studio.
7. Least-privilege user account on the executing machine. Documented best practice on both platforms.
8. MFA on the human reviewer's account. Inherited via Entra (Microsoft); customer-implemented (Anthropic).

Status: Required for all deployments
Section 2: Observability & Audit (8 controls)
Per-action screenshots, PII scrubbing, SIEM export, append-only retention.

9. Per-action screenshot capture. Microsoft: native session replay with verbosity toggle. Anthropic: required in customer agent loop.
10. Screenshot scrubbing / PII redaction before storage. Microsoft: configure verbosity to 'Data without screenshots' or 'Minimal'. Anthropic: customer-implemented.
11. Step-level reasoning log. Microsoft: transcript view with reasoning per step. Anthropic: tool-call trace via API responses.
12. Append-only audit retention ≥1 year for SOC 2. Microsoft: configurable up to ~63.8 years (33,554,432 minutes) or indefinite (0 / -1); default is 7 days (10,080 minutes), so extend immediately. Anthropic: Compliance API streams to SIEM; 30-day default in Admin Console.
13. SIEM export. Microsoft: Purview CUAOperation term. Anthropic: Compliance API (Splunk, Datadog, Elastic, Sentinel).
14. Drift detection via UI baseline screenshots. Recommended for both; quarterly session-replay audit pattern.
15. Action timestamps and coordinates. Microsoft: native side-panel data. Anthropic: returned in tool-call response.
16. Credentials-used audit. Microsoft: native side-panel 'Credentials used' field. Anthropic: customer-implemented.

Status: Mandatory for regulated workloads
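Because the Microsoft retention figures in control 12 are expressed in minutes, a small conversion helper makes the numbers easier to sanity-check. This is only an arithmetic sketch; the function names are hypothetical, and it does not call any Microsoft or Anthropic API.

```python
MINUTES_PER_DAY = 24 * 60


def days_to_minutes(days: int) -> int:
    """Convert a retention period in days to the minutes value quoted in the docs."""
    return days * MINUTES_PER_DAY


def minutes_to_years(minutes: int) -> float:
    """Approximate years for a retention value in minutes (365-day year assumed)."""
    return minutes / (MINUTES_PER_DAY * 365)
```

Under these assumptions, `days_to_minutes(7)` reproduces the 10,080-minute default, `days_to_minutes(365)` gives 525,600 minutes for a one-year SOC 2 target, and `minutes_to_years(33_554_432)` comes out at roughly 63.8 years.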
Section 3: Action Boundaries (8 controls)
Allowlists, rate limits, isolated machines, NIST AG-GV.2.

17. Allowlist of permitted websites. Microsoft: native Access control (⚠️ blocks actions, NOT navigation; add network egress filtering). Anthropic: customer URL filter.
18. Allowlist of permitted desktop apps. Microsoft: native Access control. Anthropic: container-only environment.
19. Block destructive actions without human approval. Microsoft: configure human supervision (probabilistic only; pair with kill-switch). Anthropic: classifier confirmation; customer gate.
20. Rate limiting per agent as a cost guardrail. Microsoft: Copilot Credit pool limits. Anthropic: customer-implemented via API gateway.
21. Cost ceiling alerts. Microsoft: Power Platform admin center. Anthropic: Console + customer monitoring.
22. Dedicated isolated machine per agent. Microsoft: documented best practice; Cloud PC pool (Windows 365 for Agents). Anthropic: 'use the tool on a container or virtual machine' per docs.
23. Network egress filtering at host. Microsoft: Intune / Edge policy. Anthropic: Docker --network flag.
24. NIST AG-GV.2 formal autonomy-scope document. Customer-authored on both platforms; required by NIST AI RMF Agentic Profile. Defines the exact scope of actions the agent is authorized to take without human approval.

Status: Required (closes egress gap)
Section 4: Failure Handling (8 controls)
Kill-switch, rollback, timeouts, dry-run mode, incident runbook.

25. Human-in-the-loop checkpoints (probabilistic). Microsoft: native + Outlook escalation. Anthropic: prompt-injection classifier triggers confirmation. Both: pair with deterministic backstop.
26. NIST AG-MG.1 kill-switch (pre-authorized automatic suspension). Microsoft: 'Stop testing' mid-run + agent disable in admin center. Anthropic: customer-implemented stop of agent loop.
27. Rollback pattern for reversible actions (CRM, file ops). Customer pattern on both platforms; native undo where supported.
28. Irreversible-action confirmation gate (purchases, email sends). Configure HITL + Access control (Microsoft); classifier confirmation + customer gate (Anthropic).
29. Timeout enforcement. Microsoft: configurable HITL response window. Anthropic: customer-implemented request expiry.
30. Incident response runbook for off-script behavior. Customer-authored; Sentinel alerting (Microsoft).
31. Snapshot / rollback of session state. Microsoft: Cloud PC fresh profile per session. Anthropic: container teardown per session.
32. Dry-run / sandbox mode before production publish. Microsoft: Test mode in Copilot Studio. Anthropic: customer-implemented staging environment.

Status: Required for production
Section 5: Compliance Mapping (8 controls)
HIPAA, SOC 2, ISO 27001, FedRAMP, NIST AI RMF.

33. HIPAA BAA coverage. Microsoft: Copilot Studio is BAA-covered; note it is not intended as a medical device. Anthropic: available via Claude Enterprise add-on; verify per workload.
34. SOC 2 Type II. Microsoft: yes, via Microsoft SOC reports (Service Trust Portal). Anthropic: yes (Anthropic SOC 2 Type II); Cowork product excluded from audit trail.
35. ISO 27001 + 27017 + 27018 + 27701. Microsoft: Copilot Studio + Azure inherit all four. Anthropic: certifications page.
36. FedRAMP. Microsoft: US Government cloud SKUs. Anthropic: verify per workload via Anthropic government program.
37. PCI DSS. Microsoft: yes. Anthropic: customer to verify per workload.
38. GDPR + EU data residency. Microsoft: geo-residency configurable. Anthropic: customer-implemented; ZDR add-on (Zero Data Retention eligible for computer use).
39. EU AI Act risk-tier classification. Customer DPIA required on both platforms.
40. NIST AI RMF Govern/Map/Measure/Manage mapping. Microsoft: NIST mapping documentation published. Anthropic: Responsible Scaling Policy.

Full Microsoft compliance list: ISO 9001, 20000-1, 22301, 27001, 27017, 27018, 27701, HIPAA BAA, HITRUST CSF, FedRAMP, SOC, PCI DSS, CSA STAR, UK G-Cloud, OSPAR, K-ISMS, Singapore MTCS Level 3, Spain ENS High.

Status: Verify per regulated workload

04 · Use-Case Stack Guide
Legacy-app use cases: the recommended stack per failure mode.

The controls that matter most depend on the failure mode the legacy application introduces. A pharmaceutical batch-record system has different top risks than an SAP GUI data-entry workflow. The entries below map four enterprise use cases to their primary risk and the corresponding stack recommendation drawn from the checklist above. For the full Anthropic computer-use API reference, see Anthropic's original computer-use API guide. For production deployment patterns with Claude, see the production deployment guide for Claude computer use.

SAP GUI (RPA + CU)
Power Automate Desktop SAP actions first
Primary risk: privileged-access blast radius. Stack: Power Automate Desktop SAP-GUI actions as primary; computer use as fallback only. Per-record HITL on any write. Azure Key Vault credential injection.
Allowlist: sap:// process only

Salesforce Classic (PII scrub)
Verbosity = Data without screenshots
Primary risk: PII exposure in screenshot logs. Stack: Verbosity set to 'Data without screenshots' before production publish. Azure Key Vault credentials. Allowlist *.salesforce.com only.
Financial services / regulated

Mainframe 3270 (Read-only)
Read-only mode by default; HITL on commit
Primary risk: irreversible CICS transactions. Stack: read-only default mode; HITL on any commit; pre-authorized auto-suspend on schema drift detection. Container isolation.
Kill-switch required

Pharma Records (21 CFR Pt 11)
HITL on every sign-off; 7+ year retention
Primary risk: 21 CFR Part 11 e-signature compliance. Stack: HITL on every sign-off step; full session replay retained for ≥7 years; Purview CUAOperation export to SIEM.
Compliance: FDA 21 CFR Part 11

05 · Credential Architecture
Credential and identity isolation: the two-mode Copilot Studio pattern.

Copilot Studio supports two credential storage modes for computer-use agents. Microsoft's documentation describes both: Power Platform internal storage (encrypted at rest, zero configuration) and customer-owned Azure Key Vault (bring-your-own vault, higher governance maturity). In both cases, credentials are injected at runtime and passwords never appear in prompts — a critical baseline. The risk comes not from the storage mechanism but from the credential-binding configuration.

The privilege-escalation footgun appears when agents are deployed using maker credentials and then shared. Microsoft's docs flag this explicitly: if you share an agent with this configuration, anyone using it acts with the original author's access on the configured machine. This is not a bug — it is an intentional capability for automating shared service accounts. The danger is unintentional deployment: a maker builds an agent for their own privileged account, shares it for team access, and inadvertently grants everyone on the team that elevated access.

The resolution is end-user credentials mode, which binds each session to the initiating user's own identity. Combined with Microsoft Entra's per-agent service account feature (in preview as of May 2026, GA-tracking), this architecture ensures each computer-use agent has a distinct identity, each session is traceable to a specific user, and the blast radius of a compromised agent is bounded to that agent's least-privilege scope. For broader identity control patterns across the agent estate, see the Microsoft Agent Governance Toolkit's runtime-security catalog.
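One way to catch the shared-maker-credentials configuration before it ships is a simple inventory check. The sketch below assumes you can export an agent inventory into records like the hypothetical `AgentConfig`; the field names and mode strings are illustrative and are not Copilot Studio API fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    # Hypothetical inventory record; field names are illustrative only.
    name: str
    credential_mode: str  # assumed values: "maker" or "end_user"
    shared: bool


def escalation_risks(agents: list[AgentConfig]) -> list[str]:
    """Return names of agents that are shared while bound to maker credentials."""
    return [a.name for a in agents if a.shared and a.credential_mode == "maker"]
```

Any name this returns is an agent whose recipients would act with the maker's access, and is a candidate for switching to end-user credentials mode.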

Anthropic's computer-use API does not provide native credential storage — credential injection is fully customer-implemented. Anthropic's documentation recommends, at minimum, passing credentials inside XML-tagged blocks in the system prompt rather than inline in the conversation. The production-grade pattern is a secrets manager (HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault) that injects credentials into the agent's context at session start and rotates them on schedule. Our AI transformation practice routinely includes credential isolation architecture in computer-use deployment engagements — it is consistently the control that teams omit in early pilots and retrofit under compliance pressure.
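A stricter variant of the "never in prompt" baseline, shown here as an assumption rather than as Anthropic's documented pattern, is to keep only a placeholder in anything the model sees and resolve it in the executor just before the keystrokes are sent. The placeholder syntax, `inject_secrets`, and `redact` are hypothetical names; `lookup` stands in for a call to whichever secrets manager you use.

```python
import re
from typing import Callable

# Hypothetical placeholder syntax, e.g. "{{secret:sap_password}}".
PLACEHOLDER = re.compile(r"\{\{secret:([a-z0-9_]+)\}\}")


def inject_secrets(text: str, lookup: Callable[[str], str]) -> str:
    """Resolve placeholders at execution time; the prompt keeps the placeholder."""
    return PLACEHOLDER.sub(lambda m: lookup(m.group(1)), text)


def redact(text: str, secrets: list[str]) -> str:
    """Scrub known secret values from any text before it is logged or stored."""
    for secret in secrets:
        if secret:
            text = text.replace(secret, "[REDACTED]")
    return text
```

For example, with an in-memory stand-in `vault = {"sap_password": "s3cr3t"}`, calling `inject_secrets("type {{secret:sap_password}}", vault.__getitem__)` yields the resolved keystrokes, while transcripts pass through `redact` before storage.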

"Procurement should treat this as a control plane question. UI-driven agents inherit human-level reach across legacy and SaaS workflows, so authorization, audit, and exception handling must operate at the action level. Without that visibility, autonomy stalls short of what the technology enables."
Mitch Ashley, VP & Practice Lead, Software Lifecycle Engineering, The Futurum Group (DevOps.com, May 18, 2026)

06 · Human-in-the-Loop Architecture
Designing HITL that actually works: the probabilistic-plus-deterministic pair.

The practical design of human-in-the-loop checkpoints for computer-use agents requires distinguishing between two layers: the model-triggered escalation (probabilistic, vendor-provided) and the rule-triggered gate (deterministic, customer-configured). Copilot Studio's native HITL sends an Outlook email with an inline review card when the model detects "potentially harmful instructions that could alter model behavior." Each run is tied to the initiating user identity. This is useful but not sufficient for high-stakes workflows.

The deterministic layer sits at the action type, not the model's confidence score. Define which action categories require human confirmation regardless of model state: all irreversible writes (purchases, emails, record deletes), all cross-system credential handoffs, and all actions touching regulated data fields. These categories do not escalate based on the model's uncertainty — they always escalate. The mechanism in Copilot Studio is the Access control allowlist (preventing the agent from acting on the target surface until a human releases it) combined with a HITL review card. In Anthropic direct-API deployments, the customer implements this as a confirmation step in the agent loop before calling the computer-use tool.
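For a customer-implemented agent loop, the rule-triggered gate can be sketched as a wrapper that checks the action category before anything runs. The category names, `ProposedAction`, and `execute_with_gate` are hypothetical; `approve` stands in for whatever human-review channel you use and `run` for the call that actually performs the action.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical categories that always escalate, regardless of model confidence.
ALWAYS_ESCALATE = {
    "purchase",
    "email_send",
    "record_delete",
    "credential_handoff",
    "regulated_field_write",
}


@dataclass(frozen=True)
class ProposedAction:
    category: str
    description: str


def execute_with_gate(
    action: ProposedAction,
    approve: Callable[[ProposedAction], bool],
    run: Callable[[ProposedAction], None],
) -> str:
    """Run the action only if it is ungated or a human has approved it."""
    if action.category in ALWAYS_ESCALATE and not approve(action):
        return "blocked"
    run(action)
    return "executed"
```

The design choice is that the gate keys on the action category alone, so a confident model cannot skip it; the model-triggered escalation remains an additional signal on top.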

The third layer — absent from most deployments — is the kill-switch. NIST AI RMF Agentic Profile AG-MG.1 requires "pre-authorized automatic containment responses" including "automated agent suspension or kill-switch activation" for highest-severity incident patterns. This is explicitly not a human decision: the specification calls for automatic suspension for incidents like schema drift (the target UI has changed in a way the agent cannot handle safely) or anomalous action velocity (the agent is taking actions far faster than a human baseline). The kill-switch fires before a human even sees the incident. For the full 60-point observability framework that feeds these kill-switch triggers, see the broader 60-point observability audit.
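The anomalous-action-velocity trigger can be sketched as a sliding-window counter that latches into a suspended state without waiting for a human. This is an illustrative sketch only: the class name and thresholds are hypothetical, and the right limits depend on your own human baseline.

```python
import time
from collections import deque
from typing import Optional


class VelocityKillSwitch:
    """Suspend the agent when it exceeds max_actions within window_seconds."""

    def __init__(self, max_actions: int, window_seconds: float) -> None:
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._stamps: deque = deque()
        self.suspended = False

    def record_action(self, now: Optional[float] = None) -> bool:
        """Record one action; return True if the agent may continue."""
        if self.suspended:
            return False  # latched: only an operator reset should clear this
        now = time.monotonic() if now is None else now
        self._stamps.append(now)
        # Drop timestamps that have fallen out of the window.
        while self._stamps and now - self._stamps[0] > self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) > self.max_actions:
            self.suspended = True
        return not self.suspended
```

The agent loop would call `record_action()` before every tool call and stop as soon as it returns `False`. The switch stays latched after tripping, which matches the pre-authorized suspension model: resuming is a deliberate operator decision.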

HITL layer coverage — required vs soft controls

Control requirements synthesized from Microsoft Learn human-supervision docs (May 7, 2026) and NIST AI RMF Agentic Profile v1 (CSA Labs)
Deterministic gate (rule-triggered): irreversible actions always escalate. Customer-configured; does not depend on model confidence. Status: Required
Probabilistic HITL (model-triggered): Copilot Studio native escalation. Microsoft docs: 'not a safety fail-safe'; probabilistic only. Status: Soft only
Kill-switch (NIST AG-MG.1): pre-authorized auto-suspension. Fires automatically; not a human decision. Status: Required
Timeout enforcement: HITL request expiry. Configurable in Copilot Studio; customer-implemented on Anthropic direct. Status: Required

07 · Compliance Framework
Mapping computer-use controls to OWASP, NIST, and audit frameworks.

The OWASP Top 10 for Agentic Applications 2026 (released December 9, 2025 by the OWASP GenAI Security Project, with input from 100+ security researchers and Gen-AI providers) names ten risk categories directly applicable to computer-use deployments. The controls in this playbook map to six of those ten: ASI01 Agent Goal Hijack (addressed by prompt-injection classifier and trusted-environment isolation), ASI02 Tool Misuse & Exploitation (action allowlists and rate limiting), ASI03 Identity & Privilege Abuse (per-agent service accounts and end-user credentials), ASI06 Memory & Context Poisoning (screenshot scrubbing and validated inputs), ASI08 Cascading Failures (kill-switch and timeout enforcement), and ASI09 Human-Agent Trust Exploitation (deterministic HITL gates).

The CSA NIST AI RMF Agentic Profile v1 adds two specific control identifiers that compliance programs should document: AG-GV.2 — formal documentation of "the scope of actions the agent is authorized to take without human approval" (the autonomy-scope document, control #24 in the checklist above) — and AG-MG.1 — pre-authorized automatic containment responses including kill-switch activation (control #26). Both are customer-authored artifacts; neither vendor provides a template. For the full OWASP business-leader guide, see the post on the OWASP Agentic Top 10 for business leaders.

Microsoft Copilot Studio's compliance inheritance is extensive. Per the Microsoft admin-certification documentation, Copilot Studio is covered under HIPAA BAA, HITRUST CSF, FedRAMP, SOC, ISO 9001, 20000-1, 22301, 27001, 27017, 27018, 27701, PCI DSS, CSA STAR, UK G-Cloud, OSPAR, K-ISMS, Singapore MTCS Level 3, and Spain ENS High as an Online Service under the Microsoft Online Services Terms. Computer-use agents running in Copilot Studio inherit this compliance posture for the platform layer. The customer retains responsibility for: the autonomy-scope document (AG-GV.2), the incident runbook, DPIA for EU AI Act classification, and any application-layer data controls (PII in screenshots, cross-border transfers).

Purview integration for computer-use audit trails is activated by toggling "Send audit logs to Microsoft Purview" in the Power Platform admin center. Runs export under the activity term CUAOperation. For the MCP server security context that often accompanies computer-use deployments, see the MCP server security audit checklist.

08 · Anthropic Cowork
Anthropic Cowork: the compliance exclusion every regulated team must know.

Anthropic offers two product surfaces for Claude: the Claude API (direct, used by Copilot Studio and developer integrations) and Cowork (Anthropic's enterprise UI and orchestration product). These are different products with different compliance postures, and the distinction matters for regulated workloads.

As of May 2026, Anthropic Cowork is explicitly excluded from Audit Logs, the Compliance API, and Data Exports — across all plan tiers including Enterprise. Per analysis of Anthropic's enterprise deployment documentation, workloads requiring SOX, HIPAA, PCI DSS, or SOC 2 audit trails should not route through Cowork. The Compliance API — launched August 20, 2025, enabling streaming audit events to SIEM tools including Splunk, Datadog, and Elastic — applies to the Claude API only. Audit logs retained in the Admin Console default to 30 days; enterprise customers should configure SIEM export before that window closes.

This exclusion applies to the Cowork product specifically. The Claude API itself (the interface used by Copilot Studio's Anthropic model options and by direct API integrations) supports the Compliance API. The practical implication: teams evaluating Claude for regulated computer-use workloads should architect against the Claude API rather than Cowork, and should verify their SIEM export configuration before go-live. Zero Data Retention (ZDR) is also available for computer-use API traffic when the org has a ZDR arrangement — "data sent through this feature is not stored after the API response is returned," per Anthropic's documentation.

Compliance Architecture Note — May 2026

Anthropic computer use through the Claude API is ZDR-eligible (data not stored after API response when ZDR is in place) and Compliance API-eligible (audit events stream to SIEM). Anthropic Cowork — the enterprise UI product — is excluded from all three audit mechanisms as of May 2026: Audit Logs, Compliance API, and Data Exports. Verify product routing before regulated workload go-live.

09 · Cost Architecture
Cost math for production workloads: Copilot Credits at scale.

Copilot Studio computer-use pricing is Credit-based, not token-based, which makes cost modeling different from direct API pricing. Per Microsoft's licensing documentation: each step in a computer-use run consumes 5 Copilot Credits on standard models (OpenAI Computer-Using Agent, Claude Sonnet 4.5, Claude Sonnet 4.6) or 15 Copilot Credits on the premium model (Claude Opus 4.6). A four-step time-sheet automation example — log in, navigate to time entry, fill form, submit — consumes 20 Credits on standard models or 60 Credits on the premium model.

Scaled to a back-office use case: a 10,000-record data migration where each record requires an average of 8 steps (navigate, locate, read, validate, enter, verify, confirm, log out) consumes 400,000 Copilot Credits on standard models or 1,200,000 Credits on Claude Opus 4.6. Copilot Credits are purchased in pool packs; they do not map 1:1 to a fixed USD amount because pack pricing varies by license tier and volume. Do not convert Credits to USD without sourcing your organization's current Power Platform messaging pack rate from Microsoft. The credit arithmetic above is factual per Microsoft's licensing docs; the USD implication requires a separate commercial conversation.
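The Credit arithmetic above is easy to reproduce for your own workload sizes. The sketch below uses only the per-step rates quoted from Microsoft's licensing documentation; the function and tier names are hypothetical, and it deliberately makes no USD conversion.

```python
# Credits per computer-use step, as quoted above (standard vs premium models).
CREDITS_PER_STEP = {"standard": 5, "premium": 15}


def run_credits(records: int, steps_per_record: int, tier: str) -> int:
    """Total Copilot Credits for a batch: records x steps x per-step rate."""
    if tier not in CREDITS_PER_STEP:
        raise ValueError(f"unknown tier: {tier!r}")
    return records * steps_per_record * CREDITS_PER_STEP[tier]
```

With these rates, `run_credits(1, 4, "standard")` gives the 20-Credit time-sheet example, and `run_credits(10_000, 8, "premium")` gives the 1,200,000-Credit migration figure.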

Cost ceiling controls are non-negotiable at this scale. The checklist's control #21 (cost ceiling alerts in Power Platform admin center) should be configured before production deployment — not after the first runaway batch. Standard-model routing is the default cost posture; premium-model routing (Opus 4.6) should be reserved for tasks where benchmark quality differences justify the 3× Credit cost. For the routing decision between models, see the three-way comparison of computer-use platforms which covers OSWorld benchmarks, including Claude Sonnet 4.5's vendor-reported 61.4% score on the OSWorld real-world task benchmark.

Anthropic direct-API cost is token-based, not Credit-based. The two are not directly comparable in a pricing table without converting to the customer's specific licensing rates. The operational principle is the same: set cost ceilings, alert at 80% of ceiling, and kill-switch at 100%. Cost overruns in agentic workloads are not billing noise — they are frequently the first symptom of a runaway or looping agent that needs the AG-MG.1 kill-switch to fire.
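The alert-at-80%, stop-at-100% rule is unit-agnostic, so the same check works for Credits or token spend. This is a minimal sketch with hypothetical names; wiring the "kill" result to an actual agent suspension is left to your own loop or admin tooling.

```python
def ceiling_status(spent: int, ceiling: int, alert_percent: int = 80) -> str:
    """Return "ok", "alert", or "kill" for spend measured against a ceiling."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    if spent >= ceiling:
        return "kill"
    # Integer comparison avoids floating-point edge cases at the threshold.
    if spent * 100 >= ceiling * alert_percent:
        return "alert"
    return "ok"
```

Calling this after every step, rather than on a billing cycle, is what lets a looping agent be caught while the overrun is still small.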

Conclusion

Computer-use governance is the deployment gap — and it is closable with 40 controls.

The vendors have shipped the technology. Microsoft Copilot Studio reached general availability across commercial geographies on May 22, 2026 — with four model options, integrated credentials, session replay, and Purview export. Anthropic's computer-use API has been available since late 2024, with a prompt-injection classifier active by default and ZDR eligibility for regulated orgs. The controls exist. The gap is in stitching them into one operational checklist that a security team can audit and a compliance team can sign off on.

The 40-point checklist in this playbook is that stitch. Every control traces to a primary source: Microsoft Learn, Anthropic docs, OWASP Agentic Top 10, or the CSA NIST AI RMF Agentic Profile. None are invented. The two footguns (access control is not egress control, and HITL is probabilistic, not deterministic) are the highest-return items to address first. Configure network-layer egress filtering before you rely on the allowlist. Deploy a kill-switch before you rely on model-triggered escalation. Those two changes close the largest surface area in the shortest time. The remaining 38 controls are the governance program that makes computer-use agents sustainable at enterprise scale.

FAQ · Enterprise Computer-Use Agents

Enterprise computer-use: the questions security and compliance teams ask.

The most widely misunderstood gap is that Copilot Studio's Access control allowlist blocks the model from taking actions on off-list websites — but it does not block the model from opening them. An agent can navigate to an off-list URL and process its content even when it cannot interact with it. Egress filtering at the network host layer (Intune policies, Edge policies) is the required complement. This is documented in Microsoft's own Learn docs. The second critical gap: human-in-the-loop escalation is probabilistic, not deterministic. Microsoft explicitly warns that the model may proceed without escalating. Pair HITL with a rule-based deterministic gate for irreversible actions and a kill-switch (NIST AG-MG.1) for highest-severity incidents.