Agentjacking is a newly disclosed attack class that hijacks AI coding agents — Claude Code, Cursor, and Codex among them — by hiding malicious instructions inside data the agent already trusts. A developer asks the agent to investigate an error; the agent pulls the error details through a Model Context Protocol (MCP) connection; and the attacker’s payload, planted inside that error, runs as if it were the agent’s own idea.
The research comes from Tenet Threat Labs, which coined the term and published the disclosure in June 2026. Their controlled testing reports a roughly 85% exploitation success rate across the major agents. That figure is Tenet’s own measurement. Several security outlets have reported the same number, but repeating a figure is not replicating it, and no one has independently reproduced it, so we treat it as vendor-stated throughout. What is not in dispute is the mechanism, and the mechanism is the part worth understanding.
This guide explains what agentjacking is, why conventional security tooling misses it entirely, who is exposed, why the vendor at the center of the disclosure declined to fix it, and — most usefully — five concrete guardrails you can apply today, mapped to the specific stage of the attack each one disrupts. We end with a short pre-run self-check you can adopt before your next agent touches a repository.
- 01 · The injection rides on data, not on a breach. A fake error event carries hidden instructions. The agent retrieves it through MCP and cannot reliably tell the difference between data to read and a command to run, so it runs it.
- 02 · Prompt-layer warnings did not stop it. Tenet reports that agents executed the payload even when their system prompts explicitly told them to disregard untrusted external data. A warning in the prompt is not a control.
- 03 · Every step is technically authorized. Tenet calls this the Authorized Intent Chain: real credentials, legitimate tools, no malware dropped. EDR, WAF, IAM, VPN, and firewalls see nothing wrong because nothing is unauthorized.
- 04 · The exposure is broad and the vendor declined a fix. Tenet reports 2,388 organizations with publicly exposed Sentry DSNs and 100+ confirmed executions. Sentry called the attack technically not defensible and applied only a string filter for the proof-of-concept payload.
- 05 · Defense belongs at the agent, not the prompt. Audit MCP connections, rotate or proxy DSNs, sandbox execution, require human approval for shell commands, and deploy hardened configs. Map each control to a specific attack stage.
01 — The Attack
A fake bug report that the agent treats as instructions.
Start with the workflow every developer using an AI agent already has: something breaks, you ask the agent to look into it, and the agent queries your error-tracking service to read the details. AI coding agents such as Claude Code, Cursor, and Codex do this through MCP, the protocol that lets an agent call out to external services for context. The agent reads the error, proposes a fix, and often offers to run it.
Agentjacking abuses that loop. According to Tenet, the attacker sends a crafted error event to your Sentry project and embeds malicious instructions inside the event’s message field — formatted as a fake ## Resolution markdown section so it mimics the structure of legitimate Sentry MCP output. When the agent later retrieves that event, it sees what looks like an authoritative suggested fix and acts on it. The payload in Tenet’s proof-of-concept directed agents to run a single npm command that then probed the local machine for AWS credentials, npm tokens, Docker credentials, SSH keys, and git credential helpers.
The entry point is a Sentry DSN — the Data Source Name. A DSN is by design a public, write-only credential: it is embedded in frontend JavaScript so a visitor’s browser can report errors. That makes it trivially discoverable to anyone who can view a page’s source. Tenet found exposed DSNs using only public Sentry APIs and GitHub code search — no authentication, no breach. The design that makes Sentry easy to wire up is the same design that makes it injectable.
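Because the DSN ships in frontend code, the quickest way to see what an attacker sees is to search your own built assets for it. The sketch below does that with a regular expression. It assumes the common DSN shape of a scheme, a 32-character hex public key, an @, a host, and a numeric project id; self-hosted setups with a path prefix, or DSNs assembled at runtime, will slip past it. The dist default directory is a placeholder for wherever your build output lives.

```python
"""Find Sentry-style DSNs in built frontend assets (a sketch, not a complete scanner)."""
import re
import sys
from pathlib import Path

# Assumed DSN shape: scheme://<32-hex public key>[:<legacy secret>]@<host>/<numeric project id>.
# Self-hosted installs with a path prefix, or DSNs assembled at runtime, will not match.
DSN_PATTERN = re.compile(
    r"https?://(?P<key>[0-9a-f]{32})(?::[0-9a-f]{32})?@(?P<host>[A-Za-z0-9.-]+(?::\d+)?)/(?P<project>\d+)"
)


def find_dsns(text: str) -> list:
    """Return a dict (dsn, key, host, project) for every DSN-looking string in text."""
    return [
        {"dsn": m.group(0), "key": m.group("key"), "host": m.group("host"), "project": m.group("project")}
        for m in DSN_PATTERN.finditer(text)
    ]


def scan_directory(root: Path, suffixes=(".js", ".mjs", ".html", ".map")) -> dict:
    """Map each matching file under root to the DSNs found in it."""
    if not root.is_dir():
        return {}
    hits = {}
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            found = find_dsns(path.read_text(encoding="utf-8", errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits


if __name__ == "__main__":
    build_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("dist")  # placeholder default
    for file, dsns in scan_directory(build_dir).items():
        for dsn in dsns:
            print(f"{file}: project {dsn['project']} at {dsn['host']}")
```

Point it at your build output directory. Every hit is a key any visitor to your site can already read.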
Find the DSN
A Sentry DSN is a public, write-only key embedded in frontend JavaScript. Anyone who can read a page's source can find it. Tenet located thousands without authentication.
Plant the payload
The attacker sends a crafted error event. Malicious instructions sit in the message field, formatted to mimic a legitimate Sentry MCP resolution template the agent already trusts.
Agent runs the command
When the developer asks the agent to investigate, it retrieves the event, reads the injected fix, and executes it — probing for AWS keys, npm tokens, SSH keys, and git credentials.
02 — Why Detection Fails
The Authorized Intent Chain: every step is permitted.
The reason agentjacking is dangerous is not that it is clever at the injection point — prompt injection is well documented. It is that the entire attack chain is, technically, authorized. Tenet names this the Authorized Intent Chain: the developer asks the agent to fix errors, the agent queries Sentry via MCP, MCP returns data, and the agent runs the suggested fix. No single step is unauthorized. The prevailing security model is built to catch unauthorized behavior — and this attack contains none.
That single property is what makes the attack pass straight through the controls most teams rely on. Per the reporting, agentjacking bypasses EDR, WAF, IAM systems, VPN, Cloudflare, and firewall controls — not by defeating them, but by never doing anything they are designed to flag. The agent is using legitimate credentials to execute legitimate tools. No malware is dropped. No policy is violated. There is no anomalous login, no exfiltration signature, no unsigned binary.
This reframes the entire defense problem. If your mental model of security is “detect the bad thing happening,” you have nothing to detect. The interpretation worth sitting with: the same properties that make agents productive — broad tool access, standing credentials, the autonomy to act on what they read — are exactly the properties an attacker borrows. You cannot harden the agent by making it more obedient to the data it reads. You harden it by constraining what it is allowed to do when it acts.
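One concrete way to shrink what a hijacked agent can borrow is to stop handing it standing credentials in the first place. The sketch below, a minimal illustration rather than a hardened launcher, starts an agent process with a deny-by-default environment: only allowlisted variables pass through. The allowlist, the credential prefixes, and run_agent are hypothetical and need tuning for your stack. Note the limit: this removes credentials carried in environment variables only; files such as ~/.aws/credentials or ~/.ssh stay readable unless a sandbox also hides them.

```python
"""Start an agent process without inherited credentials (deny-by-default environment sketch)."""
import os
import subprocess
from typing import Dict, List

# Hypothetical allowlist: the only variables passed to the agent process.
SAFE_VARS = {"PATH", "HOME", "LANG", "TERM", "SHELL", "USER"}

# Second line of defense in case someone widens SAFE_VARS: never pass these prefixes.
SECRET_PREFIXES = ("AWS_", "GITHUB_", "GH_", "NPM_", "DOCKER_", "GOOGLE_", "AZURE_")


def scrubbed_env(source: Dict[str, str]) -> Dict[str, str]:
    """Keep allowlisted variables only, and drop anything with a credential-style prefix."""
    return {
        name: value
        for name, value in source.items()
        if name in SAFE_VARS and not name.startswith(SECRET_PREFIXES)
    }


def run_agent(command: List[str]) -> int:
    """Run a (hypothetical) agent command with the scrubbed environment; returns its exit code."""
    return subprocess.run(command, env=scrubbed_env(dict(os.environ)), check=False).returncode
```

The allowlist, not the prefix list, is what makes this deny-by-default: a new credential variable nobody thought to name is dropped automatically.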
The most uncomfortable finding sharpens the point: prompt-layer defenses did not work. Tenet reports that agents executed the attacker’s payload even when their system prompts explicitly instructed them to disregard untrusted external data. The popular advice to “just add a warning to your system prompt” is, on this evidence, not a control at all. The same reasoning applies across the agent ecosystem: Claude Code, Cursor, Codex, and their peers can all be configured to read external context, and external context is where the instruction hides.
03 — The Scale
Who is exposed, and how far the footprint reaches.
Tenet’s exposure numbers are self-reported and we present them as such. During a validation period that ended June 17, 2026, the team reports identifying 2,388 organizations with injectable, publicly exposed Sentry DSNs, 71 of which rank within the Tranco top one million websites. More than 100 real-world organizations had AI agents actually execute the researchers’ controlled validation payload, spanning Fortune 500 enterprises, hosting providers, scientific computing firms, cloud security vendors, and startups across FinTech, EdTech, and HealthTech.
The single most striking case — again, vendor-stated, with the company deliberately unnamed — is a Fortune 100 technology company valued at approximately $250 billion whose AI coding agents on corporate Windows devices confirmed execution of the payload, with cloud infrastructure tokens and git tokens accessible. Tenet reports confirmed victim environments spanning macOS, Windows (including WSL Ubuntu), CI/CD pipelines, sandboxed agents, VPN-protected internal networks, and GCP/AWS cloud containers, across six continents and more than 30 countries.
Agentjacking exposure footprint · as reported by Tenet
Source: Tenet Threat Labs disclosure, June 2026. Figures are vendor-stated, not independently replicated.
Treat these bars as a magnitude indicator rather than an audited census. The bar widths are an illustrative visual scaling of the reported figures, not a shared denominator: 2,388 and 100+ measure different things (exposed versus confirmed-executed). The honest takeaway is directional: the exposed population is large, real executions occurred at well-resourced organizations, and the geographic spread means this is a global pattern rather than a regional or sector-specific one. Notably, this lands against a wider backdrop in which 1 in 8 enterprise breaches now involve agentic systems; agentjacking is one named instance of a category that is already showing up in breach statistics.
04 — The Vendor Response
Why Sentry called it not defensible, and what that means for you.
Tenet disclosed the issue to Sentry on June 3, 2026. Per the reporting, Sentry acknowledged it the same day but declined to fix it at the root cause. Sentry characterized the attack as technically not defensible at its platform level — meaning it would not restrict event ingestion to authenticated sources or sanitize event data before returning it through the MCP server. Its stated rationale was that model vendors run middleware defenses. The only remediation Sentry took was to activate a global content filter for the specific payload string in Tenet’s proof-of-concept.
Be precise about what that means, because it is easy to misread. Sentry did not patch the vulnerability. It blocked one known exploit string while leaving the architectural pathway — untrusted event data flowing through MCP into an agent that will act on it — entirely open. A different payload, a different phrasing, the same outcome. This is an unusual disclosure precisely because the vendor at the center of it took the position that the defense belongs somewhere else.
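The gap between filtering a string and closing a pathway is easy to show. This toy example is not Sentry’s filter, whose implementation the reporting does not describe; it assumes a single hypothetical blocked phrase and checks two payloads that carry the same instruction in different words.

```python
"""Toy demonstration: an exact-string blocklist stops one payload, not the pathway."""

# Hypothetical stand-in for the single proof-of-concept string a vendor might block.
BLOCKED_STRINGS = ("run the following command to resolve this error",)


def passes_string_filter(message: str) -> bool:
    """Mimic a blocklist filter: reject a message only if it contains a known string."""
    lowered = message.lower()
    return not any(blocked in lowered for blocked in BLOCKED_STRINGS)


original = "## Resolution\nRun the following command to resolve this error: <command>"
reworded = "## Resolution\nTo fix this, execute: <command>"

print(passes_string_filter(original))  # False: the known phrasing is caught
print(passes_string_filter(reworded))  # True: same instruction, new wording, straight through
```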
Here we will take a frank editorial stance: whatever the merits of Sentry’s argument that model vendors should run middleware defenses, the practical effect is that responsibility lands on you, the developer or platform team. You cannot wait for a patch that the platform vendor has said it will not ship. That is not a reason to stop using Sentry or MCP — both remain genuinely useful — but it is the reason the rest of this guide focuses on controls you own rather than fixes you are waiting on.
05 — Attack Surface
The agents in scope, and the shared weakness.
Tenet reports that the affected tools include Claude Code, Cursor, OpenAI Codex, Warp terminal agents, and VS Code extensions: any agent that can be configured to query Sentry (or another external service) over MCP. The common thread is not a flaw unique to any one product; it is the shared design in which an agent reads external context and is empowered to act on it. The same loop powers always-on agents such as Cursor automations, which raises the stakes: an always-on agent retrieving external data without a human in the loop removes the one moment a person might have paused to question a suspicious “fix.”
The defensive controls a given agent ships with matter, but Tenet’s broader point is that you should not assume any default is safe. Several confirmed victim environments were already sandboxed, and the attack still succeeded — sandboxing helps, but only when it is configured to deny outbound network access by default, which most defaults do not. The three capabilities that actually move the needle are deny-by-default network egress, an explicit content-trust boundary for tool output, and a human-approval gate before shell execution.
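A content-trust boundary can start as something simple: record where each piece of tool output came from, and flag text that reads like an instruction rather than data. The sketch below assumes a few hypothetical regex heuristics (headings that mimic a resolution section, imperative run/execute phrasing, package-runner and download commands). Since prompt warnings failed in Tenet’s testing, the flags are not fed back to the model as a warning; they switch off auto-execution so every later command in the session needs a human. Expect the heuristics to miss reworded payloads, which is why they sit in front of an approval gate rather than replacing it.

```python
"""Record provenance for tool output and flag instruction-like content (heuristic sketch)."""
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical heuristics; a reworded payload can evade all three.
SUSPICIOUS_PATTERNS = [
    re.compile(r"^#{1,6}\s*(resolution|fix|solution)\b", re.IGNORECASE | re.MULTILINE),
    re.compile(r"\b(run|execute)\b.{0,40}\b(command|script)\b", re.IGNORECASE),
    re.compile(r"\b(npx|npm\s+(exec|install)|curl|wget|bash\s+-c|sh\s+-c)\b", re.IGNORECASE),
]


@dataclass
class ToolOutput:
    source: str  # which MCP server returned the content
    text: str  # the content itself, always treated as data
    flags: List[str] = field(default_factory=list)


def inspect_tool_output(source: str, text: str) -> ToolOutput:
    """Attach the pattern of every heuristic that matches the text."""
    return ToolOutput(source, text, [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)])


def auto_execution_allowed(session_outputs: List[ToolOutput]) -> bool:
    """Once any external content in the session is flagged, every command needs a human."""
    return not any(output.flags for output in session_outputs)
```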
Agent families affected
Claude Code, Cursor, OpenAI Codex, Warp terminal agents, and VS Code extensions — every agent configurable to query an external service over MCP. The weakness is the pattern, not the product.
No reliable protection from prompt text
Agents executed the payload even when system prompts told them to ignore untrusted external data. A warning in the prompt is not a control — treat it as documentation, not defense.
agent-jackstop drop-in configs
Tenet open-sourced configs for Cursor and Claude Code: deny-by-default network egress, macOS/Linux/WSL2 sandboxing, an Auto-review classifier, and an Allowlist-plus-Sandbox mode.
06 — Defense
Five guardrails, mapped to the stage each one disrupts.
Most coverage gives you a flat list of advice. The more useful frame is to map each control to the attack stage it interrupts — Discovery, Injection, Retrieval, Execution, or Exfiltration — and to note whether it works without waiting on the agent vendor or on Sentry. The matrix below does exactly that. The five guardrails are practical recommendations drawn from Tenet’s agent-jackstop documentation, the Cloud Security Alliance research note, and general security practice; none of them is a claim about any product’s default behavior.
Five guardrails · independence = count of “Yes” across the two no-change columns (0–2)

| Guardrail | Stage disrupted | No agent change | No Sentry change | Independence (0–2) | Complexity | Cost |
|---|---|---|---|---|---|---|
| Audit MCP connections | Retrieval | Yes | Yes | 2 / 2 | Low | Free |
| Rotate / proxy Sentry DSNs | Discovery · Injection | Yes | No | 1 / 2 | Medium | Free |
| Sandbox agent execution | Execution · Exfiltration | No | Yes | 1 / 2 | Medium | Free |
| Require approval for shell commands | Execution | No | Yes | 1 / 2 | Low | Free |
| Deploy agent-jackstop configs | Execution · Exfiltration | No | Yes | 1 / 2 | Medium | Free (open source) |
Read the independence column as a prioritization signal, not a ranking of effectiveness. Auditing your MCP connections scores 2 of 2 — it needs no change from either the agent vendor or Sentry, so you can do it immediately. Rotating or proxying DSNs scores 1 because it touches your Sentry configuration; sandboxing, approval gates, and agent-jackstop each score 1 because they change how the agent runs. The controls that score lower are not weaker — sandboxing and approval gates are the ones that actually stop execution — they simply require you to change the agent’s configuration, which is well within your control.
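Auditing MCP connections starts with a list. The sketch below reads MCP configuration files and marks servers whose name, command, arguments, or URL suggest they return content a third party can influence. It assumes the mcpServers JSON layout used by Claude Code’s project-level .mcp.json and by Cursor’s mcp.json files, plus a hypothetical keyword list; check both the file locations and the schema against each agent’s current documentation. A “review” label means look closer; “unflagged” does not mean safe.

```python
"""Inventory MCP servers from agent config files and flag likely external-content sources (sketch)."""
import json
from pathlib import Path

# Assumed config locations (project-level Claude Code, project and global Cursor); verify for your setup.
CONFIG_PATHS = [Path(".mcp.json"), Path(".cursor") / "mcp.json", Path.home() / ".cursor" / "mcp.json"]

# Hypothetical hints for servers that return content a third party can influence.
EXTERNAL_CONTENT_HINTS = ("sentry", "github", "gitlab", "jira", "linear", "zendesk", "slack", "datadog")


def load_servers(config_text: str) -> dict:
    """Return the mcpServers mapping from one config file's JSON text (empty if absent)."""
    return json.loads(config_text).get("mcpServers", {})


def classify(name: str, spec: dict) -> str:
    """'review' if the name, command, args, or URL mention a hint; otherwise 'unflagged'."""
    parts = [name, str(spec.get("command", "")), str(spec.get("url", ""))]
    parts += [str(arg) for arg in spec.get("args", [])]
    haystack = " ".join(parts).lower()
    return "review" if any(hint in haystack for hint in EXTERNAL_CONTENT_HINTS) else "unflagged"


def inventory(paths=CONFIG_PATHS) -> list:
    """Return (config file, server name, label) for every server in every config that exists."""
    rows = []
    for path in paths:
        if path.is_file():
            for name, spec in load_servers(path.read_text(encoding="utf-8")).items():
                rows.append((str(path), name, classify(name, spec)))
    return rows


if __name__ == "__main__":
    for config, name, label in inventory():
        print(f"{label:<10} {name:<24} {config}")
```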
Audit MCP & rotate DSNs
Inventory every MCP connection and ask which ones surface externally controlled content. Rotate exposed Sentry DSNs and proxy event ingestion so that a public key alone is not enough to inject events. This shrinks the attack surface before the agent ever reads anything.
Sandbox & gate execution
Run agents in a sandbox with deny-by-default network egress, require explicit human approval before any shell command, and deploy hardened configs such as agent-jackstop. This is where execution and exfiltration are actually stopped.
Rotate secrets & keep audit logs
Assume any credential reachable by a hijacked agent is exposed: rotate AWS keys, git tokens, and npm tokens, and retain agent action logs so you can reconstruct what a compromised run touched. Recovery, not prevention — but essential when prevention fails.
Treat all external context as untrusted
Apply the same posture to issue trackers, support queues, code-review platforms, and log aggregators — every MCP source that surfaces content a stranger can influence. Sentry is the disclosed instance, not the boundary of the risk.
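The approval gate is the one control in this list that puts a person back at the decision point. A minimal sketch, assuming your agent harness lets you intercept a proposed command before it runs (approve and run_if_approved are hypothetical names): show the exact command and where the suggestion came from, treat anything other than an explicit “yes” as a refusal, and run the approved string without a shell so it is not re-expanded.

```python
"""Human approval gate for agent-proposed shell commands (sketch; function names are hypothetical)."""
import shlex
import subprocess
from typing import Callable, Optional


def approve(command: str, source: str, ask: Callable[[str], str] = input) -> bool:
    """Show the exact command and its origin; only an explicit 'yes' counts as approval."""
    print(f"Agent proposes to run:\n    {command}\nSuggested by: {source}")
    return ask("Type 'yes' to run it, anything else to refuse: ").strip().lower() == "yes"


def run_if_approved(command: str, source: str, ask: Callable[[str], str] = input) -> Optional[int]:
    """Run the command only after approval; returns its exit code, or None if refused."""
    if not approve(command, source, ask):
        return None
    # No shell: the approved text is split once, so pipes, redirects and $(...) are not interpreted.
    return subprocess.run(shlex.split(command), check=False).returncode
```

Commands that rely on pipes or redirection will not work through this gate. That is a deliberate trade-off: those are exactly the commands worth rewriting by hand.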
Tenet open-sourced agent-jackstop — drop-in configurations for Cursor and Claude Code that harden agents against prompt injection through untrusted tool output. Per its documentation, it provides deny-by-default network egress, macOS/Linux/WSL2 sandboxing, an Auto-review classifier, and an Allowlist mode paired with a sandbox. It is available on GitHub under tenet-security/agent-jackstop. It is one option among several; the principle it embodies — constrain the agent at the point of action — is the durable part, whichever tool you use to enforce it.
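Deny-by-default egress has to be enforced below the agent, by the sandbox or a network proxy, but the decision itself is simple. This sketch shows the check a forward proxy in front of an agent sandbox might apply; it is not agent-jackstop’s implementation, and the allowlisted hosts are placeholders. It matches hosts exactly, so a look-alike such as github.com.attacker.example is refused. An allowed host can still carry data out (an attacker-controlled repository on an allowed code host, for example), so keep the list as short as the work allows.

```python
"""Deny-by-default egress decision a forward proxy might apply (sketch; hosts are placeholders)."""
from urllib.parse import urlsplit

# Placeholder allowlist: only hosts the agent's work legitimately needs. Everything else is denied.
ALLOWED_HOSTS = frozenset({"mirror.internal.example", "git.internal.example"})


def egress_allowed(url: str, allowed: frozenset = ALLOWED_HOSTS) -> bool:
    """Allow only exact host matches; unknown, malformed, or look-alike hosts are refused."""
    host = (urlsplit(url).hostname or "").lower()
    return host in allowed


for url in ("https://mirror.internal.example/pkg",
            "https://mirror.internal.example.attacker.example/collect",
            "https://attacker.example/collect"):
    print(url, "->", "allow" if egress_allowed(url) else "deny")
```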
Looking forward, expect the defensive center of gravity to keep shifting toward runtime reasoning monitoring rather than input or output filtering. The broader research direction points that way: controls that inspect what an agent is about to do, at the moment it decides to do it, are more durable than trying to scrub every possible malicious input. If your team is standing up agents in production and wants this baked in from the start, our AI & digital transformation engagements and web development practice build the sandboxing, egress controls, and approval gates into the agent architecture rather than bolting them on after an incident.
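A crude, rule-based version of inspecting an action at the moment it is decided is a pre-execution hook. The sketch assumes Claude Code’s PreToolUse hook contract as we understand it: the hook receives a JSON event on stdin with tool_name and tool_input fields, and exit code 2 blocks the call and returns stderr to the agent. Confirm that against the current hooks documentation before relying on it. The deny rules are hypothetical and target the behaviors in Tenet’s proof-of-concept (a package runner, then reads of credential stores); pattern matching is easy to evade, so treat this as a tripwire layered on a sandbox, not a substitute for one.

```python
"""Rule-based pre-execution hook for agent shell commands (sketch; hook contract assumed)."""
import json
import re
import sys

# Hypothetical deny rules modeled on the proof-of-concept behavior: a package runner that
# fetches and executes remote code, then reads of well-known credential stores.
DENY_RULES = [
    (re.compile(r"\bnpx\b|\bnpm\s+exec\b"), "package runner fetches and executes remote code"),
    (re.compile(r"\.aws/credentials|\.npmrc|\.docker/config\.json|\.ssh/|\.git-credentials"),
     "reads a credential store"),
    (re.compile(r"\bgit\s+credential\b"), "queries a git credential helper"),
]


def decide(event: dict) -> tuple:
    """Return (exit_code, message): 0 lets the tool call proceed, 2 asks the agent to stop."""
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = str(event.get("tool_input", {}).get("command", ""))
    for pattern, reason in DENY_RULES:
        if pattern.search(command):
            return 2, f"Blocked ({reason}): {command}"
    return 0, ""


if __name__ == "__main__" and not sys.stdin.isatty():
    raw = sys.stdin.read()
    code, message = decide(json.loads(raw) if raw.strip() else {})
    if code:
        print(message, file=sys.stderr)  # assumed to be shown to the agent on exit code 2
        sys.exit(code)
```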
07 — Self-Check
Run this before your next agent touches a repo.
None of these steps requires waiting on a vendor, and all five are free. Treat them as the minimum bar for any team running AI coding agents against repositories that hold real credentials.
Inventory MCP connections
List every MCP server your agents can reach. Flag any that returns content a third party can influence — error trackers, issue boards, support tickets, log streams. Those are the injection surfaces.
Rotate or proxy exposed DSNs
Treat any DSN in your frontend as discoverable. Rotate it, and where possible route error ingestion through a proxy that drops or sanitizes events from unauthenticated sources.
Sandbox with deny-by-default egress
Run agents in a sandbox that denies outbound network access unless explicitly allowed. Sandboxing alone was not enough in confirmed cases — the egress rule is the part that matters.
Require approval for shell commands
Gate any shell command behind explicit human approval. This is the moment a person can question a suspicious “fix,” the one chokepoint that prompt warnings could not provide.
Deploy hardened configs & log everything
Apply a hardened config such as agent-jackstop, and retain agent action logs. If a run is ever compromised, the logs are how you scope exactly which secrets to rotate.
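Logs only help if they can answer “what do we rotate?” quickly. The sketch below appends each agent action to a JSONL file and, for a given run, maps logged commands to the credential types named in the disclosure. The file paths, markers, and function names are common defaults or hypothetical choices, not a standard. One deliberate rule: if a run executed a package runner or piped a download into a shell, the command log cannot show what that code read, so the function assumes every listed secret is exposed.

```python
"""Append-only JSONL audit log for agent actions, plus a rotation scope per run (sketch)."""
import json
import time
from pathlib import Path
from typing import Iterable, Set

# Assumed default locations of the credential types named in the disclosure.
CREDENTIAL_MARKERS = {
    ".aws/credentials": "AWS access keys",
    ".npmrc": "npm tokens",
    ".docker/config.json": "Docker registry credentials",
    ".ssh/": "SSH keys",
    ".git-credentials": "git credentials",
    "git credential": "git credential helper secrets",
}

# Commands that run code the log cannot see into; hypothetical list, extend as needed.
OPAQUE_EXECUTION = ("npx ", "npm exec", "| sh", "| bash")


def make_record(run_id: str, tool: str, detail: str) -> str:
    """Serialize one agent action as a single JSON line."""
    return json.dumps({"ts": time.time(), "run": run_id, "tool": tool, "detail": detail})


def log_action(log_path: Path, run_id: str, tool: str, detail: str) -> None:
    """Append one action; opening in 'a' mode never rewrites earlier records."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(make_record(run_id, tool, detail) + "\n")


def secrets_to_rotate(lines: Iterable[str], run_id: str) -> Set[str]:
    """Return the secret types a run's logged actions may have exposed."""
    exposed: Set[str] = set()
    for line in lines:
        record = json.loads(line)
        if record["run"] != run_id:
            continue
        detail = record["detail"]
        if any(marker in detail for marker in OPAQUE_EXECUTION):
            # Unknown code ran with the agent's access: assume everything reachable is exposed.
            return set(CREDENTIAL_MARKERS.values())
        exposed.update(secret for marker, secret in CREDENTIAL_MARKERS.items() if marker in detail)
    return exposed
```

For a log written by log_action, pass the file’s lines, for example secrets_to_rotate(Path("agent-audit.jsonl").read_text(encoding="utf-8").splitlines(), "run-42"), where the file name and run id are placeholders.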
08 — Conclusion
The defense moves to where the agent acts.
When every step is authorized, you cannot detect your way out — you have to constrain the action.
Agentjacking is a clean illustration of a category that is only going to grow: attacks that weaponize an agent’s own legitimate behavior. There is no malware to catch, no unauthorized login to flag, no policy violation to alert on. A fake error report carries an instruction, the agent reads it as guidance, and it runs the command with your credentials. The reported numbers — roughly 85% success, 2,388 exposed organizations, 100-plus confirmed executions — are Tenet’s own, and we have treated them as such throughout. The mechanism, which is what matters, is not in dispute.
The single most important lesson is that prompt-layer defenses do not hold. Agents executed the payload even when told to ignore untrusted data. That finding alone retires the most common piece of advice and forces the real work to where it belongs: auditing MCP connections, rotating exposed DSNs, sandboxing with deny-by-default egress, requiring human approval before execution, and treating every external context source as untrusted — not just Sentry.
Sentry declined to fix the root cause, which means the responsibility is already yours. That is not cause for alarm so much as a prompt to act: the five guardrails in this guide are free, most can be applied today, and together they cut your real exposure without giving up the productivity that made you adopt agents in the first place. Build the constraints into the agent architecture now, and an incident becomes a contained event instead of an open-ended one.