AI Development

OpenAI Safety Bug Bounty: AI Agent Security Guide 2026

OpenAI launches dedicated safety bug bounty for AI agent vulnerabilities. Covers prompt injection, data exfiltration, and autonomous action risks.

Digital Applied Team
March 27, 2026
13 min read
  • Launch Date: Mar 25
  • Max Bounty: $100K
  • Repro Threshold: 50%
  • Bounty Increase: 5x

Key Takeaways

OpenAI launched a dedicated safety bug bounty separate from its security program: Announced on March 25, 2026, the OpenAI Safety Bug Bounty is a new program hosted on Bugcrowd that specifically targets AI abuse and safety risks in agentic products. Unlike the existing security bounty that covers traditional vulnerabilities like authentication bypasses and injection flaws, this program focuses on AI-specific risks including prompt injection, data exfiltration through agents, and harmful autonomous actions.
Prompt injection and data exfiltration in agentic products are the primary targets: The program prioritizes third-party prompt injection attacks that can reliably hijack a victim's agent, such as ChatGPT Agent, Browser, or similar agentic products, and trick it into performing harmful actions or leaking sensitive user information. Submissions must demonstrate reproducibility at least 50 percent of the time to qualify.
Simple jailbreaks are explicitly out of scope: General content-policy bypasses without demonstrable safety or abuse impact do not qualify for the safety bounty. Getting a model to use rude language or return easily searchable information is not considered a safety vulnerability under this program. The bar is set at real-world harm potential with actionable, discrete remediation steps.
Maximum bounty payouts have increased to $100,000 for critical findings: OpenAI raised the maximum bug bounty payout for exceptional and differentiated critical findings to $100,000, up from the previous $20,000 cap. This five-fold increase reflects the growing importance of securing agentic AI systems and the difficulty of discovering high-impact safety vulnerabilities.
Businesses deploying AI agents need to adopt security practices that match these threat models: The categories covered by the safety bounty, including prompt injection, data exfiltration, and unauthorized autonomous actions, represent the same risks that enterprises face when deploying AI agents internally. The program effectively defines the threat taxonomy that every organization using agentic AI should be testing against.

AI agents are no longer experimental curiosities. They browse the web, execute code, manage files, and take autonomous actions on behalf of users. That capability shift creates an entirely new attack surface that traditional security testing was never designed to cover. On March 25, 2026, OpenAI acknowledged this gap by launching a dedicated Safety Bug Bounty program, separate from its existing security bounty, that specifically targets vulnerabilities in agentic AI products.

The program is hosted on Bugcrowd and covers ChatGPT Agent, Browser, and other agentic products. Its scope includes prompt injection attacks that hijack agent behavior, data exfiltration through manipulated agent workflows, and autonomous actions that cause real-world harm. Simple jailbreaks are explicitly excluded. For businesses deploying AI agents, the vulnerability categories targeted by this bounty effectively define the threat model that every organization should be testing against. This guide covers what the program includes, how it differs from traditional security bounties, and what it means for AI and digital transformation strategies that rely on autonomous agents.

What Is the OpenAI Safety Bug Bounty

The OpenAI Safety Bug Bounty is a public program that rewards security researchers for identifying AI abuse and safety risks across OpenAI's product suite. Announced on March 25, 2026, the program marks a deliberate expansion of how OpenAI approaches vulnerability discovery. Rather than limiting the bounty to traditional application security flaws, the safety bounty accepts submissions for issues that pose meaningful abuse and safety risks even when they do not meet the technical criteria for a conventional security vulnerability.

Safety Focus

Targets AI-specific safety risks that emerge from agentic behavior, not traditional application security flaws. Designed to catch vulnerabilities that standard penetration testing misses entirely.

Bugcrowd Platform

Hosted on Bugcrowd with a dedicated portal for submissions. Researchers apply through the platform and submit structured reports with reproduction steps and impact evidence.

Agentic Products

Covers ChatGPT Agent, Browser, and similar agentic products that perform autonomous actions on behalf of users, including web browsing, code execution, and file management.

The program reflects a growing recognition across the AI industry that agentic systems introduce risks that do not fit neatly into existing vulnerability taxonomies. A prompt injection attack that causes an agent to exfiltrate user data is not a SQL injection, not a cross-site scripting flaw, and not a misconfigured API endpoint. It is a category of vulnerability that requires its own testing methodology, its own reporting framework, and its own severity classification. The safety bounty creates that framework.

Safety Bounty vs Security Bounty

OpenAI has operated a security bug bounty since April 2023, also hosted on Bugcrowd, that covers traditional application security vulnerabilities across its products and infrastructure. The safety bounty is a separate, complementary program. Understanding the distinction is important because the same researcher action, such as testing how an agent responds to crafted input, could qualify for one program or the other depending on the type of impact demonstrated.

Key Differences Between the Two Programs
Dimension        | Security Bug Bounty        | Safety Bug Bounty
Focus            | Application security flaws | AI abuse and safety risks
Examples         | Auth bypass, XSS, SSRF     | Prompt injection, data exfiltration
Impact criteria  | Technical vulnerability    | Real-world harm potential
Jailbreaks       | Not applicable             | Out of scope (no harm path)
Max payout       | $100,000 (critical)        | Up to $100,000, case-by-case

The practical implication is that a researcher who discovers a way to make ChatGPT Agent exfiltrate a user's browsing history through a crafted webpage would submit that to the safety bounty, not the security bounty, because the vulnerability is rooted in the agent's behavior rather than in a traditional application flaw. Conversely, a researcher who finds a server-side request forgery in the API would submit to the security bounty. Some findings may span both categories, and OpenAI evaluates those on a case-by-case basis.

Scope: Prompt Injection, Data Exfiltration, Autonomous Actions

The safety bounty defines its scope around three primary attack categories that represent the most consequential risks in agentic AI systems. Each category targets a different phase of how an AI agent processes input, makes decisions, and takes actions.

Third-Party Prompt Injection

When attacker-controlled text, embedded in a webpage, document, or data source, is able to reliably hijack a victim's agent and redirect its behavior. This includes scenarios where the injected text causes the agent to perform actions the user did not intend or authorize, such as visiting malicious URLs, submitting forms, or modifying files.

Reproducibility requirement: Must succeed at least 50% of the time across test runs.

Data Exfiltration Through Agents

When an attacker can trick an agent into leaking the user's sensitive information, such as browsing data, conversation history, file contents, or credentials, to an attacker-controlled destination. This category covers both direct exfiltration (agent sends data to a URL) and indirect exfiltration (agent includes sensitive data in outputs that become publicly accessible).

Covers ChatGPT Agent, Browser, and similar products with network access.
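The direct-exfiltration case above can be mitigated with an egress guard that checks every outbound agent request before it leaves the sandbox. The following is a minimal sketch of that idea; the allowlisted domains and sensitive-data patterns are illustrative placeholders, not part of any OpenAI product:

```python
import re
from urllib.parse import urlparse

# Hypothetical egress guard: outbound agent requests are checked against a
# destination allowlist and scanned for sensitive-looking payloads.
ALLOWED_DOMAINS = {"api.example.com", "docs.example.com"}

SENSITIVE_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),    # API-key-shaped strings
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-shaped strings
]

def check_egress(url: str, payload: str) -> tuple[bool, str]:
    """Return (allowed, reason) for an outbound agent request."""
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_DOMAINS:
        return False, f"destination {host!r} not in allowlist"
    for pattern in SENSITIVE_PATTERNS:
        if pattern.search(payload):
            return False, "payload matches a sensitive-data pattern"
    return True, "ok"
```

A real deployment would sit this check in the network layer rather than application code, so a compromised agent cannot route around it.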

Disallowed Autonomous Actions at Scale

When agentic products perform disallowed actions on OpenAI's website at scale, expose proprietary information related to model reasoning, or bypass anti-automation controls and account trust signals. This covers abuse scenarios where agents are weaponized to perform coordinated harmful actions.

Includes bypasses of rate limits, trust signals, and anti-automation systems.
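To make the "at scale" category concrete, the controls in scope are the kind shown below: a minimal token-bucket rate limiter of the sort agents might be weaponized to bypass. This is a generic sketch with illustrative parameters, not OpenAI's actual anti-automation implementation:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter, the style of anti-automation
    control whose bypasses fall in scope. Parameters are illustrative."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens replenished per second
        self.capacity = capacity    # burst ceiling
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```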

Bugcrowd Program Details and Rewards

The safety bounty is hosted on Bugcrowd, the same platform that hosts OpenAI's existing security bug bounty. This decision provides a familiar submission workflow for researchers already active in the security bounty and gives OpenAI a structured triage pipeline for evaluating safety-specific reports. The Bugcrowd portal for the safety bounty is separate from the security bounty portal, ensuring submissions are routed to the appropriate review team.

Program Structure
1. Apply via Bugcrowd: Researchers apply through the dedicated OpenAI Safety Bug Bounty portal on Bugcrowd. Acceptance grants access to submit reports and view program scope details.
2. Submit Structured Reports: Reports must include clear reproduction steps, evidence of impact, and demonstration of at least 50% reproducibility. Submissions without actionable remediation paths are less likely to qualify.
3. OpenAI Triage and Evaluation: The safety team evaluates submissions against the program scope, assesses severity based on real-world harm potential, and determines reward amounts on a case-by-case basis.
4. Reward and Remediation: Valid findings receive bounty payouts up to $100,000 for exceptional critical discoveries. OpenAI uses findings to improve safety mitigations across its agentic product line.

The $100,000 maximum payout, increased from the previous $20,000 cap, signals the strategic importance OpenAI places on securing agentic systems. For context, this is five times the previous maximum and positions the program competitively with bug bounties run by major technology companies for critical infrastructure vulnerabilities. The reward level is calibrated to attract experienced security researchers who might otherwise focus on traditional web application testing, and to compensate for the additional complexity involved in constructing reproducible agent-level exploits.

Why Agentic AI Needs Specialized Security

Traditional security testing assumes a clear boundary between user input and system behavior. Input validation, output encoding, and access control are designed around the principle that untrusted data should never influence system behavior directly. AI agents break this assumption fundamentally. An agent's entire purpose is to take user input, including natural language that cannot be fully validated or sanitized, and use it to drive autonomous actions across external systems.

This is why OpenAI created a separate program rather than expanding the scope of the existing security bounty. The threat model for agentic AI is categorically different from the threat model for a web application or API. When an agent browses the web, every page it visits is a potential attack vector. When an agent reads files, every file is a potential injection surface. When an agent executes code, every instruction is a potential path to unauthorized system access. The attack surface is the entire environment the agent can perceive and act upon, which is far larger than the attack surface of a traditional application. As noted in our analysis of how 80 percent of enterprise apps will embed AI agents by 2026, this expanding surface area makes specialized security testing an operational necessity rather than an academic exercise.

Expanded Attack Surface

Traditional apps have defined input points. AI agents treat every document, webpage, email, and file they process as potential input, creating an attack surface that scales with the agent's capabilities.

Cross-System Impact

A compromised agent does not just leak data from one system. It can use its existing permissions and tool access to propagate actions across multiple connected services, amplifying the blast radius of a single exploit.

Non-Deterministic Behavior

AI agents are probabilistic systems. The same input can produce different outputs, making vulnerability reproduction more complex than traditional exploits. This is why the bounty requires 50% reproducibility rather than 100%.

Trust Boundary Ambiguity

In a traditional app, the trust boundary is at the API layer. In an agentic system, the agent itself straddles the trust boundary, processing untrusted input and executing privileged actions in the same execution context.

Common AI Agent Vulnerabilities

The vulnerability categories targeted by the OpenAI Safety Bug Bounty map directly to the most common attack patterns observed in agentic AI systems across the industry. Understanding these patterns is essential for both security researchers looking to participate in the bounty and for businesses looking to secure their own AI agent deployments.

Indirect Prompt Injection

Malicious instructions embedded in content the agent processes, such as hidden text in webpages, metadata in documents, or crafted responses from APIs. The agent treats the injected instructions as part of its task context and executes them. This is the most prevalent attack vector against web-browsing agents and the primary focus of the safety bounty.
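A first line of defense against this pattern is a scanner that flags suspicious instruction-like text in untrusted content before it enters the agent's context window. The sketch below uses a handful of illustrative heuristics; real injections are adversarial and will evolve past any static pattern list, so this belongs in a defense-in-depth stack, not on its own:

```python
import re

# Heuristic injection scanner. Patterns are illustrative, not exhaustive.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"<!--.*?(instruction|system|prompt).*?-->", re.I | re.S),
    re.compile(r"do not (tell|inform) the user", re.I),
]

def flag_injection(content: str) -> list[str]:
    """Return the list of suspicious snippets found in untrusted content."""
    hits = []
    for pattern in INJECTION_PATTERNS:
        for match in pattern.finditer(content):
            hits.append(match.group(0))
    return hits
```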

Confused Deputy Attacks

An attacker leverages the agent's permissions to perform actions the attacker could not perform directly. For example, tricking an agent with file system access into reading and transmitting contents of files the attacker has no direct access to. The agent acts as a confused deputy, using its legitimate permissions to serve the attacker's goals.

Credential and Session Theft

Manipulating an agent into exposing authentication tokens, session cookies, API keys, or other credentials that are accessible within its execution environment. This is particularly dangerous for agents that operate with persistent sessions across multiple services.

Autonomous Action Escalation

Crafting inputs that cause an agent to take increasingly consequential actions beyond its intended scope. An agent designed to summarize web content might be manipulated into submitting forms, making purchases, or modifying account settings. Developers who manage autonomous permission decisions in AI coding tools will recognize this pattern as the same risk that permission tier systems are designed to contain.
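A permission tier system of the kind mentioned above can be sketched as a simple gate: read-only actions pass freely, while consequential ones require explicit human approval. The tool names and tier assignments here are hypothetical examples:

```python
from enum import Enum

class Tier(Enum):
    READ = 1        # fetch, summarize
    WRITE = 2       # edit files, submit forms
    CRITICAL = 3    # purchases, deletions, settings changes

# Hypothetical mapping from tool names to risk tiers.
ACTION_TIERS = {
    "fetch_page": Tier.READ,
    "summarize": Tier.READ,
    "submit_form": Tier.WRITE,
    "make_purchase": Tier.CRITICAL,
}

def authorize(action: str, approved_by_human: bool = False) -> bool:
    """Allow READ and WRITE actions; gate CRITICAL ones on human approval."""
    tier = ACTION_TIERS.get(action, Tier.CRITICAL)  # unknown actions default to highest tier
    if tier is Tier.CRITICAL and not approved_by_human:
        return False
    return True
```

Defaulting unknown actions to the highest tier is the fail-closed choice: an escalation attack that invents a new tool name hits the approval gate instead of slipping through.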

Multi-Step Exfiltration Chains

Complex attack sequences where the agent is first instructed to gather information from internal sources, then directed to transmit that information to an external endpoint. Each individual step may appear benign in isolation, making detection difficult without monitoring the full chain of agent actions.
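Because each step looks benign in isolation, detection has to operate on the sequence. A minimal chain monitor can flag any external send that follows a sensitive read within a session; the action labels below are illustrative:

```python
# Sketch of a chain monitor: flag the sequence "sensitive read, then
# external send" that individual-action checks miss. Labels are illustrative.
SENSITIVE_READS = {"read_file", "read_history", "read_credentials"}
EXTERNAL_SENDS = {"http_post", "send_email", "submit_form"}

def flag_exfil_chain(actions: list[str]) -> bool:
    """Return True if any external send follows a sensitive read."""
    saw_sensitive_read = False
    for action in actions:
        if action in SENSITIVE_READS:
            saw_sensitive_read = True
        elif action in EXTERNAL_SENDS and saw_sensitive_read:
            return True
    return False
```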

Security Checklist for Businesses Deploying Agents

The vulnerability categories targeted by the OpenAI Safety Bug Bounty provide a practical framework for any organization deploying AI agents. If OpenAI considers these risks serious enough to pay researchers up to $100,000 to find them, businesses should be investing in mitigating the same risks in their own deployments.

AI Agent Security Checklist
  • Implement input sanitization layers: Filter or flag content the agent processes for known injection patterns before it enters the agent's context window. This includes web content, uploaded documents, and API responses.
  • Apply least-privilege permissions: Restrict agent access to the minimum tools, files, and APIs required for each specific task. Never grant blanket access to all available resources.
  • Monitor and log all agent actions: Record every tool call, API request, and file operation the agent performs. Implement anomaly detection that flags unusual patterns such as data being sent to unexpected endpoints.
  • Require human approval for high-risk actions: Define a tier of actions that always require human confirmation: financial transactions, data deletion, external communications, and changes to security settings.
  • Conduct regular red-team testing: Simulate the attack vectors covered by the OpenAI Safety Bug Bounty against your own agent deployments. Test for prompt injection, data exfiltration, and action escalation on a recurring basis.
  • Isolate agent execution environments: Run agents in sandboxed environments with restricted network access, limited file system scope, and no direct access to production databases or credential stores.
  • Implement output filtering: Review and filter agent outputs before they are delivered to users or passed to downstream systems. Prevent the agent from including sensitive data, internal system details, or executable instructions in its responses.
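The least-privilege item in the checklist above can be sketched as a per-task tool registry: each task is granted only the tools it needs, and everything else raises. Task and tool names are illustrative placeholders:

```python
# Minimal least-privilege sketch: each task gets only the tools it needs,
# rather than the agent's full toolset. Names are illustrative.
TASK_TOOL_ALLOWLIST = {
    "summarize_page": {"fetch_page"},
    "file_cleanup": {"list_files", "delete_file"},
}

def tools_for_task(task: str) -> set[str]:
    """Return the allowlisted tools for a task; unknown tasks get nothing."""
    return TASK_TOOL_ALLOWLIST.get(task, set())

def invoke(task: str, tool: str) -> str:
    """Run a tool only if the current task is allowed to use it."""
    if tool not in tools_for_task(task):
        raise PermissionError(f"tool {tool!r} not allowed for task {task!r}")
    return f"ran {tool}"
```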

These practices are not theoretical recommendations. They map directly to the attack vectors that OpenAI is paying researchers to discover. Organizations that implement this checklist before a vulnerability is found are in a fundamentally stronger security position than those that react after an incident. For teams building custom agent systems, our web development services include security architecture reviews specifically designed for AI-integrated applications.

How to Participate in the Program

Security researchers interested in the OpenAI Safety Bug Bounty can apply through the dedicated Bugcrowd portal. The application process is straightforward, but producing a qualifying submission requires understanding both the program scope and the technical methodology for constructing reproducible agent-level exploits.

Submission Requirements
  • Clear step-by-step reproduction instructions
  • Evidence of at least 50% reproducibility
  • Demonstration of real-world harm potential
  • Actionable remediation recommendations
  • Affected product and version identification

What Strengthens a Report
  • Multiple attack variants demonstrating the root cause
  • Impact assessment across different user scenarios
  • Bypass of existing safety mitigations
  • Chain of exploitation from initial access to harm
  • Comparison with known attack patterns

The 50 percent reproducibility threshold acknowledges the probabilistic nature of AI systems while still requiring that findings represent genuine, exploitable vulnerabilities rather than one-off anomalies. Researchers familiar with traditional web application testing should expect a methodological shift: agent-level testing requires constructing adversarial content that reliably influences agent behavior across multiple independent sessions, not just finding a single code path that produces an error.
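A reproducibility harness for this kind of testing can be sketched as follows. The `run_exploit` function is a stand-in for launching one independent agent session with the adversarial content; here it is simulated with a seeded random draw so the harness is self-contained:

```python
import random

def run_exploit(seed: int) -> bool:
    """Placeholder for one independent agent session; returns exploit success.
    Simulated here with a seeded draw at an assumed 70% success rate."""
    return random.Random(seed).random() < 0.7

def reproducibility(trials: int = 20, threshold: float = 0.5) -> tuple[float, bool]:
    """Run the exploit across independent sessions and check the rate
    against the program's threshold."""
    successes = sum(run_exploit(seed) for seed in range(trials))
    rate = successes / trials
    return rate, rate >= threshold
```

In practice each trial would spin up a fresh session with no shared state, since carried-over context can inflate apparent reproducibility.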

Best Practices for AI Agent Security

Whether you are a security researcher participating in the bounty or a business securing your own AI agent deployments, the following practices represent the current state of the art for agentic AI security. These principles apply across all agentic systems, not just OpenAI products.

Defense in Depth

Never rely on a single security layer. Combine input validation, permission restrictions, action monitoring, and output filtering. Each layer catches different attack patterns, and the combination provides resilience against novel exploits that bypass individual controls.

Zero Trust for Agent Actions

Treat every agent action as potentially compromised. Verify actions against expected patterns before execution, validate targets against allowlists, and require escalation for any action that deviates from the expected task scope.

Continuous Security Testing

AI models change with each update, and new attack vectors emerge regularly. Integrate agent security testing into your CI/CD pipeline and schedule recurring red-team exercises that test against the latest known attack patterns.

Incident Response Planning

Have a documented plan for responding to agent compromise. Include procedures for immediate agent shutdown, session revocation, audit log review, affected user notification, and root cause analysis. The speed of agent actions means response time is critical.

The OpenAI Safety Bug Bounty represents an industry-defining moment for AI agent security. By creating a separate program specifically for agentic safety risks, OpenAI is formalizing a vulnerability taxonomy that every organization deploying AI agents should adopt. The companies that treat these categories as a checklist for their own security programs will be significantly better prepared than those that wait for incidents to drive their security posture.

Conclusion

The launch of the OpenAI Safety Bug Bounty on March 25, 2026, marks a turning point in how the AI industry approaches security for agentic systems. By creating a dedicated program that targets prompt injection, data exfiltration, and harmful autonomous actions, OpenAI is acknowledging that traditional security testing is insufficient for the new generation of AI agents that browse the web, execute code, and take autonomous actions on behalf of users.

For security researchers, the program offers a structured path to contribute to AI safety with meaningful financial incentives. For businesses deploying AI agents, the bounty's scope provides a ready-made threat model that should inform security testing, architecture decisions, and incident response planning. The vulnerability categories are not hypothetical. They are the same risks that every organization running agentic AI faces today.

Secure Your AI Agent Strategy

Agentic AI creates powerful business capabilities and new security challenges. Our team helps organizations design, implement, and secure AI-integrated systems that deliver results without introducing unacceptable risk.

