Perplexity Agent API: Build AI Search Into Your Products
Developer guide to Perplexity's Agent API platform. Four APIs for search, agents, embeddings, and sandboxed code execution with enterprise controls.
Perplexity has evolved from an AI-powered search engine into a full developer platform. The expanded API offering includes four distinct APIs — Agent, Search, Embeddings, and the upcoming Sandbox — each targeting a specific layer of the AI search stack. For developers building products that need real-time, web-grounded information retrieval, this platform provides production-grade infrastructure without requiring you to build and maintain your own search pipeline.
This guide covers each API in detail, explains when to use the Agent API versus the Search API, walks through building a search-augmented chatbot, and covers enterprise features including Perplexity Computer and the Multi-model Council. Whether you are integrating AI search into advertising workflows or building internal research tools, understanding the capabilities and tradeoffs of each API is essential for making the right architectural decisions.
Platform Overview: Four APIs, One Ecosystem
The Perplexity API platform is organized around four APIs, each handling a different part of the information retrieval and reasoning pipeline. You can use them independently or compose them together for more complex architectures.
Agent API: Multi-step workflow orchestration with real-time web retrieval and iterative tool calling. Best for complex queries requiring synthesis across sources.
Search API: Single-query web-grounded retrieval for RAG pipelines. Returns responses with source citations. Transparent per-query pricing.
Embeddings API: Generate vector embeddings for building search indexes and recommendation systems. Optimized for retrieval at scale.
Sandbox API: Secure, isolated code execution environment for AI agents. Write, test, and run code safely within agent workflows.
Building with AI search APIs? Integrating real-time web retrieval into production applications requires careful architecture. Explore our AI and Digital Transformation services for expert guidance on API integration and agent architecture.
The platform is model-agnostic — you are not locked into Perplexity's own models. The Multi-model Council feature lets you compare responses from different models for the same query, enabling quality benchmarking and consensus-based answer generation. For teams evaluating how Perplexity Computer and multi-model agents work in practice, the platform provides the infrastructure to run these comparisons programmatically.
Agent API: Multi-Step Workflows with Web Retrieval
The Agent API is the most capable endpoint in the platform. It orchestrates multi-step workflows where the model can search the web, reason about results, call additional tools, and verify information across multiple turns. Unlike a single search query, the Agent API handles complex tasks that require iterative retrieval — for example, researching a competitor's pricing across multiple sources, cross-referencing data points, and synthesizing a summary with citations.
```typescript
const response = await fetch("https://api.perplexity.ai/agent", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "sonar-pro",
    messages: [
      {
        role: "system",
        content: "You are a research assistant. Verify claims across multiple sources.",
      },
      {
        role: "user",
        content: "Compare the pricing tiers of Vercel, Netlify, and Cloudflare Pages as of 2026.",
      },
    ],
    // Agent-specific: enable multi-step web retrieval
    search_recency_filter: "month",
    return_citations: true,
  }),
});

const data = await response.json();
// data.choices[0].message.content contains the synthesized response
// data.citations contains source URLs for verification
```

The Agent API differs from the Search API in three key ways. First, it supports iterative retrieval — the model can perform multiple web searches within a single request to gather comprehensive information. Second, it combines web-augmented responses with tool calling, allowing you to provide custom functions the agent can invoke alongside web searches. Third, it maintains context across the multi-step workflow, so each subsequent search is informed by the results of previous steps.
Iterative Retrieval
Multiple web searches per request for comprehensive information gathering
Tool Calling
Custom functions alongside web search for hybrid agent workflows
Context Continuity
Each search step builds on previous results for deeper analysis
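To make the tool-calling idea concrete, here is a minimal sketch of composing an Agent API request body that pairs web retrieval with a custom function. Note the `tools` field and its OpenAI-style function schema are assumptions for illustration; the source does not specify the exact parameter shape, only that custom functions can be provided alongside web search.

```typescript
// Sketch: an Agent API request body combining web retrieval with a custom tool.
// The `tools` schema below follows the common OpenAI-style function format,
// which is an assumption, not confirmed Perplexity syntax.
interface ToolDef {
  type: "function";
  function: { name: string; description: string; parameters: object };
}

function buildAgentRequest(query: string, tools: ToolDef[]) {
  return {
    model: "sonar-pro",
    messages: [{ role: "user", content: query }],
    search_recency_filter: "month",
    return_citations: true,
    tools, // custom functions the agent may call between web searches
  };
}

// Hypothetical internal tool the agent could invoke during its workflow
const pricingLookup: ToolDef = {
  type: "function",
  function: {
    name: "get_internal_pricing",
    description: "Fetch our own pricing table for comparison",
    parameters: { type: "object", properties: {}, required: [] },
  },
};

const body = buildAgentRequest("Compare our pricing to Vercel's", [pricingLookup]);
```

Building the payload in a helper like this keeps the tool definitions testable and reusable across requests.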
Search API: Real-Time Grounded Retrieval for RAG
The Search API is the workhorse for RAG (Retrieval-Augmented Generation) pipelines. It takes a single query, retrieves relevant web content in real time, and returns a grounded response with source citations. The key advantage over building your own search pipeline is that Perplexity handles the crawling, indexing, and retrieval infrastructure — you send a query and get back a contextualized response with references.
```typescript
const response = await fetch("https://api.perplexity.ai/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "sonar",
    messages: [
      {
        role: "user",
        content: "What are the latest Next.js 16 features released in 2026?",
      },
    ],
    return_citations: true,
    search_recency_filter: "week",
  }),
});

const data = await response.json();
// Use data.choices[0].message.content as context for your own LLM
// data.citations provides source URLs for transparency
```

```python
import requests

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "sonar",
        "messages": [
            {
                "role": "user",
                "content": "What are the latest Next.js 16 features released in 2026?",
            }
        ],
        "return_citations": True,
        "search_recency_filter": "week",
    },
)

data = response.json()
# data["choices"][0]["message"]["content"] contains the grounded response
# data["citations"] contains source URLs
```

The Search API supports several configuration options that affect result quality. The search_recency_filter parameter constrains results to a time window (hour, day, week, month), which is critical for queries about recent events. The return_citations flag includes source URLs in the response, enabling your application to show provenance information to users. For production RAG pipelines, always enable citations — they serve as both a trust signal for users and a debugging tool for developers.
Embeddings API and Sandbox API
The Embeddings API generates vector representations of text for building search indexes, recommendation systems, and semantic similarity features. Unlike the Search API, which retrieves information from the web, the Embeddings API lets you create vector representations of your own data for internal retrieval. This is the building block for hybrid RAG systems that combine Perplexity's web search with your proprietary document search.
```typescript
const response = await fetch("https://api.perplexity.ai/embeddings", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "sonar-embedding",
    input: [
      "Perplexity Agent API for multi-step workflows",
      "Building RAG pipelines with real-time web search",
      "Enterprise AI search integration patterns",
    ],
  }),
});

const data = await response.json();
// data.data[0].embedding — float array for vector storage
// Store in Pinecone, Weaviate, pgvector, or any vector database
```

Sandbox API (Coming Soon)
The Sandbox API will provide a secure, isolated code execution environment for AI agents. This solves a common problem in agent architectures: allowing the agent to write and execute code without risking the host system. Use cases include data analysis (the agent writes a Python script to process CSV data), automated testing (generating and running test code), and computational verification (checking mathematical claims by running calculations). The Sandbox API will integrate directly with the Agent API, enabling agents to generate and execute code as part of their multi-step workflows.
Architecture pattern: Combine the Search API for real-time web context, the Embeddings API for internal document retrieval, and the Agent API for orchestration. This gives you a hybrid RAG system that draws from both public web data and your proprietary knowledge base. Our guide on AI function calling across providers covers the tool-calling patterns that complement this architecture.
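The internal-retrieval half of that hybrid pattern reduces to ranking your own documents by similarity against a query embedding. The sketch below uses plain arrays for illustration; in production the vectors would come from the Embeddings API and live in a vector database rather than in memory.

```typescript
// Rank stored document embeddings by cosine similarity to a query embedding.
// Vectors are plain number arrays here; a real system would load them from
// the Embeddings API and a vector store.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k most similar documents to the query embedding
function topK(query: number[], docs: { id: string; vec: number[] }[], k: number) {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The results of topK can then be concatenated with the Search API's web context before the Agent API (or your own model) synthesizes a final answer.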
Agent API vs Search API: When to Use Which
The most common question developers face is whether to use the Agent API or the Search API for their use case. The decision depends on query complexity, latency requirements, and cost sensitivity. Here is the decision framework.
| Factor | Search API | Agent API |
|---|---|---|
| Query complexity | Single factual questions | Multi-hop reasoning, comparisons |
| Latency | Lower (single retrieval) | Higher (multiple steps) |
| Cost | Per query | Per orchestration step |
| Tool calling | Not supported | Full tool integration |
| Web searches | Single pass | Iterative, multi-turn |
| Best for | RAG pipelines, fact checking | Research, analysis, complex tasks |
A good rule of thumb: if the query can be answered with a single web search, use the Search API. If the query requires the model to search, evaluate results, and search again based on what it found, use the Agent API. For most RAG pipelines where you are augmenting your own model with real-time web context, the Search API is the right choice. For autonomous research tasks or customer-facing agents that need to handle open-ended questions, the Agent API provides the necessary multi-step capability.
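That rule of thumb can be encoded as a simple router in front of both endpoints. The signals below (comparison keywords, multiple entities) are illustrative heuristics of my own, not an official classification scheme, and a production router would likely use a cheap classifier instead.

```typescript
// Heuristic router: send multi-hop queries to the Agent API,
// single factual questions to the Search API.
function chooseApi(query: string): "search" | "agent" {
  // Comparison or research language suggests iterative retrieval is needed
  const multiHopSignals = /\b(compare|versus|vs\.?|across|research|analyze)\b/i;
  // Several entities joined by "and" or commas also hint at multi-hop work
  const entityCount = (query.match(/\band\b|,/gi) || []).length;
  return multiHopSignals.test(query) || entityCount >= 2 ? "agent" : "search";
}
```

Routing cheap queries to the Search API first keeps latency and per-orchestration-step costs down, escalating to the Agent API only when the query shape demands it.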
Enterprise Features and Perplexity Computer
Perplexity's enterprise offering goes beyond API access. Perplexity Computer is a multi-model AI agent available to enterprise customers that can execute complex tasks across applications. The Slack integration allows teams to query the Computer agent directly inside channels and threads using the @computer mention, making it accessible without leaving existing workflows.
Perplexity Computer
Multi-model AI agent for complex tasks. Available to enterprise customers. Handles research, data gathering, and multi-step workflows that span multiple applications and data sources.
Slack Integration
Query @computer directly in Slack channels and threads. Teams can delegate research tasks without switching tools. Results appear inline in the conversation with citations.
Admin Controls
Granular feature access, model availability configuration, expanded audit logs, and domain-based sign-up restrictions. Admins control which models and features are available per team.
Deep Research on Opus 4.6
State-of-the-art performance on the Google DeepMind Deep Search QA benchmark. Multi-step retrieval that synthesizes information from dozens of sources into comprehensive research reports.
Multi-model Council
The Multi-model Council feature allows comparing responses from different models for the same query. In practice, this means you can send a single query and receive answers from multiple models side by side. Use cases include quality benchmarking (which model performs best on your specific query types), consensus checking (multiple models agree on a factual claim), and A/B testing new models before switching production traffic. Enterprise customers can configure which models are available, controlling both cost and capability exposure.
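The Council's programmatic interface is not documented here, so the following is a hedged sketch of the same idea built from the documented pieces: fan one query out to several model names through the standard chat/completions endpoint and collect the answers side by side.

```typescript
// Build one request body per model for a side-by-side comparison.
// The model names and endpoint are the ones used elsewhere in this guide;
// whether the Council itself exposes a dedicated endpoint is not confirmed.
function buildCouncilRequests(models: string[], query: string) {
  return models.map((model) => ({
    model,
    messages: [{ role: "user", content: query }],
    return_citations: true,
  }));
}

async function runCouncil(models: string[], query: string, apiKey: string) {
  const responses = await Promise.all(
    buildCouncilRequests(models, query).map((body) =>
      fetch("https://api.perplexity.ai/chat/completions", {
        method: "POST",
        headers: {
          "Authorization": `Bearer ${apiKey}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(body),
      }).then((r) => r.json()),
    ),
  );
  // Pair each model name with its answer for consensus checking or A/B review
  return models.map((model, i) => ({
    model,
    answer: responses[i].choices[0].message.content,
  }));
}
```

Comparing the paired answers, by exact agreement on extracted facts or by a judge model, is one way to implement the consensus checking described above.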
Pricing, Rate Limits, and Authentication
Authentication with the Perplexity API uses Bearer tokens. Generate an API key from the Perplexity API dashboard and include it in the Authorization header of every request. API keys are scoped to your account and can be rotated without downtime.
```typescript
// Store API key in environment variables — never in source code
const PERPLEXITY_API_KEY = process.env.PERPLEXITY_API_KEY;

// All requests include the Authorization header
const headers = {
  "Authorization": `Bearer ${PERPLEXITY_API_KEY}`,
  "Content-Type": "application/json",
};

// Error handling for auth failures
async function queryPerplexity(body: object) {
  const response = await fetch("https://api.perplexity.ai/chat/completions", {
    method: "POST",
    headers,
    body: JSON.stringify(body),
  });
  if (response.status === 401) {
    throw new Error("Invalid API key. Check your PERPLEXITY_API_KEY.");
  }
  if (response.status === 429) {
    const retryAfter = response.headers.get("Retry-After");
    throw new Error(`Rate limited. Retry after ${retryAfter} seconds.`);
  }
  if (!response.ok) {
    throw new Error(`Perplexity API error: ${response.status}`);
  }
  return response.json();
}
```

Rate Limits and Error Handling
Rate limits vary by API endpoint and subscription tier. Free-tier and developer accounts have lower request-per-minute limits, while Pro and enterprise tiers have significantly higher throughput. The Search API and Agent API maintain separate rate limit pools, so hitting a limit on one does not affect the other. Always implement exponential backoff in production clients — the API returns a 429 status code with a Retry-After header when you exceed limits.
Store keys in environment variables
Never hardcode API keys in source code. Use process.env in Node.js or .env files with your framework's built-in env loading. Rotate keys periodically and immediately if compromised.
Implement exponential backoff
When receiving 429 responses, wait the duration specified in the Retry-After header before retrying. Double the wait time on each subsequent retry, with a maximum of 60 seconds. This prevents cascading failures during traffic spikes.
Use separate error handling for each API
The Agent API, Search API, and Embeddings API return different error formats. Validate response status codes and parse error messages per endpoint. Log failed requests with the full response body for debugging.
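The backoff guidance above translates into a small schedule function: start from the server's Retry-After value, double on each subsequent retry, and cap at 60 seconds. Precomputing the schedule keeps the retry loop itself trivial to reason about.

```typescript
// Backoff schedule per the guidance above: honor Retry-After, double
// each retry, never wait more than 60 seconds.
function backoffDelays(retryAfterSeconds: number, maxRetries: number): number[] {
  const delays: number[] = [];
  let wait = retryAfterSeconds;
  for (let i = 0; i < maxRetries; i++) {
    delays.push(Math.min(wait, 60));
    wait *= 2;
  }
  return delays;
}
```

A retry loop would sleep for each delay in turn, giving up once the schedule is exhausted; adding a small random jitter to each delay is a common refinement to avoid synchronized retries across clients.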
Building a Search-Augmented Chatbot
One of the most common use cases for the Perplexity API is building a chatbot that can answer questions with real-time web data. The pattern is straightforward: user asks a question, your server calls the Search API for grounded context, and your own model generates the final response incorporating that context. This gives you control over the response format and personality while leveraging Perplexity's search infrastructure for factual grounding.
```typescript
import OpenAI from "openai";

const openai = new OpenAI();
const PERPLEXITY_KEY = process.env.PERPLEXITY_API_KEY;

async function searchAugmentedChat(userQuery: string) {
  // Step 1: Get web-grounded context from Perplexity
  const searchResponse = await fetch(
    "https://api.perplexity.ai/chat/completions",
    {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${PERPLEXITY_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "sonar",
        messages: [{ role: "user", content: userQuery }],
        return_citations: true,
      }),
    },
  );
  const searchData = await searchResponse.json();
  const webContext = searchData.choices[0].message.content;
  const citations = searchData.citations || [];

  // Step 2: Generate final response with your own model
  const completion = await openai.chat.completions.create({
    model: "gpt-4.1-mini",
    messages: [
      {
        role: "system",
        content: `Answer the user's question using this web context.
Cite sources using [1], [2] notation.

Web context:
${webContext}

Sources:
${citations.map((url: string, i: number) => `[${i + 1}] ${url}`).join("\n")}`,
      },
      { role: "user", content: userQuery },
    ],
  });

  return {
    answer: completion.choices[0].message.content,
    sources: citations,
  };
}
```

This pattern separates search from generation. Perplexity handles the web retrieval, and your own model handles the response formatting. The benefits are clear: you maintain full control over the response style, can swap your generation model independently from the search provider, and citations provide transparency for users. For developers working with TypeScript AI agent architectures, the Search API integrates naturally as a tool that the agent can invoke when it needs real-time information.
Production tip: Cache Search API responses for identical queries within a reasonable time window (5-15 minutes for most use cases). This reduces API costs and improves latency for popular queries without sacrificing freshness for time-sensitive information.
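A minimal version of that cache is a keyed map with expiry timestamps. The sketch below uses an injectable clock so the TTL logic is testable; in a multi-instance deployment a shared store such as Redis would replace the in-memory Map.

```typescript
// Minimal in-memory TTL cache for Search API responses.
// The clock is injectable so expiry behavior can be tested deterministically.
class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (this.now() > entry.expires) {
      // Entry is stale: evict and report a miss
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expires: this.now() + this.ttlMs });
  }
}
```

Wrapping the Search API call with a get-then-set on this cache, keyed by the normalized query string, implements the 5-15 minute window suggested above with a ttlMs of 300,000 to 900,000.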
Conclusion
Perplexity's expanded API platform fills a specific gap in the AI infrastructure landscape: production-grade, real-time web search that developers can integrate without building their own crawling and retrieval pipeline. The four-API architecture — Agent for orchestration, Search for single-query retrieval, Embeddings for vector indexing, and Sandbox for code execution — covers the full stack from simple RAG augmentation to autonomous research agents.
For most developers, the Search API is the starting point. It requires minimal integration effort and immediately adds web-grounded context to any LLM application. The Agent API is where the platform differentiates itself for complex use cases that require multi-step reasoning and tool integration. Combined with enterprise features like Perplexity Computer, Slack integration, and granular admin controls, the platform scales from individual developer projects to organization-wide AI infrastructure. The model-agnostic approach and Multi-model Council ensure you are never locked into a single model provider as capabilities and pricing evolve.
Ready to Build AI-Powered Search?
Real-time web retrieval transforms what your AI applications can do. Our team helps companies design and implement search-augmented architectures that scale with your needs.