OpenAI Agents SDK vs LangGraph vs CrewAI: 2026 Matrix
OpenAI Agents SDK, LangGraph, and CrewAI compared across 30 criteria — architecture, tool model, memory, observability, and agency use-case fit.
Key Takeaways
Three frameworks, three philosophies. Picking the wrong one is a four-month lock-in. This is the 30-criterion decision matrix we use for client agent builds — the same matrix that has steered every AI transformation engagement we have shipped since agent frameworks became genuinely viable for production work.
OpenAI Agents SDK, LangGraph, and CrewAI are the three most-adopted frameworks for building production agents in 2026, and the Vercel AI SDK with agent primitives is the most honest dark horse in the space. Each framework answers the question "what is an agent?" differently, and those answers cascade into every design decision downstream — tool model, memory, streaming, observability, deployment target, and eventual migration cost.
How to read this matrix: Scores are qualitative, not vendor benchmarks. We optimize for agency engagements where a team of two to five engineers needs to ship a production agent in 6-12 weeks, hand it to an ops team, and not regret the framework choice two years later. If your constraints differ, re-weight accordingly.
Framework philosophies: imperative vs graph vs crew
Before the feature-by-feature comparison, internalize the three mental models. They explain every downstream design choice, and they are why a port between frameworks feels like a rewrite rather than a refactor.
OpenAI Agents SDK (imperative): Agents call tools, occasionally hand off to other agents, and the runtime loops until a stopping condition fires. Control flow lives inside the agent, not above it.
LangGraph (graph): Nodes transform state, edges route between them, checkpoints persist progress. Control flow is the graph, and the graph is the spec.
CrewAI (crew): Agents are roles with goals, backstories, and tool sets. Tasks are declarative units of work crews execute in sequential or hierarchical process modes.
The philosophy determines what is easy. OpenAI Agents SDK makes single-agent tool loops trivial and multi-step branching awkward. LangGraph makes branching and persistence trivial and simple tool-calling verbose. CrewAI makes business-process-shaped workflows readable and tight inner loops heavy. None of these are flaws — they are consequences of the core abstraction.
Agency take: Most client projects we have rescued failed because the team picked the framework that looked friendliest on the landing page, not the one whose core abstraction matched the problem. Pick the philosophy first, framework second. Explore our AI digital transformation service to review framework fit before commitment.
OpenAI Agents SDK
OpenAI Agents SDK is OpenAI's first-party framework for building agents on top of the Responses API. The core primitives are small: an Agent with instructions and tools, a Runner that drives the loop, and handoffs for transferring control between agents. The design goal is explicitly "the smallest thing that works" — and it mostly achieves it.
Architecture
The runtime is an agent loop: the model generates a response, the SDK executes any tool calls, appends results to the message list, and iterates until the agent produces a final output or a stopping condition fires. Handoffs route the conversation to a different agent while carrying context. There is no graph, no explicit state machine — state is the message list plus whatever you attach.
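That loop is small enough to sketch in plain Python. This is an illustration of the pattern, not the SDK's actual internals; `Agent`, `run_loop`, and the stub model protocol are invented for the sketch:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    instructions: str
    tools: dict  # tool name -> callable

def run_loop(agent, user_input, model, max_turns=10):
    """Minimal agent loop: call the model, execute tool calls, repeat
    until the model produces a final answer with no tool call."""
    messages = [{"role": "system", "content": agent.instructions},
                {"role": "user", "content": user_input}]
    for _ in range(max_turns):
        reply = model(messages)               # stand-in for an LLM call
        if reply.get("tool_call") is None:    # stopping condition fires
            return reply["content"]
        name, args = reply["tool_call"]
        result = agent.tools[name](**args)    # execute the tool
        messages.append({"role": "tool", "name": name,
                         "content": str(result)})
    raise RuntimeError("max turns exceeded")
```

Swapping the `model` stub for a real Responses API call is most of the jump from sketch to SDK; the control flow stays this shape.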
Tool model
Tools are functions with typed schemas, defined inline with decorators or adapters. The SDK integrates first-class with OpenAI's hosted tools — web search, file search, code interpreter — and you can mix those with custom functions in the same agent. The tool-call ergonomics are as clean as anything in the space.
Strengths
- Shortest path from zero to working agent — fewer than 50 lines for a useful tool-using agent
- First-party tracing via the OpenAI platform, no extra integration
- Tight alignment with Responses API features — structured output, hosted tools, and reasoning controls
- Handoff primitive maps well to multi-specialist workflows without graph overhead
- First-class Python and TypeScript implementations
Weaknesses
- No built-in persistence layer — long-running workflows need a custom queue and store
- Branching and conditional routing feel unnatural compared to graph frameworks
- Cross-provider story exists but is weaker than framework-first options
- Human-in-the-loop approvals require manual plumbing around the runner
LangGraph
LangGraph is LangChain's graph framework for building stateful, multi-step agent workflows. The core metaphor is a directed graph where nodes transform a shared state and edges route between them. Checkpointers persist state between node executions, which makes durable workflows a default rather than a project.
Graph model
You define a typed state schema, a set of node functions that receive and return partial state, and edges that connect them. Conditional edges route based on the current state. The graph is compiled into an executable, and any node invocation can be checkpointed — which makes resume, rollback, and time-travel debugging straightforward.
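The node/edge/conditional-edge shape can be illustrated with a toy runner in plain Python. This is not LangGraph's API; `run_graph`, the node names, and the state keys are invented for the illustration:

```python
def run_graph(nodes, edges, state, entry, max_steps=20):
    """Toy graph runner: nodes return partial state that is merged in,
    conditional edges inspect state and pick the next node (None = END)."""
    current = entry
    for _ in range(max_steps):
        state = {**state, **nodes[current](state)}  # merge partial update
        current = edges[current](state)             # conditional edge
        if current is None:
            return state
    raise RuntimeError("max steps exceeded")

# Hypothetical two-node flow: draft once, then revise until long enough.
nodes = {
    "draft":  lambda s: {"text": s["topic"] + "!"},
    "revise": lambda s: {"text": s["text"] + "!"},
}
edges = {
    "draft":  lambda s: "revise",
    "revise": lambda s: None if len(s["text"]) >= 8 else "revise",
}
```

In real LangGraph the state is typed, each node invocation can be checkpointed, and the compiled graph handles streaming and persistence; the routing logic is the part this sketch preserves.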
State machines and persistence
State is typed and explicit — you know what every node reads and writes. Checkpointers support in-memory, SQLite, and Postgres out of the box, with third-party backends for Redis and other stores. Threads group related checkpoints so conversational state and agent memory coexist cleanly. For long-running work — hours, days, or waiting on a human — LangGraph is the most ergonomic option in the category.
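What a checkpointer buys you can be sketched with stdlib `sqlite3`: persist state after every step, resume from the latest. This illustrates the idea only; it is not LangGraph's actual checkpoint schema:

```python
import json
import sqlite3

class SqliteCheckpointer:
    """Toy checkpointer: save state per step per thread, resume from latest."""
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS checkpoints "
                        "(thread TEXT, step INTEGER, state TEXT)")

    def save(self, thread, step, state):
        self.db.execute("INSERT INTO checkpoints VALUES (?, ?, ?)",
                        (thread, step, json.dumps(state)))
        self.db.commit()

    def latest(self, thread):
        # Resume point: the highest-numbered checkpoint for this thread.
        row = self.db.execute(
            "SELECT step, state FROM checkpoints WHERE thread = ? "
            "ORDER BY step DESC LIMIT 1", (thread,)).fetchone()
        return (row[0], json.loads(row[1])) if row else (None, {})
```

Keeping every step rather than only the latest is what makes rollback and time-travel debugging possible on top of the same table.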
Strengths
- Explicit state and control flow — no hidden loop, every transition is visible
- Built-in checkpointing, resume, and time-travel debugging via LangSmith
- Human-in-the-loop approvals are a first-class primitive, not a workaround
- Parallel nodes, subgraphs, and streaming state updates make complex flows tractable
- Python and TypeScript implementations with broad provider support
Weaknesses
- Learning curve is meaningfully higher than imperative frameworks
- Simple tool-calling agents feel heavy — graph ceremony for what is essentially a loop
- Best observability lives in LangSmith, which some organizations cannot adopt
- TypeScript feature parity trails Python by a noticeable margin
CrewAI
CrewAI frames multi-agent systems as crews — collections of role-playing agents executing declarative tasks. Each agent carries a role, goal, backstory, and tool set. Tasks define the work units and expected outputs. A Crew orchestrates them in a sequential or hierarchical process mode. For non-engineering stakeholders, the abstraction is remarkably approachable.
Role-based agents
An agent is not a function — it is a character with a job. The framework leans into this metaphor: "Senior Research Analyst", "Content Strategist", "Technical Reviewer". The prompts generated from these role descriptions carry more context than naive system prompts, which meaningfully improves output quality on process-shaped work such as research, analysis, and content.
Declarative flows
Tasks are declarative — you describe outcomes, expected outputs, and dependencies. CrewAI Flows, the newer graph-style layer, adds explicit control flow for when the crew metaphor is not enough. The dual-mode design lets simple cases stay simple while giving complex cases an escape hatch.
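The role-plus-task shape can be sketched framework-agnostically in plain Python. CrewAI's real API differs; the role fields, task shape, and stub model here are invented for the sketch:

```python
from dataclasses import dataclass

@dataclass
class RoleAgent:
    role: str
    goal: str
    backstory: str

@dataclass
class Task:
    description: str
    agent: RoleAgent

def run_sequential(tasks, model):
    """Toy sequential process: each task's prompt carries the agent's role
    context plus the previous task's output."""
    context = ""
    for task in tasks:
        prompt = (f"You are a {task.agent.role}. Goal: {task.agent.goal}. "
                  f"Backstory: {task.agent.backstory}.\n"
                  f"Context: {context}\nTask: {task.description}")
        context = model(prompt)  # stand-in LLM call
    return context
```

The role metadata is not decoration: it is compiled into every prompt, which is where the output-quality gains on process-shaped work come from.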
Strengths
- Role metaphor maps to business processes non-engineers can review and refine
- Declarative tasks reduce boilerplate for research and analysis workflows
- Active community and broad tool adapter coverage
- Flows layer adds graph-style control when the crew model is not enough
- Strong fit for content, research, and analyst-style workflows
Weaknesses
- Abstraction obscures control flow when agents need to replan mid-task
- Python-first — TypeScript support is noticeably weaker than alternatives
- Persistence and human-in-the-loop require more manual assembly than LangGraph
- Observability out of the box is thinner than LangSmith or the OpenAI platform
Dark horse: Vercel AI SDK with agent primitives
The Vercel AI SDK is not marketed as an agent framework. It is a TypeScript-first toolkit for streaming, tool calling, and structured generation across providers. But the combination of generateText and streamText with stopWhen, prepareStep, and typed tools covers a surprising share of real agent workloads — without the conceptual overhead of a dedicated framework.
When it beats the dedicated frameworks
For TypeScript teams building product-embedded agents — chat interfaces, copilots, tool-using assistants — the AI SDK often wins on ergonomics and maintenance load. The provider-agnostic layer makes cross-provider routing trivial, the streaming primitives are first-class, and the tool model produces typed arguments without decorator metadata. Add a small state helper and you have covered most of what teams adopt LangGraph to solve.
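The step-wise pattern (a stop predicate over a tool-calling loop) is small enough to illustrate in plain Python, even though the AI SDK itself is TypeScript; `run_steps` and the message shapes are invented for the sketch:

```python
def run_steps(model, tools, messages, stop_when, max_steps=8):
    """Step-wise loop: one generation per step; halt when the stop
    predicate fires, otherwise feed tool results back and continue."""
    steps = []
    for _ in range(max_steps):
        step = model(messages)          # one generation step (stand-in)
        steps.append(step)
        if stop_when(steps):            # e.g. step budget, or no tool calls
            return steps
        for name, args in step["tool_calls"]:
            messages.append({"role": "tool", "name": name,
                             "content": str(tools[name](**args))})
    return steps
```

The point of the pattern: the stop condition and per-step preparation live in application code you can read, rather than inside a framework runtime.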
Where it stops
It is not a graph framework. Long-running workflows, durable persistence, time-travel debugging, and deep multi-agent coordination are out of scope. If you need any of those as a first principle, reach for LangGraph. If you do not, the AI SDK will often ship faster and age better.
Master comparison matrix
Thirty criteria across the four frameworks. Scores are qualitative: Strong (S), Good (G), Partial (P), Weak (W). Use this as the first pass, then run the deep-dive sections for the criteria that matter most to your project.
| Criterion | OpenAI Agents SDK | LangGraph | CrewAI | Vercel AI SDK |
|---|---|---|---|---|
| 1. Core abstraction clarity | S | G | G | S |
| 2. Time to first working agent | S | P | G | S |
| 3. Tool definition ergonomics | S | G | G | S |
| 4. Graph / branching expressiveness | P | S | G | P |
| 5. Typed state model | P | S | P | G |
| 6. Persistence / checkpointing | P | S | P | W |
| 7. Human-in-the-loop primitives | P | S | P | P |
| 8. Multi-agent handoff model | S | G | S | P |
| 9. Memory / long-term recall | P | G | G | P |
| 10. Streaming support | S | S | G | S |
| 11. Observability out of the box | S | S | P | G |
| 12. Evaluation tooling | G | S | P | P |
| 13. Cost attribution | S | G | P | G |
| 14. Cross-provider routing | P | G | G | S |
| 15. MCP / external tool registry | S | G | G | G |
| 16. RAG integration story | G | S | G | G |
| 17. Structured output | S | G | G | S |
| 18. TypeScript quality | S | G | P | S |
| 19. Python maturity | S | S | S | W |
| 20. Deployment target flexibility | G | S | G | S |
| 21. Serverless friendliness | G | G | P | S |
| 22. Long-running workflow support | P | S | P | W |
| 23. Parallelism / concurrency | G | S | G | G |
| 24. Community size | S | S | S | S |
| 25. Vendor independence | P | G | S | S |
| 26. Testing support | G | S | P | G |
| 27. Learning curve | S | P | G | S |
| 28. Safety / guardrails hooks | G | G | P | G |
| 29. Release cadence | S | S | S | S |
| 30. Agency project fit | G | S | G | G |
Matrix date: Q2 2026. Agent frameworks move quickly — verify current feature parity before final selection, especially on checkpointing, MCP support, and TypeScript feature gaps.
Deep dive: architecture, tool model, memory
Architecture
The architectural split between these frameworks is the most consequential choice. OpenAI Agents SDK runs an implicit loop with handoffs; LangGraph compiles a typed graph; CrewAI coordinates a crew through a process mode; Vercel AI SDK exposes step-wise generation you compose yourself. If your workflow has fewer than three decision points, imperative frameworks win on clarity. Past three, graph frameworks win on maintainability.
Tool model
Tool ergonomics matter more than teams expect. OpenAI Agents SDK and Vercel AI SDK both produce typed tools with minimal ceremony. LangGraph uses LangChain tool abstractions, which carry legacy ergonomics from the pre-typed era. CrewAI tools are Python-first and composable but the type story is thinner. All four support MCP as an external tool registry — see our MCP ecosystem guide for the integration surface.
Memory
Memory splits into short-term (within a run) and long-term (across runs). LangGraph's thread model handles both cleanly via checkpointers. OpenAI Agents SDK expects you to attach memory at the session level. CrewAI has short-term, long-term, and entity memory primitives but they are less configurable than LangGraph's. Vercel AI SDK leaves memory to the application layer, which is a feature for teams with an existing datastore and a drag for teams starting cold.
| Dimension | OpenAI SDK | LangGraph | CrewAI | Vercel AI SDK |
|---|---|---|---|---|
| Core abstraction | Agent loop + handoffs | Typed graph nodes | Role-based crew + tasks | Step-wise generate / stream |
| Tool ergonomics | Typed, decorator-style | LangChain tools | Composable classes | Zod-typed tools |
| Short-term memory | Message list | Typed state + checkpointer | Crew context | App-owned |
| Long-term memory | Session adapter | Thread + store | Entity + long-term memory | App-owned |
| MCP client support | First-class | Via adapter | Via adapter | First-class |
For patterns that layer on top of whichever architecture you pick, see our multi-agent orchestration patterns guide and the enterprise agent platform reference architecture.
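Keeping memory behind a thin interface is what makes the app-owned approach workable. A minimal sketch (the class and its key scheme are assumptions; swap the dicts for your real datastore):

```python
class Memory:
    """Thin memory interface: short-term per run, long-term per user.
    Backed by dicts here; a real app would back it with its datastore."""
    def __init__(self):
        self._runs, self._users = {}, {}

    def remember(self, run_id, message):
        self._runs.setdefault(run_id, []).append(message)   # short-term

    def recall(self, run_id):
        return list(self._runs.get(run_id, []))

    def store_fact(self, user_id, fact):
        self._users.setdefault(user_id, []).append(fact)    # long-term

    def facts(self, user_id):
        return list(self._users.get(user_id, []))
```

Because every framework reads memory through these four calls, the interface, not the framework, owns recall, which is exactly the replaceability argument made in the bottom line below.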
Deep dive: observability, evaluation, deployment
Observability
Observability is the hidden tiebreaker. OpenAI Agents SDK ships first-party tracing in the OpenAI platform with almost no setup. LangGraph integrates deeply with LangSmith for traces, evals, and dataset management. CrewAI supports multiple backends but the out-of-the-box experience lags. Vercel AI SDK exposes hooks that plug into OpenTelemetry, Langfuse, and Arize cleanly. For deeper patterns, see our agent observability guide.
Evaluation
Evals separate the frameworks that ship to production from the ones that ship to demos. LangSmith's dataset-driven evaluation is the strongest offering in the set. OpenAI's platform provides solid eval surfaces. CrewAI evaluation is thinner — teams typically bolt on Langfuse or build custom. Vercel AI SDK has no native eval suite; you wire your own harness around the streaming output.
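A vendor-neutral harness does not need to start big. A minimal sketch (the dataset format and scorer signature are assumptions, not any framework's API):

```python
def run_evals(agent_fn, dataset, scorers):
    """Run an agent over a dataset, score each output with every scorer,
    and return the per-scorer mean."""
    totals = {name: 0.0 for name in scorers}
    for example in dataset:
        output = agent_fn(example["input"])
        for name, score in scorers.items():
            totals[name] += score(output, example["expected"])
    return {name: total / len(dataset) for name, total in totals.items()}
```

Keeping the harness this shape, plain functions over plain records, is what lets it outlive a framework switch.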
Deployment
Deployment targets matter for cost and latency. OpenAI Agents SDK and Vercel AI SDK deploy cleanly to any serverless platform. LangGraph Cloud exists for managed deployment; self-hosting works but requires queueing infrastructure for long-running graphs. CrewAI runs well on containers and traditional VMs but the serverless story is weaker because crews often hold state across long task durations.
Agency take: For client engagements where you will hand off ops, weight observability and eval tooling close to the top of the matrix. Our CRM automation team has seen more rollouts fail on trace gaps than on model quality.
Deep dive: TypeScript, streaming, persistence
TypeScript quality
Vercel AI SDK and the OpenAI Agents SDK are the two strongest TypeScript stories. Types flow through tool arguments, return values, and streaming chunks without casts. LangGraph's TypeScript implementation has closed most of the gap with Python but still trails on the newest primitives. CrewAI is Python-first and the TypeScript port is thinner — for TypeScript-first teams, that alone can be disqualifying.
Streaming
All four frameworks support streaming text tokens. The differences show up in streaming intermediate state. LangGraph can stream state diffs at every node boundary — essential for agent UIs that show tool calls, plans, and mid-task status. Vercel AI SDK streams steps, tool calls, and UI messages with typed chunks. OpenAI Agents SDK streams response events and tool events. CrewAI's streaming story is thinner and generally lags when agents produce long research outputs.
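Streaming typed intermediate state rather than bare tokens can be sketched as a generator of typed chunks; the chunk vocabulary here is invented, not any framework's wire format:

```python
def stream_agent(plan, tool_results, answer_tokens):
    """Yield typed chunks a UI can render progressively:
    the plan first, then tool events, then answer tokens."""
    yield {"type": "plan", "steps": plan}
    for name, result in tool_results:
        yield {"type": "tool_result", "tool": name, "result": result}
    for token in answer_tokens:
        yield {"type": "token", "text": token}
```

A chat-only UI consumes just the `token` chunks; an agent-facing UI renders the plan and tool events too, which is the difference the section above describes.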
Persistence
LangGraph is the clear winner on persistence. Checkpointers, threads, and stores make pause-resume, time-travel, and human-in-the-loop approvals first-class rather than custom infrastructure. Everyone else treats persistence as an app concern. For long-running agents — reviews pending hours, research spanning days, approvals waiting on humans — this is the feature that pays back the graph learning curve.
Checkpointer backends: LangGraph ships in-memory, SQLite, and Postgres checkpointers with third-party Redis. The others expect you to bring your own store — fine if you have one, expensive if you do not.
Streaming note: Typed state diffs, tool-call previews, and plan updates make agent UIs feel responsive; plain token streams feel like chatbots. Weight this by how much of your UX is agent-facing.
Use-case fit: which framework wins when?
The matrix is neutral; real projects are not. Here is how the four frameworks line up against the use cases we see most often on client engagements.
| Use case | Best fit | Why |
|---|---|---|
| Customer support copilot (single agent) | OpenAI Agents SDK | Fast to ship, tracing out of the box, handoffs cover escalation |
| Long-running deal-desk approval agent | LangGraph | Checkpointers, HITL, durable state across days |
| Research + analysis report generation | CrewAI | Role-based crew maps naturally to researcher / analyst / editor |
| Product-embedded TypeScript copilot | Vercel AI SDK | Streaming + tools + typed UI messages with minimal framework weight |
| Multi-step workflow with branching | LangGraph | Graph expressiveness is worth the learning curve past three decision points |
| Internal operations bot with handoffs | OpenAI Agents SDK | Handoff model covers specialist routing without graph ceremony |
| Content production pipeline | CrewAI | Declarative tasks and role-based review fit content workflows cleanly |
| Cross-provider routing and fallback | Vercel AI SDK | Provider abstraction is cleaner than any framework-first option |
For Anthropic-native implementations, our Claude Agent SDK production patterns guide covers the same terrain from the Claude side, and the MCP vs LangChain vs CrewAI comparison zooms out to the broader ecosystem.
Migration paths
If you already have a production agent and are considering a switch, here is how the common migration paths shake out. All are hard. None are impossible. The reality check is that porting usually takes a focused engineer four to twelve weeks depending on tool surface, memory complexity, and evaluation harness.
OpenAI Agents SDK → LangGraph
Usually triggered by a workflow outgrowing the handoff model — branching, approvals, or multi-day durability. The agent loop maps to a ReAct-style graph node; handoffs become conditional edges. The biggest surprise is re-modeling memory from a session object into typed state. Plan for meaningful prompt re-tuning; graph execution exposes intermediate states the flat loop hid.
CrewAI → LangGraph
Usually triggered by needing control flow the crew model cannot express — mid-task replanning, dynamic agent spawning, conditional tool execution. Tasks map to nodes, process modes map to graph topology. The role metaphor is lost; what replaces it is explicit state and explicit transitions. Teams report the translation itself is mechanical; the hard part is accepting that some of the crew expressiveness does not survive the port.
LangGraph → Vercel AI SDK
Rare but real. Triggered by realizing the graph ceremony was overkill — the agent really does need to call four tools in a loop and stream back. Each node collapses to a step in a streamText-style loop with stopWhen conditions. You lose durable checkpointing and time-travel debugging. You gain dramatic simplification and better streaming primitives. Only appropriate when the graph was genuinely overbuilt.
Any → MCP-based tool layer
The lowest-friction migration lever is standardizing tools behind an MCP server rather than inline definitions. Once tools are MCP, the framework becomes genuinely replaceable — each of the four frameworks supports MCP clients, and the cost of a full re-framework drops by roughly the tool-surface share of the codebase. Worth investing in early even if you never switch.
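The decoupling move can be sketched as a framework-neutral registry that exposes tools by name and schema. This illustrates the idea, not the MCP wire protocol; the class and schema shape are invented:

```python
class ToolRegistry:
    """Framework-neutral tool registry: each framework gets a thin adapter
    that translates these schemas into its own tool objects."""
    def __init__(self):
        self._tools = {}

    def register(self, name, fn, description, parameters):
        self._tools[name] = {"fn": fn, "description": description,
                             "parameters": parameters}

    def schemas(self):
        # What a framework adapter consumes to build its native tools.
        return [{"name": n, "description": t["description"],
                 "parameters": t["parameters"]}
                for n, t in self._tools.items()]

    def call(self, name, **kwargs):
        return self._tools[name]["fn"](**kwargs)
```

Once tool definitions live here (or behind a real MCP server), a framework switch touches adapters, not the tool surface itself.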
The bottom line
There is no universally correct choice. OpenAI Agents SDK wins on time-to-production for imperative loops. LangGraph wins on control, durability, and long-horizon work. CrewAI wins when the business process genuinely maps to a crew. Vercel AI SDK wins for TypeScript product teams who do not need graph semantics. Score your project against the axes that matter for your actual workload, pick the winner, then invest in the abstractions — tools behind MCP, memory behind a thin interface, evals in a vendor-neutral platform — that keep the framework replaceable.
The wrong framework is not an existential risk. It is a four-month tax. The right matrix keeps that tax from landing on the same client engagement twice.
Pick the right agent framework the first time
Framework selection is a four-month decision. We run the 30-criterion matrix on your actual workload, validate the top two in a one-week spike, and hand you a production-ready recommendation before a single line of agent code is written.