
8 databases · 4 reference workloads · latency, cost, hybrid search, metadata filtering, and managed-service availability

Vector Databases for AI Agents: 8 DBs Compared.

Eight vector databases anchor 2026 AI-agent workloads: Pinecone (managed leader), Qdrant (Rust-based open-source speed leader), Weaviate (hybrid + GraphQL), Milvus (large-scale), Chroma (DX leader), pgvector (Postgres-integrated default), Vertex Vector (GCP), and Vespa (large-scale hybrid). Pick by managed-vs-self-host, scale, and hybrid-search needs.

Digital Applied Team
Senior strategists · Published Apr 28, 2026
Read time: 3 min
Sources: ANN-Benchmarks · vendor docs · pgvector + Supabase tests · field deployments
Qdrant p99 latency: ~12 ms @ 10M vectors · Rust speed · OSS leader
Pinecone managed: $70+/mo starter pod · enterprise scale · managed leader
pgvector default: $0 · Postgres extension · runs anywhere
Vespa multi-modal scale: billions of vectors + text

Vector databases moved from research curiosity to production necessity in 2023-2024. By 2026 the field has consolidated to eight production-grade options that dominate real AI-agent workloads. The decision dimensions are managed vs self-host, scale tier, hybrid-search depth, and the team's existing data-platform commitments — not headline benchmarks.

We compare eight databases across query latency, scale ceiling, hybrid search, metadata filtering, managed-service availability, and pricing model. Most teams pick by data-platform commitment (pgvector if Postgres-anchored, Pinecone if managed-cloud preference, Vertex if GCP-native) rather than aggregate benchmarks.

This post covers the 7-axis matrix, deep dives by category (managed leaders, open-source primaries, embedded + Postgres, large-scale hybrid), and four reference workloads we run for engineering teams today.

Key takeaways
  1. Pick by data-platform commitment first; benchmarks are tie-breakers. If Postgres is the data platform, pgvector is the default — running a separate vector DB only justifies itself when scale or workload demands it. If managed-cloud is the preference, Pinecone is the default. If GCP, Vertex Vector. The team's existing platform commitments dominate the decision; ANN benchmarks tie-break between adequate options.
  2. Qdrant leads open-source speed — 10-25% faster than Weaviate or Milvus on common workloads. Qdrant's Rust implementation gives it the latency edge among open-source vector DBs. p99 latency at 10M vectors typically lands ~12ms vs Weaviate's ~16ms and Milvus's ~18ms. The gap matters at high QPS; it is less material at low query volumes. Right open-source pick when speed dominates.
  3. pgvector is the right default for ~70% of AI-agent workloads. If the workload is under 10M vectors, the team already runs Postgres, and queries don't need ultra-low latency, pgvector is the right default. Same backups, same operational tools, same access controls as the rest of the application data. Add a dedicated vector DB only when scale, hybrid search, or specialized features demand it (see the sketch after this list).
  4. Hybrid search (vector + keyword) is the deciding feature for many production deployments. Pure vector search underperforms hybrid (vector + BM25 + metadata filters) on most production workloads — agents need exact-match for proper nouns, version numbers, and IDs while still getting semantic matching. Weaviate, Vespa, and Qdrant ship hybrid search natively. Pinecone added it; pgvector requires manual composition. For agent memory and RAG over diverse content, hybrid search is non-optional.
  5. Scale tier matters: under 10M, anything works; 10M-1B, narrow choices; 1B+, Vespa or Milvus. Under 10M vectors, all eight databases perform adequately. Between 10M-1B, the field narrows to Pinecone (managed), Qdrant + Weaviate + Milvus (self-host), and Vespa. Above 1B vectors, Vespa and Milvus distributed deployments are the production-grade options. Pinecone scales but cost compounds. Match scale tier to platform; don't over-invest if you'll never cross 10M.
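
A minimal sketch of that default path, assuming psycopg 3 against a Postgres instance with the pgvector extension installed; the DSN, the documents table, and the 1536-dim embedding column are illustrative assumptions, not from this article.

```python
# Minimal pgvector path: extension, table, HNSW index, k-NN query.
# DSN, table/column names, and 1536 dims are illustrative assumptions.
import psycopg

DSN = "postgresql://app:app@localhost:5432/app"  # hypothetical connection string

with psycopg.connect(DSN) as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id        bigserial PRIMARY KEY,
            body      text NOT NULL,
            embedding vector(1536)  -- match your embedding model's dimensions
        )
    """)
    # HNSW index for approximate nearest-neighbor search (pgvector >= 0.5)
    conn.execute("""
        CREATE INDEX IF NOT EXISTS documents_embedding_idx
        ON documents USING hnsw (embedding vector_cosine_ops)
    """)
    # <=> is pgvector's cosine-distance operator
    qvec = "[" + ",".join(["0.01"] * 1536) + "]"  # stand-in query embedding
    rows = conn.execute(
        "SELECT id, body FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
        (qvec,),
    ).fetchall()
```

Nothing here is new infrastructure: the table backs up, replicates, and migrates exactly like the rest of the application schema.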

01 · The Field · The 2026 vector-DB field.

The vector-database field consolidated rapidly. Eight databases now own the production conversation, split across four tiers: managed leaders (Pinecone, Vertex Vector), open-source primaries (Qdrant, Weaviate, Milvus), embedded + Postgres-integrated (Chroma, pgvector), and large-scale hybrid (Vespa). Each tier serves a different deployment shape; teams default into the tier that matches their existing data-platform commitments.

Tier 1
Pinecone — managed leader
Managed-cloud · pods + serverless · enterprise scale

The managed-cloud default. Predictable performance, generous index sizes, hybrid search added in 2024-2025. Right pick when managed-cloud is the preference and the team values not running infrastructure.

Managed
Tier 2
Qdrant — open-source speed leader
Rust-based · self-host or managed cloud

The Rust implementation gives Qdrant the latency edge among open-source vector DBs. Strong filtering, hybrid search, and quantization. Right OSS pick when speed dominates.

OSS speed
Tier 2
Weaviate — hybrid + GraphQL
Open-source · GraphQL API · hybrid leader

Weaviate's hybrid-search story is among the field's strongest — vector + BM25 + metadata-filtering composition is native. GraphQL API differentiates from REST-first peers. Right pick for hybrid-search-heavy workloads.

Hybrid leader
Tier 2
Milvus — large-scale leader
Open-source · distributed · billion-scale capable

Milvus distributed scales to billions of vectors. The production large-scale OSS choice. Operational complexity is real — pays back at scales where Pinecone cost compounds.

Large-scale OSS
Tier 3
Chroma — DX leader
Embedded + cloud · Python-first · prototype-friendly

Cleanest DX for prototyping. Embedded mode runs in-process; cloud mode for production. Right pick when getting started fast matters more than production scale.

DX-first
Tier 3
pgvector — Postgres default
Postgres extension · runs anywhere · $0 add-on

If Postgres is the data platform, pgvector is the default. Same backups, same ops, same access. Adequate for ~70% of AI-agent workloads (under 10M vectors). Add a dedicated DB only when needed.

Postgres default
Tier 4
Vertex Vector Search — GCP-native
Managed-GCP · BigQuery integration · enterprise

Google Cloud's managed vector search. Right pick when the team is GCP-native and BigQuery integration matters. Pricing scales with index size + query volume.

GCP-native
Tier 4
Vespa — large-scale hybrid
Open-source · billions of vectors · text + vector

Yahoo's open-source search engine. The production-grade pick for billion-scale hybrid search (vector + structured + text). Operational complexity matches the scale; pays back when scale demands it.

Massive scale

02 · Matrix · Feature matrix, eight databases.

The matrix below covers seven capabilities that drive 2026 vector-DB decisions: query latency at 10M vectors, scale ceiling, hybrid-search support, metadata filtering, managed-service availability, pricing model, and best-fit deployment pattern.

Capability
Query latency at 10M vectors (p99)

Qdrant ~12ms wins among OSS. Pinecone ~10-15ms managed. Weaviate ~16ms. Milvus ~18ms. pgvector ~25-40ms (depends on index type). Vertex ~12ms managed. Vespa ~15ms. Chroma ~30ms (not optimized for ultra-low latency). Picks differ at sub-10ms requirements.

Qdrant · Pinecone
Capability
Scale ceiling (production-grade)

Vespa + Milvus distributed scale to billions cleanly. Pinecone scales high but cost compounds. Qdrant, Weaviate distributed are competitive. pgvector hits operational friction above ~10-50M depending on hardware. Chroma cloud is improving; embedded Chroma caps lower.

Vespa · Milvus (1B+) · Pinecone
Capability
Hybrid search (vector + BM25 + filter)

Weaviate and Vespa lead with native hybrid composition. Qdrant added strong hybrid in 2024. Pinecone added hybrid; competitive. Milvus supports hybrid via collections + filtering. pgvector requires manual composition with full-text search (a sketch follows the matrix). Chroma's hybrid story is simpler.

Weaviate · Vespa · Qdrant
Capability
Metadata filtering depth

Qdrant has the strongest filter expressiveness (complex filter syntax, payload indexes). Weaviate strong via GraphQL. Pinecone solid. pgvector inherits Postgres's full SQL filtering — most expressive overall when SQL fits the workload. Milvus competitive.

pgvector (SQL) · Qdrant (filter syntax)
Capability
Managed-service availability

Pinecone is managed-only. Vertex Vector is managed-GCP-only. Qdrant Cloud, Weaviate Cloud, Milvus Cloud (Zilliz) all available alongside self-host. Chroma cloud is generally available. pgvector via managed Postgres (Supabase, Neon, RDS, etc.). Vespa managed via Vespa Cloud.

Pinecone (managed-only)
Capability
Pricing model

pgvector $0 (Postgres infra cost only). Chroma cloud generous free tier. Qdrant Cloud + Weaviate Cloud usage-based. Milvus / Zilliz cloud usage-based. Pinecone $70+/mo starter; serverless usage-based at scale. Vertex pay-per-query + index size. Vespa usage-based (cloud) or self-host.

pgvector (cheapest at scale)
Capability
Best-fit deployment pattern

pgvector: Postgres-anchored teams under 10M vectors. Pinecone: managed-cloud preference, any scale. Qdrant: speed-sensitive OSS deployments. Weaviate: hybrid-search-heavy. Milvus: large-scale OSS. Chroma: prototypes + small-prod. Vertex: GCP-native. Vespa: billion-scale hybrid.

Match deployment pattern
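
The hybrid-search row notes that pgvector leaves composition to the developer. Here is one hedged sketch of what that composition can look like, reusing the hypothetical documents table from earlier: rank the corpus twice (vector distance, Postgres full-text) and fuse the two rankings with reciprocal-rank fusion. The k = 60 constant and every name are assumptions.

```python
# Manual hybrid search on pgvector: vector ranking + full-text ranking,
# fused with reciprocal-rank fusion (RRF). All names and k=60 are assumptions.
import psycopg

HYBRID_SQL = """
WITH vec AS (
    SELECT id, row_number() OVER (
        ORDER BY embedding <=> %(qvec)s::vector) AS rnk
    FROM documents
    ORDER BY embedding <=> %(qvec)s::vector
    LIMIT 20
),
txt AS (
    SELECT id, row_number() OVER (
        ORDER BY ts_rank_cd(to_tsvector('english', body),
                            plainto_tsquery('english', %(qtext)s)) DESC) AS rnk
    FROM documents
    WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %(qtext)s)
    ORDER BY rnk
    LIMIT 20
)
SELECT id,
       coalesce(1.0 / (60 + vec.rnk), 0)
     + coalesce(1.0 / (60 + txt.rnk), 0) AS rrf_score
FROM vec FULL OUTER JOIN txt USING (id)
ORDER BY rrf_score DESC
LIMIT 5
"""

qvec = "[" + ",".join(["0.01"] * 1536) + "]"  # stand-in query embedding
with psycopg.connect("postgresql://app:app@localhost:5432/app") as conn:
    hits = conn.execute(
        HYBRID_SQL, {"qvec": qvec, "qtext": "error 0x80070057"}
    ).fetchall()
```

RRF is one fusion choice among several; weighted score blending also works, but RRF avoids normalizing two score scales that aren't directly comparable.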

03 · Managed Leaders · Pinecone and Vertex Vector.

Pinecone and Vertex Vector Search are the managed-cloud leaders. Pinecone is the cross-cloud managed default; Vertex is the GCP-native option for teams committed to Google Cloud. Both remove infrastructure ops; both pay back when the team values not running its own vector DB.

Pinecone
Managed
Cross-cloud production default

The cross-cloud managed default. Pods + serverless tiers, generous index sizes, hybrid search, predictable performance. Right pick when managed-cloud preference dominates and AWS/Azure/GCP-agnostic deployment matters.

Cross-cloud
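
A hedged sketch of the managed path, assuming the current (v3+) Pinecone Python SDK; the index name, dimensions, and metadata-filter fields are illustrative, not from this article.

```python
# Querying a managed Pinecone index with a metadata filter.
# Index name, dims, and filter fields are hypothetical.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("agent-docs")              # hypothetical index

res = index.query(
    vector=[0.01] * 1536,                   # stand-in query embedding
    top_k=5,
    filter={"source": {"$eq": "runbooks"}}, # server-side metadata filter
    include_metadata=True,
)
for match in res.matches:
    print(match.id, match.score)
```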
Vertex
GCP
BigQuery + Vertex AI native

Google Cloud's managed vector search. BigQuery integration, Vertex AI ecosystem fit, GCP IAM. Right pick when team is GCP-native and Vertex AI is the broader ML/AI stack. Pricing scales with index + query volume.

GCP-native
Trade-off
Cost
Cost at scale

Both managed services have meaningful cost at billion-scale workloads vs self-hosted alternatives (Milvus, Vespa). The cost is a service trade-off — pay more for managed simplicity. At 10M-100M vectors, the cost is competitive; above 1B, evaluate self-host.

Scale-cost trade
"Pinecone is what most teams should default to. pgvector is what most teams should actually use, because most workloads are smaller than people think."— Internal vector-DB stack retro, March 2026

04 · Open-Source · Qdrant, Weaviate, Milvus.

Three open-source vector DBs anchor the production OSS conversation. Qdrant wins on speed (Rust implementation), Weaviate wins on hybrid search and GraphQL API ergonomics, Milvus wins on large-scale distributed deployments. All three have managed-cloud equivalents (Qdrant Cloud, Weaviate Cloud, Zilliz) for teams that want OSS code semantics with managed operations.

Qdrant
Rust-based · speed leader

Latency edge among OSS vector DBs. Strong filter syntax, hybrid search added in 2024, quantization for memory efficiency. Right OSS pick when speed and filter expressiveness dominate. Self-host or Qdrant Cloud.

Speed + filtering
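
A hedged sketch of the filter expressiveness the card describes, using the qdrant-client Python package; the collection and payload names are assumptions, and newer releases also expose query_points alongside the classic search call shown here.

```python
# Filtered ANN search against Qdrant; collection/payload names are assumptions.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

hits = client.search(
    collection_name="agent_docs",
    query_vector=[0.01] * 768,               # stand-in query embedding
    query_filter=models.Filter(              # payload-index-backed filter
        must=[
            models.FieldCondition(
                key="tenant_id",
                match=models.MatchValue(value="acme"),
            )
        ]
    ),
    limit=5,
)
for hit in hits:
    print(hit.id, hit.score)
```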
Weaviate
Hybrid + GraphQL

Native hybrid (vector + BM25 + filter) composition. GraphQL API differentiates from REST-first peers. Right pick when hybrid search is the primary workload and GraphQL fits the team's API style.

Hybrid + GraphQL
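
A hedged sketch of a native hybrid query, assuming the v4 Weaviate Python client and a collection with a server-side vectorizer configured; the collection name and alpha weighting are illustrative.

```python
# Native hybrid (BM25 + vector) query via the Weaviate v4 Python client.
# Collection name and alpha are assumptions; requires a configured vectorizer.
import weaviate

client = weaviate.connect_to_local()
try:
    docs = client.collections.get("AgentDoc")   # hypothetical collection
    res = docs.query.hybrid(
        query="postgres hnsw index tuning",     # feeds both BM25 and vector sides
        alpha=0.5,                              # 0 = pure keyword, 1 = pure vector
        limit=5,
    )
    for obj in res.objects:
        print(obj.uuid, obj.properties)
finally:
    client.close()
```

alpha interpolates between the keyword and vector rankings; the same query string drives both sides.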
Milvus
Large-scale distributed

Distributed deployments scale to billions of vectors. Production large-scale OSS choice. Operational complexity matches the scale; pays back where Pinecone cost compounds. Zilliz cloud for managed equivalent.

Large-scale OSS

05 · Embedded + Postgres · Chroma and pgvector.

Chroma and pgvector serve adjacent niches the dedicated vector DBs don't. Chroma wins on developer experience for prototyping (embedded mode runs in-process). pgvector wins on operational simplicity for Postgres-anchored teams (same data platform, same backups, same ops). Together they cover ~70% of the AI-agent workloads we see in the wild.

Chroma
DX
Cleanest developer experience

Embedded mode (in-process Python) for prototypes; cloud mode for production. Cleanest 'getting started' path among vector DBs. Right pick when prototype velocity dominates; less ideal for ultra-low latency or billion-scale workloads.

Prototype-first
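
The "getting started" claim in concrete form: a sketch of Chroma's embedded mode, which runs in-process with no server. The collection name and documents are illustrative; Chroma embeds documents with its default model unless embeddings are supplied.

```python
# Chroma embedded mode: in-process, no server to run.
# Collection name and documents are illustrative.
import chromadb

client = chromadb.Client()                   # ephemeral, in-process instance
collection = client.create_collection("prototype")

collection.add(
    ids=["doc-1", "doc-2"],
    documents=["reset the staging cluster", "rotate the API keys"],
)
hits = collection.query(
    query_texts=["how do I reset staging?"],
    n_results=1,
)
print(hits["ids"], hits["documents"])
```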
pgvector
$0
Postgres-integrated default

If Postgres is the data platform, pgvector is the default vector store. Same backups, same operational tools, same access controls. Adequate for ~70% of AI-agent workloads (under 10M vectors). Add a dedicated vector DB only when scale or workload demands it.

Postgres default
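
"Same access controls" in concrete form: a hedged sketch (run via psycopg; tables and columns are assumptions) that joins vector similarity against an ordinary relational ACL table in a single query, something a standalone vector DB needs application-side plumbing to replicate.

```python
# Vector search joined with relational access control in one SQL statement.
# Table and column names are assumptions.
import psycopg

SQL = """
SELECT d.id, d.body
FROM documents d
JOIN document_acl acl ON acl.document_id = d.id  -- ordinary relational table
WHERE acl.team = %(team)s                        -- access control stays in SQL
ORDER BY d.embedding <=> %(qvec)s::vector        -- pgvector cosine distance
LIMIT 5
"""

qvec = "[" + ",".join(["0.01"] * 1536) + "]"     # stand-in query embedding
with psycopg.connect("postgresql://app:app@localhost:5432/app") as conn:
    rows = conn.execute(SQL, {"team": "platform", "qvec": qvec}).fetchall()
```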
Trade-off
Scale
Both cap below dedicated DBs

Chroma's embedded mode caps at small-prod scale; cloud mode scales but doesn't match dedicated DBs. pgvector hits operational friction above 10-50M vectors depending on hardware. Both are right defaults for under-10M; evaluate alternatives above that threshold.

Scale ceiling

06 · Vespa · The billion-scale hybrid leader.

Vespa is the production-grade pick for billion-scale hybrid search — vector + structured + text in one engine. Yahoo's open-source search engine offers the deepest hybrid search in the field at scale. Operational complexity matches the scale; it pays back when scale demands it.

Strength
1B+
Billion-scale production deployment

Vespa runs production search at Yahoo, Spotify, and similar scale-defining deployments. The scale ceiling is among the field's highest. Right pick when the workload is genuinely massive — vector counts in the billions or query volumes that overwhelm alternatives.

Massive scale
Strength
Hybrid
Vector + text + structured native

Vespa was a search engine before vector search was a category. Hybrid composition (vector + BM25 + structured filtering) is native and deep. Right pick for any workload where hybrid search at scale matters most.

Hybrid depth
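
A hedged sketch of that native composition through Vespa's HTTP query API: the YQL nearestNeighbor operator for the vector side combined with userQuery() for the text side. The field name, rank profile, endpoint, and tensor dims are all assumptions about a deployed application.

```python
# Hybrid Vespa query: nearestNeighbor (vector) AND userQuery (text) in one YQL.
# Field name, rank profile, endpoint, and dims are assumptions.
import json
import urllib.request

body = {
    "yql": "select * from sources * where "
           "({targetHits:100}nearestNeighbor(embedding, q)) and userQuery()",
    "query": "postgres hnsw index tuning",  # text terms for the BM25 side
    "input.query(q)": [0.01] * 768,         # query tensor for the vector side
    "ranking": "hybrid",                    # rank profile combining both signals
    "hits": 5,
}
req = urllib.request.Request(
    "http://localhost:8080/search/",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    results = json.load(resp)
    print(results["root"].get("children", []))
```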
Trade-off
Ops
Operational complexity

Vespa's operational complexity is real — schema configuration, content cluster + container topology, deployment workflows. Pays back at scale; doesn't pay back for sub-10M-vector workloads where Pinecone or pgvector serve better.

Ops-heavy

07 · Reference Workloads · Four reference workloads.

Below are the four AI-agent workloads we deploy most often, with the database recommendation that consistently wins on each. The mapping isn't absolute, but each pairing is the path of least friction.

Workload 1
Small RAG (under 10M vectors, Postgres team)

Most agency-grade RAG workloads. pgvector is the default — under 10M vectors, Postgres-anchored, same backups and ops as the rest of the application data. Don't reach for a dedicated DB unless scale or workload demands it.

pgvector
Workload 2
Large RAG (10M-1B vectors, hybrid search)

Production RAG at scale with hybrid-search needs. Weaviate (open-source, hybrid native) or Pinecone (managed) are the right defaults. Qdrant if speed dominates and self-host fits. Match by managed-vs-OSS preference.

Weaviate · Pinecone · Qdrant
Workload 3
Hybrid search at scale (1B+ vectors + text)

Massive-scale workloads where hybrid search and operational scale dominate. Vespa is the production-grade choice. Milvus distributed is the alternative. Pinecone scales but cost compounds.

Vespa · Milvus
Workload 4
Agent-memory store (long-running, multi-tenant)

Agent-memory store needs metadata-rich filtering, multi-tenant isolation, and durable persistence. pgvector's SQL filtering shines here when scale fits. Qdrant strong if speed + filter syntax matter more. Pinecone for managed simplicity.

pgvector · Qdrant · Pinecone
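
One hedged sketch of Workload 4 on the pgvector path: a multi-tenant memory table where isolation and recency live in ordinary columns and recall is a filtered k-NN query. Every name, the dims, and the index choice are assumptions, not a prescribed schema.

```python
# Multi-tenant agent-memory store on pgvector; all names/dims are assumptions.
import psycopg

DSN = "postgresql://app:app@localhost:5432/app"

with psycopg.connect(DSN) as conn:
    conn.execute("""
        CREATE TABLE IF NOT EXISTS agent_memory (
            id         bigserial PRIMARY KEY,
            tenant_id  text NOT NULL,          -- multi-tenant isolation key
            agent_id   text NOT NULL,
            kind       text NOT NULL,          -- e.g. 'episodic' or 'tool-result'
            content    text NOT NULL,
            created_at timestamptz NOT NULL DEFAULT now(),
            embedding  vector(1536)
        )
    """)
    conn.execute("""
        CREATE INDEX IF NOT EXISTS agent_memory_scope_idx
        ON agent_memory (tenant_id, agent_id, created_at)
    """)
    # Recall: metadata filter narrows the scope, then vector distance ranks
    qvec = "[" + ",".join(["0.01"] * 1536) + "]"
    memories = conn.execute(
        """
        SELECT content FROM agent_memory
        WHERE tenant_id = %(tenant)s AND agent_id = %(agent)s
        ORDER BY embedding <=> %(qvec)s::vector
        LIMIT 10
        """,
        {"tenant": "acme", "agent": "support-bot", "qvec": qvec},
    ).fetchall()
```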

08 · Conclusion · Pick by data-platform commitment first.

Vector databases for AI, April 2026

There is no single best vector database. There are right defaults per data-platform commitment and scale tier.

By April 2026 the vector-database field has consolidated to eight production-grade options across four tiers. The decision dimensions that actually matter — managed vs self-host, scale tier, hybrid-search needs, existing data-platform commitments — outweigh aggregate ANN benchmarks for most teams. There is no "best" vector DB in the abstract; there is the right default for the deployment pattern.

The pattern that scales: pick by data-platform commitment first. Postgres team under 10M vectors → pgvector. Managed-cloud preference, any scale → Pinecone. GCP-native team → Vertex Vector. Hybrid-search-heavy → Weaviate. Speed-dominant OSS → Qdrant. Billion-scale hybrid → Vespa or Milvus. The benchmarks tie-break between adequate options once the platform commitment narrows the field.

The right move for most engineering teams: default to pgvector until scale or workload demands more. Most AI-agent RAG workloads are smaller than they feel; running a separate vector DB adds operational toil that often doesn't pay back. Reach for dedicated vector DBs when the workload genuinely needs what they offer — not before.

Production vector DB stacks

Move past benchmark debates. Pick by data-platform commitment.

We design and operate vector-database stacks for engineering teams across pgvector, Pinecone, Qdrant, Weaviate, Milvus, Chroma, Vertex Vector, and Vespa — covering DB selection, hybrid-search architecture, agent-memory schema design, and scale-out planning.

What we work on

Vector-DB engagements

  • Database selection by data-platform commitment
  • pgvector schema design + indexing strategy
  • Pinecone or Qdrant production rollouts
  • Hybrid-search architecture (vector + BM25)
  • Agent-memory store design + scale planning
FAQ · Vector databases 2026

The questions we get every week.

When should a team default to pgvector, and when does a dedicated vector DB pay back?

Default to pgvector when (a) the team already runs Postgres, (b) the vector workload is under ~10M vectors, (c) queries don't need ultra-low p99 latency (sub-15ms), and (d) the workload benefits from joining vector data with relational data via SQL. Most agency-grade RAG and agent-memory workloads fit these criteria.

Reach for a dedicated vector DB when (a) scale exceeds ~10-50M vectors, (b) p99 latency requirements are sub-10ms, (c) hybrid-search depth (vector + BM25) is central, or (d) the team needs features pgvector doesn't ship (multi-tenant isolation, sharding semantics, specific quantization options). Most teams overestimate scale needs and end up running dedicated DBs that pgvector would have served. Default to pgvector until proven otherwise.