Vector databases moved from research curiosity to production necessity in 2023-2024. By 2026 the field has consolidated to eight production-grade options that dominate real AI-agent workloads. The decision dimensions are managed vs self-host, scale tier, hybrid-search depth, and the team's existing data-platform commitments — not headline benchmarks.
We compare eight databases across query latency, scale ceiling, hybrid search, metadata filtering, managed-service availability, and pricing model. Most teams pick by data-platform commitment (pgvector if Postgres-anchored, Pinecone if managed-cloud preference, Vertex if GCP-native) rather than aggregate benchmarks.
This post covers the 7-axis matrix, deep dives by category (managed leaders, open-source primaries, embedded + Postgres, large-scale hybrid), and four reference workloads we run for engineering teams today.
- 01 — Pick by data-platform commitment first; benchmarks are tie-breakers. If Postgres is the data platform, pgvector is the default — running a separate vector DB only justifies itself when scale or workload demands it. If managed-cloud is the preference, Pinecone is the default. If GCP, Vertex Vector. The team's existing platform commitments dominate the decision; ANN benchmarks tie-break between adequate options.
- 02 — Qdrant leads open-source speed: 10-25% faster than Weaviate or Milvus on common workloads. Qdrant's Rust implementation gives it the latency edge among open-source vector DBs. p99 latency at 10M vectors typically lands at ~12ms vs Weaviate's ~16ms and Milvus's ~18ms. The gap matters at high QPS and is less material at low query volumes. The right open-source pick when speed dominates.
- 03 — pgvector is the right default for ~70% of AI-agent workloads. If the workload is under 10M vectors, the team already runs Postgres, and queries don't need ultra-low latency, pgvector is the right default. Same backups, same operational tools, same access controls as the rest of the application data. Add a dedicated vector DB only when scale, hybrid search, or specialized features demand it.
- 04 — Hybrid search (vector + keyword) is the deciding feature for many production deployments. Pure vector search underperforms hybrid (vector + BM25 + metadata filters) on most production workloads — agents need exact matches for proper nouns, version numbers, and IDs while still getting semantic matching. Weaviate, Vespa, and Qdrant ship hybrid search natively. Pinecone added it; pgvector requires manual composition. For agent memory and RAG over diverse content, hybrid search is non-optional.
- 05 — Scale tier matters: under 10M, anything works; 10M-1B, choices narrow; 1B+, Vespa or Milvus. Under 10M vectors, all eight databases perform adequately. Between 10M and 1B, the field narrows to Pinecone (managed), Qdrant, Weaviate, and Milvus (self-host), and Vespa. Above 1B vectors, Vespa and Milvus distributed deployments are the production-grade options. Pinecone scales but cost compounds. Match scale tier to platform; don't over-invest if you'll never cross 10M.
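The five takeaways reduce to a small decision procedure. A minimal sketch of that logic — the function name, platform labels, and tier thresholds are illustrative shorthand for the rules of thumb above, not any library's API:

```python
def pick_default(platform: str, n_vectors: int, needs_hybrid: bool = False) -> str:
    """Map data-platform commitment + scale tier to a default vector DB.

    Encodes the takeaways above; thresholds are approximate and the
    platform labels are this sketch's own convention.
    """
    if n_vectors >= 1_000_000_000:           # 1B+ tier: large-scale options only
        return "Vespa or Milvus (distributed)"
    if platform == "postgres" and n_vectors < 10_000_000:
        return "pgvector"                    # same backups, ops, access controls
    if platform == "gcp":
        return "Vertex Vector Search"        # BigQuery + Vertex AI fit
    if platform == "managed-cloud":
        return "Pinecone"                    # managed default, any scale
    if needs_hybrid:
        return "Weaviate"                    # native vector + BM25 + filters
    return "Qdrant"                          # OSS speed default

print(pick_default("postgres", 2_000_000))          # pgvector
print(pick_default("self-host", 50_000_000, True))  # Weaviate
```

Benchmarks enter only after this narrowing, as tie-breakers between the remaining adequate options.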
01 — The Field
The 2026 vector-DB field.
The vector-database field consolidated rapidly. Eight databases now own the production conversation, split across four tiers: managed leaders (Pinecone, Vertex Vector), open-source primaries (Qdrant, Weaviate, Milvus), embedded + Postgres-integrated (Chroma, pgvector), and large-scale hybrid (Vespa). Each tier serves a different deployment shape; teams default into the tier that matches their existing data-platform commitments.
Pinecone — managed leader
Managed-cloud · pods + serverless · enterprise scale
The managed-cloud default. Predictable performance, generous index sizes, hybrid search added in 2024-2025. Right pick when managed-cloud is the preference and the team values not running infrastructure.
Managed
Qdrant — open-source speed leader
Rust-based · self-host or managed cloud
The Rust implementation gives Qdrant the latency edge among open-source vector DBs. Strong filtering, hybrid search, and quantization. Right OSS pick when speed dominates.
OSS speed
Weaviate — hybrid + GraphQL
Open-source · GraphQL API · hybrid leader
Weaviate's hybrid-search story is among the field's strongest — vector + BM25 + metadata-filtering composition is native. The GraphQL API differentiates it from REST-first peers. Right pick for hybrid-search-heavy workloads.
Hybrid leader
Milvus — large-scale leader
Open-source · distributed · billion-scale capable
Milvus distributed scales to billions of vectors. The production large-scale OSS choice. Operational complexity is real — it pays back at scales where Pinecone cost compounds.
Large-scale OSS
Chroma — DX leader
Embedded + cloud · Python-first · prototype-friendly
The cleanest DX for prototyping. Embedded mode runs in-process; cloud mode for production. Right pick when getting started fast matters more than production scale.
DX-first
pgvector — Postgres default
Postgres extension · runs anywhere · $0 add-on
If Postgres is the data platform, pgvector is the default. Same backups, same ops, same access. Adequate for ~70% of AI-agent workloads (under 10M vectors). Add a dedicated DB only when needed.
Postgres default
Vertex Vector Search — GCP-native
Managed-GCP · BigQuery integration · enterprise
Google Cloud's managed vector search. Right pick when the team is GCP-native and BigQuery integration matters. Pricing scales with index size + query volume.
GCP-native
Vespa — large-scale hybrid
Open-source · billions of vectors · text + vector
Yahoo's open-source search engine. The production-grade pick for billion-scale hybrid search (vector + structured + text). Operational complexity matches the scale; it pays back when scale demands it.
Massive scale
02 — Matrix
Feature matrix, eight databases.
The matrix below covers seven capabilities that drive 2026 vector-DB decisions: query latency at 10M vectors, scale ceiling, hybrid-search support, metadata filtering, managed-service availability, pricing model, and best-fit deployment pattern.
Query latency at 10M vectors (p99)
Qdrant ~12ms wins among OSS. Pinecone ~10-15ms managed. Weaviate ~16ms. Milvus ~18ms. pgvector ~25-40ms (depends on index type). Vertex ~12ms managed. Vespa ~15ms. Chroma ~30ms (not optimized for ultra-low latency). Picks differ at sub-10ms requirements.
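Figures like these only mean anything against your own workload, so measure p99 — the 99th-percentile per-query latency — in your own harness. A minimal sketch of the computation; the sampled latencies here are synthetic stand-ins for real timing data:

```python
import random
import statistics

# Stand-in for per-query latencies (ms) collected from your own benchmark run.
random.seed(0)
samples = [random.gauss(12.0, 2.0) for _ in range(10_000)]

# statistics.quantiles with n=100 returns the 1st..99th percentile cut points;
# the last one is the 99th percentile (p99).
p99 = statistics.quantiles(samples, n=100)[-1]
print(f"p99 = {p99:.1f} ms")
```

Averages hide tail behavior; two databases with identical mean latency can differ badly at p99, which is what a user-facing agent actually feels.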
Qdrant · Pinecone
Scale ceiling (production-grade)
Vespa + Milvus distributed scale to billions cleanly. Pinecone scales high but cost compounds. Qdrant and Weaviate distributed deployments are competitive. pgvector hits operational friction above ~10-50M vectors, depending on hardware. Chroma cloud is improving; embedded Chroma caps lower.
Vespa · Milvus (1B+) · Pinecone
Hybrid search (vector + BM25 + filter)
Weaviate and Vespa lead with native hybrid composition. Qdrant added strong hybrid in 2024. Pinecone added hybrid and is competitive. Milvus supports hybrid via collections + filtering. pgvector requires manual composition with Postgres full-text search. Chroma has a simpler hybrid story.
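Hybrid search ultimately merges two ranked lists — one from the ANN index, one from BM25. Reciprocal rank fusion (RRF) is the combiner these engines commonly use or approximate; a self-contained sketch, with illustrative doc IDs:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    `k` dampens the weight of top ranks; 60 is the value from the original
    RRF paper and a common engine default.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["doc_a", "doc_c", "doc_b"]   # semantic ranking
keyword_hits = ["doc_b", "doc_a", "doc_d"]   # BM25 ranking (exact matches)
print(rrf([vector_hits, keyword_hits]))      # doc_a first: strong in both lists
```

This is why hybrid wins on diverse content: a document that is merely decent in both rankings beats one that tops a single list, and exact-match hits for IDs and proper nouns survive even when the embedding misses them.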
Weaviate · Vespa · Qdrant
Metadata filtering depth
Qdrant has the strongest filter expressiveness (complex filter syntax, payload indexes). Weaviate strong via GraphQL. Pinecone solid. pgvector inherits Postgres's full SQL filtering — most expressive overall when SQL fits the workload. Milvus competitive.
pgvector (SQL) · Qdrant (filter syntax)
Managed-service availability
Pinecone is managed-only. Vertex Vector is managed-GCP-only. Qdrant Cloud, Weaviate Cloud, Milvus Cloud (Zilliz) all available alongside self-host. Chroma cloud is generally available. pgvector via managed Postgres (Supabase, Neon, RDS, etc.). Vespa managed via Vespa Cloud.
Pinecone (managed-only)
Pricing model
pgvector $0 (Postgres infra cost only). Chroma cloud generous free tier. Qdrant Cloud + Weaviate Cloud usage-based. Milvus / Zilliz cloud usage-based. Pinecone $70+/mo starter; serverless usage-based at scale. Vertex pay-per-query + index size. Vespa usage-based (cloud) or self-host.
pgvector (cheapest at scale)
Best-fit deployment pattern
pgvector: Postgres-anchored teams under 10M vectors. Pinecone: managed-cloud preference, any scale. Qdrant: speed-sensitive OSS deployments. Weaviate: hybrid-search-heavy. Milvus: large-scale OSS. Chroma: prototypes + small-prod. Vertex: GCP-native. Vespa: billion-scale hybrid.
Match deployment pattern
03 — Managed Leaders
Managed leaders — Pinecone and Vertex Vector.
Pinecone and Vertex Vector Search are the managed-cloud leaders. Pinecone is the cross-cloud managed default; Vertex is the GCP-native option for teams committed to Google Cloud. Both remove infrastructure ops; both pay back when the team values not running its own vector DB.
Cross-cloud production default
The cross-cloud managed default. Pods + serverless tiers, generous index sizes, hybrid search, predictable performance. Right pick when managed-cloud preference dominates and AWS/Azure/GCP-agnostic deployment matters.
Cross-cloud
BigQuery + Vertex AI native
Google Cloud's managed vector search. BigQuery integration, Vertex AI ecosystem fit, GCP IAM. Right pick when team is GCP-native and Vertex AI is the broader ML/AI stack. Pricing scales with index + query volume.
GCP-native
Cost at scale
Both managed services have meaningful cost at billion-scale workloads vs self-hosted alternatives (Milvus, Vespa). The cost is a service trade-off — pay more for managed simplicity. At 10M-100M vectors, the cost is competitive; above 1B, evaluate self-host.
Scale-cost trade
"Pinecone is what most teams should default to. pgvector is what most teams should actually use, because most workloads are smaller than people think."
— Internal vector-DB stack retro, March 2026
04 — Open-Source
Open-source — Qdrant, Weaviate, Milvus.
Three open-source vector DBs anchor the production OSS conversation. Qdrant wins on speed (Rust implementation), Weaviate wins on hybrid search and GraphQL API ergonomics, Milvus wins on large-scale distributed deployments. All three have managed-cloud equivalents (Qdrant Cloud, Weaviate Cloud, Zilliz) for teams that want OSS code semantics with managed operations.
Rust-based · speed leader
Latency edge among OSS vector DBs. Strong filter syntax, hybrid search added in 2024, quantization for memory efficiency. Right OSS pick when speed and filter expressiveness dominate. Self-host or Qdrant Cloud.
Speed + filtering
Hybrid + GraphQL
Native hybrid (vector + BM25 + filter) composition. GraphQL API differentiates from REST-first peers. Right pick when hybrid search is the primary workload and GraphQL fits the team's API style.
Hybrid + GraphQL
Large-scale distributed
Distributed deployments scale to billions of vectors. Production large-scale OSS choice. Operational complexity matches the scale; pays back where Pinecone cost compounds. Zilliz cloud for managed equivalent.
Large-scale OSS
05 — Embedded + Postgres
Embedded + Postgres — Chroma and pgvector.
Chroma and pgvector serve adjacent niches the dedicated vector DBs don't. Chroma wins on developer experience for prototyping (embedded mode runs in-process). pgvector wins on operational simplicity for Postgres-anchored teams (same data platform, same backups, same ops). Both are appropriate for ~70% of AI-agent workloads we see in the wild.
Cleanest developer experience
Embedded mode (in-process Python) for prototypes; cloud mode for production. Cleanest 'getting started' path among vector DBs. Right pick when prototype velocity dominates; less ideal for ultra-low latency or billion-scale workloads.
Prototype-first
Postgres-integrated default
If Postgres is the data platform, pgvector is the default vector store. Same backups, same operational tools, same access controls. Adequate for ~70% of AI-agent workloads (under 10M vectors). Add a dedicated vector DB only when scale or workload demands it.
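What pgvector adds to a query is a distance operator — `<=>` is cosine distance — plus an index to accelerate the ORDER BY ... LIMIT k scan over it. A pure-Python equivalent of the distance itself, to make the semantics concrete:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Equivalent of pgvector's `<=>` operator: 1 - cosine similarity.

    0.0 = same direction, 1.0 = orthogonal, 2.0 = opposite direction.
    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (identical direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```

In SQL this becomes `ORDER BY embedding <=> $query LIMIT k`, sharing the same transaction, backup, and access-control machinery as the rest of the application's tables.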
Postgres default
Both cap below dedicated DBs
Chroma's embedded mode caps at small-prod scale; cloud mode scales but doesn't match dedicated DBs. pgvector hits operational friction above 10-50M vectors depending on hardware. Both are right defaults for under-10M; evaluate alternatives above that threshold.
Scale ceiling
06 — Vespa
Vespa — the billion-scale hybrid leader.
Vespa is the production-grade pick for billion-scale hybrid search — vector + structured + text in one engine. Yahoo's open-source search engine has the deepest hybrid-search depth in the field at scale. Operational complexity matches the scale; pays back when scale demands it.
Billion-scale production deployment
Vespa runs production search at Yahoo, Spotify, and similar scale-defining deployments. The scale ceiling is among the field's highest. Right pick when the workload is genuinely massive — vector counts in the billions or query volumes that overwhelm alternatives.
Massive scale
Vector + text + structured native
Vespa was a search engine before vector search was a category. Hybrid composition (vector + BM25 + structured filtering) is native and deep. Right pick for any workload where hybrid search at scale matters most.
Hybrid depth
Operational complexity
Vespa's operational complexity is real — schema configuration, content cluster + container topology, deployment workflows. Pays back at scale; doesn't pay back for sub-10M-vector workloads where Pinecone or pgvector serve better.
Ops-heavy
07 — Reference Workloads
Four reference workloads.
Below are the four AI-agent workloads we deploy most often, with the database recommendation that consistently wins on each. The mapping isn't absolute, but each pairing is the path of least friction.
Small RAG (under 10M vectors, Postgres team)
Most agency-grade RAG workloads. pgvector is the default — under 10M vectors, Postgres-anchored, same backups and ops as the rest of the application data. Don't reach for a dedicated DB unless scale or workload demands it.
pgvector
Large RAG (10M-1B vectors, hybrid search)
Production RAG at scale with hybrid-search needs. Weaviate (open-source, hybrid native) or Pinecone (managed) are the right defaults. Qdrant if speed dominates and self-host fits. Match by managed-vs-OSS preference.
Weaviate · Pinecone · Qdrant
Hybrid search at scale (1B+ vectors + text)
Massive-scale workloads where hybrid search and operational scale dominate. Vespa is the production-grade choice. Milvus distributed is the alternative. Pinecone scales but cost compounds.
Vespa · Milvus
Agent-memory store (long-running, multi-tenant)
Agent-memory store needs metadata-rich filtering, multi-tenant isolation, and durable persistence. pgvector's SQL filtering shines here when scale fits. Qdrant strong if speed + filter syntax matter more. Pinecone for managed simplicity.
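The agent-memory pattern is filter-then-rank: restrict to the tenant's records first, then score by vector distance. A minimal in-memory sketch of that composition — the record shape and field names are illustrative, not any engine's schema:

```python
import math

def cos_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(records, query_vec, tenant_id, k=3):
    """Tenant-scoped nearest-neighbor: metadata filter first, then rank.

    In pgvector this is a WHERE clause + ORDER BY on the distance operator;
    in Qdrant or Pinecone it is a filter passed alongside the query vector.
    """
    candidates = [r for r in records if r["tenant"] == tenant_id]  # isolation
    return sorted(candidates,
                  key=lambda r: cos_sim(r["vec"], query_vec),
                  reverse=True)[:k]

records = [
    {"id": 1, "tenant": "acme",  "vec": [1.0, 0.0]},
    {"id": 2, "tenant": "acme",  "vec": [0.6, 0.8]},
    {"id": 3, "tenant": "other", "vec": [1.0, 0.0]},  # excluded by the filter
]
print([r["id"] for r in search(records, [1.0, 0.0], "acme")])  # [1, 2]
```

The key property is that the filter is enforced before ranking, so a near-perfect vector match in another tenant's data can never leak into the results — which is why filter expressiveness, not raw ANN speed, tends to decide this workload.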
pgvector · Qdrant · Pinecone
08 — Conclusion
Pick by data-platform commitment first.
There is no single best vector database. There are right defaults per data-platform commitment and scale tier.
By April 2026 the vector-database field has consolidated to eight production-grade options across four tiers. The decision dimensions that actually matter — managed vs self-host, scale tier, hybrid-search needs, existing data-platform commitments — outweigh aggregate ANN benchmarks for most teams. There is no "best" vector DB in the abstract; there is the right default for the deployment pattern.
The pattern that scales: pick by data-platform commitment first. Postgres team under 10M vectors → pgvector. Managed-cloud preference, any scale → Pinecone. GCP-native team → Vertex Vector. Hybrid-search-heavy → Weaviate. Speed-dominant OSS → Qdrant. Billion-scale hybrid → Vespa or Milvus. The benchmarks tie-break between adequate options once the platform commitment narrows the field.
The right move for most engineering teams: default to pgvector until scale or workload demands more. Most AI-agent RAG workloads are smaller than they feel; running a separate vector DB adds operational toil that often doesn't pay back. Reach for dedicated vector DBs when the workload genuinely needs what they offer — not before.