Vector databases moved from research curiosity to production necessity in 2023-2024. By 2026 the field has consolidated to eight production-grade options that dominate real AI-agent workloads. The decision dimensions are managed vs self-host, scale tier, hybrid-search depth, and the team's existing data-platform commitments — not headline benchmarks.
We compare eight databases across query latency, scale ceiling, hybrid search, metadata filtering, managed-service availability, and pricing model. Most teams pick by data-platform commitment (pgvector if Postgres-anchored, Pinecone if managed-cloud preference, Vertex if GCP-native) rather than aggregate benchmarks.
This post covers the 7-axis matrix, deep dives by category (managed leaders, open-source primaries, embedded + Postgres, large-scale hybrid), and four reference workloads we run for engineering teams today.
- 01 — Pick by data-platform commitment first; benchmarks are tie-breakers. If Postgres is the data platform, pgvector is the default — running a separate vector DB only justifies itself when scale or workload demands it. If managed-cloud is the preference, Pinecone is the default. If GCP, Vertex Vector. The team's existing platform commitments dominate the decision; ANN benchmarks tie-break between adequate options.
- 02 — Qdrant leads open-source speed: 10-25% faster than Weaviate or Milvus on common workloads. Qdrant's Rust implementation gives it the latency edge among open-source vector DBs. p99 latency at 10M vectors typically lands at ~12ms vs Weaviate's ~16ms and Milvus's ~18ms. The gap matters at high QPS and is less material at low query volumes. The right open-source pick when speed dominates.
- 03 — pgvector is the right default for ~70% of AI-agent workloads. If the workload is under 10M vectors, the team already runs Postgres, and queries don't need ultra-low latency, pgvector is the right default. Same backups, same operational tools, same access controls as the rest of the application data. Add a dedicated vector DB only when scale, hybrid search, or specialized features demand it.
- 04 — Hybrid search (vector + keyword) is the deciding feature for many production deployments. Pure vector search underperforms hybrid (vector + BM25 + metadata filters) on most production workloads — agents need exact matches for proper nouns, version numbers, and IDs while still getting semantic matching. Weaviate, Vespa, and Qdrant ship hybrid search natively. Pinecone added it; pgvector requires manual composition. For agent memory and RAG over diverse content, hybrid search is non-optional.
- 05 — Scale tier matters: under 10M, anything works; 10M-1B, choices narrow; 1B+, Vespa or Milvus. Under 10M vectors, all eight databases perform adequately. Between 10M and 1B, the field narrows to Pinecone (managed), Qdrant, Weaviate, and Milvus (self-host), and Vespa. Above 1B vectors, Vespa and Milvus distributed deployments are the production-grade options. Pinecone scales but cost compounds. Match scale tier to platform; don't over-invest if you'll never cross 10M.
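The five takeaways reduce to a small decision procedure. A minimal sketch of that logic — the function name, platform labels, and tier thresholds are illustrative shorthand for the rules of thumb above, not any library's API:

```python
def pick_default(platform: str, n_vectors: int, needs_hybrid: bool = False) -> str:
    """Map data-platform commitment + scale tier to a default vector DB.

    Encodes the takeaways above; thresholds are approximate and the
    platform labels are this sketch's own convention.
    """
    if n_vectors >= 1_000_000_000:           # 1B+ tier: large-scale options only
        return "Vespa or Milvus (distributed)"
    if platform == "postgres" and n_vectors < 10_000_000:
        return "pgvector"                    # same backups, ops, access controls
    if platform == "gcp":
        return "Vertex Vector Search"        # BigQuery + Vertex AI fit
    if platform == "managed-cloud":
        return "Pinecone"                    # managed default, any scale
    if needs_hybrid:
        return "Weaviate"                    # native vector + BM25 + filters
    return "Qdrant"                          # OSS speed default

print(pick_default("postgres", 2_000_000))          # pgvector
print(pick_default("self-host", 50_000_000, True))  # Weaviate
```

Benchmarks enter only after this narrowing, as tie-breakers between the remaining adequate options.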
01 — The Field
The 2026 vector-DB field.
The vector-database field consolidated rapidly. Eight databases now own the production conversation, split across four tiers: managed leaders (Pinecone, Vertex Vector), open-source primaries (Qdrant, Weaviate, Milvus), embedded + Postgres-integrated (Chroma, pgvector), and large-scale hybrid (Vespa). Each tier serves a different deployment shape; teams default into the tier that matches their existing data-platform commitments.
Pinecone — managed leader
Managed-cloud · pods + serverless · enterprise scale
The managed-cloud default. Predictable performance, generous index sizes, hybrid search added in 2024-2025. Right pick when managed-cloud is the preference and the team values not running infrastructure.
Managed
Qdrant — open-source speed leader
Rust-based · self-host or managed cloud
The Rust implementation gives Qdrant the latency edge among open-source vector DBs. Strong filtering, hybrid search, and quantization. Right OSS pick when speed dominates.
OSS speed
Weaviate — hybrid + GraphQL
Open-source · GraphQL API · hybrid leader
Weaviate's hybrid-search story is among the field's strongest — vector + BM25 + metadata-filtering composition is native. The GraphQL API differentiates it from REST-first peers. Right pick for hybrid-search-heavy workloads.
Hybrid leader
Milvus — large-scale leader
Open-source · distributed · billion-scale capable
Milvus distributed scales to billions of vectors. The production large-scale OSS choice. Operational complexity is real — it pays back at scales where Pinecone cost compounds.
Large-scale OSS
Chroma — DX leader
Embedded + cloud · Python-first · prototype-friendly
The cleanest DX for prototyping. Embedded mode runs in-process; cloud mode for production. Right pick when getting started fast matters more than production scale.
DX-first
pgvector — Postgres default
Postgres extension · runs anywhere · $0 add-on
If Postgres is the data platform, pgvector is the default. Same backups, same ops, same access. Adequate for ~70% of AI-agent workloads (under 10M vectors). Add a dedicated DB only when needed.
Postgres default
Vertex Vector Search — GCP-native
Managed-GCP · BigQuery integration · enterprise
Google Cloud's managed vector search. Right pick when the team is GCP-native and BigQuery integration matters. Pricing scales with index size + query volume.
GCP-native
Vespa — large-scale hybrid
Open-source · billions of vectors · text + vector
Yahoo's open-source search engine. The production-grade pick for billion-scale hybrid search (vector + structured + text). Operational complexity matches the scale; it pays back when scale demands it.
Massive scale
02 — Matrix
Feature matrix, eight databases.
The matrix below covers seven capabilities that drive 2026 vector-DB decisions: query latency at 10M vectors, scale ceiling, hybrid-search support, metadata filtering, managed-service availability, pricing model, and best-fit deployment pattern.
Query latency at 10M vectors (p99)
Qdrant ~12ms wins among OSS. Pinecone ~10-15ms managed. Weaviate ~16ms. Milvus ~18ms. pgvector ~25-40ms (depends on index type). Vertex ~12ms managed. Vespa ~15ms. Chroma ~30ms (not optimized for ultra-low latency). Picks differ at sub-10ms requirements.
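Figures like these only mean anything against your own workload, so measure p99 — the 99th-percentile per-query latency — in your own harness. A minimal sketch of the computation; the sampled latencies here are synthetic stand-ins for real timing data:

```python
import random
import statistics

# Stand-in for per-query latencies (ms) collected from your own benchmark run.
random.seed(0)
samples = [random.gauss(12.0, 2.0) for _ in range(10_000)]

# statistics.quantiles with n=100 returns the 1st..99th percentile cut points;
# the last one is the 99th percentile (p99).
p99 = statistics.quantiles(samples, n=100)[-1]
print(f"p99 = {p99:.1f} ms")
```

Averages hide tail behavior; two databases with identical mean latency can differ badly at p99, which is what a user-facing agent actually feels.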
Qdrant · Pinecone
Scale ceiling (production-grade)
Vespa + Milvus distributed scale to billions cleanly. Pinecone scales high but cost compounds. Qdrant and Weaviate distributed deployments are competitive. pgvector hits operational friction above ~10-50M vectors, depending on hardware. Chroma cloud is improving; embedded Chroma caps lower.
Vespa · Milvus (1B+) · Pinecone
Hybrid search (vector + BM25 + filter)
Weaviate and Vespa lead with native hybrid composition. Qdrant added strong hybrid in 2024. Pinecone added hybrid and is competitive. Milvus supports hybrid via collections + filtering. pgvector requires manual composition with Postgres full-text search. Chroma has a simpler hybrid story.
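Hybrid search ultimately merges two ranked lists — one from the ANN index, one from BM25. Reciprocal rank fusion (RRF) is the combiner these engines commonly use or approximate; a self-contained sketch, with illustrative doc IDs:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    `k` dampens the weight of top ranks; 60 is the value from the original
    RRF paper and a common engine default.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["doc_a", "doc_c", "doc_b"]   # semantic ranking
keyword_hits = ["doc_b", "doc_a", "doc_d"]   # BM25 ranking (exact matches)
print(rrf([vector_hits, keyword_hits]))      # doc_a first: strong in both lists
```

This is why hybrid wins on diverse content: a document that is merely decent in both rankings beats one that tops a single list, and exact-match hits for IDs and proper nouns survive even when the embedding misses them.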
Weaviate · Vespa · Qdrant
Metadata filtering depth
Qdrant has the strongest filter expressiveness (complex filter syntax, payload indexes). Weaviate strong via GraphQL. Pinecone solid. pgvector inherits Postgres's full SQL filtering — most expressive overall when SQL fits the workload. Milvus competitive.
pgvector (SQL) · Qdrant (filter syntax)
Managed-service availability
Pinecone is managed-only. Vertex Vector is managed-GCP-only. Qdrant Cloud, Weaviate Cloud, Milvus Cloud (Zilliz) all available alongside self-host. Chroma cloud is generally available. pgvector via managed Postgres (Supabase, Neon, RDS, etc.). Vespa managed via Vespa Cloud.
Pinecone (managed-only)
Pricing model
pgvector $0 (Postgres infra cost only). Chroma cloud generous free tier. Qdrant Cloud + Weaviate Cloud usage-based. Milvus / Zilliz cloud usage-based. Pinecone $70+/mo starter; serverless usage-based at scale. Vertex pay-per-query + index size. Vespa usage-based (cloud) or self-host.
pgvector (cheapest at scale)
Best-fit deployment pattern
pgvector: Postgres-anchored teams under 10M vectors. Pinecone: managed-cloud preference, any scale. Qdrant: speed-sensitive OSS deployments. Weaviate: hybrid-search-heavy. Milvus: large-scale OSS. Chroma: prototypes + small-prod. Vertex: GCP-native. Vespa: billion-scale hybrid.
Match deployment pattern
03 — Managed Leaders
Managed leaders — Pinecone and Vertex Vector.
Pinecone and Vertex Vector Search are the managed-cloud leaders. Pinecone is the cross-cloud managed default; Vertex is the GCP-native option for teams committed to Google Cloud. Both remove infrastructure ops; both pay back when the team values not running its own vector DB.
Cross-cloud production default
The cross-cloud managed default. Pods + serverless tiers, generous index sizes, hybrid search, predictable performance. Right pick when managed-cloud preference dominates and AWS/Azure/GCP-agnostic deployment matters.
Cross-cloud
BigQuery + Vertex AI native
Google Cloud's managed vector search. BigQuery integration, Vertex AI ecosystem fit, GCP IAM. Right pick when team is GCP-native and Vertex AI is the broader ML/AI stack. Pricing scales with index + query volume.
GCP-native
Cost at scale
Both managed services have meaningful cost at billion-scale workloads vs self-hosted alternatives (Milvus, Vespa). The cost is a service trade-off — pay more for managed simplicity. At 10M-100M vectors, the cost is competitive; above 1B, evaluate self-host.
Scale-cost trade
"Pinecone is what most teams should default to. pgvector is what most teams should actually use, because most workloads are smaller than people think."
— Internal vector-DB stack retro, March 2026
04 — Open-Source
Open-source — Qdrant, Weaviate, Milvus.
Three open-source vector DBs anchor the production OSS conversation. Qdrant wins on speed (Rust implementation), Weaviate wins on hybrid search and GraphQL API ergonomics, Milvus wins on large-scale distributed deployments. All three have managed-cloud equivalents (Qdrant Cloud, Weaviate Cloud, Zilliz) for teams that want OSS code semantics with managed operations.
Rust-based · speed leader
Latency edge among OSS vector DBs. Strong filter syntax, hybrid search added in 2024, quantization for memory efficiency. Right OSS pick when speed and filter expressiveness dominate. Self-host or Qdrant Cloud.
Speed + filtering
Hybrid + GraphQL
Native hybrid (vector + BM25 + filter) composition. GraphQL API differentiates from REST-first peers. Right pick when hybrid search is the primary workload and GraphQL fits the team's API style.
Hybrid + GraphQL
Large-scale distributed
Distributed deployments scale to billions of vectors. Production large-scale OSS choice. Operational complexity matches the scale; pays back where Pinecone cost compounds. Zilliz cloud for managed equivalent.
Large-scale OSS
05 — Embedded + Postgres
Embedded + Postgres — Chroma and pgvector.
Chroma and pgvector serve adjacent niches the dedicated vector DBs don't. Chroma wins on developer experience for prototyping (embedded mode runs in-process). pgvector wins on operational simplicity for Postgres-anchored teams (same data platform, same backups, same ops). Both are appropriate for ~70% of AI-agent workloads we see in the wild.
Cleanest developer experience
Embedded mode (in-process Python) for prototypes; cloud mode for production. Cleanest 'getting started' path among vector DBs. Right pick when prototype velocity dominates; less ideal for ultra-low latency or billion-scale workloads.
Prototype-first
Postgres-integrated default
If Postgres is the data platform, pgvector is the default vector store. Same backups, same operational tools, same access controls. Adequate for ~70% of AI-agent workloads (under 10M vectors). Add a dedicated vector DB only when scale or workload demands it.
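What pgvector adds to a query is a distance operator — `<=>` is cosine distance — plus an index to accelerate the ORDER BY ... LIMIT k scan over it. A pure-Python equivalent of the distance itself, to make the semantics concrete:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Equivalent of pgvector's `<=>` operator: 1 - cosine similarity.

    0.0 = same direction, 1.0 = orthogonal, 2.0 = opposite direction.
    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (identical direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```

In SQL this becomes `ORDER BY embedding <=> $query LIMIT k`, sharing the same transaction, backup, and access-control machinery as the rest of the application's tables.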
Postgres default
Both cap below dedicated DBs
Chroma's embedded mode caps at small-prod scale; cloud mode scales but doesn't match dedicated DBs. pgvector hits operational friction above 10-50M vectors depending on hardware. Both are right defaults for under-10M; evaluate alternatives above that threshold.
Scale ceiling
06 — Vespa
Vespa — the billion-scale hybrid leader.
Vespa is the production-grade pick for billion-scale hybrid search — vector + structured + text in one engine. Yahoo's open-source search engine has the deepest hybrid-search depth in the field at scale. Operational complexity matches the scale; pays back when scale demands it.
Billion-scale production deployment
Vespa runs production search at Yahoo, Spotify, and similar scale-defining deployments. The scale ceiling is among the field's highest. Right pick when the workload is genuinely massive — vector counts in the billions or query volumes that overwhelm alternatives.
Massive scale
Vector + text + structured native
Vespa was a search engine before vector search was a category. Hybrid composition (vector + BM25 + structured filtering) is native and deep. Right pick for any workload where hybrid search at scale matters most.
Hybrid depth
Operational complexity
Vespa's operational complexity is real — schema configuration, content cluster + container topology, deployment workflows. Pays back at scale; doesn't pay back for sub-10M-vector workloads where Pinecone or pgvector serve better.
Ops-heavy
07 — Reference Workloads
Four reference workloads.
Below are the four AI-agent workloads we deploy most often, with the database recommendation that consistently wins on each. The mapping isn't absolute, but each pairing is the path of least friction.
Small RAG (under 10M vectors, Postgres team)
Most agency-grade RAG workloads. pgvector is the default — under 10M vectors, Postgres-anchored, same backups and ops as the rest of the application data. Don't reach for a dedicated DB unless scale or workload demands it.
pgvector
Large RAG (10M-1B vectors, hybrid search)
Production RAG at scale with hybrid-search needs. Weaviate (open-source, hybrid native) or Pinecone (managed) are the right defaults. Qdrant if speed dominates and self-host fits. Match by managed-vs-OSS preference.
Weaviate · Pinecone · Qdrant
Hybrid search at scale (1B+ vectors + text)
Massive-scale workloads where hybrid search and operational scale dominate. Vespa is the production-grade choice. Milvus distributed is the alternative. Pinecone scales but cost compounds.
Vespa · Milvus
Agent-memory store (long-running, multi-tenant)
Agent-memory store needs metadata-rich filtering, multi-tenant isolation, and durable persistence. pgvector's SQL filtering shines here when scale fits. Qdrant strong if speed + filter syntax matter more. Pinecone for managed simplicity.
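The agent-memory pattern is filter-then-rank: restrict to the tenant's records first, then score by vector distance. A minimal in-memory sketch of that composition — the record shape and field names are illustrative, not any engine's schema:

```python
import math

def cos_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(records, query_vec, tenant_id, k=3):
    """Tenant-scoped nearest-neighbor: metadata filter first, then rank.

    In pgvector this is a WHERE clause + ORDER BY on the distance operator;
    in Qdrant or Pinecone it is a filter passed alongside the query vector.
    """
    candidates = [r for r in records if r["tenant"] == tenant_id]  # isolation
    return sorted(candidates,
                  key=lambda r: cos_sim(r["vec"], query_vec),
                  reverse=True)[:k]

records = [
    {"id": 1, "tenant": "acme",  "vec": [1.0, 0.0]},
    {"id": 2, "tenant": "acme",  "vec": [0.6, 0.8]},
    {"id": 3, "tenant": "other", "vec": [1.0, 0.0]},  # excluded by the filter
]
print([r["id"] for r in search(records, [1.0, 0.0], "acme")])  # [1, 2]
```

The key property is that the filter is enforced before ranking, so a near-perfect vector match in another tenant's data can never leak into the results — which is why filter expressiveness, not raw ANN speed, tends to decide this workload.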
pgvector · Qdrant · Pinecone
08 — Conclusion
Pick by data-platform commitment first.
There is no single best vector database. There are right defaults per data-platform commitment and scale tier.
By April 2026 the vector-database field has consolidated to eight production-grade options across four tiers. The decision dimensions that actually matter — managed vs self-host, scale tier, hybrid-search needs, existing data-platform commitments — outweigh aggregate ANN benchmarks for most teams. There is no "best" vector DB in the abstract; there is the right default for the deployment pattern.
The pattern that scales: pick by data-platform commitment first. Postgres team under 10M vectors → pgvector. Managed-cloud preference, any scale → Pinecone. GCP-native team → Vertex Vector. Hybrid-search-heavy → Weaviate. Speed-dominant OSS → Qdrant. Billion-scale hybrid → Vespa or Milvus. The benchmarks tie-break between adequate options once the platform commitment narrows the field.
The right move for most engineering teams: default to pgvector until scale or workload demands more. Most AI-agent RAG workloads are smaller than they feel; running a separate vector DB adds operational toil that often doesn't pay back. Reach for dedicated vector DBs when the workload genuinely needs what they offer — not before.