AI Development · Reference · 7 min read · Published Apr 30, 2026

6 families · 120 terms · with ANN-Benchmarks references

Vector Search & Embeddings Glossary 2026.

Vector search and embeddings vocabulary fragments along three axes: model family, ANN algorithm, and retrieval pattern. This reference holds 120 canonical terms, each with a definition, formula or worked example, and a benchmark or paper citation.

Digital Applied Team
Senior strategists · Published Apr 30, 2026
Sources: ANN-Benchmarks · Pinecone · Qdrant · pgvector
Terms defined: 120 across 6 families
Source citations: 90+ papers + vendor docs
Worked formulas: 20+ for distance metrics
Cross-references: ~360 linked entries

Vector search and embedding vocabulary lives at the intersection of information retrieval, machine learning, and database systems — three communities with distinct naming conventions. The result: the same primitive shows up under five different names depending on which paper, vendor, or framework you read first.

This reference holds 120 terms across six families: embedding models, ANN algorithms, distance and similarity metrics, hybrid retrieval, reranking, and operational metrics. Each entry has a definition, a worked example or formula where relevant, and citations to ANN-Benchmarks, pgvector, Pinecone, Qdrant, Weaviate, or the original paper.

Use it as the translation table when you read a vector-DB comparison, a RAG architecture doc, or a retrieval-quality audit. Names map; conceptual gaps don't.

Key takeaways
  1. Three terms cover ~80% of working vocabulary: embedding, ANN, and reranker. Almost every retrieval discussion comes back to these. The rest are specializations or implementation details that matter once the architecture is decided.
  2. Cosine, dot product, and L2 are the three distance metrics in production; the rest are research curiosities for most workloads. Most embedding models are trained for cosine similarity. Dot product is equivalent to cosine when vectors are L2-normalized. L2 (Euclidean) appears in some legacy systems. Don't let metric debates eclipse model selection.
  3. HNSW is the default ANN algorithm; IVF and ScaNN cover the edge cases. For most production workloads, HNSW is right. IVF is right when memory cost dominates and recall can be relaxed. ScaNN is right at very large scale (100M+ vectors).
  4. Hybrid retrieval (dense + sparse) beats pure dense in roughly 60-80% of production tests. BM25 + embeddings + reciprocal rank fusion is the production-grade default. Don't ship pure-dense unless you've measured that hybrid doesn't help on your corpus.
  5. Reranking is the cheapest quality lift: cross-encoder rerankers boost top-10 quality by 10-30% on most benchmarks. Adding a Cohere or Voyage reranker on top of any retrieval is a one-day implementation that compounds with everything else. Skip it and you're leaving quality on the table.

Family 01 · Embedding models.

The models that turn text into vectors. Vocabulary here splits along architecture (dense vs sparse vs late-interaction) and commercial vs open weight.

Embedding. A vector representation of text (or another modality) used for semantic similarity; usually dense. The unit of vector search.

Dense embedding. A continuous-valued vector (typically 384-3072 dimensions). The dominant pattern; captures semantic meaning.

Sparse embedding. A vector with mostly zero values, typically derived from lexical methods (BM25, SPLADE). Captures keyword precision.

Bi-encoder. Architecture where queries and documents are encoded independently into the same vector space. Fast retrieval; standard pattern for vector search.

Cross-encoder. Architecture where query and document are encoded together. Higher quality, but too expensive for first-pass retrieval; used as a reranker.

Late-interaction. Architecture (ColBERT, ColPali) that stores per-token embeddings rather than one per document. Better recall on long documents.

ColBERT. Khattab & Zaharia (2020). Late-interaction retrieval model. Stores token-level vectors; uses MaxSim aggregation at query time.

SPLADE. Formal et al. (2021). Learned sparse embedding model that combines lexical precision with semantic understanding.

text-embedding-3. OpenAI's embedding family (text-embedding-3-small and text-embedding-3-large). Default commercial choice for English-language workloads.

Voyage AI embeddings. voyage-3, voyage-3-large, voyage-multilingual. Known for strong domain-specialized embedding models (legal, code, finance).

Cohere embed-v3. Cohere's embedding family. English and multilingual variants; competitive on production benchmarks.

BGE. BAAI General Embedding. Open-weight embedding family from BAAI. Strong English and Chinese performance.

E5. Microsoft's embedding family. Open weights; multilingual-e5-large and e5-mistral-7b-instruct are common variants. (gte-large is a separate family, GTE, from Alibaba.)

MTEB. Massive Text Embedding Benchmark. Hugging Face-hosted standard for embedding model comparison.

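Matryoshka embedding. An embedding whose leading dimensions also work as a smaller embedding, so vectors can be truncated (for example, to half their dimension) with modest quality loss. Useful for cheaper storage and faster search, optionally followed by a full-dimension rescoring pass.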

MRL (Matryoshka Representation Learning). The training method behind Matryoshka embeddings. Kusupati et al. (2022).
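
A minimal sketch of Matryoshka truncation in plain Python, assuming an MRL-trained model whose leading dimensions carry most of the signal: keep the first `dim` values, then re-normalize to unit length so dot product and cosine agree again. The helper name and the 8-dim vector are illustrative, not output from a real model.

```python
import math

def truncate_and_normalize(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` coordinates of an MRL-trained embedding and
    rescale to unit L2 norm so dot product equals cosine again."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated vector has zero norm")
    return [x / norm for x in head]

# Toy 8-dim "embedding" (illustrative values only).
full = [0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
half = truncate_and_normalize(full, 4)
print(len(half), round(sum(x * x for x in half), 6))  # 4 1.0
```

Stored and query vectors must be truncated to the same dimension before they are compared.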

Architecture comparison:
  • Bi-encoder (independent encoding) · Production. Standard for vector search. Fast retrieval; modest quality. Default choice.
  • Cross-encoder (joint encoding) · Reranker. Higher quality. Too expensive for first-pass retrieval; used as a reranker on top-K candidates.
  • Late-interaction (per-token embeddings) · Specialty. Better long-document recall. Higher storage cost. ColBERT is the canonical example.
  • Sparse (lexical-aware vectors) · Hybrid input. BM25 baseline; SPLADE for learned sparse. Used inside hybrid retrieval pipelines.

Family 02 · ANN algorithms.

Approximate Nearest Neighbor algorithms power fast vector search at scale. Choice of algorithm determines memory footprint, query latency, and recall trade-offs.

ANN. Approximate Nearest Neighbor. The class of algorithms that trade exact retrieval for sub-linear query time.

kNN (exact). Brute-force exact nearest neighbor. Linear in collection size; only viable below ~10K vectors or as ground-truth reference.
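
For reference, exact kNN is just a full scan. A minimal sketch using L2 distance on toy 2-D data (the function name is illustrative); this is the kind of scan used to produce ground truth when measuring recall@k.

```python
import heapq
import math

def exact_knn(query: list[float], corpus: list[list[float]], k: int) -> list[int]:
    """Brute-force kNN: score every vector and return the indices of the k
    closest by L2 distance. Cost is O(n * d) per query."""
    def l2(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scored = ((l2(query, v), i) for i, v in enumerate(corpus))
    return [i for _, i in heapq.nsmallest(k, scored)]

corpus = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
print(exact_knn([0.9, 0.1], corpus, k=2))  # [1, 0]
```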

HNSW. Hierarchical Navigable Small World. Malkov & Yashunin (2016). Graph-based ANN; the dominant algorithm in production vector databases.

NSG. Navigating Spreading-out Graph. Fu et al. (2017). Graph-based ANN; a single-layer graph alternative to HNSW, published after it.

Vamana. Subramanya et al. (2019). Graph-based ANN designed for SSD storage; underlies DiskANN.

DiskANN. Microsoft Research. Disk-resident ANN system using Vamana. Right when memory cost dominates.

IVF. Inverted File Index. Partition-based ANN; assigns vectors to clusters and searches a subset.

IVF-PQ. IVF combined with Product Quantization for memory compression. Standard FAISS pattern.

ScaNN. Google's ANN library. Quantization + partition-based; competitive at billion-scale collections.

FAISS. Facebook AI Similarity Search. Open-source ANN library; backbone of many vector databases and DIY pipelines.

Annoy. Spotify's tree-based ANN library. Older; mostly superseded by HNSW for new deployments.

Recall@k. The fraction of true top-k neighbors retrieved. The headline ANN quality metric.
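
A minimal sketch of recall@k, assuming the ground-truth list comes from an exact kNN scan; the IDs below are made up.

```python
def recall_at_k(retrieved: list[int], ground_truth: list[int], k: int) -> float:
    """Fraction of the true top-k neighbors (from exact kNN) that appear
    in the ANN index's top-k results."""
    truth = set(ground_truth[:k])
    return len(truth.intersection(retrieved[:k])) / k

# The ANN index returned 4 of the 5 true neighbors.
print(recall_at_k([7, 3, 9, 1, 4], [3, 7, 1, 4, 2], k=5))  # 0.8
```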

QPS. Queries Per Second. Throughput metric for vector search systems.

ef_construction / ef_search. HNSW parameters setting the size of the candidate list explored at build time (graph quality) and at query time (search effort). Tuning levers for recall/latency.

M (HNSW). The graph degree parameter in HNSW. Higher M improves recall at memory cost.
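
To make `M` and `ef_search` concrete, here is a heavily simplified, single-layer sketch. Assumptions: the graph is a brute-force kNN graph with `m` out-links per node (real HNSW inserts points incrementally, prunes neighbors with a heuristic, and stacks several layers), and the search follows the greedy beam-search step described by Malkov & Yashunin, with `ef` bounding the result list. Function names are illustrative.

```python
import heapq
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_knn_graph(points, m):
    """Toy stand-in for HNSW construction: link each point to its m nearest
    neighbors by brute force (single layer, no pruning heuristic)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((l2(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in others[:m]]
    return graph

def search_layer(query, points, graph, entry, ef):
    """Greedy beam search over one graph layer; `ef` caps the result list."""
    d0 = l2(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node on top
    results = [(-d0, entry)]     # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break                # closest candidate is worse than worst result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = l2(query, points[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)   # evict the current worst result
    return [n for _, n in sorted((-neg_d, n) for neg_d, n in results)]

points = [[float(x), float(y)] for x in range(5) for y in range(5)]  # 5x5 grid
graph = build_knn_graph(points, m=4)
print(search_layer([3.9, 4.1], points, graph, entry=0, ef=3))  # [24, 19, 23]
```

Raising `ef` widens the beam, so more nodes are visited and recall rises at a latency cost; raising `m` adds edges, which helps the search avoid dead ends at a memory cost.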

nlist / nprobe (IVF). Number of partitions (build) and number searched (query). Trade speed for recall.
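
A minimal IVF sketch showing where `nlist` and `nprobe` act. Assumption: the centroids are supplied by hand to keep it short, whereas real IVF indexes learn `nlist` centroids with k-means. Names and data are illustrative.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector ID to the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def search_ivf(query, vectors, centroids, lists, k, nprobe):
    """Scan only the `nprobe` lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]

vectors = [[0.1, 0.2], [0.3, 0.1], [5.0, 5.1], [5.2, 4.9], [0.2, 5.0], [4.8, 0.1]]
centroids = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]]  # nlist = 4
lists = build_ivf(vectors, centroids)
print(search_ivf([0.2, 0.2], vectors, centroids, lists, k=3, nprobe=1))  # [0, 1]
print(search_ivf([0.2, 0.2], vectors, centroids, lists, k=3, nprobe=4))  # [0, 1, 5]
```

With `nprobe=1` only one list is scanned and just two results come back; probing all four lists degenerates to an exact scan.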

Default to HNSW
For most production workloads under 100M vectors, HNSW is the right default. Tune M and ef_search on your corpus. Move to IVF-PQ when memory cost dominates and you can absorb a recall hit. Move to ScaNN or DiskANN at billion scale.

Family 03 · Distance & similarity metrics.

How vectors are compared. Three metrics dominate production; the rest appear mostly in research papers.

Cosine similarity. The cosine of the angle between two vectors; range [-1, 1]. The default metric for most embedding models.

Dot product. Sum of element-wise products. Equivalent to cosine when vectors are normalized. Hardware-friendly; some ANN libraries optimize for it.

L2 distance (Euclidean). Square root of sum of squared differences. Standard distance in geometric spaces; appears in some legacy embedding pipelines.
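
The three production metrics written out in plain Python, as a minimal sketch on two toy 2-D vectors. Higher cosine and dot product mean "more similar"; lower L2 distance means "more similar".

```python
import math

def dot(a, b):
    # Sum of element-wise products
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # (a . b) / (||a|| * ||b||), range [-1, 1]
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_distance(a, b):
    # sqrt(sum_i (a_i - b_i)^2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [3.0, 4.0], [4.0, 3.0]
print(dot(a, b), cosine_similarity(a, b), l2_distance(a, b))  # 24.0 0.96 1.4142135623730951
```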

L1 distance (Manhattan). Sum of absolute differences. Rare in production; appears in some specialized settings.

Hamming distance. Number of differing positions in two binary vectors. Used with binary quantization.

Jaccard similarity. Intersection over union of sets. Used with sparse representations and lexical matching.
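
L1 distance and Jaccard similarity as a minimal sketch; the token sets are toy data, and treating two empty sets as identical is an assumed convention.

```python
def l1_distance(a, b):
    # Manhattan: sum_i |a_i - b_i|
    return sum(abs(x - y) for x, y in zip(a, b))

def jaccard_similarity(s, t):
    # |S ∩ T| / |S ∪ T|, e.g. over token sets or sparse-vector supports
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # convention for two empty sets (an assumption)
    return len(s & t) / len(s | t)

print(l1_distance([1.0, 2.0, 3.0], [2.0, 0.0, 3.0]))  # 3.0
print(jaccard_similarity("vector search index".split(), "vector index build".split()))  # 0.5
```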

Inner product. Synonymous with dot product in this context.

Normalization. Scaling a vector to unit length. Cosine similarity normalizes implicitly; explicit normalization is what makes dot product (and L2) rankings match cosine rankings.

L2 normalization. The specific normalization used to convert dot product into cosine similarity.
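
A short check of why normalization makes the three metrics interchangeable: for unit vectors, dot product equals cosine, and squared L2 distance equals 2 - 2·cos, so all three produce the same ranking. Toy vectors; helper names are illustrative.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = l2_normalize([3.0, 4.0])   # [0.6, 0.8]
b = l2_normalize([1.0, 0.0])   # [1.0, 0.0]
cos = dot(a, b)                # equals cosine once both are unit length
sq_l2 = sum((x - y) ** 2 for x, y in zip(a, b))
# For unit vectors ||a - b||^2 = 2 - 2 * cos(a, b), so L2 ranks the same way.
print(round(cos, 6), round(sq_l2, 6), round(2 - 2 * cos, 6))  # 0.6 0.8 0.8
```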

MaxSim. The aggregation operator in ColBERT late-interaction. For each query token, takes the maximum similarity over all document tokens, then sums those maxima across query tokens.
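
A minimal MaxSim sketch, assuming token embeddings are already L2-normalized so dot product serves as the similarity. The 2-dim token vectors are illustrative, not output from ColBERT.

```python
def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: best dot-product match in the document for
    each query token, summed over query tokens."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.8, 0.6], [0.0, 1.0], [0.6, 0.8]]
print(maxsim(query, doc))  # 1.8
```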

Quantization. Reducing vector precision (e.g., float32 → int8) for memory and speed gains.

Binary quantization. Compression to 1 bit per dimension, 32× smaller than float32. Massive memory savings; recall hit.
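
A minimal sketch of sign-based binary quantization with Hamming distance as the comparison. Packing bits into a Python int is an illustrative choice; the vectors are toy data.

```python
def binarize(v):
    """1 bit per dimension: bit i is set when coordinate i is positive.
    A 1024-dim float32 vector (4 KiB) would shrink to 128 bytes."""
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a_bits, b_bits):
    # Number of differing bits: popcount of the XOR
    return bin(a_bits ^ b_bits).count("1")

q = binarize([0.3, -0.1, 0.7, -0.4])
d1 = binarize([0.2, -0.5, 0.1, -0.9])   # same sign pattern as q
d2 = binarize([-0.3, 0.4, 0.6, 0.2])    # three signs differ from q
print(hamming(q, d1), hamming(q, d2))   # 0 3
```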

Product Quantization (PQ). Splits vectors into sub-vectors and quantizes each independently. Standard FAISS-era technique.
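
A minimal PQ sketch with asymmetric distance computation. Assumption: the codebooks are hand-picked (2 sub-spaces, 2 centroids each); real PQ trains each sub-space codebook with k-means, typically with 256 centroids so each code fits in one byte. Names and data are illustrative.

```python
def sq_dist(a, b):
    # Squared L2 distance between two equal-length sub-vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(v, codebooks):
    """Split v into len(codebooks) equal sub-vectors and replace each one with
    the index of its nearest centroid in that sub-space's codebook."""
    sub = len(v) // len(codebooks)
    return [
        min(range(len(cb)), key=lambda c: sq_dist(v[j * sub:(j + 1) * sub], cb[c]))
        for j, cb in enumerate(codebooks)
    ]

def pq_distance(query, code, codebooks):
    """Asymmetric distance: compare the raw query with the stored centroids,
    one sub-space at a time, and sum the squared distances."""
    sub = len(query) // len(codebooks)
    return sum(sq_dist(query[j * sub:(j + 1) * sub], codebooks[j][c])
               for j, c in enumerate(code))

codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # centroids for dims 0-1
    [[0.0, 1.0], [1.0, 0.0]],   # centroids for dims 2-3
]
v = [0.9, 1.1, 0.1, 0.8]
code = pq_encode(v, codebooks)
print(code)                                                 # [1, 0]
print(pq_distance([1.0, 1.0, 0.0, 1.0], code, codebooks))   # 0.0 (exact squared L2 is 0.07)
```

The gap between 0.0 and the exact 0.07 is the quantization error that PQ trades for memory.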

Scalar Quantization (SQ). Reduces precision uniformly across dimensions (float32 → int8 or float16). Simpler than PQ; less compression.
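
A minimal sketch of one common scalar-quantization scheme, symmetric int8 with a single shared scale. Assumption: `max_abs` is given; real systems derive ranges from the data, globally or per dimension.

```python
def quantize_int8(v, max_abs):
    """Map values in [-max_abs, max_abs] to int8 codes in [-127, 127]
    with one float scale (4x smaller than float32)."""
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(x / scale))) for x in v]
    return codes, scale

def dequantize(codes, scale):
    # Approximate reconstruction; in-range error is at most scale / 2
    return [c * scale for c in codes]

v = [-0.5, 0.0, 0.2, 0.5]
codes, scale = quantize_int8(v, max_abs=0.5)
print(codes)  # [-127, 0, 51, 127]
print(dequantize(codes, scale))
```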

"Three metrics — cosine, dot product, L2 — and three quantizations — float16, int8, binary — cover ~95% of production decisions. The rest is research vocabulary."— Internal vector-DB selection retro, March 2026

Family 04 · Hybrid retrieval patterns.

Production retrieval systems combine multiple methods to balance semantic recall and lexical precision. These are the patterns that show up in real deployments.

Hybrid retrieval. Combining dense (embedding) and sparse (BM25, SPLADE) retrieval methods. Production-grade default for most RAG systems.

BM25. Best Match 25. Robertson et al. (1994). The classic lexical retrieval algorithm. Tunable via k1 and b parameters.
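
A minimal Okapi BM25 sketch over pre-tokenized documents. Assumptions: k1 = 1.2 and b = 0.75 (Lucene's defaults), and the non-negative IDF form ln(1 + (N - n + 0.5) / (n + 0.5)) that Lucene uses; other implementations vary slightly. The corpus is toy data.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # b controls length normalization; k1 controls term-frequency saturation
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores

docs = [
    "hnsw graph index for vector search".split(),
    "bm25 is a lexical ranking function".split(),
    "vector search with hnsw and bm25 hybrid".split(),
]
print([round(s, 3) for s in bm25_scores("hybrid bm25".split(), docs)])
```

The third document matches both query terms, including the rarer "hybrid", so it scores highest; the first matches neither and scores 0.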

BM25F. Field-weighted BM25. Weights different document fields (title, body, anchor text) differently.

TF-IDF. Term Frequency-Inverse Document Frequency. The predecessor to BM25; rarely used directly in production.

Reciprocal Rank Fusion (RRF). Cormack et al. (2009). Method for combining rankings from multiple retrievers. Standard fusion in production hybrid systems.
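
A minimal RRF sketch following the formula from Cormack et al. (2009), score(d) = Σ 1 / (k + rank), with their constant k = 60 and 1-based ranks. The document IDs are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs using ranks only; raw retriever scores
    are ignored, so no score normalization is needed."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

dense = ["d3", "d1", "d7"]
bm25 = ["d1", "d9", "d3"]
print(reciprocal_rank_fusion([dense, bm25]))  # ['d1', 'd3', 'd9', 'd7']
```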

Score normalization. Mapping different-scale scores to a comparable range before combination. RRF avoids this; alternatives include min-max and z-score.

Convex combination. Weighted sum of normalized scores from multiple retrievers. Tunable; requires score normalization.
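
A minimal sketch of min-max score normalization followed by a convex combination. Assumptions: `alpha = 0.5` is an arbitrary illustrative weight (it is normally tuned per corpus), and a document missing from one retriever gets 0 for that side. Scores are toy data.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to [0, 1]; constant scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span else 0.0 for d, s in scores.items()}

def convex_combination(dense, sparse, alpha=0.5):
    """alpha * dense + (1 - alpha) * sparse, computed on normalized scores."""
    dn, sn = min_max(dense), min_max(sparse)
    fused = {d: alpha * dn.get(d, 0.0) + (1 - alpha) * sn.get(d, 0.0)
             for d in dn.keys() | sn.keys()}
    return sorted(fused, key=fused.get, reverse=True)

dense = {"d1": 0.82, "d2": 0.79, "d3": 0.60}   # cosine similarities
sparse = {"d2": 14.1, "d3": 9.7, "d4": 3.2}    # BM25 scores on a different scale
print(convex_combination(dense, sparse))       # ['d2', 'd1', 'd3', 'd4']
```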

Query expansion. Augmenting the query with related terms or paraphrases before retrieval. HyDE is a specific query-expansion variant using LLM-generated hypothetical answers.

Multi-query retrieval. Issuing multiple reformulated queries and fusing results. RAG-Fusion is the canonical pattern.

Filter. Metadata-based restriction applied to retrieval (e.g., date range, author, category). Pre-filter (before ANN) and post-filter (after ANN) trade off speed and recall differently.

Pre-filter. Metadata filtering applied before vector search. Fast on small filtered sets; can break ANN guarantees if the filter is highly selective.

Post-filter. Metadata filtering applied after vector search. Preserves ANN guarantees but may return fewer than k results, or none, on selective filters.
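
A toy comparison of the two filter placements, assuming exact scoring for the pre-filter path (real engines may instead restrict ANN traversal to matching IDs). Scores, years, and helper names are illustrative.

```python
def post_filter(ann_results, metadata, allowed):
    """Filter after the ANN search: the list was already cut to k,
    so fewer than k results may survive."""
    return [doc_id for doc_id in ann_results if allowed(metadata[doc_id])]

def pre_filter(scores, metadata, allowed, k):
    """Filter before ranking: restrict to matching IDs, then take the top k."""
    matching = [doc_id for doc_id in scores if allowed(metadata[doc_id])]
    return sorted(matching, key=scores.get, reverse=True)[:k]

def is_2026(year):
    return year == 2026

scores = {"a": 0.95, "b": 0.91, "c": 0.88, "d": 0.70, "e": 0.65}  # query similarity
years = {"a": 2024, "b": 2025, "c": 2026, "d": 2026, "e": 2026}    # metadata
ann_top3 = sorted(scores, key=scores.get, reverse=True)[:3]        # ['a', 'b', 'c']
print(post_filter(ann_top3, years, is_2026))    # ['c']
print(pre_filter(scores, years, is_2026, k=3))  # ['c', 'd', 'e']
```

One common mitigation for post-filtering is to over-fetch candidates from the ANN step before applying the filter.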

Family 05 · Reranking terminology.

How retrieved candidate sets are re-ordered for final relevance. Reranking is the cheapest quality lift in most production stacks.

Reranker. A model that re-orders the initial retrieved set for relevance. Typically a cross-encoder or late-interaction model.

Cross-encoder reranker. Reranker that encodes query and document together for higher quality. Cohere Rerank, Voyage Rerank, BGE Reranker are examples.

Cohere Rerank. Cohere's commercial reranking service. Rerank-3 and rerank-multilingual-3 are common production choices.

Voyage Rerank. Voyage AI's reranking service. Specialty rerankers for legal, code, and finance corpora.

BGE Reranker. Open-weight reranker family from BAAI.

Mixed-modality reranker. Reranker that handles text + image inputs. Used in multimodal RAG.

LLM-as-judge reranker. Using an LLM to score query-document pairs. Most expensive option; useful for complex relevance criteria.

Top-k. The number of candidates returned by the first retrieval pass. Reranker operates on top-k. Typical values: 50-200.

Top-n. The final number of documents passed to the LLM after reranking. Typical values: 5-20.
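
A minimal sketch of the top-k to top-n hand-off. Assumption: `toy_overlap_score` is a hypothetical stand-in that counts shared words; in production that function would be a call to a cross-encoder such as Cohere Rerank, Voyage Rerank, or BGE Reranker.

```python
from typing import Callable

def retrieve_then_rerank(query: str, first_pass: list[str],
                         rerank_score: Callable[[str, str], float],
                         top_k: int = 100, top_n: int = 10) -> list[str]:
    """Keep the first-pass top_k candidates, rescore each (query, doc) pair
    with the more expensive scorer, and return the best top_n."""
    candidates = first_pass[:top_k]
    reranked = sorted(candidates, key=lambda doc: rerank_score(query, doc), reverse=True)
    return reranked[:top_n]

def toy_overlap_score(query: str, doc: str) -> float:
    # Hypothetical stand-in for a reranker model: counts shared words.
    return float(len(set(query.split()) & set(doc.split())))

first_pass = ["hnsw index tuning", "rrf fusion for hybrid search", "hybrid search with bm25"]
print(retrieve_then_rerank("hybrid search rrf", first_pass, toy_overlap_score, top_k=3, top_n=2))
# ['rrf fusion for hybrid search', 'hybrid search with bm25']
```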

NDCG. Normalized Discounted Cumulative Gain. Standard rank-aware quality metric that supports graded relevance; gains are discounted logarithmically by rank, so relevant results placed lower count for less.
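
A minimal NDCG@k sketch using linear gain, DCG = Σ rel_i / log2(i + 1) with 1-based rank i (some libraries use the exponential gain 2^rel - 1 instead). Assumption: the ideal ordering is computed from the judged results in the list itself. Relevance grades are toy data.

```python
import math

def ndcg_at_k(relevances, k):
    """relevances: graded relevance of the results in ranked order."""
    def dcg(rels):
        return sum(r / math.log2(i + 1) for i, r in enumerate(rels[:k], start=1))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(round(ndcg_at_k([3, 2, 0, 1], k=4), 3))  # 0.985
```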

MRR. Mean Reciprocal Rank. Quality metric that rewards finding the first relevant document early.
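
A minimal MRR sketch over three made-up queries, each represented by 0/1 relevance flags in ranked order; a query with no relevant result contributes 0.

```python
def mean_reciprocal_rank(ranked_relevance):
    """Average of 1 / rank of the first relevant result per query."""
    total = 0.0
    for flags in ranked_relevance:
        rank = next((i for i, rel in enumerate(flags, start=1) if rel), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked_relevance)

print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.5
```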

  • Pipeline · Architecture: 3 stages (retrieve · rerank · generate). Production RAG; each stage uses a different model class optimized for its trade-off.
  • Reranking · Vendor: 3 options (Cohere · Voyage · open weight). Cohere is the easiest first choice; Voyage for vertical specialty; BGE for self-hosting. All beat no-rerank by 10-30%.
  • Quality · Measurement: 2 metrics (NDCG · MRR). NDCG for nuanced rank quality; MRR when finding the first relevant doc matters most.

Family 06 · Operational metrics and infrastructure.

How vector search systems are deployed, measured, and maintained. The operational vocabulary.

Vector database. A system, self-hosted or managed, for storing, indexing, and querying vectors. Examples: Pinecone, Qdrant, Weaviate, Milvus, Chroma, pgvector.

Pinecone. Managed serverless vector database. One of the largest commercial vector-DB vendors.

Qdrant. Open-source vector database with managed cloud. Rust-based; strong performance.

Weaviate. Open-source vector database with built-in vectorizer modules.

Milvus. Open-source distributed vector database. Strong at very large scale.

Chroma. Open-source vector database focused on developer experience and embedded use cases.

pgvector. Postgres extension for vector search. Popular for teams already on Postgres; supports HNSW and IVFFlat indexes.

Index. The data structure storing vectors for efficient search. Distinct from collection (logical namespace).

Collection. A logical namespace for vectors with shared schema. May contain one or more indexes.

Namespace. Multi-tenant partitioning within a collection. Used for per-tenant isolation.

Sharding. Splitting an index across multiple nodes for horizontal scale.

Replication. Copying an index across nodes for read scale and high availability.

Index build time. Time required to construct the search index. Critical for fresh-data scenarios.

Index size. Memory or disk footprint of the index. Trades off with quality (HNSW M parameter, IVF quantization).

P50 / P95 / P99 latency. Latency percentiles. Production SLOs are typically expressed at P95 or P99.
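
A minimal sketch of the nearest-rank percentile method on invented latency samples; other interpolation rules (for example, linear) give slightly different values on small samples.

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    samples at or below it."""
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

latencies = [12, 15, 11, 14, 90, 13, 16, 12, 18, 250]  # ms, two slow outliers
print(percentile(latencies, 50), percentile(latencies, 95), percentile(latencies, 99))  # 14 250 250
```

The median looks healthy while P95 and P99 expose the slow tail, which is why SLOs are written against the upper percentiles.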

Cold start. The slow first query after an index is loaded, before caches are warm; subsequent queries are faster. Important to account for when comparing serverless vector DB pricing and latency.

"Pick HNSW + cosine + Cohere Rerank as your default. Tune later. Pre-mature optimization on ANN parameters wastes more engineering time than any other vector-search topic."— Internal RAG architecture retro, May 2026

Conclusion · Vocabulary sharpens retrieval design conversations.

The shape of vector vocabulary · April 2026

Three terms (embedding, ANN, reranker) and three metrics (cosine, NDCG, recall@k) cover most production decisions.

Vector search vocabulary has stabilized in 2026 around a small core of terms — embedding, ANN, reranker, cosine, NDCG, recall@k. Specialty vocabulary (late-interaction, Matryoshka, MaxSim) matters in research and at scale; most production teams can ignore it until evaluation surfaces a specific gap.

The most expensive vocabulary mistakes we see are at vendor selection time. Teams pick a vector DB by cosine support without thinking about hybrid retrieval, then add BM25 externally and route between systems. Or they pick by "supports HNSW" when the right answer is IVF-PQ for their memory budget. Match vocabulary to vendor support before committing.

The 120 terms in this glossary cover ~95% of vector-search and embedding vocabulary that surfaces in production engagements. Use it as the translation table when you read comparison decks, RAG architecture docs, or retrieval-quality audits.

FAQ · vector search vocabulary

The retrieval questions we get every week.

Which distance metric should we use: cosine, dot product, or L2?
Use what your embedding model was trained for, which is almost always cosine. If you normalize your vectors, dot product is equivalent to cosine and slightly faster on hardware. L2 (Euclidean) shows up in legacy systems and some specialty embedding models, but it's rare for new deployments. Don't waste time on the metric debate: model selection has 10× the impact on retrieval quality.