Vector search and embedding vocabulary lives at the intersection of information retrieval, machine learning, and database systems — three communities with distinct naming conventions. The result: the same primitive shows up under five different names depending on which paper, vendor, or framework you read first.

This reference holds 120 terms across six families: embedding models, ANN algorithms, distance and similarity metrics, hybrid retrieval, reranking, and operational metrics. Each entry has a definition, a worked example or formula where relevant, and citations to ANN-Benchmarks, pgvector, Pinecone, Qdrant, Weaviate, or the original paper.

Use it as the translation table when you read a vector-DB comparison, a RAG architecture doc, or a retrieval-quality audit. Names map; conceptual gaps don't.

Key takeaways

01. Three terms cover ~80% of working vocabulary: embedding, ANN, and reranker. Almost every retrieval discussion comes back to these. The rest are specializations or implementation details that matter once you have the architecture decided.

02. Cosine, dot product, and L2 are the three distance metrics in production. The rest are research curiosities for most workloads. Most embedding models are trained for cosine similarity. Dot product matches cosine when vectors are normalized. L2 (Euclidean) appears in some legacy systems. Don't let metric debates eclipse model selection.

03. HNSW is the default ANN algorithm. IVF and ScaNN cover the edge cases. For most production workloads, HNSW is right. IVF is right when memory cost dominates and recall can be relaxed. ScaNN is right at very large scale (100M+ vectors).

04. Hybrid retrieval (dense + sparse) beats pure dense in roughly 60-80% of production tests. BM25 + embeddings + reciprocal rank fusion is the production-grade default. Don't ship pure-dense unless you've measured that hybrid doesn't help on your corpus.

05. Reranking is the cheapest quality lift. Cross-encoder rerankers boost top-10 quality by 10-30% on most benchmarks. Adding a Cohere or Voyage reranker on top of any retrieval is a one-day implementation that compounds with everything else. Skip this and you're leaving quality on the table.

Family 01: Embedding models.

The models that turn text into vectors. Vocabulary here splits along architecture (dense vs sparse vs late-interaction) and commercial vs open weight.

Embedding. A dense vector representation of text (or other modality) used for semantic similarity. The unit of vector search.

Dense embedding. A continuous-valued vector (typically 384-3072 dimensions). The dominant pattern; captures semantic meaning.

Sparse embedding. A vector with mostly zero values, with one dimension per vocabulary term. Derived from lexical weighting (BM25) or a learned sparse model (SPLADE). Captures keyword precision.

Bi-encoder. Architecture where queries and documents are encoded independently into the same vector space. Fast retrieval; standard pattern for vector search.

Cross-encoder. Architecture where query and document are encoded together. Higher quality; too expensive for first-pass retrieval. Used as reranker.

Late-interaction. Architecture (ColBERT, ColPali) that stores per-token embeddings rather than one per document. Better recall on long documents.

ColBERT. Khattab & Zaharia (2020). Late-interaction retrieval model. Stores token-level vectors; uses MaxSim aggregation at query time.

SPLADE. Formal et al. (2021). Learned sparse embedding model that combines lexical precision with semantic understanding.

text-embedding-3. OpenAI's embedding family (text-embedding-3-small and text-embedding-3-large). Default commercial choice for English-language workloads.

Voyage AI embeddings. voyage-3, voyage-3-large, voyage-multilingual. Strong specialty (legal, code, finance) embedding models.

Cohere embed-v3. Cohere's embedding family. English and multilingual variants; competitive on production benchmarks.

BGE. BAAI General Embedding. Open-weight embedding family from BAAI. Strong English and Chinese performance.

E5. Microsoft's embedding family. Open weights; the e5-mistral-7b-instruct and multilingual-e5-large variants are common.

MTEB. Massive Text Embedding Benchmark. Hugging Face-hosted standard for embedding model comparison.

Matryoshka embedding. Embedding trained so that a leading prefix of the vector is itself a usable lower-dimensional embedding. Useful when you want to store full-dim and query at half-dim for speed.

MRL (Matryoshka Representation Learning). The training method behind Matryoshka embeddings. Kusupati et al. (2022).
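A minimal sketch of the "store full-dim, query at half-dim" idea from the Matryoshka entry: keep a prefix of the vector and rescale it to unit length. The function name and the toy 4-dimensional values are illustrative assumptions, and truncation like this is only expected to preserve quality for models trained with MRL.

```python
import math

def truncate_and_renormalize(vec, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

# Toy 4-dim "embedding" (hypothetical values, not from a real model).
full = [0.5, 0.5, 0.5, 0.5]
half = truncate_and_renormalize(full, 2)  # 2-dim unit vector
```

Renormalizing after truncation matters when the index compares vectors by dot product, since the shortened prefix no longer has unit length on its own.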

Architecture: Bi-encoder (independent encoding). Standard for vector search. Fast retrieval; modest quality. Default choice. Tag: production.

Architecture: Cross-encoder (joint encoding). Higher quality. Too expensive for first-pass; used as reranker on top-K candidates. Tag: reranker.

Architecture: Late-interaction (per-token embeddings). Better long-document recall. Higher storage cost. ColBERT is the canonical example. Tag: specialty.

Architecture: Sparse (lexical-aware vectors). BM25 baseline; SPLADE for learned sparse. Used inside hybrid retrieval pipelines. Tag: hybrid input.

Family 02: ANN algorithms.

Approximate Nearest Neighbor algorithms power fast vector search at scale. Choice of algorithm determines memory footprint, query latency, and recall trade-offs.

ANN. Approximate Nearest Neighbor. The class of algorithms that trade exact retrieval for sub-linear query time.

kNN (exact). Brute-force exact nearest neighbor. Linear in collection size; only viable below ~10K vectors or as ground-truth reference.
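As a reference point, exact kNN can be sketched in a few lines: score every stored vector against the query and keep the best k. The names exact_knn and corpus are illustrative, and cosine similarity is an assumed choice of metric.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def exact_knn(query, vectors, k):
    """Score every stored vector (linear scan) and return the ids of the k most similar."""
    order = sorted(range(len(vectors)),
                   key=lambda i: cosine(query, vectors[i]),
                   reverse=True)
    return order[:k]

corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
top2 = exact_knn([1.0, 0.0], corpus, k=2)  # ids of the two closest vectors
```

The result of a scan like this is what an ANN index is measured against when computing recall.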

HNSW. Hierarchical Navigable Small World. Malkov & Yashunin (2016). Graph-based ANN; the dominant algorithm in production vector databases.

NSG. Navigating Spreading-out Graph. Fu et al. (2017). Graph-based ANN built on a single-layer graph with a fixed entry point; a contemporary alternative to HNSW.

Vamana. Subramanya et al. (2019). Graph-based ANN designed for SSD storage; underlies DiskANN.

DiskANN. Microsoft Research. Disk-resident ANN system using Vamana. Right when memory cost dominates.

IVF. Inverted File Index. Partition-based ANN; assigns vectors to clusters and searches a subset.

IVF-PQ. IVF combined with Product Quantization for memory compression. Standard FAISS pattern.

ScaNN. Google's ANN library. Quantization + partition-based; competitive at billion-scale collections.

FAISS. Facebook AI Similarity Search. Open-source ANN library; backbone of many vector databases and DIY pipelines.

Annoy. Spotify's tree-based ANN library. Older; mostly superseded by HNSW for new deployments.

Recall@k. The fraction of true top-k neighbors retrieved. The headline ANN quality metric.
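Recall@k reduces to a set intersection between the ids an ANN index returned and the ids an exact search returned. A minimal sketch, with illustrative id lists:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k ids that also appear in the ANN top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must contain at least one id")
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)

exact = [7, 2, 9, 4, 1]    # ground truth from brute-force search
approx = [7, 9, 3, 4, 8]   # what a hypothetical ANN index returned
score = recall_at_k(approx, exact, k=5)  # 3 of 5 true neighbors found
```

Order within the top-k is ignored here; rank-aware metrics such as NDCG cover that separately.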

QPS. Queries Per Second. Throughput metric for vector search systems.

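ef_construction / ef_search. HNSW parameters setting the candidate-list size during index build and during query. Higher ef_construction builds a better-connected graph at the cost of build time; higher ef_search raises recall at the cost of latency. Tuning levers for recall/latency.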

M (HNSW). The graph degree parameter in HNSW. Higher M improves recall at memory cost.

nlist / nprobe (IVF). Number of partitions (build) and number searched (query). Trade speed for recall.
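The nlist / nprobe trade-off can be seen in a toy IVF sketch. This is not how FAISS or any specific library implements it: the centroids are fixed by hand for illustration (real IVF learns them with k-means), and all names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign each vector id to its nearest centroid (nlist = len(centroids))."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def search_ivf(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe partitions whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]

vectors = [[0.0, 0.0], [0.1, 0.0], [0.6, 0.0], [1.0, 0.0]]
centroids = [[0.0, 0.0], [1.0, 0.0]]  # assumed fixed; nlist = 2
lists = build_ivf(vectors, centroids)
query = [0.45, 0.0]
narrow = search_ivf(query, vectors, centroids, lists, k=2, nprobe=1)
wide = search_ivf(query, vectors, centroids, lists, k=2, nprobe=2)
```

With nprobe=1 the search never looks at the partition holding the true nearest neighbor (id 2), which sits just across the partition boundary. With nprobe=2 it scans everything and finds it, at the cost of more distance computations.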

Default to HNSW

For most production workloads under 100M vectors, HNSW is the right default. Tune M and ef_search on your corpus. Move to IVF-PQ when memory cost dominates and you can absorb a recall hit. Move to ScaNN or DiskANN at billion scale.
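A back-of-envelope sketch of why memory cost pushes teams from HNSW toward IVF-PQ. The formulas are rough assumptions: float32 vectors held in RAM, up to 2*M four-byte neighbor ids per vector in the HNSW base layer (upper layers and allocator overhead ignored), and one code per PQ sub-vector (codebooks and IVF lists ignored).

```python
def hnsw_bytes(n, dim, m, bytes_per_float=4, bytes_per_link=4):
    """Rough HNSW footprint: raw vectors plus base-layer neighbor lists."""
    vectors = n * dim * bytes_per_float
    links = n * 2 * m * bytes_per_link
    return vectors + links

def pq_bytes(n, subvectors, bits=8):
    """Rough PQ footprint: one `bits`-wide code per sub-vector per vector."""
    return n * subvectors * bits // 8

n, dim = 100_000_000, 1024          # hypothetical collection
hnsw_gb = hnsw_bytes(n, dim, m=16) / 1e9
pq_gb = pq_bytes(n, subvectors=64) / 1e9
```

Under these assumptions the HNSW index needs roughly 422 GB while the PQ codes need about 6.4 GB, which is the gap the recall hit buys.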

Family 03: Distance & similarity metrics.

How vectors are compared. Three metrics dominate production; the rest appear mostly in research papers.

Cosine similarity. Measures angle between two vectors; range [-1, 1]. The default metric for most embedding models.

Dot product. Sum of element-wise products. Equivalent to cosine when vectors are normalized. Hardware-friendly; some ANN libraries optimize for it.

L2 distance (Euclidean). Square root of sum of squared differences. Standard distance in geometric spaces; appears in some legacy embedding pipelines.

L1 distance (Manhattan). Sum of absolute differences. Rare in production; appears in some specialized settings.

Hamming distance. Number of differing positions in two binary vectors. Used with binary quantization.

Jaccard similarity. Intersection over union of sets. Used with sparse representations and lexical matching.

Inner product. Synonymous with dot product in this context.

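Normalization. Scaling a vector to unit length. Cosine similarity is unchanged by it; dot product and L2 distance are not, so normalize when an index uses either of those with a cosine-trained model.

L2 normalization. Dividing a vector by its L2 (Euclidean) norm. On L2-normalized vectors, dot product equals cosine similarity.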
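A small worked check of the relationship between dot product, cosine, and L2 normalization, using hand-picked vectors so the arithmetic is easy to follow. Function names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [4.0, 3.0]
raw_dot = dot(a, b)                               # depends on vector length
unit_dot = dot(l2_normalize(a), l2_normalize(b))  # matches cosine(a, b)
```

Here the raw dot product is 24 while the cosine is 24/25, and normalizing both vectors first makes the dot product agree with the cosine. For unit vectors the squared L2 distance equals 2 minus twice the cosine, so all three metrics produce the same ranking once vectors are normalized.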

MaxSim. The aggregation operator in ColBERT late-interaction. For each query token, takes the maximum similarity over all document tokens, then sums those maxima across query tokens.
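MaxSim can be sketched as a max over document tokens followed by a sum over query tokens. The token vectors below are toy values, not ColBERT outputs, and dot product is assumed as the token-level similarity.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    """For each query token take its best-matching doc token, then sum the maxima."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

query = [[1.0, 0.0], [0.0, 1.0]]             # two query-token vectors
doc = [[0.8, 0.6], [0.0, 1.0], [0.6, 0.8]]   # three document-token vectors
score = maxsim_score(query, doc)             # 0.8 + 1.0
```

The cost is one similarity per query-token and document-token pair, which is why late-interaction models are usually applied to a candidate set rather than the whole corpus.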

Quantization. Reducing vector precision (e.g., float32 → int8) for memory and speed gains.

Binary quantization. Compression to 1 bit per dimension. Massive memory savings; recall hit.

Product Quantization (PQ). Splits vectors into sub-vectors and quantizes each independently. Standard FAISS-era technique.

Scalar Quantization (SQ). Reduces precision uniformly across dimensions (float32 → int8 or float16). Simpler than PQ; less compression.
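A rough sketch of scalar and binary quantization. The fixed value range of [-1, 1] and the sign-based binarization are simplifying assumptions; production systems typically calibrate ranges from the data.

```python
def scalar_quantize(vec, lo=-1.0, hi=1.0):
    """Map floats in [lo, hi] linearly onto the int8 range [-128, 127]."""
    scale = 255.0 / (hi - lo)
    return [max(-128, min(127, round((x - lo) * scale) - 128)) for x in vec]

def binary_quantize(vec):
    """Keep one bit per dimension: 1 for positive components, 0 otherwise."""
    return [1 if x > 0 else 0 for x in vec]

def hamming(a, b):
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

codes = scalar_quantize([-1.0, 1.0, 0.25])
bits_a = binary_quantize([0.3, -0.2, 0.9, -0.5])
bits_b = binary_quantize([0.1, 0.4, 0.7, -0.1])
distance = hamming(bits_a, bits_b)
```

int8 storage is a quarter of float32, and one bit per dimension is a thirty-second of it, which is where the memory savings in the entries above come from.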

"Three metrics (cosine, dot product, L2) and three quantizations (float16, int8, binary) cover ~95% of production decisions. The rest is research vocabulary." (Internal vector-DB selection retro, March 2026)

Family 04: Hybrid retrieval patterns.

Production retrieval systems combine multiple methods to balance semantic recall and lexical precision. These are the patterns that show up in real deployments.

Hybrid retrieval. Combining dense (embedding) and sparse (BM25, SPLADE) retrieval methods. Production-grade default for most RAG systems.

BM25. Best Match 25. Robertson et al. (1994). The classic lexical retrieval algorithm. Tunable via k1 and b parameters.
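To show where k1 and b enter, here is a minimal BM25 scorer over pre-tokenized documents. It assumes the smoothed IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)) and the commonly used defaults k1=1.2 and b=0.75; other implementations differ in both.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # b controls length normalization, k1 controls term-frequency saturation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "hnsw graph index".split(),
    "bm25 lexical ranking function".split(),
    "hybrid retrieval with bm25 and vectors".split(),
]
scores = bm25_scores(["bm25", "ranking"], docs)
```

The second document matches both query terms and is shorter than the third, so it scores highest; the first matches neither and scores zero.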

BM25F. Field-weighted BM25. Weights different document fields (title, body, anchor text) differently.

TF-IDF. Term Frequency-Inverse Document Frequency. The predecessor to BM25; rarely used directly in production.

Reciprocal Rank Fusion (RRF). Cormack et al. (2009). Method for combining rankings from multiple retrievers. Standard fusion in production hybrid systems.
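RRF needs only ranks, not scores: each document earns 1 / (k + rank) from every ranking it appears in. A minimal sketch, using k=60 as in Cormack et al.; the document ids are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists; returns ids sorted by summed 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]    # ranking from the embedding retriever
sparse = ["d1", "d4", "d3"]   # ranking from BM25
fused = reciprocal_rank_fusion([dense, sparse])
```

d1 comes out first because it ranks near the top of both lists, ahead of d3, which is first in one list but third in the other.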

Score normalization. Mapping different-scale scores to a comparable range before combination. RRF avoids this; alternatives include min-max and z-score.

Convex combination. Weighted sum of normalized scores from multiple retrievers. Tunable; requires score normalization.
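The alternative to RRF, sketched under simple assumptions: min-max normalize each retriever's scores, then take a weighted sum. The weight name alpha and the choice to score a missing document as 0 are illustrative, not a standard.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def convex_combination(dense, sparse, alpha=0.5):
    """alpha weights the dense retriever, (1 - alpha) the sparse one."""
    d, s = min_max(dense), min_max(sparse)
    docs = set(d) | set(s)
    return {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
            for doc in docs}

dense = {"d1": 0.82, "d2": 0.74, "d3": 0.60}   # cosine similarities
sparse = {"d1": 3.1, "d3": 9.4, "d4": 6.0}     # BM25 scores, different scale
fused = convex_combination(dense, sparse, alpha=0.7)
best = max(fused, key=fused.get)
```

Without the normalization step the BM25 scores would dominate the sum purely because of their larger scale.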

Query expansion. Augmenting the query with related terms or paraphrases before retrieval. HyDE is a specific query-expansion variant using LLM-generated hypothetical answers.

Multi-query retrieval. Issuing multiple reformulated queries and fusing results. RAG-Fusion is the canonical pattern.

Filter. Metadata-based restriction applied to retrieval (e.g., date range, author, category). Pre-filter (before ANN) and post-filter (after ANN) trade off speed and recall differently.

Pre-filter. Metadata filtering applied before vector search. Fast on small filtered sets; can break ANN guarantees if the filter is highly selective.

Post-filter. Metadata filtering applied after vector search. Preserves ANN guarantees but may return empty results on selective filters.
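The empty-result failure mode of post-filtering is easy to reproduce. This sketch uses exact search as a stand-in for the ANN step, and the item fields (id, vec, year) are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def post_filter(query, items, predicate, k):
    """Take the k nearest first, then drop items that fail the filter."""
    nearest = sorted(items, key=lambda it: sq_dist(query, it["vec"]))[:k]
    return [it["id"] for it in nearest if predicate(it)]

def pre_filter(query, items, predicate, k):
    """Filter first, then take the k nearest among the survivors."""
    allowed = [it for it in items if predicate(it)]
    nearest = sorted(allowed, key=lambda it: sq_dist(query, it["vec"]))[:k]
    return [it["id"] for it in nearest]

def is_recent(item):
    return item["year"] >= 2025

items = [
    {"id": "a", "vec": [0.0, 0.0], "year": 2021},
    {"id": "b", "vec": [0.1, 0.0], "year": 2022},
    {"id": "c", "vec": [0.9, 0.0], "year": 2025},
    {"id": "d", "vec": [1.0, 0.0], "year": 2025},
]
after = post_filter([0.0, 0.0], items, is_recent, k=2)
before = pre_filter([0.0, 0.0], items, is_recent, k=2)
```

The two nearest items both fail the filter, so post-filtering returns nothing while pre-filtering returns the two matching items. A common mitigation for post-filtering is to over-fetch (a larger k) before applying the filter.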

Family 05: Reranking terminology.

How retrieved candidate sets are re-ordered for final relevance. Reranking is the cheapest quality lift in most production stacks.

Reranker. A model that re-orders the initial retrieved set for relevance. Typically a cross-encoder or late-interaction model.

Cross-encoder reranker. Reranker that encodes query and document together for higher quality. Cohere Rerank, Voyage Rerank, BGE Reranker are examples.

Cohere Rerank. Cohere's commercial reranking service. The Rerank 3 models (rerank-english-v3.0 and rerank-multilingual-v3.0) are common production choices.

Voyage Rerank. Voyage AI's reranking service. Specialty rerankers for legal, code, and finance corpora.

BGE Reranker. Open-weight reranker family from BAAI.

Mixed-modality reranker. Reranker that handles text + image inputs. Used in multimodal RAG.

LLM-as-judge reranker. Using an LLM to score query-document pairs. Most expensive option; useful for complex relevance criteria.

Top-k. The number of candidates returned by the first retrieval pass. Reranker operates on top-k. Typical values: 50-200.

Top-n. The final number of documents passed to the LLM after reranking. Typical values: 5-20.

NDCG. Normalized Discounted Cumulative Gain. Standard rank-aware quality metric. Discounts each result's relevance by the log of its rank, so a relevant document placed low counts for less than the same document placed high.

MRR. Mean Reciprocal Rank. Quality metric that rewards finding the first relevant document early.
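Both metrics fit in a few lines. This sketch assumes linear gain (some definitions use 2^rel - 1 instead) and computes the ideal ordering from the judged results in the list itself, which is only correct when every relevant document appears in that list.

```python
import math

def dcg(relevances):
    """Sum of relevance discounted by log2(rank + 1), ranks starting at 1."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg_at_k(relevances, k):
    """relevances: graded relevance of each result, in ranked order."""
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0

def mrr(ranked_relevance_lists):
    """Mean over queries of 1 / rank of the first relevant result (0 if none)."""
    total = 0.0
    for rels in ranked_relevance_lists:
        first = next((rank for rank, rel in enumerate(rels, start=1) if rel > 0), None)
        total += 1.0 / first if first else 0.0
    return total / len(ranked_relevance_lists)

perfect = ndcg_at_k([3, 2, 1], k=3)     # already in ideal order
reversed_ = ndcg_at_k([1, 2, 3], k=3)   # same documents, worst order
mean_rr = mrr([[0, 1, 0], [1, 0, 0]])   # first hits at rank 2 and rank 1
```

The two NDCG calls contain the same documents, so recall is identical; only the ordering differs, and only NDCG registers it.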

Pipeline: Retrieve · rerank · generate (3 stages). Production RAG. Each stage uses different model classes optimized for its trade-off. Tag: architecture.

Reranking: Cohere · Voyage · open weight (3 options). Cohere is the easiest first choice; Voyage for vertical specialty; BGE for self-host. All beat no-rerank by 10-30%. Tag: vendor.

Quality: NDCG · MRR (2 metrics). NDCG for nuanced rank quality; MRR when finding the first relevant doc matters most. Tag: measurement.

Family 06: Operational metrics and infrastructure.

How vector search systems are deployed, measured, and maintained. The operational vocabulary.

Vector database. A managed system for storing, indexing, and querying vectors. Examples: Pinecone, Qdrant, Weaviate, Milvus, Chroma, pgvector.

Pinecone. Managed serverless vector database. One of the largest commercial vector-DB vendors.

Qdrant. Open-source vector database with managed cloud. Rust-based; strong performance.

Weaviate. Open-source vector database with built-in vectorizer modules.

Milvus. Open-source distributed vector database. Strong at very large scale.

Chroma. Open-source vector database focused on developer experience and embedded use cases.

pgvector. Postgres extension for vector search. Popular for teams already on Postgres; supports HNSW and IVFFlat indexes.

Index. The data structure storing vectors for efficient search. Distinct from collection (logical namespace).

Collection. A logical namespace for vectors with shared schema. May contain one or more indexes.

Namespace. Multi-tenant partitioning within a collection. Used for per-tenant isolation.

Sharding. Splitting an index across multiple nodes for horizontal scale.

Replication. Copying an index across nodes for read scale and high availability.

Index build time. Time required to construct the search index. Critical for fresh-data scenarios.

Index size. Memory or disk footprint of the index. Trades off with quality (HNSW M parameter, IVF quantization).

P50 / P95 / P99 latency. Latency percentiles. Production SLOs are typically expressed at P95 or P99.
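A small sketch of why SLOs are written at P95 or P99 rather than the mean. It uses the nearest-rank definition of a percentile, which is one of several conventions, and the latency samples are invented.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 14, 15, 13, 11, 16, 14, 13, 90, 250]  # hypothetical query latencies
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
mean = sum(latencies_ms) / len(latencies_ms)
```

The median stays at 14 ms while P95 lands on the 250 ms outlier. The mean of about 45 ms describes neither the typical query nor the slow one.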

Cold start. First query after index load; warm cache improves subsequent queries. Important for serverless vector DB pricing.

"Pick HNSW + cosine + Cohere Rerank as your default. Tune later. Premature optimization on ANN parameters wastes more engineering time than any other vector-search topic." (Internal RAG architecture retro, May 2026)

Conclusion: Vocabulary sharpens retrieval design conversations.

The shape of vector vocabulary · April 2026

Three terms (embedding, ANN, reranker) and three metrics (cosine, NDCG, recall@k) cover most production decisions.

Vector search vocabulary has stabilized in 2026 around a small core of terms — embedding, ANN, reranker, cosine, NDCG, recall@k. Specialty vocabulary (late-interaction, Matryoshka, MaxSim) matters in research and at scale; most production teams can ignore it until evaluation surfaces a specific gap.

The most expensive vocabulary mistakes we see are at vendor selection time. Teams pick a vector DB by cosine support without thinking about hybrid retrieval, then add BM25 externally and route between systems. Or they pick by "supports HNSW" when the right answer is IVF-PQ for their memory budget. Match vocabulary to vendor support before committing.

The 120 terms in this glossary cover ~95% of vector-search and embedding vocabulary that surfaces in production engagements. Use it as the translation table when you read comparison decks, RAG architecture docs, or retrieval-quality audits.

Vector Search & Embeddings Glossary 2026.
