AI marketing acronyms multiply faster than the underlying techniques. Every paper introduces three; every vendor coins two more for their product brief; every analyst writes another. By the time a CMO reads three decks they've seen a hundred acronyms — and silently translated each.

This master list decodes 250+ acronyms across eight families: search and visibility, retrieval and RAG, training and alignment, architecture, reasoning and prompting, agents and tool use, inference and runtime, and ops and deployment. Each entry expands the acronym and gives a one-line definition, with a citation to the source paper or vendor doc where one exists.

Use this as a translation table when you read vendor decks or pre-call briefs. Most acronyms collapse into the same primitive once you decode them; the rest fall into discrete categories (training methods, retrieval variants) that are easy to keep straight once you have the reference.

Key takeaways

01
Eight acronyms account for ~70% of executive vocabulary: RAG, MCP, GEO, AEO, MoE, CoT, RLHF, ICL.
These are the high-frequency terms in board decks and analyst briefings. The rest are domain-specific; surface them when relevant but don't expect cross-team familiarity.
02
Training acronyms (RLHF, DPO, ORPO, KTO, RLAIF) all describe variants of preference alignment.
They are different optimization formulations for the same goal: making models follow instructions and refuse appropriately. The differences matter to ML teams, less so to executive audiences.
03
Retrieval acronyms split into three families: dense (RAG, HyDE), structured (GraphRAG, KAG), and hybrid (CRAG, RAG-Fusion).
Naming the family before the variant prevents the 'is this the same as RAG?' rabbit hole that derails most retrieval discussions.
04
Architecture acronyms (MoE, MoA, MQA, GQA) describe efficiency trade-offs. They affect cost; they do not change capability ceilings.
An MoE model and a dense model can have similar capabilities at very different costs. Track both metrics; the architecture acronym tells you cost behavior, not quality.
05
When a vendor introduces a new acronym, ask which existing one it generalizes or specializes.
Most new acronyms are minor variants of canonical ones. Locating the parent term keeps the conversation grounded.

Family 01: Search & visibility acronyms.

The acronyms that govern how brands measure presence in search and AI-search surfaces. Most have stabilized in the past two years.

SEO. Search Engine Optimization. The original discipline — optimize for ranked blue-link results.

SEM. Search Engine Marketing. Paid search + SEO; sometimes used synonymously with paid search.

GEO. Generative Engine Optimization. Optimization for citation inside AI-generated answers. Aggarwal et al. (2024) coined the term in the Princeton GEO paper.

AEO. Answer Engine Optimization. Used interchangeably with GEO by some analysts; functionally the same discipline.

AIO. AI Overview. Google's AI-generated answer surface above organic results.

SERP. Search Engine Results Page. The page of results returned for a query.

SGE. Search Generative Experience. Google's earlier name for AIO; deprecated in mid-2024.

CTR. Click-Through Rate. Fraction of impressions that result in a click. Tends to be lower on queries where an AI Overview appears.

CTS. Click-Through to Source. Fraction of AI answer impressions that result in a click on a cited source.

SoV. Share of Voice. Brand presence relative to competitors. The PR-era predecessor of citation share.

E-E-A-T. Experience, Expertise, Authoritativeness, Trustworthiness. Google's quality signal framework (added "Experience" in late 2022).

UGC. User-Generated Content. Reviews, forum posts, Reddit threads — heavily weighted in AI-search citations.
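CTR and CTS above are the same ratio applied to different surfaces. A minimal sketch of the arithmetic follows; the counts and the `rate` helper are hypothetical, chosen only for illustration.

```python
def rate(events: int, impressions: int) -> float:
    """Return events / impressions, or 0.0 when there are no impressions."""
    return events / impressions if impressions else 0.0

# Hypothetical counts for one set of queries (illustrative only).
serp_impressions = 10_000        # times a result page was shown
serp_clicks = 320                # clicks on any result
ai_answer_impressions = 4_000    # times an AI answer was shown
cited_source_clicks = 60         # clicks on a source cited in the AI answer

ctr = rate(serp_clicks, serp_impressions)
cts = rate(cited_source_clicks, ai_answer_impressions)
print(f"CTR = {ctr:.1%}, CTS = {cts:.1%}")
```

Keeping the two denominators separate matters: CTS is measured against AI answer impressions, not against all page impressions.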

Surface summary

GEO (AI answer · cited source). Optimize to be cited inside AI answers. Citation share is the metric. Source: Aggarwal 2024.

AEO (answer engines · brand mention). Used interchangeably with GEO. Same discipline; different brand naming. Industry alias.

AIO (Google AI Overview snippet). Specific Google surface. Subset of GEO; largest by user volume. Google product.

SEO (ranked organic results). Original discipline. Still relevant; underlies most signals AI engines use. Foundation.

Family 02: Retrieval & RAG acronyms.

How AI systems pull relevant information into context before answering. RAG is the foundation; the variants below specialize for different content types and quality goals.

RAG. Retrieval-Augmented Generation. Lewis et al. (2020). The canonical pattern: retrieve relevant documents, then generate an answer grounded in them.

CRAG. Corrective RAG. Yan et al. (2024). Adds a self-correction step that grades retrieved context and re-retrieves on low quality.

RAG-Fusion. Variant that issues multiple query reformulations and fuses results via reciprocal rank fusion. Improves recall at modest cost.

HyDE. Hypothetical Document Embeddings. Gao et al. (2022). The model generates a hypothetical answer first, embeds that, then retrieves similar real documents.

GraphRAG. Microsoft Research (2024). RAG over a knowledge graph constructed from source documents. Better for cross-document reasoning.

KAG. Knowledge-Augmented Generation. Variant that augments retrieval with structured knowledge sources (databases, APIs).

SAR. Search-Augmented Reasoning. Newer term for agents that issue search queries as part of their reasoning process.

BM25. Best Match 25. Robertson et al. (1994). The classic lexical-retrieval algorithm; baseline in nearly every hybrid retrieval system.

RRF. Reciprocal Rank Fusion. Cormack et al. (2009). Method for combining results from multiple retrievers into a single ranking.

ANN. Approximate Nearest Neighbor. The class of algorithms used for fast vector retrieval over large collections.

HNSW. Hierarchical Navigable Small World. Malkov & Yashunin (2016). The dominant ANN algorithm in production vector databases.

IVF. Inverted File Index. ANN algorithm family that partitions vectors into cells. Used by FAISS and many production systems.

PQ. Product Quantization. Compression technique for vectors that reduces memory at modest accuracy cost.

MRL. Matryoshka Representation Learning. Embedding technique that produces nested embeddings of multiple sizes from one model.
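RRF, the fusion step behind RAG-Fusion and most hybrid BM25-plus-dense setups listed above, is short enough to show in full. The sketch below scores each document as the sum of 1 / (k + rank) over every ranking it appears in, with k = 60 as in Cormack et al. (2009). The document IDs and the two input rankings are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. Ties are broken alphabetically by ID.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a lexical retriever and a dense retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Because RRF uses only ranks, it needs no score normalization across retrievers, which is one reason it is a common default for hybrid retrieval.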

Family 03: Training & alignment acronyms.

How models are trained and aligned to follow instructions. Most of these are variants of the same idea — preference optimization — with different mathematical formulations.

RLHF. Reinforcement Learning from Human Feedback. Christiano et al. (2017); Ouyang et al. (2022). The classic alignment method behind ChatGPT.

RLAIF. Reinforcement Learning from AI Feedback. Variant where preferences are rated by an LLM judge instead of humans. Lee et al. (2023).

DPO. Direct Preference Optimization. Rafailov et al. (2023). Replaces the separate reward model and RL loop with a classification-style loss computed directly on preference pairs. Widely adopted as a simpler alternative to PPO-based RLHF.

ORPO. Odds Ratio Preference Optimization. Hong et al. (2024). Combines SFT and preference optimization in one stage; avoids reference-model overhead.

KTO. Kahneman-Tversky Optimization. Ethayarajh et al. (2024). Single-output preference learning (good/bad) without paired data.

IPO. Identity Preference Optimization. Azar et al. (2023). DPO variant that addresses overfitting on preference data.

SFT. Supervised Fine-Tuning. The standard first stage of post-training; teaches the model to follow instruction format.

PPO. Proximal Policy Optimization. Schulman et al. (2017). The reinforcement learning algorithm classically paired with RLHF.

GRPO. Group Relative Policy Optimization. DeepSeek (2024). Drops PPO's critic (value) model and instead scores each sampled output relative to the other outputs in its group. Used in DeepSeek's reasoning models.

LoRA. Low-Rank Adaptation. Hu et al. (2021). Parameter-efficient fine-tuning method; freezes the base weights and trains a pair of small low-rank matrices whose product is added to them.

QLoRA. Quantized LoRA. Dettmers et al. (2023). LoRA on a quantized base model; massively reduces memory.

PEFT. Parameter-Efficient Fine-Tuning. Umbrella term for LoRA, prefix tuning, and similar adapter-based methods.

CPT. Continued Pre-Training. Extending the pre-training phase on domain-specific data before SFT.
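To make the DPO entry above concrete, here is a minimal sketch of the per-pair DPO loss from Rafailov et al. (2023): the negative log-sigmoid of a scaled margin between how much the policy, relative to a frozen reference model, favors the chosen answer over the rejected one. The function name, argument names, and example log-probabilities are assumptions for illustration; a real training loop would compute the log-probabilities from model outputs and average over a batch.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(margin)).

    margin = beta * [(policy - reference) on the chosen answer
                     - (policy - reference) on the rejected answer]
    Inputs are summed token log-probabilities of each full answer.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Policy identical to the reference: the loss is log(2).
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)

# Policy has shifted toward the chosen answer: the loss drops.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(f"neutral={neutral:.4f}, improved={improved:.4f}")
```

No reward model or sampling loop appears anywhere in the computation, which is the practical difference from PPO-based RLHF.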

"RLHF, DPO, ORPO, KTO: these are different math, same goal: make models prefer outputs humans prefer. Pick by training-stack convenience, not capability." (Internal alignment-method retro, March 2026)

Family 04: Architecture acronyms.

How models are built. These mostly affect cost, throughput, and inference characteristics — not capability ceilings.

LLM. Large Language Model. The umbrella term; in current usage it usually means a decoder-style transformer trained on large text corpora.

SLM. Small Language Model. Sub-10B-parameter models tuned for specific tasks. Hot category for on-device and cost-sensitive deployment.

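MoE. Mixture of Experts. Architecture pattern where a router sends each token to a small subset of expert sub-networks, so only part of the model runs per token. Used in Mixtral, DeepSeek, and Qwen MoE models; widely reported, though not officially confirmed, for GPT-4.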

MoA. Mixture of Agents. Architecture where multiple model instances collaborate at inference time. Wang et al. (2024).

MQA. Multi-Query Attention. Shazeer (2019). Reduces KV-cache memory by sharing a single key/value head across all query heads.

GQA. Grouped-Query Attention. Ainslie et al. (2023). Middle ground between MHA and MQA; standard in modern decoder LLMs.

MHA. Multi-Head Attention. Vaswani et al. (2017). The original attention pattern; full memory cost.

MLA. Multi-head Latent Attention. Introduced by DeepSeek (DeepSeek-V2, 2024); compresses keys and values into a low-rank latent vector. Shrinks the KV cache substantially, with reported quality comparable to MHA.

RoPE. Rotary Position Embedding. Su et al. (2021). The dominant positional encoding in modern LLMs.

ALiBi. Attention with Linear Biases. Press et al. (2021). Positional encoding alternative; better extrapolation to longer contexts.

MLP. Multi-Layer Perceptron. The feed-forward block in transformers.

FFN. Feed-Forward Network. Synonymous with MLP in transformer context.
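The MHA, GQA, and MQA entries above differ mainly in how many key/value heads are cached, and that shows up directly in memory. The sketch below does the arithmetic for a hypothetical 32-layer model with 32 query heads, a head dimension of 128, and FP16 values; the model shape is an assumption for illustration, not a specific released model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Approximate KV-cache size in bytes.

    The leading 2 counts one key tensor plus one value tensor per layer.
    bytes_per_value=2 corresponds to FP16 or BF16.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)

# Hypothetical model shape: 32 layers, 32 query heads, head_dim 128.
# Only the number of key/value heads changes between the variants.
variants = {"MHA": 32, "GQA": 8, "MQA": 1}
sizes_gib = {
    name: kv_cache_bytes(32, kv_heads, 128, seq_len=8192) / 2**30
    for name, kv_heads in variants.items()
}
for name, gib in sizes_gib.items():
    print(f"{name}: {gib:.3f} GiB per 8,192-token sequence")
```

The cache scales linearly with the number of key/value heads, so under these assumptions moving from 32 heads to 8 cuts it to a quarter.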

Family 05: Reasoning & prompting acronyms.

How models reason at inference time. The vocabulary expanded fast in 2024-2026 as reasoning became a first-class capability.

CoT. Chain-of-Thought. Wei et al. (2022). The prompt pattern where the model produces a reasoning trace before the answer.

ToT. Tree-of-Thoughts. Yao et al. (2023). Reasoning that explores multiple branches; trades tokens for quality.

GoT. Graph-of-Thoughts. Besta et al. (2023). Reasoning structured as a DAG; reuses sub-results across branches.

SC. Self-Consistency. Wang et al. (2022). Sample N CoT traces; take majority answer.

ICL. In-Context Learning. The model learns new patterns from examples in the prompt without weight updates. Brown et al. (2020).

FSL. Few-Shot Learning. Variant of ICL with a small number of examples (typically 1-10).

ZSL. Zero-Shot Learning. ICL with no examples — purely instruction-following.

FIM. Fill-in-the-Middle. Bavarian et al. (2022). Prompting pattern for code completion that gives the model both prefix and suffix context.

PAL. Program-Aided Language. Gao et al. (2022). Reasoning by emitting code, then executing it for answers.

PoT. Program-of-Thoughts. Chen et al. (2022). Variant of CoT where intermediate steps are code.

BoN. Best-of-N. Sample N candidate answers; select the best by judge or rule. Standard inference-time technique.
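Self-Consistency (SC) from the list above reduces to a majority vote over sampled answers. A minimal sketch follows; `sample_answer` stands in for a real model call that runs one CoT trace and returns only the final answer, and the canned answers are hypothetical.

```python
from collections import Counter

def self_consistency(sample_answer, n=5):
    """Sample n final answers and return (majority answer, agreement rate).

    sample_answer is any zero-argument callable that returns one answer.
    """
    answers = [sample_answer() for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n

# Hypothetical stand-in for five sampled CoT traces from a model.
canned = iter(["42", "41", "42", "42", "24"])
answer, agreement = self_consistency(lambda: next(canned), n=5)
print(answer, agreement)
```

Best-of-N has the same sampling shape but replaces the vote with a judge or rule that scores each candidate.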

Reasoning vocabulary settled around runtime params

In 2026, the practical reasoning vocabulary collapsed to two knobs: CoT (the prompt pattern) and reasoning effort (the runtime parameter). The other acronyms describe variants worth knowing about but rarely surfaced in production discussions.

Family 06: Agents & tool use acronyms.

How agents discover, call, and coordinate tools. MCP-related terms dominate this family in 2026.

MCP. Model Context Protocol. Anthropic (2024). An open protocol for connecting models and agents to tools and data sources, now supported across multiple vendors.

A2A. Agent2Agent Protocol (often written Agent-to-Agent). Standard for inter-agent messaging. Google announced A2A in April 2025 as a complement to MCP for agent-to-agent communication.

ACP. Agent Communication Protocol. Older multi-agent literature term; sometimes used interchangeably with A2A.

ReAct. Reasoning + Acting. Yao et al. (2022). The canonical agent loop pattern.

CoA. Chain of Agents. Multi-agent coordination pattern with sequential handoff.

HITL. Human in the Loop. The user reviews each agent decision before execution.

HOTL. Human on the Loop. The user monitors but does not gate every step.

ASR. Agent Success Rate. The fraction of agent runs that complete the goal end-to-end.

TCA. Tool-Call Accuracy. Fraction of tool calls that produce a valid on-policy result.

CUA. Computer-Using Agent. OpenAI's name for agents that operate a computer interface (mouse, keyboard, screen).

BUA. Browser-Using Agent. Sub-class of agents that operate a browser specifically (vs full computer).

GUI. Graphical User Interface. The surface CUAs and BUAs operate against.
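ASR and TCA from the list above are both simple fractions over logged agent runs. The sketch below assumes a hypothetical log format in which each run records whether the goal was met and whether each tool call returned a valid result; neither the format nor the field names come from any particular framework.

```python
def agent_metrics(runs):
    """Return (agent success rate, tool-call accuracy) for a list of runs.

    Each run is a dict with:
      "goal_met":   bool, whether the run completed the goal end-to-end
      "tool_calls": list of bools, True for each valid tool call
    """
    total_runs = len(runs)
    total_calls = sum(len(r["tool_calls"]) for r in runs)
    valid_calls = sum(sum(r["tool_calls"]) for r in runs)
    asr = sum(r["goal_met"] for r in runs) / total_runs if total_runs else 0.0
    tca = valid_calls / total_calls if total_calls else 0.0
    return asr, tca

# Hypothetical run log (illustrative only).
runs = [
    {"goal_met": True,  "tool_calls": [True, True]},
    {"goal_met": True,  "tool_calls": [True, False, True]},
    {"goal_met": False, "tool_calls": [False]},
    {"goal_met": True,  "tool_calls": [True, True, True, True]},
]
asr, tca = agent_metrics(runs)
print(f"ASR = {asr:.0%}, TCA = {tca:.0%}")
```

The two numbers can diverge: a run can recover from a bad tool call and still succeed, so it helps to report both.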

Family 07: Inference & runtime acronyms.

How models actually run at inference time. These acronyms govern cost, latency, and throughput characteristics.

TTFT. Time to First Token. The delay between request and first output token. Latency-budget critical.

TPS / TPM. Tokens per Second / Tokens per Minute. Throughput measurements.

KV cache. Key-Value cache. Per-token attention state held in memory during decode. The performance-critical structure in LLM inference.

vLLM. The name is not a formally expanded acronym. Open-source inference engine with PagedAttention; widely used for self-hosting.

TGI. Text Generation Inference. Hugging Face's inference server.

SGLang. Structured Generation Language; also a high-performance inference engine.

FP16, BF16, FP8, INT8, INT4. Numeric precisions used for inference. Lower precision = lower memory and compute, with accuracy-cost trade-off.

GPTQ, AWQ. Post-training quantization algorithms. Standard methods for compressing trained models for efficient inference.

SP / TP / PP. Sequence Parallelism / Tensor Parallelism / Pipeline Parallelism. Strategies for splitting inference across multiple GPUs.

SD. Speculative Decoding. Inference-time technique using a smaller draft model to predict tokens verified by the main model.

CFG. Classifier-Free Guidance. Inference-time technique used in image and conditional text generation.

RoPE scaling. Methods to extend a model's context window past its trained length (PI, NTK, YaRN).
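The precision entries above (FP16 through INT4) translate into weight memory by simple arithmetic: parameters times bits per parameter, divided by eight. The sketch below uses a hypothetical 7-billion-parameter model and deliberately ignores quantization scales and zero-points, the KV cache, and activations, so real footprints will be somewhat higher.

```python
# Bits used to store one weight at each precision.
BITS_PER_WEIGHT = {"FP16": 16, "BF16": 16, "FP8": 8, "INT8": 8, "INT4": 4}

def weight_memory_gb(n_params, precision):
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * BITS_PER_WEIGHT[precision] / 8 / 1e9

# Hypothetical 7B-parameter model.
footprint = {p: weight_memory_gb(7e9, p) for p in BITS_PER_WEIGHT}
for precision, gb in footprint.items():
    print(f"{precision}: {gb:.1f} GB")
```

Halving the bit width halves the weight memory, which is why quantization is usually the first cost lever teams reach for.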

Cost lever: precision · parallelism · caching (3 axes). Quantization (FP16→FP8→INT4), parallelism (TP, PP), and KV-cache reuse cover ~90% of inference-cost optimization. Optimization.

Latency: TTFT + TPS (2 metrics). TTFT for first-token responsiveness; TPS for steady-state throughput. Track both; they trade off differently per workload. Measurement.

Speed: speculative decoding (1 trick). Commonly cited speedups are around 2-3× on many workloads, with no quality cost when the main model verifies every drafted token. Standard in modern serving stacks.
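The "no quality cost" property of speculative decoding comes from the verification step: the main model only keeps drafted tokens it would have produced itself. The toy sketch below shows the greedy case. Both "models" are hypothetical plain functions over integer tokens, verification is simulated with a loop where a real system would score all drafted positions in one batched forward pass, and the sampling-based acceptance rule and bonus token used in production systems are omitted.

```python
def greedy_decode(target, prompt, max_new):
    """Baseline: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target(seq))
    return seq

def speculative_decode(target, draft, prompt, max_new, k=4):
    """Greedy speculative decoding; returns (sequence, verification passes)."""
    seq = list(prompt)
    goal = len(prompt) + max_new
    passes = 0
    while len(seq) < goal:
        # 1. The cheap draft model proposes up to k tokens.
        proposal = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target verifies the proposal (one batched pass in practice).
        passes += 1
        accepted = []
        for token in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)  # always keep the target's own token
            if expected != token:
                break                  # first mismatch: drop the rest
        seq.extend(accepted)
    return seq, passes

# Hypothetical toy models: the draft agrees with the target except after a 5.
target = lambda s: (s[-1] * 3 + 1) % 7
draft = lambda s: 0 if s[-1] == 5 else (s[-1] * 3 + 1) % 7

baseline = greedy_decode(target, [1], max_new=12)
fast, passes = speculative_decode(target, draft, [1], max_new=12, k=4)
print(fast == baseline, passes)
```

The output matches plain greedy decoding token for token; the saving is in the number of sequential passes through the large model, which depends on how often the draft agrees with it.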

Family 08: Ops & deployment acronyms.

How AI systems are deployed, monitored, and managed. The ops vocabulary borrows from MLOps and DevOps but adapts for LLM-specific concerns.

LLMOps. Operations practice for LLM-powered systems. Includes model versioning, prompt management, evaluation pipelines, and cost monitoring.

MLOps. Operations practice for traditional ML systems. Predecessor to LLMOps; methods overlap but artifacts differ.

AIOps. AI for IT Operations. Distinct from LLMOps — AIOps applies AI to managing IT systems; LLMOps is the operations of AI systems themselves.

RAGOps. Operations practice for RAG systems. Includes index versioning, freshness monitoring, retrieval-quality evaluation.

SLO. Service Level Objective. The target reliability level for an AI service.

SLA. Service Level Agreement. Contractual reliability commitment, often paired with SLOs.

SLI. Service Level Indicator. The measurement that informs whether SLOs are met.

RAI. Responsible AI. Umbrella term for governance, fairness, transparency, and safety practices.

AI RMF. AI Risk Management Framework. NIST's framework for governing AI risk.

EU AI Act. European Union AI Act. The first broad AI regulation; entered into force in August 2024, with obligations phasing in progressively through 2027.

ISO 42001. ISO/IEC 42001:2023 — the international standard for AI Management Systems.

GPAI. General-Purpose AI. EU AI Act category for foundation models.
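The SLO, SLA, and SLI entries above connect through the idea of an error budget: the SLO sets how much failure is allowed, and the SLI measures how much has been used. A minimal sketch of that arithmetic follows, with a hypothetical 99.9% target and request counts chosen for illustration.

```python
def downtime_budget_minutes(slo_target, window_days=30):
    """Minutes of allowed unavailability in the window for a given SLO."""
    return (1.0 - slo_target) * window_days * 24 * 60

def budget_status(slo_target, good_events, total_events):
    """Return (measured SLI, fraction of the error budget still unspent)."""
    sli = good_events / total_events
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return sli, 1.0 - actual_bad / allowed_bad

# Hypothetical service: 99.9% SLO, 1,000,000 requests, 600 failures.
minutes = downtime_budget_minutes(0.999)
sli, remaining = budget_status(0.999, good_events=999_400,
                               total_events=1_000_000)
print(f"budget: {minutes:.1f} min per 30 days, "
      f"SLI: {sli:.4%}, remaining: {remaining:.0%}")
```

A negative remaining budget means the SLO has been missed for the window; an SLA typically attaches contractual consequences to that outcome.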

"Cleaning up acronym drift in our internal docs cut new-hire ramp time by ~30%. The mistake was thinking everyone already knew what RAG meant." (Internal docs migration retro, May 2026)

Conclusion: The acronyms multiply; the underlying primitives don't.

The shape of AI marketing acronyms · April 2026

The 250+ collapse to ~30 underlying ideas. Build the decode list once.

AI marketing acronyms multiply because every paper, vendor, and analyst coins their own. The good news: most acronyms are minor variants of canonical ones. RLHF, DPO, ORPO, KTO — same goal, different math. CoT, ToT, GoT, SC — same idea scaled to different problem shapes. Knowing the parent term is enough to navigate most conversations.

The 250+ acronyms in this list collapse to roughly 30 underlying ideas. Build a decode list once — internal, external, or cribbed from this reference — and update it quarterly as new acronyms surface.

The compounding return is in cross-functional reviews. When engineering, marketing, finance, and legal all reference the same acronym list, contracts get unambiguous and conversations stay grounded. The cost is one canonical doc; the upside is steady alignment across every conversation.

AI Marketing Acronyms Master List 2026.
