Most production prompt engineering reduces to 50 reusable patterns. The rest is composition, model-specific tuning, and the application-layer integration that surrounds the prompt.
This library organizes the 50 patterns by task type: extraction (10), transformation (10), classification (10), reasoning (10), and agent loops (10). Each entry has a template, a worked example, model-specific notes for Claude Opus 4.7, GPT-5.5 Pro, Gemini 2.5, and open-weight models, and the failure modes we have seen most often in production.
Treat this as a starting point — every pattern needs to be tuned to your evaluation set. The library shortens the distance from "I need to extract data from this document" to "here's a template that works on this kind of document."
- 01 Schema-first extraction beats free-form parsing every time. Use JSON Schema; let the runtime validate. Claude Opus 4.7, GPT-5.5 Pro, and Gemini 2.5 all respect JSON Schema reliably. Don't hand-roll output parsers; declare the schema.
- 02 Reasoning patterns split into structural (CoT, ToT) and mechanical (verifier loops). Mix carefully. Adding a verifier loop on top of CoT is fine. Combining ToT with explicit verifiers usually overshoots; pick one structural reasoning pattern and pair it with one verification pattern.
- 03 Agent-loop patterns demand explicit halt conditions. Don't trust the model to know when it's done. Even on the strongest models, ~5-10% of agent runs continue past the user goal without an explicit halt instruction. Build a structured 'final_answer' tool call as the only termination path.
- 04 Few-shot examples are the highest-leverage 5 minutes you can spend on a prompt. Two or three high-quality examples often beat 10 lines of instruction, especially for classification and extraction tasks. Always include at least one negative example.
- 05 Model-specific tuning matters less than it used to. Frontier models converged on most patterns by Q2 2026. For most patterns, the same prompt now works across Claude, GPT, and Gemini. The remaining differences are at the margins (refusal behavior, citation style, JSON-mode strictness).
Category 01: Extraction patterns.
Pulling structured data out of unstructured text. Ten patterns cover most production extraction work.
Pattern 1: Schema-first extraction. Declare a JSON Schema; instruct the model to return data matching the schema. Use the runtime's structured output mode (Anthropic tool calling, OpenAI response_format, Gemini JSON mode). Highest reliability for repeatable extraction.
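As a rough illustration of Pattern 1, the sketch below declares a small JSON Schema and checks a model response against it. The invoice fields and the `check_against_schema` helper are hypothetical. The helper covers only `required` and primitive `type` checks, and stands in for the fuller validation that a structured-output mode or a JSON Schema library would perform.

```python
import json

# Hypothetical schema for an invoice; the field names are illustrative only.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["invoice_number", "total"],
}

# Maps JSON Schema primitive type names to Python types.
_PY_TYPES = {"string": str, "number": (int, float), "boolean": bool,
             "object": dict, "array": list}


def check_against_schema(raw: str, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload passed.

    Covers only `required` and primitive `type` checks (an assumption made
    to keep the sketch short, not a full JSON Schema validator).
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = [f"missing required field: {name}"
                for name in schema.get("required", []) if name not in data]
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue
        value = data[name]
        wrong_type = not isinstance(value, _PY_TYPES[spec["type"]])
        # bool is a subclass of int in Python, so reject it for "number".
        if wrong_type or (spec["type"] == "number" and isinstance(value, bool)):
            problems.append(f"wrong type for {name}: expected {spec['type']}")
    return problems
```

Returning a list of problems rather than raising makes it easy to feed the failures back into a retry prompt.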
Pattern 2: Table-cast extraction. Ask the model to cast unstructured content into a tabular shape with explicit column headers. Right when extraction targets are comparable across rows.
Pattern 3: Citation-anchor extraction. For each extracted value, require a span citation back to the source. Used when downstream consumers need to verify provenance.
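One way to make the span citations in Pattern 3 checkable is to have the model return character offsets, then confirm that each cited span really contains the extracted value. The `CitedValue` shape and the offset convention below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CitedValue:
    name: str    # field name, e.g. "effective_date"
    value: str   # the extracted value
    start: int   # inclusive character offset into the source
    end: int     # exclusive character offset into the source


def unsupported_citations(source: str, extracted: list[CitedValue]) -> list[str]:
    """Return the names of fields whose cited span does not contain the value."""
    bad = []
    for item in extracted:
        in_bounds = 0 <= item.start < item.end <= len(source)
        if not in_bounds or item.value not in source[item.start:item.end]:
            bad.append(item.name)
    return bad
```

A field that lands in the returned list is a candidate for re-extraction or human review, since its provenance cannot be confirmed mechanically.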
Pattern 4: Multi-document merge. Extract from several documents, then merge into a single object with per-field source attribution. Used in due-diligence and research workflows.
Pattern 5: Hierarchical extraction. Extract into nested structures matching the source document's structure (sections, subsections). Used for legal contracts and structured reports.
Pattern 6: Conditional-field extraction. Different sub-schemas based on document type. The model first classifies type, then uses the matching schema. Reduces null fields in mixed corpora.
Pattern 7: Long-document chunked extraction. Chunk the input, extract per chunk, then merge. Right when documents exceed context window or when extraction is section-local.
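The chunk, extract, merge flow of Pattern 7 might look like the sketch below. It packs whole paragraphs into chunks so that no value is cut in half, and merges results in document order. The `extract_emails` function is a stand-in for a per-chunk model call, and the character budget is an assumed simplification of a token budget.

```python
import re


def chunk_by_paragraph(text: str, max_chars: int) -> list[str]:
    """Pack whole paragraphs into chunks of at most `max_chars` characters.

    A single paragraph longer than the budget becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def extract_emails(chunk: str) -> list[str]:
    """Stand-in for a per-chunk model call; a plain regex keeps this runnable."""
    return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", chunk)


def chunked_extract(text: str, max_chars: int = 2000) -> list[str]:
    """Extract per chunk, then merge in document order without duplicates."""
    merged: list[str] = []
    for chunk in chunk_by_paragraph(text, max_chars):
        for item in extract_emails(chunk):
            if item not in merged:
                merged.append(item)
    return merged
```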
Pattern 8: PII-aware extraction. Extraction schema includes flags for PII fields; the runtime applies redaction or hashing before downstream use.
Pattern 9: Confidence-scored extraction. Each field has an associated confidence score; downstream gates handle low-confidence extractions differently (route for human review, retry with stronger model).
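A downstream gate for Pattern 9 can be as small as the sketch below. The 0.8 threshold and the input shape (field name mapped to a value and a confidence in [0, 1]) are assumptions; a real threshold should come from your evaluation set.

```python
def route_fields(fields: dict[str, tuple[str, float]],
                 threshold: float = 0.8) -> tuple[dict[str, str], list[str]]:
    """Split extracted fields into accepted values and names needing review.

    `fields` maps a field name to (value, confidence in [0, 1]).
    """
    accepted: dict[str, str] = {}
    needs_review: list[str] = []
    for name, (value, confidence) in fields.items():
        if confidence >= threshold:
            accepted[name] = value
        else:
            needs_review.append(name)
    return accepted, needs_review
```

The names in `needs_review` can then be routed to a human queue or retried with a stronger model.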
Pattern 10: Self-verifying extraction. After extraction, a second pass checks the structured output against the source for completeness and accuracy. Standard for high-stakes extraction.
- Schema-first (Pattern 1). JSON Schema · structured output mode. Use the runtime's structured-output feature. Lowest implementation cost; highest reliability.
- Citation-anchor (Pattern 3). Field + span citation. Each extracted value linked to source span. Required for legal, financial, regulatory.
- Chunked extraction (Pattern 7). Extract per chunk · merge. Section-aware chunking; merge in post-processing. Default for documents over 100K tokens.
- Conditional-field (Pattern 6). Type detect · sub-schema. Classify document type first, then use matching schema. Reduces null fields in heterogeneous inputs.
Category 02: Transformation patterns.
Rewriting, restyling, restructuring text. Ten patterns for content workflows.
Pattern 11: Rewrite-with-constraints. Rewrite preserving meaning under explicit constraints — length, tone, audience, style. Standard for content adaptation.
Pattern 12: Voice-port. Rewrite content in another voice (executive, casual, marketing-tight) defined via reference examples. The most common content-team use case.
Pattern 13: Format conversion. Convert between document formats (markdown to HTML, prose to structured outline, table to narrative).
Pattern 14: Translation with glossary. Translate preserving organization-specific terminology. Glossary attached as constraint; translation must use glossary mappings.
Pattern 15: Redact-and-restore. Replace sensitive content with placeholders, perform the transformation, then restore the original values. Used to keep PII out of the model's context while still letting the model work on the content.
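A minimal sketch of the placeholder round trip, assuming email addresses are the only sensitive content and that the transformation leaves bracketed placeholders untouched. The regex and the `[EMAIL_n]` placeholder format are illustrative choices.

```python
import re

# Deliberately simple pattern; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with a numbered placeholder.

    Returns the redacted text and a placeholder -> original mapping.
    """
    mapping: dict[str, str] = {}   # placeholder -> original
    seen: dict[str, str] = {}      # original -> placeholder

    def _swap(match: re.Match) -> str:
        original = match.group(0)
        if original not in seen:
            placeholder = f"[EMAIL_{len(seen) + 1}]"
            seen[original] = placeholder
            mapping[placeholder] = original
        return seen[original]

    return EMAIL_RE.sub(_swap, text), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back in place of their placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

The mapping stays on the application side; only the redacted text is sent to the model, and `restore` runs on whatever comes back.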
Pattern 16: Persona-driven rewrite. Rewrite from a specific persona's perspective (legal counsel, finance VP, customer-success lead). Used for stakeholder-specific communication.
Pattern 17: Length-targeted summarization. Summarize to a specific length budget (50 words, 200 words, single tweet). Critical for executive communication.
Pattern 18: Hierarchical summarization. Multi-level summary (one-line, paragraph, page) generated in one call. Lets readers drill in by section.
Pattern 19: Style-transfer with reference. Three reference examples in the target style; one input paragraph; output in matched style. Highest quality voice transfer.
Pattern 20: Defensive rewrite. Rewrite output to remove specific phrases (legal flagged terms, competitor names, deprecated brand terms) while preserving meaning.
Category 03: Classification patterns.
Categorizing inputs into discrete labels. Ten patterns from simple binary to hierarchical multi-label.
Pattern 21: Label-with-rationale. Output the label plus a one-sentence rationale. Critical for auditing and downstream review; rationale catches misclassification before the label propagates.
Pattern 22: Hierarchical classification. Multi-level taxonomy; classify at level 1, then level 2 within the chosen level-1 branch. Used for large taxonomies.
Pattern 23: Multi-label classification. Output a list of applicable labels with confidence per label. Used for content tagging and faceted search.
Pattern 24: Confidence-band classification. Classify into "high," "medium," "low" confidence bands with explicit threshold rules. Routes low-confidence cases for human review.
Pattern 25: Few-shot classification. Two or three examples per label class in the prompt. Highest quality on small or imbalanced taxonomies.
Pattern 26: Negative-example classification. Include "what isn't this label" examples in the prompt. Reduces over-application of dominant labels.
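Patterns 25 and 26 both come down to how the prompt is assembled. The sketch below builds a classification prompt from positive and negative examples; the wording of the template and the `build_prompt` name are assumptions, not a tested template from this library.

```python
def build_prompt(labels: list[str],
                 positives: list[tuple[str, str]],
                 negatives: list[tuple[str, str]],
                 text: str) -> str:
    """Assemble a classification prompt with positive and negative examples.

    `positives` holds (input, label) pairs. `negatives` holds (input, label)
    pairs where the label does NOT apply to the input.
    """
    lines = [f"Classify the input as one of: {', '.join(labels)}.", "", "Examples:"]
    for sample, label in positives:
        lines.append(f"Input: {sample}\nLabel: {label}")
    lines.append("")
    lines.append("Counter-examples (the label shown is wrong for the input):")
    for sample, label in negatives:
        lines.append(f"Input: {sample}\nNot: {label}")
    lines += ["", f"Input: {text}", "Label:"]
    return "\n".join(lines)
```

Ending the prompt on `Label:` nudges the model to answer with the label alone, which keeps parsing simple.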
Pattern 27: Self-consistent classification. Run N classifications; take majority vote. Improves reliability on edge cases at N× cost.
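The voting step of Pattern 27 is simple enough to sketch directly. Here `votes` is assumed to be the list of labels returned by N independent classification calls; the vote share is returned as well so that weak majorities can be flagged.

```python
from collections import Counter


def majority_label(votes: list[str]) -> tuple[str, float]:
    """Return the most common label and its share of the votes.

    Ties resolve to the label that appeared first in `votes`.
    """
    if not votes:
        raise ValueError("need at least one vote")
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)
```

A low share (for example, 2 of 5) is itself a useful signal that the input is an edge case.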
Pattern 28: Routing classification. Classify input to determine which downstream model or workflow handles it. The hub-and-spoke pattern in agent routing.
Pattern 29: Sentiment-with-aspect. Classify sentiment per identified aspect (product, service, pricing). Standard for review analysis.
Pattern 30: Toxicity / safety classification. Multi-axis safety classifier (toxicity, PII presence, jailbreak attempt). Standard pre-filter for user-generated input.
"Always require a one-sentence rationale on classification outputs. The rationale is where the model surfaces uncertainty and where humans catch errors." (Internal classification reliability retro, May 2026)
Category 04: Reasoning patterns.
Multi-step thinking before answering. Ten patterns for reasoning-heavy tasks.
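Pattern 31: Chain-of-thought (CoT). Wei et al. (2022). Have the model reason step by step before answering, either through worked exemplars (as in Wei et al.) or through a zero-shot "think step by step" instruction (Kojima et al., 2022). Baseline for any reasoning-heavy task.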
Pattern 32: Plan-then-act. Generate an ordered plan first; then execute the plan step-by-step. Used for problems with multiple sub-steps.
Pattern 33: Self-consistency. Wang et al. (2022). Sample N CoT traces; take majority vote. Robust lift on arithmetic and multi-step tasks at N× cost.
Pattern 34: Verifier-loop. Generate answer; second pass verifies correctness or policy compliance; regenerate on failure. Standard for high-stakes outputs.
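The generate, verify, regenerate cycle of Pattern 34 can be sketched as a bounded loop. Both callables are stand-ins for model calls, and the convention that `verify` returns `None` on success is an assumption made for this example.

```python
from typing import Callable, Optional


def verifier_loop(generate: Callable[[Optional[str]], str],
                  verify: Callable[[str], Optional[str]],
                  max_attempts: int = 3) -> str:
    """Generate, verify, and regenerate with feedback until the check passes.

    `generate` receives the previous failure reason (None on the first try).
    `verify` returns None on success or a short failure reason.
    """
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        feedback = verify(candidate)
        if feedback is None:
            return candidate
    raise RuntimeError(f"no candidate passed verification in {max_attempts} attempts")
```

Passing the failure reason back into `generate` is what separates this from blind resampling; the attempt cap keeps cost bounded.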
Pattern 35: Debate. Two model instances argue opposing positions; a third synthesizes. Used for decisions where multiple perspectives matter.
Pattern 36: Critique-and-revise. Generate draft; second pass critiques specifically; revise based on critique. Substantially improves writing quality.
Pattern 37: Decompose-and-conquer. Break complex problem into sub-problems; solve each; combine. Used for analytical and research tasks.
Pattern 38: Reasoning effort dial. Use the runtime's reasoning-effort parameter (Claude extended thinking, OpenAI reasoning_effort). Trades tokens for quality on reasoning-heavy tasks.
Pattern 39: Hypothesis-test. Generate hypothesis; design test; reason about test result. Used for scientific reasoning and analytical research.
Pattern 40: Tree-of-thoughts (ToT). Yao et al. (2023). Explore multiple reasoning branches; backtrack; select most promising. High cost; right for complex search problems.
Category 05: Agent loop patterns.
How the model orchestrates tool calls, plans, and multi-step execution. Ten patterns for agent workflows.
Pattern 41: ReAct loop. Yao et al. (2022). Alternate "thought" and "action" steps; observe each tool result; decide next. Default for general agents.
Pattern 42: Plan-execute split. One model (or call) generates plan; a different model executes step-by-step. Cheaper at scale; brittle when plans need adaptation.
Pattern 43: Reflexion loop. Shinn et al. (2023). Add explicit self-critique after each step; append critique to context for next iteration. Reduces repeated failure modes.
Pattern 44: Re-anchor checkpoint. At fixed step count, summarize history into compact state; restart loop with summary as new prefix. Prevents prefix-cache invalidation in long agents.
Pattern 45: Halt-on-final-answer. Designate a structured "final_answer" tool; loop terminates when called. The only termination path; budget exhaustion is treated as failure.
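A minimal loop for Pattern 45 might look like the following. `next_action` stands in for a model call, and the `{"tool": ..., "args": ...}` action shape is an assumption; the point is that returning from `final_answer` is the only successful exit and running out of steps raises.

```python
from typing import Callable


class BudgetExhausted(RuntimeError):
    """Raised when the loop runs out of steps before final_answer is called."""


def run_agent(next_action: Callable[[list], dict], tools: dict, max_steps: int = 10):
    """Loop until the model calls the `final_answer` tool.

    `next_action(history)` stands in for a model call and returns a dict
    like {"tool": name, "args": {...}}.
    """
    history = []
    for _ in range(max_steps):
        action = next_action(history)
        if action["tool"] == "final_answer":
            return action["args"]["answer"]  # the only successful exit
        result = tools[action["tool"]](**action["args"])
        history.append((action, result))
    raise BudgetExhausted(f"no final_answer within {max_steps} steps")
```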
Pattern 46: Approval gate. Designate destructive tools as gated; runtime requires user confirmation before executing. Standard governance pattern.
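For Pattern 46, the gate sits in the runtime's tool dispatcher rather than in the prompt. The tool names in `GATED_TOOLS` and the `confirm` callback are hypothetical; in practice the callback would surface a confirmation prompt to the user.

```python
from typing import Callable

GATED_TOOLS = {"delete_record", "send_payment"}  # hypothetical destructive tools


def call_tool(name: str, args: dict, tools: dict,
              confirm: Callable[[str, dict], bool]) -> dict:
    """Run a tool, asking for confirmation first if it is gated.

    Returns a dict so the agent sees a denial as an observation rather
    than an exception.
    """
    if name in GATED_TOOLS and not confirm(name, args):
        return {"status": "denied", "tool": name}
    return {"status": "ok", "result": tools[name](**args)}
```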
Pattern 47: Verifier-critic. One model generates; a separate critic model checks against rubric; generator revises on critique. Used for high-quality outputs in agent contexts.
Pattern 48: Supervisor-worker. Supervisor model decomposes task; worker sub-agents execute in parallel; supervisor aggregates. Hierarchical agent topology.
Pattern 49: Tool-gating. Filter the visible tool catalog per step based on task context. Reduces tool selection error and context cost.
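One simple way to implement the per-step filtering in Pattern 49 is to tag each tool and expose only tools whose tags are all allowed in the current step. The catalog and tag names below are hypothetical.

```python
# Hypothetical tool catalog: tool name -> capability tags.
TOOL_CATALOG = {
    "search_docs": {"read"},
    "read_file": {"read"},
    "write_file": {"write"},
    "send_email": {"write", "external"},
}


def visible_tools(catalog: dict[str, set[str]], allowed_tags: set[str]) -> list[str]:
    """Return the tools whose tags are all allowed for the current step."""
    return sorted(name for name, tags in catalog.items() if tags <= allowed_tags)
```

For a research step, `visible_tools(TOOL_CATALOG, {"read"})` exposes only the two read-only tools, so the model cannot select a write tool by mistake.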
Pattern 50: Bounded retry-with-backoff. Tool errors trigger retry with exponential backoff; transient errors recover; permanent errors surface for plan revision.
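Pattern 50 can be sketched as below, assuming the tool layer already distinguishes transient from permanent failures. The two exception classes and the default delays are illustrative, jitter is omitted for brevity, and `sleep` is injectable so the function can be tested without waiting.

```python
import time
from typing import Callable


class TransientError(Exception):
    """An error worth retrying, such as a timeout or rate limit."""


class PermanentError(Exception):
    """An error that retrying will not fix, such as a bad argument."""


def call_with_backoff(fn: Callable[[], object], max_retries: int = 3,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep):
    """Call `fn`, retrying transient failures with exponential backoff.

    Delays are base_delay * 2**attempt. PermanentError propagates at once;
    the last TransientError propagates once retries are used up.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

Because `PermanentError` is never caught, it reaches the agent loop immediately, where it can trigger plan revision instead of wasted retries.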
- ReAct + halt + retry (production default). Combine patterns 41, 45, 50. Covers ~80% of agent workflows. Add other patterns based on evaluation gaps.
- Add 43 + 46 + 47 (governance). Reflexion (cuts repeat failures), approval gates (governance), verifier-critic (quality). Use for irreversible-action agents.
- Re-anchor at 40 steps (reliability). Pattern 44. Prevents cache invalidation past 50 steps; built into all agents we deploy with budgets >30 steps.
Category 06: Model-specific notes.
Where patterns work the same across frontier models, and where they diverge. The differences matter at production scale even though they're small.
Claude Opus 4.7. Strong on long-context reasoning, citation-anchor extraction, and verifier-loops. Extended thinking budget is the leverage; use it on reasoning-heavy tasks. Native tool calling is reliable; structured outputs work well.
GPT-5.5 Pro. Strongest on agentic patterns (Patterns 41-50). reasoning_effort dial provides cleanly tuned cost-quality control. Slightly more conservative on refusal; test classification thresholds carefully.
Gemini 2.5 Pro. Strong on multimodal patterns and long-context (2M tokens). JSON mode strict enforcement is the cleanest of the three; use it for any extraction work. Slightly weaker on adversarial robustness.
DeepSeek V4. Cost-leader for high-volume workloads. Open weights for self-hosting. Strong on coding patterns; weaker on instruction-following nuance compared to commercial frontier.
Llama 3.5 / 4.0. Open-weight workhorse. Use for self-hosted extraction and classification at low cost. Reasoning patterns work; agent patterns require more careful prompt design.
Qwen 3.6 Max. Strong multilingual performance. Use for multi-language extraction or content generation in non-English markets.
"Frontier models converged on most patterns by Q2 2026. The remaining differences are at the margins: refusal behavior, citation style, JSON-mode strictness." (Internal model-routing retro, April 2026)
Conclusion: Patterns stabilize while models evolve.
50 patterns is roughly the right scope for a working library; build yours from these.
Prompt engineering as a discipline matures by accumulating reusable patterns. The 50 here cover the bulk of production work across extraction, transformation, classification, reasoning, and agent loops. Most teams need fewer than 20 in active use; the rest are reference for special cases.
The pattern library shortens the distance from "I need to do X" to "here's a tested template I can adapt." Tune to your evaluation set, lock the production version in your prompt registry, and version it like any other code artifact.
Models will continue to evolve; patterns hold up much longer than specific prompt strings. The 50 in this library have survived three frontier-model release cycles; the specific prompts inside them get tuned per release. Build the library; rebuild the prompts.