AI developer hiring in 2026 is a different problem than it was eighteen months ago. The “LangChain + Pinecone resume” that signaled AI readiness in 2024 is now table stakes — and increasingly, a yellow flag. What hiring managers need to test now is whether a candidate can actually ship agent systems, manage inference cost at production scale, and verify AI output rather than trust it. This guide gives you the toolset: 10 must-have skills with seniority expectations, a 50-question bank with rubrics, 12 resume red flags, and salary bands anchored to named sources retrieved 2026-05-24.

The urgency is real and the data is unambiguous. According to the Lightcast Global AI Skills Outlook, AI-skill job postings jumped 73% from 2023 to 2024 and another 109% from 2024 to 2025. By 2026, approximately 2.5% of all US job postings mention AI skills — up 55% year-over-year and roughly 300% over the past decade. Meanwhile, the Stack Overflow Developer Survey 2025 found that only 29% of respondents trust AI output — down 11 percentage points from 2024. The hiring implication: the market is in a trust-gap moment. Candidates who can build, evaluate, and constrain AI systems are worth dramatically more than candidates who can merely prompt them.

This guide covers what the 2026 developer-survey adoption data actually implies for hiring criteria, the 10 skills that have replaced the old framework-fluency checklist, the complete 50-question bank across those skills, compensation bands by level and city (anchored to Kore1, Levels.fyi, and PwC), a fabrication-literacy screening method no other hiring guide covers, and a hiring-manager action plan for each seniority tier.

Key takeaways

01
The trust gap is the real hiring signal.
84% of developers use or plan to use AI tools, per Stack Overflow 2025, but only 29% trust AI output, down 11 points from 2024. This gap is your hiring screen. Over the next year, candidates who demonstrate eval rigor, failure-mode intuition, and the discipline to verify AI output will be significantly more valuable than candidates who are simply enthusiastic about AI.
02
10 skills have replaced the LangChain + Pinecone resume.
Agent orchestration, MCP integration, eval design, prompt engineering, vector DB/RAG, cost optimization, safety/guardrails, computer-use deployment, production observability, and frontier-model fluency are the 2026 hiring checklist. Candidates who list only LangChain, Pinecone, and ChatGPT API without these newer capabilities are likely in a 2024 mindset.
03
Eval design is the single best signal of real LLM experience.
Per The AI Career Lab's 2026 agentic guide: 'Eval literacy — knowing how to design, run, and reason about model evaluations — is the single biggest signal of "this person actually built with LLMs" vs watching YouTube videos.' Ask every candidate, at every level, to walk you through an evaluation they designed. The answer quality is the signal.
04
Frontier-lab compensation creates a wage floor for all AI roles.
Anthropic Software Engineer median: $600K. OpenAI L5: $1.15M total comp (per Levels.fyi via Pin.com, retrieved 2026-05-24). These figures act as a market anchor that pulls the whole salary band upward. AI engineer national range: $145K–$310K, with San Francisco/Bay Area total comp at $270K–$390K+ per Kore1's 2026 guide. Budget accordingly before opening headcount.
05
Fabrication literacy is a free 30-second senior filter.
Ask a senior candidate to name the current Anthropic Claude generation and its pricing. If they say 'Sonnet 5', note that this model does not exist and is catalogued as a fabrication in Anthropic's primary documentation. If they claim MCP servers are configured in settings.json, that is a fabricated config path. Candidates who have not read primary documentation in the past 90 days may not be tracking a field that moves weekly.

01 · 2026 AI Hiring Market
109% posting growth, a 56% wage premium, and a trust gap that defines the screen.

The macro picture for AI developer hiring is unusual: demand is accelerating while the supply of candidates who can actually demonstrate production readiness is not keeping pace. According to the Lightcast 2026 predictions data, AI skills now appear in 2.5% of all US job postings — up 55% year-over-year, 72% from 2022, and approximately 300% over the past decade. Job postings requiring AI skills grew 73% from 2023 to 2024, then accelerated to 109% from 2024 to 2025. These are not marginal upticks; they represent a structural shift in what employers consider baseline engineering capability.

On the wage side, the picture depends on which source you use — and both are legitimate. A Lightcast press release from July 2025 found AI skills command a 28% salary premium. The PwC 2025 Global AI Jobs Barometer, as cited in Kore1's 2026 salary guide, puts the number higher at 56% — up from 25% the prior year. The methodological difference matters: Lightcast measures the premium across industries; PwC focuses on like-for-like roles with and without AI skills. Both signal a real premium; the honest range is 28–56%.

The demand-supply gap is visible in the Stack Overflow data too. According to the Stack Overflow Blog's December 2025 survey summary, 51% of professional developers use AI tools daily — but 52% either don't use AI agents at all or stick to simpler AI tools, and 38% have no plans to adopt them. This means production agent experience remains genuinely differentiated. The candidate who can name every OWASP LLM Top 10 category, design a golden-dataset eval, and walk through the cost model of a 10-turn Sonnet 4.6 loop is not a commodity — yet.

Job postings: +109% YoY growth (2024 → 2025)

AI-skill job postings accelerated from +73% (2023→2024) to +109% (2024→2025). Now 2.5% of all US postings mention AI skills, up 55% from 2024. Source: Lightcast Global AI Skills Outlook.

Lightcast, retrieved 2026-05-24

Wage premium: 56% (AI vs non-AI same-role peers)

PwC 2025 Global AI Jobs Barometer: workers with AI skills earn 56% more than peers in the same roles without AI skills — up from 25% one year prior. Lightcast pegs the cross-industry figure at 28%. Honest range: 28–56%.

PwC 2025 / Lightcast Jul 2025

Daily AI use: 51% of pro devs use AI tools daily

Stack Overflow Developer Survey 2025: 51% of professional developers use AI tools daily. Yet only 29% trust AI output — down 11 percentage points from 2024. Trust gap = the hiring screen.

Stack Overflow Dec 2025

Agent gap: 52% of devs not yet using AI agents

52% of developers either don't use agents or stick to simpler tools. 38% have no plans to adopt them. Production agent experience remains differentiated — not commodity — in 2026.

Stack Overflow Dec 2025

02 · Must-Have Skills
The 10 skills that replaced the LangChain + Pinecone resume.

The canonical “AI developer” resume of 2024 listed LangChain, Pinecone, a ChatGPT API wrapper project, and something about Hugging Face. That stack is not wrong, but it is no longer sufficient as a signal of production readiness in 2026. The field has moved to agentic systems, multi-model cost management, and structured evaluation. The ten skills below reflect what production-AI teams actually need from their engineers in 2026, drawn from the AI Career Lab's 2026 agentic jobs guide, the OWASP Top 10 for LLM Applications, and the MCP specification (version 2025-11-25).

According to Kore1's 2026 salary guide, over 75% of AI postings now seek focused experts rather than generalists — so the skill list below is not an exhaustive everything-menu. It is the production-readiness screen. Each skill has a differentiated salary impact and a corresponding interview question set in the section that follows.

Agent orchestration

Supervisor patterns, tool-calling loops, failure recovery

The core agentic skill: designing multi-agent systems with supervisor and worker patterns, handling sub-agent timeouts, managing state across tool calls, and knowing when NOT to use an agent. Mid+ candidates should name LangGraph or CrewAI; senior+ should explain failure modes in production. See our LangGraph vs CrewAI orchestration framework comparison for the deep-dive. Salary impact: +$20K–$40K at senior, +$40K–$80K at staff.

Senior screen: failure modes
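To make the timeout question concrete, here is a minimal asyncio sketch of a supervisor that fans a ticket out to worker sub-agents, gives each its own deadline, and records a timeout as a degraded result instead of failing the whole run. The worker names and simulated latencies are hypothetical stand-ins for real model or tool calls, and this is one possible shape of an answer, not a framework's API.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class WorkerResult:
    worker: str
    ok: bool
    output: str


async def fake_worker(name: str, latency_s: float, ticket: str) -> str:
    # Hypothetical stand-in for a sub-agent's model or tool call.
    await asyncio.sleep(latency_s)
    return f"{name} handled: {ticket}"


async def run_with_deadline(name: str, latency_s: float, ticket: str,
                            timeout_s: float) -> WorkerResult:
    try:
        output = await asyncio.wait_for(fake_worker(name, latency_s, ticket), timeout=timeout_s)
        return WorkerResult(name, True, output)
    except asyncio.TimeoutError:
        # wait_for cancels the slow worker; the supervisor records the gap
        # instead of letting one sub-agent stall the whole ticket.
        return WorkerResult(name, False, "timed out")


async def supervise(ticket: str, workers: dict, timeout_s: float) -> list:
    # Fan out to all sub-agents concurrently, each with its own deadline.
    jobs = [run_with_deadline(n, lat, ticket, timeout_s) for n, lat in workers.items()]
    return await asyncio.gather(*jobs)


if __name__ == "__main__":
    results = asyncio.run(supervise(
        "refund request #123",
        {"classifier": 0.01, "kb_lookup": 0.02, "billing": 0.5},  # billing is too slow
        timeout_s=0.1,
    ))
    for r in results:
        print(r.worker, "ok" if r.ok else r.output)
```

A strong candidate goes further than this sketch: what the supervisor tells the customer when billing times out, and whether the cancelled call may already have changed state before it was interrupted.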

MCP integration

Model Context Protocol — spec 2025-11-25

MCP is the de facto standard for connecting AI agents to external tools, with 97M monthly SDK downloads as of February 2026. Mid candidates should know the spec version (2025-11-25) and the supported transports (stdio and Streamable HTTP; HTTP+SSE was deprecated in March 2025). Senior+ candidates should explain why HTTP+SSE was deprecated, where MCP config lives in Claude Code (.mcp.json) vs Codex CLI ([mcp_servers.NAME] in ~/.codex/config.toml), and how OAuth 2.1 + PKCE works for remote servers. This skill effectively screens for 'reads primary docs daily.'

Spec-level fluency required
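For the spec-level questions, it helps to know what actually crosses the wire. MCP messages are JSON-RPC 2.0, and over the stdio transport each message is written as a single line of JSON. The sketch below builds a client initialize request that declares the client-side capabilities (roots, sampling, elicitation) candidates often attribute to servers. The client name and version are placeholders, and the capability fields should be checked against the 2025-11-25 specification before relying on them.

```python
import json


def build_initialize_request(request_id: int, client_name: str, client_version: str) -> str:
    """Serialize an MCP `initialize` request for the stdio transport."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-11-25",
            # Roots, sampling, and elicitation are capabilities the client offers.
            "capabilities": {
                "roots": {"listChanged": True},
                "sampling": {},
                "elicitation": {},
            },
            "clientInfo": {"name": client_name, "version": client_version},
        },
    }
    # stdio framing: one JSON object per line, so no embedded newlines.
    return json.dumps(message, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    print(build_initialize_request(1, "hiring-demo-client", "0.1.0"), end="")
```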

Eval design

Golden datasets, LLM-as-judge, online + offline evals

The AI Career Lab 2026 guide calls eval design 'the single biggest signal of this person actually built with LLMs vs watching YouTube videos.' Ask every candidate, at every level, to walk through an evaluation they designed: what was the metric, what was the dataset, how was it kept fresh, and what did it catch? Candidates who can't answer this have never shipped to production under any performance bar. Pairs directly with our AI evaluation metrics reference guide.

Universal screen — all levels
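A useful follow-up is to ask the candidate to sketch the smallest possible offline eval. The example below is one minimal shape, assuming a golden dataset of input/expected pairs and an exact-match grader; the model stub is hypothetical, and real suites usually swap in task-specific graders or an LLM-as-judge. The failures list is the part that answers "what did it catch?"

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    case_id: str
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    # Simplest possible grader: case- and whitespace-insensitive equality.
    return output.strip().lower() == expected.strip().lower()


def run_offline_eval(cases: list, call_model: Callable[[str], str],
                     grader: Callable[[str, str], bool] = exact_match) -> dict:
    failures = []
    for case in cases:
        output = call_model(case.prompt)
        if not grader(output, case.expected):
            failures.append({"id": case.case_id, "got": output, "expected": case.expected})
    passed = len(cases) - len(failures)
    return {"pass_rate": passed / len(cases) if cases else 0.0, "failures": failures}


if __name__ == "__main__":
    golden = [
        GoldenCase("g1", "Capital of France?", "Paris"),
        GoldenCase("g2", "2 + 2?", "4"),
        GoldenCase("g3", "Opposite of hot?", "cold"),
    ]
    # Hypothetical model stub that gets one case wrong.
    answers = {"Capital of France?": "paris", "2 + 2?": "4", "Opposite of hot?": "warm"}
    report = run_offline_eval(golden, lambda p: answers[p])
    print(report["pass_rate"], [f["id"] for f in report["failures"]])
```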

Prompt engineering

System prompts, caching, few-shot calibration

No longer a standalone skill but a prerequisite. Mid candidates should know how Anthropic prompt caching works (cache write vs cache read pricing, breakeven on 1K+ tokens) and when few-shot examples hurt rather than help. Senior+ should design tool descriptions that minimize hallucination on sensitive data. Pairs with advanced prompt engineering techniques in 2026.

Prerequisite — screen for depth
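The caching question has a concrete arithmetic answer. The sketch below compares sending a shared prompt prefix uncached on every call with writing it to the cache once and reading it afterwards. The multipliers are assumptions to verify against Anthropic's current pricing page: a cache write billed at 1.25× the base input rate (5-minute TTL) and a cache read at 0.1×. Under those assumptions, caching pays off from the second call that reuses the prefix.

```python
def prefix_cost(calls: int, prefix_tokens: int, base_per_mtok: float,
                write_mult: float = 1.25, read_mult: float = 0.10) -> tuple:
    """Return (uncached_cost, cached_cost) in dollars for a shared prompt prefix.

    Assumed multipliers: cache write = 1.25x base input, cache read = 0.10x.
    Only the reusable prefix is modeled; per-call unique tokens cost the same either way.
    """
    per_token = base_per_mtok / 1_000_000
    uncached = calls * prefix_tokens * per_token
    # First call writes the cache; every later call (within the TTL) reads it.
    cached = prefix_tokens * per_token * (write_mult + (calls - 1) * read_mult)
    return uncached, cached


def breakeven_calls(write_mult: float = 1.25, read_mult: float = 0.10) -> int:
    """Smallest number of calls at which caching the prefix is strictly cheaper."""
    if read_mult >= 1:
        raise ValueError("caching never pays off if reads cost as much as full input")
    calls = 1
    while write_mult + (calls - 1) * read_mult >= calls:
        calls += 1
    return calls


if __name__ == "__main__":
    # 20K-token system prompt on Sonnet 4.6 ($3 per Mtok input), reused across 10 calls.
    uncached, cached = prefix_cost(calls=10, prefix_tokens=20_000, base_per_mtok=3.0)
    print(f"uncached ${uncached:.3f} vs cached ${cached:.3f}; breakeven at {breakeven_calls()} calls")
    # uncached $0.600 vs cached $0.129; breakeven at 2 calls
```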

Vector DB / RAG

pgvector, hybrid search, recall engineering

Junior candidates should know embedding dimensions. OpenAI text-embedding-3-large outputs 3072 dimensions natively. The common wrong answer cites the Matryoshka shortening sizes (1024 / 512) as if they were the default. Mid should choose between pgvector, Pinecone, and Qdrant by use case. Senior should explain hybrid search (BM25 + vector), HNSW vs IVFFlat indexing tradeoffs, and diagnose a retrieval system with 60% recall. Salary add: RAG architecture +10–15% at mid-level (Kore1 2026).

Junior screen: 3072 dim check
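For the hybrid-search follow-up, one common way to merge BM25 and vector results is reciprocal rank fusion (RRF), which combines ranks rather than raw scores, so the two retrievers' incompatible score scales never need calibrating. The sketch assumes each retriever has already returned an ordered list of document IDs (the IDs are hypothetical). k = 60 is the commonly used default from the original RRF paper, not a tuned value, and weighted score blending is an equally legitimate answer.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of doc IDs with RRF: score(d) = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    bm25_hits = ["doc_sku_4411", "doc_returns", "doc_shipping"]      # exact-term matches
    vector_hits = ["doc_returns", "doc_refund_faq", "doc_sku_4411"]  # semantic matches
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
    # ['doc_returns', 'doc_sku_4411', 'doc_refund_faq', 'doc_shipping']
```

Documents found by both retrievers rise to the top, which is the intuition a candidate should be able to state when asked where pure vector search underperforms (exact SKUs, error codes, rare names).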

Cost optimization

Model routing, prompt caching, per-task P&L

Per the AI Career Lab: 'cost modeling is underrated in interviews but massively over-indexed on once in the role.' Ask mid candidates to estimate the per-conversation cost of a 10-turn Sonnet 4.6 loop with 8K input / 1K output per turn. Staff candidates should design a model router (Opus → Sonnet → Haiku) by query complexity and describe the prompt-caching cost model. Candidates who have never thought about inference cost have never shipped under a budget.

Cost question catches 'lab-only'
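The mid-level cost estimate can be worked in a few lines. This sketch uses the Sonnet 4.6 list prices quoted in this guide ($3 input / $15 output per million tokens) and assumes the 8K input figure is the full billed context for each turn, with no caching. If the candidate models history accumulating turn over turn, the total grows, and noticing that is itself a good signal.

```python
def conversation_cost(turns: int, input_tokens_per_turn: int, output_tokens_per_turn: int,
                      input_per_mtok: float, output_per_mtok: float) -> float:
    """Dollar cost of a multi-turn conversation at flat per-turn token counts (no caching)."""
    per_turn = (input_tokens_per_turn * input_per_mtok
                + output_tokens_per_turn * output_per_mtok) / 1_000_000
    return turns * per_turn


if __name__ == "__main__":
    # 10 turns, 8K in / 1K out per turn, Sonnet 4.6 at $3 / $15 per Mtok.
    cost = conversation_cost(10, 8_000, 1_000, 3.0, 15.0)
    print(f"${cost:.2f} per conversation")  # $0.39
```

At 100,000 conversations a month that is roughly $39,000, the kind of scaling step a strong candidate takes unprompted.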

Safety / guardrails

OWASP LLM Top 10, LLM06 Excessive Agency

Every mid+ candidate should name three OWASP Top 10 for LLM Applications 2025 categories without looking them up. Senior candidates should explain LLM06 Excessive Agency — the agent-specific failure mode where a system takes irreversible actions beyond its intended scope — and propose mitigations (least-privilege tool design, human-in-the-loop checkpoints, kill switches). The EU AI Act high-risk obligations take effect August 2, 2026; candidates who claim 'we already comply' are wrong — that is also a useful screen.

Senior screen: LLM06 depth

Computer-use deployment

Vision-action loops, sandboxing, kill switches

Mid candidates should explain the difference between a vision-action loop and a tool-calling loop. Senior candidates should describe the safety-net model for a computer-use deployment: rate limits, action auditing, rollback triggers. Staff candidates should design a kill switch: what signals trigger it, what constitutes a safe rollback, and how you prevent accidental irreversible actions (closing the wrong email thread, deleting a record). Microsoft Copilot Studio computer-use agents reached general availability in May 2026, which makes a useful current-events anchor for this discussion.

Staff screen: kill-switch design
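The kill-switch design question can be anchored with a small sketch. The guard below halts an agent loop when any of three hypothetical trip signals fires: an attempted irreversible action, an action-rate ceiling, or a streak of consecutive failed actions. The thresholds and action names are illustrative assumptions; a real deployment would also persist an audit log and define what rollback means for each action type.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


class KillSwitchTripped(RuntimeError):
    """Raised to halt the agent loop; the caller should stop and escalate."""


@dataclass
class KillSwitch:
    max_actions_per_minute: int = 30
    max_consecutive_failures: int = 3
    irreversible_actions: frozenset = frozenset({"send_email", "delete_record"})
    recent: deque = field(default_factory=deque)  # timestamps of executed actions
    failures: int = 0

    def check(self, action: str, now: Optional[float] = None) -> None:
        """Call before executing each action; raises KillSwitchTripped to stop the agent."""
        now = time.monotonic() if now is None else now
        if action in self.irreversible_actions:
            raise KillSwitchTripped(f"'{action}' is irreversible and needs human approval")
        while self.recent and now - self.recent[0] > 60:
            self.recent.popleft()  # drop actions older than the 60-second window
        if len(self.recent) >= self.max_actions_per_minute:
            raise KillSwitchTripped("action-rate ceiling exceeded")
        self.recent.append(now)

    def record_result(self, ok: bool) -> None:
        """Call after each action; a streak of failures trips the switch."""
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.max_consecutive_failures:
            raise KillSwitchTripped(f"{self.failures} consecutive failed actions")


if __name__ == "__main__":
    switch = KillSwitch(max_actions_per_minute=2)
    for t, action in [(0.0, "click"), (1.0, "type_text"), (2.0, "click")]:
        try:
            switch.check(action, now=t)
            print("executing", action)
        except KillSwitchTripped as err:
            print("halted:", err)
            break
```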

Production observability

LLM tracing, eval alerting, trace economics

Mid candidates should name three observability platforms (LangSmith, Braintrust, Helicone, Phoenix/Arize, Langfuse) and explain the difference between a trace and a span. Senior candidates should describe how to alert on degraded eval scores in production. Staff candidates should design a regression-detection pipeline for a model that redeploys weekly and walk through the cost-per-trace economics of full-fidelity tracing at enterprise scale.

Names platforms without prompting
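For "how do you alert on degraded eval scores", one minimal answer is to compare a rolling window of online eval scores against a baseline and alert when the drop exceeds a tolerance. The sketch assumes each production trace already carries a numeric eval score in [0, 1] from a grader or LLM-as-judge; the window size and tolerance are illustrative values, not recommendations.

```python
from collections import deque
from statistics import fmean


class EvalScoreMonitor:
    """Alert when the rolling mean of online eval scores drops below baseline - tolerance."""

    def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 50):
        self.baseline = baseline
        self.tolerance = tolerance
        self.scores = deque(maxlen=window)

    def observe(self, score: float) -> bool:
        """Record one trace's eval score; return True if an alert should fire."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data yet to judge the window
        return fmean(self.scores) < self.baseline - self.tolerance


if __name__ == "__main__":
    monitor = EvalScoreMonitor(baseline=0.90, tolerance=0.05, window=4)
    for s in [0.90, 0.92, 0.88, 0.91, 0.70, 0.70]:
        print(s, monitor.observe(s))
```

Waiting for a full window trades alert latency for fewer false pages on a single bad trace; a candidate should be able to argue either side of that choice.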

Frontier-model fluency

Weekly changelog as daily habit, not Google search

Ask a junior candidate to name the current Anthropic Claude generation and its pricing. Correct: Opus 4.7 (April 16, 2026, $5/$25 per Mtok, 1M context), Sonnet 4.6 (February 17, 2026, $3/$15). Wrong: 'Sonnet 5' — that model does not exist. A candidate who reads Anthropic's news page weekly knows this without prompting. Staff candidates should walk through an Opus 4.7 vs GPT-5.5 vs Gemini 3.1 Pro decision tree for a 100-step agent loop.

Sonnet 5 = automatic flag

03 · Interview Question Bank
50 questions across 10 skills, with rubrics for good and junior-flagged answers.

The questions below are organized by skill. Each has a difficulty tag (Junior / Mid / Senior / Staff), an expected “good answer” framing, and a note on what a weak answer looks like. Use the five-per-skill structure to run a calibrated 45-minute interview: pick one Mid-level and one Senior-level question per skill, then follow up on gaps. See the engineering team playbook for agentic AI for the operating model context — what the candidate will be hired into matters as much as the questions asked.

The AI Career Lab — Agentic AI Jobs Guide 2026

“Eval literacy — knowing how to design, run, and reason about model evaluations — is the single biggest signal of ‘this person actually built with LLMs’ vs watching YouTube videos. Cost modeling (knowing what an agent loop costs at production scale) is underrated in interviews but massively over-indexed on once in the role.” — The AI Career Lab, Agentic AI Jobs Guide 2026, retrieved 2026-05-24.

Skill 1

Agent Orchestration — 5 questions

Mid · Mid · Senior · Staff · Mid

Q1 [Mid]: Design a multi-agent system that processes inbound support tickets. What's the supervisor pattern? How do you handle one sub-agent timing out while another is mid-tool-call?
Q2 [Mid]: What's the difference between a tool-calling agent and a router agent?
Q3 [Senior]: How would you make Claude Code subagents work for a task that needs three levels of hierarchy? (Trick: they're one level deep.)
Q4 [Staff]: Walk me through the failure modes in a LangGraph supervisor with eight sub-agents. Which fail first? How do you observe them?
Q5 [Mid]: When would you NOT use an agent and instead use a deterministic pipeline?

LangGraph vs CrewAI deep-dive available

Skill 2

MCP Integration — 5 questions

Mid · Senior · Senior · Mid · Staff

Q1 [Mid]: What's the current MCP spec version, and what are the two supported transports? (Expected: 2025-11-25; stdio + Streamable HTTP.)
Q2 [Senior]: Why was HTTP+SSE deprecated, and when? (March 26, 2025.)
Q3 [Senior]: Where does MCP config live for Claude Code? For Codex CLI? (.mcp.json + /mcp manager; [mcp_servers.NAME] in ~/.codex/config.toml.)
Q4 [Mid]: Sampling, roots, and elicitation: are those server features or client features? (Client features; a common confusion.)
Q5 [Staff]: Walk me through OAuth 2.1 + PKCE + RFC 9728 PRM for a remote MCP server. What does Protected Resource Metadata give you?

MCP at 97M monthly downloads

Skill 3

Eval Design — 5 questions

Mid · Mid · Senior · Senior · Staff

Q1 [Mid]: What's the difference between online and offline evals?
Q2 [Mid]: Walk me through a golden dataset. How large? How do you keep it fresh?
Q3 [Senior]: What are the pros and cons of LLM-as-judge? What are the cost and bias failure modes?
Q4 [Senior]: What's the difference between RAGAS faithfulness and answer_relevancy? When would you use each?
Q5 [Staff]: We're shipping a customer-facing agent. Design the eval suite from scratch: regression tests, online evals, alerting thresholds.

AI eval metrics reference guide available

Skill 4

Prompt Engineering — 5 questions

Junior · Mid · Senior · Senior · Staff

Q1 [Junior]: What's the difference between a system prompt and a user prompt?
Q2 [Mid]: Walk me through few-shot exemplars for a structured-output task: how many, what format, what order?
Q3 [Senior]: How does Anthropic prompt caching work? What's the breakeven point on cache cost?
Q4 [Senior]: When does adding more examples to a prompt make things worse?
Q5 [Staff]: Design a tool description that minimizes hallucination on a sensitive financial-data retrieval tool.

Advanced prompt engineering guide available

Skill 5

Vector DB / RAG — 5 questions

Junior · Mid · Senior · Senior · Staff

Q1 [Junior]: What's the native dimensionality of OpenAI's text-embedding-3-large? (Real answer: 3072. Common wrong answer: the Matryoshka shortening size, recited as if it were the default.)
Q2 [Mid]: When would you use pgvector vs Pinecone vs Qdrant?
Q3 [Senior]: Walk me through hybrid search (BM25 + vector). When does pure vector underperform?
Q4 [Senior]: What's the difference between HNSW and IVFFlat indexes in pgvector? Operational tradeoffs?
Q5 [Staff]: Our chatbot's recall is 60%. Diagnose. Where do you look first?

3072 dim = fabrication screen

Skill 6

Cost Optimization — 5 questions

Mid · Senior · Senior · Staff · Staff

Q1 [Mid]: Estimate the per-conversation cost of a 10-turn chat using Sonnet 4.6 with 8K input and 1K output per turn.
Q2 [Senior]: What's the difference between Anthropic's standard and Fast mode pricing? When would you pay 6×?
Q3 [Senior]: How would you build a model router that downgrades Opus → Sonnet → Haiku based on query complexity?
Q4 [Staff]: Walk me through a prompt-caching cost model. What's the breakeven point on cache reads vs full rerun?
Q5 [Staff]: Our agent loop costs $0.40 per task at 95% success. The PM wants $0.15 at 90%. Walk me through the tradeoff design.

Cost question catches lab-only candidates
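The Q5 tradeoff becomes clearer once cost is expressed per successful task. Assuming failed attempts are detected and retried until one succeeds, and that attempts are independent (a simplifying assumption; real failures are often correlated, and some are silent rather than detected), the expected cost per success is the per-attempt cost divided by the success rate.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful task when failures are detected and retried.

    Retries are modeled as independent Bernoulli trials, so the expected number of
    attempts per success is 1 / success_rate (geometric distribution).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


if __name__ == "__main__":
    current = cost_per_success(0.40, 0.95)   # about $0.421
    proposed = cost_per_success(0.15, 0.90)  # about $0.167
    print(f"current ${current:.3f}, proposed ${proposed:.3f} per successful task")
```

Under these assumptions the PM's option is cheaper per success even with retries. The interview signal is whether the candidate then asks what a failure costs downstream (an escalated ticket, a wrong refund), because that term can flip the answer.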

Skill 7

Safety / Guardrails — 5 questions

Mid · Senior · Senior · Senior · Staff

Q1 [Mid]: Name three OWASP Top 10 for LLM Applications 2025 categories.
Q2 [Senior]: What is LLM06 Excessive Agency? Give a concrete example in an agent context.
Q3 [Senior]: How do you mitigate prompt injection in a customer-facing chatbot that retrieves emails?
Q4 [Senior]: What's the difference between input filtering and output filtering? When does each fail?
Q5 [Staff]: Design a jailbreak-resistance evaluation for our agent. What's the test corpus? What's the pass bar?

LLM06 = senior differentiator

Skill 8

Computer-Use Deployment — 5 questions

Mid · Senior · Senior · Staff · Staff

Q1 [Mid]: What's a vision-action loop, and how is it different from a tool-calling loop?
Q2 [Senior]: What's the rate-limit / safety-net model on a typical computer-use deployment?
Q3 [Senior]: Walk me through the OSWorld benchmark. What does it measure?
Q4 [Staff]: Design a kill switch for a computer-use agent. What signals trigger it? What's the rollback?
Q5 [Staff]: How would you sandbox a computer-use agent to prevent it from clicking 'send' on the wrong email thread?

Kill-switch design = staff screen

Skill 9

Production Observability — 5 questions

Mid · Senior · Senior · Staff · Staff

Q1 [Mid]: Name three production-grade LLM observability platforms. (Expected: any three of LangSmith, Braintrust, Helicone, Phoenix/Arize, Langfuse.)
Q2 [Senior]: What's the difference between a trace and a span in LLM observability?
Q3 [Senior]: How do you alert on degraded eval scores in production?
Q4 [Staff]: Design a regression-detection pipeline for a fine-tuned model that redeploys weekly.
Q5 [Staff]: Walk me through the cost-per-trace economics of full-fidelity tracing for an enterprise agent.

Names platforms without prompting

Skill 10

Frontier-Model Fluency — 5 questions

Junior · Mid · Senior · Senior · Staff

Q1 [Junior]: Name the current Anthropic Claude generation and key pricing. (Expected: Opus 4.7, released April 16, 2026, $5/$25 per Mtok; Sonnet 4.6, released Feb 17, 2026, $3/$15. Wrong: 'Sonnet 5', which is fabricated.)
Q2 [Mid]: What's the difference between GPT-5.5 standard pricing and the >272K input surcharge?
Q3 [Senior]: When would you pick Sonnet 4.6 over Gemini 3.1 Pro for a long-context coding workload?
Q4 [Senior]: Walk me through Anthropic's reliable knowledge cutoff vs training data cutoff distinction.
Q5 [Staff]: We're deciding Opus 4.7 vs GPT-5.5 vs Gemini 3.1 Pro for a 100-step agent loop. Walk me through your decision tree.

Sonnet 5 = hard pass at senior+

04 · Compensation Data
AI engineer salary bands: level, city, and specialization (retrieved 2026-05-24).

Salary data in the AI engineering market moves faster than most annual guides capture — the figures below are sourced from Kore1, Levels.fyi, and Pin.com and retrieved 2026-05-24. Re-verify before quoting in offer letters. The national range for AI Engineers runs $145K–$310K per Kore1's real-offer dataset; the median total comp per Levels.fyi ML/AI Software Engineer page is $245K. The seniority premium curve, per Levels.fyi Q3 2025 analysis cited in Pin.com's AI compensation benchmarks, runs: entry +6.2%, engineer +11.9%, senior +14.2%, staff +18.7% premium versus non-AI peers at the same level.

Geographic spread is material: per Kore1, the difference between the highest-comp market (San Francisco/Bay Area) and the lowest common market (Austin) can be up to $110K in total comp. California and New York account for 43% of all AI/ML engineering postings per Axial Search's analysis of 10,000+ AI/ML postings; remote roles are just 13% of the total. Budget assumptions based on the national median significantly understate the cost of competitive hiring in either coastal metro.

For salary context on the marketing side of the same organization, see our 2026 digital-marketing salary guide.

AI engineer total compensation by seniority level

Sources: Kore1 AI Engineer Salary Guide 2026 (kore1.com/ai-engineer-salary-guide) · Levels.fyi via Pin.com AI Compensation Benchmarks (pin.com/blog/ai-compensation-salary-guide) · Retrieved 2026-05-24

Entry level (0–2 yrs): total comp $110K–$160K · base $90K–$135K · Source: Kore1 2026, retrieved 2026-05-24

Mid level (3–5 yrs): total comp $170K–$260K · base $140K–$210K · Source: Kore1 2026, retrieved 2026-05-24

Senior (6–9 yrs): total comp $220K–$350K+ · base $180K–$280K · Source: Kore1 2026, retrieved 2026-05-24

Staff / Principal (10+ yrs): total comp $350K–$600K+ · base $250K–$400K+ · Source: Kore1 2026, retrieved 2026-05-24

Frontier lab (Anthropic / OpenAI): total comp $600K–$1.15M+ · median Anthropic SW Eng $600K, OpenAI SW Eng $795K · Source: Levels.fyi via Pin.com, retrieved 2026-05-24

Specialization adds measurable salary premium on top of the base band. Per Kore1's 2026 data: RAG architecture adds +10–15% at mid level; LLM fine-tuning (LoRA/QLoRA/RLHF) adds another +10–15%; MLOps and deployment capability adds $15K–$30K versus notebook-only candidates. Agentic AI Engineers command $185K–$320K base plus $40K–$120K equity at growth-stage companies; AI Agent Architects run $260K–$420K base plus equity, per the AI Career Lab's agentic jobs guide. LinkedIn's own AI Engineer band runs $305K–$454K+ per Levels.fyi data ($305K at IC2, $454K at IC3, median $325K).

05 · Screening Overlay
12 resume red flags that signal lab experience over production work.

The flags below are sourced from the research team's survey of AI engineering job requirements and from patterns observed in agentic hiring pause data covered in our Q2 2026 labor survey. Each flag has a follow-up question that surfaces the truth — some candidates can explain their way past a flag, and that explanation tells you more than the resume ever would.

Flag 1

Claims 100% LLM accuracy

Doesn't understand evals. Ask: 'How did you measure accuracy?' Any answer that doesn't include a test set, a metric definition, and a failure-mode discussion is a red flag at mid+ level.

Eval understanding missing

Flag 2

Frameworks-only fluency

LangChain on the resume, nothing else. Tutorial-stack candidate. Ask: 'What did you build without LangChain?' If they can't name a project that required understanding what LangChain was doing under the hood, they're framework-dependent.

Ask: 'What did you build without LangChain?'

Flag 3

Lists 'Sonnet 5' in model stack

That model does not exist. It is catalogued as a fabrication in Anthropic's primary documentation. For senior+ candidates, this is a hard pass — they should be reading model announcements as they ship, not months later via tech-blog summaries.

Hard pass at senior+

Flag 4

'Implemented RAG' — no recall numbers

Toy project signal. Ask: 'What was your recall@10?' Any production RAG system has a recall metric. If they can't quote one, they implemented RAG but never measured whether it worked.

Ask: 'What was recall@10?'
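If the candidate does quote a recall@10, it is fair to ask how it was computed. One common definition, sketched below, is the fraction of a query's known-relevant documents that appear in the top k retrieved results, averaged over a labeled query set. The query IDs and labels here are hypothetical.

```python
def recall_at_k(retrieved: list, relevant: set, k: int = 10) -> float:
    """Fraction of relevant doc IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("recall is undefined for a query with no relevant documents")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(results: dict, labels: dict, k: int = 10) -> float:
    # Average per-query recall over every labeled query.
    return sum(recall_at_k(results[q], labels[q], k) for q in labels) / len(labels)


if __name__ == "__main__":
    labels = {"q1": {"d1", "d2"}, "q2": {"d7"}}
    results = {"q1": ["d1", "d9", "d4"], "q2": ["d3", "d7"]}
    print(mean_recall_at_k(results, labels))  # (0.5 + 1.0) / 2 = 0.75
```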

Additional flags to screen at resume review: “Prompt engineer” with no eval rigor (all vibes, no science); MCP listed but can't name the spec version; claims to have “trained an LLM” (almost certainly fine-tuning conflated with training from scratch); “98% deflection rate” with no source or definition; “built an agent” with no failure-mode discussion; EU AI Act “we already comply with high-risk obligations” (those take effect August 2, 2026 — candidates claiming current compliance don't know the timeline); no mention of inference cost in any project (built in a lab, not under a P&L); and no knowledge of a single OWASP LLM Top 10 category.

The 55% of companies that later regret AI-driven cuts — covered in our analysis of that regret data — often rushed hiring and accepted weak signals. The red-flag screen above is a counter-pressure: it takes 10 minutes on a resume and can eliminate candidates who would have failed in the first 90 days.

06 · Fabrication Literacy
The 30-second fabrication-literacy screen no other hiring guide covers.

The AI field moves fast enough that an engineer who has not read primary documentation in 90 days may be operating on a model stack that has already changed. The fabrication-literacy screen tests exactly this: does the candidate get their information from primary sources (vendor docs, spec pages, release announcements), or do they get it from tech-blog summaries and social posts that sometimes introduce errors?

The technique is simple. Ask a question that has a factually correct answer that is also slightly non-obvious — and where a common fabrication exists. The most efficient version for 2026 is the Claude model question. Ask any candidate: “Name the current Anthropic Claude generation and its pricing.” The correct answer is Opus 4.7 (released April 16, 2026, $5/$25 per million tokens, 1M context window) and Sonnet 4.6 (released February 17, 2026, $3/$15). A candidate who says “Sonnet 5” is citing a model that does not exist — it is a fabrication catalogued in Anthropic's own fact-checking resources. No Anthropic news page for Sonnet 5 exists.

For MCP-specific roles, the equivalent screen: “Where does MCP config live for Claude Code?” The correct answer is .mcp.json in the project root, managed via the /mcp manager command. The fabricated answer points to settings.json; that path comes from outdated docs or hallucinated summaries. Similarly, “What's the native output dimensionality of OpenAI's text-embedding-3-large?” The correct answer is 3072. Candidates who quote a smaller number are usually recalling the Matryoshka shortening option (configured via the dimensions parameter), not the model card's default. A junior candidate who recites the shortening value as if it were the native size has a primary-docs reading gap.
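The embedding question rewards knowing how the shortening actually works. OpenAI's documentation describes the dimensions parameter for the text-embedding-3 models and also notes that an embedding can be shortened manually by truncating it and re-normalizing. The sketch below shows that manual path on a plain Python list, assuming the input is a full 3072-dimension vector; it uses synthetic numbers rather than a real API call.

```python
import math


def shorten_embedding(vector: list, dims: int) -> list:
    """Truncate a Matryoshka-style embedding to `dims` values and L2-renormalize it."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and the native length")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


if __name__ == "__main__":
    native = [math.sin(i) for i in range(3072)]  # synthetic stand-in for a 3072-dim embedding
    short = shorten_embedding(native, 1024)
    print(len(native), len(short), round(sum(x * x for x in short), 6))  # 3072 1024 1.0
```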

The fabrication screen is not a gotcha. It is a proxy for reading habits. Engineers who read primary docs as they ship stay calibrated in a field where “best practice” changes quarterly. Engineers who rely on secondary summaries accumulate fabrications — and those fabrications ship into production. For a deeper look at how this applies to building with LLMs, see our build a Claude skill from scratch tutorial — the kind of project a strong candidate would have shipped.

84% of developers use or plan to use AI tools. Only 29% trust the output, down 11 points from 2024. The candidates worth hiring are the ones who closed that gap by building evaluation systems, not by being optimistic.
Digital Applied analysis, based on Stack Overflow Developer Survey 2025 (survey.stackoverflow.co/2025/ai), Dec 29, 2025

07 · Senior Signal
OWASP LLM06 Excessive Agency: the senior-vs-mid distinguishing question.

Most engineering hiring guides that cover AI safety focus on prompt injection (LLM01). That is correct — but it is also the expected answer. Every mid-level candidate who has been interviewing recently knows to name LLM01. The question that distinguishes senior from mid is LLM06: Excessive Agency.

LLM06 Excessive Agency, as defined in the OWASP Top 10 for LLM Applications 2025, describes the failure mode where an AI agent is granted more capability than it needs — or takes autonomous actions beyond its intended scope — resulting in unintended or harmful consequences. The concrete example in an agent context: a customer-service agent that can browse, read, and send emails has excessive agency if its task is only to read and summarize. The ability to send — even if never triggered in test — is an LLM06 risk in production.

Ask a senior candidate: “What is LLM06 Excessive Agency? Give a concrete example in an agent context.” A strong answer will describe both the principle (least-privilege tool design) and the mitigation pattern: restricting tool scope to the minimum needed for the task, adding explicit human-in-the-loop approval for irreversible actions, and designing kill switches that can interrupt an agent mid-execution before a dangerous action completes.
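The mitigation pattern a strong candidate describes can be sketched as a tool gate. In the hypothetical example below, each task gets an explicit allowlist of tools (least privilege), and any tool marked irreversible also requires a human approval callback before it runs. The tool names and the approval hook are illustrative assumptions, not a specific framework's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    irreversible: bool
    run: Callable[[str], str]


class ToolGate:
    def __init__(self, allowed: set, approve: Callable[[str, str], bool]):
        self.allowed = allowed  # least privilege: only the tools this task needs
        self.approve = approve  # human-in-the-loop hook for irreversible actions

    def call(self, tool: Tool, arg: str) -> str:
        if tool.name not in self.allowed:
            return f"DENIED: {tool.name} is outside this task's scope"
        if tool.irreversible and not self.approve(tool.name, arg):
            return f"BLOCKED: human reviewer rejected {tool.name}"
        return tool.run(arg)


if __name__ == "__main__":
    read_inbox = Tool("read_inbox", False, lambda q: f"3 emails matching {q!r}")
    send_email = Tool("send_email", True, lambda body: "sent")
    # A read-and-summarize task never needs send_email, so it is not on the allowlist.
    gate = ToolGate(allowed={"read_inbox"}, approve=lambda name, arg: False)
    print(gate.call(read_inbox, "refund"))
    print(gate.call(send_email, "Your refund is approved"))
```

This mirrors the read-and-summarize example above: the send capability is removed from the task entirely rather than merely left untriggered in testing.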

A weak answer will name LLM06 correctly but fail to connect it to a design decision the candidate has actually made. A candidate who can explain LLM06 but has never implemented a kill switch or least-privilege tool design is operating on theory, not production experience. That is a meaningful mid-level vs senior-level distinction, and an especially relevant one as you build out the agentic engineering team structure covered in our team playbook.

The EU AI Act high-risk obligations are a related screen. Those obligations take effect August 2, 2026, not today. A senior candidate who claims “our product already complies with EU AI Act high-risk obligations” either works for a company that has begun early compliance preparation (unlikely before the effective date) or does not know the timeline (likely). The correct answer for a May 2026 interview is: “We are preparing for the August 2, 2026 effective date.” That is a useful compliance-fluency screen with no trick involved, just knowledge of a public regulatory date.

08 · Career Architecture
IC, manager, and specialist paths: where each role lands on salary and scope.

AI engineering careers in 2026 have fragmented into three tracks with meaningfully different compensation structures and hiring criteria. Understanding which track you are hiring for changes both the interview question set and the salary budget. The workforce upskilling playbook covered in our AI upskilling guide addresses what candidates on each track should be building toward; this section addresses what hiring managers should expect from them.

IC track

Engineer → Senior → Staff → Principal

The dominant AI engineering path. At entry/mid: build and ship AI features within defined scopes. At senior: own the technical design of multi-component AI systems, define eval frameworks, set cost budgets. At staff: cross-org technical leadership, architecture decisions for production agent systems, direct input into model selection and infrastructure. At principal: company-level AI technical strategy. Salary band at non-frontier companies: entry $110K–$160K total comp → principal $350K–$600K+. At frontier labs (Anthropic, OpenAI), median total comp ranges from $600K to $1.15M+ depending on IC level.

The core engineering hire

Manager track

Tech Lead → AI Team Lead → Director of AI

Requires both technical fluency and people management. At tech lead (late mid-level): sets team engineering standards, owns sprint-level delivery, mentors junior engineers, and must still code. At AI Team Lead (senior): owns the roadmap for an AI product surface, manages 3–8 engineers, and partners with product. At Engineering Manager, AI (staff): manages managers and owns org design and the hiring plan. At Director of AI / VP AI: company-level AI strategy and board-level reporting. Critical note: managers hired entirely on pedigree, without production AI experience, often fail in their first year because the field moves faster than second-hand knowledge allows.

Must demonstrate recent hands-on work

Specialist track

Eval Engineer · RAG Architect · Agent Architect

Emerging specializations, each commanding its own premium. Eval Engineer (mid): designs, maintains, and runs evaluation pipelines for production AI systems, the direct embodiment of the AI Career Lab 'eval literacy' signal. RAG Engineer / Prompt Engineer (senior): owns retrieval architecture and prompt engineering at the system level. Agent Architect / Trust & Safety Lead (staff): designs multi-agent system architecture and owns safety and LLM06 mitigations. AI Compliance Advisor (director): owns regulatory readiness (EU AI Act, NIST AI RMF). Agentic AI Engineer band: $185K–$320K base + $40K–$120K equity at growth-stage companies. AI Agent Architect: $260K–$420K base + equity.

Eval Engineer is the new hire of 2026
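The core of what an Eval Engineer maintains can be reduced to three parts: a golden dataset, a scorer, and a release gate that compares the current pass rate against a baseline. The sketch below assumes a model exposed as a plain `prompt -> text` callable and uses exact-match scoring for brevity; the 2-point regression tolerance, the toy cases, and all function names are illustrative assumptions. Production pipelines often swap in graded rubrics or LLM-as-judge scorers and run both offline and against live traffic.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    # Simplest possible scorer: case- and whitespace-insensitive equality.
    return output.strip().lower() == expected.strip().lower()


def run_eval(model: Callable[[str], str], cases: List[GoldenCase],
             scorer: Callable[[str, str], bool] = exact_match) -> float:
    """Return the fraction of golden cases the model passes."""
    if not cases:
        raise ValueError("golden dataset is empty")
    passed = sum(scorer(model(case.prompt), case.expected) for case in cases)
    return passed / len(cases)


def release_gate(pass_rate: float, baseline: float, max_regression: float = 0.02) -> bool:
    """Allow a release only if the pass rate stays within max_regression of the baseline."""
    return pass_rate >= baseline - max_regression


# Usage with a stubbed model standing in for a real LLM call.
golden = [
    GoldenCase("2 + 2?", "4"),
    GoldenCase("Capital of France?", "Paris"),
    GoldenCase("Largest planet?", "Jupiter"),
    GoldenCase("Chemical formula for water?", "H2O"),
]
canned = {"2 + 2?": "4", "Capital of France?": "paris",
          "Largest planet?": "Saturn", "Chemical formula for water?": " H2O "}


def stub_model(prompt: str) -> str:
    return canned[prompt]


rate = run_eval(stub_model, golden)
print(rate)                               # 0.75
print(release_gate(rate, baseline=0.80))  # False: 0.75 is below 0.80 - 0.02
```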

09 — Hiring Manager Playbook

From job spec to offer: the four-step AI hiring playbook.

The structural context for why you are hiring matters as much as the skills you screen for. Our 4% net workforce reduction story covers the macro data on AI-driven job cuts. The follow-up on the 55% of companies that regret those cuts describes the counter-pressure hiring managers should weigh: companies that cut AI engineering talent to reduce short-term costs frequently re-hire the same profiles at higher compensation 12–18 months later, and pay again in a delayed production roadmap.

Step 1 — Write the spec against the 10-skill list, not a tool list. Job specs that say “3+ years LangChain experience” attract candidates who have used LangChain. Specs that say “experience designing and running eval frameworks for production LLM systems” attract candidates who have shipped under a performance bar. The latter set is smaller, more experienced, and worth more at offer time — but the spec must signal that you know what you are looking for.

Step 2 — Use the resume red-flag overlay before the phone screen. The 12 flags above (starting with “100% accuracy claims” and ending with “no inference cost mention”) can be applied to a resume in 10 minutes. Each flag has a follow-up question for the phone screen if you want to give a candidate a chance to explain. Treat the three fabrication-screen items (listing “Sonnet 5” in a model stack, reciting a Matryoshka-shortened size as a model's native embedding dimension, and naming the wrong file for Claude Code MCP configuration) as binary pass/fail filters for senior+ roles.
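The Matryoshka item works as a screen because a shortened embedding is derived from the native one. Models trained with Matryoshka representation learning let you keep a prefix of the full vector and re-normalize it, trading some retrieval quality for storage; the shortened length is a configuration choice, while the native dimension is the full vector length. The sketch below illustrates this with a toy 8-dimensional vector; the values and function names are illustrative and not taken from any particular model or SDK.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def shorten_embedding(native: List[float], dims: int) -> List[float]:
    """Matryoshka-style shortening: keep the first `dims` values, then re-normalize.

    The model's native dimension is still len(native); `dims` is a storage choice.
    """
    if not 0 < dims <= len(native):
        raise ValueError(f"dims must be between 1 and {len(native)}")
    return l2_normalize(native[:dims])


# Toy 8-dimensional "native" embedding; real models use hundreds or thousands of dimensions.
native = l2_normalize([0.9, 0.4, 0.1, 0.05, 0.02, 0.01, 0.01, 0.005])
short = shorten_embedding(native, 4)
print(len(native), len(short))  # 8 4
```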

Step 3 — Run the structured interview against the five skills most relevant to your role. Use the question bank above. For most AI engineering roles in 2026, the five most differentiated screens are: eval design (universal), cost optimization (catches lab-only experience), MCP integration (screens for doc-reading habits), agent orchestration failure modes (separates mid from senior), and frontier-model fluency (screens for whether they are tracking a fast-moving field). Add OWASP LLM06 for any role that touches agent systems or customer-facing deployments.
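Step 3 can be written down as a per-role interview plan plus a scorecard that forces every planned screen to be scored against its rubric. In this sketch the five core screens and the LLM06 condition come from the step above; the two rubric labels (`good`, `junior-flagged`) mirror the question bank's good and junior-flagged answers, while the function names are hypothetical. The code deliberately reports flagged screens rather than inventing a pass/fail threshold.

```python
from typing import Dict, List

# The five most differentiated screens named in Step 3.
CORE_SCREENS: List[str] = [
    "eval design",
    "cost optimization",
    "MCP integration",
    "agent orchestration failure modes",
    "frontier-model fluency",
]
LLM06_SCREEN = "OWASP LLM06 Excessive Agency"
RUBRIC_LABELS = {"good", "junior-flagged"}


def interview_plan(touches_agents: bool, customer_facing: bool) -> List[str]:
    """Core screens, plus LLM06 for roles that touch agents or customer-facing deployments."""
    plan = list(CORE_SCREENS)
    if touches_agents or customer_facing:
        plan.append(LLM06_SCREEN)
    return plan


def flagged_screens(plan: List[str], scores: Dict[str, str]) -> List[str]:
    """Return the screens scored 'junior-flagged'; refuse to summarize an incomplete scorecard."""
    missing = [screen for screen in plan if screen not in scores]
    if missing:
        raise ValueError(f"unscored screens: {missing}")
    unknown = {label for label in scores.values() if label not in RUBRIC_LABELS}
    if unknown:
        raise ValueError(f"unknown rubric labels: {sorted(unknown)}")
    return [screen for screen in plan if scores[screen] == "junior-flagged"]


# Usage: an agent-facing role where the candidate stumbles on cost optimization.
plan = interview_plan(touches_agents=True, customer_facing=False)
scores = {screen: "good" for screen in plan}
scores["cost optimization"] = "junior-flagged"
print(flagged_screens(plan, scores))  # ['cost optimization']
```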

Step 4 — Budget against the seniority-band matrix. AI engineers at mid level command $170K–$260K total comp nationally, per Kore1's 2026 data. San Francisco/Bay Area and New York push that band upward by $50K–$100K. If your approved budget is below market, the right response is not to lower the bar. Instead, scope the role to a level where your budget is competitive, or revisit the plan against the rule of thumb of $2–$3 spent on reskilling for every $1 spent on AI: upskilling existing engineers who are one tier below where you need them may be more capital-efficient than a full external hire at senior level. For strategic advisory on building the AI team structure for your organization, our AI transformation advisory practice is specifically designed for that planning work.
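The Step 4 arithmetic can be checked mechanically. This sketch uses the Kore1 national mid-level band and the $50K–$100K metro uplift from the paragraph above, and assumes the uplift raises the low end of the band by $50K and the high end by $100K, giving $220K–$360K in SF/Bay Area or New York under that assumption. The mapping and the function names are illustrative, not part of the source data.

```python
from typing import Tuple

# Figures from Step 4: Kore1's 2026 national mid-level total-comp band and the
# $50K-$100K uplift cited for SF/Bay Area and New York.
MID_NATIONAL: Tuple[int, int] = (170_000, 260_000)
METRO_UPLIFT: Tuple[int, int] = (50_000, 100_000)  # assumption: low end +50K, high end +100K


def mid_band(high_cost_metro: bool) -> Tuple[int, int]:
    low, high = MID_NATIONAL
    if high_cost_metro:
        low, high = low + METRO_UPLIFT[0], high + METRO_UPLIFT[1]
    return low, high


def budget_position(budget: int, high_cost_metro: bool) -> str:
    low, high = mid_band(high_cost_metro)
    if budget < low:
        # Per Step 4: rescope the level or upskill internally rather than lower the bar.
        return "below market"
    if budget > high:
        return "above band"
    return "within band"


print(mid_band(high_cost_metro=True))                   # (220000, 360000)
print(budget_position(200_000, high_cost_metro=False))  # within band
print(budget_position(200_000, high_cost_metro=True))   # below market
```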

Conclusion

Hire for the trust gap — not the tool list.

The AI developer hiring market in 2026 has a single underlying dynamic: 84% of developers have adopted AI tools, but only 29% trust the output. That trust gap — the distance between “uses AI” and “can verify, constrain, and evaluate AI” — is exactly where your hiring screen should live. The candidates on the right side of that gap command a 28–56% wage premium over their peers and are worth every basis point of it. The candidates on the wrong side can prompt an LLM fluently and build impressive demos. They will ship fabrications, miss production cost budgets, and have no answer when the model degrades quietly in week three.

The 10 skills, 50 questions, and 12 red flags in this guide give you a structured way to test which side of that gap a candidate lives on. None of it requires a PhD in machine learning or access to frontier-lab training infrastructure. It requires that a candidate has shipped AI systems under real constraints: a performance budget, a cost budget, a safety requirement, and a production timeline. Candidates who have done that will have answers to the eval design, cost optimization, and OWASP LLM06 questions above. Candidates who have not will not — and that distinction is knowable in a 45-minute structured interview.

