Development · Playbook · 13 min read · Published May 7, 2026

Stage 3 of 10 — data foundation. The under-budgeted stage that decides every downstream agent's quality.

Agentic AI Data Foundation: Stage 3 Pipeline Templates

Stage 3 of the agentic AI pipeline — data foundation. A 30-point quality audit, a three-level RAG-readiness assessment, a source-of-truth map with ownership matrix, a four-tier classification policy, and a PII-handling rubric. Plus the data-debt prioritisation matrix that hands clean inputs to Stage 4 vendor selection.

Digital Applied Team · Senior strategists
Stage 3 of 10
Audit checklist: 30 points (data-quality coverage)
RAG readiness: 3 levels (L1 / L2 / L3)
Classification: 4 tiers (public → restricted)
Typical duration: 2 weeks (to remediation roadmap)

The agentic AI data foundation is the single hardest cap on downstream agent quality — and the stage teams most consistently under-budget. A model can be frontier-class, a vendor can be top-tier, an orchestration framework can be elegant; none of that matters if the corpus the agent retrieves from is stale, ambiguous, duplicated across four systems, or quietly mixes restricted PII into general-access prompts. Stage 3 is where that gets fixed.

What's at stake: every retrieval-grounded agent, every workflow that touches a record of truth, every analytic surface a user actually trusts — all of them inherit the quality ceiling of the data underneath. Skip the audit, the readiness assessment, and the classification policy, and you ship an agent that is confidently wrong in production. Worse, you ship one that is confidently wrong on regulated content, which is the failure mode that ends programs.

This guide covers eight sections: why Stage 3 sets the cap, the 30-point quality audit, the three-level RAG-readiness assessment, the source-of-truth map and ownership matrix, the four-tier classification policy, PII detection and retention, the data-debt prioritisation matrix, and the hand-off contract into Stage 4 vendor selection. Every artifact below is a template — adapt the wording, keep the structure.

Key takeaways
  1. Data foundation is the limit on agent quality. Model capability, prompt craft, and orchestration all cap out at the quality of the data underneath. Stage 3 is the one place where investment compounds across every downstream stage.
  2. RAG readiness needs explicit measurement. L1 (corpus exists), L2 (chunked and embedded), L3 (retrieval baseline measured). Most teams self-report L3 and live at L1 — the assessment exists to close that honesty gap.
  3. Source-of-truth map prevents duplicate effort downstream. Without an explicit SoT-per-entity record, every team rebuilds the same retrieval pipeline against a different copy of the customer table. The map is one Friday afternoon; not having it is a year of drift.
  4. Classification policy is the cheapest compliance lever. Four tiers (public · internal · confidential · restricted) and a default-tag rule cover ~90% of access-control decisions for agents. Spend a week here and the next twelve months of audit questions answer themselves.
  5. Data debt compounds faster than tech debt. A stale source, a duplicate record, an unowned table — each one silently corrupts every retrieval downstream. The data-debt matrix sorts by blast radius, not age. Fix the high-blast items first.
Pipeline navigation · Stage 3 of 10
You are reading Stage 3 of the 10-stage agentic AI implementation pipeline. Previous: Stage 2 · Strategy Roadmap. Next: Stage 4 · Vendor Selection. The full pipeline: 1 Discovery · 2 Strategy Roadmap · 3 Data Foundation · 4 Vendor Selection · 5 Prototype · 6 Pilot · 7 Production · 8 Observability · 9 Governance · 10 Operating Model.

01 · Why Stage 3 · The data foundation is the limit on agent quality.

Every agent program eventually meets the same wall. The model is capable. The prompt is tuned. The orchestration is clean. And the agent still hallucinates pricing, cites the wrong policy, conflates two customers with similar names, or quotes a product page from three website redesigns ago. None of those failures originate in the model layer. They originate in the data layer, and the model faithfully reflects what was retrieved.

Stage 3 is the only stage in the pipeline where investment compounds across every downstream stage. A vendor decision (Stage 4) affects one vendor. A prototype (Stage 5) affects one workflow. A data-foundation improvement raises the ceiling for every agent built afterward — for years. That asymmetry is why Stage 3 deserves disproportionate budget and why under-budgeting it is the single most common pattern we see in stalled programs.

The honest framing: most organisations entering Stage 3 discover their data is in worse shape than they assumed. The job here is not to be embarrassed by that — it's to measure honestly, fix the highest-blast-radius problems first, and hand a known-quality foundation to Stage 4. Two weeks of disciplined work here saves quarters of remediation later.

"The model will always be smarter than your data. The data will always be the ceiling on what the model can do for you. Stage 3 raises the ceiling."— A principle worth repeating before every program kickoff

02 · Quality Audit · Freshness, accuracy, consistency, lineage.

The 30-point audit below is the artifact that opens every Stage 3 engagement. Four pillars — freshness, accuracy, consistency, lineage — with seven to eight checks each. Score each check as green, amber, or red on the actual corpus you intend to ground agents on. The output is not a vanity report; it's a triage list for the data-debt matrix in Section 07.

Run the audit on a representative sample, not the whole corpus. Twenty source documents per system, drawn across the date range you intend to retrieve from, surfaces the systemic problems faster than a full scan ever will. The exhaustive scan comes later, after the prioritisation matrix tells you which systems are worth scanning in depth.
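A minimal sketch of that sampling step in Python, assuming each document record carries a modified_at timestamp (the field name and record shape are illustrative, not part of the template). It buckets the retrieval date range into strata and draws evenly across them, so old records are audited alongside fresh ones.

# Stratified audit sample · illustrative sketch (assumed record shape)
import random
from datetime import timedelta

def sample_for_audit(docs, n=20, buckets=4, seed=7):
    """Draw an audit sample spread across the date range, not just recent records."""
    rng = random.Random(seed)
    docs = sorted(docs, key=lambda d: d["modified_at"])
    if not docs:
        return []
    lo, hi = docs[0]["modified_at"], docs[-1]["modified_at"]
    span = (hi - lo) / buckets or timedelta(seconds=1)  # guard: all-same-date corpus
    strata = [[] for _ in range(buckets)]
    for d in docs:
        i = min(int((d["modified_at"] - lo) / span), buckets - 1)
        strata[i].append(d)
    per = max(1, n // buckets)
    sample = []
    for s in strata:
        sample.extend(rng.sample(s, min(per, len(s))))
    return sample[:n]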

The 30-point checklist

# Data Quality Audit · 30 points · Stage 3 template
# Score each: G (green) · A (amber) · R (red)
# Source: ____________________   Owner: ____________________
# Sampled: ____ records   Date: ____________________

## Freshness (8 points)
[ ] F1  Source updated within agreed SLA (daily/weekly/monthly)
[ ] F2  No records older than declared retention window
[ ] F3  "Last modified" timestamp present and reliable
[ ] F4  Stale records (>retention) flagged or archived
[ ] F5  Refresh cadence documented and monitored
[ ] F6  Source-of-truth update propagates within SLA
[ ] F7  No silent failures in upstream sync jobs
[ ] F8  Freshness dashboard visible to data owner

## Accuracy (8 points)
[ ] A1  Field values match source-of-truth on spot check
[ ] A2  No truncated text / mojibake / encoding artefacts
[ ] A3  Numerics in expected ranges (no -1 placeholders, no nulls-as-zero)
[ ] A4  Categoricals match controlled vocabulary
[ ] A5  Dates parse cleanly across locales (no DD/MM vs MM/DD ambiguity)
[ ] A6  Referential integrity intact (no orphan foreign keys)
[ ] A7  Calculated fields recompute correctly from sources
[ ] A8  Hand-labelled accuracy sample ≥ 95% on critical fields

## Consistency (7 points)
[ ] C1  Same entity referenced consistently across systems
[ ] C2  No duplicates within source (canonical-record discipline)
[ ] C3  Naming conventions consistent (column / field / tag)
[ ] C4  Units consistent (currency, distance, mass) and labelled
[ ] C5  Timezones explicit and normalised to UTC at the boundary
[ ] C6  Schema versions tracked and breaking changes flagged
[ ] C7  Cross-system joins succeed on shared keys at ≥ 99%

## Lineage (7 points)
[ ] L1  Every field traced to an upstream source or derivation
[ ] L2  Transformations documented (code, SQL, or written rule)
[ ] L3  Ownership assigned per source (one human, named)
[ ] L4  Access controls documented and enforced
[ ] L5  Retention policy declared and enforced
[ ] L6  Audit log of changes for high-sensitivity fields
[ ] L7  Recovery path documented (backups, point-in-time)

## Scoring rubric
G = production-grade, no remediation needed
A = passable, remediation queued in data-debt matrix
R = blocker, must fix before agent goes live
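Turning a filled-in checklist into the triage list that Section 07 expects is a small tally. A sketch, assuming scores are recorded as a mapping of check ID to G/A/R grade; the input shape is an assumption, not part of the template.

# Audit tally · illustrative sketch (assumed input shape)
def score_audit(scores):
    """Tally G/A/R grades from the 30-point checklist.

    scores: e.g. {"F1": "G", "F2": "R", "A1": "A", ...}
    Returns per-pillar counts plus the red items that block go-live.
    """
    pillars = {"F": "freshness", "A": "accuracy", "C": "consistency", "L": "lineage"}
    tally = {name: {"G": 0, "A": 0, "R": 0} for name in pillars.values()}
    blockers = []
    for check, grade in scores.items():
        tally[pillars[check[0]]][grade] += 1
        if grade == "R":
            blockers.append(check)  # R = blocker, must fix before go-live
    return {"by_pillar": tally, "blockers": sorted(blockers)}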

A few practical notes from running this audit dozens of times. First, expect the freshness pillar to score worst — almost every organisation has at least one source that has silently fallen behind its declared SLA. Second, expect the accuracy pillar to be the most contentious — owners disagree about what "accurate" means until you spot-check against ground truth in front of them. Third, expect lineage to be the most under-documented — the human owner often exists, but the documentation of what they own does not.

Audit framing
The audit is diagnostic, not punitive. Frame it as "here is the floor we're starting from" — never as "here is who failed." The owner who scored worst is the one whose buy-in you need most for remediation. Lose the room in the audit and Stage 3 stalls before it starts.

03 · RAG Readiness · Chunking, embedding fit, retrieval baseline.

RAG readiness measures whether a corpus is ready to be retrieved from — distinct from whether it's clean (quality audit) or governed (classification, PII). The honest version of this assessment lives in three levels. Most teams self-report Level 3 and operate at Level 1. The assessment exists to close that gap.

Run the assessment per-workload, not once-per-organisation. A customer-support knowledge base may sit at L3 while the same company's product-pricing corpus sits at L1; the agent that grounds on both inherits the lower of the two. Score every corpus an agent will actually retrieve from.

Level 1 · Corpus exists (files identified · access path known)

You can name the corpus, point to it, and pull a sample. No chunking strategy, no embeddings, no retrieval pipeline. Most organisations live here when Stage 3 begins. Not a failure — a starting point.

L1 · ~60% of teams at kickoff

Level 2 · Chunked + embedded (chunking strategy chosen · embeddings indexed)

A defensible chunking strategy applied (paragraph, sliding window, or semantic), embeddings produced with a named model, and stored in a vector store. No measurement of retrieval quality yet — the pipeline runs but nobody knows how well.

L2 · the trap level

Level 3 · Retrieval baseline measured (labelled test set · recall@k · CI gating)

A 50-100-query hand-labelled test set, measured recall@10 against exact-search ground truth, and a CI check that fails on regression. Production-grade. Agents grounded on an L3 corpus are the only ones safe to ship.

L3 · production-ready

The trap level is L2. Teams stand up chunking and embeddings, point an agent at the result, ship to staging, and never close the measurement loop. The agent works — until it quietly stops working, and nobody can tell when, because no baseline ever existed. The discipline of L3 is what separates corpora that survive a model swap from corpora that silently regress every time something upstream changes.

For the mechanics of moving from L2 to L3 — schema, chunking patterns, IVFFlat vs HNSW, recall measurement — see our self-hosted RAG with Postgres pgvector tutorial. Stage 3 readiness is the strategic question; that guide is the implementation answer.

L2 → L3 minimum bar
A corpus is not L3 until three artifacts exist: a labelled test set of representative queries with marked-relevant chunks, a measured recall@10 number against exact-search ground truth, and a CI check that compares new builds against that baseline and fails on regression beyond an agreed threshold. Anything less is L2 with optimism attached.
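The L3 bar is mechanical to check once the labelled set exists. A sketch of the recall@10 measurement and the CI gate, assuming a retrieve(query, k) function and a test set of (query, relevant-chunk-IDs) pairs; both are stand-ins for whatever your pipeline actually exposes.

# Recall@10 + CI gate · illustrative sketch (retrieve() is an assumed interface)
def recall_at_k(test_set, retrieve, k=10):
    """Mean recall@k over a hand-labelled test set.

    test_set: list of (query, set_of_relevant_chunk_ids) pairs.
    retrieve: (query, k) -> list of chunk IDs, your retrieval pipeline.
    """
    scores = []
    for query, relevant in test_set:
        hits = set(retrieve(query, k)) & relevant
        scores.append(len(hits) / len(relevant))
    return sum(scores) / len(scores)

def ci_gate(test_set, retrieve, baseline, max_regression=0.02):
    """Fail the build when recall@10 regresses beyond the agreed threshold."""
    current = recall_at_k(test_set, retrieve, k=10)
    assert current >= baseline - max_regression, (
        f"recall@10 regressed: {current:.3f} vs baseline {baseline:.3f}")
    return current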

04 · SoT Map · System catalog, ownership, refresh cadence.

The source-of-truth map names, for every entity an agent will touch, which system is canonical, who owns it, how often it refreshes, and what its downstream copies are. Without it, every team rebuilds the same retrieval pipeline against a different copy of the customer table and pretends they're solving different problems.

The artifact is small — usually a single sheet — and the discipline is the value. The act of writing it down forces the "which system is canonical for customers?" conversation that organisations otherwise avoid for years. Forty-five minutes of controlled discomfort buys a decade of clarity.

The source-of-truth template

# Source-of-Truth Map · Stage 3 template
# One row per entity. Cross-link related entities by ID.
# Owner = one human, named (not a team).

ENTITY              | SoT SYSTEM      | OWNER         | REFRESH    | DOWNSTREAM COPIES                  | NOTES
--------------------|-----------------|---------------|------------|------------------------------------|---------------------------------
Customer            | Salesforce      | J. Patel      | real-time  | Snowflake (15min), HubSpot (1h)    | HS overrides email — fix
Account             | Salesforce      | J. Patel      | real-time  | Snowflake (15min), Stripe (manual) | Stripe out-of-sync ~3% records
Product (catalog)   | Shopify         | M. Okonkwo    | hourly     | Snowflake, marketing site         | Two SKU naming schemes — unify
Pricing             | Stripe Products | F. Romano     | manual     | Sales decks (stale), website      | Decks are 6 months stale
Inventory           | NetSuite        | F. Romano     | nightly    | Shopify (hourly), warehouse WMS    | WMS authoritative for in-flight
Support ticket      | Zendesk         | A. Demir      | real-time  | Snowflake (daily), Slack notif    | Snowflake schema lags 1 release
Knowledge article   | Notion          | A. Demir      | as-edited  | Public docs site (15min CDN)      | No versioning — add ETags
Customer interaction| Gong            | J. Patel      | real-time  | Snowflake (daily)                 | Transcripts only — no metadata
Marketing content   | Sanity          | S. Lindqvist  | as-edited  | Marketing site (build-time)       | Pre-prod previews safe to index
Legal / contracts   | Ironclad        | L. Karimi     | as-signed  | None — silo                       | Restricted, do not index w/o ACL

# Ownership rules
# 1. Every entity has one named owner — escalation is named, not anonymous.
# 2. Owner is accountable for SoT freshness, accuracy, and access policy.
# 3. Downstream copies declare their lag explicitly — no "real-time-ish".
# 4. Agents retrieve from SoT or from a copy with a published staleness budget.

Two things worth knowing once you write this down. First, the number of entities is almost always smaller than expected — most organisations operate on ten to fifteen entities of agent relevance, not the fifty an enterprise architect would draw. Second, the "Downstream copies" column is where the agent-grounding decisions actually live: an agent reading the stale copy will be wrong in exactly the ways the lag predicts. Document the lag, route the agent to the right copy for its latency budget.
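Routing an agent to the right copy is a lookup against the map plus a latency budget. A sketch under the assumption that every copy declares its lag in minutes, as ownership rule 3 requires; the entity and system names are illustrative.

# Staleness-budget routing · illustrative sketch
from dataclasses import dataclass

@dataclass
class Copy:
    system: str
    lag_minutes: float  # declared lag behind the SoT; 0 = the SoT itself

def pick_source(copies, staleness_budget_minutes):
    """Choose a copy whose declared lag fits the agent's staleness budget.

    Prefers the most-lagged copy that still fits (downstream copies are
    usually cheaper to query); falls back to the SoT when nothing fits.
    """
    eligible = [c for c in copies if c.lag_minutes <= staleness_budget_minutes]
    if not eligible:
        return min(copies, key=lambda c: c.lag_minutes)  # the SoT itself
    return max(eligible, key=lambda c: c.lag_minutes)

# Example: the Customer entity from the map above.
customer = [Copy("Salesforce", 0), Copy("Snowflake", 15), Copy("HubSpot", 60)]
assert pick_source(customer, staleness_budget_minutes=30).system == "Snowflake"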

The 'one owner' rule
If an entity's owner is a team, it has no owner. Names, not teams. The named human signs off on the SoT declaration, the freshness SLA, and the classification tier. This single rule eliminates roughly half the data-quality regressions we see in flight.

05 · Classification · Public, internal, confidential, restricted.

Four tiers cover roughly 90% of access-control decisions agents need to make. The simplicity is the point — a five-tier or seven-tier policy looks more sophisticated and ships less. The matrix below is the template; rename the tiers if your security team has existing nomenclature, but keep the count at four.

The classification policy interacts with the agent layer in two places. First, retrieval: an agent operating under a user context must only retrieve from tiers that user can access. Second, generation: the model's output inherits the highest tier of any source it cited. Both are enforced upstream of the model, not inside the prompt.
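Both enforcement points reduce to simple filters once the tiers are ordered. A sketch assuming each chunk carries a numeric tier field from 1 to 4 and the user context carries a clearance level; the field names are illustrative, and the point holds regardless of vector store: enforce in the retrieval layer, never in the prompt.

# Tier enforcement at retrieval and generation · illustrative sketch
def filter_retrievable(chunks, user_clearance):
    """Enforcement point 1, retrieval: drop chunks above the user's clearance."""
    return [c for c in chunks if c["tier"] <= user_clearance]

def output_tier(cited_chunks):
    """Enforcement point 2, generation: output inherits the highest cited tier."""
    return max((c["tier"] for c in cited_chunks), default=1)

chunks = [{"id": "pricing-page", "tier": 1}, {"id": "deal-notes", "tier": 3}]
visible = filter_retrievable(chunks, user_clearance=2)  # only the public chunk
assert output_tier(visible) == 1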

Tier 1 · Public · Anyone can read

Marketing site, blog, public docs, brand assets, published pricing. Default tag — assume public unless explicitly raised. Agents can retrieve and quote freely. No access control beyond rate-limiting. Posture: default-public.

Tier 2 · Internal · All employees, no customers

Internal documentation, process wikis, non-sensitive product roadmaps, internal Slack archives. Agents retrieve under an employee context. Output never leaves an authenticated employee surface. Posture: auth-required.

Tier 3 · Confidential · Need-to-know within company

Customer records, financials, salary data, unannounced product plans, support transcripts. Retrieval requires explicit role membership. Agents log every access. Output gated to authenticated, authorised users only. Posture: role-based, audited.

Tier 4 · Restricted · Regulated / contractual

PII subject to GDPR / POPIA / HIPAA, legal contracts, M&A material, anything under NDA. Default = not indexed for general agents. Bespoke agents with documented legal basis only. Per-record audit trail. Posture: default-deny.

The default tag matters more than the policy text. A default-public posture (Tier 1 unless raised) ships faster and is harder to govern; a default-internal posture (Tier 2 unless lowered) governs more cleanly and ships slower. Pick consciously based on your sector — consumer brands typically benefit from default-public, regulated sectors from default-internal. Then write the default into the policy explicitly so nobody has to guess.

Tier 1 share · ~35% of typical corpus (default-public sectors)

Marketing, public docs, brand assets. The cheapest tier to govern — fewer access controls, simpler retrieval, broader reuse. Agents grounded here move fastest.

Tier 2 share · ~40% of typical corpus (the working majority)

Internal wikis, runbooks, employee-only material. The bulk of internal-agent retrieval. Auth-gated retrieval at the corpus level is sufficient — no per-record ACL needed.

Tier 3 share · ~20% of typical corpus (the audited tier)

Customer records, financials, support history. Per-record access controls, audit logging, output gating. Where most of the agent-grounding effort actually lives in regulated sectors.

Tier 4 share · ~5% of typical corpus (default-deny)

Restricted regulated content — PII under GDPR / POPIA / HIPAA, contracts, NDA material. Default-not-indexed. Bespoke agents only, with named legal basis.

06 · PII Handling · Detection, redaction, retention policy.

PII handling is the part of Stage 3 where shortcuts become regulatory exposure. The framing that holds up: detect at ingestion, redact or tokenise before embedding, retain on a documented schedule, audit every access. Each of those four verbs is a discipline; the policy template below names each explicitly.

Detection

Three layers stack. Pattern-based detectors catch the obvious cases — email addresses, phone numbers, national IDs, payment card numbers — using regular expressions tuned per jurisdiction. Named-entity recognition models catch the messier cases — personal names in free text, addresses embedded mid-sentence, indirect identifiers. A human review pass on the highest-tier sources catches the long tail. Run all three on ingestion; never assume the upstream system has done it for you.
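A minimal sketch of the first layer only. The patterns below are deliberately simple illustrations, not production regexes; tune them per jurisdiction, and keep the NER and human-review layers on top.

# Pattern-based PII detection · illustrative sketch (layer 1 of 3)
import re

PII_PATTERNS = {  # illustrative only; tune per jurisdiction
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "card":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def detect_pii(text):
    """Run on ingestion; never assume the upstream system already did it."""
    findings = []
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({"kind": kind, "span": match.span(), "value": match.group()})
    return findings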

Redaction strategies

Three patterns, picked per workload. Hard redaction replaces detected PII with a fixed token ([REDACTED-EMAIL]) before embedding — it destroys the ability to recover the original, is the safest option, and is used when the agent never needs to reach the underlying record. Tokenisation replaces PII with a reversible token that can be resolved server-side under audit — the agent retrieves tokens, and the surface that renders to the user resolves them under authorisation. Pseudonymisation replaces PII with a stable surrogate (consistent within a corpus, meaningless outside) — useful when the agent needs to reason about "the same person" across multiple chunks without ever holding the actual identity.
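A sketch of hard redaction and pseudonymisation side by side, reusing the detect_pii findings from the detection sketch above; tokenisation follows the same shape with the surrogate-to-value mapping persisted server-side under audit. Illustrative, not a drop-in implementation.

# Redaction strategies · illustrative sketch (builds on detect_pii above)
import hashlib

def hard_redact(text, findings):
    """Replace each detected span with a fixed token: irreversible, safest."""
    for f in sorted(findings, key=lambda f: f["span"][0], reverse=True):
        start, end = f["span"]  # apply right-to-left so earlier offsets stay valid
        text = text[:start] + f"[REDACTED-{f['kind'].upper()}]" + text[end:]
    return text

def pseudonymise(text, findings, corpus_salt):
    """Replace each span with a stable surrogate: same value, same surrogate,
    so the agent can reason about 'the same person' without the identity."""
    for f in sorted(findings, key=lambda f: f["span"][0], reverse=True):
        start, end = f["span"]
        digest = hashlib.sha256((corpus_salt + f["value"]).encode()).hexdigest()[:8]
        text = text[:start] + f"[{f['kind'].upper()}-{digest}]" + text[end:]
    return text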

Retention policy

Retention is per-tier and per-data-type, declared in writing. The template that holds up across sectors: declare a purpose, declare a retention window appropriate to that purpose, declare a deletion mechanism, and audit deletions. "We'll keep it until we're asked to delete it" is not a policy — it's an absence of policy.

# PII Retention Policy · Stage 3 template excerpt

DATA TYPE              | PURPOSE                          | RETENTION | DELETION TRIGGER
-----------------------|----------------------------------|-----------|------------------
Email (customer)       | Authentication, support routing  | 7 years   | Account deletion + 90d
Phone (customer)       | Verification, support callbacks  | 7 years   | Account deletion + 90d
National ID            | KYC / regulatory                 | 7 years   | Statutory schedule
Payment card           | Not retained — tokenised at PSP  | n/a       | Token only — no raw
Support chat transcript| Quality, training, compliance    | 3 years   | Rolling deletion
Marketing event data   | Attribution, optimisation        | 18 months | Rolling deletion
Employee record (HR)   | Employment, statutory            | 7 years post-exit | Statutory schedule
Sales call recording   | Coaching, deal review            | 2 years   | Rolling deletion
Cookie / session ID    | Session continuity               | 30 days   | Rolling expiry

The deletion test
A retention policy that has never executed a deletion is not a policy. Run a quarterly deletion test: pick a record older than its retention window, verify it has been deleted from every system in the source-of-truth map, document the result. The first run catches the systems that don't actually delete. The second run catches the ones the first run missed.
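The quarterly test is scriptable against the source-of-truth map. A sketch that assumes each system exposes some exists(record_id) lookup; that interface is an assumption to adapt per system, and the discipline of running it is the point.

# Quarterly deletion test · illustrative sketch (exists() is an assumed interface)
def deletion_test(record_id, systems):
    """Verify a past-retention record is gone from every system in the SoT map.

    systems: mapping of system name -> exists(record_id) lookup returning bool.
    Returns the systems that still hold the record; the list should be empty.
    """
    still_present = sorted(name for name, exists in systems.items() if exists(record_id))
    return {
        "record_id": record_id,
        "passed": not still_present,
        "still_present": still_present,  # these systems don't actually delete
    }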

Two operational notes. First, embeddings themselves can encode recoverable PII even after the source text is redacted — embedding inversion is a real attack surface in research literature. Practical mitigation: redact before embedding, never the other way around. Second, agent transcripts are themselves PII the moment a user types their name; the retention policy applies to transcripts the same as it applies to source records.

If your sector is regulated (healthcare, financial services, public sector), the Stage 3 deliverables in our AI transformation engagements include a sector-specific PII-handling rubric mapped to your governing framework — GDPR Article 30, POPIA Section 19, HIPAA §164, or sector equivalents.

07 · Data Debt · What to fix first — prioritisation matrix.

Every Stage 3 audit produces more findings than the team can remediate in the time budget. The prioritisation matrix sorts by blast radius — how many downstream agents inherit the problem — weighed against remediation cost. High blast, low cost: fix immediately. High blast, high cost: schedule with sponsorship. Low blast, anything: defer or accept.

The mistake we see most often: prioritising by age. The oldest data-quality issue is rarely the most impactful; it's usually the one that has been around long enough that everyone has built workarounds. Workarounds are not fixes — they're evidence of the blast radius. Sort by impact, not by tenure.
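The matrix reduces to a sort key plus three buckets. A sketch, assuming each finding has been scored for blast radius and remediation cost on simple 1-4 ordinal scales; the scales are an assumption, and the bucket rules mirror the paragraph above.

# Data-debt triage · illustrative sketch (1-4 ordinal scores are an assumption)
def triage(findings):
    """Rank findings by blast radius, then bucket by remediation cost.

    Each finding: {"name": str, "blast": 1-4, "cost": 1-4}, where 4 = highest.
    """
    def bucket(f):
        if f["blast"] >= 3 and f["cost"] <= 2:
            return "fix immediately"
        if f["blast"] >= 3:
            return "schedule with sponsorship"
        return "defer or accept"

    ranked = sorted(findings, key=lambda f: (-f["blast"], f["cost"]))
    return [{**f, "bucket": bucket(f)} for f in ranked]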

Data-debt items by blast radius · typical Stage 3 audit output
Composite blast-radius scoring from Stage 3 engagements · 2024-2026

  • Stale source of truth: pricing decks 6 months out of date — agent quotes wrong prices. Blast: critical
  • Duplicate customer records: same customer in CRM 3x — agent conflates context across accounts. Blast: high
  • Unowned data source: knowledge wiki — no named owner, no refresh SLA. Blast: high
  • Missing classification tags: confidential data tagged as internal — over-disclosure risk. Blast: high
  • Chunking strategy mismatch: sliding window on tabular data — semantic boundaries lost. Blast: medium
  • Stale embeddings vs new model: indexed with old model — new model available, no migration. Blast: medium
  • Schema drift in downstream copies: Snowflake schema lags Salesforce by 1 release — analytics divergence. Blast: low
  • Inconsistent timezones: some sources UTC, some local — agent reasoning across them errs. Blast: low

The pattern that holds up across engagements: the top three items on the matrix consume 70% of the remediation budget and resolve 80% of the downstream agent quality problems. The long tail matters — and gets scheduled into the operating model in Stage 10 — but the program lives or dies on the top three. Spend disproportionately there.

"Data debt compounds faster than tech debt because every downstream agent silently inherits it. Pay it down in priority order — by blast radius, not by tenure."— A Stage 3 principle worth keeping on the wall

08 · Next Stage · Hand-off to vendor selection (Stage 4).

The hand-off contract from Stage 3 to Stage 4 is short. Vendor selection becomes meaningfully easier when the buyer can hand the vendor — and the internal eval team — five artifacts: the audit scorecard, the readiness assessment per workload, the source-of-truth map, the classification policy, and the data-debt roadmap with sequencing.

With those in hand, Stage 4 stops being "which vendor sounds best" and becomes "which vendor demonstrably operates on our actual data shape and respects our actual access controls." The RFP template in Stage 4 references these artifacts directly; the evaluation rubric scores against them. Without the Stage 3 output, the RFP is generic and the eval is theatre.

Artifact 1 · Audit scorecard

The 30-point checklist filled out per source, with red items flagged for vendor PoCs. Lets vendors propose against a known floor, not a fantasy version of your data.

Artifact 2 · Readiness per workload

L1 / L2 / L3 scoring for every corpus an agent will retrieve from. Vendors propose differently for an L1 corpus than an L3 corpus — give them the truth up front.

Artifact 3 · SoT map + classification policy

Names the systems, the owners, the refresh cadence, the tiers. Vendor demos can be evaluated against real entity boundaries, not generic 'customer record' hand-waving.

Artifact 4 · Data-debt roadmap

The prioritisation matrix with sequencing. Some items remediate before vendor selection; some after. The roadmap is what tells you which.

All four are deliverables into Stage 4.

One closing note on sequencing. Resist the urge to fully remediate Stage 3 before opening Stage 4. The top-three blast-radius items should be in flight; the long tail can run in parallel with vendor selection. Waiting for a perfect data foundation is the most common reason agentic programs miss their first production deadline — and it's avoidable. Ship Stage 3 to the "known quality, top items remediating" state, then move.

For the next step, our Stage 4 vendor selection templates pick up exactly where this guide ends — scorecard matrix, RFP template, evaluation rubric, reference-call script, and contract checklist, all referencing the Stage 3 artifacts above.

Stage 3 wrap

The data foundation is the cheapest place to invest and the most expensive to fix later.

Every Stage 3 engagement ends in roughly the same place: a team that started uncertain about their data, finished with a measured floor, and discovered that the floor was both lower and more fixable than they expected. The audit names the problems. The readiness assessment quantifies them. The source-of-truth map ends the ambiguity about ownership. The classification policy closes the access-control gap. The data-debt matrix sequences the remediation. Two weeks of disciplined work — that's the whole stage.

The principle worth carrying forward: agent quality is bounded by data quality, and data quality is bounded by how honestly you measure it. The teams that ship production agents are the ones who scored honestly in Stage 3 and remediated in priority order. The teams that stall are the ones who skipped Stage 3 in favour of vendor demos and prototype velocity. The cheapest mistake a program can make is treating Stage 3 as a formality.

What changes downstream when Stage 3 is done well: Stage 4 vendor evaluation runs on real artifacts, not generic capabilities. Stage 5 prototypes ground on corpora with known retrieval baselines. Stage 6 pilots avoid the "why is the agent quoting last year's pricing" meeting. Stage 7 production launches with confidence intervals on quality, not hope. Every downstream stage gets easier, faster, and cheaper. That compounding is the entire return on Stage 3 investment.

Fix the data foundation first

The data foundation limits every downstream agent — fix it before procurement.

Our team audits data foundations — quality, RAG-readiness, source-of-truth, classification, PII — and ships the remediation roadmap to Stage 4 vendor selection.

Free consultation · Expert guidance · Tailored solutions
What we deliver

Stage 3 engagements

  • 30-point data quality audit
  • RAG-readiness assessment per workload
  • Source-of-truth map and ownership matrix
  • Data-classification policy and rollout
  • PII detection and redaction implementation
FAQ · Stage 3 data foundation

The questions data teams ask before procurement.

What are the three RAG-readiness levels?
Three levels, scored per workload. L1: the corpus exists, you can name it and pull a sample, but no chunking or embeddings yet. L2: a defensible chunking strategy applied, embeddings produced with a named model, indexed in a vector store — the pipeline runs but retrieval quality is unmeasured. L3: a hand-labelled test set of 50-100 representative queries with marked-relevant chunks, measured recall@10 against exact-search ground truth, and a CI check that fails on regression. The honest version is that most teams self-report L3 and operate at L1 or L2; the assessment exists to close that gap. Score every corpus an agent will retrieve from, separately. A workload-level score is what tells you what to fix next.