
Ninety days from knowledge audit to CSAT-controlled ramp — knowledge, RAG, pilot, escalation, observability.

AI Customer Support Launch: 30/60/90-Day Plan 2026

Most AI support rollouts fail in the rollout, not the model. The knowledge base is half-built, deflection is targeted before CSAT instrumentation exists, the pilot ramps to 25% in week three, and the escalation handoff loses every shred of conversation context. This piece is the 90-day plan we use with client teams — phased, CSAT-gated, measurable.

Digital Applied Team · AI strategy
Published May 8, 2026 · 13 min read · Sources: field deployments, 2025-2026

Plan horizon: 90 days (knowledge → ramp)
Pilot traffic: 1% (first traffic gate)
Target ramp: 10-25% (day-90 ceiling)
CSAT impact target: neutral (gate, not a lag)

AI customer support launches fail when CSAT lags deflection. The failure pattern is consistent across the deployments we have seen unravel: the team picks a vendor, signs a license, builds a thin knowledge base in three weeks, and ramps the bot to a meaningful slice of traffic before the CSAT instrumentation exists to detect damage. The 90-day plan in this piece is the antidote — a phased, CSAT-gated rollout that treats the support program as a product launch with measurable outcomes.

The shape of the plan is deliberately conservative on traffic and aggressive on instrumentation. Days 1-30 are spent on the knowledge layer — auditing what documentation exists, building an intent catalog from real tickets, and shipping a RAG retrieval index against the top-100 ticket archetypes. Days 31-60 launch a 1% deflection pilot with explicit CSAT gates, a designed escalation handoff, and per-archetype confidence thresholds. Days 61-90 ramp toward a 10-25% deflection ceiling, wire production observability, and train the agent team on the handoff pattern so escalations land cleanly.

This piece is the operational guide underneath that plan. It covers why 90 days is the right horizon, the milestone-by-milestone sequence for each phase, the CSAT gates that govern when traffic can ramp, the templates we ship with client engagements (knowledge audit, escalation rubric, handoff script), and the four support-launch failure modes that account for almost every unsuccessful rollout we have seen.

Key takeaways
  1. CSAT is the only metric that counts. Deflection without a CSAT constraint is a vanity number — a bot can hit 60% deflection by deflecting everything into a doom loop. Wire resolution CSAT, delayed CSAT, and model-scored conversation CSAT before the pilot launches, and treat any tier dropping more than two points as an automatic rollback trigger.
  2. Knowledge audit sits upstream of every RAG build. RAG quality is a function of the source documents it grounds against. Spending three weeks on a knowledge audit — what exists, what is stale, what is missing, who owns updates — is the cheapest improvement to deflection quality you can ship. Skip this and the bot is a confident hallucination engine.
  3. Pilot at 1% before any ramp decision. One percent of traffic for two weeks is enough to detect CSAT damage, false-positive escalations, and confidence-threshold misconfiguration without putting the broader customer base at risk. Teams that ramp from 0% to 10% in the first month consistently miss the CSAT signal until it is too late to recover.
  4. Escalation handoff is half the customer experience. The most damaging support pattern is an AI handoff that loses conversation context — the customer repeats their problem to a human and walks away frustrated. The escalation rubric, the context payload, and the agent training that supports the handoff are as important as the model that did the deflection in the first place.
  5. Run quarterly model-upgrade evals. The frontier model landscape moves quarterly. Build a small eval suite at pilot launch — 50-100 prompts that capture your top archetypes with expected outputs — and rerun it every quarter against the current frontier. The cost of a quarterly re-evaluation is small compared with the cost of running an outdated model for six extra months.

01 · Why 90 Days: Support launches fail when CSAT lags deflection.

Ninety days is not arbitrary. It is long enough to do the knowledge work upstream of the model — most teams underestimate this and pay for it in deflection quality — and short enough to keep momentum in front of the executive sponsor. Anything faster tends to skip the knowledge audit or the CSAT instrumentation; anything slower tends to lose stakeholder focus before the pilot ramps.

The failure pattern we see most often is a four-to-six-week rollout that hits deflection numbers in week three and ships CSAT damage that surfaces in week eight, by which point the bot is touching too much traffic to roll back cleanly. The customer base remembers a bad experience for longer than the deflection dashboard remembers a good month. Treating CSAT as a gating constraint rather than a downstream metric is what separates a launch that holds from a launch that has to be quietly rolled back.

The deeper reason 90 days works is that the instrumentation has to lead the model. Resolution CSAT, delayed CSAT measured 48 to 72 hours after the conversation closes, and model-scored conversation CSAT surfaced to a daily QA review — all three have to be live before the pilot opens. The first 30 days build the knowledge and instrumentation layer; the second 30 days launch a tightly-bounded pilot; the third 30 days ramp on the strength of the data those layers produce.
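
To make the gating rule concrete, here is a minimal sketch (in Python, with illustrative tier names and score scale rather than any particular helpdesk platform's API) of the two-point rollback check across the three tiers, compared against the human-only baseline taken in the same window.

# Minimal sketch of the rollback check: three CSAT tiers, each scored
# 0-100, compared against the human-only baseline from the same window.
# Tier names and the score scale are illustrative assumptions.

ROLLBACK_THRESHOLD = 2.0  # any tier dropping more than two points triggers rollback

def csat_gate(baseline: dict, current: dict) -> dict:
    """Return which tiers breach the two-point rule against baseline."""
    breaches = {
        tier: round(baseline[tier] - current[tier], 2)
        for tier in ("resolution", "delayed", "model_scored")
        if baseline[tier] - current[tier] > ROLLBACK_THRESHOLD
    }
    return {"rollback": bool(breaches), "breached_tiers": breaches}

baseline = {"resolution": 86.0, "delayed": 81.0, "model_scored": 78.0}
current  = {"resolution": 85.5, "delayed": 78.4, "model_scored": 77.9}
print(csat_gate(baseline, current))
# -> {'rollback': True, 'breached_tiers': {'delayed': 2.6}}

In the plan above the same check runs daily against the model-scored queue and weekly against the survey-based tiers, with the delayed tier rolled up at the 72-hour mark.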

The 90-day cadence at a glance
Days 1-30: knowledge audit, intent catalog, top-100 RAG build, CSAT instrumentation wired in.
Days 31-60: 1% deflection pilot, per-archetype confidence thresholds, escalation rubric, handoff context payload.
Days 61-90: controlled ramp to 10-25% deflection, production observability, agent handoff training, quarterly eval cadence in place.
Each phase gates the next on a CSAT-neutral or CSAT-positive result.

The plan is also opinionated about what does not belong in the first 90 days. Multi-language support, voice channels, and advanced personalisation are all expansions that should follow the day-90 stability check rather than ride alongside it. The same is true for ambitious deflection targets above 25% in the first quarter — those numbers can be real over a longer horizon, but they almost always require a depth of tooling integration (order lookup, refund APIs, account state, billing systems) that is not realistic to build inside the first 90 days. Ship the foundation; expand from there.

02 · Days 1-30: Knowledge audit, intent catalog, top-100 RAG build.

The first 30 days are the knowledge-and-instrumentation phase. No customer-facing traffic touches the AI in this window. The entire effort is upstream — audit the documentation, build the intent catalog from real tickets, ship a RAG retrieval index against the top-100 ticket archetypes, wire the CSAT instrumentation layer, and pick the escalation rubric the team will use once the pilot opens.

The temptation in this phase is to skip ahead to the model and the vendor selection. Resist it. The reason most pilots disappoint is not the model — it is the knowledge layer underneath the model. Documents that are stale, missing, or contradictory will produce a confidently wrong bot regardless of which frontier model you pick.

Week 1
Knowledge audit · coverage map
owner: support ops · output: spreadsheet

Inventory every help-centre article, internal SOP, and macro template. Tag each with last-updated date, ownership, and whether the underlying product behaviour is still accurate. Flag stale, missing, and contradictory documents — that flag list becomes the work backlog for weeks 2 and 3.

Foundation work
Week 2
Intent catalog from real tickets
owner: support ops + data · output: archetype list

Pull the last 90 days of ticket data, cluster by intent, and rank by volume. The top-100 archetypes typically cover 70-85% of inbound volume. For each, capture the canonical resolution path, the typical context the agent needs, and the failure modes. This is the catalog the RAG layer will be grounded against; a minimal volume-ranking sketch follows this week-by-week sequence.

Volume-weighted
Week 3
RAG build · top-100 index
owner: engineering · output: retrieval index

Ship the retrieval index against the top-100 archetype documents. Embeddings, chunking strategy, and re-ranker selection are all archetype-driven decisions — billing questions need different chunking from troubleshooting flows. Validate retrieval quality with held-out queries before any generation runs through it.

Retrieval before generation
Week 4 · Mon-Wed
CSAT instrumentation wired
owner: engineering + ops · output: dashboards

Three layers — resolution CSAT immediately post-conversation, delayed CSAT at 48-72 hours, and model-scored conversation CSAT surfaced to a daily QA queue. Each layer needs a baseline reading taken from human-only traffic in the same window so the pilot has something to compare against in week 5.

Lead, not lag
Week 4 · Thu-Fri
Escalation rubric finalised
owner: support ops · output: rubric doc

Per-archetype confidence thresholds, context-payload spec for the handoff, and the explicit ‘never deflect’ list (refunds, churn-saves, security issues, anything regulated). Sign-off from the support lead, the legal lead where relevant, and the product owner before week 5 opens the pilot.

Gate the gate
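
As a concrete illustration of the week-2 ranking step, the sketch below counts tickets per intent label and reports how many archetypes it takes to cover a target share of inbound volume. The ticket labels and the 0.8 coverage target are illustrative assumptions, not a prescription.

# Sketch of the week-2 ranking: count tickets per intent label and find
# the smallest set of archetypes covering a target share of volume.
# Ticket labels and the 0.8 coverage target are illustrative assumptions.
from collections import Counter

def rank_archetypes(ticket_intents, coverage_target=0.80):
    counts = Counter(ticket_intents)
    total = sum(counts.values())
    ranked, covered = [], 0
    for intent, n in counts.most_common():
        covered += n
        ranked.append((intent, n, round(covered / total, 3)))
        if covered / total >= coverage_target:
            break
    return ranked  # (intent, volume, cumulative share of inbound)

tickets = (["order.status"] * 120 + ["refund.standard"] * 80
           + ["billing.dispute"] * 40 + ["account.login"] * 30 + ["other"] * 30)
print(rank_archetypes(tickets))
# -> [('order.status', 120, 0.4), ('refund.standard', 80, 0.667),
#     ('billing.dispute', 40, 0.8)]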

One nuance worth surfacing here. The knowledge audit is rarely the work the support team most wants to do — it is closer to housekeeping than to a launch. But it is the work with the highest leverage on the entire 90-day plan. A team that ships a disciplined audit in weeks 1 and 2 typically lands a 10-15 percentage-point higher deflection ceiling at day 90 than a team that skips it, and avoids almost every category of confident-hallucination failure mode that plagues less-prepared rollouts. The audit is the launch.

For teams that have not run a knowledge audit before, the structure is straightforward: build a spreadsheet of every article and SOP, assign each row to an owner, ask each owner to flag any document that is older than six months or has had a product change since last review, and triage the resulting list into "update before pilot", "update by day 60", and "deprecate". Three columns, two weeks of work, and the deflection ceiling moves more than any model upgrade will in the first year.
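
For teams that want to script that triage, a minimal sketch is below. It reuses field names from the knowledge-audit template later in this piece; the exact bucketing rules, including the deprecate condition, are illustrative assumptions to adapt.

# Sketch of the triage above: bucket each audit row by age and accuracy.
# Field names mirror the knowledge-audit template later in this piece;
# the bucketing rules (and the deprecate condition) are illustrative.
from datetime import date, timedelta

SIX_MONTHS = timedelta(days=182)

def triage(doc: dict, today: date) -> str:
    if not doc["maps_to_archetype"]:
        return "deprecate"            # assumption: unmapped docs are retired
    if doc["product_match"] != "current":
        return "update_before_pilot"  # known-wrong docs block the pilot
    if today - doc["last_updated"] > SIX_MONTHS:
        return "update_by_day_60"     # aging but not known-wrong
    return "keep_as_is"

doc = {"doc_id": "KB-0421", "last_updated": date(2025, 11, 14),
       "product_match": "stale", "maps_to_archetype": "refund.standard"}
print(triage(doc, today=date(2026, 5, 8)))  # -> update_before_pilot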

"The knowledge audit is the launch. The model is just the interface."— Field note · 2026 client engagements

03 · Days 31-60: Deflection pilot at 1%, CSAT gates, escalation handoff.

The second 30 days launch a tightly-bounded deflection pilot. One percent of inbound support volume routes to the AI; the other 99% stays on the existing human-only path. The window is long enough to see CSAT damage if it is there, and the traffic slice is small enough that any damage is contained.

The point of the pilot is not to hit a headline deflection number — it is to validate that the knowledge layer, the confidence thresholds, and the escalation rubric all hold up against real customer traffic. Deflection in this window is a byproduct, not the goal.
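
The week-5 milestone below mentions a session-hash routing rule as one way to carve out the slice. A minimal sketch of that split, assuming a stable session identifier, looks like this:

# Deterministic session-hash split for the 1% pilot slice. Assumes a
# stable session or conversation ID; PILOT_PERCENT is the single value
# a rollback sets back to zero.
import hashlib

PILOT_PERCENT = 1  # 1% of inbound traffic routes to the AI path

def routes_to_ai(session_id: str, percent: int = PILOT_PERCENT) -> bool:
    """Stable hash into buckets 0-99; buckets below `percent` go to the AI."""
    bucket = int(hashlib.sha256(session_id.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < percent

print(routes_to_ai("session-8f2a"))  # the same ID always gets the same answer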

Week 5
Pilot opens at 1%
owner: engineering · output: live traffic slice

Route 1% of inbound support traffic to the AI path. Use either a session-hash routing rule or a channel split (e.g. one help-centre embed only) — both work. Daily review of the model-scored CSAT queue starts day one of pilot; resolution-CSAT comparison against baseline starts at the end of week 5.

Contained launch
Week 6
Confidence-threshold tuning
owner: engineering + ops · output: per-archetype thresholds

Per-archetype confidence thresholds tuned against the week-5 data. Too high and deflection collapses; too low and CSAT collapses. Tuning is an archetype-by-archetype decision — order-status thresholds will sit higher than billing-dispute thresholds because the cost of an escalation miss is different.

Per-archetype
Week 7
Escalation handoff validated
owner: support ops + engineering · output: handoff QA log

QA the escalation handoff end-to-end. Does the agent receive the conversation transcript, the model's stated intent, the confidence score, and the relevant customer state? Are agents trained on the handoff pattern? Does the customer experience a clean transition or a context-loss restart? Fix what the QA log surfaces.

Half the UX
Week 8 · Mon-Wed
CSAT gate review
owner: cross-functional · output: gate decision

Three-week CSAT trend reviewed against baseline. Resolution CSAT, delayed CSAT, and model-scored CSAT all triangulated. If any tier moves more than two points against baseline, traffic stays at 1% and the gating issue is diagnosed before week 9. If trends are neutral or positive, the ramp plan opens.

Gate, do not guess
Week 8 · Thu-Fri
Ramp plan signed off
owner: executive sponsor + support lead · output: ramp gates

Sign-off on the day-61-onward ramp gates. Each ramp step (1% → 3% → 8% → 15% → final ceiling) needs an explicit pre-condition: CSAT trend, false-positive escalation rate, and confidence-threshold stability. No traffic step happens automatically — every gate is a human decision against a dashboard.

Human gates

One pattern worth noting from the pilots we have shipped: week 5 deflection numbers will look disappointing. The model is learning archetype distributions, the confidence thresholds are calibrated conservatively, and the escalation rubric is biased toward handing off. That is the right shape for week 5. By week 8 the deflection number will have climbed substantially — and crucially, it will have climbed at flat or improving CSAT, which is the only deflection number worth reporting.

The escalation handoff deserves disproportionate attention in this phase. A clean handoff means the customer never repeats themselves: the agent picks up a transcript, an intent classification, a confidence score, and any relevant customer state, and continues the conversation from there. A poor handoff is the single largest source of CSAT damage we have seen across deployments — worse, in CSAT terms, than the bot simply being wrong, because the customer experiences wrong-and-frustrating rather than wrong-and-escalated.
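
To ground that, here is a minimal sketch of the per-archetype deflect-or-escalate decision together with the context payload the agent receives. Field names follow the escalation rubric template later in this piece; the threshold values and sample rubric entries are illustrative assumptions.

# Sketch of the deflect-or-escalate decision plus the handoff payload.
# Field names follow the escalation rubric template later in this piece;
# thresholds and the sample rubric entries are illustrative assumptions.

RUBRIC = {
    "order.status":    {"confidence_floor": 0.60, "never_deflect": False},
    "billing.dispute": {"confidence_floor": 0.78, "never_deflect": True},
}

def decide(archetype: str, confidence: float, conversation: dict) -> dict:
    # unknown archetypes escalate by default
    rule = RUBRIC.get(archetype, {"confidence_floor": 1.0, "never_deflect": True})
    if not rule["never_deflect"] and confidence >= rule["confidence_floor"]:
        return {"action": "deflect"}
    return {
        "action": "escalate",
        "context_payload": {  # what keeps the customer from repeating themselves
            "transcript": conversation["transcript"],
            "detected_intent": archetype,
            "confidence_score": confidence,
            "account_state": conversation.get("account_state", {}),
        },
    }

print(decide("billing.dispute", 0.91, {"transcript": ["duplicate charge on INV-8821"]}))
# -> escalates despite high confidence, because the archetype never deflects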

04 · Days 61-90: Ramp to 10-25%, observability, agent-handoff training.

The third 30 days ramp toward a stable deflection ceiling, wire production observability, and train the broader agent team on the handoff pattern so it scales beyond the pilot squad. By day 90 the deployment should be in a state where it can run unattended for a week without breaking — and where the quarterly model-upgrade cadence has been baked into the operating rhythm.

The ramp is not linear. Each traffic step (1% → 3% → 8% → 15% → 10-25% ceiling) gates on a CSAT-trend check, a false-positive-escalation rate, and a confidence-threshold stability review. Skipping any of those gates is the most common way day-90 deployments produce day-180 rollbacks.

Week 9
Ramp to 3-5%
owner: engineering · gate: CSAT-neutral week-8

Three-to-five-percent traffic, with the week-8 sign-off as the gate. Same daily and weekly review cadence as the pilot — model-scored CSAT daily, resolution CSAT weekly, delayed CSAT at the 72-hour roll-up. If anything drifts, the deployment falls back to 1% within the same shift.

Controlled step
Week 10
Ramp to 8-12%
owner: engineering · gate: CSAT-neutral week-9

Bigger step, but only if every gate holds. This is also the week the broader agent team starts shadowing escalation handoffs — not as a deflection task, but to internalise the conversation shape so they can take ownership of clean handoffs once volume increases.

Volume meets training
Week 11
Observability · production
owner: engineering · output: dashboards + alerts

Production observability layer goes live — confidence-score distributions per archetype, deflection by archetype, CSAT by tier and by archetype, and alerting on any tier moving more than two points against the trailing two-week baseline. Without alerting, CSAT damage is invariably found too late. A rolling-baseline alert sketch follows this week-by-week sequence.

Alerts, not dashboards alone
Week 12 · Mon-Wed
Final ramp to 10-25% ceiling
owner: engineering · gate: CSAT-neutral week-11

Final step to the day-90 ceiling. The 10-25% range is a function of archetype mix — high-volume repetitive archetypes (order status, returns, basic billing) support the upper end; mixed B2C distributions support the middle; complex B2B archetypes are usually closer to the lower bound. Land where the data says, not where the original plan promised.

Land where the data says
Week 12 · Thu-Fri
Agent training + quarterly eval cadence
owner: support lead + engineering · output: ops handbook

Full agent team trained on the handoff pattern, the escalation rubric, and the model's archetype boundaries. Quarterly eval cadence — 50-100 archetype prompts with expected outputs — scheduled into the operating rhythm. Day 90 closes with an operations handbook that the support team owns from there forward.

Steady-state handover
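
The week-11 alerting rule can stay small. The sketch below compares each tier's daily score against a trailing two-week baseline and raises an alert on a drop of more than two points; the data shapes are illustrative assumptions.

# Sketch of the week-11 alert rule: today's score per tier against a
# trailing two-week baseline, alerting on a drop of more than two points.
# Data shapes are illustrative assumptions.
from statistics import mean

ALERT_THRESHOLD = 2.0

def csat_alerts(history: dict, today: dict) -> list:
    """history: last 14 daily scores per tier; today: today's score per tier."""
    alerts = []
    for tier, scores in history.items():
        baseline = mean(scores[-14:])
        if baseline - today[tier] > ALERT_THRESHOLD:
            alerts.append(f"{tier}: {today[tier]:.1f} vs trailing {baseline:.1f}")
    return alerts

history = {"resolution": [86.0] * 14, "delayed": [81.0] * 14}
print(csat_alerts(history, {"resolution": 85.1, "delayed": 78.2}))
# -> ['delayed: 78.2 vs trailing 81.0']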

The day-90 milestone is not a victory lap — it is a handover. The pilot squad transfers operational ownership to the support team, the eval cadence transfers to a quarterly rhythm, and the broader product backlog (multi-language, voice channels, personalisation) becomes viable to pick up. The day-90 ceiling is rarely the year-one ceiling — most deployments climb to a higher steady-state deflection by month nine — but the difference between a deployment that gets to month nine and one that gets rolled back is almost always the discipline of the first 90 days.

One operational note. The deflection range in week 12 will often land below the original ramp plan. That is normal and is usually a sign the CSAT gates are doing their job — they are holding deflection at the level the archetype mix actually supports, not at the level the slide deck promised. The response is to ship the realistic number, not to relax the gates.

05 · CSAT Gates: 1%, 5%, 10-25% traffic thresholds.

The ramp gates are the most important operational artifact in the 90-day plan. Each traffic step has an explicit pre-condition, a measurement window, and an automatic-rollback rule. The discipline of the gates is what separates a deployment that scales from one that has to be rolled back in week sixteen.

Gate 01
1% pilot open

Pre-condition: knowledge audit complete, RAG index validated against held-out queries, CSAT instrumentation baseline taken from human-only traffic in the previous fortnight. Measurement window: two weeks. Rollback rule: any CSAT tier dropping more than two points against baseline triggers immediate fall-back to 0% traffic.

Open after week 4
Gate 02
5% expanded traffic

Pre-condition: week-8 CSAT review neutral or positive across all three tiers, false-positive escalation rate inside agreed bound, confidence thresholds stable for at least one week. Measurement window: one week. Rollback rule: same two-point threshold, plus a confidence-distribution shift of more than 10% triggers re-tuning.

Open in week 9, after the week-8 review
Gate 03
10-15% ramp step

Pre-condition: week-9 CSAT review neutral or positive, escalation handoff QA passes the conversation-context check across a sampled batch. Measurement window: one week. Rollback rule: any signal of customer confusion or handoff context loss in the QA sample triggers ramp pause.

Open in week 10, after the week-9 review
Gate 04
10-25% ceiling

Pre-condition: production observability live, alerting wired against the trailing two-week baseline, agent training delivered to the full team. Measurement window: ongoing. Rollback rule: the deployment now lives inside production alerting — any sustained CSAT regression triggers an automatic ramp-down, not a meeting.

Open in week 12

Two implementation notes on the gates. First, every gate decision is a human decision — the dashboards are inputs to the decision, not the decision itself. Automating the gates is tempting and almost always backfires because the right rollback response to a CSAT signal usually involves context the dashboard does not see (a launch that just shipped, a known outage, a documentation change). Keep humans in the loop.

Second, the rollback rule has to be cheaper than the measurement window. If rolling traffic back from 15% to 1% requires a release, a config change, and a war room, teams will avoid the rollback even when they should not. The cleanest implementations we have shipped use a single feature-flag value that any on-call engineer can toggle — under thirty seconds from CSAT-alert to traffic-back-at-baseline.
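
A minimal version of that single-toggle rollback looks like the sketch below, where an in-memory dict stands in for whatever feature-flag service the team already runs; the interface is an assumption, the single value is the point.

# Sketch of the single-toggle rollback described above. The dict stands
# in for whatever feature-flag service is already in place; the interface
# is an assumption.

FLAGS = {"ai_deflection_percent": 15}   # current ramp level
BASELINE_PERCENT = 1                    # the pilot-level fallback

def roll_back_to_baseline(reason: str) -> None:
    """One toggle any on-call engineer can flip on a CSAT alert."""
    previous = FLAGS["ai_deflection_percent"]
    FLAGS["ai_deflection_percent"] = BASELINE_PERCENT
    print(f"rollback: {previous}% -> {BASELINE_PERCENT}% ({reason})")

roll_back_to_baseline("delayed CSAT down 2.6 points vs trailing baseline")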

The non-negotiable
Every ramp gate is a CSAT gate first, a deflection gate second. A deployment that ramps on deflection numbers while CSAT drifts is buying short-term metrics at the cost of long-term retention. The discipline of holding the gates is what separates the deployments that scale from the ones that quietly come back down.

06 · Templates: Knowledge audit, escalation rubric, handoff script.

Three templates ship with every engagement we run. The knowledge audit is the upstream artifact that sets the deflection ceiling; the escalation rubric is the per-archetype confidence and routing spec; the handoff script is the customer-facing language and the agent-facing context payload. Each is short by design — the value is in the discipline of having them at all, not in the length of the document.

What follows is the working shape of each template. Adapt to your own ticket archetypes and operational vocabulary; the structure is the load-bearing part.

# Knowledge audit — row shape
# (one row per help-centre article / SOP / macro)

doc_id              : KB-0421
title               : "Refund policy — standard order"
owner               : support-ops@example.com
last_updated        : 2025-11-14
product_match       : stale          # current | stale | unknown
maps_to_archetype   : "refund.standard"
action              : update_by_day_30
notes               : "Refund window changed Q4 2025; doc still shows 14d."

# Escalation rubric — per-archetype block

archetype           : "billing.dispute"
confidence_floor    : 0.78
never_deflect       : true           # always escalates regardless of confidence
context_payload     :
  - conversation_transcript
  - detected_intent
  - confidence_score
  - account_state.last_invoice
  - account_state.tier
routing_queue       : tier-2-billing
csat_alert_band     : "+/- 2 points vs baseline"

# Handoff script — agent-facing context, customer-facing language

agent_context_block : |
  AI summary: customer is asking about a duplicate charge on invoice
  INV-8821 (2026-04-22). AI confidence: 0.42 (below floor).
  Account state: paid tier, no prior disputes, last invoice $148.

customer_language   : |
  "I'm connecting you with a specialist who has the full context of
  what we've discussed — they'll pick up right where we left off."

post_handoff_check  : agent confirms the AI summary inside the first
                      30 seconds; flag any context loss to QA queue.

A few notes on adapting the templates. The knowledge audit row shape is intentionally narrow — five columns, plus owner and notes, is the minimum that produces a useful work backlog. We have seen teams expand it into a thirty-column tracker that never gets maintained; the narrow shape gets maintained, which is the whole point.

The escalation rubric is the document with the highest operational consequence. Every never-deflect category — refunds above a threshold, churn-save conversations, security or account-compromise reports, anything regulated — has to be written down with the sign-off of the support lead and (where relevant) the legal lead. The cost of an AI deflection on the wrong archetype is much higher than the value of one extra deflected ticket.

The handoff script matters because customers experience the handoff as a continuity check, not as a feature. If the agent can confirm the AI summary inside the first thirty seconds of the live conversation, the customer experiences continuity. If the agent has to ask "what is this about", the customer experiences context loss — which, in our deployments, is the single largest source of CSAT damage. Train for the thirty-second confirmation.

"The handoff script is a continuity test the customer runs in the first thirty seconds. Pass it or lose them."— Field note · 2026 client engagements

07 · Pitfalls: Four support-launch failure modes.

Almost every failed support-AI launch we have seen falls into one of four categories. Knowing the patterns is the cheapest insurance against them — most of these are obvious in retrospect and missable in the moment. The stress-tests below are designed to surface each failure mode before it ships, not after.

Pitfall 01
RAG on rot
RAG grounded against stale docs

The model is RAG-grounded against documentation that goes stale faster than it gets updated. Deflection quality degrades as the gap between docs and product widens. Mitigation is a continuous-content-curation workflow, owned by the support team, with the knowledge audit re-run quarterly.

Owner: support ops
Pitfall 02
Ramp too fast
1% → 25% in three weeks

Teams that ramp from pilot to production faster than the CSAT instrumentation can detect damage consistently discover the damage at the quarterly review, by which point the traffic slice is too large to roll back without customer impact. Hold the gates.

Owner: ramp gates
Pitfall 03
Lost handoff
Escalation context loss

Agent receives a handoff with no transcript, no intent classification, no confidence score, and no customer state. Customer repeats themselves. CSAT damage shows up at the delayed-survey window. Mitigation is the handoff script, agent training, and a post-handoff context confirmation in the first 30 seconds.

Owner: handoff script
Pitfall 04
CSAT as lag
CSAT measured quarterly, not gated

Treating CSAT as a downstream metric measured quarterly rather than as a gating constraint on deflection. Deployments that find CSAT damage at the quarter boundary discover it too late to roll back deflection targets cleanly. Mitigation is the three-layer CSAT instrumentation wired before the pilot opens.

Owner: instrumentation layer

Two further patterns are worth flagging even though they sit outside the headline four. The first is vendor lock-in by ticket schema — the AI vendor's ticket schema becomes the de facto schema for your support data, and switching vendors later requires a painful migration. Mitigation is to keep the canonical ticket schema in your own data layer (warehouse, lake, primary database) and treat the vendor schema as a downstream view.
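
One way to keep that separation explicit is to define the canonical ticket shape in your own code or warehouse and derive the vendor view from it. The sketch below uses illustrative field names on both sides.

# Sketch of a canonical ticket shape owned by your own data layer, with
# the vendor schema derived as a downstream view. Field names on both
# sides are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Ticket:                       # canonical row, lives in your warehouse
    ticket_id: str
    intent: str
    channel: str
    csat_resolution: Optional[float]
    handled_by_ai: bool

def to_vendor_view(t: Ticket) -> dict:
    """Map the canonical row to whatever the current vendor expects."""
    return {
        "external_id": t.ticket_id,
        "category": t.intent,
        "source": t.channel,
        "bot_handled": t.handled_by_ai,
    }

print(to_vendor_view(Ticket("T-1009", "refund.standard", "email", 4.6, True)))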

The second is model upgrade inertia — the deployment lands on a frontier model in month one and stays on it for eighteen months while the broader model landscape moves twice. Mitigation is the quarterly eval cadence described in the day-90 milestone — 50-100 archetype prompts with expected outputs, rerun against the current frontier each quarter. If a new model wins on your evals, plan the migration; if not, the eval still gives you defensible evidence to stay put.
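
The eval harness itself can be very small. The sketch below assumes a list of archetype prompts with an expected-substring check and a call_model callable wrapping whichever provider is under test; both are placeholders rather than a prescribed harness.

# Sketch of the quarterly eval: archetype prompts with expected outputs,
# scored with the simplest possible substring check. call_model is a
# placeholder for whichever provider API is under test.

EVAL_SET = [
    {"prompt": "Where is my order #1234?",            "expect": "tracking"},
    {"prompt": "I was charged twice for one invoice.", "expect": "escalat"},
    # ... 50-100 prompts covering the top archetypes
]

def run_eval(call_model) -> float:
    passed = sum(
        1 for case in EVAL_SET
        if case["expect"].lower() in call_model(case["prompt"]).lower()
    )
    return passed / len(EVAL_SET)

# Example: incumbent model vs a candidate upgrade (stand-in responses).
incumbent = lambda p: "Here is your tracking link for order #1234."
candidate = lambda p: "I've escalated this billing question to a specialist."
print(run_eval(incumbent), run_eval(candidate))  # pass rate per model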

For teams designing the underlying business case before committing to a 90-day launch, our companion piece on support ROI and the deflection formula covers the math underneath the model — deflection ranges by tier, cost-per-ticket ladders, break-even thresholds, and vendor comparison. And for the anti-patterns to actively avoid during rollout, our piece on support anti-patterns and deflection mistakes walks through the deployments that have gone wrong and the common shape of the failure.

For teams that want the 90-day plan delivered as a managed engagement rather than run internally, our AI transformation engagements ship the knowledge audit, the RAG build, the CSAT instrumentation layer, the pilot, and the agent training as a phased program — calibrated to your ticket archetype mix and with measurable CSAT-controlled outcomes.

Conclusion

Support launches succeed when CSAT precedes deflection — 90 days is the right horizon.

The 90-day plan in this piece is not the only way to launch AI customer support, but it is the way we have seen produce the most durable outcomes across client engagements. Days 1-30 do the knowledge work upstream of the model — the audit, the intent catalog, the RAG build, the CSAT instrumentation. Days 31-60 launch a 1% deflection pilot with explicit gates and a designed handoff. Days 61-90 ramp toward a stable ceiling, wire production observability, and hand over to a support team equipped with a quarterly eval cadence.

The pattern across every successful launch is the same: CSAT precedes deflection. Teams that wire the instrumentation before the model touches customer traffic ship deflection numbers that hold. Teams that chase deflection first and measure CSAT quarterly discover damage too late to recover. Ninety days is enough time to do the upstream work well, and short enough to keep the executive sponsor's focus. Anything faster usually skips a phase; anything slower usually loses momentum.

The honest framing is that the day-90 milestone is a handover, not a finish line. The deflection ceiling at month twelve is typically meaningfully higher than the ceiling at day 90, but the difference between a deployment that gets to month twelve and one that gets rolled back is almost always the discipline of the first ninety days. Build the foundation; the rest of the curve takes care of itself.

Launch AI support right

AI support succeeds when CSAT precedes deflection — 90 days, phased.

Our team launches AI customer support programs — knowledge audit, RAG build, deflection pilot, CSAT-controlled ramp, escalation design — with measurable outcomes.

Free consultation · Expert guidance · Tailored solutions
What we deliver

90-day launch engagements

  • Knowledge audit and intent catalog
  • RAG build on top-100 tickets
  • CSAT-controlled deflection pilot
  • Escalation rubric and handoff training
  • Quarterly model-upgrade eval cadence
FAQ · Support 90-day launch

The questions support leaders ask before the pilot.

How quickly does deflection traffic ramp across the 90 days?

The ramp is gate-driven, not calendar-driven. The 1% pilot opens at the start of week 5 and runs for at least three weeks before any ramp decision; the first ramp step (to 3-5%) opens in week 9 after the CSAT review at the end of week 8 passes; week 10 ramps to 8-12%; week 12 closes out at the day-90 ceiling of 10-25%. Every step requires CSAT-neutral or CSAT-positive readings across all three measurement layers (resolution, delayed, model-scored) plus a stable confidence-threshold distribution. Teams that ramp faster than this cadence consistently discover CSAT damage too late to roll back cleanly — the customer base remembers a bad experience for longer than the deflection dashboard remembers a good month. The cadence is conservative on purpose.