Scott Brinker's 2026 MarTech landscape clocks 14,500 tools — a third of them tagged AI-native. Most agencies are evaluating five new candidates every quarter and renewing twenty-something contracts a year. Without a scoring framework, the decisions become political (loudest stakeholder wins), inconsistent (how we picked last quarter is not how we evaluate this quarter), and indefensible (when a tool turns out wrong, no one can say what it was evaluated against).
The matrix below is the artefact we use across our agency book. Six scoring axes, weighted by agency profile, eight worked reference stacks. Most engagements run an existing-stack audit against the matrix in week one and find 2-4 tools that should be retired or replaced — usually paying back the engagement fee within the first quarter.
- 01 — Six axes capture the decisions that matter; more axes produce noise, fewer miss real signal. Use-case fit, integration depth, data residency, pricing model, vendor stability, exit cost. We tested a 9-axis variant — the additional axes (UI quality, support quality, community size) correlated too tightly with vendor stability to add independent signal. Six is the empirical sweet spot.
- 02 — Weighting by agency profile is what makes the matrix portable across teams. A solo founder weights pricing model and exit cost heavily; an enterprise pod weights data residency and vendor stability. The weights flex per profile, but the axes stay constant. Without flexible weights, the matrix produces wrong picks for half the audience.
- 03 — Exit cost is the most-underweighted axis in agency evaluation. Most agencies underweight 'how hard is it to leave this tool' until they have to leave it. Tools with high integration depth and low exit cost are the dream; tools with high integration depth and high exit cost are the trap. Score it explicitly; require a score of ≥ 6 to shortlist.
- 04 — The eight reference stacks are starting points, not prescriptions. Each reference stack is the 'standard combination' for an agency profile (B2B SaaS lean, mid-market lifecycle, enterprise compliance-heavy, etc.). Use them as anchors; tune for the engagement; document deviations. Agencies that adopt the reference stacks wholesale without tuning end up with the same brittle stack they had before.
- 05 — The renewal-decision protocol is what stops stack bloat. Without a renewal protocol, every tool renews by default and the stack grows unbounded. With a structured renewal review (rescore on the matrix, compare to prior score, justify the score change), tools that have decayed get retired before stack bloat becomes a structural problem. Most agencies retire 15-25% of stack tools annually under the protocol.
01 — Premise · Why we needed a decision matrix.
In 2024 the AI-marketing tool decision was easy: there were three useful tools per category and they were all priced similarly. By April 2026 there are 25-40 useful tools per category and they are priced across a 30× range. Vendor stability ranges from '15 years public' to '6 weeks since launch'. Data-residency profiles vary across all three major cloud regions and sometimes within them.
The complexity does not slow agencies down by much in the moment — it shows up at renewal time, when the tool the loudest stakeholder pushed last year is now a $48K/year line item that no one can defend. The decision matrix is the up-front cost that makes the renewal conversation defensible.
"We had three competing tools doing AI-content-grading in our stack. Nobody could remember why. We scored them against the matrix in two hours and retired two of them within the month."
— Director of operations, mid-market agency, Feb 2026
02 — Axes · The six scoring axes.
Use-case fit
How well does the tool serve the specific use-case the agency needs? 10 = best-in-class for the use-case; 7 = strong fit; 4 = adequate; 0 = wrong tool for the job. Score against the agency's primary 2-3 use-cases, not the tool's marketed feature set.
Match-to-need
Integration depth
Does the tool integrate with the agency's existing stack? 10 = native bidirectional integration with primary stack components; 7 = clean Zapier/Make path; 4 = API only; 0 = no integration path. Integration cost is invisible until it bites.
Stack fit
Data residency
Where does data go and live? 10 = configurable per region with named regions; 7 = US-only with documented sub-processors; 4 = US-only without documented sub-processors; 0 = unclear or country-of-origin concerns. Critical for EU and regulated-industry clients.
Compliance floor
Pricing model
How does the cost scale with usage? 10 = aligned to value (per outcome, per successful task); 7 = aligned to use (per seat, per workflow); 4 = aligned to consumption that may not match value (per token); 0 = misaligned (per-minute or arbitrary). Misaligned pricing produces margin surprises.
Margin signal
Vendor stability
How likely is the vendor to be around in 24 months? 10 = profitable, public or well-funded with strong revenue growth; 7 = funded with clear path to revenue; 4 = early-stage funded; 0 = unfunded/uncertain. Vendor failures take years to recover from in deeply integrated tools.
Bet-hedging
Exit cost
How hard is it to leave the tool? 10 = clean export, standard formats, no lock-in; 7 = export available with some friction; 4 = partial export only; 0 = effectively non-exportable (proprietary data formats, no API egress). Underweighted in 80% of evaluations.
Optionality
03 — Weighting · Weighting by agency profile.
The same six axes get weighted differently depending on the agency's profile. The weights below are starting points; tune for the specific engagement. The weights should sum to 1.0 and each should be at least 0.05 (no axis can be ignored entirely).
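The scoring arithmetic is worth pinning down. A minimal Python sketch: the four weight profiles are the starting points listed in this section, and the ≥ 6 exit-cost shortlist floor is the rule from the axes discussion; the function names, profile keys, and the sample tool's scores are illustrative assumptions.

```python
# Sketch of the six-axis weighted score. Profile weights are this
# section's starting points; the sample tool is hypothetical. The 0.05
# floor matches the lowest weight any profile assigns (Profile A's
# residency weight) -- no axis is ignored entirely.

AXES = ("use_case_fit", "integration_depth", "data_residency",
        "pricing_model", "vendor_stability", "exit_cost")

PROFILES = {
    "A_solo_founder": {"pricing_model": 0.30, "exit_cost": 0.20,
                       "use_case_fit": 0.20, "integration_depth": 0.15,
                       "vendor_stability": 0.10, "data_residency": 0.05},
    "B_mid_market":   {"use_case_fit": 0.25, "integration_depth": 0.20,
                       "pricing_model": 0.15, "vendor_stability": 0.15,
                       "exit_cost": 0.15, "data_residency": 0.10},
    "C_enterprise":   {"data_residency": 0.25, "vendor_stability": 0.20,
                       "use_case_fit": 0.15, "integration_depth": 0.15,
                       "exit_cost": 0.15, "pricing_model": 0.10},
    "D_dtc_commerce": {"use_case_fit": 0.30, "integration_depth": 0.25,
                       "pricing_model": 0.15, "vendor_stability": 0.10,
                       "exit_cost": 0.10, "data_residency": 0.10},
}

def weighted_score(scores: dict, weights: dict) -> float:
    """Composite 0-10 score; validates the weight constraints."""
    assert set(scores) == set(weights) == set(AXES)
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1.0"
    assert min(weights.values()) >= 0.05, "no axis can be ignored entirely"
    return round(sum(scores[a] * weights[a] for a in AXES), 2)

def shortlistable(scores: dict) -> bool:
    """Hard floor on the exit-cost axis: require >= 6 to shortlist."""
    return scores["exit_cost"] >= 6

# Hypothetical candidate scored against Profile B (mid-market generalist).
tool = {"use_case_fit": 8, "integration_depth": 7, "data_residency": 7,
        "pricing_model": 6, "vendor_stability": 7, "exit_cost": 5}
print(weighted_score(tool, PROFILES["B_mid_market"]))  # composite out of 10
print(shortlistable(tool))  # exit cost 5 misses the >= 6 floor
```

If the same tool shortlists under one profile and fails under another, that is the weighting step working as intended, not a bug.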
Profile A — Solo founder · founder-led growth agency
Pricing model 0.30 · Exit cost 0.20 · Use-case fit 0.20 · Integration 0.15 · Vendor stability 0.10 · Residency 0.05. Weights pricing and exit cost heavily because cash discipline is the constraint and switching cost matters as the team scales.
Pricing-heavy
Profile B — Mid-market generalist agency
Use-case fit 0.25 · Integration 0.20 · Pricing 0.15 · Vendor stability 0.15 · Exit cost 0.15 · Residency 0.10. Balanced weighting; closest to the 'default' profile. Most reference stacks are built on Profile B weights.
Balanced
Profile C — Enterprise compliance-heavy agency
Residency 0.25 · Vendor stability 0.20 · Use-case 0.15 · Integration 0.15 · Exit cost 0.15 · Pricing 0.10. Compliance constraints dominate; pricing is the lowest-weighted axis because the cost of compliance failure is much higher than tool cost.
Compliance-heavy
Profile D — DTC commerce / fast-moving consumer agency
Use-case fit 0.30 · Integration 0.25 · Pricing 0.15 · Vendor stability 0.10 · Exit cost 0.10 · Residency 0.10. Speed-of-execution dominates; tools are evaluated for use-case fit and how cleanly they slot into the stack.
Speed-heavy
04 — Reference stacks · Eight reference stacks.
Each reference stack is a documented combination of tools that scores well on the matrix for a specific agency profile. Use as an anchor; tune for the specific engagement; document deviations.
B2B SaaS lean — for solo founders / small teams
lowest cost · highest exit-flexibility
AI orchestration: OpenAI API direct + LangSmith for observability. Content: Notion + AI assistance. Analytics: PostHog. Email: Loops. Total stack cost: ~$300/mo. Exit cost low across the board.
Solo-friendly
Mid-market lifecycle — generalist agency
balanced · most-replicated
AI orchestration: Anthropic via Vercel AI Gateway, LangFuse for observability. Content: Notion + Letta for personalisation. Analytics: PostHog + Amplitude. Email: Customer.io + Loops. Total stack cost: ~$1,400/mo. Most balanced reference.
Default mid-market
Enterprise compliance-heavy
EU-residency · audit-ready · NDA-friendly
AI orchestration: Azure OpenAI (EU region) + LangSmith Enterprise. Content: regulated-industry CMS (Storyblok or Sanity, EU-region). Analytics: Matomo (self-hosted). Email: enterprise ESP with GDPR mode. Stack cost: $4-8K/mo.
Compliance-floor
DTC commerce — fast-moving
speed of execution · best-in-class per category
AI orchestration: Mastra + Anthropic. Content: Shopify + Shogun + AI personalisation. Analytics: Triple Whale + PostHog. Email: Klaviyo + Customer.io. Lifecycle: Drip campaigns native. Stack cost: $2-5K/mo.
Speed-stack
Agency-of-record — multi-client portfolio
multi-tenant · cost-controlled per client
AI orchestration: LangGraph + Anthropic + cost-routing. Content: Sanity (multi-tenant). Analytics: PostHog (per-project). Email: Customer.io with workspace separation. Stack cost: $2-4K/mo + per-client variable.
Multi-tenant
Founder-led growth — bootstrap+
minimum viable · margin-protective
AI orchestration: OpenAI API + lightweight observability. Content: Markdown + GitHub. Analytics: PostHog. Email: Resend + Loops. Stack cost: ~$500/mo. Built for the founder doing 80% of the work.
Margin-protective
Regulated-industry — financial services / healthcare / legal
data-isolation · audit-trail · BAA / DPA where needed
AI orchestration: Azure OpenAI / AWS Bedrock with private deployment + LangSmith Enterprise. Content: regulated CMS. Analytics: self-hosted Matomo or DPA-compliant. Email: vendor with documented compliance. Stack cost: $5-12K/mo.
Regulated
Public-sector — government communications
FedRAMP-aware · transparency-defaults
AI orchestration: Azure Government / AWS GovCloud + audit logging. Content: public-sector CMS (Drupal Government, Decoupled Drupal). Analytics: government-approved (compliant Matomo). Stack cost: highly variable; defensibility-first.
Public-sector
05 — Residency · Data-residency lookup.
We maintain a 40-tool data-residency lookup table internally; the partial extract below covers the most-evaluated tools. Residency information is volatile (vendors expand regions; some consolidate); confirm at evaluation time, do not rely on a cached score.
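The lookup table is just data plus a freshness rule. A minimal sketch of how the 'confirm at evaluation time' rule can be enforced mechanically — the verification dates, the 90-day staleness window, and the `NewAITool` entry are illustrative assumptions, not the internal table:

```python
from datetime import date, timedelta

# Hypothetical extract of the residency lookup. Scores follow the tiers
# below; last-verified dates enforce the rule that a cached score must
# be re-confirmed with the vendor once it goes stale.
STALE_AFTER = timedelta(days=90)  # assumed freshness window

RESIDENCY = {
    # tool: (score, documented regions, last verified)
    "Anthropic":  (10, ["US", "EU", "JP"], date(2026, 3, 14)),
    "OpenAI API": (7,  ["US"],             date(2026, 2, 2)),
    "NewAITool":  (4,  ["US"],             date(2025, 8, 1)),  # hypothetical vendor
}

def residency_score(tool: str, today: date) -> int:
    """Return the cached residency score, or raise if the entry is stale."""
    score, _regions, verified = RESIDENCY[tool]
    if today - verified > STALE_AFTER:
        raise ValueError(f"{tool}: residency last verified {verified}; "
                         f"re-confirm with the vendor before scoring")
    return score

print(residency_score("Anthropic", date(2026, 4, 1)))  # fresh entry
```

Making staleness an error rather than a warning is deliberate: a stale residency score silently reused is exactly the failure mode the lookup exists to prevent.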
10/10 — multi-region with named regions
Anthropic (US, EU, JP regions documented). Azure OpenAI (full regional control). AWS Bedrock (per-region selection). Vercel AI Gateway (per-region routing). These tools support tight data-residency control and are the default picks for compliance-heavy stacks.
Compliance-default
7/10 — US-only with documented sub-processors
OpenAI API (US-only with documented partner DCs). Most observability platforms (LangSmith, LangFuse, Arize). Most ESPs (Customer.io, Loops, Resend). Adequate for non-regulated-EU work; insufficient for compliance-heavy.
Standard tier
4/10 — US-only without documented sub-processors
Many newer AI tools (we avoid naming brands here). The lack of sub-processor documentation is a real signal — either the vendor has not done the compliance work or chooses not to publish it. Either way, do not deploy on regulated-industry engagements without confirming sub-processors directly with the vendor.
Caution
0-3/10 — unclear residency or jurisdiction concerns
Some tools route data through countries where the agency's clients have explicit policies prohibiting data-handling. The score zeroes the tool on this axis; under Profile C (compliance-heavy) weighting, this is enough to drop the tool from shortlists regardless of strength elsewhere.
Eliminator
06 — Renewal · Renewal-decision protocol.
30 days before renewal
rescore on the matrix
Pull the tool's prior score; rescore today using the same axes and weighting profile. Document the score change per axis. This is a 30-minute exercise per tool; schedule it in the calendar 30 days before contract renewal.
Foundation
Compare scores · justify deltas
≥ 1 point delta needs explanation
Any axis where the score moved by 1+ points needs a stated reason. Use-case fit might drop because the agency has expanded into use-cases the tool does not serve well. Vendor stability might rise because the vendor IPO'd. Document the why.
Defensibility
Compare to top-2 alternatives
score the alternatives quickly
Pull the top-2 alternatives in the category from the agency's tracking sheet. Score them on the same matrix. If either alternative scores 5+ points higher than the incumbent, switch is on the table. If both score within 5 points, renewal is the path of least resistance.
Comparison
Decision — renew · negotiate · switch · retire
one of four · documented
Renew at standard terms when score holds. Negotiate (extension, discount, expanded scope) when score is on the borderline. Switch when alternative scores meaningfully higher and integration/exit costs allow. Retire when the use-case is no longer relevant. Document the decision and the rationale.
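The four-way decision can be sketched as a small function. The ≥ 1-point axis-delta rule and the 5-point switch threshold come from the protocol above; treating a composite drop of more than 1 point as 'borderline' (negotiate) is an assumption, as are the sample tool's numbers and all names.

```python
def _composite(scores: dict, weights: dict) -> float:
    # Weighted 0-10 composite, as in the matrix.
    return sum(scores[a] * weights[a] for a in weights)

def renewal_decision(prior, current, weights, alt_composites,
                     justifications, still_needed=True):
    """Return one of 'renew', 'negotiate', 'switch', 'retire'."""
    if not still_needed:
        return "retire"  # use-case no longer relevant
    # Protocol rule: any axis that moved by 1+ points needs a stated reason.
    for axis, was in prior.items():
        if abs(current[axis] - was) >= 1 and axis not in justifications:
            raise ValueError(f"unexplained delta on '{axis}': document the why")
    incumbent = _composite(current, weights)
    # Protocol rule: an alternative 5+ points ahead puts switch on the table.
    if alt_composites and max(alt_composites) - incumbent >= 5:
        return "switch"
    # Assumption: a composite drop of more than 1 point is 'borderline'.
    if incumbent < _composite(prior, weights) - 1:
        return "negotiate"
    return "renew"  # score holds

# Hypothetical incumbent rescored 30 days before renewal (Profile B weights).
weights = {"use_case_fit": 0.25, "integration_depth": 0.20,
           "pricing_model": 0.15, "vendor_stability": 0.15,
           "exit_cost": 0.15, "data_residency": 0.10}
prior   = {"use_case_fit": 8, "integration_depth": 7, "pricing_model": 6,
           "vendor_stability": 7, "exit_cost": 6, "data_residency": 7}
current = dict(prior, use_case_fit=6)  # agency expanded into new use-cases
decision = renewal_decision(prior, current, weights, [6.9, 7.4],
                            {"use_case_fit": "expanded into use-cases "
                                             "the tool serves less well"})
print(decision)
```

Raising on an unexplained delta is the point: the protocol's value is the documented rationale, so the function refuses to decide without one.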
Action
07 — Anti-patterns · Three anti-patterns to avoid.
Scoring tools without weighting per profile
Solo founders applying enterprise weights end up with stacks they cannot afford; enterprise teams applying solo weights end up with stacks that do not pass compliance. The weights step is non-optional. Document the profile; revisit annually as the agency grows.
Always weight
Adopting reference stacks wholesale without tuning
Reference stacks are anchors, not prescriptions. Agencies that adopt a reference stack without engagement-specific tuning end up with the same brittle stack they had before, just with more matrix-flavored documentation. Tune; document deviations.
Tune them
Skipping the renewal protocol
Without quarterly renewal review, every tool renews by default. Stack bloat is the inevitable result. Most agencies under no renewal protocol grow their stack 25-40% YoY; agencies running the protocol stay flat or shrink while delivering more.
Run the protocol
08 — Conclusion · Six axes, eight stacks.
The decision matrix is what stops AI-marketing-stack decisions from being political. The renewal protocol is what stops the stack from bloating. Together they make stack management a tracked discipline.
14,500 AI-marketing tools is too many to evaluate by gut. The six-axis matrix is the framework that turns evaluation into a two-hour exercise per category, with results that hold up to a quarterly review and a renewal conversation.
Adopt the matrix. Pick your weighting profile (A through D, or tune your own). Use the eight reference stacks as anchors. Maintain a 40-tool data-residency lookup. Run the renewal protocol quarterly.
Most agencies that adopt the matrix retire 15-25% of their stack in the first year and lift gross margin on AI services by 6-12 percentage points. The matrix is not the win; the discipline of running it is.