An ecommerce recommendation engine is the system that decides which products to show a shopper next — on the product page, in the cart, in email, and in search — and the better engines can drive a meaningful share of total site revenue. The harder question in 2026 is not whether to run one, but whether to build it or buy it, and how to tell whether it is earning its keep at all.
The category is crowded with confident numbers. Recommendation widgets reportedly account for a small slice of clicks but a disproportionate slice of revenue; specialist platforms publish conversion and average-order-value gains that look enormous. Almost all of those figures are attributed — they describe sessions that happened to click a recommendation — and attribution is not the same as causal lift. The merchant who never withholds recommendations from a holdout group genuinely does not know what the engine is worth.
This guide does three things most coverage skips. It separates attributed revenue from incremental lift and shows the test that closes the gap. It maps the algorithm families to their cold-start failure modes and the platforms that implement each. And it lays out a build-vs-buy decision matrix across five merchant tiers, including the variable most comparisons ignore: who owns the training data.
- 01 Recommendations move real revenue, but the size is misread. Barilliance's 2023 study put recommendation contribution at up to 31% of ecommerce site revenue, and Salesforce-cited data attributes around 26% of revenue to roughly 7% of traffic. These are attributed figures, not holdout-tested incremental lift.
- 02 Attribution overstates lift, often dramatically. Vendor stats like '369% higher AOV for rec-engaged sessions' describe a behavioural difference, not a causal increase. Shoppers who engage with recommendations were already higher-intent. Only a randomized holdout reveals true incremental revenue.
- 03 Buy starts cheap; build starts expensive. SaaS entry pricing begins near $25/month (Rebuy à la carte) and scales to a reported $50K+/year at the enterprise tier (Bloomreach). A custom build is a directional $70K–$400K+ upfront plus 10–15% annually for maintenance and retraining.
- 04 Algorithm choice is really a cold-start choice. Collaborative filtering fails on new users and new items; content-based and vector-embedding approaches handle new items; session-based transformers handle anonymous, early-funnel traffic. Match your catalog churn and data maturity to the family, not the brand name.
- 05 Data ownership is the hidden decision variable. Most SaaS platforms retain the behavioural training data. Brands with multi-channel data strategies or strict privacy constraints may prefer managed ML (Amazon Personalize) or a custom store that keeps training data in-house.
01 — Why It Matters
A small slice of clicks, a large slice of revenue.
The case for recommendations rests on a recurring shape in the data: recommendation surfaces touch a minority of sessions but punch well above their weight on revenue. Salesforce Research data cited by Bloomreach reports that recommendation clicks account for roughly 7% of site traffic but around 26% of ecommerce revenue, a ratio independently echoed across several vendor datasets. Barilliance's 2023 study put the ceiling higher still, attributing up to 31% of total site revenue to recommendation engines.
The headline anchors are familiar. Amazon is widely cited as generating around a third of its revenue from recommendations — a McKinsey-attributed benchmark that has been republished so often the original report is rarely linked directly, so treat it as an industry benchmark rather than a current audited figure. Netflix's oft-quoted "75% of viewing from recommendations" traces to a 2015/16 estimate; it is historical context for how central discovery became, not a 2026 statistic.
Broader personalization research points the same direction without the inflation. McKinsey's frequently-cited finding is a 5–15% revenue lift from personalization for most companies, with faster-growing firms extracting more value from it than slower competitors. That band — single to low-double digits — is a far more defensible planning number than any vendor case study, and it is the one we anchor budgets to.
~7% of clicks, ~26% of revenue
Salesforce Research, cited via Bloomreach: recommendation clicks are roughly 7% of site traffic yet generate around 26% of ecommerce revenue. A disproportionate, but attributed, contribution.
Up to 31%: upper-bound contribution
Barilliance's 2023 study attributes up to 31% of total ecommerce site revenue to recommendation engines. Use as a ceiling, not a baseline — your real number depends on placement and traffic mix.
5–15%: the defensible band
McKinsey's widely-cited estimate of revenue lift from personalization. Single-to-low-double-digit is the honest planning anchor — well below the eye-catching attributed figures vendors lead with.
Demand-side signals reinforce the trend. Survey data points to most consumers expecting personalized interactions and a majority growing frustrated when it is absent, and reporting suggests the share of marketing budget going to personalization has climbed sharply since 2023, with most brands planning to spend more in 2026. Our reading: the question for mid-market merchants has shifted from whether to personalize to how much to spend and on which path — which is exactly the build-vs-buy problem the rest of this guide solves.
02 — The Attribution Gap
Attributed revenue is not incremental lift.
Here is the story almost no vendor tells. When a platform reports that recommendations drove a given share of revenue, it is counting sessions in which a shopper clicked a recommendation and later purchased. That is attribution. It does not ask the only question that matters for ROI: would that shopper have bought anyway, without the recommendation in front of them?
The two diverge because the people who engage with recommendations are not a random sample. They are, on average, already further down the funnel — more engaged, higher intent, more likely to convert no matter what the page showed them. Crediting the recommendation with their entire purchase confuses correlation with cause. The widely quoted statistic that rec-engaged sessions show dramatically higher average order value is a textbook example: it is a real difference between two groups of shoppers, but it is a difference in who engages, not proof of how much the engine added.
The fix is borrowed straight from paid media: a randomized holdout. Withhold recommendations from a randomly assigned share of traffic, keep serving them to everyone else, and measure the revenue difference between the two groups. That delta — not the attributed total — is the incremental lift, and it is the only figure worth putting in a business case. The discipline is identical to the one we lay out for media in our guide to incrementality testing and causal lift; the mechanics translate cleanly to on-site recommendations.
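To make the gap concrete, here is a minimal worked sketch of how a large attributed share and a small holdout lift can coexist. Every number in it is a made-up assumption for illustration (two traffic segments, their purchase and click rates, a flat order value, and a 10% relative causal effect of a click), not data from any vendor or study.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    share: float    # fraction of total traffic
    p_buy: float    # purchase probability with no recommendations shown
    p_click: float  # probability of clicking a recommendation when shown

# All values below are hypothetical, chosen only to illustrate the mechanism.
ASSUMED_UPLIFT = 0.10   # relative causal effect of a rec click on purchase odds
ORDER_VALUE = 80.0      # flat order value, for simplicity

segments = [
    Segment(share=0.20, p_buy=0.30, p_click=0.40),  # high-intent shoppers
    Segment(share=0.80, p_buy=0.03, p_click=0.05),  # casual browsers
]

def revenue_per_user(segments, show_recs):
    """Expected revenue per user, with or without recommendations shown."""
    total = 0.0
    for s in segments:
        p_click = s.p_click if show_recs else 0.0
        p_purchase = (1 - p_click) * s.p_buy + p_click * s.p_buy * (1 + ASSUMED_UPLIFT)
        total += s.share * p_purchase * ORDER_VALUE
    return total

def attributed_revenue_per_user(segments):
    """Revenue from users who clicked a rec and bought: what a dashboard credits."""
    return sum(
        s.share * s.p_click * s.p_buy * (1 + ASSUMED_UPLIFT) * ORDER_VALUE
        for s in segments
    )

treatment = revenue_per_user(segments, show_recs=True)
control = revenue_per_user(segments, show_recs=False)
attributed_share = attributed_revenue_per_user(segments) / treatment
incremental_lift = treatment / control - 1

print(f"attributed share of revenue: {attributed_share:.1%}")
print(f"incremental lift vs holdout: {incremental_lift:.1%}")
```

Under these assumed inputs the dashboard would credit roughly 32% of revenue to recommendations, while the holdout comparison shows a 3% lift. The gap comes entirely from who clicks: the high-intent segment clicks more and would mostly have bought anyway.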
"Finding patterns in user behavior and suggesting products that similar users have liked" (Bloomreach, defining collaborative filtering)
None of this means the engines do not work — the McKinsey-cited 5–15% personalization band is real, and a well-placed recommendation does change behaviour. It means the size of the win is almost always overstated in the materials you will be sold on, and that the merchant who measures it honestly can negotiate better, allocate budget more accurately, and avoid over-investing in a capability whose true contribution they have never tested.
03 — Algorithm Families
Match the algorithm to your cold-start problem.
Most guides list "collaborative filtering, content-based, hybrid" in one breath and move on. The decision that actually matters is which family handles your particular cold-start problem — the situation where the engine has no behavioural history to lean on, either because the user is new or anonymous, or because the item was just added to the catalog. Catalog churn and traffic anonymity, not brand preference, should drive the choice.
Collaborative filtering learns from co-behaviour — surfacing what similar users liked — and is powerful once data is dense, but it fails hard on brand-new users and brand-new items it has never seen interact. Content-based methods recommend on item attributes, so they handle new items but can feel narrow. Session-based sequential models (SASRec, BERT4Rec, NVIDIA's Transformers4Rec) shine for short, anonymous, early-funnel sessions; Transformers4Rec notably won two major ecommerce recommendation competitions. Vector-embedding similarity uses cosine distance in a high-dimensional space to surface semantically related products, which is especially valuable for cold-start new items.
| Algorithm family | Data needed | New-user cold start | New-item cold start | Best placement |
|---|---|---|---|---|
| Collaborative filtering | High — dense user-item interaction history | Weak — no history to match against | Weak — needs interactions to place an item | Cart, "customers also bought" |
| Content-based | Low — clean product attributes / metadata | Moderate — can use a single viewed item | Strong — attributes known at upload | Product page, "similar items" |
| Session-based / sequential | Moderate — in-session click streams | Strong — works on anonymous sessions | Moderate — depends on item features | Early-funnel discovery, homepage |
| LLM-augmented hybrid | Variable — CF signal + enriched features | Strong — research shows cold-start gains | Strong — semantic understanding of items | Search, conversational discovery |
| Vector-embedding similarity | Low — embeddings from text/image/attributes | Moderate — needs a seed item or query | Strong — embeds new items immediately | Visual / semantic "more like this" |
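The new-item column of the table can be illustrated with a small sketch. It contrasts a toy co-purchase recommender (a simple stand-in for collaborative filtering) with cosine similarity over item embeddings. The baskets, product names, and three-dimensional vectors are all hypothetical; in practice the embeddings would come from a text or image model and have far more dimensions.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical order history: each set is one basket.
baskets = [
    {"trail-shoe", "wool-sock"},
    {"trail-shoe", "wool-sock", "gaiter"},
    {"trail-shoe", "gaiter"},
]

# Hypothetical item embeddings; "new-trail-shoe" was just uploaded, zero orders.
embeddings = {
    "trail-shoe": [0.9, 0.1, 0.0],
    "wool-sock": [0.1, 0.9, 0.1],
    "gaiter": [0.3, 0.2, 0.9],
    "new-trail-shoe": [0.85, 0.15, 0.05],
}

def co_purchase_recs(item, baskets, k=2):
    """Rank other items by how often they share a basket with `item`."""
    counts = defaultdict(int)
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    scored = [(other, n) for (src, other), n in counts.items() if src == item]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # count desc, then name
    return [other for other, _ in scored[:k]]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def embedding_recs(item, embeddings, k=2):
    """Rank other items by cosine similarity to the embedding of `item`."""
    query = embeddings[item]
    scored = [(other, cosine(query, vec))
              for other, vec in embeddings.items() if other != item]
    scored.sort(key=lambda pair: -pair[1])
    return [other for other, _ in scored[:k]]

print(co_purchase_recs("trail-shoe", baskets))      # warm item: has neighbours
print(co_purchase_recs("new-trail-shoe", baskets))  # cold item: empty list
print(embedding_recs("new-trail-shoe", embeddings)) # cold item: still works
```

The co-purchase recommender returns nothing for the new product because it has never appeared in a basket, while the embedding lookup places it next to the existing trail shoe from attributes alone.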
The frontier is hybrid. Research from 2024 found that LLM-augmented systems outperform pure collaborative filtering in cold-start scenarios but can underperform traditional collaborative filtering on warm, data-rich user-item pairs — which is why the research-backed best practice is a hybrid that uses an LLM for feature enrichment and collaborative filtering for the behavioural signal. Verify that this direction still holds in current literature before betting an architecture on it; it is a fast-moving research area. For most merchants the practical takeaway is simpler: you do not need one algorithm, you need the right one in each placement.
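One simple way to picture a hybrid is a blend that leans on the content or enriched-feature score while an item is cold and shifts toward the collaborative score as interactions accumulate. The sketch below is an assumption-laden illustration, not a method taken from the research mentioned above: the linear ramp, the `saturation` threshold of 20 interactions, and both function names are hypothetical.

```python
def blend_weight(interactions: int, saturation: int = 20) -> float:
    """Share of the final score given to collaborative filtering, from 0 to 1.

    Assumed linear ramp: 0 interactions gives 0.0, `saturation` or more gives 1.0.
    """
    return min(max(interactions, 0), saturation) / saturation

def hybrid_score(cf_score: float, content_score: float, interactions: int) -> float:
    """Blend a behavioural (CF) score with a content or enriched-feature score."""
    w = blend_weight(interactions)
    return w * cf_score + (1 - w) * content_score

# Cold item: no interactions, so the content score is used as-is.
print(hybrid_score(cf_score=0.0, content_score=0.8, interactions=0))
# Warm item: plenty of history, so the collaborative score takes over.
print(hybrid_score(cf_score=0.6, content_score=0.2, interactions=50))
```

A production system would more likely learn the weighting than hard-code it, but the shape of the trade-off is the same: behavioural signal where it exists, item understanding where it does not.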
04 — The Buy Path
SaaS: fast to deploy, pricing scales with scale.
The buy path is a spectrum, not a single product. At the entry end sit app-store tools that install on a hosted platform in hours; at the top end sit enterprise experience platforms with custom-quoted contracts. Pricing below is vendor-stated and subject to change — confirm current terms directly before committing.
Rebuy anchors the accessible end of the market: a vendor-stated 50,000+ Shopify brands and billions in attributed revenue (attributed, again, not incremental). Its "Build Your Own" plan starts around $25/month, billed à la carte by order volume, with an all-inclusive "Platform One" tier from roughly $534/month. Nosto serves a vendor-stated 1,500+ brands and quotes custom pricing based on GMV, traffic, and modules; it does not publish standard tiers, so any specific Nosto price you see elsewhere should be treated with suspicion. Bloomreach sits at the enterprise tier, with pricing reported in the range of $50K+/year, a directional figure rather than a published rate card.
Rebuy & app-store tools
Installs on hosted platforms in hours. Rebuy reports 50,000+ Shopify brands and billions in attributed revenue. Best for growth-stage merchants who want recommendations live this week without an engineering lift.
Nosto & Bloomreach
Nosto serves 1,500+ brands with GMV-based custom pricing and multiple AI types. Bloomreach targets enterprise. Deeper personalization, more placements, heavier contracts — and the platform usually retains the training data.
Amazon Personalize
AWS managed recommendation service: usage-based pricing for data ingestion, training, and inference, with a two-month free tier. Between buy and build — your data, your AWS account, no full custom team.
A note on platform-native engines. Hosted platforms like Shopify ship a free, rule-based-plus-basic-ML recommendation engine baked in. It is genuinely fine for a starting point. But third-party analysis — not a Shopify-confirmed controlled study — suggests native recommendations underperform specialist apps by roughly 12–18% on conversion optimization, with the gap mattering most for stores that have significant revenue tied to recommendations. Read that as a directional signal to test a specialist app against native, not as a guaranteed delta. For stores where recommendations are a rounding error, native is the right call; for stores where they are a revenue pillar, the comparison is worth running.
05 — The Build Path
Custom: control and ownership, at a real cost.
Building a custom recommendation engine buys you two things a SaaS subscription typically cannot: full control over the algorithm and full ownership of the training data. It also costs real money and, more often, real time. Aggregated consultancy estimates put initial development at a directional $70,000 to $400,000+, with enterprise implementations reaching higher, plus an ongoing 10–15% annually for maintenance and model retraining. These are not binding quotes; they are ranges compiled from multiple vendor and consultancy sources, and your number depends heavily on scope.
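The arithmetic behind those ranges is easy to sketch. The example below assumes the 10–15% maintenance figure is a share of the initial build cost and is charged every year, including the first; both are interpretations, since the source estimates do not spell this out. The three-year horizon and the function names are likewise hypothetical, and the SaaS figures are the directional ones quoted in this guide.

```python
def build_cost(upfront: float, maintenance_rate: float, years: int) -> float:
    """Cumulative cost of a custom build.

    Assumes maintenance is a fixed share of the upfront cost, paid each year.
    """
    return upfront + upfront * maintenance_rate * years

def saas_cost(annual_fee: float, years: int) -> float:
    """Cumulative SaaS cost at a flat annual fee (ignores price changes)."""
    return annual_fee * years

YEARS = 3  # hypothetical planning horizon

scenarios = {
    "custom build, low end": build_cost(70_000, 0.10, YEARS),
    "custom build, high end": build_cost(400_000, 0.15, YEARS),
    "SaaS entry ($25/month)": saas_cost(25 * 12, YEARS),
    "SaaS enterprise ($50K/year)": saas_cost(50_000, YEARS),
}

for name, cost in scenarios.items():
    print(f"{name}: ${cost:,.0f} over {YEARS} years")
```

On these assumptions the three-year range runs from about $91,000 to $580,000 for a build, against $900 to $150,000 for SaaS. Neither side includes internal staff time, which is often the larger cost.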
The cost that surprises teams is not the modelling. A widely-cited rule of thumb in machine learning is that around 80% of project time goes to data preparation — cleaning, joining, and pipelining behavioural and catalog data — rather than to building the model itself. A recommendation engine is only as good as the data flowing into it, and most ecommerce data is messy, multi-source, and partially missing. Budget for the plumbing, not the algorithm.
[Figure: Cost of ownership, buy vs build (directional). Sources: Rebuy and Bloomreach (vendor-stated / reported); custom build from aggregated consultancy estimates. All figures directional.]
The build vs buy choice, then, is rarely about whether you can build. It is about whether the marginal control and data ownership justify a six-figure commitment and a multi-month timeline against a SaaS tool that is live this week. For most merchants below the enterprise tier, the honest answer is no. The exceptions are specific and worth naming, which is what the matrix in the next section does.
06 — The Decision Matrix
Build vs buy across five merchant tiers.
The table below is our consolidated decision matrix, mapped to GMV tiers and the two profiles — data-first and privacy-constrained — that override the simple revenue logic. Most published comparisons are either vendor-biased or generic developer guides; the value here is one view a CTO or CFO can use at a whiteboard. Costs are directional, drawn from the vendor-stated and aggregated estimates cited throughout this guide.
| Merchant profile | Recommended path | Time to first rec | Year-one cost (directional) | Data ownership |
|---|---|---|---|---|
| Startup · <$1M GMV | Platform-native engine | Hours — already built in | $0 (included in platform) | Platform-held |
| Growth · $1M–$10M GMV | Specialist SaaS app (e.g. Rebuy) | Days — app install + config | ~$300–$7K/yr (vendor-stated) | Vendor-held |
| Mid-market · $10M–$50M GMV | Enterprise SaaS or managed ML | Weeks — integration + tuning | Custom quote / usage-based | Vendor-held (or your AWS) |
| Enterprise · $50M+ GMV | Managed ML or custom build | Months — full data pipeline | $50K+ SaaS up to $70K–$400K+ build | In-house (if built) |
| Data-first / privacy-constrained | Managed ML or custom (data-owning) | Weeks to months | Usage-based to six figures | In-house — the deciding factor |
Read the matrix as a default, then adjust for the two variables that override GMV: how much revenue is genuinely tied to recommendations, and whether you have data-ownership or privacy constraints. A $5M-GMV brand with a fast-churning catalog and a strict first-party data posture can rationally jump straight to managed ML; a $40M brand whose recommendations are a minor surface can stay on a specialist app indefinitely.
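The matrix and its two overrides can be written down as a small lookup, which is sometimes a useful way to check that the rules are unambiguous. This is a sketch of the defaults described above, not a scoring model: the function name and both flag names are hypothetical, and the order in which the overrides are applied is an assumption.

```python
def recommend_path(
    gmv_usd: float,
    data_ownership_required: bool = False,
    recs_are_minor_surface: bool = False,
) -> str:
    """Default path from the decision matrix, with the two overrides applied.

    Assumed precedence: data ownership first, then GMV tier, with the
    minor-surface flag capping larger merchants at a specialist app.
    """
    if data_ownership_required:
        return "Managed ML or custom (data-owning)"
    if gmv_usd < 1_000_000:
        return "Platform-native engine"
    if recs_are_minor_surface or gmv_usd < 10_000_000:
        return "Specialist SaaS app"
    if gmv_usd < 50_000_000:
        return "Enterprise SaaS or managed ML"
    return "Managed ML or custom build"

# The two examples from the paragraph above.
print(recommend_path(5_000_000, data_ownership_required=True))
print(recommend_path(40_000_000, recs_are_minor_surface=True))
```

The first call returns the data-owning path despite growth-tier GMV, and the second keeps a $40M brand on a specialist app.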
07 — The Hidden Variable
Who owns the training data?
The variable most build-vs-buy comparisons ignore is data ownership. When you buy a SaaS recommendation engine, the behavioural data that trains the model typically lives with the vendor. For many merchants that is a fair trade — the vendor does the hard machine-learning work and you get recommendations without a data-science team. But for brands pursuing a unified, multi-channel first-party data strategy, or operating under strict privacy obligations, handing the richest signal you own to a third party is a strategic cost that rarely shows up on the price comparison.
This is where managed ML and custom builds change the calculus. Building on Amazon Personalize keeps training data inside your own cloud account; a custom vector store or model keeps it entirely in-house. For privacy-constrained brands — those with processor obligations under regimes like GDPR, or those that simply refuse to seed a competitor-adjacent platform with their behavioural data — owning the training set can be the deciding factor, independent of cost or convenience. It is the same first-party-data logic that drives our broader ecommerce growth engagements and our work on CRM and customer-data automation.
Get recommendations live this week
Growth-stage, recommendations are a useful-but-not-central surface, and you have no appetite for an engineering project. Install a specialist SaaS app, accept vendor-held data, and move on. The McKinsey-cited 5–15% personalization band is achievable here.
Keep training data in your account
You run a multi-channel first-party data strategy or have privacy obligations, but you don't want a full custom team. Managed ML (Amazon Personalize) keeps data in your cloud with usage-based pricing — the underrated middle path.
Recommendations are a revenue pillar
Enterprise GMV, a differentiated catalog, and revenue materially tied to discovery. A custom build (directional $70K–$400K+ plus 10–15%/yr) buys full algorithm control and data ownership. Expect around 80% of the project time to go to data plumbing, and budget accordingly.
Sub-$1M GMV, thin data
Your platform's native engine is free and good enough until recommendations demonstrably move revenue. Don't pay for a specialist app — or build anything — until a holdout test shows native is leaving money on the table.
08 — How To Measure
The one test that tells you the truth.
Whichever path you choose, the discipline is the same: do not accept the platform's attributed revenue as your ROI. Build the business case on incremental lift, and measure it with a holdout. The steps are simple and the cost is mostly forgone attributed credit, not cash.
- Randomize a holdout. Assign a fixed share of traffic — commonly 5–20% — to a control group that sees no recommendations, with the rest as the treatment group. Randomize at the user or session level, consistently.
- Hold the test long enough. Run across full purchase cycles so the comparison is not distorted by a promotion, a launch, or a seasonal spike hitting one group more than the other.
- Measure the delta, not the total. Compare revenue-per-user between treatment and control. That difference is the incremental lift — typically well below the attributed figure the platform reports.
- Re-test after major changes. A catalog overhaul, a new placement, or a platform switch can move the real number. Treat the holdout as a recurring instrument, not a one-time audit.
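The first and third steps above can be sketched in a few lines. The example assumes a stable user ID is available and uses a salted hash so the same user always lands in the same group. The salt string, the function names, and the 10% holdout share are hypothetical choices, and the interval uses a normal approximation that is only rough for skewed revenue data.

```python
import hashlib
import math
from statistics import mean, variance

def assign_group(user_id: str, holdout_share: float = 0.10,
                 salt: str = "rec-holdout-v1") -> str:
    """Deterministically assign a user to 'control' (no recs) or 'treatment'."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "control" if bucket < holdout_share else "treatment"

def incremental_lift(treatment_revenue, control_revenue, z: float = 1.96):
    """Difference in revenue per user, with an approximate 95% interval."""
    diff = mean(treatment_revenue) - mean(control_revenue)
    se = math.sqrt(
        variance(treatment_revenue) / len(treatment_revenue)
        + variance(control_revenue) / len(control_revenue)
    )
    return diff, (diff - z * se, diff + z * se)

# Toy per-user revenue lists; a real test needs far more users than this.
treatment = [0.0, 0.0, 120.0, 0.0, 80.0, 0.0]
control = [0.0, 0.0, 100.0, 0.0, 0.0, 0.0]

diff, (low, high) = incremental_lift(treatment, control)
print(f"lift per user: {diff:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
```

Changing the salt reshuffles the groups, which is one way to start a fresh test after a major change. With a sample this small the interval is very wide, which is the practical argument for running the holdout across full purchase cycles.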
This is also where strategy meets execution. Picking the right path, standing up a holdout, and reading the result honestly is the kind of work our AI-powered product personalization engagements are built around — and it connects to adjacent surfaces like personalized recommendation emails and the agentic commerce protocol feeding the next wave of AI shopping agents. The engine is only half the problem; knowing what it is worth is the other half.
09 — Conclusion
Pick the path, then prove the lift.
Build vs buy is a smaller question than build vs prove.
Recommendation engines earn their place in ecommerce — the McKinsey-cited 5–15% personalization band is real, and a good engine changes shopper behaviour. But the eye-catching numbers that sell them are attributed, not incremental, and the gap between the two is where most merchants quietly overpay. The first discipline is not picking a vendor; it is refusing to confuse a dashboard with a result.
On the build-vs-buy axis itself, the defaults are clear. Below roughly $1M GMV, the platform-native engine is enough. Through the growth and mid-market tiers, a specialist SaaS app or managed ML wins on speed and total cost — a custom build's directional $70K–$400K+ plus ongoing maintenance only pays off when recommendations are a genuine revenue pillar or when owning the training data is itself the requirement. Match the algorithm family to your cold-start reality, not to the brand on the box.
The forward signal is that this decision is converging on a barbell: cheap, capable native and app-store tools at one end, and data-owning managed ML or custom builds at the other, with the squeezed middle increasingly hard to justify. Either way, the merchant who runs a holdout knows something their competitors do not — what the engine is actually worth. That number, not the vendor's, is the one to build the next year's plan on.