An ecommerce recommendation engine is the system that decides which products to show a shopper next — on the product page, in the cart, in email, and in search — and the better engines can drive a meaningful share of total site revenue. The harder question in 2026 is not whether to run one, but whether to build it or buy it, and how to tell whether it is earning its keep at all.

The category is crowded with confident numbers. Recommendation widgets reportedly account for a small slice of clicks but a disproportionate slice of revenue; specialist platforms publish conversion and average-order-value gains that look enormous. Almost all of those figures are attributed — they describe sessions that happened to click a recommendation — and attribution is not the same as causal lift. The merchant who never withholds recommendations from a holdout group genuinely does not know what the engine is worth.

This guide does three things most coverage skips. It separates attributed revenue from incremental lift and shows the test that closes the gap. It maps the algorithm families to their cold-start failure modes and the platforms that implement each. And it lays out a build-vs-buy decision matrix across five merchant tiers, including the variable most comparisons ignore: who owns the training data.

Key takeaways

01
Recommendations move real revenue, but the size is misread. Barilliance's 2023 study put recommendation contribution at up to 31% of ecommerce site revenue, and Salesforce-cited data attributes around 26% of revenue to roughly 7% of traffic. These are attributed figures, not holdout-tested incremental lift.
02
Attribution overstates lift, often dramatically. Vendor stats like '369% higher AOV for rec-engaged sessions' describe a behavioural difference, not a causal increase. Shoppers who engage with recommendations were already higher-intent. Only a randomized holdout reveals true incremental revenue.
03
Buy starts cheap; build starts expensive. SaaS entry pricing begins near $25/month (Rebuy à la carte) and scales to a reported $50K+/year at the enterprise tier (Bloomreach). A custom build is a directional $70K–$400K+ upfront plus 10–15% annually for maintenance and retraining.
04
Algorithm choice is really a cold-start choice. Collaborative filtering fails on new users and new items; content-based and vector-embedding approaches handle new items; session-based transformers handle anonymous, early-funnel traffic. Match your catalog churn and data maturity to the family, not the brand name.
05
Data ownership is the hidden decision variable. Most SaaS platforms retain the behavioural training data. Brands with multi-channel data strategies or strict privacy constraints may prefer managed ML (Amazon Personalize) or a custom store that keeps training data in-house.

01 — Why It Matters
A small slice of clicks, a large slice of revenue.

The case for recommendations rests on a recurring shape in the data: recommendation surfaces touch a minority of sessions but punch well above their weight on revenue. Salesforce Research data cited by Bloomreach reports that recommendation clicks account for roughly 7% of site traffic but around 26% of ecommerce revenue, a ratio independently echoed across several vendor datasets. Barilliance's 2023 study put the ceiling higher still, attributing up to 31% of total site revenue to recommendation engines.

The headline anchors are familiar. Amazon is widely cited as generating around a third of its revenue from recommendations — a McKinsey-attributed benchmark that has been republished so often the original report is rarely linked directly, so treat it as an industry benchmark rather than a current audited figure. Netflix's oft-quoted "75% of viewing from recommendations" traces to a 2015/16 estimate; it is historical context for how central discovery became, not a 2026 statistic.

Broader personalization research points the same direction without the inflation. McKinsey's frequently-cited finding is a 5–15% revenue lift from personalization for most companies, with faster-growing firms extracting more value from it than slower competitors. That band — single to low-double digits — is a far more defensible planning number than any vendor case study, and it is the one we anchor budgets to.

Traffic vs revenue: ~7% of clicks, ~26% of revenue
Salesforce Research, cited via Bloomreach: recommendation clicks are roughly 7% of site traffic yet generate around 26% of ecommerce revenue. A disproportionate, but attributed, contribution.
Source: Salesforce / Bloomreach

Revenue ceiling: 31% upper-bound contribution
Barilliance's 2023 study attributes up to 31% of total ecommerce site revenue to recommendation engines. Use as a ceiling, not a baseline; your real number depends on placement and traffic mix.
Source: Barilliance 2023

Personalization lift: 5–15%, the defensible band
McKinsey's widely-cited estimate of revenue lift from personalization. Single-to-low-double-digit is the honest planning anchor, well below the eye-catching attributed figures vendors lead with.
Source: McKinsey-cited

Demand-side signals reinforce the trend. Survey data points to most consumers expecting personalized interactions and a majority growing frustrated when it is absent, and reporting suggests the share of marketing budget going to personalization has climbed sharply since 2023, with most brands planning to spend more in 2026. Our reading: the question for mid-market merchants has shifted from whether to personalize to how much to spend and on which path — which is exactly the build-vs-buy problem the rest of this guide solves.

02 — The Attribution Gap
Attributed revenue is not incremental lift.

Here is the story almost no vendor tells. When a platform reports that recommendations drove a given share of revenue, it is counting sessions in which a shopper clicked a recommendation and later purchased. That is attribution. It does not ask the only question that matters for ROI: would that shopper have bought anyway, without the recommendation in front of them?

The two diverge because the people who engage with recommendations are not a random sample. They are, on average, already further down the funnel — more engaged, higher intent, more likely to convert no matter what the page showed them. Crediting the recommendation with their entire purchase confuses correlation with cause. The widely quoted statistic that rec-engaged sessions show dramatically higher average order value is a textbook example: it is a real difference between two groups of shoppers, but it is a difference in who engages, not proof of how much the engine added.
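A toy calculation makes the gap concrete. This is a sketch under invented assumptions: the two segments, their engagement and conversion rates, and the flat order value are illustrative numbers, not figures from any study cited here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A hypothetical shopper segment (all numbers are illustrative)."""
    shoppers: int
    engage_rate: float      # share who click a recommendation when shown
    base_conversion: float  # conversion they would have had with no recs
    rec_uplift: float       # extra conversion among engagers, caused by recs


AOV = 80.0  # assumed flat average order value, to keep the arithmetic simple


def attributed_revenue(segments):
    """Revenue from every buyer who clicked a recommendation first."""
    return sum(
        s.shoppers * s.engage_rate * (s.base_conversion + s.rec_uplift) * AOV
        for s in segments
    )


def incremental_revenue(segments):
    """Revenue that exists only because recommendations were shown."""
    return sum(s.shoppers * s.engage_rate * s.rec_uplift * AOV for s in segments)


segments = [
    # High-intent shoppers engage often and would mostly have bought anyway.
    Segment(shoppers=2_000, engage_rate=0.40, base_conversion=0.10, rec_uplift=0.02),
    # Low-intent shoppers rarely engage and rarely buy.
    Segment(shoppers=8_000, engage_rate=0.05, base_conversion=0.01, rec_uplift=0.005),
]

attributed = attributed_revenue(segments)
incremental = incremental_revenue(segments)
print(f"attributed:    ${attributed:,.0f}")
print(f"incremental:   ${incremental:,.0f}")
print(f"overstatement: {attributed / incremental:.1f}x")
```

In this made-up population the dashboard would credit recommendations with several times the revenue they actually caused, purely because engagers skew high-intent. A real store cannot observe `base_conversion` for engagers directly, which is why the gap has to be measured rather than computed.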

The number most brands never compute

Vendor case studies routinely cite outsized gains — large conversion and average-order-value uplifts tied to recommendation engagement. Almost all are attributed: observed differences between shoppers who engaged and shoppers who did not. They are not causal, holdout-tested increments, and the holdout-tested number is typically much lower. Treat any single-vendor lift figure as a marketing claim until you have measured your own.

The fix is borrowed straight from paid media: a randomized holdout. Withhold recommendations from a randomly assigned share of traffic, keep serving them to everyone else, and measure the revenue difference between the two groups. That delta — not the attributed total — is the incremental lift, and it is the only figure worth putting in a business case. The discipline is identical to the one we lay out for media in our guide to incrementality testing and causal lift; the mechanics translate cleanly to on-site recommendations.
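One common way to implement the random split is deterministic hashing, so a shopper stays in the same group across visits without a lookup table. The sketch below assumes a stable `user_id` is available; the experiment name, the 10% share, and the function names are hypothetical choices for illustration.

```python
import hashlib

HOLDOUT_SHARE = 0.10  # assumed share; pick what your traffic can support


def in_holdout(user_id: str, experiment: str = "recs-holdout-v1",
               share: float = HOLDOUT_SHARE) -> bool:
    """Deterministically place a user in the no-recommendations control group."""
    # Salting with the experiment name keeps this split independent of other tests.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < share


def render_recommendations(user_id: str, candidates: list[str]) -> list[str]:
    """Serve nothing to the holdout and the engine's output to everyone else."""
    return [] if in_holdout(user_id) else candidates
```

Because the assignment depends only on the ID and the experiment name, it can be recomputed at analysis time to label each order as treatment or control. Anonymous traffic would need a session or cookie ID in place of `user_id`.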

"Finding patterns in user behavior and suggesting products that similar users have liked" (Bloomreach, defining collaborative filtering)

None of this means the engines do not work — the McKinsey-cited 5–15% personalization band is real, and a well-placed recommendation does change behaviour. It means the size of the win is almost always overstated in the materials you will be sold on, and that the merchant who measures it honestly can negotiate better, allocate budget more accurately, and avoid over-investing in a capability whose true contribution they have never tested.

03 — Algorithm Families
Match the algorithm to your cold-start problem.

Most guides list "collaborative filtering, content-based, hybrid" in one breath and move on. The decision that actually matters is which family handles your particular cold-start problem — the situation where the engine has no behavioural history to lean on, either because the user is new or anonymous, or because the item was just added to the catalog. Catalog churn and traffic anonymity, not brand preference, should drive the choice.

Collaborative filtering learns from co-behaviour — surfacing what similar users liked — and is powerful once data is dense, but it fails hard on brand-new users and brand-new items it has never seen interact. Content-based methods recommend on item attributes, so they handle new items but can feel narrow. Session-based sequential models (SASRec, BERT4Rec, NVIDIA's Transformers4Rec) shine for short, anonymous, early-funnel sessions; Transformers4Rec notably won two major ecommerce recommendation competitions. Vector-embedding similarity uses cosine distance in a high-dimensional space to surface semantically related products, which is especially valuable for cold-start new items.
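To see why embeddings sidestep the new-item problem, here is a minimal sketch of the similarity lookup. It computes cosine similarity (one minus the cosine distance mentioned above), and the three-dimensional vectors and product IDs are hypothetical stand-ins for embeddings that would normally come from a text or image model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(seed_id, embeddings, k=2):
    """Return the k item IDs whose embeddings are closest to the seed item."""
    seed = embeddings[seed_id]
    scored = [
        (cosine_similarity(seed, vec), item_id)
        for item_id, vec in embeddings.items()
        if item_id != seed_id
    ]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Hypothetical embeddings; real ones typically have hundreds of dimensions.
embeddings = {
    "trail-shoe": [0.9, 0.1, 0.0],
    "road-shoe": [0.8, 0.3, 0.1],
    "rain-jacket": [0.1, 0.9, 0.2],
    "new-trail-shoe": [0.85, 0.15, 0.05],  # just uploaded, zero interactions
}

print(most_similar("new-trail-shoe", embeddings))
```

The new item gets sensible neighbours the moment it has a vector, with no purchase or click history required. At catalog scale the brute-force loop would usually be replaced by an approximate nearest-neighbour index.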

Recommendation algorithm family comparison: minimum data requirement, new-user cold-start handling, new-item cold-start handling, and best placement for collaborative filtering, content-based, session-based, LLM-augmented hybrid, and vector-embedding approaches.
Algorithm family	Data needed	New-user cold start	New-item cold start	Best placement
Collaborative filtering	High — dense user-item interaction history	Weak — no history to match against	Weak — needs interactions to place an item	Cart, "customers also bought"
Content-based	Low — clean product attributes / metadata	Moderate — can use a single viewed item	Strong — attributes known at upload	Product page, "similar items"
Session-based / sequential	Moderate — in-session click streams	Strong — works on anonymous sessions	Moderate — depends on item features	Early-funnel discovery, homepage
LLM-augmented hybrid	Variable — CF signal + enriched features	Strong — research shows cold-start gains	Strong — semantic understanding of items	Search, conversational discovery
Vector-embedding similarity	Low — embeddings from text/image/attributes	Moderate — needs a seed item or query	Strong — embeds new items immediately	Visual / semantic "more like this"

The frontier is hybrid. Research from 2024 found that LLM-augmented systems outperform pure collaborative filtering in cold-start scenarios but can underperform traditional collaborative filtering on warm, data-rich user-item pairs — which is why the research-backed best practice is a hybrid that uses an LLM for feature enrichment and collaborative filtering for the behavioural signal. Verify that this direction still holds in current literature before betting an architecture on it; it is a fast-moving research area. For most merchants the practical takeaway is simpler: you do not need one algorithm, you need the right one in each placement.
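"The right one in each placement" can be expressed as a small routing rule that checks how much signal exists before choosing a family. This is a sketch only: the thresholds, the function name, and the bestseller fallback are assumptions for illustration, not values from the research mentioned above.

```python
from typing import Optional

# Hypothetical thresholds; tune them against your own data density.
MIN_USER_EVENTS = 5
MIN_ITEM_EVENTS = 20
MIN_SESSION_CLICKS = 2


def choose_family(user_events: int, session_clicks: int,
                  seed_item_events: Optional[int] = None) -> str:
    """Pick an algorithm family from the signal available for this request.

    user_events: historical interactions for a known user (0 if anonymous).
    session_clicks: clicks so far in the current session.
    seed_item_events: interactions on the item in view, or None if no item
        is in view (for example, on the homepage).
    """
    warm_user = user_events >= MIN_USER_EVENTS
    has_seed = seed_item_events is not None
    warm_item = has_seed and seed_item_events >= MIN_ITEM_EVENTS

    if warm_user and (warm_item or not has_seed):
        return "collaborative_filtering"   # dense history on both sides
    if has_seed and not warm_item:
        return "content_or_embedding"      # new item: attributes are all we have
    if session_clicks >= MIN_SESSION_CLICKS:
        return "session_based"             # anonymous but active session
    if has_seed:
        return "content_or_embedding"      # one viewed item is enough to seed
    return "bestsellers_fallback"          # assumed default when nothing is known


print(choose_family(user_events=0, session_clicks=3))                      # anonymous browser
print(choose_family(user_events=40, session_clicks=1, seed_item_events=0)) # new item
```

The ordering encodes the cold-start table above: collaborative filtering only when both sides are warm, attribute-driven methods for new items, and session models for anonymous traffic.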

04 — The Buy Path
SaaS: fast to deploy, pricing scales with scale.

The buy path is a spectrum, not a single product. At the entry end sit app-store tools that install on a hosted platform in hours; at the top end sit enterprise experience platforms with custom-quoted contracts. Pricing below is vendor-stated and subject to change — confirm current terms directly before committing.

Rebuy anchors the accessible end of the market: a vendor-stated 50,000+ Shopify brands and billions in attributed revenue (attributed, again — not incremental). Its "Build Your Own" plan starts around $25/month billed à la carte by order volume, with an all-inclusive "Platform One" tier from roughly $534/month. Nosto serves a vendor-stated 1,500+ brands and quotes custom pricing based on GMV, traffic, and modules — it does not publish standard tiers, so any specific Nosto price you see elsewhere should be treated with suspicion. Bloomreach sits at the enterprise tier, with pricing reported in the range of $50K+/year — a directional figure, not a published rate card.

Accessible SaaS: Rebuy & app-store tools
from ~$25/mo · à la carte to ~$534/mo (vendor-stated pricing)
Installs on hosted platforms in hours. Rebuy reports 50,000+ Shopify brands and billions in attributed revenue. Best for growth-stage merchants who want recommendations live this week without an engineering lift.

Mid-market to enterprise: Nosto & Bloomreach
custom-quoted · Bloomreach ~$50K+/yr (custom quote required)
Nosto serves 1,500+ brands with GMV-based custom pricing and multiple AI types. Bloomreach targets enterprise. Deeper personalization, more placements, heavier contracts, and the platform usually retains the training data.

Managed ML: Amazon Personalize
pay-per-use · free tier for 2 months (aws.amazon.com/personalize)
AWS managed recommendation service: usage-based pricing for data ingestion, training, and inference, with a two-month free tier. Between buy and build: your data, your AWS account, no full custom team.

A note on platform-native engines. Hosted platforms like Shopify ship a free built-in recommendation engine that combines rules with basic ML. It is genuinely fine as a starting point. But third-party analysis (not a Shopify-confirmed controlled study) suggests native recommendations convert roughly 12–18% worse than specialist apps, with the gap mattering most for stores that have significant revenue tied to recommendations. Read that as a directional signal to test a specialist app against native, not as a guaranteed delta. For stores where recommendations are a rounding error, native is the right call; for stores where they are a revenue pillar, the comparison is worth running.

Where managed ML fits

Amazon Personalize is the underrated middle path: AWS runs the infrastructure and the recipes, you bring your data and your account. Pricing is usage-based across ingestion, training, and inference, with a two-month free tier — so it is genuinely pay-as-you-grow. Crucially, the training data stays in your AWS account, not a vendor's, which is the data-ownership advantage most SaaS tools cannot offer.
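For a sense of what serving looks like on this path, the sketch below wraps the `get_recommendations` call of the AWS SDK's `personalize-runtime` client. It assumes a campaign has already been trained and deployed in your account; the helper names and the default region are hypothetical, and the client is passed in so the helper can be exercised without AWS credentials.

```python
def make_runtime_client(region: str = "us-east-1"):
    """Create a Personalize runtime client (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so top_items can be tested without AWS
    return boto3.client("personalize-runtime", region_name=region)


def top_items(runtime_client, campaign_arn: str, user_id: str, k: int = 10) -> list[str]:
    """Return recommended item IDs for one user from a deployed campaign."""
    response = runtime_client.get_recommendations(
        campaignArn=campaign_arn,
        userId=user_id,
        numResults=k,
    )
    # Each entry in itemList carries an itemId (and usually a score).
    return [item["itemId"] for item in response["itemList"]]


# Usage, with a placeholder ARN that you would replace with your own:
#   client = make_runtime_client()
#   top_items(client, "arn:aws:personalize:...:campaign/your-campaign", "user-42", k=5)
```

The call goes to a campaign in your own AWS account, which is the ownership point made above. Dataset import, training, and campaign deployment are separate steps not shown here.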

05 — The Build Path
Custom: control and ownership, at a real cost.

Building a custom recommendation engine buys you two things money cannot otherwise: full control over the algorithm and full ownership of the training data. It also costs real money and, more often, real time. Aggregated consultancy estimates put initial development at a directional $70,000 to $400,000+, with enterprise implementations reaching higher, plus an ongoing 10–15% annually for maintenance and model retraining. These are not binding quotes — they are ranges compiled from multiple vendor and consultancy sources, and your number depends heavily on scope.

The cost that surprises teams is not the modelling. A widely-cited rule of thumb in machine learning is that around 80% of project time goes to data preparation — cleaning, joining, and pipelining behavioural and catalog data — rather than to building the model itself. A recommendation engine is only as good as the data flowing into it, and most ecommerce data is messy, multi-source, and partially missing. Budget for the plumbing, not the algorithm.

Cost of ownership · buy vs build (directional)
Option	Basis	Cost
SaaS entry (Rebuy à la carte)	scales with orders	~$25/mo
Mid-tier SaaS (Rebuy Platform One)	all-inclusive base, vendor-stated	~$534/mo
Enterprise SaaS (Bloomreach)	reported, directional	~$50K+/yr
Custom build, year one	directional estimate, upfront dev	$70K–$400K+
Custom maintenance, annual	10–15% of build cost, ongoing	10–15%/yr
Sources: Rebuy & Bloomreach (vendor-stated / reported); custom build from aggregated consultancy estimates. All figures are directional.

The build vs buy choice, then, is rarely about whether you can build — it is about whether the marginal control and data ownership justify a six-figure commitment and a multi-month timeline against a SaaS tool that is live this week. For most merchants below the enterprise tier, the honest answer is no. The exceptions are specific and worth naming, which is what the matrix in the next section does.
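The comparison is easier to reason about as cumulative cost over a planning horizon. The sketch below uses the directional figures quoted in this guide and assumes, for simplicity, that maintenance starts in year two and that SaaS fees stay flat; both are simplifications, and it ignores internal staffing and opportunity cost.

```python
def build_cost(upfront: float, maintenance_rate: float, years: int) -> float:
    """Upfront development plus yearly maintenance from year two onward."""
    return upfront + upfront * maintenance_rate * max(years - 1, 0)


def saas_cost(annual_fee: float, years: int) -> float:
    """Flat subscription cost; real contracts usually scale with volume."""
    return annual_fee * years


def break_even_year(upfront, maintenance_rate, annual_fee, horizon=10):
    """First year in which the build is no more expensive than SaaS, if any."""
    for year in range(1, horizon + 1):
        if build_cost(upfront, maintenance_rate, year) <= saas_cost(annual_fee, year):
            return year
    return None


YEARS = 3
scenarios = {
    "Rebuy Platform One (~$534/mo)": saas_cost(534 * 12, YEARS),
    "Bloomreach (~$50K/yr, reported)": saas_cost(50_000, YEARS),
    "Custom build, low end": build_cost(70_000, 0.10, YEARS),
    "Custom build, high end": build_cost(400_000, 0.15, YEARS),
}
for name, total in scenarios.items():
    print(f"{name:34s} ${total:>10,.0f} over {YEARS} years")

print("low-end build vs $50K/yr SaaS breaks even in year:",
      break_even_year(70_000, 0.10, 50_000))
```

Under these assumptions a low-end build only undercuts enterprise-tier SaaS pricing and never catches an app-store subscription, which is consistent with the defaults above: the build case rests on control and data ownership rather than on cost.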

06 — The Decision Matrix
Build vs buy across five merchant tiers.

The table below is our consolidated decision matrix, mapped to GMV tiers and the two profiles — data-first and privacy-constrained — that override the simple revenue logic. Most published comparisons are either vendor-biased or generic developer guides; the value here is one view a CTO or CFO can use at a whiteboard. Costs are directional, drawn from the vendor-stated and aggregated estimates cited throughout this guide.

Build-vs-buy recommendation engine decision matrix by merchant profile: recommended path, time to first recommendation, directional year-one cost, and data ownership for startup, growth, mid-market, enterprise, and privacy-constrained brands.
Merchant profile	Recommended path	Time to first rec	Year-one cost (directional)	Data ownership
Startup · <$1M GMV	Platform-native engine	Hours — already built in	$0 (included in platform)	Platform-held
Growth · $1M–$10M GMV	Specialist SaaS app (e.g. Rebuy)	Days — app install + config	~$300–$7K/yr (vendor-stated)	Vendor-held
Mid-market · $10M–$50M GMV	Enterprise SaaS or managed ML	Weeks — integration + tuning	Custom quote / usage-based	Vendor-held (or your AWS)
Enterprise · $50M+ GMV	Managed ML or custom build	Months — full data pipeline	$50K+ SaaS up to $70K–$400K+ build	In-house (if built)
Data-first / privacy-constrained	Managed ML or custom (data-owning)	Weeks to months	Usage-based to six figures	In-house — the deciding factor

Read the matrix as a default, then adjust for the two variables that override GMV: how much revenue is genuinely tied to recommendations, and whether you have data-ownership or privacy constraints. A $5M-GMV brand with a fast-churning catalog and a strict first-party data posture can rationally jump straight to managed ML; a $40M brand whose recommendations are a minor surface can stay on a specialist app indefinitely.
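The matrix and its two overrides can be written down as a small lookup, which is a useful way to check that the defaults are unambiguous. This is a sketch: the function name, the flag names, and the handling of exact tier boundaries are assumptions, and the returned labels simply mirror the table.

```python
def recommend_path(gmv_usd: float, data_owning: bool = False,
                   recs_are_minor: bool = False) -> str:
    """Return the default path from the decision matrix, with its overrides.

    data_owning: strict first-party data posture or privacy constraints.
    recs_are_minor: recommendations are a small, non-central revenue surface.
    """
    # Override 1: data ownership trumps GMV tier.
    if data_owning:
        return "Managed ML or custom (data-owning)"
    if gmv_usd < 1_000_000:
        return "Platform-native engine"
    # Override 2: a minor surface does not justify more than a specialist app.
    if gmv_usd < 10_000_000 or recs_are_minor:
        return "Specialist SaaS app"
    if gmv_usd < 50_000_000:
        return "Enterprise SaaS or managed ML"
    return "Managed ML or custom build"


# The two examples from the paragraph above:
print(recommend_path(5_000_000, data_owning=True))
print(recommend_path(40_000_000, recs_are_minor=True))
```

Treat the output as a starting position for discussion rather than a verdict; the inputs that matter most are the ones a holdout test supplies.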

07 — The Hidden Variable
Who owns the training data?

The variable most build-vs-buy comparisons ignore is data ownership. When you buy a SaaS recommendation engine, the behavioural data that trains the model typically lives with the vendor. For many merchants that is a fair trade — the vendor does the hard machine-learning work and you get recommendations without a data-science team. But for brands pursuing a unified, multi-channel first-party data strategy, or operating under strict privacy obligations, handing the richest signal you own to a third party is a strategic cost that rarely shows up on the price comparison.

This is where managed ML and custom builds change the calculus. Building on Amazon Personalize keeps training data inside your own cloud account; a custom vector store or model keeps it entirely in-house. For privacy-constrained brands (those with data-controller obligations under regimes like GDPR, or those that simply refuse to seed a competitor-adjacent platform with their behavioural data), owning the training set can be the deciding factor, independent of cost or convenience. It is the same first-party-data logic that drives our broader ecommerce growth engagements and our work on CRM and customer-data automation.

Speed over control: get recommendations live this week
Growth-stage, recommendations are a useful-but-not-central surface, and you have no appetite for an engineering project. Install a specialist SaaS app, accept vendor-held data, and move on. The McKinsey-cited 5–15% personalization band is achievable here.
Recommended path: Buy SaaS

Data ownership matters: keep training data in your account
You run a multi-channel first-party data strategy or have privacy obligations, but you don't want a full custom team. Managed ML (Amazon Personalize) keeps data in your cloud with usage-based pricing, the underrated middle path.
Recommended path: Use managed ML

Control is the product: recommendations are a revenue pillar
Enterprise GMV, a differentiated catalog, and revenue materially tied to discovery. A custom build (directional $70K–$400K+ plus 10–15%/yr) buys full algorithm control and data ownership. Budget 80% of it for data plumbing.
Recommended path: Build custom

Just starting out: sub-$1M GMV, thin data
Your platform's native engine is free and good enough until recommendations demonstrably move revenue. Don't pay for a specialist app, or build anything, until a holdout test shows native is leaving money on the table.
Recommended path: Stay native

08 — How To Measure
The one test that tells you the truth.

Whichever path you choose, the discipline is the same: do not accept the platform's attributed revenue as your ROI. Build the business case on incremental lift, and measure it with a holdout. The steps are simple, and the cost is mostly the incremental revenue you forgo from the control group while the test runs, not cash.

Randomize a holdout. Assign a fixed share of traffic — commonly 5–20% — to a control group that sees no recommendations, with the rest as the treatment group. Randomize at the user or session level, consistently.
Hold the test long enough. Run across full purchase cycles so the comparison is not distorted by a promotion, a launch, or a seasonal spike hitting one group more than the other.
Measure the delta, not the total. Compare revenue-per-user between treatment and control. That difference is the incremental lift — typically well below the attributed figure the platform reports.
Re-test after major changes. A catalog overhaul, a new placement, or a platform switch can move the real number. Treat the holdout as a recurring instrument, not a one-time audit.
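The "measure the delta" step can be sketched as a difference in mean revenue per user with a confidence interval around it. This is a minimal version under stated assumptions: each list holds one revenue figure per user (zeros included), the interval uses a normal approximation that only holds for large samples, and the ten-user arms below are made-up data chosen to keep the example readable.

```python
from statistics import NormalDist, mean, variance


def incremental_lift(treatment, control, confidence=0.95):
    """Difference in revenue per user, with a normal-approximation interval."""
    delta = mean(treatment) - mean(control)
    # Standard error of a difference in means, allowing unequal variances.
    se = (variance(treatment) / len(treatment)
          + variance(control) / len(control)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return delta, (delta - z * se, delta + z * se)


# Hypothetical per-user revenue; non-buyers count as 0.
treatment = [0, 0, 120, 0, 60, 0, 0, 90, 0, 30]   # saw recommendations
control = [0, 0, 100, 0, 0, 0, 50, 0, 0, 0]       # holdout, no recommendations

delta, (low, high) = incremental_lift(treatment, control)
print(f"lift per user: {delta:.2f}  (95% CI {low:.2f} to {high:.2f})")
```

With arms this small the interval spans zero, so the observed lift proves nothing; that is the practical reason to size the holdout and the test duration properly. Revenue per user is heavy-tailed, so a bootstrap interval is a reasonable alternative to the normal approximation.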

The standard we hold ourselves to

We treat a recommendation engine the way we treat a media channel: prove the lift before we scale the spend. If a platform cannot support a clean holdout, that is a meaningful mark against it — not a reason to trust its attributed dashboard. The same causal-measurement playbook we apply to paid media applies, almost unchanged, to on-site recommendations.

This is also where strategy meets execution. Picking the right path, standing up a holdout, and reading the result honestly is the kind of work our AI-powered product personalization engagements are built around — and it connects to adjacent surfaces like personalized recommendation emails and the agentic commerce protocol feeding the next wave of AI shopping agents. The engine is only half the problem; knowing what it is worth is the other half.

09 — Conclusion
Pick the path, then prove the lift.

The shape of the decision, June 2026

Build vs buy is a smaller question than build vs prove.

Recommendation engines earn their place in ecommerce — the McKinsey-cited 5–15% personalization band is real, and a good engine changes shopper behaviour. But the eye-catching numbers that sell them are attributed, not incremental, and the gap between the two is where most merchants quietly overpay. The first discipline is not picking a vendor; it is refusing to confuse a dashboard with a result.

On the build-vs-buy axis itself, the defaults are clear. Below roughly $1M GMV, the platform-native engine is enough. Through the growth and mid-market tiers, a specialist SaaS app or managed ML wins on speed and total cost — a custom build's directional $70K–$400K+ plus ongoing maintenance only pays off when recommendations are a genuine revenue pillar or when owning the training data is itself the requirement. Match the algorithm family to your cold-start reality, not to the brand on the box.

The forward signal is that this decision is converging on a barbell: cheap, capable native and app-store tools at one end, and data-owning managed ML or custom builds at the other, with the squeezed middle increasingly hard to justify. Either way, the merchant who runs a holdout knows something their competitors do not — what the engine is actually worth. That number, not the vendor's, is the one to build the next year's plan on.

Recommendation Engines in 2026: Build vs Buy