Development · Industry Guide · 14 min read · Published June 19, 2026

Five cache layers · RFC 9111 directives · Redis eviction · Next.js 16 use cache

Web Caching Strategies 2026: An Engineering Reference

Web caching is not one technique — it is five layers stacked from the browser to the database, each with its own invalidation semantics and its own way to fail. This reference maps all five in one place, starting from the single most-misunderstood directive in HTTP and ending at Next.js 16’s explicit use cache lifetimes.

Digital Applied Team
Senior engineers · Published June 19, 2026
Sources: MDN · Next.js · Redis · CDN docs
Cache layers mapped: 5 (browser → database)
Next.js static asset max-age: 1yr (immutable, hashed assets)
cacheLife stale minimum: 30s (enforced by Next.js 16)
Fastly surrogate-key purge: ~150ms (global propagation)

Web caching strategies in 2026 span five distinct layers — the browser, the CDN, the reverse proxy, the application cache, and the database query cache — and each one has a different invalidation model and a different way to fail. Treating caching as a single knob is the root cause of most stale-content bugs and most cascading outages. This reference maps all five layers in one place, from HTTP directives governed by RFC 9111 to Next.js 16’s explicit use cache lifetimes.

Most caching articles cover only one or two layers — usually HTTP headers or Redis, rarely both, and almost never the failure modes that connect them. That gap is why teams ship a clever Cache-Control policy and still get paged at 2 a.m. when a popular key expires and a thundering herd flattens the database. A mental model that maps where each piece of data should live, how it expires, and what breaks when it does is worth more than any single tuning trick.

Below: the canonical no-cache versus no-store misunderstanding, a decision matrix for every significant Cache-Control directive, why stale-while-revalidate is the most underused directive in HTTP, how CDN tag-based purging actually propagates, Redis eviction policies and cache-stampede mitigation, and how Next.js 16’s use cache plus cacheLife replace ISR-style fetching with composable lifetimes. It closes with a proprietary five-layer reference table you can keep next to your architecture diagram.

Key takeaways
  1. Caching is five layers, not one setting. Browser, CDN, reverse proxy, application (Redis/Memcached), and database query cache each have distinct invalidation semantics and failure modes. A response can be cached, or stale, at every one of them simultaneously.
  2. no-cache does not mean do not cache. no-cache permits storage but forces revalidation with the origin before reuse. The directive that actually prevents storage is no-store. This is the single most common caching misunderstanding among engineers.
  3. stale-while-revalidate hides latency at zero freshness cost. Specified in RFC 5861 and supported by Cloudflare, CloudFront, and Fastly, it lets a shared cache serve a stale response while it revalidates in the background, so users do not wait on the origin round-trip.
  4. Cache stampede is the canonical application-layer failure. When a hot key expires, concurrent requests all hit the database at once. Mitigate with distributed locking (Redis SETNX), request coalescing (singleflight), or probabilistic early expiration, ideally layered together.
  5. Next.js 16 makes cache lifetimes explicit. The use cache directive with cacheLife profiles replaces ISR-style data fetching. It requires cacheComponents: true. The stale value has an enforced 30-second minimum, and a cache with revalidate set to 0 or expire under 5 minutes becomes a request-time dynamic hole.

01 · The Stack

Caching is five layers, each with its own failure mode.

A request for a single resource can pass through — and be cached at — five independent layers before it ever reaches your business logic. Each layer caches for a different reason, expires on a different clock, and fails in a different way. Understanding the stack as a whole is what lets you answer the only question that matters when you’re deciding where to put a piece of data: which layer should own its freshness?

Layer 1 · Browser / HTTP cache
max-age · immutable · ETag
Lives in the user's browser. Governed by Cache-Control and validators. Its sharpest edge: with no Cache-Control header set, caches may heuristically treat a response as fresh for roughly 10% of the time since its Last-Modified date.
RFC 9111

Layer 2 · CDN / edge cache
s-maxage · tag purge · SWR
A shared cache near the user. Invalidated by tag/surrogate-key purges or path purges. Fails through Vary fragmentation and uneven purge support across pricing plans.
Cloudflare · Fastly · CloudFront

Layer 3 · Reverse proxy
purge API · soft-purge
An origin-adjacent shared cache. Often the same tech as the CDN. Soft-purge marks entries stale rather than deleting them, so stale-if-error fallbacks survive an outage.
Surrogate keys

Layer 4 · Application cache
Redis · Memcached
An in-memory key-value store. Invalidated by key delete, TTL, or eviction policy. Its signature failure is the cache stampede when a popular key expires under concurrent load.
allkeys-lru / allkeys-lfu

Layer 5 · Database query cache
hash-of-query keys
Caches full query results, keyed by a hash of the query and its parameters. Any write to a referenced table invalidates the cached result, which means heavy invalidation overhead in write-heavy workloads.
cache-aside / read-through

The one question
For any piece of data, ask: which layer should own its freshness? Hashed assets belong to the browser at a one-year immutable TTL. Per-tenant API responses belong to Redis with explicit invalidation. A live dashboard count belongs nowhere durable — serve it dynamically. Most stale-content bugs trace back to two layers both thinking they own the same value.

02 · The Canonical Trap

no-cache does not mean “do not cache.”

This is the single most common caching misunderstanding, and it is worth correcting before anything else. The directive no-cache does not prevent a response from being stored. It means the cache may store the response, but it must revalidate with the origin before reusing it — typically via a conditional request that returns 304 Not Modified when the content is unchanged. The directive that actually prevents storage entirely is no-store.

The practical consequence: if you want a resource to never be cached at all (a response with sensitive data, say), no-cache is the wrong choice. It permits storage and merely demands a revalidation round-trip. Reach for no-store instead. Conversely, if you want a resource that is always served fresh but still want the benefit of a 304 when nothing changed, no-cache is exactly right, and far more bandwidth-efficient than disabling caching outright.

Remember it this way
no-store = never write it down. no-cache = write it down, but check with me before reusing it. Per MDN’s HTTP caching guide, Cache-Control directives are case-insensitive (lowercase is recommended) and comma-separated, so a single header can combine several at once.

One more directive deserves a mention here because engineers conflate it with both: must-revalidate tells a cache it may serve a stored response while it is still fresh, but once that response goes stale it must revalidate with the origin and may not serve the stale copy on its own initiative. That is a meaningfully different contract from no-cache, which revalidates on every use, fresh or not.
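The three contracts above can be made concrete with a small sketch. This is an illustration only, not how any particular browser or CDN implements it: the names parseCacheControl and decide are hypothetical, the parser splits naively on commas (so quoted arguments containing commas are not handled), and stale-while-revalidate is deliberately ignored.

```typescript
// What a cache may do with a response it has (or would have) stored.
type CacheDecision = "do-not-store" | "serve-from-cache" | "revalidate-first";

// Parse a Cache-Control value into a map of lowercase directive -> argument (or null).
function parseCacheControl(header: string): Map<string, string | null> {
  const directives = new Map<string, string | null>();
  for (const part of header.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) {
      directives.set(trimmed.toLowerCase(), null);
    } else {
      const name = trimmed.slice(0, eq).trim().toLowerCase();
      const value = trimmed.slice(eq + 1).trim().replace(/^"|"$/g, "");
      directives.set(name, value);
    }
  }
  return directives;
}

// Decide how a stored response of the given age (in seconds) may be used.
function decide(header: string, ageSeconds: number): CacheDecision {
  const cc = parseCacheControl(header);
  if (cc.has("no-store")) return "do-not-store"; // never written down
  if (cc.has("no-cache")) return "revalidate-first"; // stored, but checked on every reuse
  const maxAge = Number(cc.get("max-age") ?? "0");
  const fresh = Number.isFinite(maxAge) && ageSeconds < maxAge;
  // Once stale, this sketch always revalidates, which is also what must-revalidate demands.
  return fresh ? "serve-from-cache" : "revalidate-first";
}
```

Under these assumptions, decide("no-cache, max-age=600", 10) asks for revalidation even though the response is only ten seconds old, while decide("max-age=600, must-revalidate", 10) serves from cache until the 600 seconds are up.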

03 · Decision Matrix

Every Cache-Control directive, and the mistake to avoid with each.

MDN documents these directives narratively. The matrix below is the compact decision form — for each significant directive, whether it applies to private caches, shared caches, or both; whether the response may be stored; what it is best for; and the specific mistake engineers most often make with it. RFC numbers are included so you can trace any cell back to the spec.

Cache-Control directive decision matrix: for each directive, whether it applies to private/shared/both caches, whether the response may be stored, what it is best for, the most common mistake to avoid, and its governing RFC.

Directive | Applies to | Stores? | Best for | Mistake to avoid | RFC
max-age=N | Both | Yes | Versioned or slow-changing assets | Measured from origin generation, not cache receipt | 9111
s-maxage=N | Shared | Yes | Overriding max-age at the CDN only | Ignored by private/browser caches | 9111
no-cache | Both | Yes | Always-fresh resources you still want cached | Does NOT prevent storage; it forces revalidation | 9111
no-store | Both | No | Sensitive or never-cacheable responses | Confusing it with no-cache | 9111
must-revalidate | Both | Yes | Resources that must never serve stale once expired | Pairing it with stale-while-revalidate by accident | 9111
stale-while-revalidate=N | Both | Yes | Hiding revalidation latency from users | Underusing it (the most overlooked directive) | 5861
stale-if-error=N | Both | Yes | Serving last-good content during origin 5xx | Forgetting it leaves no fallback on outage | 5861
immutable | Both | Yes | Hashed static assets that never change | Using it on URLs without content hashes | 8246
private | Private | Yes | Per-user responses (browser cache only) | Letting a shared CDN cache personalised data | 9111
public | Shared | Yes | Explicitly opting responses into shared caches | Marking authenticated responses public | 9111

Two cells warrant emphasis. First, max-age measures elapsed time since the response was generated (or last validated) on the origin server, not since an intermediate cache received it. Intermediate caches report that elapsed time in the Age header, and downstream caches subtract it from max-age: a response with max-age=600 that arrives with Age: 120 is already 2 minutes into its life and has 480 seconds of freshness left. Second, immutable (RFC 8246) is only safe on URLs that carry a content hash, which is exactly why Next.js applies public, max-age=31536000, immutable to everything under /_next/static/: the hash in the filename is the cache buster.
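The max-age arithmetic can be sketched in a few lines. This is a simplified model: the full age calculation in RFC 9111 also accounts for clock skew and response delay, which are left out here, and the function name remainingFreshness is hypothetical.

```typescript
// Remaining freshness lifetime, in seconds, as seen by a downstream cache.
//   maxAgeSeconds    - the max-age directive on the response
//   ageHeaderSeconds - the Age header value when the response arrived
//   residentSeconds  - how long this cache has held the response since arrival
function remainingFreshness(
  maxAgeSeconds: number,
  ageHeaderSeconds: number,
  residentSeconds = 0,
): number {
  const currentAge = ageHeaderSeconds + residentSeconds;
  return Math.max(0, maxAgeSeconds - currentAge);
}
```

With max-age=600 and Age: 120, remainingFreshness(600, 120) is 480; after the response has sat in this cache for a further 480 seconds, the result reaches 0 and the entry is stale.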

04 · The Underused Directive

stale-while-revalidate hides latency at zero freshness cost.

stale-while-revalidate (RFC 5861) is the most underused directive in the HTTP caching toolkit. It lets a cache serve a stale response for a defined window while it revalidates the resource in the background — so the user gets an instant response and never waits on the origin round-trip. The revalidated content is ready for the next request. A combined example: Cache-Control: max-age=2592000, stale-while-revalidate=86400 caches for 30 days with a one-day grace window during which stale content is served while a fresh copy is fetched.

“Caches may serve the response...after it becomes stale, up to the indicated number of seconds.”
MDN Web Docs, stale-while-revalidate, Cache-Control reference

Its sibling, stale-if-error (also RFC 5861), is your origin-outage safety net. It permits a cache to serve a stale response for a defined window when the origin returns a 5xx status — 500, 502, 503, or 504 — or is unreachable entirely. Stacked together, the three directives form a resilient policy: Cache-Control: max-age=3600, stale-while-revalidate=600, stale-if-error=86400 gives one hour of freshness, a ten-minute window to hide revalidation latency, and a full day of last-good fallback if the origin falls over.
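As a rough sketch of how the three windows stack, the function below classifies what a cache would do with a stored response at a given age. It is a simplified reading of the directives rather than any vendor's implementation; StalePolicy, ServePhase, toHeader, and servePhase are hypothetical names, and the windows are treated as measured from the moment the response goes stale.

```typescript
type ServePhase =
  | "fresh" // serve from cache, no origin contact
  | "stale-while-revalidate" // serve stale now, refresh in the background
  | "stale-if-error" // origin is failing, serve the last good copy
  | "blocking-revalidate"; // the request has to wait on the origin

interface StalePolicy {
  maxAge: number; // seconds
  staleWhileRevalidate: number; // seconds
  staleIfError: number; // seconds
}

// Render the policy as a Cache-Control header value.
function toHeader(p: StalePolicy): string {
  return `max-age=${p.maxAge}, stale-while-revalidate=${p.staleWhileRevalidate}, stale-if-error=${p.staleIfError}`;
}

// Classify a stored response by its age and by whether the origin is currently failing.
function servePhase(p: StalePolicy, ageSeconds: number, originFailing: boolean): ServePhase {
  if (ageSeconds < p.maxAge) return "fresh";
  const staleFor = ageSeconds - p.maxAge;
  if (originFailing) {
    return staleFor <= p.staleIfError ? "stale-if-error" : "blocking-revalidate";
  }
  return staleFor <= p.staleWhileRevalidate ? "stale-while-revalidate" : "blocking-revalidate";
}
```

For the policy in the text ({ maxAge: 3600, staleWhileRevalidate: 600, staleIfError: 86400 }), a response aged 3,900 seconds is served stale while it revalidates, and a response aged 5,000 seconds is only served if the origin is failing.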

A persistent myth is that stale-while-revalidate is a browser-only feature. It is not — it is specified for shared caches and is supported at the edge by Cloudflare, Amazon CloudFront, and Fastly. That makes it most valuable precisely where it is least used: at the CDN, where a single background revalidation can shield the origin from a traffic spike while every user still gets a sub-second response.

Where it earns its keep
The combination of a long max-age and a generous stale-while-revalidate window is the closest HTTP gets to ISR-style behaviour without any application code. The cache absorbs the traffic, the origin revalidates lazily, and users pay no latency for that freshness because they never block on the round-trip. The trade-off is bounded staleness: a response can be served for up to the stale-while-revalidate window past its max-age.

05 · Validators

ETags, conditional requests, and the 304 that saves bandwidth.

When a cached response goes stale, revalidation does not have to re-download the whole body. Validators let a cache ask the origin “has this changed?” and receive a tiny 304 Not Modified with no body when it hasn’t. There are two: ETag and Last-Modified.

Strong ETags guarantee byte-for-byte identity; weak ETags — prefixed W/ — guarantee only semantic equivalence. RFC 9110 recommends sending both an ETag and a Last-Modified header in responses, and during revalidation If-None-Match takes precedence over If-Modified-Since. A matching ETag returns 304 Not Modified with no response body, saving the bandwidth of re-sending an unchanged resource.
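The precedence rule can be sketched as an origin-side check. This is a minimal illustration under stated assumptions: the type and function names are hypothetical, ETags are compared weakly (the W/ prefix is ignored, as is usual for GET revalidation), the If-None-Match list is split naively on commas, and the wildcard form is not handled.

```typescript
// Validators the origin currently holds for the resource.
interface StoredValidators {
  etag?: string;
  lastModified?: Date;
}

// Conditional headers sent by the cache.
interface ConditionalRequest {
  ifNoneMatch?: string;
  ifModifiedSince?: Date;
}

// Weak comparison: two tags match if their opaque parts match, ignoring any W/ prefix.
function weakEquals(a: string, b: string): boolean {
  const opaque = (tag: string): string => tag.trim().replace(/^W\//, "");
  return opaque(a) === opaque(b);
}

// Return 304 when the cached copy is still valid, 200 when a full response is needed.
function revalidationStatus(req: ConditionalRequest, current: StoredValidators): 200 | 304 {
  if (req.ifNoneMatch !== undefined) {
    // If-None-Match wins: If-Modified-Since is not consulted when it is present.
    const etag = current.etag;
    if (etag === undefined) return 200;
    const matched = req.ifNoneMatch.split(",").some((candidate) => weakEquals(candidate, etag));
    return matched ? 304 : 200;
  }
  if (req.ifModifiedSince !== undefined && current.lastModified !== undefined) {
    return current.lastModified.getTime() <= req.ifModifiedSince.getTime() ? 304 : 200;
  }
  return 200;
}
```

Note the second branch is only reached when no If-None-Match header was sent, so a mismatched ETag yields a full 200 even if the date check alone would have passed.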

Heuristic risk · When no header is set · ~10%
If a response carries Last-Modified but no Cache-Control, caches may heuristically treat it as fresh for roughly 10% of the time since last modification, per the RFC 9111 recommendation. Always set an explicit header.
RFC 9111

QPACK index 41 · public, max-age=31536000 · 1yr
The RFC 9204 static table ships pre-defined Cache-Control values: index 37 is max-age=2592000 (30 days), 38 is max-age=604800 (one week), and 41 is public, max-age=31536000. HTTP/3 implementations can send any of them as a single static-table reference.
RFC 9204

Vary discipline · Per request-header variation · 1 key
Vary stores a separate response per unique value of a header. Vary: Accept-Language is fine; Vary: User-Agent should be avoided because the variation count explodes and shreds cache hit-rates.
Shared caches
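To see why the heuristic in the first card is worth avoiding, here is the arithmetic as a small sketch. The 10% figure is the typical value suggested by RFC 9111, not a guarantee of what any given cache does, and the function name is hypothetical.

```typescript
// Heuristic freshness lifetime in seconds: one tenth of the interval between
// Last-Modified and the time the response was generated.
function heuristicFreshnessSeconds(lastModified: Date, responseDate: Date): number {
  const elapsedSeconds = (responseDate.getTime() - lastModified.getTime()) / 1000;
  if (elapsedSeconds <= 0) return 0;
  return Math.floor(elapsedSeconds / 10);
}
```

A file last modified ten days before it was served could therefore be reused for a full day (86,400 seconds) without the origin ever having asked for that.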

06 · Edge Purging

CDN invalidation: tags, surrogate keys, and propagation time.

At the edge, the question shifts from “when does this expire?” to “how do I purge it the instant it changes?” The three major CDNs answer it differently, and the differences matter for both speed and cost. A well-architected web development stack layers caching across the browser, CDN, and application tiers — and wires purge into the same workflow that publishes the content.

CDN invalidation mechanisms · propagation & capability

Fastly surrogate-key purge (global propagation of a tag-based purge): ~150ms
CloudFront invalidation (path-based invalidation propagation): 10–60s
Cloudflare Cache Response Rules (edit Cache-Control & cache tags at the edge): shipped Mar 2026

Sources: Fastly purge docs; CDN invalidation guide; Cloudflare changelog (2026-03-24)

Fastly leads on tag-based purging. Its Surrogate-Key mechanism lets you tag any response and purge every object carrying that tag in roughly 150ms globally. Individual keys are limited to 1,024 bytes and the full Surrogate-Key header may not exceed 16,384 bytes; purges run via dashboard, API, CLI, and a Rust edge SDK, and soft purges mark content stale rather than deleting it outright.
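The size limits above are easy to enforce at the point where the origin tags a response. The sketch below builds a space-separated Surrogate-Key header value and rejects keys that would break the limits. The function name and the example keys are hypothetical, and restricting keys to printable ASCII is an assumption made here so that string length equals byte length.

```typescript
const MAX_KEY_BYTES = 1024; // per-key limit quoted in the text
const MAX_HEADER_BYTES = 16384; // whole-header limit quoted in the text

// Build a Surrogate-Key header value from a list of tags.
function buildSurrogateKeyHeader(keys: string[]): string {
  const unique = Array.from(new Set(keys)); // drop duplicates, keep first-seen order
  for (const key of unique) {
    // Printable ASCII without spaces, because a space would split the key in two.
    if (!/^[\x21-\x7e]+$/.test(key)) {
      throw new Error(`invalid surrogate key: ${JSON.stringify(key)}`);
    }
    if (key.length > MAX_KEY_BYTES) {
      throw new Error(`surrogate key exceeds ${MAX_KEY_BYTES} bytes`);
    }
  }
  const header = unique.join(" ");
  if (header.length > MAX_HEADER_BYTES) {
    throw new Error(`Surrogate-Key header exceeds ${MAX_HEADER_BYTES} bytes`);
  }
  return header;
}
```

For example, buildSurrogateKeyHeader(["post-42", "author-7", "post-42"]) yields "post-42 author-7", which lets a later purge of either tag reach this response.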

Cloudflare exposes a dedicated CDN-Cache-Control header that controls CDN behaviour without affecting upstream or downstream caches, accepting the same directives as Cache-Control. Its Origin Cache Control is enabled by default on Free, Pro, and Business plans, and the Cache Response Rules that shipped on March 24, 2026 let operators rewrite Cache-Control directives, manage cache tags, and strip headers like Set-Cookie from origin responses before they reach the cache — all without touching origin config.

Amazon CloudFront handles invalidation by path, with reported propagation in the 10-to-60-second range. It also supports stale-while-revalidate and stale-if-error directives at the edge.

Verify before you depend on it
Tag-based purge is not universally available on every plan tier. Cloudflare’s cache-tag and prefix purge have historically required higher-tier plans, and the rollout that expanded them is recent — confirm the purge types your current plan supports before designing an invalidation strategy around them. The same caution applies to CloudFront propagation windows, which vary with edge-location count.

07 · Application Layer

Redis eviction policies and the thundering herd.

The application cache is where most teams spend their tuning time, and two decisions dominate: which eviction policy runs when memory fills, and how you prevent a cache stampede when a hot key expires.

Eviction policies

Redis offers a full set of eviction policies — noeviction, the allkeys-* family (lru, lfu, random), and the volatile-* family that only evicts keys carrying a TTL. For a pure cache, allkeys-lru or allkeys-lfu are recommended; for a mixed workload where Redis also holds non-cache data, use volatile-lru with TTLs set on the cached keys only. LFU support arrived in Redis 4.0. Eviction only triggers once the instance reaches maxmemory — below that ceiling, keys live until their TTL.

Two tuning knobs are worth knowing. LRU is approximate: Redis samples a handful of keys rather than scanning them all, and the default maxmemory-samples 5 can be raised to 10 for a closer approximation of true LRU at a marginal CPU cost. For LFU, the defaults lfu-log-factor 10 and lfu-decay-time 1 (minutes) control how quickly the frequency counter saturates and how fast access counts decay.
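The idea behind sampled LRU can be illustrated with a toy in-memory cache. This is a conceptual sketch only, not Redis's implementation: SampledLruCache is a hypothetical class, it counts entries instead of bytes, it uses a logical clock, and it samples with replacement.

```typescript
class SampledLruCache<V> {
  private readonly entries = new Map<string, { value: V; lastAccess: number }>();
  private clock = 0; // logical clock, bumped on every read or write
  private readonly maxEntries: number;
  private readonly samples: number;
  private readonly random: () => number;

  constructor(maxEntries: number, samples = 5, random: () => number = Math.random) {
    this.maxEntries = maxEntries;
    this.samples = samples;
    this.random = random;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    entry.lastAccess = ++this.clock;
    return entry.value;
  }

  set(key: string, value: V): void {
    if (!this.entries.has(key) && this.entries.size >= this.maxEntries) this.evictOne();
    this.entries.set(key, { value, lastAccess: ++this.clock });
  }

  has(key: string): boolean {
    return this.entries.has(key); // does not count as an access
  }

  // Approximate LRU: inspect `samples` random keys and evict the least recently used of them.
  private evictOne(): void {
    const keys = Array.from(this.entries.keys());
    let victim: string | undefined;
    let oldest = Infinity;
    for (let i = 0; i < this.samples; i++) {
      const candidate = keys[Math.floor(this.random() * keys.length)];
      if (candidate === undefined) continue;
      const entry = this.entries.get(candidate);
      if (entry !== undefined && entry.lastAccess < oldest) {
        oldest = entry.lastAccess;
        victim = candidate;
      }
    }
    if (victim !== undefined) this.entries.delete(victim);
  }
}
```

With a small sample the evicted key is only probably among the oldest; raising the sample count moves the behaviour closer to true LRU at the cost of more work per eviction, which is the trade-off the maxmemory-samples setting exposes.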

Cache stampede (thundering herd)

When a popular cache key expires, many concurrent requests can simultaneously query the database to regenerate it — potentially causing a cascading failure. There are three primary mitigations, and they compose well:

Mitigation 1 · Distributed locking
A Redis lock via SETNX with an expiry ensures only one process across all instances regenerates the value; the rest wait briefly or serve stale. In practice the lock and its expiry should be set in one atomic command (SET key value NX PX <ms>), so a crashed holder cannot leave the lock behind without a TTL. Simple and effective, but the lock holder becomes a single point of latency.
Redis SETNX + expiry

Mitigation 2 · Request coalescing
Singleflight collapses N concurrent identical requests into one origin call and fans the single result back out. Go's golang.org/x/sync/singleflight is the canonical implementation; equivalents exist in most ecosystems.
singleflight

Mitigation 3 · Probabilistic early expiration
Refresh a key probabilistically before its TTL actually hits, so regeneration is spread across time rather than synchronised on a single expiry instant. No coordination required, and it removes the expiry cliff entirely.
Refresh ahead of TTL

Layer them · stale-while-revalidate at the app tier
Serving the stale value while one process regenerates (the application-tier analogue of the HTTP directive) combines naturally with locking or singleflight to keep latency flat during regeneration.
Serve stale + regenerate
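Probabilistic early expiration, the third mitigation above, comes down to one inequality. The sketch below uses a common formulation (often called XFetch) in which the chance of an early refresh rises as expiry approaches and scales with how long the value takes to recompute. The function name and parameters are hypothetical, and beta = 1 is a conventional default rather than a tuned value.

```typescript
// Decide whether this request should refresh a cached value before its TTL hits.
//   nowMs       - current time
//   expiresAtMs - when the cached value expires
//   deltaMs     - how long the last recomputation took
//   beta        - values above 1 refresh earlier, values below 1 refresh later
//   random      - injectable source of randomness in [0, 1), for testing
function shouldRefreshEarly(
  nowMs: number,
  expiresAtMs: number,
  deltaMs: number,
  beta = 1,
  random: () => number = Math.random,
): boolean {
  const r = Math.max(random(), Number.MIN_VALUE); // guard against log(0)
  const earlyByMs = -deltaMs * beta * Math.log(r); // log(r) <= 0, so this is >= 0
  return nowMs + earlyByMs >= expiresAtMs;
}
```

Each request draws its own random number, so under load a few requests volunteer to refresh shortly before expiry while the rest keep reading the cached value, which spreads regeneration out instead of synchronising it on the expiry instant.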
“A Redis-based lock ensures only one request across all instances fetches from the database.”
SWE Helper, cache stampede prevention
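Request coalescing can be sketched in-process with a map of in-flight promises. This is a minimal illustration of the singleflight idea, not a port of the Go package: the SingleFlight class and its run method are hypothetical names, and like the Go original it only deduplicates within one process, so it complements a distributed lock rather than replacing it.

```typescript
class SingleFlight<T> {
  private readonly inFlight = new Map<string, Promise<T>>();

  // Run `load` for `key`, or join the call that is already in progress for that key.
  run(key: string, load: () => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing !== undefined) return existing;

    // Forget the key once the call settles, whether it succeeded or failed.
    const promise = load().then(
      (value) => {
        this.inFlight.delete(key);
        return value;
      },
      (error) => {
        this.inFlight.delete(key);
        throw error;
      },
    );
    this.inFlight.set(key, promise);
    return promise;
  }
}
```

If a hundred requests call run("user:1", loadUser) while the first load is still pending, all of them receive the same promise and the database sees one query. Failures are shared too, so callers that need independent retries should retry after the shared call settles.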

One layer deeper sits the database query cache, where Redis can cut response times substantially for read-heavy workloads by keying results on a hash of the full query and its parameters. The catch is invalidation: any write to any table referenced in a cached query invalidates the entire cached result, which means query caching pays off most in read-heavy systems and least in write-heavy ones. This is the natural complement to reducing database query load through indexing — caching removes the query, indexing makes the unavoidable ones fast.
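The keying and invalidation rule described above can be sketched with an in-memory stand-in for Redis. Everything here is illustrative: QueryCache and fnv1a are hypothetical names, the caller is assumed to know which tables a query references, and the 32-bit FNV-1a hash (computed over UTF-16 code units) is used only to keep the example dependency-free. A production key would normally use a cryptographic hash to make collisions negligible.

```typescript
// 32-bit FNV-1a over UTF-16 code units, rendered as 8 hex digits.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

class QueryCache<R> {
  private readonly results = new Map<string, R>();
  private readonly keysByTable = new Map<string, Set<string>>();

  // Cache key: a hash of the full query text plus its parameters.
  static keyFor(sql: string, params: readonly unknown[]): string {
    return "q:" + fnv1a(sql + "\u0000" + JSON.stringify(params));
  }

  get(sql: string, params: readonly unknown[]): R | undefined {
    return this.results.get(QueryCache.keyFor(sql, params));
  }

  // Store a result and remember every table it was read from.
  set(sql: string, params: readonly unknown[], tables: string[], result: R): void {
    const key = QueryCache.keyFor(sql, params);
    this.results.set(key, result);
    for (const table of tables) {
      let keys = this.keysByTable.get(table);
      if (keys === undefined) {
        keys = new Set<string>();
        this.keysByTable.set(table, keys);
      }
      keys.add(key);
    }
  }

  // Call on any write to `table`; drops every cached result that referenced it.
  invalidateTable(table: string): number {
    const keys = this.keysByTable.get(table);
    if (keys === undefined) return 0;
    let removed = 0;
    keys.forEach((key) => {
      if (this.results.delete(key)) removed++;
    });
    this.keysByTable.delete(table);
    return removed;
  }
}
```

The invalidateTable method is where the write-heavy cost shows up: every insert or update on a busy table empties all results that touched it, so the hit rate falls as write volume rises.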

Latency claims, qualified
Vendor benchmarks describe Redis query caching cutting read latency from hundreds of milliseconds toward the single-digit range. Treat the exact figure as directional, not universal: real-world latency depends heavily on payload size, network topology, serialization cost, and query complexity. The pattern is sound; the specific number is workload-dependent — measure it on your own data before quoting it.

08 · Framework Caching

Next.js 16 use cache and explicit cacheLife profiles.

Next.js 16 (caching docs at version 16.2.9) replaces ISR-style data fetching with an explicit, composable model. The use cache directive caches the return value of an async function or component, and it is enabled by setting cacheComponents: true in next.config.ts. You can apply it at the data level — an individual fetching function — or at the UI level, on a full component or page. Arguments and closed-over values automatically become part of the cache key, so two calls with different inputs cache separately without any manual key management. This is the framework context for migrating to Next.js 16 Cache Components.

“It is recommended to specify an explicit cacheLife. With explicit lifetime values, you can inspect a cached function or component and immediately know its behavior without tracing through nested caches.”
Next.js documentation, cacheLife API reference

The companion cacheLife function controls how long a cached value lives, via three properties: stale (how long the client router cache may serve without checking the server), revalidate (how often the server refreshes in the background), and expire (the hard maximum age before the cache is treated as dynamic). Next.js ships seven built-in profiles; the table below maps each to its use case and its stale/revalidate/expire triple.

Next.js 16 cacheLife built-in profiles, version 16.2.9: the use case, stale time, revalidate interval, and expire window for each of the seven profiles from seconds to max.

Profile | Use case | stale | revalidate | expire
default | Standard content | 5 minutes | 15 minutes | never
seconds | Real-time data | 30 seconds | 1 second | 1 minute
minutes | Frequently updated | 5 minutes | 1 minute | 1 hour
hours | Multiple daily updates | 5 minutes | 1 hour | 1 day
days | Daily updates | 5 minutes | 1 day | 1 week
weeks | Weekly updates | 5 minutes | 1 week | 30 days
max | Rarely changes | 5 minutes | 30 days | 1 year

Three sharp edges are worth internalising. First, the stale minimum is 30 seconds, enforced by Next.js — and stale controls the client-side router cache via the x-nextjs-stale-time response header, not Cache-Control directly. Second, calling revalidateTag(), revalidatePath(), updateTag(), or refresh() from a Server Action immediately clears the entire client cache, bypassing the stale time. Third — and most likely to trip you up — a cache with revalidate=0 or expire under 5 minutes is automatically excluded from prerenders and becomes a request-time “dynamic hole.” That includes the built-in seconds profile, and a short-lived cache nested inside a longer use cache without an explicit cacheLife will throw a prerender error.
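The dynamic-hole rule is mechanical enough to check in a few lines. The sketch below restates the rule from this section against the profile values in the table above. It is not Next.js source code: the type and function names are hypothetical, durations are in seconds, and an expire of "never" is modelled as Infinity.

```typescript
interface NamedCacheLife {
  name: string;
  stale: number; // seconds
  revalidate: number; // seconds
  expire: number; // seconds
}

const MINUTE = 60;
const HOUR = 60 * MINUTE;
const DAY = 24 * HOUR;

// Built-in profile values as listed in the table above.
const builtInProfiles: NamedCacheLife[] = [
  { name: "default", stale: 5 * MINUTE, revalidate: 15 * MINUTE, expire: Infinity },
  { name: "seconds", stale: 30, revalidate: 1, expire: MINUTE },
  { name: "minutes", stale: 5 * MINUTE, revalidate: MINUTE, expire: HOUR },
  { name: "hours", stale: 5 * MINUTE, revalidate: HOUR, expire: DAY },
  { name: "days", stale: 5 * MINUTE, revalidate: DAY, expire: 7 * DAY },
  { name: "weeks", stale: 5 * MINUTE, revalidate: 7 * DAY, expire: 30 * DAY },
  { name: "max", stale: 5 * MINUTE, revalidate: 30 * DAY, expire: 365 * DAY },
];

// A cache with revalidate of 0 or expire under 5 minutes is excluded from prerenders.
function isDynamicHole(profile: { revalidate: number; expire: number }): boolean {
  return profile.revalidate === 0 || profile.expire < 5 * MINUTE;
}

// The client-side stale time is floored at 30 seconds.
function effectiveStale(profile: { stale: number }): number {
  return Math.max(profile.stale, 30);
}
```

Filtering builtInProfiles with isDynamicHole leaves only the seconds profile, which matches the behaviour described above. A custom profile such as { stale: 10, revalidate: 60, expire: 120 } would also be a dynamic hole, and its stale time would be treated as 30 seconds.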

This connects to Partial Prerendering, the default rendering mode when Cache Components is enabled. Static content and use cache content become the static HTML shell, while <Suspense>-wrapped dynamic content streams in at request time. Looking forward, the Next.js team is moving toward pathname-based CDN cache keying — where a full-page RSC response is served from /my/page.rsc and segment RSC from a .segment.rsc path — so CDNs need no Vary support and no custom header parsing. It is an announced design direction worth designing toward, not yet a finished default.

09 · The Reference

The five-layer cache reference, in one table.

Here is the table to keep next to your architecture diagram. For each of the five layers it names the invalidation mechanism, the characteristic failure mode, the recommended directives or patterns, the Next.js integration point, and a typical TTL range. No single existing reference maps all five layers with their failure modes together — this is the one to bookmark.

The five-layer cache reference: for browser/HTTP, CDN/edge, reverse proxy, application (Redis/Memcached), and database query cache, the invalidation mechanism, characteristic failure mode, recommended directives or patterns, Next.js integration point, and typical TTL range.

Layer | Invalidation | Failure mode | Directives / patterns | Next.js point | Typical TTL

Client & edge tiers
Browser / HTTP cache | Content-hash filenames; ETag 304 revalidation | Heuristic caching (~10% of last-modified age) when no header is set | max-age, immutable, no-cache | /_next/static at max-age=31536000, immutable | Seconds to 1 year
CDN / edge cache | Tag/surrogate-key purge; path or prefix purge | Vary fragmentation; cross-plan purge gaps | s-maxage, CDN-Cache-Control, stale-while-revalidate | s-maxage on static + ISR responses | Minutes to 1 year
Reverse proxy | Explicit purge API; soft-purge to stale | Stale config drift between origin and proxy | s-maxage, surrogate keys, stale-if-error | Sits in front of the Node/runtime server | Seconds to hours

Origin & data tiers
Application cache (Redis / Memcached) | Key delete; TTL expiry; eviction policy | Cache stampede (thundering herd) on key expiry | allkeys-lru / allkeys-lfu; SETNX locking | Backing store for use cache / cacheLife | Seconds to days
Database query cache | Invalidate on any write to a referenced table | Heavy invalidation overhead in write-heavy loads | Hash-of-query keys; cache-aside / read-through | Wrapped behind a use cache data function | Seconds to minutes

Read the table as a routing guide. Push immutable, hashed assets to the browser and forget about them for a year. Put cacheable, shared-but-purgeable content at the CDN and wire surrogate-key purges into your publish workflow. Reserve the application cache for the expensive, per-tenant computations that benefit most from being memoised — and protect those keys against stampede. Use the database query cache sparingly, and only where reads dominate writes, because its invalidation overhead grows with write volume. Each layer is a tool; the architecture is choosing which one owns each value. The same care that goes into rate-limiting strategies for your API layer and into idempotency in distributed systems applies here: the perimeter behaviour is only as reliable as the invalidation discipline behind it.

10 · Conclusion

Caching is a layered discipline, not a single switch.

The shape of web caching, June 2026

The hard part of caching was never storing — it's invalidating.

Every layer in the stack makes storing a value trivial. What separates a fast, correct system from a flaky one is invalidation: knowing which layer owns each value’s freshness, how it expires, and what happens when it does. Get that mapping right and the five layers reinforce each other; get it wrong and a single stale entry, or a single expired hot key, takes the whole request path down with it.

The throughline of 2026 is that the best primitives are converging on the same idea — serve something instantly, refresh lazily. RFC 5861’s stale-while-revalidate does it at the HTTP layer, singleflight and probabilistic expiration do it in Redis, and Next.js 16’s use cache with explicit cacheLife profiles does it in the framework. Designing for that pattern, rather than against it, is what keeps a system fast under load without sacrificing correctness.

Start from the canonical correction — no-cache means revalidate, not refuse — set explicit headers everywhere so heuristic caching never surprises you, and pick the layer that should own each value’s freshness deliberately. Keep the five-layer reference table close, verify CDN and framework behaviour against current docs before you depend on it, and measure your own latency rather than trusting a vendor benchmark. Caching rewards discipline far more than cleverness.

FAQ · Web caching guide

The questions engineers ask about caching.

What is the difference between no-cache and no-store?

They are the most commonly confused caching directives. no-cache does NOT prevent storage: it permits a cache to store the response but requires revalidation with the origin before each reuse, typically via a conditional request that returns 304 Not Modified when the content is unchanged. no-store prevents storage entirely; the response is never written to any cache. The rule of thumb: use no-store for sensitive or never-cacheable responses, and no-cache for resources you want served fresh on every request but still cached so a 304 can save bandwidth when nothing has changed. Reaching for no-cache when you mean no-store is the single most common caching mistake.