Cache Strategies in Distributed Systems


February 25, 2026
5 min read
#system design course · #caching strategies · #system design interview · #cache · #ttl-expiration · #netflix

Stop the Thundering Herd! Explore Jitter, Probabilistic Re-computation, SWR, and Cache Warming to keep your high-traffic systems stable during Netflix-scale spikes.


Why Does Basic TTL Caching Fail?

Basic TTL (Time to Live) is like a ticking time bomb. If you set 1 million keys to expire in exactly 3600 seconds, you are essentially scheduling a system crash for 1 hour from now. Imagine a "Flash Sale" on an e-commerce site:

  1. You cache the "Product Price" for 60 seconds.

  2. 50,000 people are looking at that price.

  3. At the 60th second, the cache disappears.

  4. All 50,000 users see a "Cache Miss" and hit the Database at the exact same millisecond.


The Roadmap to a Bulletproof Cache:

Step 1: Advanced Expiry (Jitter & PER)

TTL Jitter: The Power of Randomness

Instead of TTL = 60s, we use TTL = 60s + random(0, 10s). Now some users' caches expire at 60s, some at 63s, some at 70s. This turns a "Thundering Herd" into a predictable stream. It's like staggered release times for a crowded theatre.
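The idea above fits in a couple of lines. A minimal sketch in Python (the function name `ttl_with_jitter` is illustrative, not from any library):

```python
import random


def ttl_with_jitter(base_ttl: int, max_jitter: int = 10) -> int:
    """Return a TTL with a random offset so a batch of keys
    written at the same moment does not expire at the same moment."""
    return base_ttl + random.randint(0, max_jitter)


# Set 1 million keys with this, and their expiries spread across
# a 60-70s window instead of all landing on second 60.
ttl = ttl_with_jitter(60)
```

You would pass this value wherever you currently pass a fixed TTL, e.g. `redis.set(key, value, ex=ttl_with_jitter(60))`.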

Probabilistic Early Re-computation (PER)

This sounds fancy, but the logic is simple: the closer we get to the expiration time, the higher the chance that a request will decide to refresh the cache early.

  • At 10 seconds left, there is a 1% chance a request will refresh it.

  • At 1 second left, there's an 80% chance.

This ensures that one lucky user refreshes the data before the TTL ever actually hits zero and the cached entry disappears.

Step 2: The "Wait or Serve Old" Strategies

1. Mutex / Cache Locking (The "One Producer" Rule)

Imagine the cache for "Current Score" expires and 100 requests arrive.

  • Without a lock: All 100 go to the DB.

  • With a Mutex: The first request "locks" the key. It says, "I'm going to the DB to get the new score. Everyone else, stay here and wait."

  • The other 99 requests "wait" a few milliseconds until the first one returns and updates the cache.

2. Stale-While-Revalidate (SWR)

This is the "Netflix/CDN favourite." Instead of making users wait for the new data, you give them the old (stale) data one last time while you fetch the new data in the background.

  • Request arrives: "Is the data expired?"

  • System: "Yes, it expired 2 seconds ago. But here is the old version so you don't have to wait. I'll go grab the new one now for the next person."

  • Result: Zero latency for the user!
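The exchange above can be sketched as follows, assuming an in-process cache keyed by `(value, expires_at)` and a background thread for the refresh (all names are illustrative):

```python
import threading
import time

cache: dict[str, tuple[str, float]] = {}  # key -> (value, expires_at)
_refreshing: set[str] = set()             # keys with a refresh in flight


def fetch_fresh(key: str) -> str:
    """Simulated slow origin fetch."""
    time.sleep(0.05)
    return f"fresh:{key}"


def get_swr(key: str, ttl: float = 60.0) -> str:
    now = time.time()
    entry = cache.get(key)
    if entry is None:                      # true cold miss: must wait once
        value = fetch_fresh(key)
        cache[key] = (value, now + ttl)
        return value
    value, expires_at = entry
    if now >= expires_at and key not in _refreshing:
        _refreshing.add(key)

        def refresh() -> None:             # revalidate in the background
            cache[key] = (fetch_fresh(key), time.time() + ttl)
            _refreshing.discard(key)

        threading.Thread(target=refresh, daemon=True).start()
    return value                           # stale or fresh, never wait
```

The current request gets the stale value instantly; the next request finds the refreshed one.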

Step 3: Proactive Measures ("The Chef's Preparation")

Cache Warming / Pre-Warming

Think of an IPL final or a Netflix release. You know a million people are going to ask for the "Match Summary" or "Stranger Things Episode 1" at exactly 8:00 PM. Don't wait for them to ask! Your system should "warm up" the cache by fetching that data and putting it in Redis at 7:55 PM. By the time the crowd arrives, the "oven" is already hot.
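Warming is usually just a scheduled job that runs the same loaders your cache-miss path would run, before the crowd arrives. A minimal sketch (the `warm_cache` helper and loaders are illustrative; in practice this might be a cron job or a pre-deploy hook writing to Redis):

```python
import time
from typing import Callable

cache: dict[str, tuple[str, float]] = {}  # key -> (value, expires_at)


def fetch_match_summary() -> str:
    return "IPL final highlights"


def warm_cache(loaders: list[tuple[str, Callable[[], str]]],
               ttl: float = 600.0) -> None:
    """Pre-load hot keys so the first real request is already a hit."""
    for key, loader in loaders:
        cache[key] = (loader(), time.time() + ttl)


# Run at 7:55 PM, before the 8:00 PM spike:
warm_cache([("match_summary", fetch_match_summary)])
```

When the spike hits at 8:00 PM, every request for `match_summary` is a cache hit; the database never sees the crowd.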


The Big Tradeoffs: Freshness vs. Latency

In engineering, there is no free lunch.

  • SWR: Amazing speed (low latency), but some users see "stale" (old) data for a few seconds.

  • Mutex: Perfect data (high consistency), but some users have to "wait" (higher latency) while the lock is held.


When to Use Which Strategy?

<table style="min-width: 100px;"><colgroup><col style="min-width: 25px;"><col style="min-width: 25px;"><col style="min-width: 25px;"><col style="min-width: 25px;"></colgroup><tbody><tr><th colspan="1" rowspan="1"><p>Strategy</p></th><th colspan="1" rowspan="1"><p>Best For</p></th><th colspan="1" rowspan="1"><p>Real-World Example</p></th><th colspan="1" rowspan="1"><p>Tradeoff</p></th></tr><tr><td colspan="1" rowspan="1"><p><strong>TTL Jitter</strong></p></td><td colspan="1" rowspan="1"><p>General-purpose stability.</p></td><td colspan="1" rowspan="1"><p>Any standard web app.</p></td><td colspan="1" rowspan="1"><p>None! Should be your default.</p></td></tr><tr><td colspan="1" rowspan="1"><p><strong>Mutex / Locking</strong></p></td><td colspan="1" rowspan="1"><p>High consistency (data MUST be right).</p></td><td colspan="1" rowspan="1"><p>Stock prices, election results, bank balances.</p></td><td colspan="1" rowspan="1"><p>Increased latency for some users.</p></td></tr><tr><td colspan="1" rowspan="1"><p><strong>SWR (Stale-While-Revalidate)</strong></p></td><td colspan="1" rowspan="1"><p>High availability (speed is everything).</p></td><td colspan="1" rowspan="1"><p>Social media feeds, Netflix movie lists.</p></td><td colspan="1" rowspan="1"><p>Users see slightly old data.</p></td></tr><tr><td colspan="1" rowspan="1"><p><strong>Probabilistic (PER)</strong></p></td><td colspan="1" rowspan="1"><p>Extremely hot keys that never want to "miss."</p></td><td colspan="1" rowspan="1"><p>The "Home Page" of a giant site like Amazon.</p></td><td colspan="1" rowspan="1"><p>Slightly more CPU used to calculate probability.</p></td></tr><tr><td colspan="1" rowspan="1"><p><strong>Cache Warming</strong></p></td><td colspan="1" rowspan="1"><p>Known events with massive predictable starts.</p></td><td colspan="1" rowspan="1"><p><strong>IPL match start</strong>, Black Friday, Netflix series drop.</p></td><td colspan="1" rowspan="1"><p>Requires manual setup or 
scripts.</p></td></tr></tbody></table>

Choosing Your Weapon

  • Use Jitter ALWAYS: It's the "low-hanging fruit." There is almost no reason not to add a little randomness to your expiries to prevent synchronized spikes.

  • Use SWR for UX: If your app feels "snappy," users stay longer. If a user sees a post from 10 seconds ago instead of 1 second ago, they usually won't notice. This is why CDNs (like Cloudflare) love SWR.

  • Use Cache Warming for "The Big Event": If you know 10 million people are coming at 8:00 PM, waiting for the first user to "trigger" the cache is a mistake. You want that data waiting for them.