Why Cache?
- Speed: Reading from memory is ~100,000x faster than reading from disk. A cache hit avoids the slow path entirely.
- Reduced Load: Fewer queries hit the database, freeing it to handle writes and complex queries.
- Cost Efficiency: Serving responses from cache requires fewer servers and less database capacity.
- Resilience: A cache can serve stale data temporarily if the database goes down.
Cache Layers
Caching can happen at every layer of the stack, from the browser and CDN at the edge, through reverse proxies and the application tier, down to the database itself.
Caching Strategies
Cache-Aside (Lazy Loading)
The most common pattern. The application checks the cache first. On a cache miss, it reads from the database, stores the result in the cache, and returns it to the client.
```python
result = cache.get(key)
if result is None:               # cache miss
    result = db.query(key)       # read from the database
    cache.set(key, result, ttl)  # populate the cache for next time
return result
```

Pros: Cache only contains data that is actually requested. Simple to implement.
Cons: First request always hits the database. Cache and database can become inconsistent if data is updated in the database without invalidating the cache.
Write-Through
Every write goes to both the cache and the database simultaneously. The cache always has the latest data.
Pros: Cache is always up to date. No stale data.
Cons: Higher write latency (two writes per operation). Cache may contain data that is never read (wasted memory).
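A minimal write-through sketch, using plain dicts as stand-ins for a real cache and database (an assumption for illustration):

```python
class WriteThroughCache:
    """Write-through sketch: every write updates the database and the
    cache together, so reads can always be served from the cache."""

    def __init__(self):
        self.cache = {}  # stand-in for an in-memory cache
        self.db = {}     # stand-in for the database

    def write(self, key, value):
        # Write the database first, then the cache, so a DB failure
        # never leaves the cache ahead of the source of truth.
        self.db[key] = value
        self.cache[key] = value

    def read(self, key):
        # The cache is always current, so reads never touch the DB.
        return self.cache.get(key)
```

Ordering the database write first is a deliberate choice: if it fails, the cache is never updated with data the database does not have.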
Write-Behind (Write-Back)
Writes go to the cache first, and the cache asynchronously flushes changes to the database later. This significantly reduces write latency as perceived by the client.
Pros: Very fast writes. Batching writes to the database is efficient.
Cons: Risk of data loss if the cache crashes before flushing. More complex to implement.
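A write-behind sketch using a background thread and a queue; the dicts standing in for the cache and database, and the single worker thread, are simplifying assumptions:

```python
import queue
import threading

class WriteBehindCache:
    """Write-behind sketch: writes land in the cache immediately and are
    flushed to the database asynchronously by a background worker."""

    def __init__(self, db):
        self.cache = {}
        self.db = db                       # stand-in for the database
        self.pending = queue.Queue()       # buffered writes awaiting flush
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def write(self, key, value):
        self.cache[key] = value            # fast path: in-memory only
        self.pending.put((key, value))     # enqueue for later persistence

    def _flush_loop(self):
        while True:
            key, value = self.pending.get()  # blocks until a write arrives
            self.db[key] = value             # slow path, off the client's critical path
            self.pending.task_done()
```

The data-loss risk is visible here: anything still sitting in `pending` when the process dies never reaches the database.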
Read-Through
Similar to cache-aside, but the cache itself is responsible for loading data from the database on a miss. The application only talks to the cache.
Pros: Simpler application code. Cache handles all data loading logic.
Cons: Tightly couples cache to database. Not all cache solutions support this natively.
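The difference from cache-aside is who owns the loading logic. A read-through sketch, where `loader` is a hypothetical callable that fetches from the database:

```python
class ReadThroughCache:
    """Read-through sketch: the cache owns the miss-handling logic;
    the application only ever calls get()."""

    def __init__(self, loader):
        self.loader = loader  # hypothetical DB-fetch callable
        self.store = {}

    def get(self, key):
        if key not in self.store:
            # On a miss, the cache (not the application) loads the data.
            self.store[key] = self.loader(key)
        return self.store[key]
```

The application code shrinks to a single `get()` call, at the cost of wiring the database access into the cache itself.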
Eviction Policies
When a cache is full, it must decide which entries to remove to make room for new ones:
| Policy | Behavior | Best For |
|---|---|---|
| LRU | Least Recently Used: evicts the entry that has not been accessed for the longest time. | General purpose. Most popular choice. |
| LFU | Least Frequently Used: evicts the entry accessed the fewest times. | Workloads with distinct popularity tiers. |
| FIFO | First In, First Out: evicts the oldest entry. | Simple, predictable behavior. |
| Random | Evicts a random entry. | When access patterns are unpredictable. |
| TTL-based | Entries expire after a fixed time-to-live. | Data with known staleness tolerance. |
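LRU, the most popular policy in the table above, can be sketched with Python's `OrderedDict`, which tracks insertion order:

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction sketch: the OrderedDict's order doubles as the
    recency list, oldest entry first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def set(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```

Production caches (Redis, for example) use approximations of LRU to avoid the bookkeeping cost, but the behavior is the same in spirit.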
Cache Invalidation
The hardest problem in caching. When data in the database changes, the corresponding cache entry becomes stale. Strategies include:
- TTL (Time-to-Live): Set an expiration time. After TTL expires, the next read triggers a cache miss and fresh load. Simple but allows staleness up to the TTL duration.
- Event-Driven Invalidation: When data is written to the database, publish an event that triggers cache deletion. Tighter consistency but more complex.
- Versioned Keys: Append a version number to cache keys. When data changes, increment the version. Old keys naturally become unreferenced and expire via TTL.
- Active Invalidation: The write path explicitly deletes or updates the cache entry in the same operation that modifies the database.
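The versioned-keys strategy can be sketched as follows, with a dict standing in for a store like Redis and the version counters kept alongside the data (an assumption; they could live anywhere shared):

```python
class VersionedCache:
    """Versioned-key sketch: invalidation bumps a version counter
    instead of deleting entries; stale keys simply stop being read."""

    def __init__(self):
        self.cache = {}     # stand-in for Redis
        self.versions = {}  # logical name -> current version

    def _key(self, name):
        v = self.versions.get(name, 0)
        return f"{name}:v{v}"

    def get(self, name):
        return self.cache.get(self._key(name))

    def set(self, name, value):
        self.cache[self._key(name)] = value

    def invalidate(self, name):
        # Bump the version: old entries become unreachable and, in a
        # real store, would expire via their TTL.
        self.versions[name] = self.versions.get(name, 0) + 1
```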
Cache Stampede (Thundering Herd)
When a popular cache entry expires, many concurrent requests miss the cache simultaneously and all hit the database at once, potentially overwhelming it. Mitigations:
- Locking: Only one request fetches from the database; others wait for the cache to be repopulated.
- Probabilistic Early Expiration: Randomly refresh cache entries before they actually expire, spreading the load.
- Background Refresh: A background process refreshes popular entries before their TTL expires.
- Stale-While-Revalidate: Serve the stale cached value immediately while fetching a fresh copy in the background.
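The locking mitigation can be sketched with one lock per key; `loader` is a hypothetical callable for the expensive database fetch:

```python
import threading

class StampedeGuard:
    """Per-key locking sketch: on a miss, only one thread recomputes the
    entry; concurrent callers block, then read the freshly cached value."""

    def __init__(self, loader):
        self.loader = loader  # hypothetical expensive DB fetch
        self.cache = {}
        self.locks = {}
        self.meta = threading.Lock()  # guards the locks dict itself

    def _lock_for(self, key):
        with self.meta:
            return self.locks.setdefault(key, threading.Lock())

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value
        with self._lock_for(key):      # only one thread proceeds per key
            if key not in self.cache:  # re-check: a peer may have filled it
                self.cache[key] = self.loader(key)
            return self.cache[key]
```

The re-check inside the lock is the essential detail: threads that waited must not repeat the load once the first thread has populated the cache.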
Distributed Caching
In a multi-server environment, a centralized cache (like Redis or Memcached) ensures all application servers share the same cached data. Key considerations:
- Partitioning: Distribute cache keys across multiple cache nodes using consistent hashing (Chapter 11).
- Replication: Replicate data across cache nodes for availability. Redis Cluster and Redis Sentinel support this.
- Serialization: Data must be serialized to store in cache and deserialized on read. Choose efficient formats (MessagePack, Protobuf) for large objects.
- Memory Management: Monitor memory usage. An out-of-memory cache starts evicting entries aggressively, reducing hit rate.
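The partitioning step can be sketched with a small consistent-hash ring; the node names and replica count below are illustrative assumptions, not a production configuration:

```python
import hashlib
from bisect import bisect

class HashRing:
    """Consistent-hashing sketch for routing cache keys to nodes.
    Each node is hashed many times ("virtual nodes") to smooth the
    distribution of keys across the ring."""

    def __init__(self, nodes, replicas=100):
        self.ring = []  # sorted list of (hash, node) points
        for node in nodes:
            for i in range(replicas):
                h = self._hash(f"{node}#{i}")
                self.ring.append((h, node))
        self.ring.sort()
        self._hashes = [h for h, _ in self.ring]

    def _hash(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first ring point at or after the key's hash.
        idx = bisect(self._hashes, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]
```

The same key always maps to the same node, and adding or removing a node remaps only the keys nearest to it on the ring, which is exactly the property Chapter 11's consistent hashing discussion relies on.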
Redis vs. Memcached
Redis
- Rich data structures (strings, hashes, lists, sets, sorted sets, streams)
- Persistence options (RDB snapshots, AOF log)
- Replication and clustering built-in
- Pub/sub messaging
- Lua scripting
- Mostly single-threaded command execution (Redis 6.0+ adds optional I/O threading)
Memcached
- Simple key-value only
- No persistence
- Multi-threaded; scales well on multi-core
- Lower memory overhead per key
- Simpler operational model
- Better for pure caching (no data structure needs)
Key Takeaways
- Caching can reduce latency from hundreds of milliseconds (database) to sub-millisecond (in-memory cache).
- Cache-aside is the most common and flexible strategy. Use write-through or write-behind when consistency or write performance is critical.
- Always set a TTL. Unbounded cache growth will eventually exhaust memory.
- LRU is the default eviction policy for most use cases.
- Plan for cache invalidation from the start. Retrofitting it is painful.
- Guard against cache stampedes with locking or probabilistic early expiration.
- Prefer Redis for rich data structure needs; Memcached for pure, simple caching at scale.
Chapter Check-Up
Quick quiz to reinforce what you just learned.