Caching in System Design
Caching is usually the highest-leverage optimization in system design because it changes both latency and cost in one move. A good cache strategy shortens hot-path response time, protects the database under spikes, and creates room for growth without immediate infra expansion.
What It Is
Caching is the practice of storing frequently requested data in a faster layer than the source of truth. In most production systems, that means in-memory stores like Redis, CDN edge nodes, browser caches, and application-side object caches. The core objective is simple: serve repeated reads closer to users while preserving correctness constraints for writes and invalidation.
When to Use It
Use caching when read frequency far exceeds write frequency on the same data. Product catalogs, user profile metadata, configuration lookups, and permission checks are common candidates because the same records are requested hundreds or thousands of times between updates.
Use caching to protect downstream systems under load spikes. Even short TTLs on frequently requested endpoints reduce query amplification during traffic surges, flash sales, or viral events.
Use caching at the edge when geographic latency is a constraint. CDN layers eliminate round-trip penalties for static and semi-dynamic content, keeping time-to-first-byte low for global audiences.
Why Caching Matters
At scale, many workloads are read-heavy. Product pages, user profiles, and frequently accessed metadata are requested far more often than they change. Without caching, every repeated read hits origin storage, driving up p99 latency and increasing the blast radius of normal traffic bursts. Caches absorb this repetition and flatten demand on expensive backend systems.
Caching is also a resilience primitive. During an upstream incident, stale-but-safe cached responses can keep core paths usable while teams recover a database replica, a search cluster, or an internal API dependency. This graceful degradation is often the difference between a partial incident and a full outage.
From a cost perspective, cache hit rate directly changes infrastructure economics. A service that moves from 40 percent to 85 percent cache hit rate usually defers database scaling milestones, lowers cross-region query traffic, and reduces CPU burn in app tiers that no longer serialize the same records repeatedly.
Core Concepts and Mental Models
Think in cache tiers, not one cache. Browser and CDN layers handle static and semi-static content close to the edge. Application caches and distributed in-memory stores handle dynamic objects. Database page cache is a separate internal layer. Each tier has a different consistency model and ownership boundary, so your invalidation plan must be explicit per layer.
Cache-aside remains the most common pattern for read-heavy APIs. The service checks cache first, falls back to origin on miss, then repopulates cache. Write-through and write-behind patterns are useful when write paths are predictable, but they create coupling that teams often underestimate during schema evolution and partial outages.
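The cache-aside flow described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: a plain dict with expiry timestamps stands in for Redis, and `fetch_user` is a hypothetical origin read that would normally hit the database.

```python
import time

# A dict of key -> (expiry_timestamp, value) stands in for Redis here.
_cache: dict[str, tuple[float, object]] = {}
TTL_SECONDS = 60

def fetch_user(user_id: str) -> dict:
    # Stand-in for the origin read; in production this would query the database.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id: str) -> dict:
    key = f"user:{user_id}"
    entry = _cache.get(key)
    if entry is not None and entry[0] > time.time():
        return entry[1]                                   # cache hit
    value = fetch_user(user_id)                           # miss: fall back to origin
    _cache[key] = (time.time() + TTL_SECONDS, value)      # repopulate with a TTL
    return value
```

The read path owns repopulation, so the write path stays decoupled from cache availability, which is the main reason cache-aside tolerates partial outages better than write-through.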
TTL is a product decision as much as an infrastructure one. Longer TTL improves hit rate but increases staleness risk. Short TTL improves freshness but can reintroduce origin hotspots. Good systems define data classes, such as static config, profile metadata, and real-time counters, each with a justified freshness budget tied to user impact.
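One way to make those freshness budgets explicit is a small TTL table keyed by data class. The class names and TTL values below are illustrative assumptions, not recommendations; the point is that every cached key maps to a deliberate freshness decision.

```python
# Hypothetical freshness budgets per data class, in seconds.
TTL_BY_CLASS = {
    "static_config":    3600,  # rarely changes; staleness is low risk
    "profile_metadata":  300,  # changes occasionally; minutes of staleness acceptable
    "realtime_counter":    5,  # user-visible freshness matters
}

def ttl_for(data_class: str) -> int:
    # Fail loudly on unclassified data rather than guessing a default TTL.
    return TTL_BY_CLASS[data_class]
```

Raising a `KeyError` for unclassified data is deliberate: it forces the team to classify new data rather than silently inheriting a default that may be wrong for the use case.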
Key Tradeoffs
| Decision | Upside | Downside | Guidance |
|---|---|---|---|
| TTL length | Longer TTL improves hit rate and reduces origin load | Increases staleness risk for time-sensitive data | Define data freshness classes and assign TTL per class based on user impact |
| Cache-aside vs write-through | Cache-aside is simpler and avoids write-path coupling | Write-through keeps cache warm but couples write latency to cache availability | Use cache-aside for read-heavy APIs; write-through only when miss cost is extreme |
| Local vs distributed cache | Local caches have zero network latency and no serialization cost | Local caches diverge across instances, lose state on restart, and multiply memory use | Use local for hot immutable data; distributed for shared mutable state |
Common Mistakes
- Caching before defining invalidation: teams often add cache layers and defer correctness rules, leading to stale reads in high-impact paths such as pricing or entitlement checks.
- Treating cache as source of truth: restoring from cache during incidents can hide data integrity issues. Always persist writes to durable storage first.
- Fragmented key design: over-varied keys waste memory, increase eviction noise, and reduce hit rate. Normalize keys around access patterns and use explicit TTL classes.
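A single key-builder function is a simple guard against the fragmented key design described above. The `v2` version prefix and the `view` parameter are illustrative assumptions; the idea is that one function owns the key scheme, so a schema change can bump the version and invalidate everything at once.

```python
def cache_key(entity: str, entity_id: str, view: str = "full") -> str:
    # One key scheme for the whole service: an explicit version prefix,
    # then entity type, id, and the access-pattern variant being cached.
    return f"v2:{entity}:{entity_id}:{view}"
```

Centralizing key construction also makes it trivial to audit cardinality: every variant that can exist appears as a parameter, not as ad-hoc string concatenation scattered across call sites.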
Implementation Playbook
Start by profiling request distribution before adding cache layers. Identify top keys, repeated query patterns, and endpoints with high fan-in. Add caching first where repetition is obvious and correctness impact is low. This sequence gives fast wins and keeps operational risk manageable while teams tune observability and invalidation logic.
Instrument cache hit rate, origin fallback count, key cardinality, and eviction churn from day one. A cache with poor visibility becomes a silent failure amplifier. Alerting on miss spikes and hot-key concentration helps teams catch issues before customers notice rising latency or downstream saturation.
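Hit-rate instrumentation does not require a metrics platform to start; a counter pair per cache is enough to begin. This sketch assumes a simple in-process stats object; in production these counters would feed a metrics library such as Prometheus or StatsD.

```python
from dataclasses import dataclass

@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        # Called once per cache lookup from the read path.
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Alerting on a sudden drop in `hit_rate` is often the earliest signal of a deploy that changed key construction or an invalidation storm upstream.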
Plan cache stampede protection early. Use request coalescing, jittered TTL, background refresh, and fallback policies for overloaded backends. Systems that skip this step often fail exactly when traffic spikes, because mass expirations trigger synchronized origin reads that erase the intended benefits of the cache layer.
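Two of the stampede defenses above, jittered TTL and request coalescing, fit in a short sketch. The function names are illustrative; an in-memory dict stands in for the cache, and the per-key lock ensures only one caller performs the origin read while concurrent callers wait for its result.

```python
import random
import threading
from typing import Callable

_cache: dict = {}
_locks: dict = {}
_locks_guard = threading.Lock()

def jittered_ttl(base_seconds: int, spread: float = 0.1) -> float:
    # Spread expirations +/-10% so hot keys do not all expire in the same instant.
    return base_seconds * (1 + random.uniform(-spread, spread))

def _lock_for(key: str) -> threading.Lock:
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())

def get_or_load(key: str, load: Callable[[], object]) -> object:
    value = _cache.get(key)
    if value is not None:
        return value
    with _lock_for(key):           # coalesce: one loader per key at a time
        value = _cache.get(key)    # re-check after acquiring the lock
        if value is None:
            value = load()         # a single origin read serves all waiters
            _cache[key] = value
        return value
```

The double-check after acquiring the lock is the important detail: waiters that queued behind the first loader find the value already populated and skip their own origin read entirely.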
Practice Path for Caching
Course Chapters
- Caching
  Cache hierarchy, eviction policy, and invalidation strategy fundamentals.
- Database Scaling
  How cache layers and read scaling reduce pressure on primary databases.
- CDNs and Edge Computing
  Global cache design for static and semi-dynamic workloads.
Guided Labs
- Supercharge with Caching
  Add a Redis cache to reduce database load and dramatically improve response times.
- Database Replication & Read Scaling
  Add read replicas to scale database reads and add failover for high availability.
- CDN & Edge: Global Content Delivery
  Add a CDN to serve static assets from edge locations worldwide, dramatically reducing latency for global users.
Challenge Progression
1. Social Feed 1 - MVP Launch (Social Feed · easy)
2. Feature Flag Service (Starter · easy)
3. Gaming Leaderboard (Starter · easy)
4. Personal Blog Platform (Starter · easy)
5. Poll & Survey Tool (Starter · easy)
6. QR Code Generator API (Starter · easy)
Public Solution Walkthroughs
- Social Feed 1 - MVP Launch: full solution walkthrough with architecture breakdown
- Feature Flag Service: full solution walkthrough with architecture breakdown
- Gaming Leaderboard: full solution walkthrough with architecture breakdown
- Personal Blog Platform: full solution walkthrough with architecture breakdown
Related Articles
CDN in System Design: When and How to Use a Content Delivery Network
Learn when a CDN actually helps, how edge caching works, and the cache-key and purge decisions that matter in real architectures.
9 min read
Cache Invalidation That Does Not Burn Your Team
A practical pattern for choosing TTLs, write paths, and invalidation triggers without turning cache logic into a production risk.
8 min read
Frequently Asked Questions
When should I add caching in a new system?
Add it when repeated reads dominate latency or backend load, not because caching is fashionable. Baseline first, then introduce cache in one high-impact path and measure hit rate, p95 latency, and origin load reduction.
Is Redis always the right cache choice?
Redis is common, but not universal. CDN and browser caches might solve your problem first. For local process memoization, in-memory app caches can be enough. Choose based on consistency needs, scale profile, and operational budget.
How do I avoid stale cache bugs?
Define data freshness classes, use event-driven invalidation where possible, and keep TTL explicit per key type. Track stale read incidents as first-class metrics so teams can tune correctness and performance together.
What metrics matter most for cache operations?
Watch hit rate, eviction rate, key cardinality, cache memory headroom, and origin fallback latency. Pair those with endpoint-level latency and error metrics to confirm the cache is improving the user experience rather than masking bottlenecks.