Blog
Consistent Hashing Explained: How It Works in Distributed Systems
March 6, 2026 · Updated March 6, 2026 · 8 min read
Understand consistent hashing from first principles: why naive modular hashing breaks, how the hash ring works, and when to use virtual nodes.
Definition
Consistent hashing is a distributed hashing scheme that minimizes key reassignment when nodes join or leave, making it essential for scalable caches, databases, and load balancers.
Implementation Checklist
- Explain why modular hashing (key % N) breaks when N changes before introducing consistent hashing.
- Draw the hash ring and show how keys map to the next clockwise node.
- Add virtual nodes to explain how consistent hashing achieves even load distribution.
- Discuss rebalancing: when a node leaves, only its keys move to the next node, not all keys.
Why Modular Hashing Breaks at Scale
With modular hashing (key % N), adding or removing a single server remaps nearly every key. For a cache cluster, this means a near-total cache miss storm on any topology change.
Consistent hashing solves this by mapping both keys and nodes onto a shared ring. When a node changes, only the keys in its range move. Everything else stays put.
Virtual Nodes and Load Balance
A naive hash ring assigns each physical node one position, which creates uneven key distribution. Virtual nodes give each physical node multiple positions on the ring, spreading load more evenly.
In interviews, mentioning virtual nodes shows you understand the practical limitations of the basic algorithm and how production systems address them.
Interview Walkthrough Pattern
Start with the problem: why does key % N break? Draw the ring. Place 3 nodes. Hash a few keys and show which node owns them. Remove a node and show only adjacent keys move. Add virtual nodes and explain the improvement.
Common follow-ups: How does this apply to database sharding? How do you handle hot keys even with consistent hashing? Prepare answers that reference monitoring and key-splitting strategies.
Tradeoff Table
| Decision | Speed-First Option | Reliability-First Option | Recommended When |
|---|---|---|---|
| Modular hashing vs Consistent hashing | Modular hashing is simpler to implement and reason about for fixed cluster sizes. | Consistent hashing minimizes disruption when nodes are added or removed. | Use consistent hashing whenever the node count can change, which is almost every distributed system. |
| Few virtual nodes vs Many virtual nodes | Fewer virtual nodes reduce memory and lookup overhead. | More virtual nodes improve load balance across heterogeneous hardware. | Start with 100-200 virtual nodes per physical node. Tune based on observed load skew. |
| Hash ring vs Rendezvous hashing | Hash ring lookup is O(log N) with sorted ring; rendezvous is O(N) per key. | Rendezvous hashing is simpler to implement and avoids ring management complexity. | Use hash ring for large clusters. Consider rendezvous hashing for small, stable node sets. |
Practice Next
Consistency Topic Hub
Explore consistency models, hashing strategies, and distributed coordination patterns.
Database Replication Lab
Practice replication and partitioning patterns that rely on consistent hashing.
Challenges
- Cake Shop 3 - Going International
Apply consistent hashing concepts to multi-region data distribution.
- Design Amazon
Design a large-scale e-commerce system where consistent hashing enables cache and shard scaling.
Newsletter CTA
Join the SystemForces newsletter for practical distributed systems notes every week.
Get weekly system design breakdownsFrequently Asked Questions
Where is consistent hashing used in practice?
It is used in distributed caches (Memcached, Redis Cluster), databases (DynamoDB, Cassandra), CDNs, and load balancers to distribute keys or requests across nodes.
What happens when a node fails in a consistent hash ring?
Only the keys assigned to the failed node move to the next node on the ring. All other key-to-node mappings remain stable, minimizing cache misses and data movement.
How many virtual nodes should I use?
100-200 per physical node is a common starting point. More virtual nodes give better balance but use more memory for the ring data structure.