The Problem with Simple Hashing
With simple hash-based distribution, you assign a key to a server using server = hash(key) % N, where N is the number of servers. This works well until N changes.
If you add or remove a server, N changes and nearly every key maps to a different server: going from 4 servers to 5, for example, remaps about 80% of keys. This causes a massive cache miss storm: almost all cached data effectively becomes invalid, and the missed requests fall through to the database at once.
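The churn is easy to measure. This sketch (key names invented for the demo; md5 is used only as a stable, well-spread hash) counts how many keys change servers when N grows from 4 to 5 under modulo hashing:

```python
import hashlib

def h(key):
    # Stable hash: Python's built-in hash() is randomized per process
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"key-{i}" for i in range(10_000)]

# A key stays put only if its hash gives the same remainder mod 4 and mod 5
moved = sum(1 for k in keys if h(k) % 4 != h(k) % 5)
print(f"{moved / len(keys):.0%} of keys moved")  # ~80%
```

Only hashes whose value mod 20 falls in {0, 1, 2, 3} keep the same server, so roughly 4 in 20 keys stay and about 80% move.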
The Hash Ring
Consistent hashing maps both keys and nodes onto a circular hash space (a "ring" from 0 to 2^32 - 1). Each node is placed on the ring at the position determined by hashing its identifier. Each key is placed on the ring by hashing the key, then walking clockwise to find the first node: that is the node responsible for the key.
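A toy lookup makes the clockwise rule concrete. The ring positions and node names below are invented for illustration, not produced by a real hash function:

```python
import bisect

# Toy ring: three nodes at hand-picked positions on a small ring
ring = {100: "node-A", 500: "node-B", 900: "node-C"}
positions = sorted(ring)

def lookup(key_hash):
    # First ring position >= key_hash; wrap past the end back to the start
    idx = bisect.bisect_left(positions, key_hash)
    return ring[positions[idx % len(positions)]]

print(lookup(42))   # -> node-A (first node clockwise from 42)
print(lookup(700))  # -> node-C
print(lookup(950))  # -> node-A (wraps around past position 900)
```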
Adding or Removing a Node
When a new node is added to the ring, only the keys between the new node and its predecessor (counter-clockwise neighbor) are reassigned. All other keys remain on their current nodes.
When a node is removed, only its keys move to the next clockwise node. The rest of the ring is unaffected.
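This property can be checked directly. The sketch below (node and key names invented; one ring position per node for simplicity) adds a fifth node and verifies that the only keys that move are the ones the new node claims:

```python
import bisect
import hashlib

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def owner(positions, ring, key):
    # Clockwise lookup: first position >= hash, wrapping at the end
    idx = bisect.bisect_left(positions, h(key))
    return ring[positions[idx % len(positions)]]

nodes = ["node-A", "node-B", "node-C", "node-D"]
ring = {h(n): n for n in nodes}
positions = sorted(ring)

keys = [f"key-{i}" for i in range(10_000)]
before = {k: owner(positions, ring, k) for k in keys}

# Add a fifth node: only the keys in the arc it claims should move
ring[h("node-E")] = "node-E"
positions = sorted(ring)
moved = sum(1 for k in keys if owner(positions, ring, k) != before[k])
print(f"{moved / len(keys):.0%} of keys moved")  # on average ~1/5
```

Every moved key lands on node-E and nowhere else. Note that with a single position per node the moved fraction varies widely from run to run of the node names, which is exactly the problem virtual nodes address next.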
Virtual Nodes (Vnodes)
With just a few physical nodes, the distribution can be uneven: one node might get a disproportionately large arc of the ring. Virtual nodes solve this.
Instead of mapping each physical node to one position on the ring, you create multiple "virtual" copies (e.g., 100-200 per physical node). Each virtual node maps to a different position on the ring but all point back to the same physical machine.
Benefits of Virtual Nodes
- Even distribution: With many points on the ring, the arc lengths become approximately equal. Data is distributed more uniformly.
- Smoother rebalancing: When a physical node is added, its virtual copies are spread across the ring, absorbing keys from multiple other nodes instead of just one.
- Heterogeneous hardware: A more powerful server can be assigned more virtual nodes, proportionally receiving more data.
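The evenness benefit is measurable. This sketch (node and key names invented for the demo) compares the max/min load ratio across four nodes with 1 ring position each versus 150 virtual nodes each:

```python
import bisect
import hashlib
from collections import Counter

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def load_per_node(nodes, keys, vnodes):
    # Build a ring with `vnodes` positions per node, then assign every key
    ring = {h(f"{n}:{i}"): n for n in nodes for i in range(vnodes)}
    positions = sorted(ring)
    counts = Counter()
    for k in keys:
        idx = bisect.bisect_left(positions, h(k))
        counts[ring[positions[idx % len(positions)]]] += 1
    return counts

nodes = [f"node-{i}" for i in range(4)]
keys = [f"key-{i}" for i in range(20_000)]

for vnodes in (1, 150):
    counts = load_per_node(nodes, keys, vnodes)
    ratio = max(counts.values()) / min(counts.values())
    print(f"{vnodes:>3} vnodes per node: max/min load ratio = {ratio:.2f}")
```

With a single position per node the busiest node can carry several times the load of the quietest; with 150 virtual nodes per node, every node's share lands close to the ideal 25%.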
Real-World Usage
| System | How It Uses Consistent Hashing |
|---|---|
| Amazon Dynamo / DynamoDB | The Dynamo paper popularized consistent hashing with virtual nodes for partitioning keys across nodes; DynamoDB descends from this design. |
| Apache Cassandra | Distributes data using a token ring (consistent hash ring). Each node is responsible for a range of tokens. |
| Memcached clients (e.g., Ketama) | Distribute cache keys across cache nodes client-side. Adding a node only invalidates ~1/N of the cache. (Redis Cluster, by contrast, uses fixed hash slots rather than a hash ring.) |
| Load Balancers | Some LBs use consistent hashing to route requests to the same backend for session affinity. |
| Akamai CDN | Originally developed consistent hashing to map web content to edge servers. |
Implementation Sketch
The sketch below is runnable Python. md5 is used only as a stable, well-spread hash (Python's built-in hash() is randomized per process); a sorted list plus binary search stands in for the sorted map.

```python
import bisect
import hashlib

class ConsistentHash:
    def __init__(self, replicas=150):
        self.replicas = replicas   # virtual nodes per physical node
        self._ring = {}            # hash value -> node_id
        self._positions = []       # sorted ring positions

    def _hash(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add_node(self, node_id):
        for i in range(self.replicas):
            h = self._hash(f"{node_id}:{i}")
            self._ring[h] = node_id
            bisect.insort(self._positions, h)

    def remove_node(self, node_id):
        for i in range(self.replicas):
            h = self._hash(f"{node_id}:{i}")
            del self._ring[h]
            self._positions.remove(h)

    def get_node(self, key):
        if not self._positions:
            return None
        # First ring position >= hash (clockwise); wrap around if past the end
        idx = bisect.bisect_left(self._positions, self._hash(key))
        return self._ring[self._positions[idx % len(self._positions)]]
```

Key Takeaways
- Simple hash-mod distribution causes massive data redistribution when nodes change. Consistent hashing limits redistribution to ~1/N of keys.
- The hash ring maps both keys and nodes to the same circular space. Keys walk clockwise to find their node.
- Virtual nodes are essential for even distribution and smooth rebalancing. Use 100-200 virtual nodes per physical node.
- Consistent hashing is used in DynamoDB, Cassandra, CDNs, distributed caches, and load balancers.
- Understand this concept thoroughly: it appears in nearly every system design interview involving distributed systems.