Consistent Hashing: The Art of Load Balancing

Distributed systems are the backbone of today’s internet-based applications, enabling them to handle millions or even billions of requests every day. To ensure high availability, scalability, and fault tolerance, developers often need to distribute their data across multiple servers or nodes. One crucial challenge in this endeavor is efficient data distribution and load balancing. This is where consistent hashing comes into play. In this blog post, we will explore what consistent hashing is, how it works, and why it’s essential in building scalable distributed systems.

Applications of Consistent Hashing

Consistent hashing is used in distributed computing and networking to address problems of data distribution, load balancing, and fault tolerance. Let’s look at some real-world uses to see how powerful it is.

  1. Distributed Caching: Consistent hashing is widely used in distributed caching systems like Memcached and Redis. It helps distribute cached data across multiple cache nodes, ensuring efficient data retrieval and load balancing.
  2. Content Delivery Networks (CDNs): CDNs like Cloudflare use consistent hashing to distribute web content and media files to a network of edge servers. This ensures that users receive content from the nearest server, reducing latency and improving content delivery speed.
  3. Distributed Databases: Distributed databases like Cassandra, Riak, and Amazon DynamoDB employ consistent hashing to partition data across multiple nodes or clusters. This helps achieve horizontal scalability and fault tolerance.
  4. Load Balancers: Load balancers use consistent hashing to distribute incoming requests among backend servers or instances. This helps evenly distribute traffic, prevent overloading of specific servers, and ensure high availability.
  5. Distributed File Systems: Distributed file systems like Hadoop HDFS and Ceph use consistent hashing to distribute file data across a cluster of storage nodes. This allows for efficient data storage and retrieval while accommodating changes in cluster size.
  6. Peer-to-Peer (P2P) Networks: In P2P networks like BitTorrent and distributed hash tables (DHTs), consistent hashing helps route requests and data to the appropriate peers. It enables efficient data sharing and content discovery in decentralized networks.
  7. Database Sharding: In large-scale databases, sharding involves dividing data into smaller subsets (shards) distributed across multiple servers. Consistent hashing simplifies the process of determining which server is responsible for a particular shard.
  8. Token-Based Authentication: Some authentication systems use consistent hashing to distribute authentication tokens across authentication servers. This allows for efficient token validation without the need for centralized storage.
  9. Service Discovery: In microservices architectures, consistent hashing can be used for service discovery. Clients can hash service names or keys to locate the appropriate service instance for communication.
  10. Distributed Task Queues: Messaging systems like RabbitMQ (via its consistent-hash exchange plugin) can use consistent hashing to distribute tasks or messages across worker nodes or consumer groups. This spreads processing load evenly across consumers.

Understanding the Problem

Before diving into the details of consistent hashing, let’s understand the problem it solves. In a distributed system, data is typically divided into partitions or shards, and these shards are distributed across multiple nodes or servers. The challenge is to determine which node should be responsible for handling each shard efficiently. This task becomes even more challenging as the number of nodes or servers increases.

Problem with Traditional Hashing

Traditional hashing techniques can be problematic in this context because they are not inherently designed for scalability. In a fixed hashing scheme, when the number of nodes changes (e.g., when nodes are added or removed), many keys need to be remapped to different nodes. This remapping can be a resource-intensive operation and can lead to uneven distribution of data, potentially causing hotspots on certain nodes.
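The remapping problem is easy to demonstrate. Below is a sketch of the naive hash(key) % N scheme (the helper name `node_for` is illustrative, not from any library): shrinking a five-node cluster to four moves the large majority of keys.

```python
# Sketch of naive modulo-based placement and why it remaps most keys
# when the node count changes.
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    """Pick a node with the classic hash(key) % N scheme."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"user:{i}" for i in range(1000)]

before = {k: node_for(k, 5) for k in keys}  # five nodes
after = {k: node_for(k, 4) for k in keys}   # one node removed

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys changed nodes")  # typically around 80%
```

A key stays put only when its hash gives the same remainder mod 5 and mod 4, which happens for roughly one key in five; the other four fifths are suddenly looked up on the wrong node.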

consistent hashing - problems with traditional hashing

Note that all key locations changed, not only the ones from the deleted node.

In the typical use case (for example – caching), this would mean that all of a sudden, the keys won’t be found because they won’t yet be present at their new location.

So, most queries will result in misses, and the original data will likely need to be fetched again from its source to be rehashed, placing a heavy load on the origin server(s) (typically a database). This may degrade performance severely and possibly crash the origin servers.

Key Principles of Consistent Hashing

At its core, consistent hashing provides a way to map keys to nodes in a scalable and resilient manner. It achieves this through the following key principles:

  1. Ring-based Structure: In consistent hashing, nodes are organized in a virtual ring, typically represented as a circle. Each node is assigned a unique position on the ring. The keys are also placed on this ring, and their positions are determined through hashing.
  2. Deterministic Hashing: A hash function is used to map both nodes and keys to positions on the ring. This hash function consistently produces the same output for a given input. This deterministic property is crucial for achieving consistency in mapping.
  3. Balanced Distribution: Keys are assigned to the node whose position on the ring is the nearest neighbor in a chosen direction (conventionally clockwise, though counterclockwise works equally well) from the key’s position. This keeps data distribution balanced across nodes, minimizing hotspots.
  4. Node Addition/Removal: When a new node is added or an existing node is removed, only a small portion of keys need to be remapped. This minimizes the disruption and resource overhead associated with node changes.
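The four principles above can be sketched as a minimal ring in Python. This is an illustrative sketch, not any particular library’s API; the class and method names (`HashRing`, `add_node`, `get_node`) are invented for the example, and it uses the clockwise convention.

```python
# Minimal consistent-hash ring: nodes and keys share one hash space,
# and a key belongs to the first node clockwise from its position.
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self):
        self._ring = []       # sorted list of (position, node) pairs
        self._positions = []  # parallel sorted list of positions

    def add_node(self, node: str) -> None:
        pos = _hash(node)
        idx = bisect.bisect(self._positions, pos)
        self._positions.insert(idx, pos)
        self._ring.insert(idx, (pos, node))

    def remove_node(self, node: str) -> None:
        idx = bisect.bisect_left(self._positions, _hash(node))
        del self._positions[idx]
        del self._ring[idx]

    def get_node(self, key: str) -> str:
        # First node clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing()
for n in ("A", "B", "C"):
    ring.add_node(n)
print(ring.get_node("John"))  # one of "A", "B", "C"
```

Adding or removing a node only touches the arc between that node and its predecessor; every other key keeps its owner, which is the whole point.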

An Example

Hash Ring

Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring. This allows servers and objects to scale without affecting the overall system.

Picture this: we take the outputs of our hash function and place them around the edge of a circle. The smallest possible hash value maps to the very start of the circle, at zero degrees, and the largest possible value (call it INT_MAX) maps all the way around, at 360 degrees. Every other hash value lands somewhere in between, like tick marks on a clock face.
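In code, the mapping from hash value to circle position is a simple proportion. This is a sketch; `INT_MAX` here stands in for the hash function’s maximum output (a 32-bit range is assumed for illustration).

```python
# Map a hash value onto the 0-360 degree circle.
INT_MAX = 2**32 - 1  # assumed maximum hash output for this sketch

def angle(hashcode: int) -> float:
    """Proportionally place a hash value on the circle."""
    return (hashcode % (INT_MAX + 1)) / (INT_MAX + 1) * 360

print(angle(0))             # 0.0 degrees, the start of the circle
print(angle(INT_MAX // 2))  # roughly 180 degrees, halfway around
```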

Key       Hashcode     Angle (Degrees)
“John”    1633428562   32
“Bill”    7594634739   180
“Jane”    5000799124   240
“Steve”   9787173343   96
“Kate”    3421657995   324
consistent hashing - hashring with keys

We also placed the servers on the edge of the circle. A convenient way of doing this is by hashing the server name (or IP address, or some ID)—as we’d do with any other key—to come up with its angle.

Key   Hashcode     Angle (Degrees)
“A”   5572014558   228
“B”   8077113362   108
“C”   2269549488   351
consistent hashing - hash ring with keys and servers

Mapping Objects to Servers

Since we have the keys for both the objects and the servers on the same circle, we may define a simple rule to associate the former with the latter: each object key belongs to the server whose key is closest in a counterclockwise direction (or clockwise, depending on the convention used).

From a programming perspective, we would keep a sorted list of server values (which could be angles or numbers in any real interval) and walk this list (or use a binary search) to find the last server with a value less than or equal to that of the desired key. If no such value is found, we wrap around, taking the last one from the list.
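A short sketch of that lookup rule, using the server angles from the example above (B at 108°, A at 228°, C at 351°). The example tables use the counterclockwise convention, so this sketch picks the last server at an angle less than or equal to the key’s, wrapping around when none exists; `bisect` does the binary search.

```python
import bisect

# Server angles from the example, kept sorted: B=108, A=228, C=351.
servers = [(108, "B"), (228, "A"), (351, "C")]
angles = [a for a, _ in servers]

def server_for(key_angle: float) -> str:
    # Counterclockwise rule: last server at an angle <= the key's angle.
    idx = bisect.bisect_right(angles, key_angle) - 1
    return servers[idx][1]  # idx == -1 naturally wraps to the last entry

print(server_for(32))   # "C" (wraps around, as in the table)
print(server_for(324))  # "A"
print(server_for(96))   # "C"
```

Note how Python’s negative indexing makes the wrap-around case fall out for free: when the key’s angle is smaller than every server’s, `idx` is -1 and we take the last (largest-angle) server.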

Key       Hashcode     Angle (Degrees)   Label   Server
“John”    1632929716   32                “C”     C
“Kate”    3421831276   324               “A”     A
“Jane”    5000648311   240               “A”     A
“Bill”    7594873884   180               “B”     B
“Steve”   9786437450   96                “C”     C
consistent hashing - hash ring with keys to server mapping

So far so good, but still we have a big problem – what if there is an uneven distribution of keys burdening only a particular server?

consistent hashing - uneven distribution of keys

Virtual Servers

To make sure we spread out object keys evenly among servers, we can use a simple idea: instead of giving each server just one name (like A, B, or C), we give them multiple names (labels/angles), kind of like nicknames. So, instead of just A, B, and C, we have names like A0, A1, A2… all the way up to A9 for the first server, B0, B1, B2… to B9 for the second server, and so on. These names are placed around the circle.

Now, here’s the clever part: we can decide how many nicknames each server gets (known as the weight of the server) based on how powerful it is. If one server, let’s say B, is twice as strong as the others, we can give it twice as many nicknames (like B0 to B19), which means it will end up holding twice as many objects, on average.

In our example, let’s say all three servers have equal weights, so they each get 10 nicknames (like A0 to A9, B0 to B9, and C0 to C9). This works well when you have three servers. If you have more servers, you might need to give them even more nicknames (labels) to balance things out better.

consistent hashing - hash ring with virtual servers
Key       Hashcode     Angle (Degrees)   Label   Server
“John”    1632929716   32                “B2”    B
“Kate”    3421831276   324               “A5”    A
“Jane”    5000648311   240               “C7”    C
“Bill”    7594873884   180               “A4”    A
“Steve”   9786437450   96                “C6”    C
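The virtual-server idea can be sketched as follows. The names and weights here are illustrative (10 labels per server, as in the example); each physical server is inserted under several labels, and lookups use the clockwise (next-greater) convention.

```python
# Virtual nodes: each physical server appears on the ring under
# `weight` labels, e.g. "A0".."A9", smoothing the key distribution.
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

ring = []  # (position, physical server) pairs
for server, weight in (("A", 10), ("B", 10), ("C", 10)):
    for i in range(weight):
        ring.append((_hash(f"{server}{i}"), server))  # label like "A0"
ring.sort()
positions = [p for p, _ in ring]

def server_for(key: str) -> str:
    idx = bisect.bisect(positions, _hash(key)) % len(ring)
    return ring[idx][1]

# With equal weights, each server should own roughly a third of the keys.
counts = {"A": 0, "B": 0, "C": 0}
for i in range(3000):
    counts[server_for(f"key:{i}")] += 1
print(counts)
```

A server twice as powerful would simply get `weight=20`, and on average it would own twice the arc length, hence twice the keys.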

Deleting a Server

So, why is this circle method helpful? Let’s say we remove server C from the picture. To make this change, we simply take away the labels C0 to C9 from the circle. This means the object keys that used to belong to server C now get new labels, like Ax and Bx, which means they’re now assigned to servers A and B.

But what about the other object keys that were originally with servers A and B? Well, here’s the cool part: They stay just the way they are! Removing server C doesn’t affect them at all. So, when we remove a server, its object keys simply get mixed up and assigned to the remaining servers, while the other keys remain exactly where they were.

Consistent Hashing - Removing a server

Adding a Server

Something similar will happen if, instead of removing a server, we add one. If we wanted to add a server D to our example (say, as a replacement for C), we would need to add labels D0 .. D9. The result would be that roughly one-third of the existing keys (all belonging to A or B) would be reassigned to D, and, again, the rest would stay the same.
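A quick experiment makes this concrete. Under the same illustrative setup as before (10 virtual nodes per server, invented helper names), adding a fourth server D moves only the keys D takes over, roughly a quarter of them, and every moved key ends up on D.

```python
# Adding one server to a 3-server ring remaps only ~1/N of the keys.
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(servers, vnodes=10):
    ring = sorted((_hash(f"{s}{i}"), s) for s in servers for i in range(vnodes))
    return ring, [p for p, _ in ring]

def server_for(ring, positions, key):
    idx = bisect.bisect(positions, _hash(key)) % len(ring)
    return ring[idx][1]

keys = [f"key:{i}" for i in range(2000)]

old_ring, old_pos = build_ring(["A", "B", "C"])
new_ring, new_pos = build_ring(["A", "B", "C", "D"])

moved = sum(
    1 for k in keys
    if server_for(old_ring, old_pos, k) != server_for(new_ring, new_pos, k)
)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly a quarter
```

Since A, B, and C keep their positions on the ring, a key can only change owner if one of D’s labels landed between the key and its old server, so every relocated key is now on D.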

Time Complexity

In general, only k/N keys need to be remapped, where k is the number of keys and N is the number of servers (more specifically, the maximum of the initial and final number of servers).

Operation       Classic hash table   Consistent hashing
add a node      O(K)                 O(K/N + log N)
remove a node   O(K)                 O(K/N + log N)
add a key       O(1)                 O(log N)
remove a key    O(1)                 O(log N)

The O(K/N) is an average cost for the redistribution of keys and the O(log N) complexity for consistent hashing comes from the fact that a binary search among nodes is required to find the next node on the ring.

Advantages of Consistent Hashing

  1. Load Balancing: Consistent hashing spreads keys evenly across nodes, preventing any single node from becoming a hotspot.
  2. Scalability: Consistent hashing is extremely scalable with little to no influence on the performance of the entire system.
  3. Minimal Remapping: Consistent hashing reduces the number of keys that must be remapped when a node is added or removed.
  4. Increased Failure Tolerance: When a node fails, only the keys it owned need to be reassigned to its neighbors, so the rest of the data remains accessible.
  5. Simplified Operations: The act of adding or removing nodes from the network is made easier by consistent hashing, which makes it simpler to administer and maintain a sizable and intricate distributed system.

Disadvantages of Consistent Hashing

  1. Hash Function Complexity: The hash function must be deterministic and distribute keys uniformly in order to be useful. The complexity of the hash function can affect the system’s overall effectiveness and efficiency.
  2. Performance Cost: The computing resources needed to map keys to nodes, replicate keys, and remap keys in the event of node additions or removals can result in some performance overhead when using consistent hashing.
  3. Lack of Flexibility: In some circumstances, the system’s ability to adapt to changing requirements or shifting network conditions may be constrained by the rigid limits of consistent hashing. 
  4. High Resource Use: As nodes are added to or deleted from the network, consistent hashing may occasionally result in high resource utilization.
  5. The complexity of Management: Managing and maintaining a distributed system that uses consistent hashing can be difficult and demanding, and it often calls for particular expertise and abilities.
