Why Load Balancing?
- Horizontal Scaling: Add more servers to handle more traffic, rather than upgrading a single server.
- High Availability: If one server fails, the load balancer routes traffic to the remaining healthy servers. Ideally, users never notice the failure.
- Performance: Distribute requests evenly to prevent any single server from becoming a bottleneck.
Layer 4 vs. Layer 7 Load Balancing
Layer 4 (Transport Layer)
- Operates on TCP/UDP packets.
- Routes based on IP address and port number.
- Cannot inspect HTTP content (URLs, headers, cookies).
- Very fast: makes decisions without parsing application data.
- Example: AWS Network Load Balancer.
Layer 7 (Application Layer)
- Operates on HTTP/HTTPS requests.
- Can route based on URL path, host header, cookies, query parameters.
- Can modify headers, terminate SSL, compress responses.
- More flexible but slightly higher latency.
- Examples: AWS Application Load Balancer, NGINX, HAProxy.
Load Balancing Algorithms
Round Robin
Distributes requests sequentially across servers: Server 1, Server 2, Server 3, Server 1, Server 2, ... Simple and effective when all servers have equal capacity and all requests have similar processing cost.
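The rotation above can be sketched in a few lines of Python; the server addresses are placeholders:

```python
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backend IPs
rotation = cycle(servers)

def next_server():
    """Return the next backend in strict rotation, wrapping around."""
    return next(rotation)
```

Four consecutive calls yield 10.0.0.1, 10.0.0.2, 10.0.0.3, then wrap back to 10.0.0.1.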
Weighted Round Robin
Like round robin, but servers with higher weight receive proportionally more requests. Use when servers have different capacities (e.g., a 16-core server gets weight 4, an 8-core server gets weight 2).
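One simple way to realize weights is to repeat each server in the rotation list in proportion to its weight. A minimal sketch with illustrative server names:

```python
from itertools import cycle

# Hypothetical pool: a 16-core box at weight 4, an 8-core box at weight 2.
weights = {"big-box": 4, "small-box": 2}

# Expand the rotation so each server appears once per unit of weight.
rotation = cycle([s for s, w in weights.items() for _ in range(w)])

def next_server():
    """Return the next backend; big-box gets 2x the traffic of small-box."""
    return next(rotation)
```

Production balancers typically interleave the picks (smooth weighted round robin) rather than emitting them in runs, but the long-run proportions are the same.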
Least Connections
Routes each new request to the server with the fewest active connections. Better than round robin when request processing times vary significantly: a server tied up with slow requests accumulates connections and therefore naturally receives fewer new ones.
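The core bookkeeping is a per-server connection counter; a minimal sketch (server names are placeholders):

```python
# Active connection count per backend (hypothetical names).
active = {"s1": 0, "s2": 0, "s3": 0}

def acquire():
    """Route a new request to the server with the fewest active connections."""
    server = min(active, key=active.get)  # ties resolve to the first server
    active[server] += 1
    return server

def release(server):
    """Call when a request completes, freeing one connection slot."""
    active[server] -= 1
```

A real balancer would also guard these counters with a lock or keep them in shared state, since requests arrive concurrently.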
Least Response Time
Routes to the server with the lowest average response time and fewest active connections. Combines load awareness with performance awareness.
IP Hash
Hashes the client's IP address to determine which server receives the request. Ensures the same client always reaches the same server (useful for session affinity without sticky cookies). Limitation: uneven distribution if IP ranges are skewed.
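The mapping is just a stable hash modulo the pool size; a sketch with placeholder server names:

```python
import hashlib

servers = ["s1", "s2", "s3"]  # hypothetical backends

def server_for(client_ip: str) -> str:
    """Map a client IP to a fixed backend via a stable hash."""
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]
```

Note that the modulo step is what makes plain IP hash fragile: changing the pool size remaps almost every client, which is exactly the problem consistent hashing addresses.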
Consistent Hashing
Distributes requests using a hash ring. When a server is added or removed, only a fraction of requests are redistributed. Covered fully in Chapter 11.
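As a preview of the hash ring idea, here is a minimal sketch without virtual nodes (production implementations place many replicas of each server on the ring to even out the distribution):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: each key maps to the next server
    clockwise from its hash position."""

    def __init__(self, servers):
        self.ring = sorted((_hash(s), s) for s in servers)
        self._keys = [h for h, _ in self.ring]

    def server_for(self, key: str) -> str:
        i = bisect.bisect(self._keys, _hash(key)) % len(self.ring)
        return self.ring[i][1]
```

Removing a server only remaps the keys that pointed at it; every other key keeps its assignment, which is the property that distinguishes this from plain modulo hashing.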
| Algorithm | Best For | Weakness |
|---|---|---|
| Round Robin | Equal servers, uniform requests | Ignores server load and request cost |
| Weighted Round Robin | Heterogeneous server capacities | Weights are static; does not adapt |
| Least Connections | Variable request durations | Does not account for server capacity |
| Least Response Time | Latency-sensitive applications | Requires response time tracking |
| IP Hash | Session affinity needs | Uneven distribution if IPs are clustered |
| Consistent Hashing | Dynamic server pool | More complex to implement |
Health Checks
A load balancer must know which servers are healthy. It performs periodic health checks:
- Active Health Checks: The load balancer sends a probe (HTTP GET to /health, TCP connect, or custom script) at regular intervals. If a server fails N consecutive checks, it is removed from the pool.
- Passive Health Checks: The load balancer monitors real traffic. If a server returns too many 5xx errors or connection timeouts, it is marked unhealthy.
Most production setups use both. Active checks detect servers that are completely down. Passive checks detect servers that are degraded but still accepting connections.
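An active checker boils down to a probe loop plus a consecutive-failure counter; a hedged sketch (the /health path, threshold, and timeout are illustrative):

```python
import urllib.request

FAIL_THRESHOLD = 3  # consecutive failures before eviction (assumed value)

class HealthChecker:
    """Tracks per-server probe results and evicts after N consecutive failures."""

    def __init__(self, servers):
        self.failures = {s: 0 for s in servers}
        self.healthy = set(servers)

    def record(self, server, ok):
        """Update state from one probe result; a success restores the server."""
        if ok:
            self.failures[server] = 0
            self.healthy.add(server)
        else:
            self.failures[server] += 1
            if self.failures[server] >= FAIL_THRESHOLD:
                self.healthy.discard(server)

    def probe(self, server):
        """One active check: HTTP GET to the server's /health endpoint."""
        try:
            with urllib.request.urlopen(f"http://{server}/health", timeout=2) as r:
                self.record(server, r.status == 200)
        except Exception:
            self.record(server, False)
```

A real checker would run `probe` on a timer for every server and would typically also require a few consecutive successes before returning an evicted server to the pool, to avoid flapping.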
SSL/TLS Termination
TLS encryption and decryption are CPU-intensive, with the handshake being the most expensive part. SSL/TLS termination at the load balancer means:
- The load balancer handles TLS handshakes and decryption.
- Traffic between the load balancer and backend servers is unencrypted (HTTP) or re-encrypted.
- Backend servers are freed from cryptographic overhead.
- Certificate management is centralized.
High Availability for the Load Balancer
The load balancer itself must not be a single point of failure. Solutions:
- Active-Passive: A standby load balancer monitors the primary. If the primary fails, the standby takes over the virtual IP address (using VRRP or similar protocols).
- Active-Active: Multiple load balancers share the traffic. DNS returns multiple IPs (across LBs), or BGP anycast routes to the nearest one.
- Managed Load Balancers: Cloud providers (AWS ALB/NLB, GCP Load Balancer, Azure Load Balancer) handle HA internally. They are distributed across availability zones by default.
Global Server Load Balancing (GSLB)
For multi-region deployments, GSLB distributes traffic across data centers worldwide. It uses DNS-based routing to direct users to the nearest or least-loaded data center.
- Geo-based routing: Route users to the geographically closest data center.
- Latency-based routing: Route to the data center with the lowest measured latency.
- Failover routing: Route to a backup data center if the primary is unhealthy.
Key Takeaways
- Load balancers are essential for horizontal scaling and high availability.
- Use Layer 4 for speed and simplicity; Layer 7 when you need content-based routing.
- Least Connections is generally superior to Round Robin for real-world workloads with variable request durations.
- Always implement health checks: both active and passive.
- The load balancer itself must be highly available (active-passive, active-active, or managed).
- For global presence, use GSLB (DNS-level routing) to direct users to the nearest data center.
Chapter Check-Up
Quick quiz to reinforce what you just learned.