Load Balancing

A load balancer distributes incoming network traffic across multiple servers to ensure no single server bears too much load. It is the critical component that enables horizontal scaling, provides redundancy, and keeps systems responsive under high traffic.

Why Load Balancing?

  • Horizontal Scaling: Add more servers to handle more traffic, rather than upgrading a single server.
  • High Availability: If one server fails, the load balancer routes new traffic to the remaining healthy servers. Most users never notice the failure, although requests already in flight on the failed server may be lost.
  • Performance: Distribute requests evenly to prevent any single server from becoming a bottleneck.

[Figure: Load Balancer Distributing Traffic. Clients A, B, and C send requests to a load balancer, which forwards them to Servers 1, 2, and 3.]

Layer 4 vs. Layer 7 Load Balancing

Layer 4 (Transport Layer)

  • Operates on TCP/UDP packets.
  • Routes based on IP address and port number.
  • Cannot inspect HTTP content (URLs, headers, cookies).
  • Very fast: makes decisions without parsing application data.
  • Example: AWS Network Load Balancer.

Layer 7 (Application Layer)

  • Operates on HTTP/HTTPS requests.
  • Can route based on URL path, host header, cookies, query parameters.
  • Can modify headers, terminate SSL, compress responses.
  • More flexible, at the cost of somewhat higher latency and CPU use, because each request must be parsed.
  • Example: AWS Application Load Balancer, NGINX, HAProxy.
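
The content-based routing that Layer 7 makes possible can be sketched as a small decision function. This is a minimal illustration, not any product's configuration: the host name, path prefixes, and pool names below are all hypothetical.

```python
# Hypothetical routing table: URL path prefix -> backend pool name.
PATH_ROUTES = [
    ("/api/", "api-pool"),
    ("/static/", "static-pool"),
]
DEFAULT_POOL = "web-pool"


def choose_pool(host, path):
    """Pick a backend pool from the Host header and the URL path."""
    # Host-based rule is checked first (hypothetical admin host).
    if host == "admin.example.com":
        return "admin-pool"
    # Then the first matching path prefix wins.
    for prefix, pool in PATH_ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


print(choose_pool("www.example.com", "/api/users"))    # api-pool
print(choose_pool("admin.example.com", "/api/users"))  # admin-pool
print(choose_pool("www.example.com", "/index.html"))   # web-pool
```

A Layer 4 balancer could not make any of these decisions, because the host header and path are only visible after the HTTP request has been parsed.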

Load Balancing Algorithms

Round Robin

Distributes requests sequentially across servers: Server 1, Server 2, Server 3, Server 1, Server 2, ... Simple and effective when all servers have equal capacity and all requests have similar processing cost.
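
As a minimal sketch of the rotation described above, the class below cycles through a fixed server list. The class name and server names are illustrative assumptions, and a real balancer would also need to skip unhealthy servers.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed, repeating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(list(servers))

    def next_server(self):
        return next(self._rotation)


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
picks = [balancer.next_server() for _ in range(5)]
print(picks)  # ['server-1', 'server-2', 'server-3', 'server-1', 'server-2']
```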

Weighted Round Robin

Like round robin, but servers with higher weight receive proportionally more requests. Use when servers have different capacities (e.g., a 16-core server gets weight 4, an 8-core server gets weight 2).
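
One simple way to sketch this, assuming integer weights, is to repeat each server in the rotation as many times as its weight. The function and server names are hypothetical. This naive version sends requests to a heavy server in bursts; production balancers typically interleave the picks more smoothly.

```python
from itertools import cycle


def weighted_rotation(weights):
    """Return an endless iterator in which each server appears `weight` times per cycle."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    if not expanded:
        raise ValueError("at least one server with a positive weight is required")
    return cycle(expanded)


# Weights follow the example in the text: 16 cores -> 4, 8 cores -> 2.
rotation = weighted_rotation({"big-16-core": 4, "small-8-core": 2})
picks = [next(rotation) for _ in range(12)]  # two full cycles of six picks
print(picks.count("big-16-core"), picks.count("small-8-core"))  # 8 4
```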

Least Connections

Routes each new request to the server with the fewest active connections. Better than round robin when request processing times vary significantly: slow requests naturally cause that server to receive fewer new requests.
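
The selection rule can be sketched with a counter per server. This is an illustration under simplifying assumptions (single-threaded, hypothetical class and method names); a real balancer would update the counts as connections open and close.

```python
class LeastConnectionsBalancer:
    """Tracks active connections and picks the least-busy server."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


lb = LeastConnectionsBalancer(["a", "b", "c"])
first_three = [lb.acquire() for _ in range(3)]  # one connection on each server
lb.release("b")                                 # b finishes its request early
next_pick = lb.acquire()                        # b now has the fewest connections
print(first_three, next_pick)  # ['a', 'b', 'c'] b
```

A server stuck on slow requests keeps its count high, so it is skipped until those requests finish.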

Least Response Time

Routes to the server with the lowest average response time and fewest active connections. Combines load awareness with performance awareness.
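
The text does not say how the two signals are combined, so the sketch below makes an assumption: it scores each server as its smoothed average response time multiplied by (active connections + 1) and picks the lowest score. The smoothing factor, class name, and server names are likewise illustrative.

```python
class LeastResponseTimeBalancer:
    """Picks the server with the lowest latency-times-load score (assumed formula)."""

    def __init__(self, servers, smoothing=0.2):
        self.smoothing = smoothing
        self.avg_ms = {server: 0.0 for server in servers}
        self.active = {server: 0 for server in servers}

    def _score(self, server):
        return self.avg_ms[server] * (self.active[server] + 1)

    def acquire(self):
        server = min(self.avg_ms, key=self._score)
        self.active[server] += 1
        return server

    def release(self, server, elapsed_ms):
        self.active[server] -= 1
        old = self.avg_ms[server]
        a = self.smoothing
        # The first sample seeds the average; later samples are smoothed in.
        self.avg_ms[server] = elapsed_ms if old == 0.0 else (1 - a) * old + a * elapsed_ms


lb = LeastResponseTimeBalancer(["fast", "slow"])
lb.release(lb.acquire(), elapsed_ms=20)   # first request lands on "fast"
lb.release(lb.acquire(), elapsed_ms=200)  # second lands on the untried "slow"
picks = [lb.acquire() for _ in range(3)]
print(picks)  # ['fast', 'fast', 'fast']
```

As "fast" accumulates active connections its score rises, so under this formula traffic eventually spills over to "slow" instead of piling onto one server.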

IP Hash

Hashes the client's IP address to determine which server receives the request. As long as the server pool and the client's IP stay the same, the same client always reaches the same server (useful for session affinity without sticky cookies). Limitation: distribution is uneven if IP ranges are skewed, for example when many clients share one address behind a NAT.
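
A minimal sketch of the idea, assuming a stable hash (SHA-256 here) reduced modulo the pool size. The function name and addresses are illustrative.

```python
import hashlib


def pick_server(client_ip, servers):
    """Map a client IP to a server with a stable hash, modulo the pool size."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["server-1", "server-2", "server-3"]
first = pick_server("203.0.113.7", servers)
second = pick_server("203.0.113.7", servers)
print(first == second)  # True: the same client IP always maps to the same server
```

Because the index depends on `len(servers)`, adding or removing a server remaps most clients under this scheme.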

Consistent Hashing

Distributes requests using a hash ring. When a server is added or removed, only a fraction of requests are redistributed. Covered fully in Chapter 11.
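
As a short preview, a hash ring can be sketched with a sorted list of points and a binary search. The use of virtual nodes (`replicas`), the SHA-256 hash, and all names here are assumptions for illustration, not a definitive implementation.

```python
import bisect
import hashlib


def _hash(key):
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual nodes per server (assumed value)
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> server
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server):
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def lookup(self, key):
        # First point clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["server-1", "server-2", "server-3"])
keys = [f"client-{i}" for i in range(1000)]
before = {key: ring.lookup(key) for key in keys}
ring.add("server-4")
after = {key: ring.lookup(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new server")
```

Only keys that now fall just before one of the new server's points change owner; every other key keeps its original server.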

Algorithm             | Best For                         | Weakness
Round Robin           | Equal servers, uniform requests  | Ignores server load and request cost
Weighted Round Robin  | Heterogeneous server capacities  | Weights are static; does not adapt
Least Connections     | Variable request durations       | Does not account for server capacity
Least Response Time   | Latency-sensitive applications   | Requires response time tracking
IP Hash               | Session affinity needs           | Uneven distribution if IPs are clustered
Consistent Hashing    | Dynamic server pool              | More complex to implement

Health Checks

A load balancer must know which servers are healthy. It performs periodic health checks:

  • Active Health Checks: The load balancer sends a probe (HTTP GET to /health, TCP connect, or custom script) at regular intervals. If a server fails N consecutive checks, it is removed from the pool.
  • Passive Health Checks: The load balancer monitors real traffic. If a server returns too many 5xx errors or connection timeouts, it is marked unhealthy.

Most production setups use both. Active checks detect servers that are completely down. Passive checks detect servers that are degraded but still accepting connections.
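
The "N consecutive failures" rule can be sketched as a small state tracker that is fed check results. The failure threshold comes from the text; the recovery threshold (`rise_threshold`) and all names are assumptions added for illustration. The same `record` call could be driven by active probes or by passively observed errors.

```python
class HealthTracker:
    """Flips a server's state after enough consecutive contrary results."""

    def __init__(self, servers, fail_threshold=3, rise_threshold=2):
        self.fail_threshold = fail_threshold  # N consecutive failures -> unhealthy
        self.rise_threshold = rise_threshold  # assumed: successes needed to recover
        self.healthy = {server: True for server in servers}
        self._streak = {server: 0 for server in servers}

    def record(self, server, ok):
        if ok == self.healthy[server]:
            self._streak[server] = 0  # result agrees with current state
            return
        self._streak[server] += 1
        limit = self.rise_threshold if ok else self.fail_threshold
        if self._streak[server] >= limit:
            self.healthy[server] = ok
            self._streak[server] = 0

    def pool(self):
        return [server for server, up in self.healthy.items() if up]


tracker = HealthTracker(["a", "b"])
for ok in [False, False, True, False, False, False]:
    tracker.record("b", ok)  # the lone success resets the failure streak
pool_after_failures = tracker.pool()
for ok in [True, True]:
    tracker.record("b", ok)
pool_after_recovery = tracker.pool()
print(pool_after_failures, pool_after_recovery)  # ['a'] ['a', 'b']
```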

SSL/TLS Termination

TLS encryption and decryption add CPU overhead, most of it during handshakes. SSL termination at the load balancer means:

  • The load balancer handles TLS handshakes and decryption.
  • Traffic between the load balancer and backend servers is unencrypted (HTTP) or re-encrypted.
  • Backend servers are freed from cryptographic overhead.
  • Certificate management is centralized.
Security Note
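If traffic between the load balancer and backend servers traverses an untrusted network, keep it encrypted: either decrypt and re-encrypt at the load balancer (SSL bridging) or forward the encrypted connection untouched to the backend (SSL passthrough, which gives up Layer 7 inspection). Within a trusted datacenter network, plaintext between LB and backend is common and often acceptable.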

High Availability for the Load Balancer

The load balancer itself must not be a single point of failure. Solutions:

  • Active-Passive: A standby load balancer monitors the primary. If the primary fails, the standby takes over the virtual IP address (using VRRP or similar protocols).
  • Active-Active: Multiple load balancers share the traffic. DNS returns multiple IPs (across LBs), or BGP anycast routes to the nearest one.
  • Managed Load Balancers: Cloud providers (AWS ALB/NLB, GCP Load Balancer, Azure Load Balancer) handle HA internally. They typically run across multiple availability zones.
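
The active-passive takeover can be sketched as a heartbeat timeout on the standby. This is a simplified illustration and not VRRP itself: the class name, the timeout value, and the `owns_vip` flag are assumptions, and timestamps are passed in explicitly to keep the example deterministic.

```python
class StandbyBalancer:
    """Takes over the virtual IP when the primary's heartbeats stop."""

    def __init__(self, timeout_s=3.0, started_at=0.0):
        self.timeout_s = timeout_s
        self.last_heartbeat = started_at
        self.owns_vip = False

    def on_heartbeat(self, now):
        self.last_heartbeat = now

    def tick(self, now):
        silent_for = now - self.last_heartbeat
        if not self.owns_vip and silent_for > self.timeout_s:
            # A real standby would now claim the virtual IP on the network.
            self.owns_vip = True
        return self.owns_vip


standby = StandbyBalancer(timeout_s=3.0)
standby.on_heartbeat(now=1.0)
standby.on_heartbeat(now=2.0)
early = standby.tick(now=4.0)  # 2.0 s of silence: primary still trusted
late = standby.tick(now=5.5)   # 3.5 s of silence: standby takes over
print(early, late)  # False True
```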

Global Server Load Balancing (GSLB)

For multi-region deployments, GSLB distributes traffic across data centers worldwide. It uses DNS-based routing to direct users to the nearest or least-loaded data center.

  • Geo-based routing: Route users to the geographically closest data center.
  • Latency-based routing: Route to the data center with the lowest measured latency.
  • Failover routing: Route to a backup data center if the primary is unhealthy.
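
Latency-based routing with failover can be sketched as "pick the lowest-latency data center among the healthy ones". The region names and latency figures below are made-up inputs; a real GSLB would measure them and return the answer in a DNS response.

```python
def pick_datacenter(latency_ms, healthy):
    """Return the healthy data center with the lowest measured latency."""
    candidates = {dc: ms for dc, ms in latency_ms.items() if dc in healthy}
    if not candidates:
        raise RuntimeError("no healthy data center available")
    return min(candidates, key=candidates.get)


# Hypothetical latency measurements from one user's vantage point.
latency_ms = {"us-east": 20, "eu-west": 95, "ap-south": 210}

normal = pick_datacenter(latency_ms, healthy={"us-east", "eu-west", "ap-south"})
failover = pick_datacenter(latency_ms, healthy={"eu-west", "ap-south"})
print(normal, failover)  # us-east eu-west
```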

Key Takeaways

  • Load balancers are essential for horizontal scaling and high availability.
  • Use Layer 4 for speed and simplicity; Layer 7 when you need content-based routing.
  • Least Connections is generally superior to Round Robin for real-world workloads with variable request durations.
  • Always implement health checks: both active and passive.
  • The load balancer itself must be highly available (active-passive, active-active, or managed).
  • For global presence, use GSLB (DNS-level routing) to direct users to the nearest data center.
