Why Load Balancing?
- Horizontal Scaling: Add more servers to handle more traffic, rather than upgrading a single server.
- High Availability: If one server fails, the load balancer routes traffic to the remaining healthy servers. Ideally, users never notice the failure.
- Performance: Distribute requests evenly to prevent any single server from becoming a bottleneck.
Layer 4 vs. Layer 7 Load Balancing
Layer 4 (Transport Layer)
- Operates on TCP/UDP packets.
- Routes based on IP address and port number.
- Cannot inspect HTTP content (URLs, headers, cookies).
- Very fast: makes decisions without parsing application data.
- Example: AWS Network Load Balancer.
Layer 7 (Application Layer)
- Operates on HTTP/HTTPS requests.
- Can route based on URL path, host header, cookies, query parameters.
- Can modify headers, terminate SSL, compress responses.
- More flexible but slightly higher latency.
- Examples: AWS Application Load Balancer, NGINX, HAProxy.
Load Balancing Algorithms
Round Robin
Distributes requests sequentially across servers: Server 1, Server 2, Server 3, Server 1, Server 2, ... Simple and effective when all servers have equal capacity and all requests have similar processing cost.
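The rotation above can be sketched in a few lines of Python; the server addresses are placeholders:

```python
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backend IPs
rotation = cycle(servers)

def next_server():
    """Return the next backend in strict rotation, wrapping around."""
    return next(rotation)
```

Four consecutive calls yield 10.0.0.1, 10.0.0.2, 10.0.0.3, then wrap back to 10.0.0.1.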
Weighted Round Robin
Like round robin, but servers with higher weight receive proportionally more requests. Use when servers have different capacities (e.g., a 16-core server gets weight 4, an 8-core server gets weight 2).
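One simple way to realize weights is to repeat each server in the rotation list in proportion to its weight. A minimal sketch with illustrative server names:

```python
from itertools import cycle

# Hypothetical pool: a 16-core box at weight 4, an 8-core box at weight 2.
weights = {"big-box": 4, "small-box": 2}

# Expand the rotation so each server appears once per unit of weight.
rotation = cycle([s for s, w in weights.items() for _ in range(w)])

def next_server():
    """Return the next backend; big-box gets 2x the traffic of small-box."""
    return next(rotation)
```

Production balancers typically interleave the picks (smooth weighted round robin) rather than emitting them in runs, but the long-run proportions are the same.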
Least Connections
Routes each new request to the server with the fewest active connections. Better than round robin when request processing times vary significantly: a server tied up with slow requests accumulates connections and therefore naturally receives fewer new ones.
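The core bookkeeping is a per-server connection counter; a minimal sketch (server names are placeholders):

```python
# Active connection count per backend (hypothetical names).
active = {"s1": 0, "s2": 0, "s3": 0}

def acquire():
    """Route a new request to the server with the fewest active connections."""
    server = min(active, key=active.get)  # ties resolve to the first server
    active[server] += 1
    return server

def release(server):
    """Call when a request completes, freeing one connection slot."""
    active[server] -= 1
```

A real balancer would also guard these counters with a lock or keep them in shared state, since requests arrive concurrently.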
Least Response Time
Routes to the server with the lowest average response time and fewest active connections. Combines load awareness with performance awareness.
IP Hash
Hashes the client's IP address to determine which server receives the request. Ensures the same client always reaches the same server (useful for session affinity without sticky cookies). Limitation: uneven distribution if IP ranges are skewed.
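The mapping is just a stable hash modulo the pool size; a sketch with placeholder server names:

```python
import hashlib

servers = ["s1", "s2", "s3"]  # hypothetical backends

def server_for(client_ip: str) -> str:
    """Map a client IP to a fixed backend via a stable hash."""
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]
```

Note that the modulo step is what makes plain IP hash fragile: changing the pool size remaps almost every client, which is exactly the problem consistent hashing addresses.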
Consistent Hashing
Distributes requests using a hash ring. When a server is added or removed, only a fraction of requests are redistributed. Covered fully in Chapter 11.
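As a preview of the hash ring idea, here is a minimal sketch without virtual nodes (production implementations place many replicas of each server on the ring to even out the distribution):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: each key maps to the next server
    clockwise from its hash position."""

    def __init__(self, servers):
        self.ring = sorted((_hash(s), s) for s in servers)
        self._keys = [h for h, _ in self.ring]

    def server_for(self, key: str) -> str:
        i = bisect.bisect(self._keys, _hash(key)) % len(self.ring)
        return self.ring[i][1]
```

Removing a server only remaps the keys that pointed at it; every other key keeps its assignment, which is the property that distinguishes this from plain modulo hashing.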
| Algorithm | Best For | Weakness |
|---|---|---|
| Round Robin | Equal servers, uniform requests | Ignores server load and request cost |
| Weighted Round Robin | Heterogeneous server capacities | Weights are static; does not adapt |
| Least Connections | Variable request durations | Does not account for server capacity |
| Least Response Time | Latency-sensitive applications | Requires response time tracking |
| IP Hash | Session affinity needs | Uneven distribution if IPs are clustered |
| Consistent Hashing | Dynamic server pool | More complex to implement |
Health Checks
A load balancer must know which servers are healthy. It performs periodic health checks:
- Active Health Checks: The load balancer sends a probe (HTTP GET to /health, TCP connect, or custom script) at regular intervals. If a server fails N consecutive checks, it is removed from the pool.
- Passive Health Checks: The load balancer monitors real traffic. If a server returns too many 5xx errors or connection timeouts, it is marked unhealthy.
Most production setups use both. Active checks detect servers that are completely down. Passive checks detect servers that are degraded but still accepting connections.
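An active checker boils down to a probe loop plus a consecutive-failure counter; a hedged sketch (the /health path, threshold, and timeout are illustrative):

```python
import urllib.request

FAIL_THRESHOLD = 3  # consecutive failures before eviction (assumed value)

class HealthChecker:
    """Tracks per-server probe results and evicts after N consecutive failures."""

    def __init__(self, servers):
        self.failures = {s: 0 for s in servers}
        self.healthy = set(servers)

    def record(self, server, ok):
        """Update state from one probe result; a success restores the server."""
        if ok:
            self.failures[server] = 0
            self.healthy.add(server)
        else:
            self.failures[server] += 1
            if self.failures[server] >= FAIL_THRESHOLD:
                self.healthy.discard(server)

    def probe(self, server):
        """One active check: HTTP GET to the server's /health endpoint."""
        try:
            with urllib.request.urlopen(f"http://{server}/health", timeout=2) as r:
                self.record(server, r.status == 200)
        except Exception:
            self.record(server, False)
```

A real checker would run `probe` on a timer for every server and would typically also require a few consecutive successes before returning an evicted server to the pool, to avoid flapping.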
SSL/TLS Termination
TLS encryption and decryption are CPU-intensive, with the handshake being the most expensive part. SSL/TLS termination at the load balancer means:
- The load balancer handles TLS handshakes and decryption.
- Traffic between the load balancer and backend servers is unencrypted (HTTP) or re-encrypted.
- Backend servers are freed from cryptographic overhead.
- Certificate management is centralized.
High Availability for the Load Balancer
The load balancer itself must not be a single point of failure. Solutions:
- Active-Passive: A standby load balancer monitors the primary. If the primary fails, the standby takes over the virtual IP address (using VRRP or similar protocols).
- Active-Active: Multiple load balancers share the traffic. DNS returns multiple IPs (across LBs), or BGP anycast routes to the nearest one.
- Managed Load Balancers: Cloud providers (AWS ALB/NLB, GCP Load Balancer, Azure Load Balancer) handle HA internally. They are distributed across availability zones by default.
Global Server Load Balancing (GSLB)
For multi-region deployments, GSLB distributes traffic across data centers worldwide. It uses DNS-based routing to direct users to the nearest or least-loaded data center.
- Geo-based routing: Route users to the geographically closest data center.
- Latency-based routing: Route to the data center with the lowest measured latency.
- Failover routing: Route to a backup data center if the primary is unhealthy.
Key Takeaways
- Load balancers are essential for horizontal scaling and high availability.
- Use Layer 4 for speed and simplicity; Layer 7 when you need content-based routing.
- Least Connections is generally superior to Round Robin for real-world workloads with variable request durations.
- Always implement health checks: both active and passive.
- The load balancer itself must be highly available (active-passive, active-active, or managed).
- For global presence, use GSLB (DNS-level routing) to direct users to the nearest data center.
Chapter Check-Up
Quick quiz to reinforce what you just learned.