
Load Balancing Strategies for System Design

March 3, 2026 · Updated March 3, 2026 · 9 min read

How to pick the right load balancing algorithm, where to place balancers, and when a single load balancer becomes the bottleneck.

Definition

Load balancing distributes incoming network traffic across multiple servers so that no single machine becomes a bottleneck, improving throughput, latency, and fault tolerance.
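Round robin is the simplest form of this distribution. A minimal sketch (the backend addresses are placeholders; in practice the pool would come from service discovery):

```python
from itertools import cycle

# Placeholder backend pool; real addresses would come from config or discovery.
backends = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
pool = cycle(backends)

def next_backend() -> str:
    """Return the next backend in round-robin order."""
    return next(pool)
```

Each call advances the cursor, so requests spread evenly as long as every request costs roughly the same to serve.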

Implementation Checklist

  • Decide between Layer 4 (TCP/UDP) and Layer 7 (HTTP) balancing based on whether you need content-aware routing.
  • Enable active health checks with a dedicated /health endpoint rather than relying on passive failure detection.
  • Use weighted round robin when backend instances have different capacities rather than assuming uniform hardware.
  • Plan for load balancer redundancy. A single LB is a single point of failure; pair it with DNS failover or a floating IP.
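The weighted round robin item above can be sketched with a smooth weighting scheme (server names and weights are illustrative):

```python
# Smooth weighted round robin: on every pick, each server's running score
# grows by its configured weight; the highest score wins and is then reduced
# by the total weight, so the capacity ratios hold over any window of picks.
weights = {"web-large": 3, "web-medium": 2, "web-small": 1}
current = {server: 0 for server in weights}

def pick() -> str:
    """Choose the next backend, honoring the 3:2:1 weight ratio."""
    total = sum(weights.values())
    for server in weights:
        current[server] += weights[server]
    best = max(current, key=current.get)
    current[best] -= total
    return best
```

Over any six consecutive picks, web-large is chosen three times, web-medium twice, and web-small once, without the bursts a naive "send N in a row" weighting produces.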

Placement Matters More Than Algorithm

Most debates about load balancing focus on the algorithm. But placement is the higher-leverage decision. A load balancer between the client and web tier handles different concerns than one between the web tier and a database pool.

At the edge, you need SSL termination and DDoS absorption. Between internal services, you need low-latency health checking and connection pooling. Design each layer independently.

Health Checks: The Silent Reliability Lever

A load balancer that sends traffic to a crashed server is worse than no load balancer at all. Configure active health checks with a short interval (5-10s) and a low failure threshold (2-3 consecutive failures).

Use deep health checks that verify the application can actually serve requests (e.g. test a DB query) rather than just checking if the port is open. Shallow checks miss application-level failures.
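A deep check might look like this sketch, with `sqlite3` standing in for whatever database driver the service actually uses:

```python
import sqlite3  # stand-in for the real database driver in this sketch

def deep_health_check(db_path: str = ":memory:") -> tuple[bool, str]:
    """Verify the app can actually serve requests by running a cheap
    query against the database, not just checking the listening port."""
    try:
        conn = sqlite3.connect(db_path, timeout=2)
        conn.execute("SELECT 1")  # the DB responding proves the full path works
        conn.close()
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)
```

Wire a check like this behind the /health endpoint so the LB's active checks exercise the same dependency path real requests do.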

Tradeoff Table

| Decision | Speed-First Option | Reliability-First Option | Recommended When |
| --- | --- | --- | --- |
| Round Robin vs Least Connections | Round robin is simple, predictable, zero state | Least connections adapts to slow backends and variable request cost | Use least connections when request processing times vary widely (e.g. mixed read/write APIs) |
| Layer 4 vs Layer 7 | L4 is faster, lower overhead, protocol-agnostic | L7 enables path-based routing, header inspection, SSL termination | Use L7 when you need sticky sessions, A/B routing, or WebSocket upgrade awareness |
| Single LB vs Global LB (Anycast/GeoDNS) | Single LB is simpler to operate and debug | Global LB routes users to the nearest region, survives regional outages | Add a global LB when you serve users on multiple continents or need regional failover |
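The first row of the table comes down to whether the balancer tracks state: once in-flight counts are maintained, a least-connections pick is a single lookup (the counts below are illustrative):

```python
# In-flight request counts per backend (illustrative numbers; a real LB
# increments a count on dispatch and decrements it on response).
active = {"app-1": 12, "app-2": 3, "app-3": 7}

def least_connections() -> str:
    """Route to the backend with the fewest in-flight requests."""
    return min(active, key=active.get)
```

This is the state round robin avoids, and it is exactly what lets least connections absorb a slow backend: a server stuck on expensive requests keeps a high count and stops receiving new ones.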

Practice Next

Load Balancing Topic Hub

Definitions, strategies, and implementation patterns for load balancing.

Load Balancing Lab

Practice configuring load balancers and observing traffic distribution in the interactive lab.

Challenges

  • RideFlow 1 - City Launch

    Design a rideshare platform where load balancing is critical for dispatching requests.

  • Cake Shop 2 - Scaling Up

    Scale from one server to many with caching and load balancing under viral traffic.


Frequently Asked Questions

When does a load balancer itself become the bottleneck?

A single reverse-proxy LB tops out around 50-100k concurrent connections depending on hardware. Beyond that, scale horizontally with DNS round robin across multiple LB instances or use a managed cloud load balancer (ALB/NLB) that auto-scales.

Should I terminate SSL at the load balancer or at the backend?

Terminate at the LB in most cases. It simplifies certificate management, reduces backend CPU, and lets you inspect traffic for L7 routing. Use end-to-end TLS only when compliance demands it.

What is the difference between sticky sessions and consistent hashing?

Sticky sessions tie a user to a specific server using a cookie. Consistent hashing maps a key (user ID, session ID) to a server deterministically. Consistent hashing is more resilient to server additions/removals; sticky sessions are simpler but cause uneven load during scale events.
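A consistent-hash ring with virtual nodes can be sketched in a few lines (a toy implementation for illustration; MD5 is used only to spread keys, not for security):

```python
import hashlib
from bisect import bisect

class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, servers: list[str], vnodes: int = 100):
        # Each server gets `vnodes` points on the ring to smooth out load.
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key: str) -> str:
        """Map a key to the first server point clockwise on the ring."""
        idx = bisect(self.points, self._hash(key)) % len(self.points)
        return self.ring[idx][1]
```

Removing a server only remaps the keys it owned; every other key still lands on the same server, which is the resilience to additions and removals described above.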