Guided Lab Brief

Load Balancing & Horizontal Scaling

Add a load balancer to distribute traffic across multiple API servers and handle traffic spikes.

Overview

A single server has a limit: once traffic exceeds its capacity, requests start failing, and if that one server crashes the entire app goes down with it. In this lab you will put a load balancer in front of multiple API servers so traffic is distributed evenly and spikes are absorbed.

You will build 4 architecture steps that model production dependencies.

You will run 2 failure experiments to observe bottlenecks and recovery behavior.

Success target: Load distributed evenly across 4 servers, spike handled without errors, 99%+ success rate.
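The "distributed evenly" part of the success target can be sanity-checked with a quick simulation. This is a minimal sketch, not part of the lab environment: the server names and request count are hypothetical, and a pure round-robin picker stands in for the real load balancer.

```python
# Sketch: distribute requests round-robin across 4 servers and
# confirm each server receives an equal share of the traffic.
from collections import Counter

SERVERS = ["api-1", "api-2", "api-3", "api-4"]  # hypothetical names
TOTAL_REQUESTS = 1000  # matches the 1000 rps figure used in the brief

counts = Counter()
for i in range(TOTAL_REQUESTS):
    counts[SERVERS[i % len(SERVERS)]] += 1  # round-robin selection

for server, n in sorted(counts.items()):
    print(server, n)  # each server handles exactly 250 requests
```

With pure round-robin and a request count divisible by the server count, each instance gets exactly one quarter of the load, which is the "even distribution" the success target asks you to observe.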

Learning Objectives

  • Understand horizontal scaling: add more servers behind a load balancer
  • Know common load balancing algorithms (round-robin, least connections)
  • Learn how health checks detect failed servers
  • Experience the difference between 1 and 4 instances during failures
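The two algorithms named above differ only in how the next server is chosen. The sketch below is illustrative, not the lab's implementation: round-robin cycles through servers in a fixed order, while least-connections tracks active requests and picks the least-loaded server.

```python
# Sketch of the two load balancing algorithms named in the objectives.
from itertools import cycle

class RoundRobin:
    """Pick servers in a fixed rotating order, ignoring load."""
    def __init__(self, servers):
        self._it = cycle(servers)

    def pick(self):
        return next(self._it)

class LeastConnections:
    """Pick the server with the fewest in-flight requests."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # request is now in flight
        return server

    def release(self, server):
        self.active[server] -= 1  # request finished
```

Round-robin is simplest and works well when requests are uniform; least-connections adapts when some requests are much slower than others, since a server stuck on long requests stops receiving new ones.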

Experiments

  1. Reduce API server instances to 1 to see what happens without horizontal scaling
  2. Set health check interval to 60 seconds to see slow failure detection

Failure Modes to Trigger

  • Trigger: Reduce API server instances to 1 to see what happens without horizontal scaling

    Observe: 1 server at 1000 rps is at only 20% of its 5000 rps capacity, so it works. But if that one server crashes, your entire app goes down: zero redundancy. Deployments would also mean downtime.

  • Trigger: Set health check interval to 60 seconds to see slow failure detection

    Observe: If a server crashes, the load balancer keeps sending traffic to it for up to 60 seconds before the next health check runs. That is up to 60 seconds of failing requests, and users see errors. At 250 rps to that server, that's up to 15,000 failed requests.
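The worst-case damage from a slow health check is simple arithmetic: traffic keeps flowing to a dead server for up to one full check interval. A minimal sketch, using the figures from the brief:

```python
# Worst-case failed requests when a server dies just after a health
# check passes: traffic flows to it until the next check runs.
def failed_requests(check_interval_s: int, rps_per_server: int) -> int:
    return check_interval_s * rps_per_server

# Figures from the brief: 60 s interval, 250 rps per server.
print(failed_requests(60, 250))   # 15000 failed requests
print(failed_requests(5, 250))    # a 5 s interval caps the blast radius at 1250
```

This is why the second experiment dramatically slows detection: the failure window, and therefore the error count, scales linearly with the health check interval.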