Guided Lab Brief

Load Balancing & Horizontal Scaling

Add a load balancer to distribute traffic across multiple API servers and handle traffic spikes.

Overview

A single server has a limit: once traffic exceeds its capacity, requests start failing, and if that one server crashes the entire app goes down with it. In this lab you will put a load balancer in front of multiple API servers so traffic is distributed evenly and spikes are absorbed.

You will build 4 architecture steps that model production dependencies.

You will run 2 failure experiments to observe bottlenecks and recovery behavior.

Success target: Load distributed evenly across 4 servers, spike handled without errors, 99%+ success rate.
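The "distributed evenly" part of the success target can be sanity-checked with a quick simulation. This is a minimal sketch, not part of the lab environment: the server names and request count are hypothetical, and a pure round-robin picker stands in for the real load balancer.

```python
# Sketch: distribute requests round-robin across 4 servers and
# confirm each server receives an equal share of the traffic.
from collections import Counter

SERVERS = ["api-1", "api-2", "api-3", "api-4"]  # hypothetical names
TOTAL_REQUESTS = 1000  # matches the 1000 rps figure used in the brief

counts = Counter()
for i in range(TOTAL_REQUESTS):
    counts[SERVERS[i % len(SERVERS)]] += 1  # round-robin selection

for server, n in sorted(counts.items()):
    print(server, n)  # each server handles exactly 250 requests
```

With pure round-robin and a request count divisible by the server count, each instance gets exactly one quarter of the load, which is the "even distribution" the success target asks you to observe.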

Learning Objectives

  • Understand horizontal scaling: add more servers behind a load balancer
  • Know common load balancing algorithms (round-robin, least connections)
  • Learn how health checks detect failed servers
  • Experience the difference between 1 and 4 instances during failures
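The two algorithms named above differ only in how the next server is chosen. The sketch below is illustrative, not the lab's implementation: round-robin cycles through servers in a fixed order, while least-connections tracks active requests and picks the least-loaded server.

```python
# Sketch of the two load balancing algorithms named in the objectives.
from itertools import cycle

class RoundRobin:
    """Pick servers in a fixed rotating order, ignoring load."""
    def __init__(self, servers):
        self._it = cycle(servers)

    def pick(self):
        return next(self._it)

class LeastConnections:
    """Pick the server with the fewest in-flight requests."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # request is now in flight
        return server

    def release(self, server):
        self.active[server] -= 1  # request finished
```

Round-robin is simplest and works well when requests are uniform; least-connections adapts when some requests are much slower than others, since a server stuck on long requests stops receiving new ones.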

Experiments

  1. Reduce API server instances to 1 to see what happens without horizontal scaling
  2. Set health check interval to 60 seconds to see slow failure detection

Failure Modes to Trigger

  • Trigger: Reduce API server instances to 1 to see what happens without horizontal scaling

    Observe: 1 server at 1000 rps is at only 20% of its 5000 rps capacity, so it works. But if that one server crashes, your entire app goes down: zero redundancy. Deployments would also mean downtime.

  • Trigger: Set health check interval to 60 seconds to see slow failure detection

    Observe: If a server crashes, the load balancer keeps sending traffic to it for up to 60 seconds before the next health check runs. That is up to 60 seconds of failing requests, and users see errors. At 250 rps to that server, that's up to 15,000 failed requests.
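The worst-case damage from a slow health check is simple arithmetic: traffic keeps flowing to a dead server for up to one full check interval. A minimal sketch, using the figures from the brief:

```python
# Worst-case failed requests when a server dies just after a health
# check passes: traffic flows to it until the next check runs.
def failed_requests(check_interval_s: int, rps_per_server: int) -> int:
    return check_interval_s * rps_per_server

# Figures from the brief: 60 s interval, 250 rps per server.
print(failed_requests(60, 250))   # 15000 failed requests
print(failed_requests(5, 250))    # a 5 s interval caps the blast radius at 1250
```

This is why the second experiment dramatically slows detection: the failure window, and therefore the error count, scales linearly with the health check interval.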