

Rate Limiting Algorithms for System Design

March 17, 2026 · Updated March 17, 2026 · 8 min read

Token bucket, sliding window, and fixed window compared. Where to enforce limits and how to communicate them to clients without breaking integrations.

Definition

Rate limiting controls the number of requests a client can make to a service within a time window, protecting against abuse, ensuring fair usage, and preventing resource exhaustion.

Implementation Checklist

  • Choose rate limiting granularity: per-user, per-IP, per-API-key, or per-endpoint. Most systems need per-user limits with a global safety net.
  • Return standard rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) so clients can self-throttle before hitting 429s.
  • Store rate limit counters in a fast in-memory store (Redis) rather than the application database. Latency on the rate limit check must be sub-millisecond.
  • Implement a separate, higher rate limit for authenticated users vs anonymous traffic to reward sign-ups without penalizing legitimate usage.
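The header convention from the checklist can be sketched as a small helper that decides the status code and builds the response headers. This is a minimal illustration, not a framework integration; the function name and signature are invented for this sketch, and the `X-RateLimit-*` names follow the common de-facto convention.

```python
import time

def rate_limit_headers(limit, used, window_reset_ts):
    """Build rate limit response headers and pick a status code.

    limit            -- max requests allowed in the current window
    used             -- requests already consumed in the window
    window_reset_ts  -- Unix timestamp when the window resets
    """
    remaining = max(limit - used, 0)
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(int(window_reset_ts)),
    }
    if remaining > 0:
        return 200, headers
    # Over the limit: 429 plus Retry-After so clients know when to back off.
    headers["Retry-After"] = str(max(int(window_reset_ts - time.time()), 0))
    return 429, headers
```

Because the headers are sent on every response, well-behaved clients can self-throttle long before they ever see a 429.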

Rate Limiting Is a Product Decision, Not Just a Technical One

The right rate limit depends on your product, not just your infrastructure capacity. A social media API and a payment API need fundamentally different limits, even if they run on the same hardware.

Engage product and business teams when setting limits. Too aggressive and you frustrate power users; too lenient and a single bad actor can degrade the experience for everyone.

Graceful Degradation Over Hard Rejection

Instead of immediately returning 429, consider degrading the response: serve cached data, reduce response detail, or queue the request for delayed processing.

For critical paths like authentication or payment, never rate limit so aggressively that legitimate users get locked out during traffic spikes. Use adaptive limits that relax during known peak periods.
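The degrade-before-reject idea above can be sketched as a request handler that falls back to cached data when the limiter says no, and only returns 429 when there is nothing cached to serve. All names here (`CountLimiter`, `handle_request`, `fetch_fresh`) are illustrative, not a real API.

```python
class CountLimiter:
    """Toy limiter for the sketch: allows the first n calls per key."""
    def __init__(self, n):
        self.n, self.seen = n, {}

    def allow(self, key):
        self.seen[key] = self.seen.get(key, 0) + 1
        return self.seen[key] <= self.n

def handle_request(key, limiter, cache, fetch_fresh):
    """Graceful degradation: fresh data under the limit, cached data
    over it, and a hard 429 only as a last resort."""
    if limiter.allow(key):
        data = fetch_fresh()
        cache[key] = data          # keep the cache warm for later
        return 200, data
    if key in cache:
        return 200, cache[key]     # degraded: stale but still useful
    return 429, None               # nothing to degrade to
```

The same pattern extends to reduced response detail or queueing for delayed processing; the key point is that rejection is the last branch, not the first.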

Tradeoff Table

| Decision | Speed-First Option | Reliability-First Option | Recommended When |
| --- | --- | --- | --- |
| Token Bucket vs Sliding Window | Token bucket is simple and allows burst traffic up to the bucket size | Sliding window provides smoother enforcement and prevents burst spikes at window boundaries | Token bucket for APIs where occasional bursts are acceptable; sliding window for strict per-second enforcement |
| Fixed Window vs Sliding Window Log | Fixed window uses a single counter per window, O(1) memory | Sliding window log tracks individual request timestamps for exact enforcement | Fixed window for high-throughput, low-precision needs; sliding window log when accuracy matters more than memory |
| Client-side vs Server-side Limiting | Client-side limiting reduces unnecessary network calls | Server-side is authoritative and cannot be bypassed by malicious clients | Always enforce server-side; add client-side as a courtesy to reduce 429s for well-behaved clients |
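To make the first row of the table concrete, here is a minimal single-process token bucket: `capacity` bounds the burst size, and `rate` is the steady refill in tokens per second. This is a sketch, not a production limiter (no locking, no shared state across servers); the injectable clock exists only to make the behavior easy to test.

```python
import time

class TokenBucket:
    """Token bucket limiter: bursts up to `capacity`, then `rate`/sec."""

    def __init__(self, capacity, rate, now=time.monotonic):
        self.capacity = capacity   # max tokens, i.e. max burst size
        self.rate = rate           # refill rate in tokens per second
        self.now = now             # injectable clock for testing
        self.tokens = capacity     # start full so initial bursts pass
        self.last = now()

    def allow(self, cost=1.0):
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A bucket with capacity 2 and rate 1 admits two back-to-back requests, rejects the third, and admits one more after a second of refill; this burst-then-smooth behavior is exactly what the sliding window variants trade away for stricter boundaries.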

Practice Next

Rate Limiting Topic Hub

Algorithms, implementation patterns, and production considerations for rate limiting.

API Gateway and Auth Lab

Practice configuring rate limits alongside authentication in the API gateway lab.

Challenges

  • Flash Sale Challenge

    Design a flash sale system where rate limiting prevents inventory overselling under burst traffic.

  • Payment Gateway 1

    Build a payment gateway where rate limiting protects against fraudulent transaction floods.


Frequently Asked Questions

Should I rate limit internal service-to-service calls?

Yes, but with higher limits. Internal rate limiting prevents cascading failures where one runaway service overwhelms another. Use circuit breakers alongside rate limits for defense in depth.

How do I handle rate limiting in a distributed system with multiple servers?

Use a centralized counter store (Redis) that all servers check. For higher availability, use local in-memory counters with periodic sync, accepting slight over-admission at window boundaries.
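The centralized-counter approach can be sketched as a fixed-window check with Redis-style INCR semantics. A plain dict stands in for Redis here so the sketch is self-contained; with real Redis you would use `INCR` plus `EXPIRE` (or a small Lua script) so the increment-and-check is atomic across servers.

```python
import time

def fixed_window_allow(store, key, limit, window_s, now=None):
    """Fixed-window counter shared by all servers via `store`.

    Each window gets its own counter key, so old windows simply
    stop being read (real Redis would also expire them).
    """
    now = time.time() if now is None else now
    window = int(now // window_s)
    bucket = f"{key}:{window}"                  # one counter per window
    store[bucket] = store.get(bucket, 0) + 1    # dict stand-in for INCR
    return store[bucket] <= limit
```

Note the boundary effect the FAQ alludes to: a client can spend its full budget at the end of one window and again at the start of the next, which is the imprecision a sliding window log removes at the cost of more memory.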

What HTTP status code should I return when rate limited?

Return 429 Too Many Requests with a Retry-After header indicating when the client can retry. Never return 403 or 503 for rate limiting; those codes have different semantics.