Rate Limiting Algorithms for System Design
March 17, 2026 · 8 min read
Token bucket, sliding window, and fixed window compared. Where to enforce limits and how to communicate them to clients without breaking integrations.
Definition
Rate limiting controls the number of requests a client can make to a service within a time window, protecting against abuse, ensuring fair usage, and preventing resource exhaustion.
Implementation Checklist
- Choose rate limiting granularity: per-user, per-IP, per-API-key, or per-endpoint. Most systems need per-user limits with a global safety net.
- Return standard rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) so clients can self-throttle before hitting 429s.
- Store rate limit counters in a fast in-memory store (Redis) rather than the application database. Latency on the rate limit check must be sub-millisecond.
- Implement a separate, higher rate limit for authenticated users vs anonymous traffic to reward sign-ups without penalizing legitimate usage.
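The first three checklist items can be sketched together: a per-user fixed-window check that also produces the standard informational headers. This is a minimal in-memory illustration; the names `check_rate_limit`, `LIMIT`, and `WINDOW` are assumptions, and in production the counter dictionary would live in Redis as the checklist recommends.

```python
import time

LIMIT = 100   # requests allowed per window
WINDOW = 60   # window length in seconds

_counters = {}  # user_id -> (window_start, count); use Redis in production

def check_rate_limit(user_id, now=None):
    """Return (allowed, headers) for one request from user_id."""
    now = time.time() if now is None else now
    window_start = int(now // WINDOW) * WINDOW
    start, count = _counters.get(user_id, (window_start, 0))
    if start != window_start:        # new window: reset the counter
        start, count = window_start, 0
    allowed = count < LIMIT
    if allowed:
        count += 1
    _counters[user_id] = (start, count)
    headers = {
        "X-RateLimit-Limit": str(LIMIT),
        "X-RateLimit-Remaining": str(max(0, LIMIT - count)),
        "X-RateLimit-Reset": str(start + WINDOW),
    }
    return allowed, headers
```

Returning the headers on every response, not just on 429s, is what lets well-behaved clients self-throttle before they ever hit the limit.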
Rate Limiting Is a Product Decision, Not Just a Technical One
The right rate limit depends on your product, not just your infrastructure capacity. A social media API and a payment API need fundamentally different limits, even if they run on the same hardware.
Engage product and business teams when setting limits. Too aggressive and you frustrate power users; too lenient and a single bad actor can degrade the experience for everyone.
Graceful Degradation Over Hard Rejection
Instead of immediately returning 429, consider degrading the response: serve cached data, reduce response detail, or queue the request for delayed processing.
For critical paths like authentication or payment, never rate limit so aggressively that legitimate users get locked out during traffic spikes. Use adaptive limits that relax during known peak periods.
Tradeoff Table
| Decision | Speed-First Option | Reliability-First Option | Recommended When |
|---|---|---|---|
| Token Bucket vs Sliding Window | Token bucket is simple, allows burst traffic up to bucket size | Sliding window provides smoother enforcement and prevents burst spikes at window boundaries | Use token bucket for APIs where occasional bursts are acceptable; sliding window for strict per-second enforcement |
| Fixed Window vs Sliding Window Log | Fixed window uses a single counter per window, O(1) memory | Sliding window log tracks individual request timestamps, exact enforcement | Use fixed window for high-throughput, low-precision needs; sliding window log when accuracy matters more than memory |
| Client-side vs Server-side Limiting | Client-side limiting reduces unnecessary network calls | Server-side is authoritative and cannot be bypassed by malicious clients | Always enforce server-side. Add client-side as a courtesy to reduce 429 responses for well-behaved clients |
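The token bucket row above translates directly into code: `capacity` bounds the burst size and `refill_rate` sets the sustained rate. A minimal sketch, not tied to any particular library; the `now` parameter exists only to make the refill logic easy to test.

```python
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = capacity         # maximum burst size
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Consume one token if available; refill lazily based on elapsed time."""
        now = time.monotonic() if now is None else now
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The lazy refill is the key trick: no background timer is needed, since each check tops up the bucket from the elapsed time since the last check.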
Practice Next
Rate Limiting Topic Hub
Algorithms, implementation patterns, and production considerations for rate limiting.
API Gateway and Auth Lab
Practice configuring rate limits alongside authentication in the API gateway lab.
Challenges
- Flash Sale Challenge
Design a flash sale system where rate limiting prevents inventory overselling under burst traffic.
- Payment Gateway 1
Build a payment gateway where rate limiting protects against fraudulent transaction floods.
Frequently Asked Questions
Should I rate limit internal service-to-service calls?
Yes, but with higher limits. Internal rate limiting prevents cascading failures where one runaway service overwhelms another. Use circuit breakers alongside rate limits for defense in depth.
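The circuit breaker mentioned here pairs naturally with a rate limit: after a run of consecutive failures the circuit opens and calls fail fast until a cooldown passes. A minimal sketch with illustrative names (`threshold`, `cooldown`), not a specific library.

```python
import time

class CircuitBreaker:
    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold   # consecutive failures before opening
        self.cooldown = cooldown     # seconds to stay open
        self.failures = 0
        self.opened_at = None

    def call(self, fn, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open")
            self.opened_at = None    # half-open: allow a trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now  # trip the breaker
            raise
        self.failures = 0            # success resets the count
        return result
```

The rate limit caps how fast a runaway service can call a dependency; the breaker stops it from calling at all once the dependency is clearly failing.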
How do I handle rate limiting in a distributed system with multiple servers?
Use a centralized counter store (Redis) that all servers check. For higher availability, use local in-memory counters with periodic sync, accepting slight over-admission at window boundaries.
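The centralized-counter pattern boils down to one atomic increment per request, keyed by client and window, with an expiry so keys clean themselves up. Sketched here against a tiny in-memory stand-in; on Redis the same two operations map to `INCR` and `EXPIRE`.

```python
class FakeStore:
    """In-memory stand-in for a shared counter store like Redis."""
    def __init__(self):
        self.data = {}
    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

LIMIT = 100
WINDOW = 60

def allow_request(store, client_id, now):
    window = int(now // WINDOW)
    key = f"rl:{client_id}:{window}"  # one counter per client per window
    count = store.incr(key)           # atomic on Redis, so safe across servers
    # (with Redis, also EXPIRE the key after WINDOW seconds so counters expire)
    return count <= LIMIT
```

Because the increment is atomic in the shared store, every app server sees the same count with no coordination beyond the single round trip.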
What HTTP status code should I return when rate limited?
Return 429 Too Many Requests with a Retry-After header indicating when the client can retry. Never return 403 or 503 for rate limiting; those codes have different semantics.
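The response shape described above can be sketched as a small helper that derives `Retry-After` from the window reset time. `build_429` is an illustrative name, not a framework API.

```python
def build_429(reset_epoch, now):
    """Build a 429 response whose Retry-After counts down to the window reset."""
    retry_after = max(0, int(reset_epoch - now))
    return {
        "status": 429,
        "headers": {"Retry-After": str(retry_after)},
        "body": {"error": "rate_limited", "retry_after_seconds": retry_after},
    }
```

Echoing the retry delay in the body as well as the header helps clients whose HTTP libraries make headers awkward to read.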