Rate Limiting Algorithms for System Design
March 17, 2026 · 8 min read
Token bucket, sliding window, and fixed window compared. Where to enforce limits and how to communicate them to clients without breaking integrations.
Definition
Rate limiting controls the number of requests a client can make to a service within a time window, protecting against abuse, ensuring fair usage, and preventing resource exhaustion.
Implementation Checklist
- Choose rate limiting granularity: per-user, per-IP, per-API-key, or per-endpoint. Most systems need per-user limits with a global safety net.
- Return standard rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) so clients can self-throttle before hitting 429s.
- Store rate limit counters in a fast in-memory store (Redis) rather than the application database. Latency on the rate limit check must be sub-millisecond.
- Implement a separate, higher rate limit for authenticated users vs anonymous traffic to reward sign-ups without penalizing legitimate usage.
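The first three checklist items can be sketched together: a per-user fixed-window check that also produces the standard informational headers. This is a minimal in-memory illustration; the names `check_rate_limit`, `LIMIT`, and `WINDOW` are assumptions, and in production the counter dictionary would live in Redis as the checklist recommends.

```python
import time

LIMIT = 100   # requests allowed per window
WINDOW = 60   # window length in seconds

_counters = {}  # user_id -> (window_start, count); use Redis in production

def check_rate_limit(user_id, now=None):
    """Return (allowed, headers) for one request from user_id."""
    now = time.time() if now is None else now
    window_start = int(now // WINDOW) * WINDOW
    start, count = _counters.get(user_id, (window_start, 0))
    if start != window_start:        # new window: reset the counter
        start, count = window_start, 0
    allowed = count < LIMIT
    if allowed:
        count += 1
    _counters[user_id] = (start, count)
    headers = {
        "X-RateLimit-Limit": str(LIMIT),
        "X-RateLimit-Remaining": str(max(0, LIMIT - count)),
        "X-RateLimit-Reset": str(start + WINDOW),
    }
    return allowed, headers
```

Returning the headers on every response, not just on 429s, is what lets well-behaved clients self-throttle before they ever hit the limit.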
Rate Limiting Is a Product Decision, Not Just a Technical One
The right rate limit depends on your product, not just your infrastructure capacity. A social media API and a payment API need fundamentally different limits, even if they run on the same hardware.
Engage product and business teams when setting limits. Too aggressive and you frustrate power users; too lenient and a single bad actor can degrade the experience for everyone.
Graceful Degradation Over Hard Rejection
Instead of immediately returning 429, consider degrading the response: serve cached data, reduce response detail, or queue the request for delayed processing.
For critical paths like authentication or payment, never rate limit so aggressively that legitimate users get locked out during traffic spikes. Use adaptive limits that relax during known peak periods.
Tradeoff Table
| Decision | Speed-First Option | Reliability-First Option | Recommended When |
|---|---|---|---|
| Token Bucket vs Sliding Window | Token bucket is simple, allows burst traffic up to bucket size | Sliding window provides smoother enforcement and prevents burst spikes at window boundaries | Use token bucket for APIs where occasional bursts are acceptable; sliding window for strict per-second enforcement |
| Fixed Window vs Sliding Window Log | Fixed window uses a single counter per window, O(1) memory | Sliding window log tracks individual request timestamps, exact enforcement | Use fixed window for high-throughput, low-precision needs; sliding window log when accuracy matters more than memory |
| Client-side vs Server-side Limiting | Client-side limiting reduces unnecessary network calls | Server-side is authoritative and cannot be bypassed by malicious clients | Always enforce server-side. Add client-side as a courtesy to reduce 429 responses for well-behaved clients |
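The token bucket row above translates directly into code: `capacity` bounds the burst size and `refill_rate` sets the sustained rate. A minimal sketch, not tied to any particular library; the `now` parameter exists only to make the refill logic easy to test.

```python
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = capacity         # maximum burst size
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Consume one token if available; refill lazily based on elapsed time."""
        now = time.monotonic() if now is None else now
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The lazy refill is the key trick: no background timer is needed, since each check tops up the bucket from the elapsed time since the last check.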
Practice Next
Rate Limiting Topic Hub
Algorithms, implementation patterns, and production considerations for rate limiting.
API Gateway and Auth Lab
Practice configuring rate limits alongside authentication in the API gateway lab.
Challenges
- Flash Sale Challenge
Design a flash sale system where rate limiting prevents inventory overselling under burst traffic.
- Payment Gateway 1
Build a payment gateway where rate limiting protects against fraudulent transaction floods.
Frequently Asked Questions
Should I rate limit internal service-to-service calls?
Yes, but with higher limits. Internal rate limiting prevents cascading failures where one runaway service overwhelms another. Use circuit breakers alongside rate limits for defense in depth.
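The circuit breaker mentioned here pairs naturally with a rate limit: after a run of consecutive failures the circuit opens and calls fail fast until a cooldown passes. A minimal sketch with illustrative names (`threshold`, `cooldown`), not a specific library.

```python
import time

class CircuitBreaker:
    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold   # consecutive failures before opening
        self.cooldown = cooldown     # seconds to stay open
        self.failures = 0
        self.opened_at = None

    def call(self, fn, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open")
            self.opened_at = None    # half-open: allow a trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now  # trip the breaker
            raise
        self.failures = 0            # success resets the count
        return result
```

The rate limit caps how fast a runaway service can call a dependency; the breaker stops it from calling at all once the dependency is clearly failing.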
How do I handle rate limiting in a distributed system with multiple servers?
Use a centralized counter store (Redis) that all servers check. For higher availability, use local in-memory counters with periodic sync, accepting slight over-admission at window boundaries.
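The centralized-counter pattern boils down to one atomic increment per request, keyed by client and window, with an expiry so keys clean themselves up. Sketched here against a tiny in-memory stand-in; on Redis the same two operations map to `INCR` and `EXPIRE`.

```python
class FakeStore:
    """In-memory stand-in for a shared counter store like Redis."""
    def __init__(self):
        self.data = {}
    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

LIMIT = 100
WINDOW = 60

def allow_request(store, client_id, now):
    window = int(now // WINDOW)
    key = f"rl:{client_id}:{window}"  # one counter per client per window
    count = store.incr(key)           # atomic on Redis, so safe across servers
    # (with Redis, also EXPIRE the key after WINDOW seconds so counters expire)
    return count <= LIMIT
```

Because the increment is atomic in the shared store, every app server sees the same count with no coordination beyond the single round trip.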
What HTTP status code should I return when rate limited?
Return 429 Too Many Requests with a Retry-After header indicating when the client can retry. Never return 403 or 503 for rate limiting; those codes have different semantics.
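The response shape described above can be sketched as a small helper that derives `Retry-After` from the window reset time. `build_429` is an illustrative name, not a framework API.

```python
def build_429(reset_epoch, now):
    """Build a 429 response whose Retry-After counts down to the window reset."""
    retry_after = max(0, int(reset_epoch - now))
    return {
        "status": 429,
        "headers": {"Retry-After": str(retry_after)},
        "body": {"error": "rate_limited", "retry_after_seconds": retry_after},
    }
```

Echoing the retry delay in the body as well as the header helps clients whose HTTP libraries make headers awkward to read.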