
API Gateway & Rate Limiter

API Design · Rate Limiting · Caching · Auth · Monitoring

Problem Statement

GateKeep is building a centralized API gateway for a platform that runs 50 microservices. The gateway is the single entry point for all client traffic. It must handle:

- Request routing - route incoming requests to the correct microservice based on URL path, HTTP method, and headers. Support versioned APIs (/v1/, /v2/).
- Authentication - validate JWT tokens, API keys, or OAuth2 tokens on every request. Reject unauthorized requests before they reach backend services.
- Rate limiting - enforce per-client rate limits with multiple strategies: fixed window, sliding window, and token bucket. Limits differ per plan (free: 100/min, pro: 1,000/min, enterprise: 10,000/min).
- Request/response transformation - add headers, remove sensitive fields from responses, transform XML to JSON, gzip compression.
- Circuit breaker - if a backend service is failing (> 50% error rate), stop forwarding traffic and return a cached fallback response.
- Observability - log every request (method, path, status, latency), export metrics (Prometheus), and support distributed tracing (trace ID propagation).

Handle 100,000 requests per second at peak with < 10 ms gateway overhead.

What You'll Learn

Design an API gateway that handles routing, authentication, rate limiting, and request transformation for 50 microservices. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.


Constraints

Backend microservices: ~50
Peak requests/second: ~100,000
Gateway latency overhead: < 10 ms
Active API keys: ~50,000
Rate limit accuracy: > 99%
Availability target: 99.99%

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Design an API gateway that handles routing, authentication, rate limiting, and request transformation for 50 microservices.
  • Design for a peak load target of ~100,000 RPS, plus burst headroom on top of that.
  • Backend microservices: ~50
  • Peak requests/second: ~100,000
  • Gateway latency overhead: < 10 ms
  • Active API keys: ~50,000
  • Rate limit accuracy: > 99%

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.
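The capacity-planning method above can be made concrete with back-of-envelope arithmetic. The 5,000 RPS per-instance figure below is an assumption for illustration, not a measured benchmark, and the latency split is one plausible way to spend the < 10 ms budget.

```python
# Back-of-envelope sizing for the gateway tier.
peak_rps = 100_000
safety_margin = 3             # keep 2-3x headroom; use the upper bound
per_instance_rps = 5_000      # ASSUMED sustainable RPS per gateway instance

required_capacity = peak_rps * safety_margin           # 300,000 RPS
instances = -(-required_capacity // per_instance_rps)  # ceiling division

# One way to split the < 10 ms overhead budget across gateway hops (assumed):
budget_ms = {
    "auth check": 3,
    "rate limit (Redis round trip)": 2,
    "routing + transformation": 3,
    "slack": 2,
}
assert sum(budget_ms.values()) == 10
```

Writing the per-hop budget down explicitly is what lets you defend the p95 number in review: any hop that exceeds its line item is the bottleneck to attack first.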

3) Architecture Decisions

  • API Design: Standardize API boundaries, idempotency keys, pagination, and error contracts first.
  • Rate Limiting: Enforce token/sliding-window limits at ingress and for sensitive internal APIs.
  • Caching: Put cache on hot read paths first and pick cache-aside or write-through explicitly.
  • Auth: Centralize identity verification and keep authorization checks close to domain resources.
  • Monitoring: Instrument golden signals (latency, traffic, errors, saturation) per tier and per tenant/domain.

4) Reliability and Failure Strategy

  • Apply strict input validation and backward-compatible versioning.
  • Return deterministic 429 responses with clear Retry-After and rate-limit headers.
  • Bound staleness with TTL + invalidation hooks for critical entities.
  • Use short-lived tokens and secure key rotation workflows.
  • Alert on user-impact SLOs, not only infrastructure metrics.
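The deterministic-429 point above can be sketched with a token bucket that returns both the status and a computed Retry-After value. This is single-process only; once multiple gateway instances share a client's limit, the state must move into Redis (as described under Practical Notes). Class and parameter names are illustrative.

```python
# Per-client token bucket: allow a request if a token is available,
# otherwise return 429 plus the exact seconds until the next token.
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def check(self) -> tuple[int, float]:
        """Return (status, retry_after_seconds): 200 if allowed, else 429."""
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return 200, 0.0
        return 429, (1 - self.tokens) / self.refill
```

Returning a computed Retry-After (rather than a fixed constant) keeps well-behaved clients from retrying too early and hammering the gateway again.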

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.
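The idempotency check above can be sketched as key-based response replay: the first call with a given idempotency key executes the write and stores its response; retries replay the stored response instead of re-executing. In production the store would be durable with a TTL (e.g. Redis or a database table), not an in-memory dict; the names here are illustrative.

```python
# Idempotency-key replay: execute the handler at most once per key.
_responses: dict[str, object] = {}

def idempotent(key: str, handler, *args):
    """Run handler once for this key; replay the stored result on retries."""
    if key in _responses:
        return _responses[key]
    result = handler(*args)
    _responses[key] = result
    return result
```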

6) Trade-offs to Call Out in Interviews

  • API Design: Rich APIs improve developer speed but can create long-term compatibility burden.
  • Rate Limiting: Aggressive limits protect the system but can hurt legitimate burst traffic.
  • Caching: Higher hit rate cuts latency/cost, but stale data and invalidation bugs become primary risks.
  • Auth: Central auth simplifies policy, but makes auth service availability/security critical.
  • Monitoring: Deep observability speeds incident response but raises ingestion and tooling costs.

Practical Notes

  • Rate limiting: use Redis with a Lua script for atomic increment-and-check. Sliding window log or token bucket gives smoother rate limiting than fixed windows.
  • The gateway itself must be stateless and horizontally scalable - run multiple instances behind an L4 load balancer.
  • Circuit breaker: track error rates per backend in-memory. After tripping, periodically allow a single request through to check recovery (half-open state).

Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Web Clients -> Load Balancer -> API Gateway -> Rate Limiter -> API Service -> Auth Service -> Redis Cache -> Monitoring

Design strengths

  • Cache sits on the read path to absorb repeated queries and keep DB pressure stable.
  • Monitoring and logs are wired in from day one for rapid incident triage.
  • Security controls are enforced at ingress to protect downstream capacity.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.