Medium · Flash Sale · Part 1

Flash Sale - Inventory Under Pressure

Databases · Caching · Rate Limiting · Message Queues · Consistency

Problem Statement

DealDash is an e-commerce platform running weekly flash sales. At exactly 8 PM every Thursday, 10,000 limited-edition items become available and typically sell out in under 60 seconds.

The challenge: 1 million users hit the "Buy Now" button within the first 10 seconds. You must ensure:

- No overselling - exactly 10,000 items are sold, never more.
- Fairness - first-come-first-served with a transparent queue.
- Resilience - the rest of the website (browsing, search, account) must remain functional even during the sale stampede.
- Feedback - users get immediate confirmation ("secured!" or "sold out") within 3 seconds.

This is fundamentally a concurrency + consistency problem at extreme scale.

What You'll Learn

Design a flash sale system where 1M users compete for 10,000 items in 60 seconds. Build this architecture under realistic production constraints, then validate the trade-offs in the design lab simulation.

Constraints

Concurrent users at drop: ~1,000,000
Inventory: 10,000 items
Purchase window: ~60 seconds
Confirmation latency: < 3 seconds
Overselling tolerance: Zero
Availability (non-sale pages): 99.9%
Approach

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Design a flash sale system where 1M users compete for 10,000 items in 60 seconds.
  • Design for a peak load target around 80,000 RPS (including burst headroom).
  • Concurrent users at drop: ~1,000,000
  • Inventory: 10,000 items
  • Purchase window: ~60 seconds
  • Confirmation latency: < 3 seconds
  • Overselling tolerance: Zero

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.
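
The capacity method above can be sketched as quick back-of-envelope arithmetic. Assumptions not stated in the constraints (burst window, exact safety margin) are illustrative:

```python
# Back-of-envelope capacity check for the flash-sale drop.
# Assumption: most "Buy Now" clicks land in the first 10 seconds.

users = 1_000_000          # concurrent users at the drop
burst_window_s = 10        # assumed click burst window

avg_rps = users / burst_window_s   # 100,000 RPS if clicks were uniform

# The stated peak-load target is ~80,000 RPS; a virtual queue shaping
# traffic is what makes a target below the raw burst rate plausible.
design_rps = 80_000
safety_margin = 2.5                # within the 2-3x guidance per tier
provisioned_rps = int(design_rps * safety_margin)

print(avg_rps)            # 100000.0
print(provisioned_rps)    # 200000
```

If traffic is not shaped at all, size for the raw ~100k RPS burst instead of the 80k target.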

3) Architecture Decisions

  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • Caching: Put cache on hot read paths first and pick cache-aside or write-through explicitly.
  • Rate Limiting: Enforce token/sliding-window limits at ingress and for sensitive internal APIs.
  • Message Queues: Move non-blocking and retry-heavy work to async consumers with explicit retry and DLQ policies.
  • Consistency: Classify operations by consistency requirement: strong for money/inventory, eventual for feeds/analytics.
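
The ingress rate limiting mentioned above can be sketched as a token bucket. This is a minimal single-process illustration (class and parameter names are not from the text; production limiters are usually distributed, e.g. backed by Redis):

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: allows bursts up to `capacity`
    while enforcing a long-run average of `rate` requests/second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should return 429 with a Retry-After header

bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow() for _ in range(15)]
print(results.count(True))  # 10: the burst capacity, then rejections
```

The `capacity` parameter is what lets legitimate burst traffic through, which is exactly the trade-off called out later for aggressive limits.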

4) Reliability and Failure Strategy

  • Use strong write constraints (transactions or conditional writes) and explicit backup/restore strategy.
  • Bound staleness with TTL + invalidation hooks for critical entities.
  • Return deterministic 429 behavior with clear retry headers.
  • Guarantee idempotent consumers and trace every message with correlation IDs.
  • Use idempotency keys and conflict-resolution rules on retried/distributed writes.
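
The conditional-write and idempotency-key points above can be combined in one sketch. SQLite stands in for the real system of record, and the schema and names are hypothetical:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, remaining INTEGER)")
db.execute("CREATE TABLE orders (idempotency_key TEXT PRIMARY KEY, sku TEXT)")
db.execute("INSERT INTO inventory VALUES ('hot-item', 2)")

def purchase(key: str, sku: str) -> str:
    try:
        # First write wins; a retry with the same key hits the primary
        # key constraint and is treated as already processed.
        db.execute("INSERT INTO orders VALUES (?, ?)", (key, sku))
    except sqlite3.IntegrityError:
        return "already-confirmed"
    # Conditional write: only decrement while stock remains, so
    # concurrent buyers can never push `remaining` below zero.
    cur = db.execute(
        "UPDATE inventory SET remaining = remaining - 1 "
        "WHERE sku = ? AND remaining > 0", (sku,))
    if cur.rowcount == 0:
        db.execute("DELETE FROM orders WHERE idempotency_key = ?", (key,))
        return "sold-out"
    return "secured"

r1 = purchase("u1-req1", "hot-item")
r2 = purchase("u1-req1", "hot-item")  # network retry, same key
r3 = purchase("u2-req1", "hot-item")
r4 = purchase("u3-req1", "hot-item")
print(r1, r2, r3, r4)  # secured already-confirmed secured sold-out
```

The same pattern maps to conditional writes in other stores (e.g. compare-and-set or conditional updates) when SQL is not the system of record.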

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.

6) Trade-offs to Call Out in Interviews

  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • Caching: Higher hit rate cuts latency/cost, but stale data and invalidation bugs become primary risks.
  • Rate Limiting: Aggressive limits protect the system but can hurt legitimate burst traffic.
  • Message Queues: Async pipelines absorb spikes well, but increase eventual-consistency complexity.
  • Consistency: Stronger consistency improves correctness, but often increases latency and coordination costs.

Practical Notes

  • Pre-decrement inventory in Redis (atomic DECR) to avoid database contention, then async-persist to DB.
  • A virtual queue with rate limiting can spread the thundering herd over a few seconds.
  • Isolate the flash-sale service from the main e-commerce platform (bulkhead pattern).
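
The pre-decrement pattern can be sketched as follows. In production this would be a single Redis Lua script (shown in a comment) so the check and decrement are atomic; here a lock-guarded counter stands in for Redis's single-threaded execution, and all names are illustrative:

```python
import threading

class FakeRedis:
    """Stand-in for Redis. Equivalent Lua script, run via EVAL:
       if tonumber(redis.call('GET', KEYS[1])) > 0 then
           redis.call('DECR', KEYS[1]); return 1
       else return 0 end"""

    def __init__(self, stock: int):
        self._stock = stock
        self._lock = threading.Lock()

    def try_reserve(self) -> bool:
        with self._lock:  # models Redis's atomic script execution
            if self._stock > 0:
                self._stock -= 1
                return True
            return False

store = FakeRedis(stock=100)
secured = []

def buyer():
    if store.try_reserve():
        secured.append(1)  # would enqueue an async DB persist here

threads = [threading.Thread(target=buyer) for _ in range(300)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(secured))  # 100: exactly the stock, never oversold
```

Losers get an instant "sold out" without ever touching the database, which is what keeps the hot path off the system of record during the stampede.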

Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Web Clients -> Load Balancer -> API Gateway -> Rate Limiter -> API Service, backed by a Primary SQL DB (system of record), a Read Model DB, and a Redis Cache.

Design strengths

  • Cache sits on the read path to absorb repeated queries and keep DB pressure stable.
  • Async queue/event bus isolates bursty workloads and supports retries without blocking synchronous requests.
  • Security controls are enforced at ingress to protect downstream capacity.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.