Hard · Enterprise

Design Twitter

Databases · Caching · Sharding · Message Queues · Search · CDN

Problem Statement

Design the full backend architecture for Twitter (X) - one of the most iconic system design problems. The platform serves 500 million monthly active users posting 500 million tweets per day. Your design must cover:

- Tweet creation & storage - users post tweets (up to 280 characters, optionally with images/videos). Tweets are immutable once posted (edits create a new version linked to the original).
- Home timeline generation - the most critical feature. When a user opens the app, they see tweets from people they follow, ranked by relevance. The fan-out problem is central - a celebrity with 50 M followers posting a tweet must not cause a write storm to 50 M timelines.
- Search - real-time full-text search over all tweets, with filtering by recency, popularity, and user. Results in < 300 ms.
- Trends - detect trending topics/hashtags globally and per-region using sliding-window analytics, updated every minute.
- Notifications - likes, retweets, mentions, and new followers trigger push notifications.
- Media delivery - images and videos are served via CDN. Videos are transcoded into multiple resolutions.
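The sliding-window trend detection mentioned above can be sketched with per-minute buckets. This is a pure in-memory illustration (class and parameter names are ours); a production pipeline would run the same bucketing idea inside a stream processor.

```python
from collections import Counter, deque

class TrendWindow:
    """Counts hashtag mentions over the last `window_minutes` minutes."""

    def __init__(self, window_minutes=60):
        self.window = window_minutes
        self.buckets = deque()  # (minute, Counter) pairs, oldest first

    def record(self, minute, hashtag):
        # Open a new per-minute bucket when the clock advances.
        if not self.buckets or self.buckets[-1][0] != minute:
            self.buckets.append((minute, Counter()))
        self.buckets[-1][1][hashtag] += 1
        # Evict buckets that have slid out of the window.
        while self.buckets and self.buckets[0][0] <= minute - self.window:
            self.buckets.popleft()

    def top(self, k=10):
        # Merge all live buckets; recomputed every refresh interval.
        total = Counter()
        for _, counts in self.buckets:
            total.update(counts)
        return total.most_common(k)
```

Because eviction happens on every write, a `top()` call every 60 seconds only ever sees the last hour of counts.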

Think about the hybrid fan-out model: fan-out-on-write for normal users (~1,000 followers) and fan-out-on-read for celebrities (~50 M followers).
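The hybrid fan-out decision can be made concrete with a toy sketch. Plain dicts stand in for the Redis timeline cache and tweet store, and `CELEB_THRESHOLD` is an assumed cutoff (the Practical Notes suggest ~10 k followers):

```python
from collections import defaultdict

CELEB_THRESHOLD = 10_000  # followers above this switch to fan-out-on-read

timelines = defaultdict(list)    # follower_id -> pre-computed tweet_ids
celeb_posts = defaultdict(list)  # celebrity_id -> their own tweet_ids

def publish(author_id, tweet_id, follower_ids):
    if len(follower_ids) < CELEB_THRESHOLD:
        # Fan-out-on-write: push into every follower's cached timeline.
        for fid in follower_ids:
            timelines[fid].append(tweet_id)
    else:
        # Fan-out-on-read: store once; merged into timelines at read time.
        celeb_posts[author_id].append(tweet_id)

def home_timeline(user_id, followed_celebs, limit=20):
    merged = list(timelines[user_id])
    for cid in followed_celebs:
        merged.extend(celeb_posts[cid])
    # Assumes tweet_ids are time-sortable (e.g., Snowflake-style IDs).
    return sorted(merged, reverse=True)[:limit]
```

The key property: a 50 M-follower account triggers one write, not 50 M, at the cost of a small merge on every follower's read.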

What You'll Learn

Design the architecture behind Twitter - tweets, timelines, trends, and search at 500 M+ users. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.


Constraints

  • Monthly active users: 500,000,000
  • Daily tweets: ~500,000,000
  • Max followers per user: 50,000,000+
  • Timeline load time: < 300 ms
  • Search latency: < 300 ms
  • Trending refresh rate: Every 60 seconds
  • Media storage growth: ~10 TB/day
  • Availability target: 99.99%
Approach

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Design the architecture behind Twitter - tweets, timelines, trends, and search at 500 M+ users.
  • Design for a peak load target around 34,722 RPS (including burst headroom).
  • Monthly active users: 500,000,000
  • Daily tweets: ~500,000,000
  • Max followers per user: 50,000,000+
  • Timeline load time: < 300 ms
  • Search latency: < 300 ms

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.
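The back-of-envelope arithmetic behind the ~34,722 RPS peak target can be written out explicitly. The 6x peak factor and 100x read amplification are assumptions for illustration; the tweet volume and media growth come from the constraints table:

```python
SECONDS_PER_DAY = 86_400
DAILY_TWEETS = 500_000_000

avg_write_rps = DAILY_TWEETS / SECONDS_PER_DAY  # ~5,787 tweets/s sustained
peak_write_rps = avg_write_rps * 6              # ~34,722 RPS with burst headroom

# Timelines are read far more often than tweets are written
# (assumed 100:1 read amplification).
avg_read_rps = avg_write_rps * 100              # ~578,700 timeline reads/s

# Media growth: ~10 TB/day from the constraints table.
yearly_media_pb = 10 * 365 / 1000               # ~3.65 PB/year
```

Walking through this math out loud in an interview is what turns the stated constraints into concrete tier budgets.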

3) Architecture Decisions

  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • Caching: Put cache on hot read paths first and pick cache-aside or write-through explicitly.
  • Sharding: Choose shard keys around access patterns and growth hotspots, not just data size.
  • Message Queues: Move non-blocking and retry-heavy work to async consumers with explicit retry and DLQ policies.
  • Search: Use primary store for writes and async index updates for search relevance + scale.
  • CDN: Serve static and cacheable content from edge and keep origin strictly for misses and dynamic requests.
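As one example of making the caching choice explicit, here is a minimal cache-aside read path with invalidation on write. Plain dicts stand in for Redis and the primary store (TTL handling omitted for brevity):

```python
def get_tweet(tweet_id, cache, db):
    tweet = cache.get(tweet_id)
    if tweet is None:
        tweet = db[tweet_id]      # cache miss: hit the system of record
        cache[tweet_id] = tweet   # populate so the next read is a hit
    return tweet

def update_tweet_stats(tweet_id, new_value, cache, db):
    db[tweet_id] = new_value      # write to the system of record first
    cache.pop(tweet_id, None)     # invalidate; next read repopulates
```

Cache-aside keeps the DB as the system of record and tolerates cache loss, at the price of one slow read per eviction; write-through would trade that for extra write latency.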

4) Reliability and Failure Strategy

  • Use strong write constraints (transactions or conditional writes) and an explicit backup/restore strategy.
  • Bound staleness with TTL + invalidation hooks for critical entities.
  • Support rebalancing and hotspot detection from day one.
  • Guarantee idempotent consumers and trace every message with correlation IDs.
  • Track indexing lag and support reindex from source of truth.
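The idempotent-consumer requirement above can be sketched by keying every side effect on a message ID. The in-memory `seen` set is an illustration; production would use something durable such as Redis `SETNX` or a database unique constraint (our assumption):

```python
seen = set()      # processed message IDs (durable store in production)
like_counts = {}  # tweet_id -> like count

def handle_like_event(msg):
    if msg["message_id"] in seen:
        return "duplicate-skipped"  # redelivery is safe: no double count
    like_counts[msg["tweet_id"]] = like_counts.get(msg["tweet_id"], 0) + 1
    seen.add(msg["message_id"])
    return "applied"
```

With at-least-once queues, redelivery after a consumer crash is expected behavior, so the dedup check is what keeps counters correct.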

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.

6) Trade-offs to Call Out in Interviews

  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • Caching: Higher hit rate cuts latency/cost, but stale data and invalidation bugs become primary risks.
  • Sharding: Sharding improves horizontal scale but makes cross-shard queries and transactions harder.
  • Message Queues: Async pipelines absorb spikes well, but increase eventual-consistency complexity.
  • Search: Search index gives rich querying but introduces eventual consistency and index ops overhead.

Practical Notes

  • Hybrid fan-out: pre-compute timelines for users with < 10 k followers (write path), fetch on-the-fly for celebrities (read path).
  • Tweet storage: shard by user_id; timeline cache in Redis (sorted set by timestamp, trim to last 800 tweets).
  • Elasticsearch cluster for full-text tweet search, partitioned by time (recent tweets in hot shards).
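The Redis sorted-set timeline with an 800-tweet cap can be sketched with a plain `{tweet_id: timestamp}` dict standing in for a real ZSET. In Redis this would be `ZADD` followed by `ZREMRANGEBYRANK` to keep only the newest entries:

```python
TIMELINE_CAP = 800

def push_to_timeline(timeline, tweet_id, timestamp, cap=TIMELINE_CAP):
    timeline[tweet_id] = timestamp  # ZADD: score = timestamp
    if len(timeline) > cap:
        # Evict the oldest entries beyond the cap, as ZREMRANGEBYRANK would.
        overflow = len(timeline) - cap
        oldest = sorted(timeline.items(), key=lambda kv: kv[1])[:overflow]
        for tid, _ in oldest:
            del timeline[tid]
```

Capping per-user timelines bounds Redis memory regardless of follow-graph shape; anything older than the cap falls back to the read path.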


Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Web Clients -> DNS -> CDN Edge -> Load Balancer -> Core Service -> Primary NoSQL DB -> Redis Cache -> Message Queue

Design strengths

  • Cache sits on the read path to absorb repeated queries and keep DB pressure stable.
  • Async queue/event bus isolates bursty workloads and supports retries without blocking synchronous requests.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.