
Social Feed 2 - Going Viral

Caching · Message Queues · Databases · Search · Notifications

This challenge builds on Social Feed 1 - MVP Launch. Complete it first for the best experience.

Problem Statement

Chirper has exploded in popularity. The platform now has 10 million registered users, with 3 million daily active. New requirements:

- Celebrity problem - some users have 5 million+ followers. When they post, the fan-out creates massive write amplification, and the timeline service is buckling.
- Search - users want to search for chirps by keyword, hashtag, or username. Search must return results within 500 ms.
- Push notifications - users get notified when someone they follow posts, when their chirp is liked, or when they're mentioned.
- Trending topics - a "Trending" section shows the most-discussed hashtags in the last hour.

Redesign the system to handle high-fanout scenarios, add search, and build a notification pipeline.
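The trending requirement boils down to counting hashtag mentions in a sliding one-hour window. A minimal in-memory sketch (single process only; a production version would use Redis sorted sets or a stream processor):

```python
import time
from collections import deque, Counter

class TrendingTracker:
    """Counts hashtag mentions within a sliding time window."""

    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        self.events = deque()          # (timestamp, hashtag) pairs, oldest first
        self.counts = Counter()        # current per-hashtag totals

    def record(self, hashtag, now=None):
        now = now if now is not None else time.time()
        self.events.append((now, hashtag))
        self.counts[hashtag] += 1
        self._evict(now)

    def _evict(self, now):
        # Drop events older than the window and decrement their counts.
        while self.events and now - self.events[0][0] > self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, n=10, now=None):
        self._evict(now if now is not None else time.time())
        return self.counts.most_common(n)
```

The deque keeps events in arrival order, so eviction is O(expired events) per call rather than a full scan.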

What You'll Learn

Handle 10 M users, celebrity fan-out, full-text search, and push notifications. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.


Constraints

Registered users: 10,000,000
Daily active users: 3,000,000
Max followers per user: 5,000,000
Posts per day: ~10,000,000
Search latency: < 500 ms
Notification delivery: < 10 seconds
Availability target: 99.9%
Approach

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Handle 10 M users, celebrity fan-out, full-text search, and push notifications.
  • Design for a peak load target around 694 RPS (including burst headroom).
  • Registered users: 10,000,000
  • Daily active users: 3,000,000
  • Max followers per user: 5,000,000
  • Posts per day: ~10,000,000
  • Search latency: < 500 ms
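As a sanity check, the 694 RPS peak target can be reproduced from the post rate. The 6x burst multiplier is an assumption chosen to match the stated figure, not part of the spec:

```python
POSTS_PER_DAY = 10_000_000
SECONDS_PER_DAY = 86_400
BURST_MULTIPLIER = 6          # assumed headroom factor; not from the spec

avg_rps = POSTS_PER_DAY / SECONDS_PER_DAY   # ~116 writes/sec on average
peak_rps = avg_rps * BURST_MULTIPLIER       # ~694 writes/sec at peak

print(f"avg ~{avg_rps:.0f} RPS, peak ~{peak_rps:.0f} RPS")
```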

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.
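Reserving explicit per-hop budgets might look like the following for the search path. The numbers are illustrative assumptions chosen to fit under the 500 ms SLO:

```python
# Illustrative per-hop p95 budget for the search path (assumed numbers).
budget_ms = {
    "load_balancer": 5,
    "api_service": 20,
    "search_index_query": 300,
    "result_hydration_cache": 50,
    "serialization_and_network": 75,
}

total_ms = sum(budget_ms.values())   # 450 ms, leaving 50 ms slack under the SLO
assert total_ms <= 500
```

Writing the budget down this way makes the p95 defensible in review: any hop that exceeds its line item is the hop to optimize.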

3) Architecture Decisions

  • Caching: Put cache on hot read paths first and pick cache-aside or write-through explicitly.
  • Message Queues: Move non-blocking and retry-heavy work to async consumers with explicit retry and DLQ policies.
  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • Search: Write to the primary store and update the search index asynchronously, trading freshness for relevance and scale.
  • Notifications: Model notifications as event-driven fanout with per-channel workers (email/push/webhook).
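The cache-aside choice named above can be sketched as follows. The `cache` and `db` interfaces are placeholders standing in for a Redis client and a DB accessor:

```python
import json

CACHE_TTL_SECONDS = 60  # bound staleness for cached chirps

def get_chirp(chirp_id, cache, db):
    """Cache-aside read: check cache first, fall back to the DB, populate on miss."""
    key = f"chirp:{chirp_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                       # cache hit
    row = db.fetch_chirp(chirp_id)                      # hypothetical DB accessor
    if row is not None:
        cache.set(key, json.dumps(row), ttl=CACHE_TTL_SECONDS)
    return row
```

The TTL bounds staleness even if an invalidation hook is missed, which is why the reliability section below pairs TTLs with invalidation rather than relying on either alone.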

4) Reliability and Failure Strategy

  • Bound staleness with TTL + invalidation hooks for critical entities.
  • Guarantee idempotent consumers and trace every message with correlation IDs.
  • Use strong write constraints (transactions or conditional writes) and explicit backup/restore strategy.
  • Track indexing lag and support reindex from source of truth.
  • Track delivery state machine and dead-letter undeliverable events.
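An idempotent consumer as described above can be sketched with a processed-ID set. The set is in-memory here; a production version would use a durable dedup store such as a Redis key with SET NX or a DB unique constraint:

```python
def make_idempotent_consumer(handler):
    """Wraps a handler so redelivered messages (same message_id) become no-ops."""
    processed = set()   # stand-in for a durable dedup store

    def consume(message):
        msg_id = message["message_id"]   # message/correlation ID from the queue
        if msg_id in processed:
            return "duplicate"           # safe to ack without re-running side effects
        handler(message)
        processed.add(msg_id)            # mark only after the handler succeeds
        return "processed"

    return consume
```

Marking the ID only after the handler succeeds gives at-least-once semantics with safe retries; a crash between the two steps causes a reprocess, never a lost message.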

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.

6) Trade-offs to Call Out in Interviews

  • Caching: Higher hit rate cuts latency/cost, but stale data and invalidation bugs become primary risks.
  • Message Queues: Async pipelines absorb spikes well, but increase eventual-consistency complexity.
  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • Search: Search index gives rich querying but introduces eventual consistency and index ops overhead.
  • Notifications: Multi-channel coverage increases reach but adds per-channel failure modes and policy complexity.

Practical Notes

  • Hybrid fan-out: fan-out-on-write for normal users, fan-out-on-read for celebrities.
  • Elasticsearch or a similar full-text search engine can power keyword/hashtag search.
  • Use a message queue to decouple notification generation from the posting flow.
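The hybrid fan-out rule reduces to a threshold check at post time. The 10,000-follower cutoff below is an assumed tuning parameter, not a fixed rule:

```python
CELEBRITY_FOLLOWER_THRESHOLD = 10_000   # assumed cutoff; tune from fan-out cost data

def route_post(author, post, timeline_store, celebrity_post_store):
    """Fan-out-on-write for normal users, fan-out-on-read for celebrities."""
    if author["follower_count"] >= CELEBRITY_FOLLOWER_THRESHOLD:
        # Store once; followers merge celebrity posts into their timeline at read time.
        celebrity_post_store.append((author["id"], post))
        return "fan_out_on_read"
    # Push the post into every follower's materialized timeline now.
    for follower_id in author["followers"]:
        timeline_store.setdefault(follower_id, []).append(post)
    return "fan_out_on_write"
```

A 5-million-follower post thus costs one write instead of 5 million, at the price of a merge step on every follower's timeline read.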

Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Web Clients -> Load Balancer -> API Service -> Primary NoSQL DB -> Redis Cache -> Message Queue -> Background Workers -> Notification Fanout

Design strengths

  • Cache sits on the read path to absorb repeated queries and keep DB pressure stable.
  • Async queue/event bus isolates bursty workloads and supports retries without blocking synchronous requests.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.