Hard · Enterprise

Design Instagram

Databases · CDN · Media Processing · Caching · Sharding · Message Queues

Problem Statement

Design the backend for Instagram - a media-first social platform with 2 billion monthly active users. It handles one of the largest media pipelines in the world. Your design must cover:

- Photo & video upload pipeline - users upload photos and short-form videos (Reels, up to 90 seconds). Each upload is resized into multiple resolutions, stripped of EXIF data, and stored durably. The system handles 100 million uploads per day.
- Feed generation - the home feed mixes posts from followed accounts and recommended content, ranked by an ML model (engagement prediction). Feed must load in < 500 ms.
- Stories - ephemeral 24-hour content served from edge caches. Stories from close friends are prioritized.
- Explore / Reels - entirely recommendation-driven feeds. A candidate generation → ranking → re-ranking pipeline selects content, personalized per user.
- Direct Messages (DMs) - E2E-encrypted messaging with photo/video sharing support.
- Engagement features - likes, comments, saves, shares. Like counts are eventually consistent (it's okay if the count is off by a few for a few seconds).

The key challenge is the sheer volume of media processing + storage combined with real-time feed personalization.
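As a concrete sketch of the resize step in the upload pipeline, here is a helper that derives a rendition ladder from an original image's dimensions. The six target widths are illustrative placeholders, not Instagram's actual ladder:

```python
def resolution_ladder(width, height, targets=(1080, 720, 480, 320, 240, 150)):
    """Compute output sizes for each target width, preserving aspect ratio.

    The target widths are hypothetical; the real ladder is an internal detail.
    Targets wider than the original are skipped so we never upscale.
    """
    sizes = []
    for target_width in targets:
        if target_width > width:
            continue  # never upscale beyond the original
        target_height = round(height * target_width / width)
        sizes.append((target_width, target_height))
    return sizes
```

For a typical 4032×3024 phone photo this yields all six renditions; a 500×500 original only gets the four targets at or below its width.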

What You'll Learn

Design Instagram's photo/video sharing platform - feed generation, Stories, Reels, and Explore at 2B users. Build this architecture under realistic production constraints, then validate the tradeoffs in the design lab simulation.


Constraints

Monthly active users: 2,000,000,000
Media uploads/day: ~100,000,000
Total media stored: Exabytes
Feed load time: < 500 ms
Story availability: 24 hours
Image resolutions generated: 6+ per upload
CDN edge locations: 100+
Availability target: 99.99%

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Design Instagram's photo/video sharing platform - feed generation, Stories, Reels, and Explore at 2B users.
  • Design for a peak load target around 80,000 RPS (including burst headroom).
  • Monthly active users: 2,000,000,000
  • Media uploads/day: ~100,000,000
  • Total media stored: Exabytes
  • Feed load time: < 500 ms
  • Story availability: 24 hours

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.

3) Architecture Decisions

  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • CDN: Serve static and cacheable content from edge and keep origin strictly for misses and dynamic requests.
  • Media Processing: Split ingest, transform, and delivery into independent stages with async orchestration.
  • Caching: Put cache on hot read paths first and pick cache-aside or write-through explicitly.
  • Sharding: Choose shard keys around access patterns and growth hotspots, not just data size.
  • Message Queues: Move non-blocking and retry-heavy work to async consumers with explicit retry and DLQ policies.
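To make the caching decision concrete, here is a minimal cache-aside sketch: reads check the cache first, fall back to the system-of-record on a miss, and the write path calls an explicit invalidation hook. This is an in-memory stand-in, not a Redis client:

```python
import time

class CacheAside:
    """Minimal cache-aside read path with TTL-bounded staleness."""

    def __init__(self, db, ttl_seconds=60):
        self.db = db                 # the system-of-record (dict stand-in)
        self.ttl = ttl_seconds
        self._store = {}             # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]          # cache hit
        value = self.db[key]         # miss: read the system-of-record
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

    def invalidate(self, key):
        """Called from the write path so updates become visible before TTL expiry."""
        self._store.pop(key, None)
```

Note the failure mode this exposes: between a DB update and the `invalidate` call, reads serve the stale cached value, which is exactly the TTL-bounded staleness called out in the reliability section.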

4) Reliability and Failure Strategy

  • Use strong write constraints (transactions or conditional writes) and explicit backup/restore strategy.
  • Define cache keys and purge workflows before launch to avoid stale/global outages.
  • Store original media durably and make transforms replayable.
  • Bound staleness with TTL + invalidation hooks for critical entities.
  • Support rebalancing and hotspot detection from day one.
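Rebalancing from day one usually means consistent hashing, so adding a shard remaps only a small slice of keys instead of reshuffling everything. A minimal ring with virtual nodes (the shard names and vnode count are illustrative):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes for smoother rebalancing."""

    def __init__(self, shards, vnodes=100):
        # Each shard gets `vnodes` points on the ring to spread load evenly.
        self._ring = sorted(
            (self._hash(f"{shard}#{v}"), shard)
            for shard in shards
            for v in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key):
        """Walk clockwise from the key's hash to the next shard point."""
        i = bisect.bisect(self._ring, (self._hash(key),))
        return self._ring[i % len(self._ring)][1]
```

Going from three shards to four should move roughly a quarter of the keys; with naive `hash(key) % n` sharding, nearly all of them would move.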

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.
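Verifying idempotency for async consumers usually comes down to deduplicating by message id, so an at-least-once queue redelivering a message does not double-apply it (e.g. double-counting a like). A minimal sketch with an in-memory seen-set; production would persist this state:

```python
class IdempotentConsumer:
    """Apply each message id at most once, even under redelivery."""

    def __init__(self, handler):
        self.handler = handler
        self._seen = set()   # in production: a durable dedupe store

    def consume(self, msg_id, payload):
        if msg_id in self._seen:
            return False               # duplicate delivery: skip
        self.handler(payload)
        self._seen.add(msg_id)         # record only after successful handling
        return True
```

Recording the id only after the handler succeeds means a crash mid-handling leads to a retry rather than a lost message, which is the safe side of the tradeoff.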

6) Trade-offs to Call Out in Interviews

  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • CDN: Long TTL improves latency/cost; short TTL improves freshness.
  • Media Processing: Pre-processing improves playback UX, but requires substantial compute/storage budget.
  • Caching: Higher hit rate cuts latency/cost, but stale data and invalidation bugs become primary risks.
  • Sharding: Sharding improves horizontal scale but makes cross-shard queries and transactions harder.

Practical Notes

  • Separate the media storage layer (object store + CDN) from the metadata layer (Cassandra/PostgreSQL for posts, likes, follows).
  • Feed generation: pre-compute candidate posts → ML ranking service → cache top-500 posts per user in Redis.
  • Use a DAG-based media processing pipeline: upload → virus scan → resize → watermark → CDN push.
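The "top-500 posts per user" feed cache above maps naturally onto a capped Redis sorted set (ZADD plus trimming by rank). Here is an in-memory stand-in using a bounded min-heap, with scores assumed to come from the ML ranking service:

```python
import heapq

class FeedCache:
    """Keep the top-N ranked post ids per user, mimicking a capped sorted set."""

    def __init__(self, capacity=500):
        self.capacity = capacity
        self._feeds = {}   # user_id -> min-heap of (score, post_id)

    def add(self, user_id, post_id, score):
        heap = self._feeds.setdefault(user_id, [])
        heapq.heappush(heap, (score, post_id))
        if len(heap) > self.capacity:
            heapq.heappop(heap)        # evict the lowest-ranked post

    def top(self, user_id, n=20):
        """Return up to n post ids, highest score first, for one feed page."""
        heap = self._feeds.get(user_id, [])
        return [post_id for _, post_id in heapq.nlargest(n, heap)]
```

In the real system this would be a Redis ZSET per user with a TTL, refreshed by the candidate-generation → ranking pipeline; the cap keeps per-user memory bounded at scale.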

Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Web Clients → DNS → CDN Edge → Load Balancer → Core Service → Primary NoSQL DB → Redis Cache → Message Queue

Design strengths

  • Cache sits on the read path to absorb repeated queries and keep DB pressure stable.
  • Async queue/event bus isolates bursty workloads and supports retries without blocking synchronous requests.
  • Media processing is handled by background workers so user-facing latency stays low.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.