Hard · Enterprise

Design YouTube

CDN · Media Processing · Databases · Caching · Storage · Microservices · Search

Problem Statement

Design the architecture for YouTube - the world's largest video platform with 2.5 billion monthly users watching 1 billion hours of video per day. Your design must cover:

- Video upload & transcoding - creators upload videos up to 12 hours long and 256 GB in size. Each video is transcoded into 20+ resolution/bitrate/codec combinations (360p to 8K, H.264/VP9/AV1). Transcoding must complete within 2 hours of upload.
- Adaptive bitrate streaming - the player uses DASH/HLS to switch quality in real time based on the viewer's bandwidth. Segment duration: 2-6 seconds.
- Content delivery - videos are served from a global CDN with thousands of edge locations. Popular videos are pre-cached at the edge; long-tail videos are served from origin.
- Recommendation engine - the home feed and "Up Next" sidebar are powered by a deep learning model trained on watch history, engagement, and content features. Low-latency inference (< 200 ms) is critical.
- Search - search across billions of videos by title, description, transcript (auto-generated via speech-to-text), and metadata.
- Comments & engagement - threaded comments, likes/dislikes, subscriptions. Handle viral videos with millions of concurrent viewers.
- Monetization - ad insertion (pre-roll, mid-roll) with real-time auction (ad bidding) for every video view.
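The adaptive bitrate requirement above can be sketched as a simple client-side quality picker. This is a minimal illustration, not YouTube's actual algorithm: the rendition ladder and the 0.8 safety factor are assumptions chosen for the example.

```python
# Minimal sketch of client-side adaptive bitrate (ABR) selection.
# LADDER and the safety factor are illustrative assumptions.

# (height, required bandwidth in kbit/s) - hypothetical rendition ladder,
# sorted from highest to lowest quality
LADDER = [(2160, 20000), (1080, 8000), (720, 5000), (480, 2500), (360, 1000)]

def pick_rendition(measured_kbps: float, safety: float = 0.8) -> int:
    """Choose the highest rung whose bitrate fits within measured
    throughput, discounted by a safety factor to absorb jitter."""
    budget = measured_kbps * safety
    for height, kbps in LADDER:
        if kbps <= budget:
            return height
    return LADDER[-1][0]  # never stall: fall back to the lowest rung

print(pick_rendition(6000))  # 6000 * 0.8 = 4800 kbit/s budget -> 480
```

A real DASH/HLS player re-runs a decision like this at every segment boundary (the 2-6 second cadence noted above), which is what makes quality switches feel seamless.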

This is one of the hardest system design problems due to the combination of massive storage, heavy compute, and real-time streaming at global scale.

What You'll Learn

Design YouTube's video platform - upload, transcode, adaptive streaming, recommendations, and comments at planetary scale. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.


Constraints

Monthly active users: 2,500,000,000
Hours of video watched/day: 1,000,000,000
Video uploads/minute: ~500 hours
Transcoding variants per video: 20+
Total video storage: Exabytes
Recommendation latency: < 200 ms
CDN edge locations: Thousands
Availability target: 99.99%

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Design YouTube's video platform - upload, transcode, adaptive streaming, recommendations, and comments at planetary scale.
  • Design for a peak load target around 80,000 RPS (including burst headroom).
  • Monthly active users: 2,500,000,000
  • Hours of video watched/day: 1,000,000,000
  • Video uploads/minute: ~500 hours
  • Transcoding variants per video: 20+
  • Total video storage: Exabytes

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.

3) Architecture Decisions

  • CDN: Serve static and cacheable content from edge and keep origin strictly for misses and dynamic requests.
  • Media Processing: Split ingest, transform, and delivery into independent stages with async orchestration.
  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • Caching: Put cache on hot read paths first and pick cache-aside or write-through explicitly.
  • Storage: Use object storage for large blobs and keep metadata/authorization separate in the API tier.
  • Microservices: Split services by business boundary, not by technical layer, and enforce ownership per domain.
  • Search: Treat the primary store as the source of truth for writes, and update the search index asynchronously to get both relevance and scale.
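The cache-aside choice named above can be sketched in a few lines. The in-memory dict stands in for Redis and the stubbed database call stands in for the system-of-record query; both are placeholders, not a real client API.

```python
# Cache-aside read path sketch: check the cache, fall back to the
# database on a miss, then populate the cache with a TTL.
import json
import time

cache: dict = {}      # stand-in for Redis: key -> (expires_at, json_value)
TTL_SECONDS = 300

def db_get_video_metadata(video_id: str) -> dict:
    # placeholder for the system-of-record query
    return {"id": video_id, "title": "example", "duration_s": 212}

def get_video_metadata(video_id: str) -> dict:
    key = f"video:meta:{video_id}"
    hit = cache.get(key)
    if hit and hit[0] > time.time():
        return json.loads(hit[1])                    # cache hit
    value = db_get_video_metadata(video_id)          # miss: read system of record
    cache[key] = (time.time() + TTL_SECONDS, json.dumps(value))
    return value
```

Making the choice explicit matters in interviews: cache-aside keeps the database authoritative and tolerates cache loss, at the cost of one stale-read window per TTL.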

4) Reliability and Failure Strategy

  • Define cache keys and purge workflows before launch to avoid stale/global outages.
  • Store original media durably and make transforms replayable.
  • Use strong write constraints (transactions or conditional writes) and explicit backup/restore strategy.
  • Bound staleness with TTL + invalidation hooks for critical entities.
  • Enforce lifecycle policies, retention tiers, and checksum validation.
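The checksum-validation and replayable-transform points above combine naturally: record a digest at ingest and re-verify it before every transform, so a transcode can always be replayed from a known-good original. A minimal sketch, assuming SHA-256 as the digest:

```python
# Sketch of checksum validation for uploaded originals: compute a digest
# at ingest, store it alongside the blob, and re-verify before any
# transform so transcodes stay replayable from a known-good source.
import hashlib

def sha256_of(chunks) -> str:
    """Incrementally hash a stream of byte chunks (equivalent to
    hashing their concatenation)."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

def verify_before_transform(chunks, recorded_digest: str) -> bool:
    """Refuse to transcode from a corrupted original."""
    return sha256_of(chunks) == recorded_digest
```

The same digest doubles as a natural deduplication key for repeat uploads of identical source files.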

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.
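The idempotency check above is worth sketching, since it is the step most often hand-waved. This toy consumer dedupes on a stable message ID before applying side effects; the in-memory set stands in for a durable dedup store (e.g. a conditional write in the database), and the "like" event shape is an assumption for illustration.

```python
# Idempotent async consumer sketch: dedupe retried messages on a stable
# message ID before applying side effects.
processed: set = set()   # stand-in for a durable dedup store
like_counts: dict = {}

def handle_like_event(event: dict) -> bool:
    """Apply a 'like' exactly once even if the broker redelivers it.
    Returns True if the event was applied, False if it was a duplicate."""
    if event["message_id"] in processed:
        return False                           # duplicate delivery: no-op
    video = event["video_id"]
    like_counts[video] = like_counts.get(video, 0) + 1
    processed.add(event["message_id"])         # record only after success
    return True
```

A degradation test should replay the same event stream twice and assert counters did not double, which is exactly what this structure makes cheap to verify.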

6) Trade-offs to Call Out in Interviews

  • CDN: Long TTL improves latency/cost; short TTL improves freshness.
  • Media Processing: Pre-processing improves playback UX, but requires substantial compute/storage budget.
  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • Caching: Higher hit rate cuts latency/cost, but stale data and invalidation bugs become primary risks.
  • Storage: Object storage is cheap and durable, but random low-latency reads are weaker than databases/caches.

Practical Notes

  • Split video into segments and transcode in parallel across a GPU worker fleet - each segment is independent.
  • Hot/cold storage tiering: top 10% of videos (by views) on SSD at edge; long-tail on HDD/tape at origin.
  • Recommendation: two-stage pipeline - candidate generation (fast, broad) → ranking (ML model, precise). Pre-compute home feed offline, refresh on-demand.

Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Web Clients -> DNS -> CDN Edge -> Load Balancer -> API Gateway -> Core Service -> Primary SQL DB -> Redis Cache

Design strengths

  • Cache sits on the read path to absorb repeated queries and keep DB pressure stable.
  • Media processing is handled by background workers so user-facing latency stays low.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.