Hard · Enterprise

Design TikTok

CDN · Media Processing · Databases · Caching · Analytics · Message Queues

Problem Statement

Design the architecture for TikTok - the fastest-growing content platform with 1.5 billion monthly active users watching an average of 95 minutes per day. Your design must cover:

- Video creation & upload - users record or upload short videos (15 seconds to 10 minutes) with effects, filters, music overlays, and text. Videos are encoded to multiple resolutions (360p to 1080p) and formats.
- For You feed (FYF) - the most critical feature. Unlike follower-based feeds (Twitter, Instagram), TikTok's feed is entirely algorithm-driven. An ML recommendation system selects videos based on user engagement signals (watch time, likes, shares, scroll-away speed), creator features, and video content features. The feed is an infinite scroll of personalized content.
- Content moderation - AI-powered moderation must review every video before it reaches the FYF. Detect nudity, violence, hate speech, misinformation, and copyright violations. Moderation must complete within 5 minutes of upload for fast time-to-virality.
- Viral distribution - a video can go from 0 to 10 million views in hours. The CDN must handle explosive, unpredictable traffic patterns. Newly popular videos must be pushed to edge caches proactively.
- Music/sound library - searchable catalog of millions of songs and original sounds. Link videos to their source audio for the "use this sound" feature.
- Engagement - likes, comments, shares, duets (side-by-side reaction videos), and stitches (appending to another video).
- Creator analytics - real-time view counts, follower growth, demographic breakdowns for content creators.

The core challenge is the recommendation-first architecture where the feed quality is the product.

What You'll Learn

Design TikTok's short-video platform - video creation, For You feed, content moderation, and viral distribution for 1B+ users. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.


Constraints

Monthly active users: 1,500,000,000
Avg session time: 95 minutes/day
Video uploads/day: ~30,000,000
Videos served/day (views): ~50,000,000,000
Moderation latency: < 5 minutes per video
Feed load time: < 400 ms
Recommendation model inference: < 100 ms
Availability target: 99.99%

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Design TikTok's short-video platform - video creation, For You feed, content moderation, and viral distribution for 1B+ users.
  • Design for a peak origin/API load target around 80,000 RPS (including burst headroom); video playback itself is served from the CDN, so view traffic rarely reaches the origin.
  • Monthly active users: 1,500,000,000
  • Avg session time: 95 minutes/day
  • Video uploads/day: ~30,000,000
  • Videos served/day (views): ~50,000,000,000
  • Moderation latency: < 5 minutes per video

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.
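As a sketch of this conversion, using the constraint numbers above (a back-of-envelope estimate; the 3x multiplier is the safety margin named above, and the 10 MB average video size is an assumption, not a given):

```python
SECONDS_PER_DAY = 86_400

# Constraint inputs from the problem statement
uploads_per_day = 30_000_000
views_per_day = 50_000_000_000
avg_video_mb = 10          # assumption: average encoded size per upload

# Average request rates
avg_upload_rps = uploads_per_day / SECONDS_PER_DAY   # ~347 uploads/s
avg_view_rps = views_per_day / SECONDS_PER_DAY       # ~579K views/s (mostly CDN-served)

# Apply the 2-3x per-tier safety margin (upper bound shown)
SAFETY = 3
peak_upload_rps = avg_upload_rps * SAFETY
peak_view_rps = avg_view_rps * SAFETY

# Daily raw storage growth before replication and extra transcodes
storage_tb_per_day = uploads_per_day * avg_video_mb / 1_000_000

print(f"{avg_upload_rps:.0f} uploads/s, {avg_view_rps:,.0f} views/s, "
      f"{storage_tb_per_day:.0f} TB/day new media")
```

The view number is the headline: at roughly 579K views/s on average, the origin cannot serve playback directly, which is why the CDN carries the read path and the origin budget is far smaller.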

3) Architecture Decisions

  • CDN: Serve static and cacheable content from edge and keep origin strictly for misses and dynamic requests.
  • Media Processing: Split ingest, transform, and delivery into independent stages with async orchestration.
  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • Caching: Put cache on hot read paths first and pick cache-aside or write-through explicitly.
  • Analytics: Maintain separate OLTP and analytics paths; stream events into a warehouse/time-series layer.
  • Message Queues: Move non-blocking and retry-heavy work to async consumers with explicit retry and DLQ policies.
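The cache-aside choice named above can be sketched minimally (an in-memory dict stands in for Redis, and `load_from_db`, the key names, and the TTL value are illustrative assumptions):

```python
import time

DB = {"video:42": {"title": "demo clip"}}   # stand-in for the system-of-record
CACHE = {}                                  # stand-in for Redis: key -> (value, expiry)
TTL_SECONDS = 60

def load_from_db(key):
    return DB.get(key)

def get_cached(key):
    """Cache-aside read: try the cache, fall back to the DB, then populate."""
    entry = CACHE.get(key)
    if entry is not None:
        value, expiry = entry
        if time.monotonic() < expiry:
            return value                    # cache hit
        del CACHE[key]                      # expired: treat as a miss
    value = load_from_db(key)               # miss: read the system-of-record
    if value is not None:
        CACHE[key] = (value, time.monotonic() + TTL_SECONDS)
    return value

def invalidate(key):
    """Write path: after updating the DB, drop the stale cache entry."""
    CACHE.pop(key, None)
```

The explicit `invalidate` hook is the point: cache-aside only stays correct if every write path remembers to call it, which is why invalidation shows up again under the reliability section.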

4) Reliability and Failure Strategy

  • Define cache keys and purge workflows before launch to avoid stale/global outages.
  • Store original media durably and make transforms replayable.
  • Use strong write constraints (transactions or conditional writes) and explicit backup/restore strategy.
  • Bound staleness with TTL + invalidation hooks for critical entities.
  • Version event schemas and monitor drop/late-event rates.
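The "strong write constraints" bullet can be illustrated with a compare-and-set style conditional write (a toy in-memory sketch; a real system would use DB transactions or conditional expressions in the datastore):

```python
class VersionedStore:
    """Toy key-value store with optimistic concurrency control."""

    def __init__(self):
        self._data = {}     # key -> (value, version)

    def read(self, key):
        return self._data.get(key, (None, 0))

    def conditional_write(self, key, value, expected_version):
        """Apply the write only if nobody else wrote since our read."""
        _, current = self.read(key)
        if current != expected_version:
            return False    # lost the race: caller must re-read and retry
        self._data[key] = (value, current + 1)
        return True
```

A rejected write tells the caller to re-read and retry rather than silently clobbering a concurrent update, which is the property the bullet is asking for.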

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.
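Idempotency for retried writes can be verified with a dedupe-by-event-id consumer like this sketch (the event-id scheme and the in-memory seen-set are assumptions; production would keep the dedupe set in a durable store):

```python
class IdempotentConsumer:
    """Applies each event at most once, even under at-least-once delivery."""

    def __init__(self):
        self._seen = set()
        self.applied = []

    def handle(self, event_id, payload):
        if event_id in self._seen:
            return False          # duplicate delivery: safely ignored
        self._seen.add(event_id)
        self.applied.append(payload)
        return True
```

A validation test then replays the same event twice and asserts the side effect happened exactly once.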

6) Trade-offs to Call Out in Interviews

  • CDN: Long TTL improves latency/cost; short TTL improves freshness.
  • Media Processing: Pre-processing improves playback UX, but requires substantial compute/storage budget.
  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • Caching: Higher hit rate cuts latency/cost, but stale data and invalidation bugs become primary risks.
  • Analytics: Analytics pipeline unlocks insights, but adds eventual consistency and governance overhead.

Practical Notes

  • Two-stage recommendation: candidate retrieval (from millions → thousands using approximate nearest neighbor) → ranking (ML model scores top few hundred → return top 30).
  • Feedback loop: log every user interaction (video_id, watch_duration, like, share, skip) into a streaming pipeline for near-real-time model feature updates.
  • CDN pre-warming: when a video's view velocity exceeds a threshold, proactively push it to all edge locations before it goes fully viral.
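The two-stage recommendation flow above can be sketched end to end (brute-force cosine similarity stands in for a real ANN index, and the linear ranking score with its two features is an illustrative assumption, not TikTok's model):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(user_vec, video_embeddings, k):
    """Stage 1: narrow the corpus to k candidates by embedding similarity."""
    scored = sorted(video_embeddings.items(),
                    key=lambda kv: cosine(user_vec, kv[1]), reverse=True)
    return [video_id for video_id, _ in scored[:k]]

def rank(candidate_ids, features, top_n):
    """Stage 2: score candidates with a (toy) ranking model, return top_n."""
    def score(video_id):
        f = features[video_id]
        return 0.6 * f["pred_watch_time"] + 0.4 * f["pred_engagement"]
    return sorted(candidate_ids, key=score, reverse=True)[:top_n]
```

The split matters because the cheap similarity pass bounds how many videos the expensive ranking model ever sees, which is what keeps inference inside the < 100 ms budget.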


Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Mobile Clients -> DNS -> CDN Edge -> Load Balancer -> Core Service -> Primary NoSQL DB -> Redis Cache -> Event Bus

Design strengths

  • Cache sits on the read path to absorb repeated queries and keep DB pressure stable.
  • Async queue/event bus isolates bursty workloads and supports retries without blocking synchronous requests.
  • Media processing is handled by background workers so user-facing latency stays low.
  • Analytics pipeline is separated from OLTP path to avoid reporting workloads impacting transactions.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.
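One common way to get that last property is a transactional-outbox sketch like this (the shared dict standing in for a DB transaction, and the `record_like`/`drain_outbox` names, are illustrative assumptions):

```python
def record_like(db, outbox, user_id, video_id):
    """Commit the state change and the event intent in one atomic step,
    so the async publisher can neither miss nor invent an event."""
    key = (user_id, video_id)
    if key in db["likes"]:
        return False                        # already liked: idempotent no-op
    # In a real system these two writes share one DB transaction.
    db["likes"].add(key)
    outbox.append({"type": "like", "user": user_id, "video": video_id})
    return True

def drain_outbox(outbox, publish):
    """Background relay: publish pending events, removing them on success."""
    while outbox:
        publish(outbox[0])                  # at-least-once: publish before delete
        outbox.pop(0)
```

Because the relay delivers at-least-once, downstream consumers still need the dedupe-by-event-id discipline from the validation plan.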