Case Study: Design a Video Streaming Platform

Video streaming platforms like YouTube and Netflix serve billions of hours of video daily. This is one of the most infrastructure-intensive systems to design: it involves massive storage, compute-heavy transcoding pipelines, global CDN distribution, adaptive bitrate streaming, and recommendation engines. This case study walks through the end-to-end architecture from upload to playback.

Step 1: Clarify Requirements

Functional Requirements

  • Upload: Creators upload videos of varying length (seconds to hours) and resolution (up to 4K).
  • Transcode: Convert uploaded videos into multiple resolutions and formats for adaptive streaming.
  • Stream: Viewers watch videos with smooth playback, automatic quality adjustment based on bandwidth.
  • Search & Discovery: Users search for videos by title, tags, and description. Trending and recommended videos on the homepage.
  • Engagement: Like, comment, subscribe, share. View counts and watch history.
  • Recommendations: Personalized video suggestions based on watch history and preferences.

Non-Functional Requirements

  • Scale: Support 1 billion daily active users, 500 million videos in the catalog.
  • Availability: 99.99% uptime for playback. Upload can tolerate slightly lower availability.
  • Low latency: Video playback must start within 2 seconds. Seek operations under 1 second.
  • Durability: Uploaded videos must never be lost. 11 nines of durability for raw and transcoded assets.
  • Global reach: Low-latency playback across all continents via CDN.
  • Cost efficiency: Storage and bandwidth are the dominant costs. Optimize aggressively.

Step 2: Back-of-Envelope Estimates

Estimation

Users: 1 billion DAU
Videos in catalog: 500 million

Upload:
  • New uploads/day: 500,000 videos
  • Average raw size: 500 MB (mix of short and long content)
  • Daily raw upload: 500K * 500 MB = 250 TB/day
  • After transcoding: ~3x raw size (multiple resolutions): 250 TB * 3 = 750 TB/day of transcoded output
  • Annual storage: 750 TB * 365 = ~274 PB/year

Streaming:
  • Average watch time: 30 minutes/user/day
  • Average bitrate: 5 Mbps (mix of resolutions)
  • Concurrent viewers: ~50 million (peak)
  • Peak bandwidth: 50M * 5 Mbps = 250 Tbps (CDN handles most of this; origin serves ~1-5%)

Transcoding:
  • 500K videos/day, average 10 min each
  • Transcoding to 5 resolutions = 2.5M transcoding jobs/day
  • 500K * 10 min * 5 resolutions = 25M encode-minutes/day; at ~2x real-time per resolution, ~50M GPU-minutes/day

Metadata:
  • 500M videos * 2 KB metadata = 1 TB
  • Comments: ~10 billion total, ~5 TB
  • Watch history: 1B users * 200 entries * 50 B = 10 TB
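
The storage and bandwidth arithmetic above can be verified with a few lines of code (decimal units throughout, 1 TB = 10^12 bytes, matching the text):

```python
# Back-of-envelope check for the upload and streaming estimates.
MB, TB, PB = 1e6, 1e12, 1e15

# Upload volume
uploads_per_day = 500_000
avg_raw_bytes = 500 * MB
raw_per_day = uploads_per_day * avg_raw_bytes        # 250 TB/day of raw video
transcoded_per_day = raw_per_day * 3                 # ~3x after all renditions
annual_storage = transcoded_per_day * 365            # ~274 PB/year

# Peak streaming bandwidth
concurrent_viewers = 50e6
avg_bitrate_bps = 5e6                                # 5 Mbps
peak_bps = concurrent_viewers * avg_bitrate_bps      # 250 Tbps

print(f"raw uploads/day: {raw_per_day / TB:.0f} TB")
print(f"transcoded/day:  {transcoded_per_day / TB:.0f} TB")
print(f"annual storage:  {annual_storage / PB:.0f} PB")
print(f"peak bandwidth:  {peak_bps / 1e12:.0f} Tbps")
```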

Step 3: High-Level Design

Upload/transcode path:
  • Creator → Upload Service (chunked upload) → Raw Storage (S3 / GCS)
  • Raw Storage → Transcode Queue (Kafka / SQS) → Transcoding Pipeline (DAG: split-encode-merge) → Video Storage (HLS/DASH segments)
  • Thumbnail Generation + Virus Scan run alongside transcoding; results and video metadata are written to the Metadata DB (PostgreSQL)

Playback path:
  • Viewer → API Gateway (auth + routing) → Streaming API (manifest + auth)
  • Viewer fetches segments from CDN Edge Servers (CloudFront/Akamai), which origin-pull from Video Storage on cache miss
  • API Gateway also routes to the Recommendation Engine (collaborative filtering) and the Search Service (Elasticsearch), both built on video metadata

The architecture splits into two distinct paths: the upload/transcode path (top) handles ingestion, processing, and storage, while the playback path (bottom) serves video content through CDN edge servers. These paths are decoupled: a video becomes available for streaming only after the transcoding pipeline completes and metadata is written.

Step 4: Deep Dive

Upload Pipeline

Uploading large video files over unreliable networks requires a robust, resumable upload protocol. The upload service implements chunked, resumable uploads.

1. Initiate upload: The client sends metadata (title, description, tags). The server creates an upload session and returns a unique upload URL with a pre-signed S3 multipart upload ID.
2. Chunked upload: The client splits the video into 5-10 MB chunks and uploads them in parallel (up to 6 concurrent connections). Each chunk includes an MD5 checksum for integrity verification.
3. Resume on failure: If the connection drops, the client queries the server for which chunks were received and resumes from the last incomplete chunk. No completed data is re-uploaded.
4. Complete upload: Once all chunks arrive, the server triggers S3 multipart completion to assemble the final object. The raw video is now stored durably.
5. Post-upload processing: A message is published to the transcode queue. In parallel, the virus scanner checks the file and metadata extraction reads video properties (duration, codec, resolution, framerate).
Upload API

```
// 1. Initiate upload
POST /v1/videos/upload
{
  "title": "System Design in 10 Minutes",
  "description": "Quick overview of system design...",
  "tags": ["system-design", "tutorial"],
  "file_size": 524288000,        // 500 MB
  "content_type": "video/mp4"
}

Response:
{
  "upload_id": "upl_abc123",
  "upload_url": "https://s3.amazonaws.com/raw-videos/...",
  "chunk_size": 5242880,         // 5 MB recommended
  "total_chunks": 100
}

// 2. Upload each chunk
PUT /v1/videos/upload/{upload_id}/chunks/{chunk_number}
Headers: Content-MD5: {checksum}
Body: [binary chunk data]

// 3. Complete upload
POST /v1/videos/upload/{upload_id}/complete
Response:
{
  "video_id": "vid_xyz789",
  "status": "processing",
  "estimated_ready": "2026-02-18T10:30:00Z"
}
```
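
The client side of this protocol can be sketched without the network layer; chunking, per-chunk checksums, and resume bookkeeping are the essential parts. This is a minimal sketch: the helper names and the in-memory resume logic are illustrative, not the platform's actual client library.

```python
import hashlib

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, matching the server's recommended chunk_size

def make_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a payload into numbered chunks with MD5 checksums,
    mirroring the PUT /chunks/{chunk_number} requests above."""
    chunks = []
    for number, offset in enumerate(range(0, len(data), chunk_size)):
        body = data[offset:offset + chunk_size]
        chunks.append({
            "chunk_number": number,
            "md5": hashlib.md5(body).hexdigest(),  # sent as Content-MD5
            "body": body,
        })
    return chunks

def chunks_to_resume(all_chunks, received_numbers):
    """On reconnect, ask the server which chunks arrived and
    re-send only the missing ones -- nothing is re-uploaded."""
    received = set(received_numbers)
    return [c for c in all_chunks if c["chunk_number"] not in received]

# Example: a 12 MB upload becomes 3 chunks; if the server confirms
# chunks 0 and 1, only chunk 2 is retried after a dropped connection.
payload = b"\x00" * (12 * 1024 * 1024)
chunks = make_chunks(payload)
remaining = chunks_to_resume(chunks, received_numbers=[0, 1])
print(len(chunks), [c["chunk_number"] for c in remaining])  # 3 [2]
```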

Transcoding Pipeline

Transcoding converts a single uploaded video into multiple renditions (resolutions and bitrates) for adaptive streaming. This is the most compute-intensive part of the system.

The pipeline is modeled as a Directed Acyclic Graph (DAG) of tasks:

Transcoding DAG

```
                 ┌─────────────┐
                 │  Raw Video  │
                 └──────┬──────┘
                        │
                 ┌──────▼──────┐
                 │    Split    │  Split into 10-sec segments
                 │  into GOP   │  (Group of Pictures)
                 └──────┬──────┘
                        │
       ┌────────────────┼────────────────┐
       │                │                │
┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
│ Encode 1080p│  │ Encode 720p │  │ Encode 480p │  ... + 360p, 240p
│ H.264/H.265 │  │    H.264    │  │    H.264    │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                │                │
       └────────────────┼────────────────┘
                        │
                 ┌──────▼──────┐
                 │   Package   │  Generate HLS (.m3u8 + .ts)
                 │  HLS/DASH   │  and DASH (.mpd + .m4s)
                 └──────┬──────┘
                        │
          ┌─────────────┼─────────────┐
          │             │             │
   ┌──────▼──────┐  ┌───▼───┐  ┌─────▼──────┐
   │ Thumbnails  │  │  DRM  │  │ Upload to  │
   │ Generation  │  │Encrypt│  │ CDN Origin │
   └─────────────┘  └───────┘  └────────────┘
```

HLS (HTTP Live Streaming)

  • Apple's protocol. Dominant on iOS, Safari, and most players.
  • Uses .m3u8 playlist files and .ts (MPEG-TS) segments.
  • Segments are typically 2-10 seconds long.
  • Master playlist references multiple quality levels.
  • Widely supported: works on nearly every device.

DASH (Dynamic Adaptive Streaming over HTTP)

  • International standard (ISO/IEC 23009). Codec-agnostic.
  • Uses .mpd manifest and .m4s (fMP4) segments.
  • Supports more flexible segment durations.
  • Better DRM integration (Widevine, PlayReady).
  • Preferred for Android and smart TV platforms.

Why Split Before Encoding?

Splitting the video into short segments (GOP-aligned) before encoding enables massive parallelism. Instead of encoding a 2-hour video sequentially on one machine (which would take ~4 hours), you split it into 720 ten-second segments and encode each in parallel across hundreds of workers. This reduces total transcoding time from hours to minutes.
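
The fan-out can be sketched as a worker pool over (segment, rendition) pairs. This is a minimal sketch: `encode_segment` is a stand-in for a real ffmpeg invocation, and the worker count is illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def encode_segment(segment_id: int, rendition: str) -> str:
    # Stand-in for one ffmpeg job on a single 10-second GOP-aligned
    # segment; a real worker would shell out to ffmpeg here.
    return f"seg_{segment_id:03d}_{rendition}.ts"

def transcode(video_seconds: int, renditions: list[str], segment_seconds: int = 10):
    """Fan one video out into (segments x renditions) independent jobs."""
    segments = range(video_seconds // segment_seconds)
    jobs = [(s, r) for s in segments for r in renditions]
    with ThreadPoolExecutor(max_workers=32) as pool:
        return list(pool.map(lambda job: encode_segment(*job), jobs))

# A 2-hour video at 5 renditions = 720 segments * 5 = 3600 independent
# jobs, each ~20 seconds of work at 2x real-time -- minutes of wall-clock
# time given enough workers, instead of hours sequentially.
outputs = transcode(2 * 3600, ["1080p", "720p", "480p", "360p", "240p"])
print(len(outputs))  # 3600
```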

Video Storage Architecture

Video storage is the largest cost center. A well-designed storage strategy uses tiered storage and intelligent lifecycle policies.

| Storage Tier | Content | Access Pattern | Cost (relative) |
|---|---|---|---|
| Hot (S3 Standard) | Videos uploaded in the last 30 days, popular videos | Frequent reads from CDN origin | $$$ |
| Warm (S3 IA) | Videos 30-180 days old, moderate views | Occasional CDN origin pulls | $$ |
| Cold (S3 Glacier) | Videos older than 180 days, rarely viewed | Rare access, minutes to retrieve | $ |
| Archive (Glacier Deep) | Raw uploads (kept for re-transcoding) | Almost never accessed | $ (cheapest) |

Storage Layout

```
s3://video-platform-raw/
└── {video_id}/
    └── original.mp4          # Raw upload (archived after processing)

s3://video-platform-transcoded/
└── {video_id}/
    ├── master.m3u8           # HLS master playlist
    ├── 1080p/
    │   ├── playlist.m3u8     # 1080p variant playlist
    │   ├── segment_000.ts    # 2-second segments
    │   ├── segment_001.ts
    │   └── ...
    ├── 720p/
    │   ├── playlist.m3u8
    │   └── ...
    ├── 480p/
    │   └── ...
    ├── 360p/
    │   └── ...
    ├── thumbnails/
    │   ├── poster.jpg        # Main thumbnail
    │   ├── sprite.jpg        # Thumbnail sprite for scrubbing
    │   └── preview.webm      # 5-second hover preview
    └── subtitles/
        ├── en.vtt
        └── es.vtt
```
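
A lifecycle policy following the tier table can be sketched as a simple classifier. The popularity override threshold (10,000 recent views) is an assumption for illustration; the text only says popular videos stay hot.

```python
def storage_tier(age_days: int, views_last_30d: int) -> str:
    """Pick a storage tier per the lifecycle table above.
    Note: the 10,000-view popularity threshold is illustrative."""
    if age_days <= 30 or views_last_30d >= 10_000:
        return "hot"        # S3 Standard: frequent CDN origin reads
    if age_days <= 180:
        return "warm"       # S3 Infrequent Access
    return "cold"           # S3 Glacier: rare access tolerated

print(storage_tier(5, 12))           # hot  (recent upload)
print(storage_tier(90, 250))         # warm
print(storage_tier(400, 50_000))     # hot  (old but still popular)
print(storage_tier(400, 3))          # cold
```

A nightly batch job would evaluate each video against this policy and issue S3 storage-class transitions for any that changed tier.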

Metadata Database

Schema

```sql
CREATE TABLE videos (
    id            UUID PRIMARY KEY,
    creator_id    BIGINT NOT NULL REFERENCES users(id),
    title         VARCHAR(200) NOT NULL,
    description   TEXT,
    duration_sec  INT,
    status        ENUM('uploading','processing','ready','failed','removed') DEFAULT 'uploading',
    visibility    ENUM('public','unlisted','private') DEFAULT 'public',
    storage_path  VARCHAR(500),   -- S3 prefix for transcoded files
    original_path VARCHAR(500),   -- S3 path for raw upload
    view_count    BIGINT DEFAULT 0,
    like_count    BIGINT DEFAULT 0,
    created_at    TIMESTAMP DEFAULT NOW(),
    published_at  TIMESTAMP
);

CREATE INDEX idx_creator  ON videos (creator_id, created_at DESC);
CREATE INDEX idx_status   ON videos (status) WHERE status = 'processing';
CREATE INDEX idx_trending ON videos (view_count DESC, published_at DESC);

CREATE TABLE video_renditions (
    video_id      UUID NOT NULL REFERENCES videos(id),
    resolution    VARCHAR(10) NOT NULL,   -- '1080p', '720p', '480p'
    bitrate_kbps  INT NOT NULL,
    codec         VARCHAR(20) NOT NULL,   -- 'h264', 'h265', 'vp9', 'av1'
    format        ENUM('hls','dash') NOT NULL,
    segment_count INT,
    total_size_mb INT,
    playlist_path VARCHAR(500),
    PRIMARY KEY (video_id, resolution, format)
);

CREATE TABLE watch_history (
    user_id        BIGINT NOT NULL,
    video_id       UUID NOT NULL,
    watched_at     TIMESTAMP DEFAULT NOW(),
    watch_duration INT,   -- seconds watched
    last_position  INT,   -- resume position in seconds
    PRIMARY KEY (user_id, video_id)
);
```

Streaming: Adaptive Bitrate (ABR)

Adaptive bitrate streaming is the key to smooth playback across varying network conditions. The player dynamically switches between quality levels based on available bandwidth.

1. Request manifest: The player fetches the master playlist (master.m3u8) from the CDN. This file lists all available quality levels with their bitrates.
2. Bandwidth estimation: The player downloads the first segment and measures download speed. Based on this, it selects the highest quality level that can be sustained.
3. Segment-by-segment switching: For each subsequent segment, the player re-evaluates bandwidth. If the network degrades, it switches down to a lower bitrate mid-stream; if bandwidth improves, it switches up.
4. Buffer management: The player maintains a buffer of 10-30 seconds of video. If the buffer drops below a threshold, it aggressively switches to a lower quality to prevent stalling.

HLS Master Playlist

```
#EXTM3U
#EXT-X-VERSION:6
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=854x480,CODECS="avc1.4d401e,mp4a.40.2"
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.42e01e,mp4a.40.2"
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240,CODECS="avc1.42e00a,mp4a.40.2"
240p/playlist.m3u8
```
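
The per-segment quality decision (steps 2-4) reduces to picking the highest rung of the bitrate ladder that fits within measured throughput. The 0.8 safety margin below is a common heuristic, not part of the HLS specification; real players (and their exact selection algorithms) are more sophisticated.

```python
# Bitrate ladder from the master playlist above: (bitrate bps, rendition)
LADDER = [
    (6_000_000, "1080p"),
    (3_000_000, "720p"),
    (1_500_000, "480p"),
    (800_000, "360p"),
    (400_000, "240p"),
]

def pick_rendition(measured_bps: float, safety: float = 0.8) -> str:
    """Choose the highest bitrate sustainable at the measured
    throughput, leaving 20% headroom so small fluctuations
    don't drain the buffer."""
    budget = measured_bps * safety
    for bitrate, name in LADDER:       # sorted highest-first
        if bitrate <= budget:
            return name
    return LADDER[-1][1]               # floor: always serve something

print(pick_rendition(10_000_000))  # 1080p
print(pick_rendition(4_500_000))   # 720p (3.0 Mbps fits the 3.6 Mbps budget)
print(pick_rendition(900_000))     # 240p (720 kbps budget is below 360p's 800 kbps)
```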
CDN Edge Caching Strategy

Not all video segments are equally popular. The first few segments of a video are requested most often (many users click a video and leave within seconds). CDNs should prioritize caching early segments. For long-tail content that is rarely watched, the CDN will issue an origin pull on the first request, then cache locally for subsequent viewers. Popular videos are pre-warmed to edge locations before they trend.

Content Recommendation

The recommendation engine drives engagement by surfacing relevant videos. At scale, recommendations account for over 70% of all video views on platforms like YouTube.

Collaborative Filtering

  • "Users who watched X also watched Y."
  • Build a user-item interaction matrix from watch history.
  • Use matrix factorization (ALS) or neural collaborative filtering to find latent factors.
  • Good for discovering content outside a user's usual interests.
  • Cold start problem: cannot recommend for new users or new videos with no watch data.

Content-Based Filtering

  • "This video is similar to others you have watched."
  • Extract features from video metadata: title, tags, description, category, creator.
  • Compute similarity scores using TF-IDF or embedding vectors.
  • Works well for new users (uses explicit preferences) and new content.
  • Tends to create "filter bubbles": recommending only similar content.

In practice, production systems use a two-stage approach:

  1. Candidate generation: A lightweight model retrieves hundreds of candidate videos from a pool of millions (using approximate nearest neighbors on embedding vectors).
  2. Ranking: A heavier model (deep neural network) scores each candidate based on features like watch history, time of day, device, video freshness, creator affinity, and predicted watch time. The top results are served.
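
The two-stage shape can be illustrated with a toy catalog. This is a deliberately simplified sketch: the four-video catalog, 3-dimensional embeddings, and the linear blend of hypothetical freshness/affinity scores stand in for ANN retrieval over millions of vectors and a deep ranking network.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy catalog: video_id -> embedding (production: ANN index over millions)
CATALOG = {
    "vid_a": [0.9, 0.1, 0.0],
    "vid_b": [0.8, 0.2, 0.1],
    "vid_c": [0.0, 0.1, 0.9],
    "vid_d": [0.1, 0.0, 0.8],
}

def generate_candidates(user_vec, k=3):
    """Stage 1: cheap retrieval -- nearest videos by embedding similarity."""
    ranked = sorted(CATALOG, key=lambda v: cosine(user_vec, CATALOG[v]), reverse=True)
    return ranked[:k]

def rank(candidates, freshness, affinity):
    """Stage 2: heavier scoring over the small candidate set.
    The 0.6/0.4 weights are illustrative, not learned."""
    score = lambda v: 0.6 * freshness.get(v, 0) + 0.4 * affinity.get(v, 0)
    return sorted(candidates, key=score, reverse=True)

user = [0.85, 0.15, 0.05]          # hypothetical user taste vector
cands = generate_candidates(user)   # narrows millions to a handful
final = rank(cands,
             freshness={"vid_a": 0.2, "vid_b": 0.9},
             affinity={"vid_a": 0.5, "vid_b": 0.4})
print(final[0])  # vid_b
```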

Cost Optimization

Storage and bandwidth dominate costs at scale. Key strategies to control expenses:

| Strategy | Savings | Trade-off |
|---|---|---|
| Storage tiering (hot/warm/cold) | 60-80% on storage | Cold videos have seconds of latency on first access |
| Codec optimization (H.265/AV1) | 30-50% bitrate reduction at same quality | Higher transcoding cost; older devices may not support |
| Lazy transcoding | Save compute on never-watched videos | First viewer of a rare video experiences delay |
| CDN caching | 90%+ reduction in origin bandwidth | Cache invalidation complexity |
| Delete low-value renditions | 20-40% storage reduction | If requested, must re-transcode from raw |
| Per-title encoding | 20-30% bitrate reduction | Requires per-video encoding analysis (extra compute) |

Lazy Transcoding

Instead of transcoding every uploaded video into all resolutions immediately, transcode only the most common resolutions (720p, 480p) upfront. Higher resolutions (1080p, 4K) are transcoded on-demand when a viewer requests them, then cached. This dramatically reduces compute costs since many videos are never watched in high resolution, and some are never watched at all.
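
The serving-side decision can be sketched as follows. The eager set, the in-memory state store, and the return values are all illustrative; a real system would enqueue a transcode job and serve the best available rendition while the viewer waits.

```python
# Renditions produced eagerly at upload time vs. on first demand.
EAGER = {"720p", "480p"}
ready = {"vid_xyz789": set(EAGER)}   # per-video rendition state (hypothetical store)

def request_rendition(video_id: str, resolution: str) -> str:
    """Serve immediately if the rendition exists; otherwise kick off
    an on-demand transcode (only the first viewer pays the delay)."""
    have = ready.setdefault(video_id, set(EAGER))
    if resolution in have:
        return "serve"
    have.add(resolution)             # stand-in for: enqueue job, cache result
    return "transcode-on-demand"

print(request_rendition("vid_xyz789", "480p"))   # serve
print(request_rendition("vid_xyz789", "1080p"))  # transcode-on-demand
print(request_rendition("vid_xyz789", "1080p"))  # serve (now cached)
```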

Step 5: Scaling & Optimizations

  • Upload scaling: Use pre-signed URLs to upload directly to S3, bypassing your servers entirely. The upload service only handles metadata and orchestration, not data transfer.
  • Transcoding scaling: Use spot/preemptible GPU instances for transcoding (70-90% cost savings). Jobs are idempotent and restartable, so preemption is safe. Auto-scale worker pools based on queue depth.
  • CDN multi-layer caching: Use a two-tier CDN: edge PoPs (200+ locations) for hot content, and regional mid-tier caches to reduce origin load for warm content. Cache hit ratios should exceed 95%.
  • Database scaling: Separate the read-heavy metadata queries (video info, search) from write-heavy analytics (view counts, watch history). Use read replicas for metadata. Use Redis for real-time view count aggregation, flushing to the database periodically.
  • Search: Index video metadata in Elasticsearch for full-text search. Use separate indices for titles, tags, and descriptions with boosted relevance scoring. Auto-complete and typo correction via n-gram tokenizers.
  • Live streaming extension: For live content, replace the transcode pipeline with real-time encoders (OBS -> RTMP ingest -> live transcoder -> HLS/DASH segments pushed to CDN in near-real-time). Latency target: 3-10 seconds.
  • View count accuracy: At billions of views per day, real-time counting is expensive. Use a write-back cache: increment in Redis, flush to PostgreSQL every 30 seconds. Accept slight inconsistency in displayed counts.
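
The write-back counter can be sketched with two in-memory maps standing in for Redis and PostgreSQL; in production `record_view` would be a Redis `INCR` and `flush` a batched `UPDATE` on a 30-second timer.

```python
from collections import Counter

class ViewCounter:
    """Write-back view counting: increment in memory on the hot path,
    flush accumulated deltas to durable storage periodically.
    Displayed counts may lag by up to one flush interval."""

    def __init__(self):
        self.pending = Counter()   # un-flushed increments (Redis role)
        self.db = Counter()        # durable totals (PostgreSQL role)

    def record_view(self, video_id: str):
        self.pending[video_id] += 1           # O(1), no DB write per view

    def flush(self):
        """Runs on a timer (~every 30 seconds in the real system)."""
        for video_id, delta in self.pending.items():
            self.db[video_id] += delta        # one batched update per video
        self.pending.clear()

counter = ViewCounter()
for _ in range(1000):
    counter.record_view("vid_xyz789")
counter.flush()
print(counter.db["vid_xyz789"])  # 1000
```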

Architecture Summary

| Component | Technology | Purpose |
|---|---|---|
| Upload Service | API + S3 multipart | Chunked, resumable video upload |
| Raw Storage | S3 (Glacier archive) | Durable storage of original files |
| Transcode Queue | Kafka / SQS | Decouple upload from processing |
| Transcoding Pipeline | FFmpeg on GPU workers | DAG: split, encode, package HLS/DASH |
| Video Storage | S3 (tiered) | Transcoded segments, thumbnails, subtitles |
| Metadata DB | PostgreSQL + read replicas | Video info, renditions, watch history |
| CDN | CloudFront / Akamai | Edge caching, global low-latency delivery |
| Streaming API | REST API | Auth, manifest URLs, playback tokens |
| Recommendation | ML pipeline (ALS + DNN) | Personalized video suggestions |
| Search | Elasticsearch | Full-text video search with autocomplete |

Key Takeaways

  • Video streaming is dominated by storage and bandwidth costs. Every architectural decision (codec choice, storage tiering, CDN caching, lazy transcoding) should be evaluated through a cost lens.
  • The transcoding pipeline as a DAG enables massive parallelism. Splitting video into segments and encoding each independently reduces processing time from hours to minutes.
  • Adaptive bitrate streaming (HLS/DASH) is essential for smooth playback. The player dynamically adjusts quality based on network conditions, preventing buffering while maximizing visual quality.
  • CDN is not optional: it is a core architectural component. At scale, 95%+ of all video bytes should be served from edge caches, not the origin. Pre-warm popular content and use tiered caching for the long tail.
  • Separate the upload path from the playback path entirely. They have different availability requirements, scaling characteristics, and failure modes. Playback must be 99.99% available; upload can tolerate occasional delays.
