HardEnterprise

Design Zoom

WebSockets · CDN · Geo Distribution · Databases · Monitoring · Auth

Problem Statement

Design the architecture for Zoom - the world's leading video conferencing platform handling 300 million daily meeting participants with meetings of up to 1,000 video participants. Your design must cover:

- Real-time video & audio - ultra-low latency video conferencing with < 150 ms end-to-end latency. Support multiple video layouts (speaker view, gallery view with up to 49 tiles). Audio mixing for hundreds of participants.
- Selective Forwarding Unit (SFU) - rather than peer-to-peer (which doesn't scale beyond ~5 participants), use an SFU architecture where each participant sends their stream to the server, which selectively forwards relevant streams to each receiver.
- Simulcast & SVC - each participant sends multiple quality layers (high, medium, low). The SFU selects the appropriate layer per receiver based on their bandwidth and viewport size.
- Screen sharing - capture and stream a desktop/window/tab at up to 30 fps with high text clarity.
- Recording - server-side recording that composites all video/audio streams into a single MP4 file. Cloud recording stored in object storage.
- Chat & reactions - in-meeting text chat, emoji reactions, hand raising, polls, and Q&A.
- Breakout rooms - split a meeting into sub-meetings and merge back. Participants and media streams are rerouted dynamically.
- Waiting room & security - meeting passwords, waiting room admission, end-to-end encryption option (E2EE), and host controls (mute all, remove participant).
- Global infrastructure - meetings should be routed to the nearest data center. Cross-continent meetings need media relay servers to minimize latency.

The core challenge is real-time media routing at scale with ultra-low latency across the globe.

What You'll Learn

Design Zoom's video conferencing platform - real-time video/audio, screen sharing, recording, and breakout rooms for 300M+ daily participants. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.

Constraints

Daily meeting participants: 300,000,000
Peak concurrent meetings: ~10,000,000
Max participants/meeting: 1,000
Video latency (end-to-end): < 150 ms
Audio latency: < 100 ms
Gallery view tiles: up to 49
Data centers: 15+ globally
Availability target: 99.99%

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Design Zoom's video conferencing platform - real-time video/audio, screen sharing, recording, and breakout rooms for 300M+ daily participants.
  • Design the signaling/API tier for a peak load target around 80,000 RPS (including burst headroom); media-plane bandwidth is budgeted separately.
  • Daily meeting participants: 300,000,000
  • Peak concurrent meetings: ~10,000,000
  • Max participants/meeting: 1,000
  • Video latency (end-to-end): < 150 ms
  • Audio latency: < 100 ms

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.
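As a worked example of this method, a back-of-envelope script using the stated constraints. The peak-hour share, per-stream bitrate, and average meeting size are assumptions introduced here for illustration:

```python
# Back-of-envelope capacity math from the stated constraints.
DAILY_PARTICIPANTS = 300_000_000
PEAK_CONCURRENT_MEETINGS = 10_000_000
AVG_PARTICIPANTS_PER_MEETING = 3         # assumption: most meetings are small
SAFETY_MARGIN = 3                        # the 2-3x headroom per tier, upper end

# Signaling: joins are bursty; assume the peak hour carries 10% of the day.
peak_joins_per_sec = DAILY_PARTICIPANTS * 0.10 / 3600
signaling_rps_budget = peak_joins_per_sec * SAFETY_MARGIN

# Media: assume each participant uplinks ~1.5 Mbps across simulcast layers.
UPLINK_MBPS = 1.5
peak_participants = PEAK_CONCURRENT_MEETINGS * AVG_PARTICIPANTS_PER_MEETING
ingress_tbps = peak_participants * UPLINK_MBPS / 1_000_000

print(f"signaling budget ~ {signaling_rps_budget:,.0f} RPS")
print(f"media ingress ~ {ingress_tbps:.0f} Tbps across all regions")
```

The point of the exercise is the shape of the numbers: signaling stays in the tens of thousands of RPS, while media ingress lands in the tens of terabits per second, so the two planes must be scaled and costed independently.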

3) Architecture Decisions

  • WebSockets: Use persistent connection gateways and decouple fanout via pub/sub or queues.
  • CDN: Serve static and cacheable content from edge and keep origin strictly for misses and dynamic requests.
  • Geo Distribution: Route users to nearest region/edge while keeping write-consistency boundaries explicit.
  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • Monitoring: Instrument golden signals (latency, traffic, errors, saturation) per tier and per tenant/domain.
  • Auth: Centralize identity verification and keep authorization checks close to domain resources.
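The WebSockets decision above (persistent gateways, fanout decoupled via pub/sub) can be illustrated with a toy in-process bus. A real deployment would put a broker such as Redis pub/sub or Kafka between publisher and gateways; the class and method names here are hypothetical:

```python
from collections import defaultdict

class MeetingEventBus:
    """Toy stand-in for the pub/sub layer that decouples gateways.

    In production each WebSocket gateway subscribes to a broker topic
    per meeting; here subscribers are plain callbacks so the fanout
    pattern stays visible.
    """
    def __init__(self):
        self._subs = defaultdict(list)   # meeting_id -> [callback]

    def subscribe(self, meeting_id, on_event):
        self._subs[meeting_id].append(on_event)

    def publish(self, meeting_id, event):
        # The publisher never touches sockets directly; each gateway
        # holding connections for this meeting pushes to its own clients.
        for cb in self._subs[meeting_id]:
            cb(event)
```

The design choice this encodes: any service can emit a meeting event without knowing which gateway holds which connection, so gateways scale horizontally and can be drained independently.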

4) Reliability and Failure Strategy

  • Track connection churn, backpressure, and session resumption behavior.
  • Define cache keys and purge workflows before launch to avoid stale/global outages.
  • Design region failover and data residency controls as first-class requirements.
  • Use strong write constraints (transactions or conditional writes) and explicit backup/restore strategy.
  • Alert on user-impact SLOs, not only infrastructure metrics.
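A minimal sketch of the conditional-write (optimistic concurrency) pattern named above, using an in-memory dict as a stand-in for the database. Real stores expose the same primitive as, e.g., a DynamoDB condition expression or a SQL `UPDATE ... WHERE version = ?`:

```python
def conditional_update(store: dict, key: str, expected_version: int, new_value):
    """Apply the write only if the stored version matches.

    Returns False when another writer won the race; the caller is
    expected to re-read and retry rather than overwrite blindly.
    """
    current = store.get(key)
    if current is None or current["version"] != expected_version:
        return False                     # lost the race -> no lost update
    store[key] = {"value": new_value, "version": expected_version + 1}
    return True
```

For meeting state (host controls, lock/unlock, participant lists) this prevents two concurrent updates from silently overwriting each other.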

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.
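The idempotency check above can be validated against a consumer shaped like this sketch, which dedupes on a message id. A production version would persist seen ids in a TTL'd store (e.g. a Redis set) rather than process memory; all names here are illustrative:

```python
class IdempotentConsumer:
    """Async consumer that tolerates redelivery by deduplicating on message id."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()               # in production: durable, TTL'd store

    def process(self, message):
        msg_id = message["id"]
        if msg_id in self._seen:
            return False                 # duplicate delivery -> no side effect
        self._handler(message)
        self._seen.add(msg_id)           # mark only after the handler succeeds
        return True
```

The validation itself is then simple: redeliver the same message and assert the side effect happened exactly once.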

6) Trade-offs to Call Out in Interviews

  • WebSockets: WebSockets reduce interaction latency but complicate scaling and state management.
  • CDN: Long TTL improves latency/cost; short TTL improves freshness.
  • Geo Distribution: Global latency improves, but cross-region consistency and operations become harder.
  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • Monitoring: Deep observability speeds incident response but raises ingestion and tooling costs.

Practical Notes

  • SFU (Selective Forwarding Unit) per meeting: receives N streams and forwards up to N-1 streams to each participant. Only the active speaker and pinned videos are forwarded at high quality.
  • Simulcast: each sender encodes 3 quality layers. The SFU picks the right layer per receiver - low quality for thumbnail tiles, high for the active speaker.
  • Route meeting to the data center closest to the majority of participants. For geographically distributed meetings, use media relay bridges between data centers.
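The routing rule in the last note can be sketched as a majority vote over participant regions, with every other data center attached as a relay bridge. The region-to-DC mapping and all names are hypothetical:

```python
from collections import Counter

def pick_home_dc(participant_regions, dc_by_region):
    """Pick the meeting's home data center by participant majority.

    Participants outside the majority region are bridged to the home
    DC via media relays in their own nearest data centers.
    """
    majority_region, _ = Counter(participant_regions).most_common(1)[0]
    home = dc_by_region[majority_region]
    relays = sorted({dc_by_region[r] for r in participant_regions} - {home})
    return home, relays
```

So a meeting with two US, one EU, and one APAC participant homes in the US data center and bridges the other two regions over relays, instead of hairpinning everyone through one distant SFU.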

Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Mobile Clients -> DNS -> CDN Edge -> Load Balancer -> API Gateway -> Core Service -> Auth Service -> Primary NoSQL DB

Design strengths

  • Monitoring and logs are wired in from day one for rapid incident triage.
  • Security controls are enforced at ingress to protect downstream capacity.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.