Capacity Estimation & Back-of-Envelope Math

In system design interviews, you are frequently asked to estimate the scale of the system before diving into architecture. Back-of-the-envelope calculations help you determine how many servers, how much storage, and how much bandwidth your design needs. This chapter teaches you a systematic framework for producing reasonable estimates quickly and confidently.

Why Interviewers Ask for Estimates

Capacity estimation is not about getting the exact number. Interviewers want to see three things:

  • Structured thinking: Can you break a complex problem into smaller, manageable pieces?
  • Reasonable assumptions: Do you know the order of magnitude of common system metrics?
  • Design implications: Can you translate numbers into architectural decisions? For example, if your write QPS is 50,000, a single SQL database won't suffice, and you need sharding or a NoSQL store.

Estimates drive design choices. A system serving 100 users per day looks completely different from one serving 100 million. By doing the math early, you avoid over-engineering small systems and under-engineering large ones.

Numbers Every Engineer Should Know

These reference numbers, popularized by Jeff Dean, form the foundation of all back-of-the-envelope calculations. You do not need to memorize every number, but you should know their order of magnitude.

Operation                            Latency    Notes
L1 cache reference                   0.5 ns     Fastest memory access
L2 cache reference                   7 ns       ~14x L1
Main memory (RAM) reference          100 ns     ~200x L1
SSD random read                      150 μs     ~1,500x RAM
HDD random read (seek)               10 ms      ~100,000x RAM
Send 1 KB over 1 Gbps network        10 μs      Within same data center
Round trip within same data center   0.5 ms     Network + processing
Round trip CA to Netherlands         150 ms     Cross-continent
Read 1 MB sequentially from RAM      250 μs
Read 1 MB sequentially from SSD      1 ms       4x RAM
Read 1 MB sequentially from HDD      20 ms      80x RAM
Disk seek                            10 ms      Mechanical delay
Compress 1 KB with Snappy            3 μs       Fast compression
Mutex lock/unlock                    25 ns

Storage and Scale Reference

Unit                 Approximate Value             Useful For
1 KB                 1,000 bytes                   A small JSON payload, a tweet
1 MB                 1,000 KB                      A high-res photo, a short MP3
1 GB                 1,000 MB                      A movie, ~300 photos
1 TB                 1,000 GB                      A large database table
1 PB                 1,000 TB                      Entire company data warehouse
Seconds in a day     ~86,400 (~10^5)               QPS calculations
Seconds in a month   ~2.5 million (~2.5 × 10^6)    Monthly volume
Seconds in a year    ~31.5 million (~3 × 10^7)     Annual volume

The Estimation Framework

Every capacity estimation follows the same four-step workflow. Practice this order so it becomes second nature in interviews.

Step 1: Estimate Traffic (QPS)

Start with Daily Active Users (DAU). Multiply by actions per user per day. Divide by 86,400 seconds to get average queries per second (QPS). Multiply by a peak factor (typically 2x to 5x) for peak QPS.

Step 2: Estimate Storage

Calculate the size of a single record or object. Multiply by the number of new records per day. Multiply by the retention period (e.g., 5 years). Add overhead for indexes, replication, and backups.

Step 3: Estimate Bandwidth

Multiply average request/response size by QPS to get bytes per second. Calculate separately for incoming (write) and outgoing (read) traffic, as reads usually dominate.

Step 4: Estimate Memory (Cache)

Apply the 80/20 rule: 20% of data generates 80% of reads. Cache the top 20% of daily read requests. Memory needed = 20% × daily read volume × average object size.

Traffic Estimation: DAU to QPS

The fundamental formula is:

Average QPS = (DAU × actions_per_user_per_day) / seconds_in_a_day

Peak QPS   = Average QPS × peak_factor

Example:
  DAU = 100 million
  Actions per user per day = 10 (reads) + 2 (writes) = 12
  Average QPS = 100M × 12 / 86,400 ≈ 13,900 QPS
  Peak QPS (3x) ≈ 42,000 QPS
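The arithmetic above can be sketched in a few lines of Python, using the same example numbers:

```python
# Traffic estimate: 100M DAU, 12 actions/user/day, 3x peak factor.
DAU = 100_000_000
actions_per_user_per_day = 12      # 10 reads + 2 writes
SECONDS_PER_DAY = 86_400
peak_factor = 3

avg_qps = DAU * actions_per_user_per_day / SECONDS_PER_DAY
peak_qps = avg_qps * peak_factor

print(f"Average QPS: ~{avg_qps:,.0f}")   # ~13,889
print(f"Peak QPS:    ~{peak_qps:,.0f}")  # ~41,667
```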

Always separate reads from writes, because they have different architectural implications. A system that is 90% reads and 10% writes benefits from read replicas and heavy caching. A system that is write-heavy needs sharding, write-ahead logs, and possibly eventual consistency.

Storage Estimation

Estimate per-record size, then scale up:

Record size = sum of all field sizes
  user_id:     8 bytes (bigint)
  content:   250 bytes (avg text)
  timestamp:   8 bytes
  metadata:   50 bytes
  Total:     ~316 bytes → round up to 500 bytes (with overhead)

Daily new records = DAU × writes_per_user = 100M × 2 = 200M
Daily storage = 200M × 500 bytes = 100 GB/day
Annual storage = 100 GB × 365 = 36.5 TB/year
5-year storage = 36.5 TB × 5 = 182.5 TB

With 3x replication: 182.5 TB × 3 ≈ 550 TB
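As a quick sanity check, the same storage math in Python:

```python
# Storage estimate: 200M new records/day at ~500 bytes, 5-year retention,
# 3x replication (numbers from the example above).
records_per_day = 200_000_000
record_bytes = 500
years = 5
replication = 3

daily_gb = records_per_day * record_bytes / 1e9          # 100 GB/day
five_year_tb = daily_gb * 365 * years / 1e3              # 182.5 TB
replicated_tb = five_year_tb * replication               # 547.5 TB -> ~550 TB
```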

Bandwidth Estimation

Incoming bandwidth (writes):
  Write QPS = 2,300
  Average write payload = 500 bytes
  Bandwidth = 2,300 × 500 = 1.15 MB/s

Outgoing bandwidth (reads):
  Read QPS = 11,600
  Average response size = 2 KB (record + metadata)
  Bandwidth = 11,600 × 2 KB = 23.2 MB/s
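The same calculation in Python, separating write and read paths:

```python
# Bandwidth estimate using the QPS figures derived earlier.
write_qps = 2_300
write_payload_bytes = 500
read_qps = 11_600
read_response_bytes = 2_000   # 2 KB

write_mb_per_s = write_qps * write_payload_bytes / 1e6   # 1.15 MB/s
read_mb_per_s = read_qps * read_response_bytes / 1e6     # 23.2 MB/s
```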

Memory Estimation (Caching)

The 80/20 rule (Pareto principle) states that roughly 80% of requests access 20% of the data. Caching this hot subset dramatically reduces database load.

Daily read requests = Read QPS × 86,400 = 11,600 × 86,400 ≈ 1 billion
Average response size = 2 KB
Total daily read data = 1B × 2 KB = 2 TB

Cache 20% of daily reads:
  Cache memory = 0.2 × 2 TB = 400 GB

This fits in a cluster of ~10 machines with 64 GB RAM each,
or a few Redis instances with high-memory configurations.
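The cache sizing above, written out in Python:

```python
# Cache estimate via the 80/20 rule: cache the hot 20% of daily read data.
read_qps = 11_600
SECONDS_PER_DAY = 86_400
response_bytes = 2_000    # 2 KB
hot_fraction = 0.2

daily_reads = read_qps * SECONDS_PER_DAY                  # ~1 billion
daily_read_tb = daily_reads * response_bytes / 1e12       # ~2 TB
cache_gb = hot_fraction * daily_read_tb * 1_000           # ~400 GB
```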

Worked Example 1: URL Shortener

Assumptions

  • 100 million new URLs shortened per month
  • Read-to-write ratio: 100:1 (URLs are created once, read many times)
  • Each shortened URL record: ~500 bytes (short code, original URL, creation timestamp, user ID, expiry)
  • Retention: 5 years

=== Traffic ===
Write QPS = 100M / (30 × 86,400) ≈ 100M / 2.6M ≈ 40 QPS
Read QPS  = 40 × 100 = 4,000 QPS
Peak read QPS (3x) = 12,000 QPS

=== Storage ===
Records over 5 years = 100M × 12 × 5 = 6 billion
Storage = 6B × 500 bytes = 3 TB
With replication (3x) = 9 TB

=== Bandwidth ===
Write: 40 × 500 bytes = 20 KB/s (negligible)
Read:  4,000 × 500 bytes = 2 MB/s

=== Cache ===
Daily reads = 4,000 × 86,400 ≈ 345 million
Cache 20%: 69M × 500 bytes ≈ 35 GB
Fits in a single Redis instance.
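The whole URL-shortener estimate can be packaged as one reusable function. This is a sketch: the function name and parameter defaults are illustrative, and the unrounded outputs differ slightly from the aggressively rounded figures above (e.g. ~39 write QPS rather than 40).

```python
def estimate_url_shortener(new_urls_per_month=100_000_000,
                           read_write_ratio=100,
                           record_bytes=500,
                           years=5,
                           replication=3,
                           peak_factor=3):
    """Reproduce the URL-shortener back-of-envelope estimate."""
    seconds_per_month = 30 * 86_400                      # ~2.6M
    write_qps = new_urls_per_month / seconds_per_month
    read_qps = write_qps * read_write_ratio
    total_records = new_urls_per_month * 12 * years      # months -> retention
    storage_tb = total_records * record_bytes / 1e12
    return {
        "write_qps": round(write_qps),                   # ~39
        "read_qps": round(read_qps),                     # ~3,858
        "peak_read_qps": round(read_qps * peak_factor),  # ~11,574
        "storage_tb": storage_tb,                        # 3.0
        "replicated_tb": storage_tb * replication,       # 9.0
    }
```

Parameterizing the estimate like this makes it easy to answer interview follow-ups such as "what if traffic grows 10x?" by changing one argument.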

Worked Example 2: Twitter-like Feed

Assumptions

  • 300 million Monthly Active Users (MAU), 150 million DAU
  • Each user posts 2 tweets/day on average
  • Each user reads their timeline 10 times/day, 20 tweets per load
  • Average tweet: 280 characters (~300 bytes) + metadata (~200 bytes) = 500 bytes
  • 10% of tweets include media (average image: 200 KB)

=== Traffic ===
Tweet writes = 150M × 2 / 86,400 ≈ 3,500 write QPS
Timeline reads = 150M × 10 / 86,400 ≈ 17,400 read QPS
Peak (3x): ~52,000 read QPS

=== Storage (text only, per year) ===
Daily tweets = 150M × 2 = 300M
Daily text storage = 300M × 500 bytes = 150 GB/day
Annual text = 150 GB × 365 ≈ 55 TB/year

=== Storage (media, per year) ===
Daily media tweets = 300M × 0.10 = 30M
Daily media storage = 30M × 200 KB = 6 TB/day
Annual media = 6 TB × 365 ≈ 2.2 PB/year
(Media goes to object storage like S3, not database)

=== Cache ===
Daily timeline reads = 17,400 QPS × 86,400 ≈ 1.5 billion
Each timeline fetch returns ~20 tweets × 500 bytes = 10 KB
Cache 20% of daily fetches: ~300M × 10 KB ≈ 3 TB
Distributed across ~50 cache machines.
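The media storage numbers, which dominate this example, check out in Python:

```python
# Media storage estimate: 10% of 300M daily tweets carry a ~200 KB image.
daily_tweets = 150_000_000 * 2
media_fraction = 0.10
media_bytes = 200_000

daily_media_tb = daily_tweets * media_fraction * media_bytes / 1e12  # 6 TB/day
annual_media_pb = daily_media_tb * 365 / 1e3                         # ~2.2 PB/yr
```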

Worked Example 3: YouTube-like Video Platform

Assumptions

  • 2 billion MAU, 800 million DAU
  • Each user watches 5 videos/day
  • 500,000 new videos uploaded per day
  • Average video: 300 MB (after transcoding to multiple resolutions)
  • Average video metadata: 5 KB

=== Traffic ===
Video views = 800M × 5 / 86,400 ≈ 46,000 read QPS
Video uploads = 500,000 / 86,400 ≈ 6 upload QPS

=== Storage (per day) ===
Video storage = 500,000 × 300 MB = 150 TB/day
Annual video storage = 150 TB × 365 ≈ 55 PB/year
(Stored in distributed object storage, CDN-served)

Metadata storage = 500,000 × 5 KB = 2.5 GB/day (trivial)

=== Bandwidth ===
Assume average video stream bitrate: 5 Mbps
Concurrent viewers (peak): ~10 million
Peak bandwidth = 10M × 5 Mbps = 50 Tbps
(This is why CDNs are essential for video platforms)
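The peak-bandwidth figure also illustrates the bits-versus-bytes distinction covered under Common Mistakes below; in Python:

```python
# Peak egress: 10M concurrent viewers at a 5 Mbps stream bitrate.
concurrent_viewers = 10_000_000
stream_mbps = 5                      # megabits per second, per viewer

peak_tbps = concurrent_viewers * stream_mbps / 1e6   # 50 Tbps (terabits/s)
peak_tb_per_s = peak_tbps / 8                        # 6.25 TB/s of actual data
```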

Common Mistakes

  • Confusing MB and Mb: 1 byte = 8 bits. A 100 Mbps network link carries 12.5 MB/s. Always clarify units.
  • Forgetting replication: Real systems replicate data 3x (or more) for durability. Your raw storage number should be multiplied accordingly.
  • Ignoring peak vs. average: Systems must handle peak load, not just average. Use a 2x-5x peak factor.
  • Over-precision: Saying "13,888.89 QPS" is false precision. Say "~14,000 QPS" or "~14K QPS." Round aggressively.
  • Forgetting indexes and overhead: Database indexes, filesystem metadata, and encoding overhead can add 20-50% to raw data size.
  • Not separating reads and writes: They have very different performance profiles and drive different architectural choices.
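The bit/byte conversion from the first bullet is worth internalizing; a tiny helper (the name is illustrative) makes the rule explicit:

```python
def mbps_to_megabytes_per_s(megabits_per_second):
    """Convert a link speed in megabits/s to a data rate in megabytes/s."""
    return megabits_per_second / 8   # 1 byte = 8 bits

mbps_to_megabytes_per_s(100)    # a 100 Mbps link carries 12.5 MB/s
mbps_to_megabytes_per_s(1000)   # a 1 Gbps link carries 125 MB/s
```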

Presentation Tips for Interviews

  • State your assumptions clearly before calculating. "Let's assume 100 million DAU" is better than jumping straight into math.
  • Round aggressively. Use powers of 10. 86,400 seconds/day becomes ~100,000 for quick math. 2.5 million seconds/month becomes ~2.5M.
  • Write on the whiteboard. Organize your numbers visually: Traffic | Storage | Bandwidth | Cache in four columns.
  • Sanity-check your results. If your URL shortener needs 50 PB of storage, something is wrong. Compare against known real-world systems.
  • Connect estimates to design decisions. "Since we need 12K read QPS and only 40 write QPS, the system is read-heavy. Let's add a cache layer and read replicas."

Quick Reference: Powers of Two

Power   Exact Value           Approximate       Name
2^10    1,024                 ~1 Thousand       1 KB
2^20    1,048,576             ~1 Million        1 MB
2^30    1,073,741,824         ~1 Billion        1 GB
2^40    1,099,511,627,776     ~1 Trillion       1 TB
2^50    ~1.13 × 10^15         ~1 Quadrillion    1 PB
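A quick check of how closely powers of two track powers of ten shows the approximation error growing from ~2% at the KB scale to ~13% at the PB scale, which is well within back-of-envelope tolerance:

```python
# Compare 2^(10k) against its power-of-ten approximation 10^(3k).
for power, name in [(10, "KB"), (20, "MB"), (30, "GB"), (40, "TB"), (50, "PB")]:
    exact = 2 ** power
    approx = 10 ** (3 * power // 10)
    error_pct = (exact - approx) / approx * 100
    print(f"2^{power} ({name}): off by {error_pct:.1f}%")
```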

Summary

Capacity estimation is a structured, repeatable skill. Follow the four-step framework (traffic, storage, bandwidth, cache), memorize the key reference numbers, round aggressively, and always connect your estimates back to design decisions. The goal is not perfection, it is demonstrating that you can reason about scale systematically.
