Why Interviewers Ask for Estimates
Capacity estimation is not about getting the exact number. Interviewers want to see three things:
- Structured thinking: Can you break a complex problem into smaller, manageable pieces?
- Reasonable assumptions: Do you know the order of magnitude of common system metrics?
- Design implications: Can you translate numbers into architectural decisions? For example, if your write QPS is 50,000, a single SQL database won't suffice; you need sharding or a NoSQL store.
Estimates drive design choices. A system serving 100 users per day looks completely different from one serving 100 million. By doing the math early, you avoid over-engineering small systems and under-engineering large ones.
Numbers Every Engineer Should Know
These reference numbers, originally popularized by Jeff Dean, form the foundation of all back-of-the-envelope calculations. You do not need to memorize every number, but you should know their order of magnitude.
| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | 0.5 ns | Fastest memory access |
| L2 cache reference | 7 ns | ~14x L1 |
| Main memory (RAM) reference | 100 ns | ~200x L1 |
| SSD random read | 150 μs | ~1,500x RAM |
| HDD random read (seek) | 10 ms | ~100,000x RAM |
| Send 1 KB over 1 Gbps network | 10 μs | Within same data center |
| Round trip within same data center | 0.5 ms | Network + processing |
| Round trip CA to Netherlands | 150 ms | Cross-continent |
| Read 1 MB sequentially from RAM | 250 μs | |
| Read 1 MB sequentially from SSD | 1 ms | 4x RAM |
| Read 1 MB sequentially from HDD | 20 ms | 80x RAM |
| Disk seek | 10 ms | Mechanical delay |
| Compress 1 KB with Snappy | 3 μs | Fast compression |
| Mutex lock/unlock | 25 ns | |
Storage and Scale Reference
| Unit | Approximate Value | Useful For |
|---|---|---|
| 1 KB | 1,000 bytes | A small JSON payload, a tweet |
| 1 MB | 1,000 KB | A high-res photo, a short MP3 |
| 1 GB | 1,000 MB | A movie, ~300 photos |
| 1 TB | 1,000 GB | A large database table |
| 1 PB | 1,000 TB | Entire company data warehouse |
| Seconds in a day | ~86,400 (~10^5) | QPS calculations |
| Seconds in a month | ~2.5 million (~2.5 × 10^6) | Monthly volume |
| Seconds in a year | ~31.5 million (~3 × 10^7) | Annual volume |
The Estimation Framework
Every capacity estimation follows the same four-step workflow. Practice this order so it becomes second nature in interviews.
Estimate Traffic (QPS)
Start with Daily Active Users (DAU). Multiply by actions per user per day. Divide by 86,400 seconds to get average queries per second (QPS). Multiply by a peak factor (typically 2x to 5x) for peak QPS.
Estimate Storage
Calculate the size of a single record or object. Multiply by the number of new records per day. Multiply by the retention period (e.g., 5 years). Add overhead for indexes, replication, and backups.
Estimate Bandwidth
Multiply average request/response size by QPS to get bytes per second. Calculate separately for incoming (write) and outgoing (read) traffic, as reads usually dominate.
Estimate Memory (Cache)
Apply the 80/20 rule: 20% of data generates 80% of reads. Cache the top 20% of daily read requests. Memory needed = 20% × daily read volume × average object size.
Traffic Estimation: DAU to QPS
The fundamental formula is:
Average QPS = (DAU × actions_per_user_per_day) / seconds_in_a_day
Peak QPS = Average QPS × peak_factor
Example:
DAU = 100 million
Actions per user per day = 10 (reads) + 2 (writes) = 12
Average QPS = 100M × 12 / 86,400 ≈ 13,900 QPS
Peak QPS (3x) ≈ 42,000 QPS
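The arithmetic above can be captured in a small Python helper for checking your numbers; the function name is illustrative and the inputs come from the example:

```python
SECONDS_PER_DAY = 86_400

def estimate_qps(dau: int, actions_per_user: float, peak_factor: float = 3.0):
    """Turn daily active users into average and peak queries per second."""
    avg_qps = dau * actions_per_user / SECONDS_PER_DAY
    return avg_qps, avg_qps * peak_factor

avg, peak = estimate_qps(dau=100_000_000, actions_per_user=12)
print(f"Average QPS ≈ {avg:,.0f}")     # 13,889 — round to ~14,000 in an interview
print(f"Peak QPS (3x) ≈ {peak:,.0f}")  # 41,667 — round to ~42,000
```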
Always separate reads from writes, because they have different architectural implications. A system that is 90% reads and 10% writes benefits from read replicas and heavy caching. A system that is write-heavy needs sharding, write-ahead logs, and possibly eventual consistency.
Storage Estimation
Estimate per-record size, then scale up:
Record size = sum of all field sizes
user_id: 8 bytes (bigint)
content: 250 bytes (avg text)
timestamp: 8 bytes
metadata: 50 bytes
Total: ~316 bytes → round up to 500 bytes (with overhead)
Daily new records = DAU × writes_per_user = 100M × 2 = 200M
Daily storage = 200M × 500 bytes = 100 GB/day
Annual storage = 100 GB × 365 = 36.5 TB/year
5-year storage = 36.5 TB × 5 = 182.5 TB
With 3x replication: 182.5 TB × 3 ≈ 550 TB
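The same storage walkthrough as a Python sketch (function and variable names are illustrative; figures match the example above):

```python
GB, TB = 10**9, 10**12

def estimate_storage(record_bytes: int, records_per_day: int,
                     retention_years: int, replication: int = 3):
    """Daily, raw-retained, and replicated storage for a write workload."""
    daily = record_bytes * records_per_day
    raw = daily * 365 * retention_years
    return daily, raw, raw * replication

daily, raw, replicated = estimate_storage(
    record_bytes=500, records_per_day=200_000_000, retention_years=5
)
print(daily / GB)       # 100.0  GB/day
print(raw / TB)         # 182.5  TB over 5 years
print(replicated / TB)  # 547.5  TB with 3x replication, ~550 TB rounded
```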
Bandwidth Estimation
Incoming bandwidth (writes):
Write QPS = 2,300
Average write payload = 500 bytes
Bandwidth = 2,300 × 500 = 1.15 MB/s
Outgoing bandwidth (reads):
Read QPS = 11,600
Average response size = 2 KB (record + metadata)
Bandwidth = 11,600 × 2 KB = 23.2 MB/s
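In code (an illustrative sketch; the QPS figures carry over from the traffic example):

```python
def bandwidth_mb_per_s(qps: float, payload_bytes: float) -> float:
    """Sustained bandwidth in MB/s for a given request rate and payload size."""
    return qps * payload_bytes / 10**6

write_bw = bandwidth_mb_per_s(qps=2_300, payload_bytes=500)    # 1.15 MB/s in
read_bw = bandwidth_mb_per_s(qps=11_600, payload_bytes=2_000)  # 23.2 MB/s out
```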
Memory Estimation (Caching)
The 80/20 rule (Pareto principle) states that roughly 80% of requests access 20% of the data. Caching this hot subset dramatically reduces database load.
Daily read requests = Read QPS × 86,400 = 11,600 × 86,400 ≈ 1 billion
Average response size = 2 KB
Total daily read data = 1B × 2 KB = 2 TB
Cache 20% of daily reads:
Cache memory = 0.2 × 2 TB = 400 GB
This fits in a cluster of ~10 machines with 64 GB RAM each,
or a few Redis instances with high-memory configurations.
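The 80/20 sizing as a sketch (names are illustrative; inputs are the read QPS and object size from above):

```python
SECONDS_PER_DAY = 86_400

def cache_size_gb(read_qps: float, object_bytes: float,
                  hot_fraction: float = 0.2) -> float:
    """Memory needed to hold the hot subset of one day's reads."""
    daily_read_bytes = read_qps * SECONDS_PER_DAY * object_bytes
    return hot_fraction * daily_read_bytes / 10**9

print(cache_size_gb(read_qps=11_600, object_bytes=2_000))  # ≈ 401 GB, i.e. ~400 GB
```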
Worked Example 1: URL Shortener
Assumptions
- 100 million new URLs shortened per month
- Read-to-write ratio: 100:1 (URLs are created once, read many times)
- Each shortened URL record: ~500 bytes (short code, original URL, creation timestamp, user ID, expiry)
- Retention: 5 years
=== Traffic ===
Write QPS = 100M / (30 × 86,400) ≈ 100M / 2.6M ≈ 40 QPS
Read QPS = 40 × 100 = 4,000 QPS
Peak read QPS (3x) = 12,000 QPS
=== Storage ===
Records over 5 years = 100M × 12 × 5 = 6 billion
Storage = 6B × 500 bytes = 3 TB
With replication (3x) = 9 TB
=== Bandwidth ===
Write: 40 × 500 bytes = 20 KB/s (negligible)
Read: 4,000 × 500 bytes = 2 MB/s
=== Cache ===
Daily reads = 4,000 × 86,400 ≈ 345 million
Cache 20%: 69M × 500 bytes ≈ 35 GB
Fits in a single Redis instance.
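The whole estimate fits in a few lines of Python, which is handy for checking whiteboard math afterwards (variable names are illustrative; the small differences from the rounded figures above come from not rounding intermediate values):

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY     # ≈ 2.6M seconds

urls_per_month = 100_000_000
record_bytes = 500

write_qps = urls_per_month / SECONDS_PER_MONTH   # ≈ 39, call it 40
read_qps = write_qps * 100                       # ≈ 3,900, call it 4,000
records_5y = urls_per_month * 12 * 5             # 6 billion records
storage_tb = records_5y * record_bytes / 10**12  # 3 TB raw
cache_gb = 0.2 * read_qps * SECONDS_PER_DAY * record_bytes / 10**9  # ≈ 33 GB
```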
Worked Example 2: Twitter-like Feed
Assumptions
- 300 million Monthly Active Users (MAU), 150 million DAU
- Each user posts 2 tweets/day on average
- Each user reads their timeline 10 times/day, 20 tweets per load
- Average tweet: 280 characters (~300 bytes) + metadata (~200 bytes) = 500 bytes
- 10% of tweets include media (average image: 200 KB)
=== Traffic ===
Tweet writes = 150M × 2 / 86,400 ≈ 3,500 write QPS
Timeline reads = 150M × 10 / 86,400 ≈ 17,400 read QPS
Peak (3x): ~52,000 read QPS
=== Storage (text only, per year) ===
Daily tweets = 150M × 2 = 300M
Daily text storage = 300M × 500 bytes = 150 GB/day
Annual text = 150 GB × 365 ≈ 55 TB/year
=== Storage (media, per year) ===
Daily media tweets = 300M × 0.10 = 30M
Daily media storage = 30M × 200 KB = 6 TB/day
Annual media = 6 TB × 365 ≈ 2.2 PB/year
(Media goes to object storage like S3, not database)
=== Cache ===
Daily timeline reads = 17,400 QPS × 86,400 ≈ 1.5 billion
Each timeline fetch returns ~20 tweets × 500 bytes = 10 KB
Cache 20% of unique data: ~300M × 10 KB ≈ 3 TB
Distributed across ~50 cache machines.
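Text and media storage diverge by orders of magnitude, which a short sketch makes obvious (inputs are the assumptions above; names are illustrative):

```python
dau = 150_000_000
daily_tweets = dau * 2            # 300M tweets/day
tweet_bytes = 500
media_fraction = 0.10
media_bytes = 200_000             # 200 KB average image

text_gb_per_day = daily_tweets * tweet_bytes / 10**9                     # 150 GB
media_tb_per_day = daily_tweets * media_fraction * media_bytes / 10**12  # 6 TB
annual_media_pb = media_tb_per_day * 365 / 1000                          # ≈ 2.2 PB
```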
Worked Example 3: YouTube-like Video Platform
Assumptions
- 2 billion MAU, 800 million DAU
- Each user watches 5 videos/day
- 500,000 new videos uploaded per day
- Average video: 300 MB (after transcoding to multiple resolutions)
- Average video metadata: 5 KB
=== Traffic ===
Video views = 800M × 5 / 86,400 ≈ 46,000 read QPS
Video uploads = 500,000 / 86,400 ≈ 6 upload QPS
=== Storage (per day) ===
Video storage = 500,000 × 300 MB = 150 TB/day
Annual video storage = 150 TB × 365 ≈ 55 PB/year
(Stored in distributed object storage, CDN-served)
Metadata storage = 500,000 × 5 KB = 2.5 GB/day (trivial)
=== Bandwidth ===
Assume average video stream bitrate: 5 Mbps
Concurrent viewers (peak): ~10 million
Peak bandwidth = 10M × 5 Mbps = 50 Tbps
(This is why CDNs are essential for video platforms)
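Note the bits-versus-bytes step hiding in this calculation (a sketch using the peak-concurrency assumption above):

```python
concurrent_viewers = 10_000_000
bitrate_mbps = 5                  # average stream bitrate, in megaBITS/s

peak_tbps = concurrent_viewers * bitrate_mbps / 10**6  # 50 Tbps (terabits!)
peak_tb_per_s = peak_tbps / 8                          # 6.25 TB/s of actual bytes
```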
Common Mistakes
- Confusing MB and Mb: 1 byte = 8 bits. A 100 Mbps network link carries 12.5 MB/s. Always clarify units.
- Forgetting replication: Real systems replicate data 3x (or more) for durability. Your raw storage number should be multiplied accordingly.
- Ignoring peak vs. average: Systems must handle peak load, not just average. Use a 2x-5x peak factor.
- Over-precision: Saying "13,888.89 QPS" is false precision. Say "~14,000 QPS" or "~14K QPS." Round aggressively.
- Forgetting indexes and overhead: Database indexes, filesystem metadata, and encoding overhead can add 20-50% to raw data size.
- Not separating reads and writes: They have very different performance profiles and drive different architectural choices.
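The first mistake on the list reduces to a one-liner worth internalizing: divide by 8.

```python
def mbps_to_mb_per_s(mbps: float) -> float:
    """Megabits per second -> megabytes per second (1 byte = 8 bits)."""
    return mbps / 8

print(mbps_to_mb_per_s(100))    # 12.5  — a "100 Mbps" link moves 12.5 MB/s
print(mbps_to_mb_per_s(1_000))  # 125.0 — 1 Gbps ≈ 125 MB/s
```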
Presentation Tips for Interviews
- State your assumptions clearly before calculating. "Let's assume 100 million DAU" is better than jumping straight into math.
- Round aggressively. Use powers of 10. 86,400 seconds/day becomes ~100,000 for quick math. 2.5 million seconds/month becomes ~2.5M.
- Write on the whiteboard. Organize your numbers visually: Traffic | Storage | Bandwidth | Cache in four columns.
- Sanity-check your results. If your URL shortener needs 50 PB of storage, something is wrong. Compare against known real-world systems.
- Connect estimates to design decisions. "Since we need 12K read QPS and only 40 write QPS, the system is read-heavy. Let's add a cache layer and read replicas."
Quick Reference: Powers of Two
| Power | Exact Value | Approximate | Name |
|---|---|---|---|
| 2^10 | 1,024 | ~1 Thousand | 1 KB |
| 2^20 | 1,048,576 | ~1 Million | 1 MB |
| 2^30 | 1,073,741,824 | ~1 Billion | 1 GB |
| 2^40 | 1,099,511,627,776 | ~1 Trillion | 1 TB |
| 2^50 | ~1.13 × 10^15 | ~1 Quadrillion | 1 PB |
Summary
Capacity estimation is a structured, repeatable skill. Follow the four-step framework (traffic, storage, bandwidth, cache), memorize the key reference numbers, round aggressively, and always connect your estimates back to design decisions. The goal is not perfection; it is demonstrating that you can reason about scale systematically.