Public Solution
Content Delivery Network
This Content Delivery Network solution provides a production-minded baseline for the prompt. You get a concise requirements recap, a component-by-component architecture breakdown, explicit tradeoffs across latency, availability, cost, and complexity, plus failure mitigations and scoring rationale so you can benchmark your own design quickly.
Requirements Recap
| Requirement | Target |
|---|---|
| Edge locations | ~50 globally |
| Requests/day (total) | ~1,000,000,000 |
| Cached content | ~100 TB |
| Cache hit ratio target | > 95% |
| Purge propagation | < 30 seconds globally |
| Availability target | 99.99% |
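The tightest constraint above is purge propagation: an invalidation must reach all ~50 edges in under 30 seconds. A minimal sketch of that fan-out is shown below; the `Edge` and `PurgeCoordinator` names are illustrative, not from any specific CDN product, and a plain loop stands in for what would be a pub/sub broadcast in production.

```python
# Minimal sketch of global purge fan-out: the coordinator broadcasts an
# invalidation to every edge, and each edge drops the key from its local
# cache. Names here are hypothetical, chosen for illustration only.

class Edge:
    def __init__(self, name):
        self.name = name
        self.cache = {}

    def purge(self, key):
        # Remove the key if present; missing keys are a no-op.
        self.cache.pop(key, None)

class PurgeCoordinator:
    def __init__(self, edges):
        self.edges = edges

    def purge(self, key):
        # In production this would be a pub/sub broadcast over a message
        # bus so all ~50 edges converge well inside the 30-second target.
        for edge in self.edges:
            edge.purge(key)

edges = [Edge(f"edge-{i}") for i in range(3)]
for e in edges:
    e.cache["/logo.png"] = b"old-bytes"

PurgeCoordinator(edges).purge("/logo.png")
print(all("/logo.png" not in e.cache for e in edges))  # True: key gone everywhere
```

The key design point is that purge is push-based rather than waiting for TTL expiry, which is what makes a sub-30-second bound achievable.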
Architecture Breakdown (Component-by-Component)
1. Web Clients
Generate user traffic and receive responses.
They are the entry point from which all traffic flows into the rest of the system.
2. DNS
Resolves domain names to reachable service endpoints.
Bridges 1 incoming flow to 1 downstream dependency.
3. CDN Edge
Serves cacheable and static content from edge locations.
Bridges 1 incoming flow to 1 downstream dependency.
4. Load Balancer
Distributes requests across healthy backend instances.
Bridges 1 incoming flow to 1 downstream dependency.
5. API Service
Runs core business logic and orchestrates downstream calls.
Bridges 1 incoming flow to 5 downstream dependencies.
6. Redis Cache
Stores hot data to reduce origin read latency.
Bridges 1 incoming flow to 1 downstream dependency.
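The Redis cache sits in front of the primary store in a cache-aside pattern: read the cache first, and only fall through to the database on a miss. A minimal sketch, assuming a plain dict as a stand-in for Redis so the example stays self-contained (with redis-py you would use `get`/`setex` instead):

```python
# Cache-aside read path. A dict stands in for Redis; db_reads counts
# trips to the primary store so the hit-ratio benefit is visible.
cache = {}
db_reads = 0

def read_from_db(key):
    global db_reads
    db_reads += 1
    return f"row-for-{key}"

def get(key):
    if key in cache:           # cache hit: skip the origin entirely
        return cache[key]
    value = read_from_db(key)  # cache miss: fetch from the primary store
    cache[key] = value         # populate so repeat reads stay hot
    return value

get("user:42")
get("user:42")
get("user:42")  # three reads, but only the first touches the database
```

This is the mechanism behind the > 95% hit-ratio target: repeated reads of the same key cost one database trip, not N.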
7. Monitoring
Collects service health and operational telemetry.
Acts as a sink or system-of-record endpoint in the architecture flow.
8. Primary SQL DB
Persists relational data with transactional guarantees.
Bridges 2 incoming flows to 1 downstream dependency.
9. Log Aggregator
Centralizes logs for debugging and incident response.
Bridges 1 incoming flow to 1 downstream dependency.
10. Read Model DB
Stores read-optimized data at high scale with a flexible schema.
Acts as a sink or system-of-record endpoint in the architecture flow.
11. Object Storage
Stores large files and media objects durably.
Acts as a sink or system-of-record endpoint in the architecture flow.
Tradeoffs (Latency / Availability / Cost / Complexity)
| Decision | Latency | Availability | Cost | Complexity |
|---|---|---|---|---|
| Keep the request path focused on core business operations | Shorter synchronous path keeps average response time stable | Fewer inline dependencies reduce immediate failure blast radius | Avoids unnecessary infrastructure in the first rollout | Lower coordination overhead for small teams |
| Push cacheable responses to edge locations | Faster global response time for static and hot assets | Edge cache can mask origin incidents temporarily | Lower origin egress and compute, with CDN transfer fees | Cache key and purge strategy must be explicit |
| Cache hot reads in front of the primary data store | Lower median and tail latency on repeated reads | Absorbs origin pressure during read spikes | Adds cache infra spend but reduces database scaling pressure | Requires TTL and invalidation discipline |
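The "cache key and purge strategy must be explicit" tradeoff deserves a concrete illustration. A sketch of explicit cache-key construction, where only fields that actually change the response participate in the key (the normalization rules here are illustrative, not a specific CDN's defaults):

```python
import hashlib

# Build a deterministic cache key from the request fields that affect the
# response. Sorting and lowercasing ensure equivalent requests share one
# cache entry instead of fragmenting the hit ratio.
def cache_key(method, path, query_params, vary_headers):
    parts = [
        method.upper(),                       # GET vs get: same entry
        path,
        "&".join(sorted(query_params)),       # param order must not matter
        "|".join(f"{h.lower()}={v}" for h, v in sorted(vary_headers.items())),
    ]
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()

k1 = cache_key("GET", "/img/logo.png", ["w=200", "fmt=webp"],
               {"Accept-Encoding": "br"})
k2 = cache_key("get", "/img/logo.png", ["fmt=webp", "w=200"],
               {"accept-encoding": "br"})
assert k1 == k2  # normalization: order and case do not split the cache
```

Without this normalization, trivially different spellings of the same request create duplicate entries, which both lowers the hit ratio and multiplies the keys a purge must cover.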
Failure Modes and Mitigations
Failure mode: Cache stampede after hot-key expiry overloads the database
Mitigation: Use request coalescing, jittered TTLs, and stale-while-revalidate for hot keys.
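Two of those mitigations can be sketched in a few lines: jittered TTLs spread out expiries so hot keys set together do not expire together, and a per-key lock coalesces concurrent misses into a single origin fetch. The code below is a minimal single-process sketch, not a production implementation (a distributed system would use a shared lock or singleflight layer).

```python
import random
import threading

BASE_TTL = 300  # seconds

def jittered_ttl():
    # +/-10% jitter so hot keys written at the same time expire apart
    return BASE_TTL * random.uniform(0.9, 1.1)

cache = {}
ttls = {}
locks = {}
origin_fetches = 0

def fetch_origin(key):
    global origin_fetches
    origin_fetches += 1
    return f"value-{key}"

def get(key):
    if key in cache:
        return cache[key]
    lock = locks.setdefault(key, threading.Lock())
    with lock:                    # request coalescing: one refill per key
        if key not in cache:      # re-check after acquiring the lock
            cache[key] = fetch_origin(key)
            ttls[key] = jittered_ttl()
    return cache[key]

# Eight concurrent misses on the same hot key -> one origin fetch.
threads = [threading.Thread(target=get, args=("hot",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Stale-while-revalidate extends the same idea: serve the expired value immediately while one request refreshes it in the background, so the origin never sees the full burst.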
Failure mode: Blind spots delay incident detection and increase mean time to recovery
Mitigation: Track golden signals, error budgets, and service-specific runbooks with alerts.
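The golden-signals idea reduces to a small computation: derive an error rate and a tail-latency figure from raw request samples and compare them to alert thresholds. The thresholds and sample data below are illustrative only; real alerting would run against a metrics backend over rolling windows.

```python
# Golden-signals sketch: error rate and p99 latency from raw samples,
# checked against illustrative alert thresholds.
samples = [  # (latency_ms, http_status) per request
    (12, 200), (15, 200), (11, 200), (480, 500), (14, 200),
    (13, 200), (16, 200), (12, 200), (17, 200), (15, 200),
]

latencies = sorted(lat for lat, _ in samples)
error_rate = sum(1 for _, status in samples if status >= 500) / len(samples)
# Nearest-rank p99 (clamped to the last sample for tiny datasets).
p99 = latencies[min(len(latencies) - 1, int(0.99 * len(latencies)))]

alerts = []
if error_rate > 0.01:  # 1% error-budget burn (illustrative threshold)
    alerts.append("error-rate")
if p99 > 250:          # tail-latency SLO breach (illustrative threshold)
    alerts.append("p99-latency")
```

The point is that a single slow 500 barely moves the average but is fully visible in the error rate and the p99, which is why tail metrics, not means, drive the alerts.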
Why This Scores Well
- Availability (35%): Redundant routing and data paths reduce single points of failure under burst traffic.
- Latency (20%): The design keeps hot reads close to users and reduces expensive origin round-trips.
- Resilience (25%): Asynchronous buffering, observability, and service boundaries isolate faults and improve recovery.
- Cost Efficiency (10%) + Simplicity (10%): Higher complexity is scoped to requirements that actually demand scale or stronger fault tolerance.
Next Step CTA
Validate this architecture by solving the prompt yourself, then practice the highest-leverage component in a guided lab and topic hub.
FAQ
What should I change first if traffic doubles?
Profile the bottleneck first, then scale the hot path component (usually compute, cache, or read path) before adding new system layers.
Why is CDN emphasized in this solution?
It is the highest-leverage topic for this challenge's constraints and directly improves score-impacting metrics such as latency, availability, and resilience.
How do I validate this architecture quickly?
Run the same challenge in the simulator, compare score breakdown metrics, and then test one tradeoff change at a time.