Hard · Enterprise

Design Google Maps

Databases · Caching · CDN · Geo Distribution · Storage · Analytics

Problem Statement

Design the architecture for Google Maps - the world's most-used mapping platform with 1 billion monthly active users across 220+ countries. Your design must cover:

- Map tile rendering - the world map is divided into tiles at 20+ zoom levels. At max zoom, there are billions of tiles. Pre-render popular tiles; render less-viewed tiles on demand. Serve tiles from CDN edge nodes for < 100 ms load time.
- Directions & routing - given an origin and destination, compute the optimal route using a road network graph with hundreds of millions of edges. Support driving, walking, cycling, and transit. The routing algorithm must return results in < 1 second, even for cross-country routes.
- Real-time navigation - turn-by-turn directions with live rerouting when the driver goes off course. Requires continuous GPS tracking and sub-second route recalculation.
- Live traffic - aggregate GPS data from millions of active Android/iOS users, analyze speed on every road segment, and update the traffic layer on the map. Detect accidents, road closures, and construction.
- Place search (POI) - search for businesses, restaurants, gas stations, etc. by name, category, or "near me." Rank results by relevance, distance, and ratings.
- Offline maps - users can download entire regions (e.g., "New York City") for offline use (~200 MB per metro area).
- Street View - serve panoramic street-level imagery with smooth transitions between positions.
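The tile pyramid above follows standard slippy-map addressing: zoom level z splits the world into 2^z × 2^z tiles, which is why the counts explode into the billions at high zoom. A minimal sketch of how a lat/lon resolves to a tile coordinate, assuming Web Mercator tiling (the de facto convention for web maps):

```python
import math

def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Convert WGS84 coordinates to a Web Mercator tile (x, y) at a zoom level.

    Zoom z has 2**z x 2**z tile slots, so zoom 20 alone addresses ~1.1 trillion
    tiles (most are empty ocean, which is why only popular tiles are pre-rendered).
    """
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# Times Square at zoom 12
print(latlon_to_tile(40.758, -73.9855, 12))
```

Deterministic tile coordinates like these make CDN cache keys trivial; a common URL scheme is `/tiles/{z}/{x}/{y}.png`, which caches cleanly at the edge.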

The core challenge is the combination of massive geospatial data, real-time graph algorithms, and global-scale serving.

What You'll Learn

Design Google Maps - map tile rendering, real-time navigation, traffic prediction, and place search for 1 B+ users. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.


Constraints

Monthly active users: 1,000,000,000+
Map tiles (total): Billions
Road network edges: Hundreds of millions
Routing latency: < 1 second
Tile load time: < 100 ms
GPS data points ingested/sec: ~10,000,000
Traffic update frequency: Every 1-2 minutes
Availability target: 99.99%
Approach

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Design Google Maps - map tile rendering, real-time navigation, traffic prediction, and place search for 1 B+ users.
  • Design for a peak load target around 69,444 RPS (including burst headroom).
  • Monthly active users: 1,000,000,000+
  • Map tiles (total): Billions
  • Road network edges: Hundreds of millions
  • Routing latency: < 1 second
  • Tile load time: < 100 ms

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.
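The peak-load figure from step 1 can be reproduced with back-of-envelope arithmetic. A sketch assuming ~6 requests per user per day (tiles, routing, and search combined; the per-user rate is an assumption, not given in the constraints):

```python
MAU = 1_000_000_000
reqs_per_user_day = 6                  # assumption: tile + route + search mix
daily = MAU * reqs_per_user_day        # 6e9 requests/day

avg_rps = daily / 86_400               # seconds per day -> ~69,444 RPS
margin_rps = avg_rps * 3               # 3x per-tier safety margin from step 2

print(round(avg_rps), round(margin_rps))
```

This is how a concrete RPS target is defended in review: state the per-user assumption, show the division, then apply the tier margin on top.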

3) Architecture Decisions

  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • Caching: Put cache on hot read paths first and pick cache-aside or write-through explicitly.
  • CDN: Serve static and cacheable content from edge and keep origin strictly for misses and dynamic requests.
  • Geo Distribution: Route users to nearest region/edge while keeping write-consistency boundaries explicit.
  • Storage: Use object storage for large blobs and keep metadata/authorization separate in the API tier.
  • Analytics: Maintain separate OLTP and analytics paths; stream events into a warehouse/time-series layer.
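The cache-aside choice above can be sketched as a read path: try the cache, fall back to the system of record on a miss, and backfill with a TTL. The Redis-style `get`/`setex` interface and the in-memory fakes below are illustrative assumptions, not a specific client library:

```python
import json
import time
from typing import Optional

class FakeCache:
    """In-memory stand-in for a Redis-like cache (get/setex subset)."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None or entry[1] < time.time() else entry[0]
    def setex(self, key, ttl_s, value):
        self.store[key] = (value, time.time() + ttl_s)

class TileMetadataService:
    """Cache-aside read path: cache first, DB on miss, backfill with TTL."""
    def __init__(self, cache, db_rows: dict):
        self.cache, self.db_rows = cache, db_rows
        self.db_hits = 0
    def get_tile_meta(self, z, x, y) -> Optional[dict]:
        key = f"tile:{z}:{x}:{y}"
        hit = self.cache.get(key)
        if hit is not None:
            return json.loads(hit)                    # served from cache
        self.db_hits += 1                             # trip to system of record
        row = self.db_rows.get((z, x, y))
        if row is not None:
            self.cache.setex(key, 300, json.dumps(row))  # 5 min TTL bounds staleness
        return row

svc = TileMetadataService(FakeCache(), {(12, 1206, 1539): {"size_kb": 18}})
a = svc.get_tile_meta(12, 1206, 1539)   # miss -> DB, then backfill
b = svc.get_tile_meta(12, 1206, 1539)   # hit -> cache
print(a, b, svc.db_hits)
```

The explicit TTL is what makes the staleness bound from the reliability section enforceable: worst-case staleness equals the TTL unless an invalidation hook purges earlier.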

4) Reliability and Failure Strategy

  • Use strong write constraints (transactions or conditional writes) and explicit backup/restore strategy.
  • Bound staleness with TTL + invalidation hooks for critical entities.
  • Define cache keys and purge workflows before launch; ad-hoc invalidation is how stale data and cascading cache outages happen.
  • Design region failover and data residency controls as first-class requirements.
  • Enforce lifecycle policies, retention tiers, and checksum validation.

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.
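Idempotency for retried writes is typically enforced with a client-supplied idempotency key that dedupes redelivered messages. A minimal in-memory sketch; a production consumer would persist processed keys (e.g., in Redis or in the same DB transaction as the write) rather than in a dict:

```python
class IdempotentConsumer:
    """Dedupe retried writes: the side effect is applied at most once per key."""
    def __init__(self):
        self.processed = {}   # idempotency key -> stored result
        self.applied = 0      # counts actual side effects, for verification
    def handle(self, key: str, payload: dict) -> dict:
        if key in self.processed:          # retry: replay stored result, no re-apply
            return self.processed[key]
        self.applied += 1                  # the write happens exactly here
        result = {"status": "ok", **payload}
        self.processed[key] = result
        return result

c = IdempotentConsumer()
r1 = c.handle("route-req-42", {"eta_s": 930})
r2 = c.handle("route-req-42", {"eta_s": 930})   # duplicate delivery after retry
print(r1 == r2, c.applied)
```

This is the property the validation step checks: replaying the same message must return the same result without a second side effect.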

6) Trade-offs to Call Out in Interviews

  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • Caching: Higher hit rate cuts latency/cost, but stale data and invalidation bugs become primary risks.
  • CDN: Long TTL improves latency/cost; short TTL improves freshness.
  • Geo Distribution: Global latency improves, but cross-region consistency and operations become harder.
  • Storage: Object storage is cheap and durable, but random low-latency reads are weaker than databases/caches.

Practical Notes

  • Pre-render the top 5 zoom levels and the most-viewed tiles (major cities). Use on-demand rendering with caching for long-tail tiles.
  • Routing: use Contraction Hierarchies or A* with landmarks to speed up shortest-path queries on massive graphs. Pre-process the graph offline.
  • Live traffic: partition roads by S2 cell. Aggregate speed data from GPS traces per road segment. Store in a time-series structure.
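Contraction Hierarchies are too involved to sketch briefly, but the A*-with-heuristic idea from the routing note fits in a few lines: straight-line (haversine) distance is an admissible heuristic when edge weights are road lengths, since no road is shorter than the crow flies. The four-intersection graph and its weights below are made up for illustration:

```python
import heapq
import math

def haversine_m(a, b):
    """Great-circle distance in meters between (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))

def astar(graph, coords, src, dst):
    """A* shortest path over a road graph with an admissible distance heuristic."""
    pq = [(haversine_m(coords[src], coords[dst]), 0.0, src, [src])]
    settled = {}
    while pq:
        _, g, node, path = heapq.heappop(pq)
        if node == dst:
            return g, path
        if settled.get(node, math.inf) <= g:
            continue                      # already reached this node more cheaply
        settled[node] = g
        for nxt, w in graph.get(node, []):
            f = g + w + haversine_m(coords[nxt], coords[dst])
            heapq.heappush(pq, (f, g + w, nxt, path + [nxt]))
    return math.inf, []

# Toy road graph; weights are road lengths in meters (assumed values)
coords = {"A": (40.00, -74.00), "B": (40.00, -73.99),
          "C": (40.01, -74.00), "D": (40.01, -73.99)}
graph = {"A": [("B", 900), ("C", 1300)],
         "B": [("D", 1200)],
         "C": [("D", 900)]}
cost, path = astar(graph, coords, "A", "D")
print(cost, path)   # 2100 m via A -> B -> D
```

On a continent-scale graph this plain A* is far too slow, which is exactly why the note prescribes offline preprocessing (Contraction Hierarchies, or A* with landmarks) on top of this basic search.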

Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Web Clients -> DNS -> CDN Edge -> Load Balancer -> Core Service -> Primary SQL DB -> Read Model DB -> Redis Cache

Design strengths

  • Cache sits on the read path to absorb repeated queries and keep DB pressure stable.
  • Analytics pipeline is separated from OLTP path to avoid reporting workloads impacting transactions.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.