RideShare · Part 3 (Hard)

RideShare 3 - Global Scale & Autonomous Fleet

Geo Distribution · Sharding · Message Queues · Monitoring · Consistency · Containerization

This challenge builds on RideShare 2 - Multi-City & Surge Pricing. Complete it first for the best experience.

Problem Statement

ZipRide is now a global platform operating in 50 countries with a mixed fleet of human drivers and autonomous vehicles. The system must handle:

- 10 million rides per day across all regions.
- Autonomous vehicle telemetry - each AV streams 50+ sensor readings per second (location, speed, battery, obstacle detection). The platform must ingest, process, and act on this data in real time for safety-critical decisions.
- ML-based dispatch - a machine learning model predicts optimal driver/rider matches considering ETA, driver rating, vehicle type, and predicted demand 15 minutes into the future.
- Regulatory compliance - different countries have different data residency laws, licensing requirements, and payment processors.
- Five-nines availability (99.999%) - the platform handles safety-critical operations; downtime could leave riders stranded or autonomous vehicles without instructions.
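To make the five-nines requirement concrete, here is a back-of-envelope check of how little downtime 99.999% actually allows per year (a simple arithmetic sketch, nothing platform-specific):

```python
# Back-of-envelope downtime budget for a 99.999% availability target.
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignoring leap years

def downtime_budget_seconds(availability: float) -> float:
    """Seconds of permitted downtime per year at the given availability."""
    return SECONDS_PER_YEAR * (1.0 - availability)

budget = downtime_budget_seconds(0.99999)
print(f"{budget:.0f} s/year (~{budget / 60:.1f} min/year)")
```

Roughly five minutes of total downtime per year, across all 50 countries, is why failover must be automatic and fast rather than operator-driven.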

This is the hardest system design problem: a globally distributed, real-time, safety-critical platform at extreme scale.

What You'll Learn

Operate in 50 countries with autonomous vehicles, ML-based dispatch, and five-nines uptime. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.


Constraints

Daily rides: ~10,000,000
Countries: 50
AV telemetry events/sec: ~5,000,000
ML inference latency: < 200 ms
Failover time: < 10 seconds
Data retention: 7 years (regulatory)
Availability target: 99.999%
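A quick sizing pass over these numbers helps anchor the design. The 3x peak factor below is an assumption for illustration; the 50 readings per AV per second comes from the problem statement:

```python
# Illustrative scale math from the constraints above.
RIDES_PER_DAY = 10_000_000
AV_EVENTS_PER_SEC = 5_000_000
READINGS_PER_AV_PER_SEC = 50   # from the problem statement

avg_rides_per_sec = RIDES_PER_DAY / 86_400             # ~116 rides/sec average
peak_rides_per_sec = avg_rides_per_sec * 3             # assumed 3x daily peak
implied_av_fleet = AV_EVENTS_PER_SEC // READINGS_PER_AV_PER_SEC  # ~100,000 AVs
```

Ride traffic itself is modest; the telemetry firehose (millions of events per second) is what dominates the ingestion architecture.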

How to Approach

Clarify requirements: Need real-time surge pricing (demand > supply -> price increases). Need historical analytics for business intelligence. Operational DB cannot handle analytics queries.

Estimate scale: 10M rides/day globally. Analytics queries scan millions of rows -- would kill the operational DB if run there.

Pick components:

  • Stream processor for real-time demand/supply ratio computation (surge pricing)
  • Data warehouse (Redshift/BigQuery) for historical analytics -- separate from operational DB
  • Event stream (Kafka) from API to data warehouse via ETL worker
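The stream-processor component can be sketched as a sliding-window demand/supply ratio per zone. This is an in-memory toy standing in for a real stream processor (Flink, Kafka Streams); the class and method names are illustrative:

```python
from collections import defaultdict, deque

class SurgeCalculator:
    """Sliding-window demand/supply ratio per zone (in-memory sketch)."""

    def __init__(self, window_sec: float = 60.0):
        self.window = window_sec
        self.requests = defaultdict(deque)  # zone -> timestamps of ride requests
        self.drivers = {}                   # zone -> currently available drivers

    def record_request(self, zone: str, now: float) -> None:
        self.requests[zone].append(now)

    def set_available_drivers(self, zone: str, count: int) -> None:
        self.drivers[zone] = count

    def surge_multiplier(self, zone: str, now: float) -> float:
        q = self.requests[zone]
        while q and q[0] < now - self.window:  # drop requests outside the window
            q.popleft()
        demand = len(q)
        supply = max(self.drivers.get(zone, 0), 1)  # avoid division by zero
        ratio = demand / supply
        # No surge while demand <= supply; cap the multiplier at 3.0.
        return min(max(ratio, 1.0), 3.0)
```

For example, 10 requests against 4 available drivers in the window yields a 2.5x multiplier. In production the result would be written to a cache keyed by zone, which is what the API reads.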

Key tradeoffs to discuss:

  • OLTP vs OLAP: operational DB optimized for row-level reads/writes; data warehouse optimized for column-scans across millions of rows
  • Surge pricing SLA: must compute and apply within 30 seconds of demand spike -- requires stream processing, not batch
  • ETL lag: data warehouse is 5-60 minutes behind operational DB -- acceptable for business reports, not for real-time features
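The ETL-lag tradeoff falls directly out of how such a worker batches writes. A minimal micro-batching sketch (function and parameter names are illustrative, not a specific library's API):

```python
import time

def etl_micro_batches(event_stream, batch_size=500, flush_interval_sec=60.0,
                      clock=time.monotonic):
    """Group trip events into warehouse-sized insert batches.

    A batch is flushed when it fills up or the interval elapses; that
    flush interval, plus warehouse load time, is the ETL lag that makes
    the warehouse minutes behind the operational DB.
    """
    batch, last_flush = [], clock()
    for event in event_stream:
        batch.append(event)
        if len(batch) >= batch_size or clock() - last_flush >= flush_interval_sec:
            yield batch
            batch, last_flush = [], clock()
    if batch:
        yield batch  # final partial batch
```

Shrinking the interval reduces lag but multiplies small warehouse loads, which column stores handle poorly; that is why real-time features stay on the stream-processing path instead.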


Reference Solution

Add real-time surge pricing via a stream processor that computes demand/supply ratios continuously and writes results to a cache. The API reads surge multipliers from cache (<1ms) and applies them to fare estimates. Separately, an ETL worker streams trip events to a data warehouse for business analytics -- keeping analytical queries off the operational database entirely.
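The API-side read path is then trivial, which is the point of precomputing surge in the stream processor. A sketch with a plain dict standing in for the cache (Redis in practice; the key format is illustrative):

```python
class FareEstimator:
    """Applies the cached surge multiplier to a base fare (sketch)."""

    def __init__(self, cache: dict):
        self.cache = cache  # dict stands in for a Redis client here

    def estimate(self, zone: str, base_fare: float) -> float:
        # Missing key -> no surge (1.0), so a cold cache fails safe on price.
        multiplier = float(self.cache.get(f"surge:{zone}", 1.0))
        return round(base_fare * multiplier, 2)
```

With `{"surge:downtown": 1.8}` in the cache, a 12.50 base fare estimates to 22.50; the request never touches the operational database or the stream processor.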