
Distributed Job Scheduler

Databases · Message Queues · Monitoring · API Design

Problem Statement

CronCloud is building a managed job scheduling service (like a cloud-scale cron). Features:

- Schedule jobs - cron expressions ("every day at 3 AM"), one-time schedules, or interval-based ("every 15 minutes"). Jobs are HTTP webhook calls or message queue publishes.
- Reliability - jobs must execute exactly once (or at-least-once with idempotency). No missed executions, even during deployments or node failures.
- Distributed execution - a cluster of scheduler nodes that coordinate. If one node dies, another picks up its jobs.
- Retries & dead letter - failed jobs retry with exponential backoff (up to 5 retries). After max retries, move to a dead-letter queue for manual inspection.
- Job history - searchable log of every execution: start time, duration, status (success/failure), and output/error logs.
- Alerting - notify via webhook or email when a job consistently fails or misses its schedule.

Handle 5 million scheduled jobs triggering 50 million executions per day.

What You'll Learn

Design a distributed cron/job scheduler that runs millions of scheduled tasks reliably with retries and monitoring. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.


Constraints

  • Registered jobs: ~5,000,000
  • Executions per day: ~50,000,000
  • Schedule accuracy: Within 5 seconds of target time
  • Missed execution tolerance: Zero
  • Max retry attempts: 5
  • Job history retention: 30 days
  • Availability target: 99.99%
Approach

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Design a distributed cron/job scheduler that runs millions of scheduled tasks reliably with retries and monitoring.
  • Design for a peak load target around 2,894 RPS: 50M executions/day is ~579 RPS on average, with roughly 5x burst headroom on top.
  • Registered jobs: ~5,000,000
  • Executions per day: ~50,000,000
  • Schedule accuracy: Within 5 seconds of target time
  • Missed execution tolerance: Zero
  • Max retry attempts: 5
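The peak-load figure above follows from back-of-envelope arithmetic. A minimal sketch; the 5x burst multiplier is an assumed headroom factor, not part of the problem statement:

```python
# Capacity math: executions/day -> average and peak request rate.
EXECUTIONS_PER_DAY = 50_000_000
SECONDS_PER_DAY = 86_400
BURST_FACTOR = 5                 # assumed headroom multiplier

avg_rps = EXECUTIONS_PER_DAY / SECONDS_PER_DAY   # ~579 RPS average
peak_rps = avg_rps * BURST_FACTOR                # ~2,894 RPS at peak
```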

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.

3) Architecture Decisions

  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • Message Queues: Move non-blocking and retry-heavy work to async consumers with explicit retry and DLQ policies.
  • Monitoring: Instrument golden signals (latency, traffic, errors, saturation) per tier and per tenant/domain.
  • API Design: Standardize API boundaries, idempotency keys, pagination, and error contracts first.
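The retry-with-DLQ policy from the requirements (exponential backoff, at most 5 attempts) can be sketched as follows; the base delay and growth factor are illustrative assumptions, not spec values:

```python
def backoff_delays(base_s=30, factor=2, max_retries=5):
    """Seconds to wait before each retry attempt (capped at max_retries)."""
    return [base_s * factor ** attempt for attempt in range(max_retries)]

# With these assumed parameters a persistently failing job waits
# 30s, 60s, 120s, 240s, 480s between attempts, then moves to the
# dead-letter queue for manual inspection.
```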

4) Reliability and Failure Strategy

  • Use strong write constraints (transactions or conditional writes) and explicit backup/restore strategy.
  • Guarantee idempotent consumers and trace every message with correlation IDs.
  • Alert on user-impact SLOs, not only infrastructure metrics.
  • Apply strict input validation and backward-compatible versioning.
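An idempotent consumer turns at-least-once queue delivery into exactly-once effects. A minimal sketch, assuming each execution carries a unique `execution_id` (in production the dedup set would be a unique database index or a Redis key, not process memory):

```python
processed = set()   # illustrative stand-in for a durable dedup store

def handle(message):
    """Consume a job-execution message at-least-once, act exactly once."""
    key = message["execution_id"]      # doubles as the correlation ID
    if key in processed:
        return "duplicate-skipped"     # safe no-op on redelivery
    # ... fire the webhook / record the state change here ...
    processed.add(key)
    return "executed"
```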

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.

6) Trade-offs to Call Out in Interviews

  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • Message Queues: Async pipelines absorb spikes well, but increase eventual-consistency complexity.
  • Monitoring: Deep observability speeds incident response but raises ingestion and tooling costs.
  • API Design: Rich APIs improve developer speed but can create long-term compatibility burden.

Practical Notes

  • Partition jobs across scheduler nodes (e.g., by hash of job_id). Use a consensus protocol (Raft) or a lease-based system to reassign partitions on node failure.
  • Pre-compute the next execution time for each job and store it in a 'next_fire' column. Poll for jobs where next_fire ≤ now().
  • Use a database row-level lock or an optimistic lock to prevent two nodes from executing the same job.
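The poll-and-claim loop above can be sketched with an optimistic lock (compare-and-swap on a version column). This is an in-memory illustration of the idea, not a production implementation; the field names and the 15-minute interval are assumptions, and the SQL equivalents are noted in comments:

```python
# Hypothetical stand-in for the jobs table: rows of (next_fire, version).
jobs = {"job-1": {"next_fire": 0.0, "version": 1}}

def due_jobs(now):
    """Poll phase. SQL equivalent:
    SELECT id, version FROM jobs WHERE next_fire <= now."""
    return [(jid, j["version"]) for jid, j in jobs.items()
            if j["next_fire"] <= now]

def claim(job_id, expected_version, interval_s=900):
    """Claim phase: only one node wins the compare-and-swap. SQL equivalent:
    UPDATE jobs SET version = version + 1, next_fire = next_fire + interval
    WHERE id = ? AND version = ?  -- 0 rows updated means another node won."""
    job = jobs[job_id]
    if job["version"] != expected_version:
        return False                  # another node already claimed it
    job["version"] += 1
    job["next_fire"] += interval_s    # pre-compute the next execution time
    return True
```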


Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Web Clients -> Load Balancer -> API Gateway -> API Service -> Primary SQL DB -> Message Queue -> Background Workers -> Monitoring

Design strengths

  • Async queue/event bus isolates bursty workloads and supports retries without blocking synchronous requests.
  • Monitoring and logs are wired in from day one for rapid incident triage.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.