CronCloud is building a managed job scheduling service (like a cloud-scale cron). Features:
- Schedule jobs - cron expressions ("every day at 3 AM"), one-time schedules, or interval-based ("every 15 minutes"). Jobs are HTTP webhook calls or message queue publishes.
- Reliability - jobs must execute exactly once (or at-least-once with idempotency). No missed executions, even during deployments or node failures.
- Distributed execution - an entire cluster of scheduler nodes that coordinate. If one node dies, another picks up its jobs.
- Retries & dead letter - failed jobs retry with exponential backoff (up to 5 retries). After max retries, move to a dead-letter queue for manual inspection.
- Job history - searchable log of every execution: start time, duration, status (success/failure), and output/error logs.
- Alerting - notify via webhook or email when a job consistently fails or misses its schedule.
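The retry rule in the feature list (exponential backoff, up to 5 retries, then dead letter) can be sketched as a small decision function. This is a minimal illustration, not a prescribed implementation: the 2-second base delay, the 300-second cap, and the names `backoff_delay` and `on_failure` are assumptions chosen for the example. Only the limit of 5 retries comes from the requirements.

```python
MAX_RETRIES = 5        # from the requirements: up to 5 retries
BASE_DELAY_S = 2.0     # assumption: delay before the first retry
MAX_DELAY_S = 300.0    # assumption: upper bound on any single delay


def backoff_delay(retry_number: int) -> float:
    """Delay in seconds before retry `retry_number` (1-based), doubling each time."""
    return min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (retry_number - 1))


def on_failure(retries_used: int) -> tuple[str, float]:
    """Decide what to do after a failed attempt.

    `retries_used` is the number of retries already consumed (0 after the
    initial attempt fails). Returns ("retry", delay_seconds) or
    ("dead_letter", 0.0) once all retries are exhausted.
    """
    if retries_used >= MAX_RETRIES:
        return ("dead_letter", 0.0)
    return ("retry", backoff_delay(retries_used + 1))


if __name__ == "__main__":
    for used in range(MAX_RETRIES + 1):
        print(used, on_failure(used))
```

With these assumed constants the five retries wait 2, 4, 8, 16, and 32 seconds. A production scheduler would likely add random jitter to each delay so that many jobs failing at the same moment do not retry in lockstep.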
Handle 5 million scheduled jobs triggering 50 million executions per day.
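To put the scale target in per-second terms, the arithmetic below derives the average load from the two stated figures. The burst estimate rests on an assumption that is not in the brief: that every execution is aligned to a minute boundary and spread evenly over the day. Real cron traffic tends to cluster more heavily at the top of the hour and at midnight.

```python
JOBS = 5_000_000                  # from the requirements
EXECUTIONS_PER_DAY = 50_000_000   # from the requirements
SECONDS_PER_DAY = 86_400
MINUTES_PER_DAY = 1_440

# Average number of times each job fires per day.
executions_per_job = EXECUTIONS_PER_DAY / JOBS

# Average throughput if executions were spread perfectly evenly.
avg_per_second = EXECUTIONS_PER_DAY / SECONDS_PER_DAY

# Assumption: all executions fire exactly on a minute boundary, evenly
# distributed across the day, so each boundary releases one burst.
burst_per_minute_boundary = EXECUTIONS_PER_DAY / MINUTES_PER_DAY

if __name__ == "__main__":
    print(f"executions per job per day: {executions_per_job:.0f}")
    print(f"average executions/second:  {avg_per_second:.1f}")
    print(f"burst per minute boundary:  {burst_per_minute_boundary:.0f}")
```

The average works out to roughly 579 executions per second, while the minute-aligned burst is roughly 34,700 executions released at once. The gap between the two suggests sizing the queue and worker pool for bursts rather than for the daily average.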
Design a distributed cron/job scheduler that runs millions of scheduled tasks reliably with retries and monitoring. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.
Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.
Reference flow: Web Clients -> Load Balancer -> API Gateway -> API Service -> Primary SQL DB -> Message Queue -> Background Workers -> Monitoring
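If the Message Queue in this flow delivers at-least-once, a Background Worker may see the same execution twice, for example after a redelivery following a node failure. One common way to reach the "at-least-once with idempotency" goal is to derive a deterministic key per scheduled execution and skip keys that were already processed. The sketch below is an in-memory illustration only: `execution_key`, `Worker`, and the use of a Python set are hypothetical stand-ins, and a real deployment would more likely enforce uniqueness with a constraint in the Primary SQL DB.

```python
import hashlib


def execution_key(job_id: str, scheduled_at_epoch_s: int) -> str:
    """Deterministic key: the same job at the same scheduled time always maps to one key."""
    raw = f"{job_id}:{scheduled_at_epoch_s}".encode()
    return hashlib.sha256(raw).hexdigest()


class Worker:
    """Toy worker that drops redelivered messages using the execution key."""

    def __init__(self) -> None:
        # Stand-in for a unique-key table in durable storage (assumption).
        self.seen: set[str] = set()
        # Records which jobs actually ran, for inspection in this example.
        self.calls: list[str] = []

    def handle(self, job_id: str, scheduled_at_epoch_s: int) -> bool:
        """Run the job unless this exact execution was already handled."""
        key = execution_key(job_id, scheduled_at_epoch_s)
        if key in self.seen:
            return False  # duplicate delivery, skip
        self.seen.add(key)
        self.calls.append(job_id)  # placeholder for the webhook call or publish
        return True


if __name__ == "__main__":
    worker = Worker()
    print(worker.handle("job-1", 1_700_000_000))  # first delivery
    print(worker.handle("job-1", 1_700_000_000))  # redelivery of the same execution
    print(worker.handle("job-1", 1_700_000_900))  # next scheduled run of the same job
```

This toy version records the key and runs the job in two separate steps, so a crash between them could lose an execution. A durable design would need to make the two steps atomic, or pass the key to the webhook target so the receiver can deduplicate.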