EasyStarter

Status Page Service

Databases · API Design · Monitoring

Problem Statement

UptimeBoard is building a hosted status page service for SaaS companies. Each customer gets a public page (e.g., `status.example.com`) showing:

- Component status - list of services (API, Dashboard, Database) with status indicators (operational, degraded, outage).
- Uptime history - a 90-day uptime bar graph per component showing daily/hourly availability.
- Incidents - admins create incident reports with updates ("Investigating" → "Identified" → "Monitoring" → "Resolved"). Subscribers get email/SMS notifications.
- Scheduled maintenance - announce upcoming maintenance windows.
- Health checks - automatic HTTP/TCP/ping checks every 60 seconds. Auto-create incidents when a check fails 3 times in a row.

The service targets ~2,000 customers, each with a public status page.
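The "3 consecutive failures opens an incident" rule above can be sketched as a small per-check state tracker. This is a minimal sketch, not a real API: the state layout and action strings are assumptions.

```python
# Sketch of the "create an incident after 3 consecutive failures" rule.
# CheckState and the returned action strings are illustrative assumptions.

FAILURE_THRESHOLD = 3

class CheckState:
    def __init__(self):
        self.consecutive_failures = 0
        self.incident_open = False

def record_result(state: CheckState, ok: bool):
    """Update state after one health-check run; return an action or None."""
    if ok:
        state.consecutive_failures = 0
        if state.incident_open:
            state.incident_open = False
            return "resolve_incident"
        return None
    state.consecutive_failures += 1
    if state.consecutive_failures >= FAILURE_THRESHOLD and not state.incident_open:
        state.incident_open = True
        return "create_incident"
    return None
```

Resetting the counter on any success is what makes the rule "3 in a row" rather than "3 total", which keeps one flaky probe from paging anyone.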

What You'll Learn

Design a status page service (like Statuspage.io) showing uptime, incident updates, and health checks. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.


Constraints

| Constraint | Target |
| --- | --- |
| Customer status pages | ~2,000 |
| Health checks/minute | ~20,000 |
| Status page load time | < 500 ms |
| Incident notification delay | < 2 minutes |
| Uptime data retention | 1 year |
| Availability target | 99.9% (must be higher than customers' own uptime!) |

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Design a status page service (like Statuspage.io) showing uptime, incident updates, and health checks.
  • Design for a peak load target around 500 RPS (including burst headroom).
  • Customer status pages: ~2,000
  • Health checks/minute: ~20,000
  • Status page load time: < 500 ms
  • Incident notification delay: < 2 minutes
  • Uptime data retention: 1 year

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.

3) Architecture Decisions

  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • API Design: Standardize API boundaries, idempotency keys, pagination, and error contracts first.
  • Monitoring: Instrument golden signals (latency, traffic, errors, saturation) per tier and per tenant/domain.
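To make the idempotency-key bullet concrete, here is a minimal sketch of key handling in front of a "create incident" write. The in-memory dict stands in for a durable idempotency table; all names are illustrative.

```python
# Sketch of idempotency-key handling for a "create incident" write.
# An in-memory dict stands in for a durable idempotency table.

_idempotency_store: dict = {}

def create_incident(idempotency_key: str, payload: dict) -> dict:
    """Return the cached response if this key was already processed."""
    if idempotency_key in _idempotency_store:
        return _idempotency_store[idempotency_key]   # replay: no new write
    incident = {"id": len(_idempotency_store) + 1, **payload}
    _idempotency_store[idempotency_key] = incident   # record before acking
    return incident
```

A retried request with the same key gets the original response back instead of creating a duplicate incident, which is what makes client retries safe.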

4) Reliability and Failure Strategy

  • Use strong write constraints (transactions or conditional writes) and explicit backup/restore strategy.
  • Apply strict input validation and backward-compatible versioning.
  • Alert on user-impact SLOs, not only infrastructure metrics.
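The "strong write constraints" bullet can be illustrated with a conditional write: advance an incident's status only from the expected current state, so a concurrent or retried update cannot regress it. A minimal sketch using the standard-library sqlite3 module; the schema is an assumption.

```python
import sqlite3

# Conditional write: the UPDATE only succeeds if the row is still in the
# expected state, so concurrent/retried updates cannot regress a status.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO incidents VALUES (1, 'investigating')")

def advance_status(incident_id: int, expected: str, new: str) -> bool:
    cur = conn.execute(
        "UPDATE incidents SET status = ? WHERE id = ? AND status = ?",
        (new, incident_id, expected),
    )
    conn.commit()
    return cur.rowcount == 1  # False: someone else changed the row first
```

The same compare-and-set shape works as a conditional write in most NoSQL stores, so the technique survives the SQL-vs-NoSQL trade-off discussed below.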

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.
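Tracking the p95 latency SLO mentioned above can be as simple as a nearest-rank percentile over sampled request latencies; a minimal sketch:

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile over a list of request latencies."""
    ordered = sorted(latencies_ms)
    # nearest-rank: ceil(0.95 * n), computed with integer math
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

In practice a monitoring stack computes this from histograms rather than raw samples, but the number it must defend in review is the same.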

6) Trade-offs to Call Out in Interviews

  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • API Design: Rich APIs improve developer speed but can create long-term compatibility burden.
  • Monitoring: Deep observability speeds incident response but raises ingestion and tooling costs.

Practical Notes

  • The status page itself must be ultra-reliable - consider serving it from a CDN as static HTML updated every minute.
  • Health check workers should run from multiple geographic locations to avoid false positives from network issues.
  • Store uptime data as 1-minute resolution buckets - aggregate into hourly/daily summaries for the history view.
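The 1-minute-bucket note above can be sketched as a simple roll-up into hourly availability; the bucket layout `(unix_minute, ok_checks, total_checks)` is an assumed shape, not a prescribed schema.

```python
from collections import defaultdict

# Roll 1-minute uptime buckets into hourly availability percentages.
# Each bucket is (unix_minute, ok_checks, total_checks) - an assumed layout.

def hourly_availability(minute_buckets):
    hours = defaultdict(lambda: [0, 0])          # hour -> [ok, total]
    for minute, ok, total in minute_buckets:
        hour = minute // 60
        hours[hour][0] += ok
        hours[hour][1] += total
    return {h: 100.0 * ok / total for h, (ok, total) in hours.items() if total}
```

Running the same roll-up again from hours into days gives the 90-day bar graph without ever re-reading raw minute data.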

Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Web Clients -> API Gateway -> API Service -> Primary SQL DB -> Monitoring -> Log Aggregator

Design strengths

  • Monitoring and logs are wired in from day one for rapid incident triage.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.