LogStream is building a centralized log management platform where engineering teams send application logs for search, alerting, and analysis. Features:
- Log ingestion - accept logs via HTTP API, syslog, and agents installed on servers. Each log entry has a timestamp, severity, message, and structured metadata (service name, host, trace ID).
- Full-text search - search across all logs by keyword, severity, service, and time range. Results return in under 3 seconds, even when scanning billions of entries.
- Live tail - stream new log entries in real time (like `tail -f`), filtered by service or keyword.
- Alerting - define alert rules (e.g., "alert if error count > 100 in 5 minutes for service=payments"). Notify via Slack, email, or PagerDuty.
- Retention & archival - hot storage for 7 days (fast search), warm storage for 30 days (slower search), and cold archive for 1 year (restore on demand).
- Log patterns - automatically detect and group similar log messages into patterns (e.g., "Connection timeout from [IP]" appears 50,000 times today).
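The alerting feature's example rule ("error count > 100 in 5 minutes for service=payments") is a sliding-window count. The sketch below shows one way to evaluate such a rule over a stream of entries. The class name `ErrorRateRule`, the `"ERROR"` severity string, and the assumption that entries arrive roughly in timestamp order are all illustrative choices, not part of the spec. A production evaluator would more likely keep per-bucket counters than one timestamp per error.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ErrorRateRule:
    """Hypothetical rule: fire when > threshold ERROR entries for one service fall in window_s."""
    service: str
    threshold: int
    window_s: float
    _timestamps: deque = field(default_factory=deque, init=False, repr=False)

    def observe(self, ts: float, service: str, severity: str) -> bool:
        """Record one log entry and return True if the rule is firing after it."""
        if service == self.service and severity == "ERROR":
            self._timestamps.append(ts)
        # Evict errors older than the window. Assumes timestamps arrive roughly in order.
        while self._timestamps and self._timestamps[0] <= ts - self.window_s:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


# "alert if error count > 100 in 5 minutes for service=payments"
rule = ErrorRateRule(service="payments", threshold=100, window_s=300.0)
fired = [rule.observe(float(t), "payments", "ERROR") for t in range(101)]
print(fired[99], fired[100])  # False True
```

The 100th error does not fire the rule because the condition is strictly "> 100"; the 101st does. Once the rule fires, a notifier (Slack, email, PagerDuty) would be invoked, typically with deduplication so a sustained spike does not page on every entry.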
Ingest 1 TB of logs per day from 200 services across 5,000 servers.
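A quick back-of-envelope calculation turns the stated load into per-second and per-tier numbers. This sketch assumes decimal terabytes (10^12 bytes) and uniform traffic, so real peaks will be higher; it also counts raw, uncompressed bytes with no replicas or index overhead, and treats each retention window independently (whether the 30-day warm tier includes the 7 hot days is a design decision the spec leaves open).

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def ingest_profile(bytes_per_day: float, servers: int, services: int) -> dict:
    """Average-rate view of the ingest load; peak rates will exceed these figures."""
    return {
        "avg_bytes_per_sec": bytes_per_day / SECONDS_PER_DAY,
        "per_server_bytes_per_day": bytes_per_day / servers,
        "per_service_bytes_per_day": bytes_per_day / services,
    }


def retained_raw_bytes(bytes_per_day: float, retention_days: int) -> float:
    """Uncompressed bytes covered by a retention window (no replicas, no index overhead)."""
    return bytes_per_day * retention_days


daily = 1 * BYTES_PER_TB
profile = ingest_profile(daily, servers=5_000, services=200)
print(f"{profile['avg_bytes_per_sec'] / 1e6:.1f} MB/s average")  # 11.6 MB/s average
for tier, days in [("hot", 7), ("warm", 30), ("cold", 365)]:
    print(tier, retained_raw_bytes(daily, days) / BYTES_PER_TB, "TB")
```

On average that is about 11.6 MB/s cluster-wide, 200 MB per server per day, and 5 GB per service per day, with roughly 7 TB, 30 TB, and 365 TB of raw data covered by the hot, warm, and cold windows.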
Design a centralized logging system (like Datadog Logs) that ingests, indexes, and searches 1 TB of logs per day. Build this architecture under realistic production constraints, then validate tradeoffs in the design lab simulation.
Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.
Reference flow: Web Clients -> Load Balancer -> API Service -> Primary SQL DB -> Message Queue -> Background Workers -> Object Storage -> Search Index
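The tail of the reference flow (Message Queue -> Background Workers -> Object Storage -> Search Index) can be illustrated with a minimal in-memory sketch. It covers only that segment, not the load balancer, API service, or SQL database. Here `queue.Queue`, a dict, and an inverted index stand in for a durable queue, a blob store, and a search engine; the names `drain_batch`, `process_batch`, `search`, and the `"segment:offset"` reference format are hypothetical. The worker drains a batch, writes it as one newline-delimited JSON object, then indexes message tokens and `field=value` metadata terms so keyword and service filters can be intersected.

```python
import json
import queue
import re
from collections import defaultdict

# Hypothetical in-memory stand-ins for the queue, object store, and search index.
log_queue: "queue.Queue[dict]" = queue.Queue()
object_storage: dict = {}                      # segment key -> NDJSON bytes
search_index: defaultdict = defaultdict(set)   # term -> {"segment:offset", ...}

TOKEN_RE = re.compile(r"\w+")


def drain_batch(max_batch: int) -> list:
    """Pull up to max_batch entries from the queue without blocking."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(log_queue.get_nowait())
        except queue.Empty:
            break
    return batch


def process_batch(batch: list, segment_key: str) -> None:
    """Background-worker step: persist the batch as one object, then index it."""
    object_storage[segment_key] = "\n".join(json.dumps(e) for e in batch).encode()
    for offset, entry in enumerate(batch):
        ref = f"{segment_key}:{offset}"
        # Message keywords are indexed as lowercased word tokens.
        for token in TOKEN_RE.findall(entry["message"].lower()):
            search_index[token].add(ref)
        # Structured metadata is indexed as exact "field=value" terms.
        search_index[f"service={entry['service']}"].add(ref)
        search_index[f"severity={entry['severity']}"].add(ref)


def search(*terms: str) -> set:
    """AND query: return references to entries that match every term."""
    if not terms:
        return set()
    return set.intersection(*(search_index.get(t, set()) for t in terms))


for entry in [
    {"service": "payments", "severity": "ERROR", "message": "Connection timeout from 10.0.0.1"},
    {"service": "search", "severity": "INFO", "message": "Query served"},
    {"service": "payments", "severity": "INFO", "message": "Charge captured"},
]:
    log_queue.put(entry)

process_batch(drain_batch(max_batch=1_000), segment_key="segment-0001")
print(sorted(search("timeout", "service=payments")))  # ['segment-0001:0']
```

Batching amortizes object-store writes and lets the index reference a whole segment, which is also what makes time-range pruning and tiering (hot, warm, cold) practical: older segments can move to cheaper storage while their index entries are dropped or rebuilt on restore.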