Guided Lab Brief

Real-Time Data Pipeline

Build an end-to-end data pipeline: ingest → process → store → analyze with streaming and batch processing.

Overview


Modern systems generate massive data streams: user clicks, transactions, sensor readings, logs.

You will build a 6-step architecture that models production dependencies.

You will run 1 failure experiment to observe bottlenecks and recovery behavior.

Success target: data flows from IoT ingest → processing → storage with minimal lag, with both the real-time and batch paths working.
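The dual-path idea behind that success target can be sketched in a few lines. This is a minimal, hypothetical illustration in Python (an in-memory list and a running aggregate stand in for real storage and a real stream processor such as Flink); the sensor names and values are made up:

```python
# Minimal sketch of a dual-path pipeline: each event fans out to
# a streaming path (answer now) and a batch path (raw data for later).
batch_buffer = []                     # batch path: raw events for replay/backfill
realtime_avg = {"n": 0, "sum": 0.0}   # stream path: running aggregate

def ingest(event: dict) -> None:
    """Fan one event out to both the streaming and batch paths."""
    batch_buffer.append(event)         # batch: keep the raw event
    realtime_avg["n"] += 1             # stream: update the aggregate immediately
    realtime_avg["sum"] += event["value"]

for reading in ({"sensor": "s1", "value": 20.0},
                {"sensor": "s1", "value": 22.0}):
    ingest(reading)

print(realtime_avg["sum"] / realtime_avg["n"])  # real-time answer: 21.0
print(len(batch_buffer))                        # raw events kept for batch: 2
```

The point of the split: the streaming path answers "what is happening now" immediately, while the batch path keeps every raw event so heavier jobs can recompute later.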

Learning Objectives

  • Understand dual-path data pipelines (real-time + batch)
  • Know when to use stream processing vs batch
  • Learn about time-series databases for operational data
  • Experience processing bottlenecks and parallelism solutions
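One concrete way to see the stream-vs-batch distinction from the objectives above is windowing: stream processors emit per-window results continuously instead of waiting for a full dataset. A small sketch of a tumbling window, using assumed event timestamps and an assumed 10-second window size:

```python
# Hedged sketch: tumbling-window counting, the basic stream-processing
# primitive. Event times and window size are illustrative assumptions.
from collections import defaultdict

WINDOW_SEC = 10

def window_key(event_time: float) -> int:
    """Assign an event to the start of the 10-second window containing it."""
    return int(event_time // WINDOW_SEC) * WINDOW_SEC

counts = defaultdict(int)
for t in (1.2, 3.5, 9.9, 10.1, 14.0, 25.7):
    counts[window_key(t)] += 1

print(dict(counts))  # {0: 3, 10: 2, 20: 1}
```

A batch job would compute the same counts, but only after the data lands in storage; the streaming version can emit each window's count the moment the window closes.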

Experiments

  1. Reduce Flink parallelism to 1 to create a processing bottleneck

Failure Modes to Trigger

  • Trigger: Reduce Flink parallelism to 1 to create a processing bottleneck

    Observe: Single-threaded Flink can't keep up with 5000 events/sec. Processing lag grows until the "real-time" path is minutes behind. Alerts fire late, and dashboards show stale data.
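The lag growth in this experiment is simple queueing arithmetic: if ingest rate exceeds processing capacity, the backlog grows linearly. A hedged back-of-envelope model in Python, where the per-event processing cost (2000 events/sec per worker) is an assumed number, not a measurement from a real Flink job:

```python
# Illustrative model of why parallelism=1 falls behind at 5000 events/sec.
EVENTS_PER_SEC = 5000        # incoming rate from the IoT source (from the lab)
PER_EVENT_COST_SEC = 0.0005  # assumed: one worker handles 2000 events/sec

def backlog_after(seconds: int, parallelism: int) -> int:
    """Events left unprocessed after `seconds` of steady ingest."""
    capacity = parallelism / PER_EVENT_COST_SEC    # events/sec the job absorbs
    deficit = max(0.0, EVENTS_PER_SEC - capacity)  # events/sec added to backlog
    return int(deficit * seconds)

for p in (1, 2, 3):
    lag_events = backlog_after(60, p)
    lag_sec = lag_events * PER_EVENT_COST_SEC / p  # time to drain ≈ how stale
    print(f"parallelism={p}: backlog after 60s = {lag_events} events "
          f"(~{lag_sec:.0f}s behind)")
```

Under these assumptions, parallelism 1 accumulates 180,000 unprocessed events per minute, while parallelism 3 keeps up exactly; this is the "lag grows without bound" behavior the experiment is designed to expose.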