Guided Lab Brief

Real-Time Data Pipeline

Build an end-to-end data pipeline: ingest → process → store → analyze with streaming and batch processing.

Overview


Modern systems generate massive data streams: user clicks, transactions, sensor readings, logs.

You will build a 6-step architecture that models production dependencies.

You will run 1 failure experiment to observe bottlenecks and recovery behavior.

Success target: data flows from IoT ingest → processing → storage with minimal lag, with both the real-time and batch paths working.
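The dual-path idea behind that success target can be sketched in a few lines. This is a minimal, hypothetical illustration in Python (an in-memory list and a running aggregate stand in for real storage and a real stream processor such as Flink); the sensor names and values are made up:

```python
# Minimal sketch of a dual-path pipeline: each event fans out to
# a streaming path (answer now) and a batch path (raw data for later).
batch_buffer = []                     # batch path: raw events for replay/backfill
realtime_avg = {"n": 0, "sum": 0.0}   # stream path: running aggregate

def ingest(event: dict) -> None:
    """Fan one event out to both the streaming and batch paths."""
    batch_buffer.append(event)         # batch: keep the raw event
    realtime_avg["n"] += 1             # stream: update the aggregate immediately
    realtime_avg["sum"] += event["value"]

for reading in ({"sensor": "s1", "value": 20.0},
                {"sensor": "s1", "value": 22.0}):
    ingest(reading)

print(realtime_avg["sum"] / realtime_avg["n"])  # real-time answer: 21.0
print(len(batch_buffer))                        # raw events kept for batch: 2
```

The point of the split: the streaming path answers "what is happening now" immediately, while the batch path keeps every raw event so heavier jobs can recompute later.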

Learning Objectives

  • Understand dual-path data pipelines (real-time + batch)
  • Know when to use stream processing vs batch
  • Learn about time-series databases for operational data
  • Experience processing bottlenecks and parallelism solutions
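One concrete way to see the stream-vs-batch distinction from the objectives above is windowing: stream processors emit per-window results continuously instead of waiting for a full dataset. A small sketch of a tumbling window, using assumed event timestamps and an assumed 10-second window size:

```python
# Hedged sketch: tumbling-window counting, the basic stream-processing
# primitive. Event times and window size are illustrative assumptions.
from collections import defaultdict

WINDOW_SEC = 10

def window_key(event_time: float) -> int:
    """Assign an event to the start of the 10-second window containing it."""
    return int(event_time // WINDOW_SEC) * WINDOW_SEC

counts = defaultdict(int)
for t in (1.2, 3.5, 9.9, 10.1, 14.0, 25.7):
    counts[window_key(t)] += 1

print(dict(counts))  # {0: 3, 10: 2, 20: 1}
```

A batch job would compute the same counts, but only after the data lands in storage; the streaming version can emit each window's count the moment the window closes.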

Experiments

  1. Reduce Flink parallelism to 1 to create a processing bottleneck

Failure Modes to Trigger

  • Trigger: Reduce Flink parallelism to 1 to create a processing bottleneck

    Observe: Single-threaded Flink can't keep up with 5000 events/sec. Processing lag grows until the "real-time" path is minutes behind. Alerts fire late, and dashboards show stale data.
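The lag growth in this experiment is simple queueing arithmetic: if ingest rate exceeds processing capacity, the backlog grows linearly. A hedged back-of-envelope model in Python, where the per-event processing cost (2000 events/sec per worker) is an assumed number, not a measurement from a real Flink job:

```python
# Illustrative model of why parallelism=1 falls behind at 5000 events/sec.
EVENTS_PER_SEC = 5000        # incoming rate from the IoT source (from the lab)
PER_EVENT_COST_SEC = 0.0005  # assumed: one worker handles 2000 events/sec

def backlog_after(seconds: int, parallelism: int) -> int:
    """Events left unprocessed after `seconds` of steady ingest."""
    capacity = parallelism / PER_EVENT_COST_SEC    # events/sec the job absorbs
    deficit = max(0.0, EVENTS_PER_SEC - capacity)  # events/sec added to backlog
    return int(deficit * seconds)

for p in (1, 2, 3):
    lag_events = backlog_after(60, p)
    lag_sec = lag_events * PER_EVENT_COST_SEC / p  # time to drain ≈ how stale
    print(f"parallelism={p}: backlog after 60s = {lag_events} events "
          f"(~{lag_sec:.0f}s behind)")
```

Under these assumptions, parallelism 1 accumulates 180,000 unprocessed events per minute, while parallelism 3 keeps up exactly; this is the "lag grows without bound" behavior the experiment is designed to expose.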