Easy · IoT Platform · Part 1

IoT Platform 1 - Smart Home Hub

Databases, API Design, WebSockets, Auth

Problem Statement

HomeLink is building a smart home platform that lets users control lights, thermostats, cameras, and locks from a single mobile app. Core requirements:

- Device registry - register and manage devices (add, remove, rename, group by room).
- Real-time control - users tap a button and the device responds within 1 second (e.g., turn on a light).
- Status updates - devices push their state (temperature, on/off, battery level) every 30 seconds. The app shows live status.
- Scenes & automation - users create rules like "When I leave home, turn off all lights and lock the door."
- Multi-user households - a home has multiple users with different permission levels (owner, adult, child).

HomeLink targets 50,000 households with an average of 12 devices each.

What You'll Learn

Design a smart home platform that keeps about 100,000 devices online concurrently (out of roughly 600,000 registered), with real-time control and status updates. Work through the architecture under realistic production constraints, then validate the trade-offs in the design lab simulation.

Constraints

  • Total devices: ~600,000
  • Concurrent online devices: ~100,000
  • Command latency: < 1 second
  • Status update frequency: Every 30 seconds
  • Households: ~50,000
  • Availability target: 99.9%

Interview-Ready Approach

1) Clarify Scope and SLOs

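  • Problem statement: Build a smart home platform that keeps about 100,000 devices online concurrently, with real-time control and status updates.
  • Design for a peak load target around 15,000 RPS (including burst headroom). For scale, 100,000 online devices reporting every 30 seconds produce about 3,300 status updates per second, so this target is roughly 4.5x the steady-state status traffic.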
  • Total devices: ~600,000
  • Concurrent online devices: ~100,000
  • Command latency: < 1 second
  • Status update frequency: Every 30 seconds
  • Households: ~50,000
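
The figures above can be turned into request rates with a few lines of arithmetic. The sketch below uses only the stated constraints; it models status updates alone and leaves out user commands and app reads, which is a simplifying assumption.

```python
# Load estimate derived from the constraints listed above.
# Assumption: only device status updates are modeled here.
HOUSEHOLDS = 50_000
DEVICES_PER_HOUSEHOLD = 12
CONCURRENT_ONLINE_DEVICES = 100_000
STATUS_INTERVAL_S = 30
PEAK_TARGET_RPS = 15_000

# 50,000 households x 12 devices = 600,000 registered devices.
total_devices = HOUSEHOLDS * DEVICES_PER_HOUSEHOLD

# Steady state: each online device sends one update per interval (~3,333/s).
steady_status_rps = CONCURRENT_ONLINE_DEVICES / STATUS_INTERVAL_S

# Upper bound if every registered device were online at once (20,000/s).
all_online_status_rps = total_devices / STATUS_INTERVAL_S

# How much burst headroom the 15,000 RPS target leaves over steady state (4.5x).
headroom_factor = PEAK_TARGET_RPS / steady_status_rps

print(f"total devices:          {total_devices:,}")
print(f"steady status updates:  {steady_status_rps:,.0f}/s")
print(f"all-online upper bound: {all_online_status_rps:,.0f}/s")
print(f"headroom at peak target: {headroom_factor:.1f}x")
```

Note that the all-online upper bound (20,000 updates per second) exceeds the 15,000 RPS target, so the target is sized for the stated concurrency, not for every registered device reporting at once.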

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.
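
One way to make the per-hop latency budget concrete is to write it down and check it against the 1-second command target. The hop names and millisecond values below are hypothetical placeholders, not measurements; the point is only that the budgets must sum to less than the SLO with some slack left over.

```python
# Per-hop latency budget for a single device command.
# Assumption: hop names and millisecond values are illustrative only.
COMMAND_SLO_MS = 1000  # "< 1 second" from the constraints

hop_budget_ms = {
    "app_to_gateway": 150,
    "auth_check": 50,
    "command_service": 100,
    "broker_publish": 100,
    "broker_to_device": 300,
    "device_ack_to_app": 200,
}


def remaining_budget_ms(budgets: dict, slo_ms: int) -> int:
    """Return the slack left after all hop budgets are subtracted from the SLO."""
    return slo_ms - sum(budgets.values())


slack = remaining_budget_ms(hop_budget_ms, COMMAND_SLO_MS)
if slack < 0:
    raise ValueError(f"hop budgets exceed the SLO by {-slack} ms")
print(f"slack against the command SLO: {slack} ms")
```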

3) Architecture Decisions

  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • API Design: Standardize API boundaries, idempotency keys, pagination, and error contracts first.
  • WebSockets: Use persistent connection gateways and decouple fanout via pub/sub or queues.
  • Auth: Centralize identity verification and keep authorization checks close to domain resources.
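
As an illustration of keeping authorization checks close to domain resources, the sketch below checks a user's household role before a command is accepted. The three roles come from the problem statement; the action names and the mapping from role to allowed actions are assumptions made up for this example.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"
    ADULT = "adult"
    CHILD = "child"


# Hypothetical policy: which actions each household role may perform.
ALLOWED_ACTIONS = {
    Role.OWNER: {"control", "unlock", "manage_devices", "manage_users"},
    Role.ADULT: {"control", "unlock"},
    Role.CHILD: {"control"},
}


def can_perform(household_members: dict, user_id: str, action: str) -> bool:
    """Authorize an action using the household's own membership record.

    household_members maps user_id -> Role for one household, so a user
    who is not a member of that household is denied regardless of action.
    """
    role = household_members.get(user_id)
    return role is not None and action in ALLOWED_ACTIONS[role]


members = {"alice": Role.OWNER, "bob": Role.ADULT, "carol": Role.CHILD}
print(can_perform(members, "carol", "unlock"))          # child may not unlock
print(can_perform(members, "alice", "manage_users"))    # owner may manage users
print(can_perform(members, "mallory", "control"))       # not a member
```

Identity verification (who the caller is) would still happen centrally; this check only answers what that caller may do within one household.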

4) Reliability and Failure Strategy

  • Databases: Use strong write constraints (transactions or conditional writes) and an explicit backup/restore strategy.
  • API Design: Apply strict input validation and backward-compatible versioning.
  • WebSockets: Track connection churn, backpressure, and session resumption behavior.
  • Auth: Use short-lived tokens and secure key rotation workflows.
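
A conditional write combined with an idempotency key can be sketched with an in-memory store. The class and method names here are hypothetical, and a real system would rely on the database's own conditional-write or transaction feature instead of a Python dict.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class DeviceStateStore:
    """In-memory stand-in for a store that supports conditional writes."""

    def __init__(self):
        self._devices = {}    # device_id -> (version, state dict)
        self._applied = {}    # idempotency_key -> version produced by that write

    def apply_command(self, device_id, desired, expected_version, idempotency_key):
        # A retried request with the same key returns the original result
        # instead of applying the change a second time.
        if idempotency_key in self._applied:
            return self._applied[idempotency_key]

        version, state = self._devices.get(device_id, (0, {}))
        # Conditional write: only proceed if nobody else changed the record.
        if version != expected_version:
            raise VersionConflict(
                f"{device_id}: expected v{expected_version}, found v{version}"
            )

        new_version = version + 1
        self._devices[device_id] = (new_version, {**state, **desired})
        self._applied[idempotency_key] = new_version
        return new_version

    def get(self, device_id):
        return self._devices.get(device_id, (0, {}))


store = DeviceStateStore()
first = store.apply_command("lamp-1", {"power": "on"}, 0, "cmd-001")
retry = store.apply_command("lamp-1", {"power": "on"}, 0, "cmd-001")
print(first, retry, store.get("lamp-1"))
```

The retry returns the same version as the first call and leaves the stored state untouched, which is the property the validation plan asks to verify for retried writes.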

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.
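
For the SLO tracking step, p95 latency and error rate can be computed from raw load-test samples. This is a minimal sketch using the nearest-rank percentile definition (other definitions interpolate and give slightly different values); the function names and the sample data are made up for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def error_rate(total_requests, failed_requests):
    """Fraction of requests that failed."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return failed_requests / total_requests


# Hypothetical command latencies in milliseconds from a load test.
latencies_ms = list(range(1, 101))
p95 = percentile(latencies_ms, 95)
print(f"p95 latency: {p95} ms, within 1 s target: {p95 < 1000}")
print(f"error rate: {error_rate(10_000, 7):.2%}")
```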

6) Trade-offs to Call Out in Interviews

  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • API Design: Rich APIs improve developer speed but can create long-term compatibility burden.
  • WebSockets: WebSockets reduce interaction latency but complicate scaling and state management.
  • Auth: Central auth simplifies policy, but makes auth service availability/security critical.

Practical Notes

  • MQTT is a widely used IoT protocol: lightweight, publish/subscribe, with three QoS levels (0, 1, and 2).
  • Group devices by household; a message broker topic per household keeps traffic isolated.
  • Device shadows (AWS IoT pattern) store the last-known state so the app always has data even if the device is offline.
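
The household topic layout and the device shadow idea can be sketched together. The topic naming scheme, the class name, and the rule that a device counts as stale after two missed 30-second reports are all assumptions for this example; this is not the AWS IoT shadow API, only the same last-known-state pattern in miniature.

```python
from dataclasses import dataclass, field
from typing import Optional

STATUS_INTERVAL_S = 30  # from the constraints


def status_topic(household_id: str, device_id: str) -> str:
    """Hypothetical topic scheme that keeps each household's traffic separate."""
    return f"households/{household_id}/devices/{device_id}/status"


@dataclass
class DeviceShadow:
    """Last-known state of a device, readable even when the device is offline."""
    reported: dict = field(default_factory=dict)
    last_seen: Optional[float] = None

    def update(self, state: dict, now: float) -> None:
        # Merge the new report so fields missing from it keep their old value.
        self.reported.update(state)
        self.last_seen = now

    def is_stale(self, now: float, missed_intervals: int = 2) -> bool:
        # Assumption: treat the device as stale after two missed reports.
        if self.last_seen is None:
            return True
        return now - self.last_seen > missed_intervals * STATUS_INTERVAL_S


shadow = DeviceShadow()
shadow.update({"temperature_c": 21.5, "battery_pct": 80}, now=1000.0)
shadow.update({"battery_pct": 79}, now=1030.0)

print(status_topic("home-42", "thermostat-1"))
print(shadow.reported, shadow.is_stale(now=1045.0), shadow.is_stale(now=1200.0))
```

The app reads `reported` for display and uses `is_stale` to decide whether to show the device as offline, so it always has something to render.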

Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: IoT Devices -> API Gateway -> API Service -> Auth Service -> Primary NoSQL DB -> Realtime Bus

Design strengths

  • Security controls are enforced at ingress to protect downstream capacity.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.