Medium · Chat App · Part 1

Chat App 1 - Team Messaging

WebSockets · Databases · API Design · Auth

Problem Statement

ThreadSpace is building a team messaging application for small-to-medium businesses. Think Slack, but focused on simplicity. Core features:

- Channels - team-wide chat rooms (public and private).
- Direct messages - 1-on-1 and group DMs.
- Presence - show who's online, idle, or offline, updated within 10 seconds.
- Message history - full searchable history with infinite scroll.
- File sharing - attach images and documents (up to 25 MB) to messages.
- Typing indicators - show "Alice is typing…" in real time.

ThreadSpace targets companies with 10–500 employees. They expect 5,000 organizations using the platform with an average of 50 users each.

What You'll Learn

Build a Slack-like team messaging app with channels, DMs, and presence indicators. Design the architecture under realistic production constraints, then validate the trade-offs in the design lab simulation.

Constraints

Total users: ~250,000
Concurrent online users: ~75,000
Messages per day: ~5,000,000
File upload limit: 25 MB
Message delivery latency: < 500 ms
Presence update latency: < 10 seconds
Availability target: 99.9%

Interview-Ready Approach

1) Clarify Scope and SLOs

  • Problem statement: Build a Slack-like team messaging app with channels, DMs, and presence indicators.
  • Design for a peak load target of roughly 11,250 requests per second (RPS), including burst headroom.
  • Total users: ~250,000
  • Concurrent online users: ~75,000
  • Messages per day: ~5,000,000
  • File upload limit: 25 MB
  • Message delivery latency: < 500 ms

2) Capacity Planning Method

  • Convert traffic and growth constraints into request rate, storage growth, and concurrency budgets.
  • Keep at least 2-3x safety margin per tier (ingress, compute, storage, async workers).
  • Reserve explicit latency budgets per hop so p95 can be defended in review.
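As a rough illustration of the conversion step, the stated constraints can be turned into rates with a few lines of arithmetic. This is a sketch, not a sizing recommendation: PEAK_FACTOR, AVG_MESSAGE_BYTES, and HEARTBEAT_INTERVAL_SECONDS are assumptions that do not appear in the problem statement and should be replaced with measured values.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the problem statement and constraints.
organizations = 5_000
users_per_org = 50
concurrent_users = 75_000
messages_per_day = 5_000_000

# Assumptions (not in the problem statement).
PEAK_FACTOR = 5                    # assumed ratio of peak to average message rate
AVG_MESSAGE_BYTES = 1_024          # assumed average stored message size
HEARTBEAT_INTERVAL_SECONDS = 10    # assumed presence heartbeat interval

total_users = organizations * users_per_org                 # 250,000
concurrent_ratio = concurrent_users / total_users           # 0.30
avg_messages_per_sec = messages_per_day / SECONDS_PER_DAY   # ~57.9
peak_messages_per_sec = avg_messages_per_sec * PEAK_FACTOR  # ~289 under the assumed factor
storage_gb_per_day = messages_per_day * AVG_MESSAGE_BYTES / 1e9   # 5.12 under the assumed size
heartbeats_per_sec = concurrent_users / HEARTBEAT_INTERVAL_SECONDS  # 7,500 under the assumed interval

print(f"total users:          {total_users:,}")
print(f"concurrent ratio:     {concurrent_ratio:.0%}")
print(f"avg messages/sec:     {avg_messages_per_sec:.1f}")
print(f"peak messages/sec:    {peak_messages_per_sec:.0f}")
print(f"message storage/day:  {storage_gb_per_day:.2f} GB")
print(f"heartbeats/sec:       {heartbeats_per_sec:,.0f}")
```

Under these assumptions, presence heartbeats generate far more requests than message sends, which is worth stating when defending the ingress budget.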

3) Architecture Decisions

  • WebSockets: Use persistent connection gateways and decouple fanout via pub/sub or queues.
  • Databases: Define a clear system-of-record and design read/write paths separately before adding optimizations.
  • API Design: Standardize API boundaries, idempotency keys, pagination, and error contracts first.
  • Auth: Centralize identity verification and keep authorization checks close to domain resources.
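To make the WebSocket decision concrete, here is a minimal in-memory sketch of how a fanout step might group recipients by gateway before publishing. ConnectionRegistry, plan_fanout, and the gateway IDs are hypothetical names used for illustration. A real deployment would keep the registry in a shared store and publish through a pub/sub system or queue.

```python
from collections import defaultdict


class ConnectionRegistry:
    """Hypothetical registry mapping each online user to the gateway holding their socket."""

    def __init__(self):
        self._gateway_by_user = {}

    def connect(self, user_id, gateway_id):
        self._gateway_by_user[user_id] = gateway_id

    def disconnect(self, user_id):
        self._gateway_by_user.pop(user_id, None)

    def gateway_for(self, user_id):
        # Returns None when the user has no live connection.
        return self._gateway_by_user.get(user_id)


def plan_fanout(registry, channel_members, sender_id):
    """Group online recipients by gateway so each gateway receives a single publish."""
    recipients_by_gateway = defaultdict(list)
    for user_id in channel_members:
        if user_id == sender_id:
            continue  # the sender already has the message
        gateway_id = registry.gateway_for(user_id)
        if gateway_id is not None:  # offline users read the message from history later
            recipients_by_gateway[gateway_id].append(user_id)
    return dict(recipients_by_gateway)


registry = ConnectionRegistry()
registry.connect("alice", "gw-1")
registry.connect("bob", "gw-1")
registry.connect("carol", "gw-2")

plan = plan_fanout(registry, ["alice", "bob", "carol", "dave"], sender_id="alice")
print(plan)
```

Grouping by gateway keeps the publish count proportional to the number of gateways rather than the number of channel members.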

4) Reliability and Failure Strategy

  • Track connection churn, backpressure, and session resumption behavior.
  • Use strong write constraints (transactions or conditional writes) and explicit backup/restore strategy.
  • Apply strict input validation and backward-compatible versioning.
  • Use short-lived tokens and secure key rotation workflows.
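The short-lived token idea can be sketched with an HMAC-signed payload that carries its own expiry. This is an illustration only: SECRET, the 15-minute default TTL, the token layout, and the function names are assumptions. A production system would more likely use an established token format and load keys from a secret manager so they can be rotated.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical key; real keys come from a secret store and rotate


def _sign(payload):
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(user_id, now, ttl_seconds=900):
    """Return 'user_id.expires_at.signature' valid for ttl_seconds (assumed 15 minutes)."""
    expires_at = int(now) + ttl_seconds
    payload = f"{user_id}.{expires_at}"
    return f"{payload}.{_sign(payload)}"


def verify_token(token, now):
    """Return the user ID if the signature matches and the token is unexpired, else None."""
    try:
        user_id, expires_text, signature = token.rsplit(".", 2)
        expires_at = int(expires_text)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{user_id}.{expires_at}")
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    if now >= expires_at:
        return None  # expired
    return user_id


token = issue_token("alice", now=1_000)
print(verify_token(token, now=1_000))
print(verify_token(token, now=1_900))
```

Passing the current time in explicitly keeps the expiry logic easy to test without waiting for real time to pass.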

5) Validation Plan

  • Run one peak-load test, one dependency-degradation test, and one failover test.
  • Verify idempotency for all retried writes and async consumers.
  • Track user-facing SLOs first: p95 latency, error rate, and successful throughput.
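One way to check idempotency for retried writes is to send the same request twice and confirm that only one message is stored. The sketch below uses a hypothetical in-memory MessageStore keyed by a client-supplied idempotency key. In a real system the same check would typically be a conditional write or a unique constraint in the database.

```python
class MessageStore:
    """Hypothetical in-memory store that deduplicates writes by idempotency key."""

    def __init__(self):
        self._message_by_key = {}
        self._messages = []

    def write(self, idempotency_key, channel_id, body):
        # A retried request carries the same key, so return the original result
        # instead of appending a duplicate message.
        existing = self._message_by_key.get(idempotency_key)
        if existing is not None:
            return existing
        message = {"id": len(self._messages) + 1, "channel_id": channel_id, "body": body}
        self._messages.append(message)
        self._message_by_key[idempotency_key] = message
        return message

    def count(self):
        return len(self._messages)


store = MessageStore()
first = store.write("key-1", "general", "hello")
retry = store.write("key-1", "general", "hello")   # simulated client retry
other = store.write("key-2", "general", "hello")   # a genuinely new message

print(first["id"], retry["id"], other["id"], store.count())
```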

6) Trade-offs to Call Out in Interviews

  • WebSockets: WebSockets reduce interaction latency but complicate scaling and state management.
  • Databases: SQL gives stronger transactional guarantees; NoSQL often gives better write scaling and flexibility.
  • API Design: Rich APIs improve developer speed but can create long-term compatibility burden.
  • Auth: Central auth simplifies policy, but makes auth service availability/security critical.

Practical Notes

  • Maintain a WebSocket connection per online user, with a connection manager service that tracks which gateway each user is connected to.
  • Store messages in a database partitioned by channel_id for efficient history queries.
  • Presence can be tracked via periodic heartbeats from the client, stored in Redis with TTL.
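The heartbeat-with-TTL idea for presence can be sketched without Redis by storing an expiry time per user. PresenceTracker and its 10-second default TTL are assumptions for illustration. In Redis the equivalent would be roughly a SET with an expiry on each heartbeat, so a key that disappears means the user is offline.

```python
import time


class PresenceTracker:
    """Hypothetical in-memory stand-in for per-user keys with a TTL."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self._ttl_seconds = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._expires_at = {}

    def heartbeat(self, user_id):
        # Each heartbeat pushes the expiry forward, like re-setting a key with a TTL.
        self._expires_at[user_id] = self._clock() + self._ttl_seconds

    def is_online(self, user_id):
        expires_at = self._expires_at.get(user_id)
        return expires_at is not None and self._clock() < expires_at


# Usage with a fake clock so the expiry can be observed deterministically.
fake_now = [0.0]
tracker = PresenceTracker(ttl_seconds=10.0, clock=lambda: fake_now[0])

tracker.heartbeat("alice")
online_at_start = tracker.is_online("alice")

fake_now[0] = 10.0                      # TTL elapsed with no further heartbeat
online_after_ttl = tracker.is_online("alice")

print(online_at_start, online_after_ttl)
```

The client heartbeat interval needs to be shorter than the TTL, otherwise a healthy client will flap between online and offline.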

Reference Solution

Why This Solution Works

Request path: The solution keeps ingress, service logic, and stateful dependencies separated so each layer can scale independently.

Reference flow: Mobile Clients -> Load Balancer -> API Gateway -> API Service -> Auth Service -> Primary NoSQL DB -> Realtime Bus

Design strengths

  • Security controls are enforced at ingress to protect downstream capacity.

Interview defense

  • This design makes bottlenecks explicit (ingress, core compute, persistence, async workers).
  • It supports progressive scaling without re-architecting the core request path.
  • It keeps correctness-sensitive state changes in durable systems while offloading background work asynchronously.