Real-Time Chat System Design: From WebSockets to Message Delivery
April 7, 2026 · Updated April 7, 2026 · 10 min read
How to design a chat system that handles millions of concurrent connections, guarantees message ordering, and supports offline delivery.
Definition
A real-time chat system delivers messages between users with low latency using persistent connections (WebSockets), message storage for offline delivery, and ordering guarantees within conversations.
Implementation Checklist
- Use WebSockets for real-time delivery. Fall back to long polling for environments where WebSockets are blocked (corporate firewalls, older proxies).
- Assign monotonically increasing message IDs per conversation for ordering. Do not rely on timestamps since clock skew across servers causes ordering bugs.
- Store messages in a write-optimized database partitioned by conversation ID. Each conversation is a natural partition key with strong locality.
- Implement at-least-once delivery with client-side deduplication. Exactly-once delivery across network boundaries is practically impossible; accept duplicates and deduplicate on the client.
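The last two checklist items can be sketched together: a per-conversation counter assigns monotonically increasing message IDs, and the client drops any ID it has already seen. This is a minimal in-memory sketch; in production the counter would live in the database (e.g. a fetch-and-add per partition), not in process memory.

```python
import itertools

class Conversation:
    """Assigns monotonically increasing per-conversation message IDs.
    A process-local counter stands in for a durable per-partition
    sequence; it illustrates the ordering guarantee, not the storage."""
    def __init__(self):
        self._next_id = itertools.count(1)

    def assign_id(self):
        return next(self._next_id)

class ClientInbox:
    """Client-side deduplication for at-least-once delivery: the same
    message may arrive twice after a retry; only the first copy is kept."""
    def __init__(self):
        self.seen = set()
        self.messages = []

    def receive(self, msg_id, body):
        if msg_id in self.seen:
            return False          # duplicate delivery, drop it
        self.seen.add(msg_id)
        self.messages.append((msg_id, body))
        return True
```

Because IDs are assigned per conversation rather than globally, two conversations never contend on the same counter, which matches the partition-by-conversation storage layout above.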
Connection Management Is the Hard Part
The core challenge in chat system design is not message delivery. It is managing millions of persistent connections across a fleet of servers while handling disconnections, reconnections, and server failures gracefully.
Implement heartbeat pings (every 30s) to detect dead connections. Use connection draining during server deployments to migrate connections without dropping messages. Track connection-to-server mapping in a fast lookup store (Redis).
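The heartbeat and connection-mapping pieces can be sketched as a small registry. A plain dict stands in for the Redis lookup store mentioned above; the two-missed-pings threshold is an illustrative assumption.

```python
import time

HEARTBEAT_INTERVAL = 30                 # seconds between pings
DEAD_AFTER = 2 * HEARTBEAT_INTERVAL     # assumption: miss two pings -> dead

class ConnectionRegistry:
    """Tracks which server holds each user's connection and when that
    connection last answered a ping. Dicts stand in for Redis here;
    a real deployment would use hashes with TTLs."""
    def __init__(self):
        self.server_for_user = {}
        self.last_pong = {}

    def connect(self, user_id, server_id, now=None):
        now = now if now is not None else time.time()
        self.server_for_user[user_id] = server_id
        self.last_pong[user_id] = now

    def pong(self, user_id, now=None):
        self.last_pong[user_id] = now if now is not None else time.time()

    def reap_dead(self, now=None):
        """Drop connections that missed two heartbeat intervals."""
        now = now if now is not None else time.time()
        dead = [u for u, t in self.last_pong.items() if now - t > DEAD_AFTER]
        for u in dead:
            self.server_for_user.pop(u, None)
            self.last_pong.pop(u, None)
        return dead
```

During connection draining, a deploying server would stop accepting new `connect` calls, wait for clients to reconnect elsewhere, and let the reaper clean up stragglers.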
Offline Delivery and Read Receipts
A chat system that only works when both users are online is incomplete. Store undelivered messages and push them when the recipient comes online. Use push notifications (APNs, FCM) to alert offline users.
Read receipts require tracking per-user, per-conversation read cursors. Store the last-read message ID per user per conversation. Broadcast read receipt updates to other participants via the same WebSocket channel.
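The read-cursor model above is small enough to sketch directly: one last-read message ID per (user, conversation) pair, and cursors only move forward so a stale receipt can never un-read a message.

```python
class ReadCursors:
    """Per-user, per-conversation last-read message ID. A receipt
    update advances the cursor; cursors never move backwards, so
    out-of-order receipt events are harmless."""
    def __init__(self):
        self.cursor = {}   # (user_id, conversation_id) -> last-read msg ID

    def mark_read(self, user_id, conv_id, msg_id):
        key = (user_id, conv_id)
        if msg_id > self.cursor.get(key, 0):
            self.cursor[key] = msg_id

    def unread_count(self, user_id, conv_id, latest_msg_id):
        # Works because message IDs are dense and increasing per conversation.
        return latest_msg_id - self.cursor.get((user_id, conv_id), 0)
```

Note how this reuses the monotonic per-conversation IDs: the unread count is just a subtraction, with no message scan.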
Tradeoff Table
| Decision | Speed-First Option | Reliability-First Option | Recommended When |
|---|---|---|---|
| WebSocket vs Server-Sent Events vs Long Polling | WebSockets are bidirectional and lowest latency | SSE is simpler (one-directional) and auto-reconnects; long polling works everywhere | WebSockets for chat (bidirectional). SSE for notifications (server-to-client). Long polling as last resort |
| Fan-out on Write vs Fan-out on Read | Fan-out on write pre-delivers to all recipients, instant reads | Fan-out on read computes the inbox at read time, less storage but higher read latency | Fan-out on write for 1:1 and small group chats. Fan-out on read for large channels with thousands of members |
| Persistent Connections vs Connectionless | Persistent connections deliver instantly but require connection state management | Connectionless (polling) is stateless and simpler to scale but higher latency | Persistent for active users. Connectionless (push notifications) for offline or backgrounded users |
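The fan-out row of the table is easiest to see in code. This is a minimal in-memory sketch of both strategies: fan-out on write copies each message into every recipient's inbox at send time, while fan-out on read stores the message once and assembles the inbox at read time.

```python
class FanOutOnWrite:
    """Pre-delivers each message into every recipient's inbox at send
    time: reads are O(1), but write cost scales with group size."""
    def __init__(self):
        self.inboxes = {}

    def send(self, members, sender, body):
        for user in members:
            if user != sender:
                self.inboxes.setdefault(user, []).append(body)

    def inbox(self, user):
        return self.inboxes.get(user, [])

class FanOutOnRead:
    """Stores each message once per conversation; the inbox is
    assembled at read time from the conversations a user belongs to."""
    def __init__(self):
        self.logs = {}

    def send(self, conv_id, body):
        self.logs.setdefault(conv_id, []).append(body)

    def inbox(self, user_convs):
        return [m for c in user_convs for m in self.logs.get(c, [])]
```

For a channel with thousands of members, `FanOutOnWrite.send` performs thousands of writes per message, which is why large channels favor fan-out on read.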
Practice Next
WebSockets Topic Hub
Real-time communication patterns, WebSocket scaling, and messaging architecture.
Chat System Case Study Lab
Practice designing a chat system architecture in the guided interactive lab.
Challenges
- Chat App 1 - MVP
Design a chat application MVP with real-time messaging and basic group support.
- Chat App 2 - Scale
Scale the chat application to millions of concurrent users with geo-distribution and encryption.
Frequently Asked Questions
How many WebSocket connections can a single server handle?
A well-tuned server can handle 100k to 500k concurrent WebSocket connections depending on message rate and payload size. The bottleneck is usually memory (per-connection buffers) and event loop capacity, not CPU.
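A quick back-of-envelope check shows why memory is the bottleneck. The per-connection buffer sizes below are illustrative assumptions, not measured values.

```python
# Memory estimate for a single WebSocket server at the high end of
# the range above. Buffer sizes are assumed for illustration.
connections = 500_000
read_buf = 16 * 1024        # assumed per-connection read buffer
write_buf = 16 * 1024       # assumed per-connection write buffer
bookkeeping = 4 * 1024      # assumed socket + framing state
per_conn_bytes = read_buf + write_buf + bookkeeping

total_gb = connections * per_conn_bytes / (1024 ** 3)
```

Under these assumptions, 500k connections need roughly 17 GB just for connection state, before any message payloads, which is why shrinking per-connection buffers is a common tuning lever.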
How do I route messages when the recipient is connected to a different server?
Use a pub/sub system (Redis Pub/Sub, Kafka) as a message bus between WebSocket servers. Each server subscribes to channels for its connected users. When a message arrives, publish to the recipient's channel; the server holding that connection delivers it.
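The routing scheme above can be sketched with an in-memory bus standing in for Redis Pub/Sub: each server subscribes to a per-user channel for its connected users, and publishing to that channel reaches whichever server holds the connection.

```python
class MessageBus:
    """In-memory stand-in for Redis Pub/Sub. Real Redis would do the
    channel fan-out across processes; the interface is the same idea."""
    def __init__(self):
        self.subscribers = {}   # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel, message):
        for cb in self.subscribers.get(channel, []):
            cb(message)

class WebSocketServer:
    """Each server subscribes to channels for its own users only."""
    def __init__(self, bus):
        self.bus = bus
        self.delivered = []   # stands in for pushing down the socket

    def attach_user(self, user_id):
        # Per-user channel: publishers don't need to know which
        # server holds the connection.
        self.bus.subscribe(f"user:{user_id}", self.delivered.append)
```

The sender's server never looks up the recipient's server directly; it publishes to `user:<id>` and lets the bus route the message.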
Should I use a message queue for chat?
Not for real-time delivery. Message queues add latency. Use them for offline message storage and delivery retry. For online delivery, use direct WebSocket push via pub/sub. Queue the message only if the recipient is offline.
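That online/offline split is a single branch at delivery time. This sketch uses hypothetical stand-ins (`push_socket`, `offline_queue`) for the real WebSocket push and the durable queue.

```python
def route_message(user_id, message, online_users, push_socket, offline_queue):
    """Deliver directly over the socket when the recipient is online;
    otherwise enqueue for later delivery (and a push notification).
    `push_socket` and `offline_queue` are hypothetical stand-ins for
    the real pub/sub push and durable message queue."""
    if user_id in online_users:
        push_socket(user_id, message)
        return "delivered"
    offline_queue.append((user_id, message))
    return "queued"
```

When the recipient reconnects, the server drains their queue in message-ID order before resuming live delivery, which preserves the ordering guarantee from the checklist.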