Real-Time Chat System Design: From WebSockets to Message Delivery
April 7, 2026 · Updated April 7, 2026 · 10 min read
How to design a chat system that handles millions of concurrent connections, guarantees message ordering, and supports offline delivery.
Definition
A real-time chat system delivers messages between users with low latency using persistent connections (WebSockets), message storage for offline delivery, and ordering guarantees within conversations.
Implementation Checklist
- Use WebSockets for real-time delivery. Fall back to long polling for environments where WebSockets are blocked (corporate firewalls, older proxies).
- Assign monotonically increasing message IDs per conversation for ordering. Do not rely on timestamps since clock skew across servers causes ordering bugs.
- Store messages in a write-optimized database partitioned by conversation ID. Each conversation is a natural partition key with strong locality.
- Implement at-least-once delivery with client-side deduplication. Exactly-once delivery across network boundaries is practically impossible; accept duplicates and deduplicate on the client.
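The last two checklist items can be sketched together: a per-conversation counter assigns monotonically increasing message IDs, and the client drops any ID it has already seen. This is a minimal in-memory sketch; in production the counter would live in the database (e.g. a fetch-and-add per partition), not in process memory.

```python
import itertools

class Conversation:
    """Assigns monotonically increasing per-conversation message IDs.
    A process-local counter stands in for a durable per-partition
    sequence; it illustrates the ordering guarantee, not the storage."""
    def __init__(self):
        self._next_id = itertools.count(1)

    def assign_id(self):
        return next(self._next_id)

class ClientInbox:
    """Client-side deduplication for at-least-once delivery: the same
    message may arrive twice after a retry; only the first copy is kept."""
    def __init__(self):
        self.seen = set()
        self.messages = []

    def receive(self, msg_id, body):
        if msg_id in self.seen:
            return False          # duplicate delivery, drop it
        self.seen.add(msg_id)
        self.messages.append((msg_id, body))
        return True
```

Because IDs are assigned per conversation rather than globally, two conversations never contend on the same counter, which matches the partition-by-conversation storage layout above.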
Connection Management Is the Hard Part
The core challenge in chat system design is not message delivery. It is managing millions of persistent connections across a fleet of servers while handling disconnections, reconnections, and server failures gracefully.
Implement heartbeat pings (every 30s) to detect dead connections. Use connection draining during server deployments to migrate connections without dropping messages. Track connection-to-server mapping in a fast lookup store (Redis).
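The heartbeat and connection-mapping pieces can be sketched as a small registry. A plain dict stands in for the Redis lookup store mentioned above; the two-missed-pings threshold is an illustrative assumption.

```python
import time

HEARTBEAT_INTERVAL = 30                 # seconds between pings
DEAD_AFTER = 2 * HEARTBEAT_INTERVAL     # assumption: miss two pings -> dead

class ConnectionRegistry:
    """Tracks which server holds each user's connection and when that
    connection last answered a ping. Dicts stand in for Redis here;
    a real deployment would use hashes with TTLs."""
    def __init__(self):
        self.server_for_user = {}
        self.last_pong = {}

    def connect(self, user_id, server_id, now=None):
        now = now if now is not None else time.time()
        self.server_for_user[user_id] = server_id
        self.last_pong[user_id] = now

    def pong(self, user_id, now=None):
        self.last_pong[user_id] = now if now is not None else time.time()

    def reap_dead(self, now=None):
        """Drop connections that missed two heartbeat intervals."""
        now = now if now is not None else time.time()
        dead = [u for u, t in self.last_pong.items() if now - t > DEAD_AFTER]
        for u in dead:
            self.server_for_user.pop(u, None)
            self.last_pong.pop(u, None)
        return dead
```

During connection draining, a deploying server would stop accepting new `connect` calls, wait for clients to reconnect elsewhere, and let the reaper clean up stragglers.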
Offline Delivery and Read Receipts
A chat system that only works when both users are online is incomplete. Store undelivered messages and push them when the recipient comes online. Use push notifications (APNs, FCM) to alert offline users.
Read receipts require tracking per-user, per-conversation read cursors. Store the last-read message ID per user per conversation. Broadcast read receipt updates to other participants via the same WebSocket channel.
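The read-cursor model above is small enough to sketch directly: one last-read message ID per (user, conversation) pair, and cursors only move forward so a stale receipt can never un-read a message.

```python
class ReadCursors:
    """Per-user, per-conversation last-read message ID. A receipt
    update advances the cursor; cursors never move backwards, so
    out-of-order receipt events are harmless."""
    def __init__(self):
        self.cursor = {}   # (user_id, conversation_id) -> last-read msg ID

    def mark_read(self, user_id, conv_id, msg_id):
        key = (user_id, conv_id)
        if msg_id > self.cursor.get(key, 0):
            self.cursor[key] = msg_id

    def unread_count(self, user_id, conv_id, latest_msg_id):
        # Works because message IDs are dense and increasing per conversation.
        return latest_msg_id - self.cursor.get((user_id, conv_id), 0)
```

Note how this reuses the monotonic per-conversation IDs: the unread count is just a subtraction, with no message scan.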
Tradeoff Table
| Decision | Speed-First Option | Reliability-First Option | Recommended When |
|---|---|---|---|
| WebSocket vs Server-Sent Events vs Long Polling | WebSockets are bidirectional and lowest latency | SSE is simpler (one-directional) and auto-reconnects; long polling works everywhere | WebSockets for chat (bidirectional). SSE for notifications (server-to-client). Long polling as last resort |
| Fan-out on Write vs Fan-out on Read | Fan-out on write pre-delivers to all recipients, instant reads | Fan-out on read computes the inbox at read time, less storage but higher read latency | Fan-out on write for 1:1 and small group chats. Fan-out on read for large channels with thousands of members |
| Persistent Connections vs Connectionless | Persistent connections deliver instantly but require connection state management | Connectionless (polling) is stateless and simpler to scale but higher latency | Persistent for active users. Connectionless (push notifications) for offline or backgrounded users |
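The fan-out row of the table is easiest to see in code. This is a minimal in-memory sketch of both strategies: fan-out on write copies each message into every recipient's inbox at send time, while fan-out on read stores the message once and assembles the inbox at read time.

```python
class FanOutOnWrite:
    """Pre-delivers each message into every recipient's inbox at send
    time: reads are O(1), but write cost scales with group size."""
    def __init__(self):
        self.inboxes = {}

    def send(self, members, sender, body):
        for user in members:
            if user != sender:
                self.inboxes.setdefault(user, []).append(body)

    def inbox(self, user):
        return self.inboxes.get(user, [])

class FanOutOnRead:
    """Stores each message once per conversation; the inbox is
    assembled at read time from the conversations a user belongs to."""
    def __init__(self):
        self.logs = {}

    def send(self, conv_id, body):
        self.logs.setdefault(conv_id, []).append(body)

    def inbox(self, user_convs):
        return [m for c in user_convs for m in self.logs.get(c, [])]
```

For a channel with thousands of members, `FanOutOnWrite.send` performs thousands of writes per message, which is why large channels favor fan-out on read.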
Practice Next
WebSockets Topic Hub
Real-time communication patterns, WebSocket scaling, and messaging architecture.
Chat System Case Study Lab
Practice designing a chat system architecture in the guided interactive lab.
Challenges
- Chat App 1 - MVP
Design a chat application MVP with real-time messaging and basic group support.
- Chat App 2 - Scale
Scale the chat application to millions of concurrent users with geo-distribution and encryption.
Frequently Asked Questions
How many WebSocket connections can a single server handle?
A well-tuned server can handle 100k to 500k concurrent WebSocket connections depending on message rate and payload size. The bottleneck is usually memory (per-connection buffers) and event loop capacity, not CPU.
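A quick back-of-envelope check shows why memory is the bottleneck. The per-connection buffer sizes below are illustrative assumptions, not measured values.

```python
# Memory estimate for a single WebSocket server at the high end of
# the range above. Buffer sizes are assumed for illustration.
connections = 500_000
read_buf = 16 * 1024        # assumed per-connection read buffer
write_buf = 16 * 1024       # assumed per-connection write buffer
bookkeeping = 4 * 1024      # assumed socket + framing state
per_conn_bytes = read_buf + write_buf + bookkeeping

total_gb = connections * per_conn_bytes / (1024 ** 3)
```

Under these assumptions, 500k connections need roughly 17 GB just for connection state, before any message payloads, which is why shrinking per-connection buffers is a common tuning lever.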
How do I route messages when the recipient is connected to a different server?
Use a pub/sub system (Redis Pub/Sub, Kafka) as a message bus between WebSocket servers. Each server subscribes to channels for its connected users. When a message arrives, publish to the recipient's channel; the server holding that connection delivers it.
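The routing scheme above can be sketched with an in-memory bus standing in for Redis Pub/Sub: each server subscribes to a per-user channel for its connected users, and publishing to that channel reaches whichever server holds the connection.

```python
class MessageBus:
    """In-memory stand-in for Redis Pub/Sub. Real Redis would do the
    channel fan-out across processes; the interface is the same idea."""
    def __init__(self):
        self.subscribers = {}   # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel, message):
        for cb in self.subscribers.get(channel, []):
            cb(message)

class WebSocketServer:
    """Each server subscribes to channels for its own users only."""
    def __init__(self, bus):
        self.bus = bus
        self.delivered = []   # stands in for pushing down the socket

    def attach_user(self, user_id):
        # Per-user channel: publishers don't need to know which
        # server holds the connection.
        self.bus.subscribe(f"user:{user_id}", self.delivered.append)
```

The sender's server never looks up the recipient's server directly; it publishes to `user:<id>` and lets the bus route the message.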
Should I use a message queue for chat?
Not for real-time delivery. Message queues add latency. Use them for offline message storage and delivery retry. For online delivery, use direct WebSocket push via pub/sub. Queue the message only if the recipient is offline.
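That online/offline split is a single branch at delivery time. This sketch uses hypothetical stand-ins (`push_socket`, `offline_queue`) for the real WebSocket push and the durable queue.

```python
def route_message(user_id, message, online_users, push_socket, offline_queue):
    """Deliver directly over the socket when the recipient is online;
    otherwise enqueue for later delivery (and a push notification).
    `push_socket` and `offline_queue` are hypothetical stand-ins for
    the real pub/sub push and durable message queue."""
    if user_id in online_users:
        push_socket(user_id, message)
        return "delivered"
    offline_queue.append((user_id, message))
    return "queued"
```

When the recipient reconnects, the server drains their queue in message-ID order before resuming live delivery, which preserves the ordering guarantee from the checklist.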