Step 1: Clarify Requirements
Functional Requirements
- One-on-one messaging between two users.
- Group chat with up to 500 members.
- Online/offline presence indicators.
- Message history and persistence: messages are stored permanently.
- Multi-device support: a user can be logged in on phone and laptop simultaneously.
- Read receipts and delivery status (sent, delivered, read).
- Push notifications for offline users.
Non-Functional Requirements
- Low latency: messages delivered in under 200ms between online users.
- Message ordering must be preserved within a conversation.
- At-least-once delivery: no messages are lost.
- High availability: the chat service should be always reachable.
- Support 50 million daily active users (DAU).
Step 2: Back-of-Envelope Estimates
DAU: 50 million
Average messages/user/day: 40
Total messages/day: 50M * 40 = 2 billion
Messages/second: 2B / 86,400 = ~23,000 msg/sec
Message size (average): ~200 bytes (text + metadata)
Daily storage: 2B * 200B = 400 GB / day
5-year storage: 400 GB * 365 * 5 = ~730 TB
Concurrent connections (peak):
~10% of DAU online at peak = 5 million WebSocket connectionsStep 3: High-Level Design
Step 4: Deep Dive
Connection Protocol: WebSockets
HTTP is client-initiated: the server cannot push messages to the client without a request. For real-time chat, we need a persistent, bidirectional connection. WebSocket is the standard choice:
- Full-duplex communication over a single TCP connection.
- Client opens a WebSocket once and keeps it alive with heartbeats.
- Server pushes messages to the client instantly without polling.
- Each chat server maintains tens of thousands of concurrent WebSocket connections.
Message Flow: One-on-One
Message Flow: Group Chat
For a group with N members, the sender's message must be delivered to N-1 recipients:
- Small groups (under 500): Publish the message to a per-group Kafka topic or fan-out to each member through the routing service.
- Fan-out on write: When a message arrives, immediately create a copy in each recipient's inbox queue. Fast reads, expensive writes.
- Fan-out on read: Store the message once in the group's message log. Each client reads messages it has not yet seen. Cheap writes, potentially slower reads.
- Most systems use fan-out on write for small groups and fan-out on read for channels/large groups.
Message Storage
Chat message storage has a write-heavy, append-only, time-ordered access pattern. This makes it ideal for wide-column stores:
CREATE TABLE messages (
conversation_id UUID,
message_id TIMEUUID, -- time-ordered unique ID
sender_id UUID,
content TEXT,
content_type TEXT, -- 'text', 'image', 'file'
created_at TIMESTAMP,
PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
-- Partition key: conversation_id (all msgs in a conversation on one node)
-- Clustering key: message_id (sorted by time within partition)Relational databases can work for smaller-scale chat, but at billions of messages per day, a wide-column store like Cassandra or HBase provides better write throughput, linear horizontal scaling, and efficient range queries by time within a conversation partition.
Message ID and Ordering
Messages must be ordered within each conversation. Options for generating ordered IDs:
- TIMEUUID / Snowflake ID: Encodes timestamp in the ID. Naturally time-ordered. Used by Cassandra, Discord, Twitter.
- Lamport timestamps: Logical clocks that ensure causal ordering even across servers.
- Per-conversation sequence number: Incrementing counter per conversation. Simple, but requires coordination.
Presence Service
Tracking who is online:
- When a user connects via WebSocket, mark them online in a Redis hash:
presence:{user_id} -> {server_id, last_seen}. - Use heartbeats (every 30 seconds) to keep the entry alive. If no heartbeat, mark offline.
- For a user's contact list, batch-fetch presence status for all contacts from Redis.
- Publish presence changes to a Pub/Sub channel so friends are notified.
For 50M DAU, presence fan-out is expensive. If a user has 500 friends, going online triggers 500 notifications. Optimization: only send presence updates for users with the chat window actively open, or batch presence updates.
Multi-Device Sync
- Each device maintains its own WebSocket connection, potentially to different chat servers.
- The session store maps
user_id -> [server1:device_phone, server2:device_laptop]. - Messages are routed to all active devices for the recipient.
- Each device tracks a
last_seen_message_idto sync message history on reconnect.
Step 5: Scaling & Reliability
- WebSocket server scaling: Each server handles ~50K-100K connections. For 5M concurrent, deploy 50-100 chat servers behind a load balancer with sticky sessions (by user_id).
- Message queue partitioning: Partition Kafka topics by conversation_id to preserve ordering within a conversation.
- Storage sharding: Cassandra partitions by conversation_id automatically. Hot conversations (celebrity group chats) may need further splitting.
- Offline message sync: When a user comes online, the client sends its last_seen_message_id. The server returns all messages after that ID.
- End-to-end encryption: Messages are encrypted on the sender's device and decrypted on the recipient's device. The server cannot read message content. Uses the Signal Protocol (Double Ratchet algorithm).
Key Takeaways
- WebSocket provides the persistent, bidirectional connection needed for real-time messaging.
- Decouple message persistence and delivery using a message queue (Kafka) for reliability.
- Use a wide-column store (Cassandra) for write-heavy, time-ordered message storage.
- A session store (Redis) maps each user to their active chat server for message routing.
- Presence tracking and group message fan-out are the main scaling challenges: optimize with batching and lazy updates.