WhatsApp / Messenger
Realtime 1:1 and group chat with ordering, delivery receipts and offline delivery.
Requirements
Functional
- Send/receive realtime
- Groups
- Delivery/read receipts
- Offline delivery
- Media
Non-functional
- Low latency
- Ordered, exactly-once feel
- Millions of live connections
Scale
Billions of messages/day
The approach
Clients hold persistent WebSocket connections to a horizontally scaled gateway tier. A per-recipient queue/inbox provides ordering and offline storage, messages are persisted to a history store, and presence lives in a cache.
Key components
Client ⇄ LB ⇄ WS gateway · message queue/inbox · history store (NoSQL) · presence cache
Numbers that matter
- Millions of concurrent live connections — the gateway tier is sized by open sockets, not requests/sec; every idle connection still costs memory and a heartbeat.
- A message for an offline user must persist — write to a per-recipient inbox/queue and deliver on reconnect; "delivered" and "read" are two separate receipts.
- Group fan-out is the trap — a 1000-member group means one send → up to 1000 deliveries; do it async via the inbox, never on the sender's request thread.
- Ordering is per-chat, not global — a monotonic sortable ID per conversation is enough; global ordering across all chats is unnecessary and unscalable.
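A rough sizing sketch for the first and third points above. Every input (sockets per gateway, bytes per socket, heartbeat interval, send rate) is a hypothetical placeholder rather than a measured figure; the point is only which quantities drive the estimate.

```python
import math


def gateway_estimate(connections, sockets_per_gateway, bytes_per_socket, heartbeat_interval_s):
    """Back-of-envelope sizing for the gateway tier. All inputs are assumptions."""
    gateways = math.ceil(connections / sockets_per_gateway)
    # Idle sockets still hold buffers and session state in memory.
    memory_gib_per_gateway = sockets_per_gateway * bytes_per_socket / 2**30
    # Each connection sends one heartbeat per interval, even when idle.
    heartbeats_per_s = connections / heartbeat_interval_s
    return {
        "gateways": gateways,
        "memory_gib_per_gateway": memory_gib_per_gateway,
        "heartbeats_per_s": heartbeats_per_s,
    }


def fanout_deliveries(group_sends_per_s, group_size):
    """Upper bound on inbox writes per second caused by group sends."""
    return group_sends_per_s * group_size


if __name__ == "__main__":
    # Hypothetical inputs: 10M connections, 500k sockets per gateway,
    # 20 KiB of state per socket, one heartbeat every 30 seconds.
    print(gateway_estimate(10_000_000, 500_000, 20 * 1024, 30))
    print(fanout_deliveries(50, 1000))
```

Note that requests per second never appears in the gateway estimate: the tier is sized by how many sockets are open, while group traffic is sized by sends multiplied by group size.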
Senior deep-dive
Connections are stateful — that is the whole problem. Unlike HTTP, each user holds a live socket on one specific gateway.
You need a routing layer mapping user → which gateway holds their socket (a presence/registry) so a message for them can be delivered.
Ordering uses monotonic per-chat IDs; "exactly-once" = at-least-once delivery + idempotent client dedup.
Stateful connections change everything
A chat server is not a stateless HTTP box — each user's socket lives on one specific gateway, and that gateway must be reachable to deliver their messages. Load balancers route the initial connect; a presence registry then tracks user → gateway so senders can find recipients. Lose that mapping and messages have nowhere to go.
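A minimal in-memory sketch of that user → gateway mapping. The names Gateway, Registry, Router and connect are hypothetical, sockets are modelled as plain callbacks, and a real registry would live in a shared store rather than a local dict.

```python
from typing import Callable, Dict, Optional


class Gateway:
    """Stands in for one WebSocket gateway process; sockets are callbacks here."""

    def __init__(self, gateway_id: str) -> None:
        self.gateway_id = gateway_id
        self.sockets: Dict[str, Callable[[dict], None]] = {}

    def deliver(self, user_id: str, message: dict) -> bool:
        send = self.sockets.get(user_id)
        if send is None:
            return False  # socket is gone, the registry entry was stale
        send(message)
        return True


class Registry:
    """Presence registry: which gateway currently holds each user's socket."""

    def __init__(self) -> None:
        self._user_to_gateway: Dict[str, str] = {}

    def register(self, user_id: str, gateway_id: str) -> None:
        self._user_to_gateway[user_id] = gateway_id

    def unregister(self, user_id: str, gateway_id: str) -> None:
        # Only clear the entry if it still points at this gateway, so a late
        # disconnect cannot erase a newer connection made elsewhere.
        if self._user_to_gateway.get(user_id) == gateway_id:
            del self._user_to_gateway[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        return self._user_to_gateway.get(user_id)


class Router:
    def __init__(self, registry: Registry, gateways: Dict[str, Gateway]) -> None:
        self.registry = registry
        self.gateways = gateways

    def route(self, user_id: str, message: dict) -> bool:
        """Return True if pushed to a live socket, False if the user is unreachable."""
        gateway_id = self.registry.lookup(user_id)
        gateway = self.gateways.get(gateway_id) if gateway_id else None
        if gateway is None:
            return False  # no mapping: the caller relies on the stored copy
        return gateway.deliver(user_id, message)


def connect(registry: Registry, gateway: Gateway, user_id: str,
            send: Callable[[dict], None]) -> None:
    """What a gateway would do when a client finishes its WebSocket handshake."""
    gateway.sockets[user_id] = send
    registry.register(user_id, gateway.gateway_id)
```

A False result from route is the "nowhere to go" case: the sender side cannot push, so the message has to wait in durable storage until the recipient reconnects.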
The inbox makes offline delivery work
Never assume the recipient is online. Write every message to a durable per-recipient inbox, push it if they are connected, and replay on reconnect. This is also what gives you ordering and receipts: the message counts as "sent" once it lands in the inbox, "delivered" when the recipient's device acks it, and "read" when the client reports that the user opened it.
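The write-first, push-if-connected, replay-on-reconnect flow can be sketched as follows. InboxService is a hypothetical name, the dict stands in for a durable store, and the push callback stands in for the recipient's socket.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InboxService:
    def __init__(self) -> None:
        # Stand-in for a durable per-recipient store.
        self._inbox: Dict[str, List[dict]] = defaultdict(list)
        # Recipients with a live connection, keyed by user id.
        self._online: Dict[str, Callable[[dict], None]] = {}

    def send(self, recipient: str, message: dict) -> None:
        # Persist first, so the message survives even if the push fails.
        self._inbox[recipient].append(message)
        push = self._online.get(recipient)
        if push is not None:
            push(message)

    def connect(self, recipient: str, push: Callable[[dict], None]) -> None:
        self._online[recipient] = push
        # Replay everything not yet acked, in the order it was written.
        for message in list(self._inbox[recipient]):
            push(message)

    def disconnect(self, recipient: str) -> None:
        self._online.pop(recipient, None)

    def ack(self, recipient: str, message_id: str) -> None:
        # The client confirmed receipt, so the entry can leave the inbox.
        self._inbox[recipient] = [
            m for m in self._inbox[recipient] if m["id"] != message_id
        ]

    def pending(self, recipient: str) -> List[dict]:
        return list(self._inbox[recipient])
```

Because entries are removed only on ack, a client that drops before acking sees the same message again on reconnect. That is deliberate: the inbox gives at-least-once delivery and leaves duplicate handling to the client.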
Exactly-once is a client illusion
True exactly-once is impractical; deliver at-least-once and dedup on the client using each message's unique ID. Senders retry until acked; recipients drop duplicates. Pair this with monotonic per-chat IDs so the client can order messages and detect gaps.
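A sketch of the client side of that contract. ChatBuffer is a hypothetical name, and the gap check assumes the per-chat ID is a dense integer sequence (1, 2, 3, ...); a sortable but non-contiguous ID would still support ordering and dedup, but not gap detection this way.

```python
from typing import Dict, List, Set


class ChatBuffer:
    """Client-side state for one conversation."""

    def __init__(self) -> None:
        self._seen_ids: Set[str] = set()
        self._by_seq: Dict[int, dict] = {}

    def receive(self, message: dict) -> bool:
        """Store a message; return False if it is a duplicate delivery."""
        if message["id"] in self._seen_ids:
            return False  # retry from the server, drop it
        self._seen_ids.add(message["id"])
        self._by_seq[message["seq"]] = message
        return True

    def ordered(self) -> List[dict]:
        """Messages in per-chat order, regardless of arrival order."""
        return [self._by_seq[seq] for seq in sorted(self._by_seq)]

    def missing(self) -> List[int]:
        """Sequence numbers absent between the lowest and highest seen."""
        if not self._by_seq:
            return []
        low, high = min(self._by_seq), max(self._by_seq)
        return [seq for seq in range(low, high + 1) if seq not in self._by_seq]
```

A client would typically ack every message it receives, including duplicates, so the sender stops retrying, and request a resend for anything missing() reports.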
Groups: fan out through the inbox
A group send is one write that fans out to every member's inbox, done by workers — not on the sender's request thread. Large groups must be async and rate-limited. Membership lives in its own store; the sender shouldn't block on a thousand deliveries.
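One way to keep the sender's request path to a single enqueue is sketched below. GroupFanout is a hypothetical name, queue.Queue stands in for a real message queue, and the worker runs inline here only so the example stays deterministic; in practice it would be a separate process or pool.

```python
import queue
from collections import defaultdict
from typing import Dict, List


class GroupFanout:
    def __init__(self, memberships: Dict[str, List[str]]) -> None:
        self.memberships = memberships           # stand-in for the membership store
        self.inboxes: Dict[str, List[dict]] = defaultdict(list)
        self.jobs: "queue.Queue[dict]" = queue.Queue()

    def send_to_group(self, sender: str, group_id: str, message: dict) -> None:
        """Request path: one enqueue, then return. No per-member work here."""
        self.jobs.put({"sender": sender, "group_id": group_id, "message": message})

    def run_worker(self, batch_size: int = 500) -> int:
        """Drain queued jobs and write to member inboxes. Returns deliveries made."""
        delivered = 0
        while True:
            try:
                job = self.jobs.get_nowait()
            except queue.Empty:
                return delivered
            members = [
                m for m in self.memberships.get(job["group_id"], [])
                if m != job["sender"]
            ]
            for start in range(0, len(members), batch_size):
                for member in members[start:start + batch_size]:
                    self.inboxes[member].append(job["message"])
                    delivered += 1
                # A real worker could pause between batches to rate-limit.
```

The cost of a send stays constant for the sender no matter how large the group is; the per-member work moves to the worker, where it can be batched and throttled.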
Presence and media
Presence (online/typing) is high-churn and ephemeral — keep it in a cache with TTL, not the durable store. Media goes to object storage + CDN; the message carries a reference (and for end-to-end encryption the file is encrypted client-side, so the server only sees ciphertext).
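The TTL idea for presence can be sketched as below. PresenceCache is a hypothetical in-process stand-in; a shared cache with key expiry would play this role across gateways. The injectable clock is there only to make the behaviour easy to test.

```python
import time
from typing import Callable, Dict


class PresenceCache:
    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def heartbeat(self, user_id: str) -> None:
        """Each heartbeat pushes the user's expiry forward by one TTL."""
        self._expires_at[user_id] = self._clock() + self._ttl_s

    def is_online(self, user_id: str) -> bool:
        expiry = self._expires_at.get(user_id)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # No heartbeat within the TTL: treat as offline and forget the entry.
            del self._expires_at[user_id]
            return False
        return True
```

Nothing has to record a disconnect explicitly. A client that vanishes without closing its socket simply stops sending heartbeats and ages out, which is why losing this data in a cache restart is acceptable.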
What breaks at scale
The hard limits are concurrent sockets per gateway, presence-registry churn, and group fan-out amplification. Shard gateways and route by user, keep presence in memory, and make group delivery asynchronous and capped. Graceful reconnection (resume from the last-acked message) is what makes it feel reliable on flaky mobile networks.
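Resuming from the last-acked message can be sketched as a cursor over a per-chat log. ChatLog and since are hypothetical names, and the sketch assumes the server assigns dense integer sequence numbers per chat.

```python
from typing import List


class ChatLog:
    """Server-side ordered log for one conversation."""

    def __init__(self) -> None:
        self._messages: List[dict] = []

    def append(self, text: str) -> int:
        # The sequence number is assigned at write time, so order is per chat.
        seq = len(self._messages) + 1
        self._messages.append({"seq": seq, "text": text})
        return seq

    def since(self, last_acked_seq: int, limit: int = 100) -> List[dict]:
        """Messages after the client's cursor, capped so a long absence is paged."""
        newer = [m for m in self._messages if m["seq"] > last_acked_seq]
        return newer[:limit]
```

On reconnect the client sends the highest sequence number it has acked and receives only what it missed, rather than the whole conversation.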
In production
WhatsApp famously served millions of connections per server on Erlang. The pattern is universal: a stateful WebSocket gateway tier, a presence registry to route to the right server, durable per-user inboxes for offline delivery, and client-side dedup for the "exactly-once" feel. Slack, Messenger, and Signal are variations on this.
Common mistakes
- Treating WS servers as stateless HTTP
- No offline inbox (lost messages)
- Fanning out group messages synchronously to thousands of members