System Design Library

WhatsApp / Messenger

Realtime 1:1 and group chat with ordering, delivery receipts and offline delivery.

Requirements

Functional

  • Send/receive realtime
  • Groups
  • Delivery/read receipts
  • Offline delivery
  • Media

Non-functional

  • Low latency
  • Ordered, exactly-once feel
  • Millions of live connections

Scale

Billions of messages/day

The approach

Clients hold persistent WebSocket connections to a horizontally scaled gateway tier. A per-recipient queue/inbox provides ordering and offline storage, messages are persisted to a history store, and presence lives in a cache.

Key components

Client ⇄ LB ⇄ WS gateway · message queue/inbox · history store (NoSQL) · presence cache

Senior deep-dive

Connections are stateful — that is the whole problem. Unlike HTTP, each user holds a live socket on one specific gateway.

You need a routing layer mapping user → which gateway holds their socket (a presence/registry) so a message for them can be delivered.

Ordering uses monotonic per-chat IDs; "exactly-once" = at-least-once delivery + idempotent client dedup.

Stateful connections change everything

A chat server is not a stateless HTTP box — each user's socket lives on one specific gateway, and that gateway must be reachable to deliver their messages. Load balancers route the initial connect; a presence registry then tracks user → gateway so senders can find recipients. Lose that mapping and messages have nowhere to go.
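The user → gateway mapping can be sketched as a small registry. This is a minimal in-memory illustration, not a production design: the class name, method names, and gateway IDs such as "gw-1" are all hypothetical, and a real deployment would presumably keep this mapping in a shared store that every gateway can reach.

```python
from typing import Dict, Optional


class PresenceRegistry:
    """Maps user_id -> gateway_id for users with a live socket (hypothetical names)."""

    def __init__(self) -> None:
        self._gateway_by_user: Dict[str, str] = {}

    def register(self, user_id: str, gateway_id: str) -> None:
        # Called by a gateway when a user's socket connects; last writer wins.
        self._gateway_by_user[user_id] = gateway_id

    def unregister(self, user_id: str, gateway_id: str) -> None:
        # Only remove the entry if it still points at the gateway closing the
        # socket, so a late disconnect cannot erase a newer connection.
        if self._gateway_by_user.get(user_id) == gateway_id:
            del self._gateway_by_user[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        return self._gateway_by_user.get(user_id)


def route(registry: PresenceRegistry, recipient_id: str) -> str:
    """Decide where a message goes: a specific gateway, or the offline path."""
    gateway = registry.lookup(recipient_id)
    return f"push via {gateway}" if gateway else "store for later"
```

The guarded unregister matters on mobile networks: a client may reconnect to a second gateway before the first one notices the old socket is dead, and the old gateway's cleanup should not wipe the new entry.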

The inbox makes offline delivery work

Never assume the recipient is online. Write every message to a durable per-recipient inbox, push it if they are connected, and replay on reconnect. The inbox also underpins ordering and receipts: the sender sees "sent" once the message is durably in the inbox, "delivered" when the recipient's device acks it, and "read" when the recipient opens the chat.
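The write-then-push-then-replay flow might look like the sketch below. It assumes a per-recipient sequence number assigned by the inbox and uses in-memory dictionaries as a stand-in for the durable store; the Inbox and Message names and the push callback are illustrative, not taken from any real system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Message:
    seq: int      # per-recipient sequence number assigned by the inbox
    sender: str
    body: str


class Inbox:
    def __init__(self) -> None:
        # Stand-ins for a durable store: messages and highest acked seq per recipient.
        self._messages: Dict[str, List[Message]] = defaultdict(list)
        self._acked: Dict[str, int] = defaultdict(int)

    def append(self, recipient: str, sender: str, body: str,
               push: Optional[Callable[[Message], None]] = None) -> Message:
        # Persist first, then attempt the push; a failed push loses nothing.
        msg = Message(len(self._messages[recipient]) + 1, sender, body)
        self._messages[recipient].append(msg)
        if push is not None:
            push(msg)
        return msg

    def ack(self, recipient: str, seq: int) -> None:
        # Acks may arrive out of order; never move the watermark backwards.
        self._acked[recipient] = max(self._acked[recipient], seq)

    def replay(self, recipient: str) -> List[Message]:
        # On reconnect, resend everything after the last acked sequence number.
        last = self._acked[recipient]
        return [m for m in self._messages[recipient] if m.seq > last]
```

A gateway would call replay when a socket reconnects and ack as the client confirms each message, so an unacked message is simply sent again.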

Exactly-once is a client illusion

True exactly-once is impractical; deliver at-least-once and dedup on the client using each message's unique ID. Senders retry until acked; recipients drop duplicates. Pair this with monotonic per-chat IDs so the client can order messages and detect gaps.
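On the client, dedup and gap detection can be as simple as the following sketch. It assumes each message carries a globally unique ID plus a per-chat sequence number that starts at 1 and increases by 1; the ChatView name and its methods are hypothetical.

```python
from typing import Dict, List, Set, Tuple


class ChatView:
    """Client-side state for one chat: dedup by message ID, order by chat sequence."""

    def __init__(self) -> None:
        self._seen_ids: Set[str] = set()
        self._by_seq: Dict[int, str] = {}

    def receive(self, message_id: str, seq: int, body: str) -> bool:
        # Return False for a duplicate so the caller can still re-send the ack
        # without showing the message twice.
        if message_id in self._seen_ids:
            return False
        self._seen_ids.add(message_id)
        self._by_seq[seq] = body
        return True

    def ordered(self) -> List[Tuple[int, str]]:
        # Messages sorted by per-chat sequence number, regardless of arrival order.
        return sorted(self._by_seq.items())

    def gaps(self) -> List[int]:
        # Sequence numbers missing between 1 and the highest one seen so far.
        if not self._by_seq:
            return []
        return [s for s in range(1, max(self._by_seq) + 1) if s not in self._by_seq]
```

A non-empty gaps() result is the client's cue to ask the server for the missing range rather than render a conversation with holes.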

Groups: fan out through the inbox

A group send is one write that fans out to every member's inbox, done by workers — not on the sender's request thread. Large groups must be async and rate-limited. Membership lives in its own store; the sender shouldn't block on a thousand deliveries.
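A rough sketch of asynchronous fan-out: the send path enqueues one job, and a worker later expands it into per-member inbox writes. The in-process deque stands in for a real message queue, the max_jobs cap is one simple way to rate-limit, and the group name, members, and function names are made up for illustration.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

# Hypothetical stand-ins for the membership store, job queue, and member inboxes.
members: Dict[str, List[str]] = {"team": ["alice", "bob", "carol"]}
fanout_queue: Deque[Tuple[str, str, str]] = deque()  # (group_id, sender, body)
inboxes: Dict[str, List[str]] = defaultdict(list)


def send_to_group(group_id: str, sender: str, body: str) -> None:
    # The request path does one enqueue and returns; no per-member work here.
    fanout_queue.append((group_id, sender, body))


def run_worker(max_jobs: int) -> int:
    """Process up to max_jobs queued sends; returns the number of inbox writes."""
    writes = 0
    for _ in range(min(max_jobs, len(fanout_queue))):
        group_id, sender, body = fanout_queue.popleft()
        for member in members[group_id]:
            if member != sender:  # the sender already has their own copy
                inboxes[member].append(f"{sender}: {body}")
                writes += 1
    return writes
```

Because the sender only waits for the enqueue, a group of a thousand members costs the sender the same latency as a group of three; the amplification is absorbed by the worker pool.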

Presence and media

Presence (online/typing) is high-churn and ephemeral — keep it in a cache with TTL, not the durable store. Media goes to object storage + CDN; the message carries a reference (and for end-to-end encryption the file is encrypted client-side, so the server only sees ciphertext).
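Presence with a TTL can be illustrated with a heartbeat-driven cache. This is a sketch under assumptions: the PresenceCache name, the 30-second TTL in the usage below, and the injectable clock are all illustrative, and in practice a shared cache with built-in key expiry would likely replace the dictionary.

```python
import time
from typing import Callable, Dict


class PresenceCache:
    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._expires_at: Dict[str, float] = {}

    def heartbeat(self, user_id: str) -> None:
        # Each heartbeat pushes the expiry forward; no explicit "offline" write needed.
        self._expires_at[user_id] = self._clock() + self._ttl

    def is_online(self, user_id: str) -> bool:
        expiry = self._expires_at.get(user_id)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expires_at[user_id]  # lazily evict the stale entry
            return False
        return True
```

For example, with ttl_seconds=30 a user who last sent a heartbeat 20 seconds ago still reads as online, while one who has been silent for 31 seconds does not. A client that drops off the network without a clean disconnect ages out on its own.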

What breaks at scale

The hard limits are concurrent sockets per gateway, presence-registry churn, and group fan-out amplification. Shard the gateways and route by user, keep presence in memory, and make group delivery asynchronous with a cap on fan-out rate. Graceful reconnection (resuming from the last-acked message) is what makes the system feel reliable on flaky mobile networks.

In production

WhatsApp famously reported more than two million concurrent connections on a single server running Erlang. The pattern is universal: a stateful WebSocket gateway tier, a presence registry to route to the right server, durable per-user inboxes for offline delivery, and client-side dedup for the "exactly-once" feel. Slack, Messenger, and Signal are variations on this.

Part of System Design Library on SystemLore.