Discord / Live Chat
Realtime servers/channels with huge rooms and presence, low latency.
Open the interactive version → diagrams, practice & moreRequirements
Functional
- Channels & servers
- Realtime messages
- Presence
- Huge rooms
- History
Non-functional
- Low latency
- Massive fan-out per channel
Scale
Millions concurrent; large guilds
The approach
WS gateway tier holds connections; a pub/sub bus (per-channel topics) fans messages to subscribed gateways; messages persisted; presence aggregated. Per-guild routing.
Key components
Client ⇄ WS gateway ⇄ pub/sub (per channel) · message store · presence service
Numbers that matter
- Discord's largest servers have ~800,000 members — a message in a busy channel must fan out to all connected members, potentially 100,000+ simultaneous WebSocket pushes for a popular gaming server.
- Discord reportedly handles ~4 million concurrent WebSocket connections per gateway cluster, with each gateway process handling ~5,000–10,000 connections before CPU becomes the bottleneck.
- Message delivery latency SLA is <250ms p99 from send to delivery — the pub/sub hop (Elixir/Erlang process mailboxes in Discord's case, or Kafka topics in a more generic design) must add <50ms to that budget.
- Discord switched from MongoDB to Cassandra for message storage, eventually settling on ScyllaDB — achieving ~1ms p99 read latency for recent messages with ~100TB of message data per cluster.
Senior deep-dive
The WebSocket gateway tier is stateful — each connection is pinned to a gateway node, so routing messages to the right gateway is the core architectural challenge.
Per-guild routing (all members of a server share a pub/sub topic per channel) keeps fan-out bounded; without it, a 500,000-member server's channel message would require 500,000 individual pushes. Presence aggregation at scale is a distinct harder problem than message delivery — 'X users online' for a large server requires a separate presence subsystem, not real-time counting of WebSocket connections.
Gateway statefulness: the routing problem
Each WebSocket connection is held by exactly one gateway node — to deliver a message to user U, you must know which gateway node holds U's connection. Discord solves this with a session registry (a fast KV store mapping user_id → gateway_node_id) updated on connect/disconnect. On message publish, the message service looks up which gateways have connected members and pushes to only those nodes — not to all gateways. Consistency of this registry during reconnects and failover is a real problem: use short TTLs and accept brief delivery gaps rather than strong consistency.
Per-channel pub/sub: the fan-out unit
Each channel is a pub/sub topic. Gateway nodes subscribe to topics for the channels their connected users are members of. A message publish hits the broker once; the broker fans it to all subscribed gateways; each gateway pushes to its local connected users in that channel. This keeps the message service stateless — it writes to persistence and publishes to the broker, nothing more. The broker (Redis Pub/Sub, Kafka, or internal Erlang process groups) is the fan-out engine.
Large server fan-out: the 800k-member problem
For a server with 800k members and 50k concurrently connected, a channel message must reach 50k WebSocket connections across potentially hundreds of gateway nodes. Naive pub/sub still works — the broker fans to ~100 gateways, each pushes to their ~500 local users. The real problem is subscription management: when a user joins a 500-channel server, the gateway must subscribe to 500 topics. With 10k users per gateway that's 5 million topic subscriptions per node — use server-level topics (one topic per guild, messages tagged with channel_id) and filter at the gateway, reducing subscriptions to one per guild.
Message storage: partition by channel + time bucket
Partitioning ScyllaDB (or Cassandra) by channel_id alone creates hot partitions for active channels. Add a time bucket (e.g. weekly epoch) to the partition key — (channel_id, bucket) — so each partition holds at most one week of messages. Recent reads hit the latest bucket (hot, in-memory cache); older reads hit cold SSTables. Message IDs are Snowflake-style (timestamp embedded) for global ordering without coordination, and they double as the clustering key for efficient range scans.
Presence: approximate is fine, exact is expensive
Showing exact online count for a 500k-member server requires counting live WebSocket connections across hundreds of gateway nodes — that's a distributed count query on every display refresh. Instead, aggregate presence lazily: gateway nodes periodically push their online-user counts per guild to a presence aggregator; the aggregator publishes rolled-up counts every ~5 seconds. Members only see presence for friends or small servers — large servers show approximate counts with a staleness tolerance of ~30 seconds.
What breaks at scale
The thundering herd on gateway restart: when a gateway node crashes, all its connections (5k–10k users) reconnect simultaneously to other nodes. Each reconnect triggers presence updates, channel subscription setup, and missed-message fetch — a cascade that can overload the remaining gateways and the session registry. Mitigate with jittered reconnect backoff on the client (exponential + random delay), session resumption (clients send a resume token so the gateway can restore state without full handshake), and pre-warming replacement gateway nodes before taking a node down.
In production
Discord built their gateway layer in Elixir (Erlang VM), chosen specifically for its lightweight process model that can handle 500k+ concurrent connections per node. Message persistence uses ScyllaDB partitioned by (channel_id, bucket) where bucket is a time-based integer — this keeps hot channels' recent messages on a small number of SSTables for fast reads. The real engineering challenge Discord hit was 'read receipts and typing indicators at scale' — these are high-frequency ephemeral signals that if naively persisted create write storms. Discord handles them as pure pub/sub with no persistence (fire-and-forget) with client-side state only. The 'large server problem' (Fortnite, etc.) required building a separate fan-out subsystem where messages are delivered through an intermediate layer rather than directly from the message broker to gateways.
Common mistakes
- Per-user fan-out instead of per-channel pub/sub
- Storing presence in a DB (too chatty)
- No sharding of giant guilds