System Design Library

Discord / Live Chat

Real-time servers and channels with huge rooms, presence, and low latency.

Requirements

Functional

Channels & servers
Realtime messages
Presence
Huge rooms
History

Non-functional

Low latency
Massive fan-out per channel

Scale

Millions of concurrent connections; large guilds

The approach

A WebSocket (WS) gateway tier holds client connections; a pub/sub bus with per-channel topics fans messages out to the subscribed gateways; messages are persisted to a message store; presence is aggregated by a separate service. Routing is organized per guild.

Key components

Client ⇄ WS gateway ⇄ pub/sub (per channel) · message store · presence service

Numbers that matter

Discord's largest servers reportedly have ~800,000 members. A message in a busy channel must fan out to every connected member, potentially 100,000+ simultaneous WebSocket pushes for a popular gaming server.
Discord reportedly handles ~4 million concurrent WebSocket connections per gateway cluster, with each gateway process handling ~5,000–10,000 connections before CPU becomes the bottleneck.
The target for message delivery latency is <250 ms p99 from send to delivery, so the pub/sub hop (Elixir/Erlang process mailboxes in Discord's case, or Kafka topics in a more generic design) must add <50 ms to that budget.
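Discord moved message storage from MongoDB to Cassandra and eventually settled on ScyllaDB. The figures quoted for that setup (~1 ms p99 read latency for recent messages, ~100 TB of message data per cluster) are approximate; Discord's own write-up of the ScyllaDB migration reports p99 latency of around 15 ms for historical message reads.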
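
The quoted figures can be turned into a quick back-of-envelope check. The sketch below only restates the numbers above (all of them approximate), and the variable and function names are illustrative, not taken from any Discord system.

```python
# Back-of-envelope sizing from the approximate figures quoted above.
CONCURRENT_CONNECTIONS = 4_000_000       # per gateway cluster, as quoted
CONNS_PER_PROCESS_RANGE = (5_000, 10_000)  # per gateway process, as quoted


def processes_needed(total_conns: int, conns_per_process: int) -> int:
    """Ceiling division: gateway processes needed to hold total_conns."""
    return -(-total_conns // conns_per_process)


# Best case (10k connections per process) and worst case (5k per process).
min_processes = processes_needed(CONCURRENT_CONNECTIONS, CONNS_PER_PROCESS_RANGE[1])
max_processes = processes_needed(CONCURRENT_CONNECTIONS, CONNS_PER_PROCESS_RANGE[0])

# Latency budget: the pub/sub hop may use at most 50 ms of the 250 ms p99,
# so ingest, persistence, the gateway push, and the network share the rest.
END_TO_END_BUDGET_MS = 250
PUBSUB_BUDGET_MS = 50
remaining_budget_ms = END_TO_END_BUDGET_MS - PUBSUB_BUDGET_MS

print(min_processes, max_processes, remaining_budget_ms)
```

Under these assumptions a cluster needs somewhere between 400 and 800 gateway processes, and everything outside the pub/sub hop has 200 ms to work with.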

Senior deep-dive

The WebSocket gateway tier is stateful — each connection is pinned to a gateway node, so routing messages to the right gateway is the core architectural challenge.

Per-guild routing (all members of a server share the pub/sub topics for its channels) keeps fan-out bounded: the message service publishes once and each gateway pushes only to its locally connected members. Without it, a message in a 500,000-member server's channel would require the message service to issue up to 500,000 individual pushes. Presence aggregation at scale is a distinct, harder problem than message delivery: 'X users online' for a large server requires a separate presence subsystem, not real-time counting of WebSocket connections.

Gateway statefulness: the routing problem

Each WebSocket connection is held by exactly one gateway node, so to deliver a message to user U you must know which gateway node holds U's connection. Discord reportedly solves this with a session registry (a fast KV store mapping user_id → gateway_node_id) that is updated on connect and disconnect. On message publish, the message service looks up which gateways hold connected members and pushes only to those nodes, not to all gateways. Keeping this registry consistent during reconnects and failover is a real problem: use short TTLs and accept brief delivery gaps rather than paying for strong consistency.
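
A minimal in-memory sketch of such a registry follows. It assumes a heartbeat-refreshed TTL and lazy expiry on read; the class name SessionRegistry and its methods are hypothetical, and a real deployment would back this with an external KV store rather than a Python dict.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


class SessionRegistry:
    """In-memory stand-in for a KV store: user_id -> gateway_node_id, with a TTL."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}  # user -> (node, expires_at)

    def heartbeat(self, user_id: str, node_id: str) -> None:
        """Called on connect and periodically afterwards; refreshes the TTL."""
        self._entries[user_id] = (node_id, self._clock() + self._ttl)

    def disconnect(self, user_id: str, node_id: str) -> None:
        """Delete only if the entry still points at the disconnecting node,
        so a late disconnect from an old node cannot erase a newer session."""
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] == node_id:
            del self._entries[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        node_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[user_id]  # lazy expiry: crashed nodes age out
            return None
        return node_id

    def gateways_for(self, user_ids: Iterable[str]) -> Set[str]:
        """Distinct gateway nodes that currently hold any of these users."""
        nodes = (self.lookup(u) for u in user_ids)
        return {n for n in nodes if n is not None}
```

The short TTL is what bounds the delivery gap: if a gateway dies without sending disconnects, its entries simply expire, at the cost of a few seconds of pushes aimed at a dead node.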

Per-channel pub/sub: the fan-out unit

Each channel is a pub/sub topic. Gateway nodes subscribe to topics for the channels their connected users are members of. A message publish hits the broker once; the broker fans it to all subscribed gateways; each gateway pushes to its local connected users in that channel. This keeps the message service stateless — it writes to persistence and publishes to the broker, nothing more. The broker (Redis Pub/Sub, Kafka, or internal Erlang process groups) is the fan-out engine.
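
The publish-once, fan-out-at-the-edge flow described above can be sketched with a toy in-process broker. Broker and Gateway here are hypothetical stand-ins (not Redis, Kafka, or Erlang APIs), and the delivered list substitutes for real WebSocket writes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class Broker:
    """Toy pub/sub broker: topic -> set of subscribed gateways."""

    def __init__(self) -> None:
        self._subs: Dict[str, Set["Gateway"]] = defaultdict(set)

    def subscribe(self, topic: str, gateway: "Gateway") -> None:
        self._subs[topic].add(gateway)

    def publish(self, topic: str, message: str) -> int:
        """Deliver once per subscribed gateway; return how many were reached."""
        gateways = list(self._subs.get(topic, ()))
        for gateway in gateways:
            gateway.on_message(topic, message)
        return len(gateways)


class Gateway:
    """Holds local connections and subscribes only to channels they need."""

    def __init__(self, node_id: str, broker: Broker) -> None:
        self.node_id = node_id
        self._broker = broker
        self._local: Dict[str, Set[str]] = defaultdict(set)  # channel -> local users
        self.delivered: List[Tuple[str, str]] = []  # stand-in for socket writes

    def connect(self, user_id: str, channel_ids: Iterable[str]) -> None:
        for channel_id in channel_ids:
            if not self._local[channel_id]:
                # First local member of this channel: subscribe the gateway once.
                self._broker.subscribe(channel_id, self)
            self._local[channel_id].add(user_id)

    def on_message(self, channel_id: str, message: str) -> None:
        # Local fan-out: one push per locally connected member of the channel.
        for user_id in sorted(self._local.get(channel_id, ())):
            self.delivered.append((user_id, message))
```

Note that the broker's work scales with the number of subscribed gateways, not the number of users; the per-user cost is paid only on the gateway that already holds the socket.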

Large server fan-out: the 800k-member problem

For a server with 800k members and 50k concurrently connected, a channel message must reach 50k WebSocket connections across potentially hundreds of gateway nodes. Naive pub/sub still works: the broker fans out to ~100 gateways, and each pushes to its ~500 local users (100 × 500 = 50k). The real problem is subscription management: when a user joins a 500-channel server, the gateway must subscribe to 500 topics. With 10k users per gateway, the worst case (every user in a different 500-channel server) is 10k × 500 = 5 million topic subscriptions per node. Use server-level topics instead (one topic per guild, messages tagged with channel_id) and filter at the gateway, which reduces subscriptions to one per guild.
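
To make the trade-off concrete, here is a small sketch of the subscription arithmetic and of gateway-side filtering on a guild-level topic. The names (GuildTopicGateway, the envelope fields channel_id and content) are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def worst_case_channel_topics(users_per_gateway: int, channels_per_guild: int) -> int:
    """Per-channel topics when every local user is in a different guild."""
    return users_per_gateway * channels_per_guild


def worst_case_guild_topics(users_per_gateway: int) -> int:
    """Guild-level topics in the same worst case: one per local user."""
    return users_per_gateway


class GuildTopicGateway:
    """Subscribes once per guild and filters by channel_id locally."""

    def __init__(self) -> None:
        self.guild_subscriptions: Set[str] = set()
        # (guild_id, channel_id) -> locally connected users who can see it
        self._members: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self.delivered: List[Tuple[str, str]] = []  # stand-in for socket writes

    def connect(self, user_id: str, guild_id: str,
                visible_channel_ids: Iterable[str]) -> None:
        self.guild_subscriptions.add(guild_id)  # one subscription per guild
        for channel_id in visible_channel_ids:
            self._members[(guild_id, channel_id)].add(user_id)

    def on_guild_message(self, guild_id: str, envelope: Dict[str, str]) -> None:
        # The guild topic carries every channel's traffic; drop what no
        # local user can see before touching any socket.
        key = (guild_id, envelope["channel_id"])
        for user_id in sorted(self._members.get(key, ())):
            self.delivered.append((user_id, envelope["content"]))
```

The cost of this design is that each gateway now receives traffic for channels none of its users are viewing, so it trades subscription bookkeeping for some wasted bandwidth and a cheap local lookup.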

Message storage: partition by channel + time bucket

Partitioning ScyllaDB (or Cassandra) by channel_id alone creates large, ever-growing hot partitions for active channels. Add a time bucket (e.g. a weekly epoch) to the partition key, giving (channel_id, bucket), so each partition holds at most one week of messages. Recent reads hit the latest bucket (hot, usually served from memory); older reads hit cold SSTables. Message IDs are Snowflake-style (timestamp embedded), which gives roughly time-ordered IDs without coordination, and they double as the clustering key for efficient range scans.
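
The relationship between a Snowflake-style ID and its partition can be sketched as below. This assumes a simplified layout (millisecond timestamp in the bits above bit 22, then a 10-bit worker ID and a 12-bit sequence), a custom epoch of 2015-01-01 UTC, and weekly buckets; the function names are hypothetical.

```python
from typing import Tuple

EPOCH_MS = 1_420_070_400_000          # assumed custom epoch: 2015-01-01T00:00:00Z
BUCKET_MS = 7 * 24 * 60 * 60 * 1000   # one week, matching the weekly-bucket example


def make_snowflake(timestamp_ms: int, worker_id: int = 0, sequence: int = 0) -> int:
    """Pack (ms since epoch | 10-bit worker | 12-bit sequence) into one integer."""
    return (((timestamp_ms - EPOCH_MS) << 22)
            | ((worker_id & 0x3FF) << 12)
            | (sequence & 0xFFF))


def snowflake_timestamp_ms(snowflake: int) -> int:
    """Recover the wall-clock millisecond timestamp embedded in the ID."""
    return (snowflake >> 22) + EPOCH_MS


def bucket_for(snowflake: int) -> int:
    """Number of whole weeks since the epoch at which the ID was minted."""
    return (snowflake >> 22) // BUCKET_MS


def partition_key(channel_id: int, message_id: int) -> Tuple[int, int]:
    """Partition key (channel_id, bucket); message_id is the clustering key."""
    return (channel_id, bucket_for(message_id))
```

Because the bucket is derived from the ID itself, a reader paging backwards through history can compute which partitions to query from the last message ID it saw, without a separate index.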

Presence: approximate is fine, exact is expensive

Showing exact online count for a 500k-member server requires counting live WebSocket connections across hundreds of gateway nodes — that's a distributed count query on every display refresh. Instead, aggregate presence lazily: gateway nodes periodically push their online-user counts per guild to a presence aggregator; the aggregator publishes rolled-up counts every ~5 seconds. Members only see presence for friends or small servers — large servers show approximate counts with a staleness tolerance of ~30 seconds.
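
A minimal sketch of that lazy roll-up is shown below, assuming each gateway periodically sends a full snapshot of its per-guild counts and a timer calls roll_up every few seconds. PresenceAggregator and its methods are hypothetical names; the sketch also ignores users connected through several gateways at once, who would be counted more than once.

```python
from collections import defaultdict
from typing import Dict


class PresenceAggregator:
    """Sums per-gateway snapshots into per-guild counts on a fixed cadence."""

    def __init__(self) -> None:
        self._reports: Dict[str, Dict[str, int]] = {}  # node_id -> {guild: count}
        self._published: Dict[str, int] = {}           # last rolled-up totals

    def report(self, node_id: str, counts: Dict[str, int]) -> None:
        """Replace this node's previous snapshot (cheap, no per-user state)."""
        self._reports[node_id] = dict(counts)

    def drop_node(self, node_id: str) -> None:
        """Forget a node that stopped reporting, e.g. after a crash."""
        self._reports.pop(node_id, None)

    def roll_up(self) -> Dict[str, int]:
        """Recompute totals; intended to run on a timer (every ~5 s)."""
        totals: Dict[str, int] = defaultdict(int)
        for counts in self._reports.values():
            for guild_id, online in counts.items():
                totals[guild_id] += online
        self._published = dict(totals)
        return self._published

    def online_count(self, guild_id: str) -> int:
        """Read path: returns the last published value, which may be stale."""
        return self._published.get(guild_id, 0)
```

Reads never trigger a cross-node count; they return whatever the last roll-up published, which is exactly the staleness the design accepts.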

What breaks at scale

The thundering herd on gateway restart: when a gateway node crashes, all its connections (5k–10k users) reconnect simultaneously to other nodes. Each reconnect triggers presence updates, channel subscription setup, and a missed-message fetch, a cascade that can overload the remaining gateways and the session registry. Mitigate with jittered reconnect backoff on the client (exponential + random delay), session resumption (clients send a resume token so the gateway can restore state without a full handshake), and pre-warming replacement gateway nodes before taking a node down.
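
The client-side half of that mitigation can be sketched as follows. This uses the "full jitter" variant of exponential backoff (a uniform delay between zero and the capped exponential ceiling), which is one common choice rather than a documented Discord behavior; reconnect_delay and choose_handshake are hypothetical names.

```python
import random
from typing import Optional

_DEFAULT_RNG = random.Random()


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                    rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    if rng is None:
        rng = _DEFAULT_RNG
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def choose_handshake(resume_token: Optional[str],
                     last_sequence: Optional[int]) -> str:
    """Resume when the client still has session state, else start fresh.

    A resume lets the gateway replay only the events after last_sequence
    instead of rebuilding subscriptions and presence from scratch.
    """
    if resume_token and last_sequence is not None:
        return "resume"
    return "identify"
```

Spreading 10k reconnects uniformly over a growing window turns one spike into a ramp, and resumption makes each of those reconnects cheaper for the gateway that receives it.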

In production

Discord built its gateway layer in Elixir (on the Erlang VM), chosen specifically for its lightweight process model; figures of 500k+ concurrent connections per node are sometimes quoted, which should be read as approximate and as a per-node number, separate from the ~5,000–10,000 connections per gateway process cited earlier. Message persistence uses ScyllaDB partitioned by (channel_id, bucket), where bucket is a time-based integer; this keeps a hot channel's recent messages on a small number of SSTables for fast reads. The real engineering challenge Discord reportedly hit was 'read receipts and typing indicators at scale': these are high-frequency ephemeral signals that create write storms if naively persisted. They are handled as pure pub/sub with no persistence (fire-and-forget), with state kept only on the client. The 'large server problem' (Fortnite, etc.) required building a separate fan-out subsystem in which messages are delivered through an intermediate layer rather than directly from the message broker to gateways.

Common mistakes

Per-user fan-out instead of per-channel pub/sub
Storing presence in a DB (too chatty)
No sharding of giant guilds

Part of System Design Library on SystemLore.