System Design Library

Discord / Live Chat

Realtime servers/channels with huge rooms and presence, low latency.

Open the interactive version → diagrams, practice & more

Requirements

Functional

  • Channels & servers
  • Realtime messages
  • Presence
  • Huge rooms
  • History

Non-functional

  • Low latency
  • Massive fan-out per channel

Scale

Millions concurrent; large guilds

The approach

WS gateway tier holds connections; a pub/sub bus (per-channel topics) fans messages to subscribed gateways; messages persisted; presence aggregated. Per-guild routing.

Key components

Client ⇄ WS gateway ⇄ pub/sub (per channel) · message store · presence service

Numbers that matter

Senior deep-dive

The WebSocket gateway tier is stateful — each connection is pinned to a gateway node, so routing messages to the right gateway is the core architectural challenge.

Per-guild routing (all members of a server share a pub/sub topic per channel) keeps fan-out bounded; without it, a 500,000-member server's channel message would require 500,000 individual pushes. Presence aggregation at scale is a distinct harder problem than message delivery — 'X users online' for a large server requires a separate presence subsystem, not real-time counting of WebSocket connections.

Gateway statefulness: the routing problem

Each WebSocket connection is held by exactly one gateway node — to deliver a message to user U, you must know which gateway node holds U's connection. Discord solves this with a session registry (a fast KV store mapping user_id → gateway_node_id) updated on connect/disconnect. On message publish, the message service looks up which gateways have connected members and pushes to only those nodes — not to all gateways. Consistency of this registry during reconnects and failover is a real problem: use short TTLs and accept brief delivery gaps rather than strong consistency.

Per-channel pub/sub: the fan-out unit

Each channel is a pub/sub topic. Gateway nodes subscribe to topics for the channels their connected users are members of. A message publish hits the broker once; the broker fans it to all subscribed gateways; each gateway pushes to its local connected users in that channel. This keeps the message service stateless — it writes to persistence and publishes to the broker, nothing more. The broker (Redis Pub/Sub, Kafka, or internal Erlang process groups) is the fan-out engine.

Large server fan-out: the 800k-member problem

For a server with 800k members and 50k concurrently connected, a channel message must reach 50k WebSocket connections across potentially hundreds of gateway nodes. Naive pub/sub still works — the broker fans to ~100 gateways, each pushes to their ~500 local users. The real problem is subscription management: when a user joins a 500-channel server, the gateway must subscribe to 500 topics. With 10k users per gateway that's 5 million topic subscriptions per node — use server-level topics (one topic per guild, messages tagged with channel_id) and filter at the gateway, reducing subscriptions to one per guild.

Message storage: partition by channel + time bucket

Partitioning ScyllaDB (or Cassandra) by channel_id alone creates hot partitions for active channels. Add a time bucket (e.g. weekly epoch) to the partition key — (channel_id, bucket) — so each partition holds at most one week of messages. Recent reads hit the latest bucket (hot, in-memory cache); older reads hit cold SSTables. Message IDs are Snowflake-style (timestamp embedded) for global ordering without coordination, and they double as the clustering key for efficient range scans.

Presence: approximate is fine, exact is expensive

Showing exact online count for a 500k-member server requires counting live WebSocket connections across hundreds of gateway nodes — that's a distributed count query on every display refresh. Instead, aggregate presence lazily: gateway nodes periodically push their online-user counts per guild to a presence aggregator; the aggregator publishes rolled-up counts every ~5 seconds. Members only see presence for friends or small servers — large servers show approximate counts with a staleness tolerance of ~30 seconds.

What breaks at scale

The thundering herd on gateway restart: when a gateway node crashes, all its connections (5k–10k users) reconnect simultaneously to other nodes. Each reconnect triggers presence updates, channel subscription setup, and missed-message fetch — a cascade that can overload the remaining gateways and the session registry. Mitigate with jittered reconnect backoff on the client (exponential + random delay), session resumption (clients send a resume token so the gateway can restore state without full handshake), and pre-warming replacement gateway nodes before taking a node down.

In production

Discord built their gateway layer in Elixir (Erlang VM), chosen specifically for its lightweight process model that can handle 500k+ concurrent connections per node. Message persistence uses ScyllaDB partitioned by (channel_id, bucket) where bucket is a time-based integer — this keeps hot channels' recent messages on a small number of SSTables for fast reads. The real engineering challenge Discord hit was 'read receipts and typing indicators at scale' — these are high-frequency ephemeral signals that if naively persisted create write storms. Discord handles them as pure pub/sub with no persistence (fire-and-forget) with client-side state only. The 'large server problem' (Fortnite, etc.) required building a separate fan-out subsystem where messages are delivered through an intermediate layer rather than directly from the message broker to gateways.

Common mistakes

Related System Design Library

Part of System Design Library on SystemLore — system design interview prep with 148 deep topics, interactive diagrams, and a practice game. Practice this one →