System Design Library

Live Comments (Twitch/YouTube)

Stream live comments to millions of concurrent viewers of one stream.

Open the interactive version → diagrams, practice & more

Requirements

Functional

Post comment
Broadcast to all viewers
Moderation
Replay

Non-functional

Low latency
Millions of subscribers on one stream

Scale

1 stream → millions of viewers

The approach

A pub/sub topic per stream; WS gateways subscribe and push to viewers; sampling/rate-limiting under extreme load (you can't show every comment); moderation pipeline filters.

Key components

Client ⇄ WS gateway ⇄ per-stream pub/sub · moderation · store for replay

Numbers that matter

A peak Twitch stream (e.g. a major esports event) can receive ~50,000 chat messages per minute — at 1M concurrent viewers, naively delivering all of them would require each viewer to receive ~833 messages/minute (~14 msg/sec), which is visually unusable anyway.
Twitch's IRC gateway reportedly handled ~2 million concurrent WebSocket connections during peak — with ~5,000 connections per gateway process, that requires ~400 gateway nodes minimum.
A single WebSocket message with a chat line is typically 200–500 bytes — at 500 msg/sec delivered to 10,000 viewers per gateway node, that's ~2.5 GB/s of egress per node — a hard hardware limit that forces sampling.
Sampling threshold on Twitch is reportedly kicked in at ~120 messages per channel per second — above this, clients receive a sampled subset and a 'chat is moving too fast' indicator.

Senior deep-dive

Fan-out at extreme scale requires accepting message loss — when 1 million viewers watch a single stream, you cannot guarantee every viewer sees every chat message; sampling is a product decision masquerading as an engineering one.

A pub/sub topic per stream is the right unit of isolation — it prevents one viral stream from degrading chat for others. The bottleneck is always the WebSocket gateway egress bandwidth, not the pub/sub infrastructure — plan for ~1KB/message × N messages/sec × M viewers per gateway node.

Sampling: the engineering–product boundary

When a channel hits 1M viewers and 500 msg/sec, full delivery is physically impossible without enormous egress cost. Sampling is mandatory — but the design question is: sample at the pub/sub broker (only publish 1 in 5 messages to the topic) or at the gateway (each gateway independently drops messages). Broker-side sampling ensures consistency (all viewers see the same subset); gateway-side sampling is simpler but means viewers see different messages. Most systems use broker-side sampling with a configurable rate per channel.

Per-stream topic isolation prevents noisy-neighbor blast radius

One pub/sub topic per channel (stream) means a viral stream creating 10k msg/sec only stresses the gateways subscribed to that channel. Without isolation, a single hot stream can exhaust the broker's partition and degrade all other channels. Twitch discovered this early and moved to strict per-channel topic isolation. Gateway nodes subscribe only to channels their connected viewers are watching — a viewer changing streams triggers an unsubscribe/subscribe, which must be fast (< 100ms) to avoid missed messages during channel switches.

Moderation pipeline must be on the critical path

Banned words, user bans, and AutoMod must filter messages before fan-out, not after — once a toxic message is published to 1M viewers, you cannot un-deliver it. This means the ingestion path is: receive message → classify (AutoMod ML + rule engine) → publish to topic (if approved). The latency budget for classification is ~10–30ms to keep total send-to-display under 100ms. Pre-load user ban lists and word lists into in-process memory on the ingestion service — don't query a DB per message.

IRC heritage: why the protocol matters

Twitch's client protocol is IRC-over-WebSocket (RFC 1459 with custom tags). This is a deliberate choice — it lets third-party bots and moderation tools use battle-tested IRC client libraries. The downside: IRC is a line-oriented text protocol with no native binary framing or compression. At high chat rates, JSON or MessagePack with delta compression would be 3–5× smaller. Twitch adds custom IRCv3 tags (metadata like subscriber badges, color, emote positions) which can make lines longer than the message itself.

Geographic distribution: the ordering problem

Twitch's infrastructure is multi-region — a European viewer's gateway might receive messages from a different pub/sub region than a US viewer. Message ordering is not globally consistent across regions; messages may arrive in different sequences at different gateways. Twitch assigns a server-side timestamp at ingestion (not at the source client) and clients sort by this timestamp. This gives consistent ordering within a region but allows cross-region skew of ~50–200ms, which is imperceptible at 500 msg/sec.

What breaks at scale

Gateway cold-start during a stream spike: a streamer goes viral (a celebrity joins unexpectedly) and viewer count goes from 50k to 500k in 60 seconds. New gateway nodes must spin up, subscribe to the channel topic, and establish connections — during this window, new viewers get no messages. The fix: pre-warm gateways based on viewer growth rate signals (if viewer count doubles in 30 seconds, auto-scale aggressively with a 2-minute lookahead). Second failure: moderation service overload when a controversial moment generates 2,000 reports/sec — isolate the report pipeline from the real-time chat path with a separate queue so moderation backlog doesn't block message delivery.

In production

Twitch's chat backend is built on a custom IRC-over-WebSocket protocol at the client layer, backed internally by a Go-based pub/sub system (TMI — Twitch Messaging Interface). Each channel is a pub/sub topic; gateway nodes subscribe to channels that have active viewers. The moderation pipeline (AutoMod, ban lists, regex filters) sits between ingestion and fan-out — messages are classified before being published to the topic, adding ~5–20ms of latency. YouTube Live Chat uses a polling architecture for most clients (HTTP long-poll or SSE) rather than WebSockets, sacrificing a small latency increase for much simpler connection management at scale. The real unsolved problem is consistent message ordering across a globally distributed viewer base — viewers in different regions see messages in different orders due to propagation delays, which is accepted as a known product limitation.

Common mistakes

Trying to deliver every message to everyone
No load shedding under viral spikes
Synchronous moderation on the hot path

Related System Design Library

Part of System Design Library on SystemLore — system design interview prep with 148 deep topics, interactive diagrams, and a practice game. Practice this one →