System Design Library

Slack

Team chat with channels, threads, search, and presence.


Requirements

Functional

Channels/DMs/threads
Realtime delivery
Search history
Presence/notifications

Non-functional

Low latency
Searchable history

Scale

Large workspaces

The approach

WebSocket gateways + per-channel pub/sub (like Discord); messages persisted and indexed for search; per-workspace sharding; notification service for mentions.

Key components

WS gateway ⇄ pub/sub · message store + search index · notifications

Numbers that matter

Slack processes roughly 1 million messages/day per large enterprise workspace; a 10k-seat org can spike to hundreds of messages/second during an incident or all-hands.
WebSocket ping/pong heartbeat interval is typically 30 seconds; missing 2 consecutive beats triggers reconnect to avoid ghost connections accumulating on gateways.
Message fan-out latency target is <500ms p99 from send to delivery on all connected clients in the same channel; p50 is typically <100ms over a regional pub/sub bus.
Elasticsearch index per workspace; large workspaces with 10+ million messages require index splitting and alias routing to keep shard sizes under ~50GB for query performance.
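Three of the numbers above translate directly into small calculations. The sketch below is illustrative only. The constants come from the list, while `Connection`, `reap_dead`, and `primary_shards_needed` are hypothetical names. The shard estimate ignores replicas. It matters because Elasticsearch fixes an index's primary shard count when the index is created, so a growing workspace has to be split or reindexed behind an alias rather than simply "given more shards".

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400
HEARTBEAT_INTERVAL_S = 30.0  # typical ping interval from the list above
MAX_MISSED_BEATS = 2         # consecutive misses before a socket counts as dead
MAX_SHARD_GB = 50.0          # approximate per-shard ceiling from the list above


def average_rate_per_s(messages_per_day: float) -> float:
    """Mean message rate; incident or all-hands peaks can sit far above this."""
    return messages_per_day / SECONDS_PER_DAY


@dataclass
class Connection:
    conn_id: str
    last_pong_at: float  # monotonic time of the most recent pong


def reap_dead(conns: dict[str, Connection], now: float) -> list[str]:
    """Remove and return connections that have missed MAX_MISSED_BEATS heartbeats."""
    dead = [cid for cid, c in conns.items()
            if (now - c.last_pong_at) // HEARTBEAT_INTERVAL_S >= MAX_MISSED_BEATS]
    for cid in dead:
        del conns[cid]  # server-side cleanup; the client's reconnect re-creates state
    return dead


def primary_shards_needed(index_gb: float, growth_factor: float = 1.0,
                          max_shard_gb: float = MAX_SHARD_GB) -> int:
    """Smallest primary-shard count keeping every shard at or under max_shard_gb."""
    return max(1, math.ceil(index_gb * growth_factor / max_shard_gb))
```

For example, `average_rate_per_s(1_000_000)` is about 11.6 messages/second, so a spike of a few hundred per second is more than an order of magnitude above the daily mean. That is why gateways and the bus are sized for peaks rather than averages.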

Senior deep-dive

Per-workspace sharding is the real architectural decision — it lets you co-locate all channel, message, and member data for one org, making fan-out cheap and permission checks local.

WebSocket gateways are stateful and that's intentional — each gateway tracks which channels a connection is subscribed to; the pub/sub bus (Kafka/Redis) fans messages only to gateways that have a subscriber, not every gateway in the fleet.
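The "only gateways with a subscriber" rule can be sketched as a bus-side table that reference-counts subscribers per gateway. This is an in-process stand-in, assuming a hypothetical `ChannelRouter` class. With Redis pub/sub, the equivalent is each gateway issuing `SUBSCRIBE` for channels it has local listeners on. With Kafka, the routing would look different (partitions, consumer groups).

```python
from collections import defaultdict


class ChannelRouter:
    """Bus-side view: which gateways have at least one local subscriber per channel."""

    def __init__(self) -> None:
        # channel_id -> gateway_id -> number of local subscribers on that gateway
        self._subs: dict[str, dict[str, int]] = defaultdict(dict)

    def subscribe(self, channel_id: str, gateway_id: str) -> None:
        gws = self._subs[channel_id]
        gws[gateway_id] = gws.get(gateway_id, 0) + 1

    def unsubscribe(self, channel_id: str, gateway_id: str) -> None:
        gws = self._subs.get(channel_id)
        if not gws or gateway_id not in gws:
            return
        gws[gateway_id] -= 1
        if gws[gateway_id] == 0:
            del gws[gateway_id]       # last local subscriber gone: stop sending there
        if not gws:
            del self._subs[channel_id]

    def targets(self, channel_id: str) -> set[str]:
        """Gateways that must receive a message for this channel, never the whole fleet."""
        return set(self._subs.get(channel_id, {}))
```

Reference counting matters because many sockets on one gateway can watch the same channel. The bus should keep delivering to that gateway until the last of them leaves.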

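Search is a first-class product, not an afterthought: messages are indexed into Elasticsearch per workspace with custom analyzers for code and emoji, because default analysis tuned for natural-language prose does not match what users expect when they search for code snippets, @mentions, or emoji shortcodes.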
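To make "custom analyzer" concrete, here is a toy tokenizer in plain Python, not Elasticsearch analyzer configuration. The token patterns are hypothetical. The point is that mentions, emoji shortcodes, and inline code survive as single searchable tokens instead of being split or stripped of their markers.

```python
import re

# Hypothetical token patterns, tried left to right; the first alternative that matches wins.
TOKEN_RE = re.compile(
    r"""
    (?P<mention>@[\w\-]+)          # @alice, @dev-team
  | (?P<emoji>:[a-z0-9_+\-]+:)     # :thumbsup:, :+1:
  | (?P<code>`[^`]+`)              # inline code span
  | (?P<word>\w+)                  # ordinary words and identifiers like snake_case
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> list[str]:
    tokens = []
    for m in TOKEN_RE.finditer(text):
        tok = m.group()
        if m.lastgroup == "code":
            tokens.append(tok[1:-1])  # keep code verbatim (case-sensitive), drop backticks
        else:
            tokens.append(tok.lower())
    return tokens
```

For example, `tokenize("Ping @alice about `parse_config()` :+1: ASAP")` yields `["ping", "@alice", "about", "parse_config()", ":+1:", "asap"]`. A real deployment would express the same idea as analyzer settings on the index.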

Workspace sharding: keep the org together

Sharding by workspace (not by user or channel) means all data for one org lives on one shard set — cross-channel permission checks are local joins, not distributed queries. The tradeoff: a single mega-workspace (a 200k-seat enterprise) can exceed one shard's capacity. Slack handles this with intra-workspace sub-sharding once a workspace crosses a threshold, which complicates routing logic.
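A minimal routing sketch, assuming a hypothetical fleet of 64 shard sets and a hypothetical `sub_sharded` table listing mega-workspaces and their sub-shard counts. Hash-modulo placement is a simplification. A real deployment would more likely keep a directory table so shards can move without remapping everything. Sub-sharding by channel ID is also an assumption made for illustration.

```python
import hashlib
from dataclasses import dataclass, field

NUM_SHARD_SETS = 64  # hypothetical fleet size


def _stable_hash(key: str) -> int:
    # Python's built-in hash() is randomized per process, so use a stable digest.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


@dataclass
class ShardRouter:
    # workspace_id -> number of sub-shards, only for workspaces past the size threshold
    sub_sharded: dict[str, int] = field(default_factory=dict)

    def shard_for(self, workspace_id: str, channel_id: str) -> str:
        base = _stable_hash(workspace_id) % NUM_SHARD_SETS
        subs = self.sub_sharded.get(workspace_id)
        if subs is None:
            # Normal case: every channel of the workspace lands on the same shard set,
            # so cross-channel permission checks stay local.
            return f"shard-{base}"
        # Mega-workspace: spread channels over sub-shards; cross-channel queries now fan out.
        return f"shard-{base}-sub-{_stable_hash(channel_id) % subs}"
```

The `sub_sharded` lookup on every request is the extra routing logic the tradeoff refers to. Once a workspace appears in it, "all of this org's data is in one place" stops being true.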

WebSocket gateway: stateful by design

Each gateway process holds an in-memory subscription map (connection → set of channel IDs). When a message arrives from the pub/sub bus, the gateway checks its local map and pushes only to relevant sockets — this is far cheaper than broadcasting and re-filtering at the client. The tradeoff is failover state loss: when a gateway crashes, clients must reconnect and re-subscribe, causing a reconnect storm you must rate-limit with jitter.
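A sketch of the gateway's local state under stated assumptions. The `send` callback stands in for writing to a real socket, and all names are hypothetical. Alongside the connection → channels map described above, it keeps a reverse channel → connections index so each bus message costs time proportional to its local subscribers, not to every socket on the box.

```python
from collections import defaultdict
from typing import Callable


class Gateway:
    def __init__(self, send: Callable[[str, dict], None]) -> None:
        self._send = send                                         # pushes a payload to one socket
        self._by_conn: dict[str, set[str]] = defaultdict(set)     # connection -> channels
        self._by_channel: dict[str, set[str]] = defaultdict(set)  # channel -> connections

    def subscribe(self, conn_id: str, channel_id: str) -> None:
        self._by_conn[conn_id].add(channel_id)
        self._by_channel[channel_id].add(conn_id)

    def disconnect(self, conn_id: str) -> None:
        # Everything here is lost on a crash, which is why clients must re-subscribe.
        for ch in self._by_conn.pop(conn_id, set()):
            conns = self._by_channel[ch]
            conns.discard(conn_id)
            if not conns:
                del self._by_channel[ch]

    def on_bus_message(self, channel_id: str, payload: dict) -> int:
        """Deliver a bus message only to local sockets subscribed to channel_id."""
        targets = self._by_channel.get(channel_id, set())
        for conn_id in targets:
            self._send(conn_id, payload)
        return len(targets)
```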

Message delivery: at-least-once with client dedup

Slack uses client-assigned nonces on outgoing messages so the server can dedup retried sends. The server assigns a canonical message timestamp (ts) that serves as the immutable ID in all APIs; it is unique within a channel (so channel ID plus ts identifies a message) and encodes order within that channel. Because fan-out is at-least-once, clients also drop any incoming message whose ts they have already rendered. Delivery receipts are not per-message ACKs from recipients; they're send-side confirmations from the server, keeping the hot path lean.
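Both halves of the dedup story fit in a few lines. The ts string format ("seconds.microseconds", as in Slack API values like `1355517523.000005`) follows the public API. Deriving it from the wall clock with a strictly increasing bump is an assumption, as are the `ChannelLog` and `ClientView` names. A production nonce map would also need an expiry so it does not grow forever.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ChannelLog:
    """Server side: one log per channel, deduplicating retried sends by client nonce."""
    channel_id: str
    messages: dict[str, str] = field(default_factory=dict)     # ts -> text
    nonce_to_ts: dict[str, str] = field(default_factory=dict)  # client nonce -> assigned ts
    last_us: int = 0                                           # last issued ts, in microseconds

    def _next_ts(self) -> str:
        # Wall-clock microseconds, bumped so ts strictly increases within this channel.
        now_us = max(time.time_ns() // 1_000, self.last_us + 1)
        self.last_us = now_us
        return f"{now_us // 1_000_000}.{now_us % 1_000_000:06d}"

    def post(self, nonce: str, text: str) -> tuple[str, bool]:
        """Return (ts, created). A retried send gets the original ts and stores nothing."""
        if nonce in self.nonce_to_ts:
            return self.nonce_to_ts[nonce], False
        ts = self._next_ts()
        self.messages[ts] = text
        self.nonce_to_ts[nonce] = ts
        return ts, True


class ClientView:
    """Receiver side: at-least-once fan-out may replay a message, so skip seen ts values."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.rendered: list[str] = []

    def on_message(self, ts: str, text: str) -> bool:
        if ts in self.seen:
            return False
        self.seen.add(ts)
        self.rendered.append(text)
        return True
```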

Presence and typing: the silent scaling trap

Broadcasting raw "typing" events would produce O(members × typists) fan-out per channel per second, which is catastrophic for large channels. The gateway coalesces typing signals into a periodic broadcast (e.g. every 1 to 2 seconds, "these users are typing") and drops duplicates. Presence (online/away) uses a TTL-based heartbeat in Redis; missing N heartbeats transitions the user to away, avoiding a thundering herd of explicit offline events on mass disconnects.
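A minimal in-memory sketch of both mechanisms. The 1.5 s flush window sits inside the 1 to 2 second range above. `TYPING_EXPIRY_S` and `PRESENCE_TTL_S` (three missed 30 s heartbeats) are hypothetical values. In Redis, the presence half maps to rewriting a key with an expiry (`SET key value EX seconds`) on each heartbeat, so an expired key simply means away.

```python
from collections import defaultdict
from typing import Optional

TYPING_FLUSH_S = 1.5    # broadcast window, called by a timer on the gateway
TYPING_EXPIRY_S = 5.0   # hypothetical: forget a typist who stops sending signals
PRESENCE_TTL_S = 90.0   # hypothetical: 3 missed 30 s heartbeats -> away


class TypingCoalescer:
    def __init__(self) -> None:
        # channel_id -> {user_id: time of that user's latest typing signal}
        self._typing: dict[str, dict[str, float]] = defaultdict(dict)

    def signal(self, channel_id: str, user_id: str, now: float) -> None:
        # Repeated signals just refresh the timestamp, so duplicates collapse.
        self._typing[channel_id][user_id] = now

    def flush(self, now: float) -> dict[str, list[str]]:
        """Run once per TYPING_FLUSH_S: one 'these users are typing' event per channel."""
        out: dict[str, list[str]] = {}
        for ch in list(self._typing):
            users = {u: t for u, t in self._typing[ch].items() if now - t <= TYPING_EXPIRY_S}
            if users:
                self._typing[ch] = users
                out[ch] = sorted(users)
            else:
                del self._typing[ch]
        return out


def presence(last_heartbeat: Optional[float], now: float) -> str:
    """TTL-style presence: no explicit offline event; the user just ages into 'away'."""
    if last_heartbeat is None or now - last_heartbeat > PRESENCE_TTL_S:
        return "away"
    return "active"
```

Fan-out now scales with the number of flushes, not the number of keystrokes: a channel with three active typists still produces one event per window.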

Search: workspace-isolated Elasticsearch

Each workspace gets its own Elasticsearch index (or alias routing to a slice of a shared cluster). Custom analyzers handle code tokens, @mentions, and emoji. The challenge is permission-aware search — not every member can see every channel, so results must be filtered post-search against the caller's membership set; returning fewer results than requested per page is expected and not a bug.
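Post-filtering has one subtle consequence for pagination: the cursor must advance by the number of raw hits the engine returned, not the number the caller was allowed to see. Otherwise pages repeat or skip documents. In this sketch, `search_fn` is a hypothetical stand-in for the Elasticsearch query, and offset paging is assumed for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Hit:
    ts: str
    channel_id: str
    text: str


SearchFn = Callable[[str, int, int], list[Hit]]  # (query, offset, size) -> raw hits


def search_page(search_fn: SearchFn, query: str, offset: int, size: int,
                member_of: set[str]) -> tuple[list[Hit], Optional[int]]:
    """Return (visible hits, next offset or None when the engine has no more hits)."""
    raw = search_fn(query, offset, size)
    # Keep only hits from channels the caller belongs to; engine ranking order is preserved.
    visible = [h for h in raw if h.channel_id in member_of]
    # Advance by raw hits so the next page neither repeats nor skips; short pages are normal.
    next_offset = offset + len(raw) if len(raw) == size else None
    return visible, next_offset
```

An alternative is to push the membership set into the query itself as a `terms` filter on the channel field. That avoids short pages but gets expensive for users who belong to thousands of channels.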

What breaks at scale

Channel fan-out is the primary failure mode: a 100k-member announcement channel where every message fans out to tens of thousands of WebSocket connections can saturate the pub/sub bus and gateway CPUs simultaneously. Gateway reconnect storms after a rolling restart or AZ failure compound this — without exponential backoff + jitter on the client, every socket reconnects in the same second. Index lag during bulk import (onboarding a new enterprise with millions of archived Slack messages) can make search appear broken for hours post-migration.
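Client-side exponential backoff with "full jitter" (each retry waits a uniform random time between zero and an exponentially growing ceiling) is one common way to break up a reconnect storm. The 1 s base and 60 s cap below are hypothetical defaults, not values from the text.

```python
import random
from typing import Optional


def reconnect_delay(attempt: int, base_s: float = 1.0, cap_s: float = 60.0,
                    rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: a uniform draw from [0, min(cap_s, base_s * 2**attempt)]."""
    r = rng if rng is not None else random.Random()
    exp = min(attempt, 32)  # clamp so 2**attempt cannot overflow float math
    ceiling = min(cap_s, base_s * (2 ** exp))
    return r.uniform(0.0, ceiling)
```

After a gateway restart, thousands of clients on attempt 0 spread their first retry across a full second instead of arriving together. Each further failure doubles the window up to the cap, so a struggling fleet sees load fall rather than spike.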

In production

Slack's production stack runs message storage on MySQL managed by Vitess; Slack's engineering team has described resharding message data by channel ID rather than by workspace to handle its largest customers, the mega-workspace problem described above. Real-time delivery goes through a dedicated pub/sub tier that fans messages out to WebSocket gateways. The hard problem is presence and typing indicators at scale: naively broadcasting every "user is typing" event from a 10k-member channel creates a fan-out storm, so Slack rate-limits and coalesces these signals server-side before fanning out. Threads were bolted on after the original flat-message model and required a non-trivial schema change to support parent/child message relationships without blowing up channel read queries.
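The parent/child relationship can be modeled on the `ts` and `thread_ts` fields from Slack's public API: a reply carries its parent's ts in `thread_ts`. The in-memory layout and function names below are assumptions for illustration. A real store would index on (channel, thread_ts) and keep a denormalized reply count on the parent row instead of recounting on every read.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Msg:
    ts: str
    text: str
    thread_ts: Optional[str] = None  # set on replies: the parent's ts


def channel_view(msgs: list[Msg]) -> list[tuple[str, str, int]]:
    """Top-level messages only, each with its reply count; replies stay out of the channel read."""
    replies = Counter(m.thread_ts for m in msgs if m.thread_ts and m.thread_ts != m.ts)
    return [(m.ts, m.text, replies[m.ts]) for m in msgs
            if m.thread_ts is None or m.thread_ts == m.ts]


def thread_view(msgs: list[Msg], parent_ts: str) -> list[Msg]:
    """The parent followed by its replies, in ts order."""
    return sorted((m for m in msgs if m.ts == parent_ts or m.thread_ts == parent_ts),
                  key=lambda m: m.ts)
```

Keeping replies out of the channel scan is what protects the channel read path: a thread with thousands of replies still shows up as one row with a count.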

Common mistakes

No search index (history unusable)
Cross-workspace shared infra (noisy neighbors)
Per-user fan-out instead of per-channel


Part of System Design Library on SystemLore: system design interview prep with 148 deep topics, interactive diagrams, and a practice game.