Slack
Team chat with channels, threads, search, and presence.
Requirements
Functional
- Channels/DMs/threads
- Realtime delivery
- Search history
- Presence/notifications
Non-functional
- Low latency
- Searchable history
Scale
Large workspaces, from 10k-seat orgs up to 200k-seat enterprises
The approach
WebSocket gateways + per-channel pub/sub (like Discord); messages persisted and indexed for search; per-workspace sharding; notification service for mentions.
Key components
WS gateway ⇄ pub/sub · message store + search index · notifications
Numbers that matter
- Slack processes roughly 1 million messages/day per large enterprise workspace; a 10k-seat org can spike at hundreds of messages/second during an incident or all-hands.
- WebSocket ping/pong heartbeat interval is typically 30 seconds; missing 2 consecutive beats triggers reconnect to avoid ghost connections accumulating on gateways.
- Message fan-out latency target is <500ms p99 from send to delivery on all connected clients in the same channel; p50 is typically <100ms over a regional pub/sub bus.
- Elasticsearch index per workspace; large workspaces with 10+ million messages require index splitting and alias routing to keep shard sizes under ~50GB for query performance.
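The figures above can be turned into quick back-of-envelope numbers. This is a minimal sketch of the arithmetic only; the 300 messages/second peak is an assumed value standing in for "hundreds of messages/second", and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(messages_per_day: int) -> float:
    """Average messages/second implied by a daily volume."""
    return messages_per_day / SECONDS_PER_DAY


def burst_factor(peak_per_s: float, messages_per_day: int) -> float:
    """How many times higher the peak rate is than the daily average."""
    return peak_per_s / average_rate(messages_per_day)


def ghost_detection_window_s(interval_s: int, missed_beats: int) -> int:
    """Worst-case seconds before a dead socket is noticed via heartbeats."""
    return interval_s * missed_beats


if __name__ == "__main__":
    # 1M messages/day and 30 s heartbeats come from the list above;
    # 300 msg/s is an assumed incident peak.
    print(f"average: {average_rate(1_000_000):.1f} msg/s")
    print(f"burst factor at 300 msg/s: {burst_factor(300, 1_000_000):.0f}x")
    print(f"ghost connection detected after: {ghost_detection_window_s(30, 2)} s")
```

The takeaway is that the average rate (about 11.6 msg/s) says little about provisioning: under the assumed peak the system has to absorb roughly 26 times that, and a dead socket can linger for up to 60 seconds before the heartbeat catches it.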
Senior deep-dive
Per-workspace sharding is the real architectural decision — it lets you co-locate all channel, message, and member data for one org, making fan-out cheap and permission checks local.
WebSocket gateways are stateful and that's intentional — each gateway tracks which channels a connection is subscribed to; the pub/sub bus (Kafka/Redis) fans messages only to gateways that have a subscriber, not every gateway in the fleet.
Search is a first-class product, not an afterthought: messages are indexed into Elasticsearch per workspace with custom analyzers for code and emoji, because the default analyzer will fall short of user expectations.
Workspace sharding: keep the org together
Sharding by workspace (not by user or channel) means all data for one org lives on one shard set — cross-channel permission checks are local joins, not distributed queries. The tradeoff: a single mega-workspace (a 200k-seat enterprise) can exceed one shard's capacity. Slack handles this with intra-workspace sub-sharding once a workspace crosses a threshold, which complicates routing logic.
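One way to picture the routing, including the sub-sharding escape hatch, is the sketch below. The source does not say how Slack implements sub-sharding, so the shard count, the `SUB_SHARDED` table, and the choice of channel ID as the secondary key are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List, Optional

NUM_SHARDS = 64  # assumed fleet size, not a Slack figure

# Hypothetical override table: workspaces that outgrew one shard, mapped to
# the shards they are spread across.
SUB_SHARDED: Dict[str, List[int]] = {"W_MEGA": [3, 17, 42, 58]}


def _stable_hash(key: str) -> int:
    """Hash that is stable across processes (unlike built-in hash())."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def shard_for(workspace_id: str, channel_id: Optional[str] = None) -> int:
    """Route by workspace; fall back to channel only for sub-sharded orgs."""
    sub_shards = SUB_SHARDED.get(workspace_id)
    if sub_shards is None:
        # Normal case: every channel in the workspace lands on one shard.
        return _stable_hash(workspace_id) % NUM_SHARDS
    if channel_id is None:
        raise ValueError("sub-sharded workspace needs a channel_id to route")
    # Mega-workspace: spread its channels over its dedicated shards.
    return sub_shards[_stable_hash(channel_id) % len(sub_shards)]
```

The routing complication mentioned above shows up in the second branch: callers that only know the workspace can no longer find the data, and cross-channel queries for a sub-sharded workspace become multi-shard again.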
WebSocket gateway: stateful by design
Each gateway process holds an in-memory subscription map (connection → set of channel IDs). When a message arrives from the pub/sub bus, the gateway checks its local map and pushes only to relevant sockets — this is far cheaper than broadcasting and re-filtering at the client. The tradeoff is failover state loss: when a gateway crashes, clients must reconnect and re-subscribe, causing a reconnect storm you must rate-limit with jitter.
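A minimal in-memory sketch of that subscription map follows. The class and method names are hypothetical, the `send` callback stands in for a real socket write, and the reverse index (channel to connections) is an addition of this sketch so that delivery does not scan every connection.

```python
from typing import Callable, Dict, Set


class Gateway:
    """Tracks which connections on this process care about which channels."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self._send = send  # send(conn_id, payload); stand-in for a socket write
        self._channels_by_conn: Dict[str, Set[str]] = {}
        self._conns_by_channel: Dict[str, Set[str]] = {}

    def subscribe(self, conn_id: str, channel_id: str) -> None:
        self._channels_by_conn.setdefault(conn_id, set()).add(channel_id)
        self._conns_by_channel.setdefault(channel_id, set()).add(conn_id)

    def disconnect(self, conn_id: str) -> None:
        """Drop all state for a closed socket."""
        for channel_id in self._channels_by_conn.pop(conn_id, set()):
            conns = self._conns_by_channel.get(channel_id)
            if conns is None:
                continue
            conns.discard(conn_id)
            if not conns:
                # No local subscribers left for this channel.
                del self._conns_by_channel[channel_id]

    def wants(self, channel_id: str) -> bool:
        """True if the bus should route this channel to this gateway."""
        return channel_id in self._conns_by_channel

    def on_bus_message(self, channel_id: str, payload: str) -> int:
        """Push a bus message to local subscribers only; returns send count."""
        conns = self._conns_by_channel.get(channel_id, set())
        for conn_id in sorted(conns):
            self._send(conn_id, payload)
        return len(conns)
```

Because all of this state lives in process memory, a crash loses it entirely, which is exactly why clients have to reconnect and re-subscribe after a gateway failure.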
Message delivery: at-least-once with nonce-based dedup
Slack uses client-assigned nonces on outgoing messages so the server can dedup retried sends. The server assigns a canonical message timestamp (ts) as the immutable ID used in all APIs — this timestamp encodes order within a channel. Delivery receipts are not per-message ACKs from recipients; they're send-side confirmations from the server, keeping the hot path lean.
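The nonce-based dedup can be sketched as below. This is a toy in-memory model under stated assumptions: the `ChannelLog` name, the `(sender, nonce)` key, and the `seconds.sequence` ts format are illustrative, and a real server would expire old nonces rather than keep them forever.

```python
import itertools
from typing import Dict, Tuple


class ChannelLog:
    """Per-channel message log that dedups retried sends by client nonce."""

    def __init__(self) -> None:
        self._seq = itertools.count(1)
        self._ts_by_nonce: Dict[Tuple[str, str], str] = {}
        self.messages: Dict[str, str] = {}  # ts -> text

    def send(self, sender_id: str, nonce: str, text: str, now_s: int) -> str:
        """Store a message once and return its server-assigned ts."""
        key = (sender_id, nonce)
        existing_ts = self._ts_by_nonce.get(key)
        if existing_ts is not None:
            # Retry of a send we already accepted: confirm it, store nothing.
            return existing_ts
        # The sequence suffix keeps ts unique within the channel even when
        # several messages arrive in the same second.
        ts = f"{now_s}.{next(self._seq):06d}"
        self._ts_by_nonce[key] = ts
        self.messages[ts] = text
        return ts
```

A client that times out waiting for the confirmation can simply resend with the same nonce; it gets back the original ts and the channel never shows a duplicate.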
Presence and typing: the silent scaling trap
Broadcasting raw "typing" events would produce O(members × typists) fan-out per channel per second, which is catastrophic for large channels. The gateway coalesces typing signals into a periodic broadcast (e.g. every 1-2 seconds, "these users are typing") and drops duplicates. Presence (online/away) uses a TTL-based heartbeat in Redis; missing N heartbeats transitions the user to away, which avoids a thundering herd of explicit offline events on mass disconnects.
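Both ideas fit in a few lines. In the sketch below a plain dict stands in for Redis, the clock is passed in explicitly, and all class and method names are hypothetical; it illustrates the coalescing and TTL logic, not any real client library.

```python
from typing import Dict, List, Optional, Set


class TypingCoalescer:
    """Collects raw typing events and emits one batch per flush interval."""

    def __init__(self) -> None:
        self._typing: Dict[str, Set[str]] = {}

    def on_typing(self, channel_id: str, user_id: str) -> None:
        # A set drops duplicate events from the same user for free.
        self._typing.setdefault(channel_id, set()).add(user_id)

    def flush(self) -> Dict[str, List[str]]:
        """Call on a timer (e.g. every 1-2 s): one broadcast per channel."""
        batch = {ch: sorted(users) for ch, users in self._typing.items()}
        self._typing = {}
        return batch


class PresenceTracker:
    """TTL-style presence: a user is away once N heartbeats are missed."""

    def __init__(self, interval_s: float, missed_beats: int) -> None:
        self._ttl_s = interval_s * missed_beats
        self._last_seen: Dict[str, float] = {}  # stand-in for Redis keys

    def heartbeat(self, user_id: str, now: float) -> None:
        self._last_seen[user_id] = now

    def status(self, user_id: str, now: float) -> str:
        last: Optional[float] = self._last_seen.get(user_id)
        if last is None or now - last > self._ttl_s:
            return "away"
        return "online"
```

Note that nothing is written when a user goes away: the status is derived from the age of the last heartbeat, so a mass disconnect produces no burst of offline events.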
Search: workspace-isolated Elasticsearch
Each workspace gets its own Elasticsearch index (or alias routing to a slice of a shared cluster). Custom analyzers handle code tokens, @mentions, and emoji. The challenge is permission-aware search — not every member can see every channel, so results must be filtered post-search against the caller's membership set; returning fewer results than requested per page is expected and not a bug.
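The post-filtering step, and why pages come back short, can be sketched like this. `fetch_hits` is a hypothetical stand-in for the search backend call (it is not an Elasticsearch client API), and hits are modelled as plain dicts with a `channel_id` field.

```python
from typing import Callable, Dict, List, Set, Tuple

Hit = Dict[str, str]


def search_page(
    fetch_hits: Callable[[int, int], List[Hit]],
    visible_channels: Set[str],
    page_size: int,
    cursor: int = 0,
) -> Tuple[List[Hit], int]:
    """Fetch one raw page, drop hits the caller cannot see, advance cursor.

    The returned page may hold fewer than page_size hits (even zero) while
    more results still exist; callers should page until the backend itself
    runs dry, not until a page comes back short.
    """
    raw = fetch_hits(cursor, page_size)
    visible = [hit for hit in raw if hit["channel_id"] in visible_channels]
    # The cursor advances by the raw count so filtered hits are not refetched.
    return visible, cursor + len(raw)
```

An alternative is to push the membership set into the query itself as a filter on channel ID, which keeps pages full at the cost of larger queries for users in many channels; the sketch follows the post-filter approach described above.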
What breaks at scale
Channel fan-out is the primary failure mode: a 100k-member announcement channel where every message fans out to tens of thousands of WebSocket connections can saturate the pub/sub bus and gateway CPUs simultaneously. Gateway reconnect storms after a rolling restart or AZ failure compound this — without exponential backoff + jitter on the client, every socket reconnects in the same second. Index lag during bulk import (onboarding a new enterprise with millions of archived Slack messages) can make search appear broken for hours post-migration.
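Client-side backoff with jitter is small enough to show in full. This sketch uses the "full jitter" variant (a uniform delay between zero and an exponentially growing ceiling); the base and cap values are assumptions, not Slack's settings.

```python
import random
from typing import Optional


def reconnect_delay(
    attempt: int,
    base_s: float = 1.0,
    cap_s: float = 60.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based).

    The ceiling doubles with each failed attempt up to cap_s, and the actual
    delay is drawn uniformly below it so clients spread out instead of all
    retrying at the same instant.
    """
    rng = rng if rng is not None else random.Random()
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Without the random draw, every client dropped by the same gateway would retry at 1 s, 2 s, 4 s and so on in lockstep, which just moves the storm rather than flattening it.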
In production
Slack has described storing messages in Vitess-sharded MySQL (historically sharded by workspace, with Vitess later allowing other shard keys such as channel) and delivering real-time events to WebSocket gateways through a dedicated pub/sub tier; a Kafka-backed bus is a reasonable stand-in for that tier in an interview design. The hard problem is presence and typing indicators at scale: naively broadcasting every "user is typing" event from a 10k-member channel creates a fan-out storm, so Slack rate-limits and coalesces these signals server-side before fanning out. Threads were bolted on after the original flat-message model and required a non-trivial schema change to support parent/child message relationships without blowing up channel read queries.
Common mistakes
- No search index (history unusable)
- Cross-workspace shared infra (noisy neighbors)
- Per-user fan-out instead of per-channel