Live Polling (Slido/Kahoot)
Run live audience polls/quizzes with realtime results to thousands.
Requirements
Functional
- Push question
- Collect answers
- Live aggregated results
- Timing/scoring
Non-functional
- Realtime
- Spike-tolerant
Scale
Thousands answering at once
The approach
WS to participants; a burst of answers hits sharded counters; results aggregated and pushed back live; for quizzes, server timestamps answers for scoring; rate-limit/dedup per participant.
Key components
Host → WS fan-out → participants → sharded answer counters → live results
Numbers that matter
- A 10,000-person live audience answering simultaneously generates ~10k writes/second in a 1–2 second burst — a single DB row would serialize at ~1–5k TPS max
- 32–64 counter shards per poll option reduce per-shard write rate to <200 wps, making atomic Redis INCR trivially fast at <1ms
- Results aggregation (sum across shards) adds <5ms — imperceptible to a human waiting 2–3 seconds for the bar chart to animate in
- WebSocket fan-out to 10k clients from a single node is feasible until the NIC saturates (~1–2 Gbps); beyond that you need multiple gateway instances with a shared pub/sub bus
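As a rough check on the per-shard figures above, the arithmetic can be sketched as follows. The four-option split and the worst case where every vote lands on one option are assumptions for illustration, not measurements from a real event.

```python
def per_shard_write_rate(votes: int, burst_seconds: float,
                         options: int, shards_per_option: int) -> float:
    """Average writes per second hitting one counter shard, assuming votes
    spread evenly across the given options and across their shards."""
    votes_per_second = votes / burst_seconds
    return votes_per_second / (options * shards_per_option)


# 10,000 votes in a 1-second burst, 4 options (assumed), 32 shards per option.
even_split = per_shard_write_rate(10_000, 1.0, options=4, shards_per_option=32)

# Worst case: everyone picks the same option, so only its 32 shards take the load.
one_hot_option = per_shard_write_rate(10_000, 1.0, options=1, shards_per_option=32)

print(f"even split: {even_split:.1f} wps per shard")
print(f"single hot option: {one_hot_option:.1f} wps per shard")
```

With 64 shards the single-hot-option case drops to about 156 wps, which is where the <200 wps figure holds without assuming an even split across options.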
Senior deep-dive
Sharded counters, not a single row, are the only way to survive the burst when everyone votes in the same 2-second window.
Server-side timestamping is the non-obvious requirement for quiz scoring — client clocks are unreliable and easily manipulated.
Per-participant rate limiting (one vote per question, enforced server-side with a dedup set) is both a correctness guarantee and a DoS defense.
Sharded counters absorb the burst
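A single row with `UPDATE SET count = count + 1` serializes writes on the row lock; under 10k concurrent votes, requests queue behind it and latency collapses. Counter sharding (e.g. 32 keys `poll:Q1:opt:A:shard:{0..31}`, incrementing the shard picked by `hash(session_id) % 32`) distributes writes across 32 independent Redis keys. A read sums the 32 keys, which still takes <5ms. Redis executes commands on a single thread per node, so each INCR is atomic and needs no locking. Note that the shards only spread load when the keys land on different nodes (for example, different hash slots in Redis Cluster); on a single node they all share the same thread.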
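A minimal sketch of the sharded counter described above. To keep it runnable anywhere, it uses a hypothetical in-memory store in place of Redis; the `incr` and `mget` methods stand in for the INCR and MGET commands, and the helper names are assumptions, not part of any library.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 32  # matches the 32-shard example in the text


class InMemoryCounters:
    """Stand-in for Redis; incr/mget mimic INCR and MGET on integer keys."""

    def __init__(self) -> None:
        self._data = defaultdict(int)

    def incr(self, key: str) -> int:
        self._data[key] += 1
        return self._data[key]

    def mget(self, keys: list) -> list:
        # Missing keys count as 0 (real Redis returns None for them).
        return [self._data.get(k, 0) for k in keys]


def shard_key(question: str, option: str, shard: int) -> str:
    return f"poll:{question}:opt:{option}:shard:{shard}"


def record_vote(store: InMemoryCounters, question: str, option: str,
                session_id: str) -> None:
    # crc32 is stable across processes, unlike Python's built-in hash().
    shard = zlib.crc32(session_id.encode()) % NUM_SHARDS
    store.incr(shard_key(question, option, shard))


def total_votes(store: InMemoryCounters, question: str, option: str) -> int:
    # Reading sums every shard of the option.
    keys = [shard_key(question, option, s) for s in range(NUM_SHARDS)]
    return sum(store.mget(keys))


store = InMemoryCounters()
for i in range(1000):
    record_vote(store, "Q1", "A", f"session-{i}")
print(total_votes(store, "Q1", "A"))  # 1000
```

Against a real Redis, the values come back as strings or None and would need converting before summing.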
WebSocket fan-out needs a pub/sub layer
A single gateway process can hold ~10–50k WebSocket connections but cannot share counter state with sibling processes directly. All gateways subscribe to a per-poll pub/sub channel (Redis pub/sub or Kafka); the aggregation job publishes result snapshots every 500ms–1s. Push-on-delta (only publish when results change by >0.5%) prevents hammering clients during slow answer trickle. Never push on every write — the aggregation cadence decouples vote ingest rate from fan-out rate.
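The push-on-delta rule can be sketched as a small predicate that the aggregation job calls on each tick. This reads "change by >0.5%" as a shift of more than 0.5 percentage points in any option's share of the vote, which is an assumption; the function names are hypothetical.

```python
from typing import Optional


def shares(counts: dict) -> dict:
    """Convert raw counts to each option's fraction of the total."""
    total = sum(counts.values())
    if total == 0:
        return {opt: 0.0 for opt in counts}
    return {opt: n / total for opt, n in counts.items()}


def should_publish(last_sent: Optional[dict], current: dict,
                   threshold: float = 0.005) -> bool:
    """True when any option's share moved by more than `threshold`
    (0.005 = 0.5 percentage points) since the last snapshot sent."""
    if last_sent is None:
        return True  # nothing has been sent yet
    old, new = shares(last_sent), shares(current)
    return any(abs(new[opt] - old.get(opt, 0.0)) > threshold for opt in new)


# One extra vote out of ~1000 barely moves the bars, so it is skipped.
print(should_publish({"A": 500, "B": 500}, {"A": 501, "B": 500}))  # False
# A hundred extra votes shift the split visibly, so it is published.
print(should_publish({"A": 500, "B": 500}, {"A": 600, "B": 500}))  # True
```

The job would run this every 500ms–1s and remember the last snapshot it actually published, so the comparison is always against what clients currently see.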
Server-side timestamping is the quiz integrity contract
For scored quizzes the server must timestamp answers at receipt, not at client submission. Network RTT varies 20–300ms; a client can fake a submission time. The server stamps the receive time, subtracts question-open time (stored in-memory), and scores by elapsed time. Dedup keys (`set:answered:{session}:{question}`) enforce one-answer-per-question atomically — `SETNX` returns 0 if already set, rejecting re-submissions. This is the entire anti-cheat surface.
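The receive-time stamping and dedup check might look like the sketch below. It keeps state in memory instead of Redis so it runs standalone, and the scoring formula (linear decay from full points to half points at the deadline) is purely hypothetical; the text does not specify one.

```python
import time
from typing import Callable, Optional


class AnswerBook:
    """In-memory stand-in for the Redis dedup keys and question-open times."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._opened_at = {}
        self._answered = set()

    def open_question(self, question: str) -> None:
        self._opened_at[question] = self._clock()

    def submit(self, session: str, question: str, correct: bool,
               time_limit: float = 20.0, max_points: int = 1000) -> Optional[int]:
        """Return the score, or None if this session already answered."""
        received_at = self._clock()  # server clock only; client timestamps are ignored
        key = (session, question)
        if key in self._answered:  # plays the role of SETNX returning 0
            return None
        self._answered.add(key)
        elapsed = received_at - self._opened_at[question]
        if not correct or elapsed > time_limit:
            return 0
        # Hypothetical scoring: full points at 0s, half points at the deadline.
        return round(max_points * (1 - 0.5 * elapsed / time_limit))


# A fake clock makes the example deterministic.
now = [100.0]
book = AnswerBook(clock=lambda: now[0])
book.open_question("Q1")
now[0] = 105.0  # answer arrives 5 seconds after the question opened
print(book.submit("s1", "Q1", correct=True))  # 875
print(book.submit("s1", "Q1", correct=True))  # None (duplicate)
```

The in-memory check-then-add is not atomic across processes; with Redis, a single `SET key value NX EX ttl` would do the check and the write in one step.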
Isolate per-session state to prevent cross-poll bleed
All counter keys, dedup sets, and WebSocket subscriptions must be scoped by poll_id + session_id. A missing prefix causes answer counts to bleed between concurrent events, which has happened in production incidents. Use Redis key TTLs matching the poll lifetime plus a safety buffer (e.g. poll_duration + 10 minutes) to auto-expire all state; this prevents unbounded memory growth across thousands of events per day. Don't rely on explicit cleanup — TTLs are the operational safety net.
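One way to make a missing prefix impossible is to route every key through a single builder. The key layout and function names below are hypothetical; only the poll_id + session_id scoping and the "duration plus 10 minutes" TTL come from the text.

```python
def scoped_key(poll_id: str, session_id: str, *parts: str) -> str:
    """Build a key that always carries the poll and session prefix."""
    if not poll_id or not session_id:
        raise ValueError("poll_id and session_id are required")
    return ":".join(["poll", poll_id, "session", session_id, *parts])


def ttl_seconds(poll_duration_s: int, buffer_s: int = 600) -> int:
    """Poll lifetime plus a safety buffer (10 minutes by default)."""
    return poll_duration_s + buffer_s


print(scoped_key("p42", "s7", "opt", "A", "shard", "3"))
# poll:p42:session:s7:opt:A:shard:3
print(ttl_seconds(30 * 60))  # 2400
```

Every write would then set the key's expiry to `ttl_seconds(...)`, so state disappears on its own even if explicit cleanup never runs.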
Rate-limiting per participant is both correctness and DoS defense
Without server-side rate limiting, a bot can `POST /vote` thousands of times and skew results. The dedup set (one answer per question) handles correctness; a token bucket per session (e.g. 5 requests/second max) handles volumetric abuse. For large events, also require a CAPTCHA or session tokens issued at join time, because a shared URL means anyone with the link is a potential spammer. Enforcing the rate limit at the gateway (not in the application tier) is cheaper and catches abuse before business logic runs.
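A per-session token bucket can be sketched in a few lines. The class names are hypothetical, and the defaults (5 tokens per second, burst of 5) follow the example rate in the text; a production gateway would more likely use its built-in limiter.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float = 5.0, capacity: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


class SessionLimiter:
    """Keeps one bucket per session id."""

    def __init__(self, rate: float = 5.0, capacity: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate, self._capacity, self._clock = rate, capacity, clock
        self._buckets = {}

    def allow(self, session_id: str) -> bool:
        bucket = self._buckets.get(session_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[session_id] = bucket
        return bucket.allow()


limiter = SessionLimiter()
results = [limiter.allow("bot-1") for _ in range(20)]
print(results.count(True))  # about 5: the burst capacity
```

Requests that return False would be rejected before the dedup check or any counter write runs.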
What breaks at scale
The catastrophic failure mode is a thundering herd on question reveal: all clients simultaneously open a WebSocket connection when the presenter reveals Q2. Connection storms can exhaust gateway file descriptors in seconds. Mitigate with jittered reconnect backoff and by pre-establishing connections at event join time rather than per-question. The second failure mode is Redis single-node saturation: at >100k concurrent voters across many shards, a single Redis instance becomes the bottleneck; Redis Cluster with hash-slot-aware client routing is the solution, not a bigger instance.
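The jittered reconnect backoff could be computed as in this sketch. It is written in Python for illustration (a browser client would do the same in JavaScript), uses the common "full jitter" variant, and the base delay and cap are assumed values.

```python
import random
from typing import Optional


def reconnect_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                    rng: Optional[random.Random] = None) -> float:
    """Full-jitter exponential backoff.

    Returns a delay drawn uniformly from [0, min(cap, base * 2**attempt)],
    so clients that disconnected together do not reconnect together.
    """
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# Upper bound of the delay window for the first few attempts.
for attempt in range(8):
    ceiling = min(30.0, 0.5 * 2 ** attempt)
    print(f"attempt {attempt}: wait up to {ceiling:.1f}s, "
          f"e.g. {reconnect_delay(attempt):.2f}s")
```

The randomization matters more than the exact curve: without it, every client retries on the same schedule and the storm simply repeats.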
In production
Slido uses a Redis-backed sharded counter approach and pushes results via WebSocket every 500ms–1s rather than on every vote, trading perfect freshness for dramatically reduced message volume. Kahoot adds a server-enforced answer deadline and timestamps answers at receipt, not at client submission — the server clock is the ground truth for scoring. The real engineering challenge is the thundering-herd at question reveal: every client submits within seconds, and you must absorb that spike without stalling the results display.
Common mistakes
- Single counter (burst hotspot)
- Trusting client timestamps for scoring
- No per-user dedup