System Design Library

Live Polling (Slido/Kahoot)

Run live audience polls/quizzes with realtime results to thousands.

Requirements

Functional

  • Push question
  • Collect answers
  • Live aggregated results
  • Timing/scoring

Non-functional

  • Realtime
  • Spike-tolerant

Scale

Thousands answering at once

The approach

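Participants hold WebSocket (WS) connections. When a question opens, the burst of answers lands on sharded counters, and results are aggregated and pushed back live. For quizzes, the server timestamps each answer for scoring. Every participant is rate-limited and deduplicated server-side.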

Key components

Host → WS fan-out → participants → sharded answer counters → live results

Senior deep-dive

Sharded counters, rather than a single row, are what let the system survive the burst when everyone votes in the same 2-second window.

Server-side timestamping is the non-obvious requirement for quiz scoring — client clocks are unreliable and easily manipulated.

Per-participant limiting (one vote per question, enforced server-side with a dedup set) is both a correctness guarantee and a DoS defense.

Sharded counters absorb the burst

A single row updated with `UPDATE SET count = count + 1` serializes every write behind one row lock; with 10k concurrent votes, lock contention dominates and latency climbs sharply. Counter sharding (e.g. 32 keys `poll:Q1:opt:A:shard:{0..31}`, with the shard chosen by `hash(session_id) % 32`) spreads the increments across 32 independent Redis keys, which in Redis Cluster can live in different hash slots and on different nodes. A read sums the 32 keys, which is typically still <5ms. Redis `INCR` is atomic because each node executes commands on a single thread, so no locking is needed.
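
The sharding scheme above can be sketched as follows. This is a minimal illustration, assuming an in-memory dictionary as a stand-in for Redis; the function names (`shard_for`, `counter_key`, `record_vote`, `read_total`) are hypothetical, and the key layout follows the `poll:Q1:opt:A:shard:{n}` pattern from the text.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 32  # shard count taken from the example in the text

# In-memory stand-in for Redis (assumption): key -> integer counter.
store = defaultdict(int)

def shard_for(session_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard with a stable hash, so every process agrees on the mapping.

    Python's built-in hash() is salted per process, so it is avoided here.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def counter_key(question_id: str, option: str, shard: int) -> str:
    """Build a key such as poll:Q1:opt:A:shard:7."""
    return f"poll:{question_id}:opt:{option}:shard:{shard}"

def record_vote(question_id: str, option: str, session_id: str) -> None:
    """Increment the shard owned by this participant.

    Against Redis this would be one INCR on the key.
    """
    store[counter_key(question_id, option, shard_for(session_id))] += 1

def read_total(question_id: str, option: str) -> int:
    """Sum all shards for one option; missing shards count as zero."""
    return sum(
        store.get(counter_key(question_id, option, shard), 0)
        for shard in range(NUM_SHARDS)
    )
```

Calling `record_vote("Q1", "A", session_id)` once per participant and then `read_total("Q1", "A")` returns the number of votes recorded, regardless of how they were spread across shards.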

WebSocket fan-out needs a pub/sub layer

A single gateway process can hold ~10–50k WebSocket connections but cannot share counter state with sibling processes directly. All gateways subscribe to a per-poll pub/sub channel (Redis pub/sub or Kafka); the aggregation job publishes result snapshots every 500ms–1s. Push-on-delta (only publish when results change by >0.5%) prevents hammering clients during slow answer trickle. Never push on every write — the aggregation cadence decouples vote ingest rate from fan-out rate.
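
A push-on-delta check might look like the sketch below. It assumes "results change by >0.5%" means that some option's share of the total moved by more than 0.5 percentage points since the last published snapshot; that reading, and the names `shares` and `should_publish`, are assumptions for illustration.

```python
from typing import Dict, Optional

def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts into each option's fraction of the total."""
    total = sum(counts.values())
    if total == 0:
        return {option: 0.0 for option in counts}
    return {option: n / total for option, n in counts.items()}

def should_publish(
    last_sent: Optional[Dict[str, int]],
    current: Dict[str, int],
    threshold: float = 0.005,  # 0.5 percentage points (assumed interpretation)
) -> bool:
    """Return True when the snapshot differs enough from the last one sent."""
    if last_sent is None:
        return True  # nothing published yet, so always send the first snapshot
    before, after = shares(last_sent), shares(current)
    options = set(before) | set(after)
    return any(
        abs(after.get(o, 0.0) - before.get(o, 0.0)) > threshold for o in options
    )
```

The aggregation job would call `should_publish` once per tick (every 500ms–1s) and skip the broadcast when it returns False. A share-based check stays quiet when totals grow but proportions do not, so a real system would probably pair it with a maximum staleness interval.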

Server-side timestamping is the quiz integrity contract

For scored quizzes the server must timestamp answers at receipt rather than trust a client-reported submission time. Network RTT varies 20–300ms, and a client can fake its clock. The server stamps the receive time, subtracts the question-open time (stored in-memory), and scores by elapsed time. Dedup keys (`set:answered:{session}:{question}`) enforce one answer per question atomically: `SETNX` returns 0 if the key is already set, so re-submissions are rejected. Server timestamps plus the dedup key make up the core of the anti-cheat surface.
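
The receive-time scoring and first-answer-wins check can be sketched as below. The scoring curve (linear decay from `MAX_POINTS` to half of it over the time limit) is a made-up example, not the formula of any real product, and the in-memory set only stands in for the `SETNX` keys; unlike `SETNX`, it is not atomic across processes.

```python
from typing import Set, Tuple

MAX_POINTS = 1000  # hypothetical maximum score per question

# Stand-in for the SETNX dedup keys (assumption): (session, question) pairs.
answered: Set[Tuple[str, str]] = set()

def claim_answer(session_id: str, question_id: str) -> bool:
    """Return True only for the first answer from this session to this question."""
    key = (session_id, question_id)
    if key in answered:
        return False  # mirrors SETNX returning 0 for an existing key
    answered.add(key)
    return True

def score_answer(
    opened_at: float, received_at: float, time_limit: float, correct: bool
) -> int:
    """Score from server-side timestamps only; client clocks are never consulted."""
    elapsed = received_at - opened_at
    if not correct or elapsed < 0 or elapsed > time_limit:
        return 0  # wrong, before the question opened, or past the deadline
    remaining = 1.0 - elapsed / time_limit
    return round(MAX_POINTS * (0.5 + 0.5 * remaining))
```

Both `opened_at` and `received_at` are taken from the server clock, so a client that lies about its submission time gains nothing.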

Isolate per-session state to prevent cross-poll bleed

All counter keys, dedup sets, and WebSocket subscriptions must be scoped by poll_id + session_id (here the session is the live event session, not an individual participant). A missing prefix lets answer counts bleed between concurrent events. Use Redis key TTLs matching the poll lifetime plus a safety buffer (e.g. poll_duration + 10 minutes) to auto-expire all state; this prevents unbounded memory growth across thousands of events per day. Don't rely on explicit cleanup alone: TTLs are the operational safety net.
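
One way to make the prefix hard to forget is to route every key through a single builder. The sketch below assumes hypothetical helpers `scoped_key` and `ttl_seconds`; the 10-minute buffer comes from the example in the text.

```python
SAFETY_BUFFER_SECONDS = 10 * 60  # "poll_duration + 10 minutes" from the text

def scoped_key(poll_id: str, *parts: str) -> str:
    """Build a key that always starts with the poll scope.

    Refusing an empty poll_id turns a missing prefix into an immediate error
    instead of a silent collision between concurrent events.
    """
    if not poll_id:
        raise ValueError("poll_id is required for every key")
    return ":".join(["poll", poll_id, *parts])

def ttl_seconds(poll_duration_seconds: int) -> int:
    """TTL to attach to every key created for this poll."""
    return poll_duration_seconds + SAFETY_BUFFER_SECONDS
```

For example, `scoped_key("evt42", "Q1", "opt", "A", "shard", "7")` yields `poll:evt42:Q1:opt:A:shard:7`, and a one-hour poll gets a TTL of 4200 seconds.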

Rate-limiting per participant is both correctness and DoS defense

Without server-side rate limiting, a bot can `POST /vote` thousands of times and skew results. The dedup set (one answer per question) handles correctness; a token bucket per session (e.g. 5 requests/second max) handles volumetric abuse. For large events, also enforce CAPTCHA or session tokens issued at join time, because a shared URL means anyone with the link is a potential spammer. Enforcing the rate limit at the gateway (not in the application tier) is cheaper and catches abuse before business logic runs.
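
A per-session token bucket can be sketched as follows. The class and its parameters are illustrative assumptions; the clock is passed in explicitly so the behaviour is easy to test, and a gateway would typically feed it `time.monotonic()`.

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new session can send at once
        self.updated_at = now

    def allow(self, now: float) -> bool:
        """Consume one token if available; return False to reject the request."""
        if now > self.updated_at:
            refill = (now - self.updated_at) * self.rate
            self.tokens = min(self.capacity, self.tokens + refill)
            self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `TokenBucket(rate=5, capacity=5, now=0.0)`, five requests at the same instant pass and the sixth is rejected; one second later the bucket is full again.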

What breaks at scale

The catastrophic failure mode is thundering herd on question reveal — all clients simultaneously open a WebSocket connection when the presenter reveals Q2. Connection storms can exhaust gateway file descriptors in seconds. Mitigate with jittered reconnect backoff and by pre-establishing connections at event join time rather than per-question. The second failure mode is Redis single-node saturation: at >100k concurrent voters across many shards, a single Redis instance becomes the bottleneck; Redis Cluster with hash-slot-aware client routing is the solution, not a bigger instance.
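
Jittered reconnect backoff can be sketched as below, using the common "full jitter" variant (a uniform random delay between zero and an exponentially growing ceiling). The function name and the default `base` and `cap` values are assumptions for illustration.

```python
import random
from typing import Optional

def reconnect_delay(
    attempt: int,
    base: float = 0.5,   # seconds; hypothetical starting ceiling
    cap: float = 30.0,   # seconds; hypothetical maximum ceiling
    rng: Optional[random.Random] = None,
) -> float:
    """Return how long a client should wait before reconnect attempt `attempt`.

    The ceiling doubles with each attempt up to `cap`, and the actual delay is
    drawn uniformly below it so clients do not reconnect in lockstep.
    """
    source = rng if rng is not None else random
    attempt = max(0, min(attempt, 16))  # clamp so the exponent stays small
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)
```

A client would sleep for `reconnect_delay(n)` seconds before its n-th retry, which spreads a reconnect storm over the whole window instead of a single instant.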

In production

Slido uses a Redis-backed sharded counter approach and pushes results via WebSocket every 500ms–1s rather than on every vote, trading perfect freshness for dramatically reduced message volume. Kahoot adds a server-enforced answer deadline and timestamps answers at receipt, not at client submission; the server clock is the ground truth for scoring. The real engineering challenge is the answer burst at question reveal: every client submits within seconds, and you must absorb that spike without stalling the results display.

Part of System Design Library on SystemLore.