System Design Library

Polls / Voting

Collect votes and show live tallies without a write hotspot or double-voting.

Requirements

Functional

  • Cast vote
  • Live results
  • One-vote-per-user

Non-functional

  • High write rate
  • No double counting

Scale

Viral polls: millions of votes fast

The approach

Use sharded counters per option (like a view counter). A dedup set or record enforces one-vote-per-user, and results are aggregated from the shards. Displayed tallies are eventually consistent while the poll is live and exact once it closes.

Key components

App → vote dedup + sharded counters → aggregate

Numbers that matter

Senior deep-dive

Sharded counters eliminate the write hotspot — a single row per option becomes a bottleneck above ~1,000 writes/sec; splitting across N shards and summing on read scales linearly.

Deduplication is the hard problem: storing a voter set (user_id → option) costs O(voters) memory and one write per vote. A Redis SET is exact but grows with the number of voters; a Bloom filter trades exactness for space, at the cost of false-positive 'already voted' errors.

Display tallies can be eventually consistent; final tallies cannot — show approximate counts during a live poll, reconcile exactly from the durable log when it closes.

Sharded counters: design and read cost

Partition each option's counter into N shards (counter:poll_id:option_id:shard_N). Writes hash the requester to a shard (or pick one at random), eliminating row-level hotspots. Reads sum all N shards, and that is the tradeoff: reads become N times more expensive. For a poll with 4 options and 100 shards, reading the result takes 400 key fetches. A single Redis MGET (or a pipelined batch) fetches them in one round trip, typically on the order of a millisecond, but cache the aggregate to avoid re-summing on every request.
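The write and read paths described above can be sketched as follows. This is a minimal illustration, not a production design: a plain dict stands in for Redis, and the class name ShardedCounter and its methods are hypothetical.

```python
import random
from collections import defaultdict


class ShardedCounter:
    """In-memory stand-in for per-option counter shards (a dict plays the role of Redis)."""

    def __init__(self, num_shards: int = 100):
        self.num_shards = num_shards
        self.store = defaultdict(int)  # key -> count

    def _key(self, poll_id: str, option_id: str, shard: int) -> str:
        # Mirrors the counter:poll_id:option_id:shard_N layout from the text.
        return f"counter:{poll_id}:{option_id}:{shard}"

    def increment(self, poll_id: str, option_id: str) -> None:
        # Write path: pick a random shard so no single key takes every write.
        shard = random.randrange(self.num_shards)
        self.store[self._key(poll_id, option_id, shard)] += 1

    def total(self, poll_id: str, option_id: str) -> int:
        # Read path: sum all N shards (this is the N-fold read cost).
        return sum(
            self.store.get(self._key(poll_id, option_id, s), 0)
            for s in range(self.num_shards)
        )


counter = ShardedCounter(num_shards=8)
for _ in range(1000):
    counter.increment("poll42", "yes")
for _ in range(250):
    counter.increment("poll42", "no")
print(counter.total("poll42", "yes"), counter.total("poll42", "no"))
```

Against a real Redis deployment the increment would be an INCR on the shard key and the read a single MGET over all shard keys, with the summed result cached for a few seconds.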

Deduplication: the voter uniqueness problem

One-vote-per-user requires checking before incrementing. A separate read followed by a write is not atomic in Redis; make it atomic with a Lua script, or rely on SADD, which checks and inserts in a single command. The scalable approach: use a Redis SET per poll for small polls (SADD returns 0 if the user is already a member); switch to a Bloom filter for massive public polls where some false positives ('you already voted') are acceptable. For high-integrity polls (elections, shareholder votes), store every vote as an append-only log entry and deduplicate at query time.
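To make the exact-versus-approximate tradeoff concrete, here is a small sketch of both dedup strategies behind the same interface. The class names ExactDedup and BloomDedup and the method first_vote are hypothetical; the Bloom filter is a toy built on hashlib, not a tuned implementation, and the in-memory set only imitates what SADD does in Redis.

```python
import hashlib


class ExactDedup:
    """Exact dedup: memory grows with the number of voters (like a Redis SET)."""

    def __init__(self):
        self.voters = set()

    def first_vote(self, user_id: str) -> bool:
        # Same contract as SADD returning 1 (new member) or 0 (already present).
        if user_id in self.voters:
            return False
        self.voters.add(user_id)
        return True


class BloomDedup:
    """Approximate dedup: fixed memory, may wrongly report 'already voted'."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, user_id: str):
        # Derive k bit positions by salting the hash input with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{user_id}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def first_vote(self, user_id: str) -> bool:
        positions = list(self._positions(user_id))
        seen = all(self.bits[p // 8] & (1 << (p % 8)) for p in positions)
        for p in positions:
            self.bits[p // 8] |= 1 << (p % 8)
        # seen=True may be a false positive; seen=False is always correct.
        return not seen
```

The asymmetry is the point: a Bloom filter never lets the same user vote twice, but it can occasionally reject a genuine first vote, which is why the text reserves it for polls where that error is tolerable.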

Durability: Redis is not a database

Redis persistence is best-effort: AOF with the default everysec fsync policy can lose about one second of writes on a crash, and RDB snapshots can lose everything written since the last snapshot. For polls where vote loss is unacceptable, every vote must reach a durable store (Postgres, Cassandra). Either write through synchronously and accept the latency, or publish to an async queue (Kafka) with at-least-once delivery and idempotent consumers. The queue approach also gives you an audit log of every vote for reconciliation and fraud detection.
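At-least-once delivery means the consumer will sometimes see the same vote twice, so applying a vote has to be idempotent. The sketch below illustrates that idea under stated assumptions: VoteEvent, VoteStore, and consume are hypothetical names, and a dict keyed on (poll_id, user_id) stands in for a durable table with a unique constraint on those columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoteEvent:
    poll_id: str
    user_id: str
    option_id: str


class VoteStore:
    """Stand-in for a durable table with a unique key on (poll_id, user_id)."""

    def __init__(self):
        self.votes = {}  # (poll_id, user_id) -> option_id

    def apply(self, event: VoteEvent) -> bool:
        key = (event.poll_id, event.user_id)
        if key in self.votes:
            # Redelivered or duplicate vote: ignore it, state is unchanged.
            return False
        self.votes[key] = event.option_id
        return True


def consume(events, store: VoteStore) -> int:
    """Apply a batch of events; return how many were newly recorded."""
    return sum(1 for event in events if store.apply(event))


store = VoteStore()
batch = [
    VoteEvent("poll42", "alice", "yes"),
    VoteEvent("poll42", "bob", "no"),
    VoteEvent("poll42", "alice", "yes"),  # redelivery after a consumer restart
]
print(consume(batch, store))  # 2 new votes recorded
print(consume(batch, store))  # 0: replaying the whole batch changes nothing
```

In a relational store the same effect usually comes from an insert that ignores conflicts on the unique key, so a replayed batch is harmless.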

Live tally display: eventual is fine, final is not

During an active poll, showing approximate counts (summed from shards with a 5s cache) is perfectly acceptable UX; users don't need accuracy to the millisecond. Final tallies after poll close must be exact: run a reconciliation job that reads the durable vote log, counts definitively, and writes the canonical result. After close, display the reconciled number, not the cached shard sum, which may have minor discrepancies from race conditions.
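A reconciliation pass over the durable log might look like the sketch below. It assumes the log is a sequence of (user_id, option_id) pairs in arrival order and that the first vote per user wins; both the log shape and the tie-breaking rule are illustrative assumptions, and reconcile is a hypothetical function name.

```python
from collections import Counter


def reconcile(vote_log):
    """Count exactly one vote per user from an append-only log.

    vote_log: iterable of (user_id, option_id) in arrival order.
    Assumption: the first vote recorded for a user is the one that counts.
    """
    first_choice = {}
    for user_id, option_id in vote_log:
        # setdefault keeps the earliest option and ignores later duplicates.
        first_choice.setdefault(user_id, option_id)
    return Counter(first_choice.values())


log = [
    ("u1", "a"),
    ("u2", "b"),
    ("u1", "b"),  # duplicate from u1, ignored
    ("u3", "a"),
    ("u2", "b"),  # redelivered event, ignored
]
final = reconcile(log)
print(final["a"], final["b"])
```

The result of this job is what gets stored as the canonical tally; any gap between it and the live shard sum is also a useful signal for spotting dedup races or dropped writes.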

Anonymous polls and IP-based dedup

Anonymous polls that dedup by IP address fail in both directions: NAT puts thousands of legitimate users behind one IP, so they block each other, while VPNs let a single user vote from many IPs. Cookie-based dedup requires the voter's browser to cooperate. The most practical defense for anonymous polls is rate-limiting by IP (to prevent bulk stuffing) combined with anomaly detection on the vote velocity curve. Accept that anonymous polls have noisy results and design the UX around that truth.
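Per-IP rate limiting can be sketched with a sliding window, shown below. The class name IpRateLimiter, its parameters, and the thresholds in the usage example are hypothetical; the caller passes the current time explicitly so the behavior is easy to reason about, and a real deployment would keep this state in a shared store rather than in process memory.

```python
from collections import defaultdict, deque


class IpRateLimiter:
    """Allow at most max_votes per IP within a sliding window of window_seconds."""

    def __init__(self, max_votes: int, window_seconds: float):
        self.max_votes = max_votes
        self.window_seconds = window_seconds
        self.recent = defaultdict(deque)  # ip -> timestamps of accepted votes

    def allow(self, ip: str, now: float) -> bool:
        timestamps = self.recent[ip]
        # Drop accepted votes that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_votes:
            return False
        timestamps.append(now)
        return True


limiter = IpRateLimiter(max_votes=3, window_seconds=60)
for t in (0, 1, 2, 3):
    print(t, limiter.allow("203.0.113.7", now=t))
```

The limit is deliberately above one vote per IP: because of NAT, a strict one-per-IP rule would reject many legitimate voters, so the limiter only caps bulk stuffing and leaves finer judgments to anomaly detection.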

What breaks at scale

Shard hot spots when N is too small: if a viral poll drives 500,000 votes/sec and you only have 10 shards, each shard still handles 50,000 writes/sec, which is still a bottleneck. Dynamically increase the shard count or use a write buffer (client-side batching or a Kafka topic) to absorb the burst.

Poll result inconsistency after shard redistribution: if you re-shard mid-poll, readers must keep summing every shard that has ever received writes. Adding shards only introduces new keys, but shrinking N or moving counts between shards must be done atomically, or you'll double-count or lose votes during the transition.
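The shard-count arithmetic from the 500,000 votes/sec example can be written down directly. The helper names and the headroom parameter are hypothetical, and the per-shard limit of about 1,000 writes/sec is the rule of thumb quoted earlier in this article, not a measured figure.

```python
import math


def per_shard_load(peak_votes_per_sec: float, num_shards: int) -> float:
    """Average write rate each shard sees if votes spread evenly."""
    return peak_votes_per_sec / num_shards


def shards_needed(peak_votes_per_sec: float,
                  per_shard_writes_per_sec: float,
                  headroom: float = 2.0) -> int:
    """Smallest shard count that keeps each shard under its write limit.

    headroom > 1 leaves slack for uneven hashing and traffic spikes.
    """
    return math.ceil(peak_votes_per_sec * headroom / per_shard_writes_per_sec)


print(per_shard_load(500_000, 10))                   # load per shard with 10 shards
print(shards_needed(500_000, 1_000, headroom=1.0))   # bare minimum
print(shards_needed(500_000, 1_000))                 # with 2x headroom
```

This treats the worst case where every vote lands on one option; if votes spread across options, each option's shards see proportionally less load.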

In production

Twitter polls use Redis sorted sets and counters for live tallies with a write-through to Cassandra for durability. Google Forms uses strongly consistent Spanner writes for its polling product, accepting the latency cost in exchange for exact counts — appropriate for surveys where exactness matters. Slido (live event polling) prioritizes latency and uses an in-memory counter with batch flush to Postgres, accepting a small window of potential loss. The real challenge is the thundering herd at poll open time: when a question goes live during a live event, thousands of simultaneous votes arrive in the first 10 seconds — pre-warming the counter shards before the poll opens prevents cold-start contention.

Common mistakes
