Communities of posts and nested comments, ranked by a hot/score algorithm.
Open the interactive version → diagrams, practice & moreRequirements
Functional
- Post to subreddits
- Vote
- Nested comments
- Ranked feeds (hot/new/top)
Non-functional
- Read-heavy
- Fast ranked feeds
Scale
Billions of votes; deep comment trees
The approach
Posts/votes in a sharded store; ranking score (hot = f(votes, age)) precomputed and cached per feed; comment trees stored with a path/closure-table for efficient subtree reads; heavy caching.
Key components
App → post/vote store · ranking pipeline → feed cache · comment tree store
Numbers that matter
- ~50:1 read:write ratio typical for Reddit — voting and reading dwarf new post/comment creation; caches are essential.
- Wilson score confidence interval (or Reddit's original hot algorithm) uses ~log₁₀(max(|ups-downs|,1)) + sign * seconds_since_epoch/45000 — the score decays in ~hours.
- ~10-50ms is the target latency for serving a cached subreddit front page; uncached ranking recalculation over 100K posts would take seconds.
- ~3-5 levels is the typical comment nesting depth shown by default (collapsed beyond that) — this bounds the closure-table join depth in practice.
Senior deep-dive
Ranking is a read-time computation that must be precomputed — recalculating hot scores across millions of posts on every feed request is not viable; scores age quickly, so you need a background job that recomputes and caches periodically.
Comment trees at scale demand a materialized path or closure table, not recursive parent-pointer queries — fetching a 1,000-comment thread with N self-joins is catastrophic.
Vote counting is the write hotspot — a viral post can receive thousands of votes per second on a single row; sharded counters or buffered aggregation is mandatory to avoid update contention.
Hot score precomputation: ranking is a background job
The hot score formula decays with time — a post from 6 hours ago with 1000 upvotes scores lower than a post from 1 hour ago with 200 upvotes. This means scores change even without new votes: you must periodically recompute scores for all active posts, not just on vote events. A background job scans active posts (last 24h), recomputes the score, and writes to a sorted set in Redis. The sorted set is the feed: ZRANGEBYSCORE gives you the top-K posts for any subreddit in O(log N + K).
Vote counting: sharded counters or batch flushing
A viral post hitting the front page receives a write spike on a single row. Naive `UPDATE posts SET score=score+1 WHERE id=X` creates lock contention at scale. Two solutions: sharded counters (N counter rows per post, update a random shard, read by summing), or buffer increments in Redis (INCR is atomic, lock-free) and flush to the DB in batches every few seconds. The Redis approach is simpler and what most production systems use.
Comment tree storage: why parent_id pointers fail
A naive adjacency list (each comment stores parent_id) requires recursive CTEs or application-side tree reconstruction to display a thread. At 10K comments, a recursive CTE is slow; application-side reconstruction requires fetching all comments first. Closure table (store every ancestor-descendant pair) allows fetching a full subtree with one query. Materialized path (store '/1/3/7/' on each comment) is simpler and nearly as fast. Reddit uses a hybrid: `parent_id` in storage, application-side tree building after fetching all comments in a thread, bounded by a max depth and collapsing old threads.
Feed generation: precomputed vs. on-demand
Home feed (posts from subscribed subreddits) for an active user subscribed to 500 subreddits is expensive on-demand: merge 500 sorted sets, rerank, paginate. The trade-off: fan-out on write (push new posts into every subscriber's feed at post time) doesn't scale for large subreddits with millions of subscribers. The real answer is hybrid: small subreddits fan-out; large subreddits (r/AskReddit) use pull-on-read and cache the merged feed per user for a short TTL.
Subreddit-level caching: the easy optimization
The subreddit front page (top posts in /r/programming right now) is identical for every anonymous viewer. Cache it aggressively with a 30-60s TTL in Redis or a CDN. For logged-in users, you need to overlay vote state (did I upvote this?) — do this as a separate client-side merge after serving the cached page, not by busting the cache per user. This is the classic decorator pattern for personalization: serve the shared base, overlay personal state at the edge.
What breaks at scale
Vote fraud detection at write time is expensive — checking for brigading, bot farms, or sock puppets on every vote requires ML inference or graph traversal that can't happen synchronously. Async fraud scoring is fine but means fraudulent votes affect rankings for seconds before correction. Subreddit hot-list cache stampedes: when the cache expires for a popular subreddit, thousands of requests hit the DB simultaneously to recompute — use probabilistic early expiration (randomly recompute before TTL expires) or a background refresh process. Comment threading under heavy nesting: users nest replies 20 levels deep on drama posts, the closure table explodes in row count, and fetching the thread hits joins on millions of pairs.
In production
Reddit uses Cassandra for post/comment storage (wide rows per post), Postgres for votes and account data, and has publicly discussed their 'best' algorithm (Wilson score) for comment ranking. The real engineering challenge is vote counting under viral load: a post hitting the front page gets 10K upvotes in minutes — a single Postgres row `UPDATE SET score = score + 1` under that concurrency causes lock contention and cascading latency. Reddit historically used Cassandra counters and later moved to batched Redis increments flushed periodically. The second hard problem is comment tree rendering at scale: Reddit stores `parent_id` and uses recursive queries, which works at moderate depth but requires a depth limit and careful indexing.
Common mistakes
- Recomputing hot scores per request
- Recursive comment fetches
- No feed caching