System Design Library

Facebook News Feed

A ranked, personalized feed mixing friends' posts, pages and ads.

Open the interactive version → diagrams, practice & more

Requirements

Functional

Aggregate from many sources
ML ranking
Ads insertion
Pagination

Non-functional

Low latency
Fresh + relevant

Scale

Billions of users

The approach

Fan-out candidate posts into a per-user store (like Twitter) but add a ranking layer: a model scores candidates by predicted engagement; ads blended in; results cached + paginated.

Key components

Fan-out → candidate store → ML ranker → blended feed cache

Numbers that matter

~500M active users × 200 friends avg = ~100B feed entries that must be maintained in fan-out-on-write; even a 1-byte flag per entry is 100GB of fan-out state.
~150ms total feed generation budget at Facebook — candidate retrieval, ranking model inference, ad auction, and serialization all in under that.
~2,000 candidates retrieved for each feed ranking pass, ranked down to ~25 shown posts — a ~80:1 candidate-to-display ratio.
~10× write amplification for fan-out-on-write to a celebrity with 10M followers posting once — 10M writes per post event.

Senior deep-dive

Fan-out strategy is the central architectural decision — fan-out-on-write (push to all followers at post time) optimizes reads but explodes for celebrities; fan-out-on-read is cheap to write but slow to serve; hybrid is the only production answer.

Ranking is not sorting by time — a learned model scoring by predicted engagement (likes, comments, shares, dwell time) runs on each request against a pre-assembled candidate set; that two-stage pipeline is what separates a feed from a firehose.

Ads are not an afterthought — the blending model must jointly optimize organic engagement and ad revenue, and the ad auction happens inline with feed ranking, which means ad serving latency is feed latency.

Hybrid fan-out: the only architecture that survives celebrities

Fan-out-on-write pushes a post into every follower's feed inbox at post time — O(followers) writes per post. For a user with 10M followers, one post triggers 10M writes, saturating your write tier. Fan-out-on-read assembles the feed fresh on every request — O(following_count) reads per feed load, which is acceptable but slow under ranking. The hybrid: regular users fan-out on write; accounts above a follower threshold (say, 1M) are read from a shared hot-post cache and merged at serving time. Celeb posts are injected at feed-render time, not pre-inserted per follower.

Ranking: from candidate set to scored feed

Feed ranking is a two-stage pipeline: a fast retrieval stage gathers ~2,000 candidates (friend posts, page posts, sponsored) from pre-built indexes; a scoring stage runs a learned model (gradient-boosted trees or a neural ranker) over each candidate using features like author affinity, post type, recency, and predicted engagement. The model must run in single-digit milliseconds per candidate — batched inference on a GPU server, not per-request model loading. Features are pre-computed and stored in a fast feature store; freshness matters (a post trending in the last 5 minutes should rank higher).

The interest graph vs. social graph

Pure social graph feeds (only friends' posts) produce filter bubbles and miss content discovery. Facebook's EdgeRank and later models learned that engagement signals cross-cut the social graph: you might deeply engage with cooking content from a weak tie. The feed must blend social graph candidates (close friends, family) with interest graph candidates (pages you follow, content from accounts you never followed but would engage with). This means the candidate set includes both social pulls and collaborative-filtering recommendations, sourced from different indexes.

Ad insertion: inline with ranking, not bolted on

Ads are not appended after organic ranking — they are inserted at specific positions (every 4th item, with a minimum quality floor) using a per-impression auction. The auction runs in the same latency budget as ranking: the ad server retrieves eligible ads, runs a second-price auction (bid × predicted CTR = effective CPM), and the winner is slotted into the ranked feed. This means the ranking service must call the ad service in parallel or the ad request must be pre-cached. Ad quality scoring (predicted CTR, negative feedback rate) prevents low-quality ads from degrading feed experience.

Stale feed and the seen-post problem

A user who checks their feed 10 times per day must not see the same posts repeatedly. You need a seen-post filter per user: a server-side bloom filter or a time-windowed exclusion set. Storing N seen post IDs per user at Facebook scale (500M users × thousands of posts per day) is expensive — approximate membership via bloom filter or counting filter is the right trade-off. The filter only needs to cover the last 24-48 hours; posts older than that naturally fall out of ranking anyway.

What breaks at scale

Fan-out storms when a celebrity posts during a peak event (Super Bowl, election) — the write amplification is bounded by followers but the timing concentrates it. Use write throttling and staggered fan-out (spread the 10M writes over a few seconds). Ranking model freshness decay: a model trained on last week's data systematically underweights emerging trends — you need near-real-time retraining pipelines (online learning or daily retrain with hourly feature updates). Feed assembly timeout: if one of 200 friends' post stores is slow, you degrade everyone subscribed to that friend; use hedged requests and partial results (return the feed with available candidates if any source times out, rather than failing entirely).

In production

Facebook's TAO system (a graph cache on top of MySQL) stores the social graph; Multifeed assembles candidates by reading from friends' post stores; a ranking model (GBDT or neural) scores candidates. Twitter (now X) pioneered the hybrid fan-out described in their 2013 architecture post: small accounts fan-out on write, celebrities (>1M followers) are pulled on read and merged at serving time. The real engineering challenge is the ranking model's feature freshness: the model uses social context (did my friends engage with this post?) that changes by the second, but computing it for 2,000 candidates in <150ms requires a very fast feature lookup tier, which is why Facebook built a dedicated feature store (Ninja/Scuba for real-time, Hive for batch) serving features at sub-10ms.

Common mistakes

Chronological-only (ignores relevance)
Ranking the whole candidate world
No ad/organic blending strategy

Related System Design Library

Part of System Design Library on SystemLore — system design interview prep with 148 deep topics, interactive diagrams, and a practice game. Practice this one →