Pinterest — System Design


Save and discover images on boards, with a recommendation-heavy feed.


Requirements

Functional

Pin images to boards
Follow + feed
Visual recommendations
Search

Non-functional

Fast image delivery
Relevant discovery

Scale

Billions of pins

The approach

Images in object store + CDN; boards/pins metadata sharded; feed mixes follows + heavy recommendations (visual embeddings); discovery is the core, so the recsys pipeline is central.

Key components

Object store + CDN · pin/board store · recsys (embeddings) · feed

Numbers that matter

Pinterest serves ~5 billion recommendations per day across home feed, related pins, and search.
A single Pin image is encoded into a ~256-dimensional float32 visual embedding (~1 KB) used for similarity retrieval.
ANN indexes like HNSW can query 100M+ vectors in <10ms with >95% recall at 10 neighbors.
Image CDN cache-hit rates on popular Pins exceed 99% — the long tail of obscure Pins is a tiny fraction of traffic.
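A quick back-of-envelope check of the embedding figure above, as a sketch. The 256 dimensions and float32 width come from the text; the corpus sizes passed in at the bottom are illustrative assumptions, not Pinterest's actual counts.

```python
FLOAT32_BYTES = 4    # size of one float32 component
EMBEDDING_DIM = 256  # dimensionality quoted above


def embedding_bytes(dim: int = EMBEDDING_DIM, bytes_per_component: int = FLOAT32_BYTES) -> int:
    """Raw size of one dense embedding, ignoring any index overhead."""
    return dim * bytes_per_component


def corpus_terabytes(num_pins: int, dim: int = EMBEDDING_DIM) -> float:
    """Raw vector storage for num_pins embeddings, in decimal terabytes (1e12 bytes)."""
    return num_pins * embedding_bytes(dim) / 1e12


if __name__ == "__main__":
    print(embedding_bytes(), "bytes per Pin")
    for n in (1_000_000_000, 5_000_000_000):  # illustrative corpus sizes only
        print(f"{n:,} Pins -> {corpus_terabytes(n):.2f} TB of raw float32 vectors")
```

At billions of Pins the raw vectors alone reach terabytes before graph links, replicas, or metadata are counted, which is one motivation for compressed indexes such as IVF-PQ.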

Senior deep-dive

Discovery is the product — Pinterest is a recommendation engine that happens to store images, not an image host that happens to recommend.

Visual embeddings drive the feed: offline pipelines encode every Pin into a vector; ANN retrieval at serving time dominates the architecture more than any social graph.

The graph is sparse and cold-start is brutal — a new user has zero signal, so you fall back to content-based similarity before you ever have behavioral data.

Visual embeddings are the core primitive

Every Pin is encoded offline by a convolutional model (or transformer) into a dense vector. At serving time, the home feed is essentially a k-nearest-neighbor query in embedding space filtered by user affinity signals. Text metadata helps but image similarity carries most of the semantic weight — a mislabeled Pin with a great photo still surfaces correctly.
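To make "a k-nearest-neighbor query in embedding space filtered by user affinity signals" concrete, here is an exact (brute-force) sketch in plain Python. The toy vectors, the topic-based `blocked_topics` filter, and the function names are hypothetical; a production system would replace the linear scan with an ANN index.

```python
import heapq
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_knn(
    query: Sequence[float],
    pins: Dict[str, Tuple[List[float], str]],  # pin_id -> (embedding, topic); hypothetical schema
    blocked_topics: Set[str],                 # hypothetical affinity filter: topics the user hid
    k: int,
) -> List[Tuple[str, float]]:
    """Exact k-NN over every Pin, skipping Pins whose topic the user filtered out."""
    scored = (
        (pin_id, cosine(query, vec))
        for pin_id, (vec, topic) in pins.items()
        if topic not in blocked_topics
    )
    return heapq.nlargest(k, scored, key=lambda item: item[1])


pins = {
    "p1": ([1.0, 0.0, 0.0], "kitchens"),
    "p2": ([0.9, 0.1, 0.0], "kitchens"),
    "p3": ([0.8, 0.2, 0.0], "cars"),
    "p4": ([0.0, 1.0, 0.0], "gardens"),
}
print(filtered_knn([1.0, 0.0, 0.0], pins, blocked_topics={"cars"}, k=2))
```

Every call scores every Pin, which is exactly the cost the two-stage design exists to avoid.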

Two-stage retrieval: candidate generation then ranking

The first stage is fast ANN retrieval from an approximate index (HNSW or IVF-PQ) returning hundreds of candidates per user per request. The second stage is a pointwise ranker (a small neural net) that scores each candidate against real-time user context (recent clicks, session signals). Skipping the two-stage design and doing exact nearest-neighbor at serving time is the classic scale trap — it's fine at 1M Pins, catastrophic at 300M.
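The two-stage shape can be sketched with a toy inverted-file (IVF) index, which is the coarse-quantization half of IVF-PQ without the product-quantization compression, followed by a pointwise reranker. Vectors are assumed unit-normalized so a dot product acts as cosine similarity. The centroids, `nprobe`, the click-count feature, and the linear scoring weights are hypothetical stand-ins; the text describes the real ranker as a small neural net.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class ToyIVFIndex:
    """Stage 1: bucket each vector under its nearest centroid; search only the nprobe closest buckets."""

    def __init__(self, centroids: List[Vector]):
        self.centroids = centroids
        self.lists: Dict[int, List[Tuple[str, Vector]]] = {i: [] for i in range(len(centroids))}

    def _nearest_centroids(self, vec: Vector, n: int) -> List[int]:
        return heapq.nlargest(n, range(len(self.centroids)), key=lambda i: dot(vec, self.centroids[i]))

    def add(self, pin_id: str, vec: Vector) -> None:
        self.lists[self._nearest_centroids(vec, 1)[0]].append((pin_id, vec))

    def search(self, query: Vector, k: int, nprobe: int = 1) -> List[Tuple[str, float]]:
        # Approximate: Pins in un-probed buckets are never scored, which is where recall is lost.
        candidates = (
            (pin_id, dot(query, vec))
            for c in self._nearest_centroids(query, nprobe)
            for pin_id, vec in self.lists[c]
        )
        return heapq.nlargest(k, candidates, key=lambda item: item[1])


def rerank(candidates: List[Tuple[str, float]], recent_clicks: Dict[str, int]) -> List[str]:
    """Stage 2: score each candidate independently; a hypothetical linear blend replaces the neural ranker."""
    def score(item: Tuple[str, float]) -> float:
        pin_id, similarity = item
        session_signal = min(recent_clicks.get(pin_id, 0), 5) / 5  # capped, normalized click count
        return 0.7 * similarity + 0.3 * session_signal
    return [pin_id for pin_id, _ in sorted(candidates, key=score, reverse=True)]


index = ToyIVFIndex(centroids=[[1.0, 0.0], [0.0, 1.0]])
index.add("a", [0.99, 0.14])
index.add("b", [0.95, 0.31])
index.add("c", [0.10, 0.99])
cands = index.search([1.0, 0.0], k=2, nprobe=1)
print(rerank(cands, recent_clicks={"b": 5}))
```

Stage 1 only touches one bucket, so its cost scales with bucket size rather than corpus size; stage 2 can afford a richer model because it sees hundreds of candidates, not the whole corpus.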

Storage: object store for originals, CDN for renditions

Originals live in S3. On upload, an async transcoding pipeline generates several renditions (grid thumbnail ~236px, closeup ~564px, original). The CDN handles essentially all traffic — direct-to-origin fetches should be a rounding error. Rendition metadata (URL, dimensions) is stored in the Pin's metadata record so clients never compute URLs dynamically.
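A minimal sketch of the rendition metadata written into the Pin record at upload time. The 236px and 564px widths come from the text; the `Rendition` type, the rendition names, and the CDN hostname and URL layout are hypothetical. Heights preserve the original aspect ratio.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List

CDN_BASE = "https://cdn.example.com"              # hypothetical CDN hostname
RENDITION_WIDTHS = {"grid": 236, "closeup": 564}  # widths from the text; "original" is added separately


@dataclass(frozen=True)
class Rendition:
    name: str
    width: int
    height: int
    url: str


def build_renditions(image_key: str, orig_width: int, orig_height: int) -> List[Rendition]:
    """Precompute every rendition's URL and size so clients never derive URLs themselves."""
    renditions = []
    for name, target_width in RENDITION_WIDTHS.items():
        width = min(target_width, orig_width)             # never upscale small originals
        height = round(orig_height * width / orig_width)  # keep the aspect ratio
        renditions.append(Rendition(name, width, height, f"{CDN_BASE}/{width}x/{image_key}"))
    renditions.append(Rendition("original", orig_width, orig_height, f"{CDN_BASE}/originals/{image_key}"))
    return renditions


pin_record: Dict[str, object] = {
    "pin_id": "123",
    "renditions": [asdict(r) for r in build_renditions("ab/cd/photo.jpg", 1200, 1800)],
}
print(pin_record)
```

Storing dimensions alongside URLs also lets clients lay out a masonry grid before any image bytes arrive.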

Board and Pin graph sharding

The data model is Users → Boards → Pins (a board is essentially a labeled collection). Sharding by user_id keeps all of a user's board writes on a single shard but makes cross-user feed fan-out expensive. Pinterest shards the social graph (follower/following) separately from the content graph (board/pin) so each can scale independently. Adjacency list compression (delta-encoding follower lists) is critical when a single popular creator has tens of millions of followers.
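Delta-encoding works because a sorted follower list can store small gaps instead of large IDs. Pairing it with a variable-length integer (varint) encoding is one common way to turn small gaps into few bytes. This sketch assumes non-negative integer user IDs and a LEB128-style varint; the text names only delta-encoding, so the varint layer is an illustrative choice.

```python
from typing import Iterable, List


def encode_followers(user_ids: Iterable[int]) -> bytes:
    """Sort and dedupe IDs, store gaps between neighbours, varint-encode each gap (7 bits per byte).

    Assumes non-negative IDs.
    """
    out = bytearray()
    prev = 0
    for uid in sorted(set(user_ids)):
        gap = uid - prev
        prev = uid
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation bit set
            gap >>= 7
        out.append(gap)  # final byte, continuation bit clear
    return bytes(out)


def decode_followers(data: bytes) -> List[int]:
    """Inverse of encode_followers: read varints, then prefix-sum the gaps back into IDs."""
    ids: List[int] = []
    current, shift, prev = 0, 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
            continue
        prev += current
        ids.append(prev)
        current, shift = 0, 0
    return ids


followers = [10_000_000 + 3 * i for i in range(1_000)]
packed = encode_followers(followers)
print(len(packed), "bytes vs", 8 * len(followers), "bytes as raw int64")
```

Here the first ID costs 4 bytes and each later gap of 3 costs 1 byte. Real follower lists are less regular than this toy one, so the savings depend on how densely IDs cluster.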

Cold start: content-based before behavioral

A brand-new user has no click history. You fall back to onboarding interest selection (seed categories) and immediately do content-based retrieval from those cluster centroids. As the user interacts, implicit feedback (saves, clicks, dwell time) shifts weight toward collaborative signals. The transition from content-based to collaborative happens within the first session — getting that transition smooth is a product-critical engineering problem, not just an ML one.
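One simple way to make the content-based-to-collaborative handoff gradual rather than a hard switch is to blend the two scores with a weight that grows with the number of interactions. The blending formula and the `half_point` constant are hypothetical tuning choices, not a documented Pinterest mechanism.

```python
def collaborative_weight(num_interactions: int, half_point: float = 20.0) -> float:
    """Weight on collaborative signals: 0 for a new user, 0.5 after `half_point` interactions, approaching 1."""
    if num_interactions < 0 or half_point <= 0:
        raise ValueError("num_interactions must be >= 0 and half_point > 0")
    return num_interactions / (num_interactions + half_point)


def blended_score(content_score: float, collab_score: float, num_interactions: int) -> float:
    """Mix onboarding-seeded content similarity with collaborative evidence."""
    w = collaborative_weight(num_interactions)
    return (1.0 - w) * content_score + w * collab_score


for n in (0, 5, 20, 200):
    print(n, round(collaborative_weight(n), 3))
```

Lowering `half_point` makes the feed lean on behavioral data sooner; a smooth curve avoids the feed abruptly changing character partway through a session.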

What breaks at scale

Embedding index staleness is the first major break: retraining the vision model and re-encoding billions of Pins takes days, so you are always serving a slightly stale index. Hot Pins (a viral image saved millions of times in an hour) create write hotspots in the engagement counters and invalidate cached recommendations for everyone who follows a board containing that Pin. Closed-loop feedback poisoning is subtle: the ranker trains on what it showed users, so popular content gets recommended more, which makes it more popular, and diversity drops unless the objective includes explicit exploration terms.
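An exploration term can be as simple as a bonus for Pins the system has rarely shown. The sketch below uses a UCB-style bonus as one illustrative option; the text does not say which exploration method is used, so UCB, the constant `c`, and the toy CTRs and impression counts are assumptions.

```python
import math
from typing import Dict, List, Tuple


def ucb_score(predicted_ctr: float, impressions: int, total_impressions: int, c: float = 0.1) -> float:
    """Exploitation (ranker score) plus a bonus that shrinks as a Pin accumulates impressions."""
    bonus = c * math.sqrt(math.log(total_impressions + 1) / (impressions + 1))
    return predicted_ctr + bonus


def rank_with_exploration(candidates: Dict[str, Tuple[float, int]], c: float = 0.1) -> List[str]:
    """candidates: pin_id -> (predicted CTR from the ranker, impressions served so far)."""
    total = sum(imps for _, imps in candidates.values())
    return sorted(
        candidates,
        key=lambda pid: ucb_score(candidates[pid][0], candidates[pid][1], total, c),
        reverse=True,
    )


candidates = {
    "viral_pin": (0.050, 1_000_000),  # already heavily shown
    "fresh_pin": (0.045, 10),         # slightly lower predicted CTR, barely shown
}
print(rank_with_exploration(candidates))
```

With `c=0.0` the same call ranks `viral_pin` first, which is the rich-get-richer loop described above; a positive `c` gives under-exposed Pins a chance to earn signal.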

In production

Pinterest built Pixie (a real-time recommendation engine based on random walks over the Pin-board graph) and PinSage (a graph convolutional network that produces Pin embeddings) in-house because off-the-shelf recommenders couldn't handle the multi-modal nature of the content: a Pin has an image, title, description, and a board context all at once. The real challenge is freshness versus quality. Re-indexing billions of Pins with new embeddings after model retraining is a days-long pipeline, so the online index always lags the latest model, and you ship stale recommendations during that window.
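Pixie's core idea is a random walk with restarts over the bipartite Pin-board graph: from a query Pin, hop to a board containing it, hop to another Pin on that board, and count which Pins are visited most. The sketch below is a heavily simplified, single-machine illustration with a hypothetical toy graph, step count, and restart probability; it omits refinements of the real system such as per-user biasing and early stopping.

```python
import random
from collections import Counter
from typing import Dict, List, Set


def build_pin_to_boards(boards: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert board -> Pins into Pin -> boards, giving both sides of the bipartite graph."""
    pin_to_boards: Dict[str, Set[str]] = {}
    for board, pins in boards.items():
        for pin in pins:
            pin_to_boards.setdefault(pin, set()).add(board)
    return pin_to_boards


def random_walk_recs(query_pin: str, boards: Dict[str, List[str]], steps: int = 10_000,
                     restart_prob: float = 0.3, k: int = 3, seed: int = 0) -> List[str]:
    """Random walk with restarts on the Pin-board graph; the most-visited Pins become recommendations."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    # Sorted lists make rng.choice deterministic regardless of set iteration order.
    pin_to_boards = {p: sorted(b) for p, b in build_pin_to_boards(boards).items()}
    if query_pin not in pin_to_boards:
        raise ValueError(f"{query_pin!r} is not on any board")
    visits: Counter = Counter()
    current = query_pin
    for _ in range(steps):
        board = rng.choice(pin_to_boards[current])  # Pin -> random board containing it
        current = rng.choice(boards[board])         # board -> random Pin on that board
        if current != query_pin:
            visits[current] += 1
        if rng.random() < restart_prob:
            current = query_pin                     # restarting keeps the walk near the query
    return [pin for pin, _ in visits.most_common(k)]


boards = {
    "kitchen_ideas": ["marble_counter", "open_shelves", "brass_faucet"],
    "modern_kitchens": ["marble_counter", "brass_faucet", "pendant_lights"],
    "garden": ["raised_beds", "tomato_cage"],
    "misc": ["marble_counter", "raised_beds"],
}
print(random_walk_recs("marble_counter", boards))
```

Because `brass_faucet` shares two boards with the query Pin, walks reach it far more often than Pins that share only one board, which is how co-curation by users becomes a similarity signal without any embeddings.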

Common mistakes

Chronological feed (misses discovery)
Exact online similarity over all Pins (no ANN index)
Serving images from origin


Part of System Design Library on SystemLore: system design interview prep with 148 deep topics, interactive diagrams, and a practice game.