System Design Library

Instagram

Photo/video sharing with feeds, at billions of media objects.

Open the interactive version → diagrams, practice & more

Requirements

Functional

  • Upload media
  • Feed of follows
  • Likes/comments
  • Stories
  • Explore

Non-functional

  • Fast media delivery
  • Durable storage
  • Read-heavy

Scale

Billions of photos; global users

The approach

Media → object store + global CDN; metadata + graph in sharded DB; feed via fan-out like Twitter; image processing (resize/transcode) async via queue + workers.

Key components

Upload → object store + queue → processing workers · metadata DB (sharded) · feed cache · CDN

Numbers that matter

Senior deep-dive

The feed is the hardest part — not media storage; object storage + CDN is commodity infrastructure, but generating a fresh ranked feed for 500M DAU in under 500ms requires a hybrid fan-out architecture.

Fan-out on write (push) for most users, fan-out on read (pull) for celebrities — a celebrity with 50M followers can't fan out on write (50M writes per post); their posts are injected at read time into a pre-built follower feed.

Media pipeline is async and idempotent — uploads succeed when the original lands in S3; resizing/transcoding/blurhash generation happen in a separate queue so the user gets their post ID immediately, and variants appear within seconds.

Media upload: the pipeline, not the storage

The naive approach — upload to S3, then synchronously resize, then return success — creates 2-10 second latencies. The correct pattern: accept upload, write original to S3, return post ID immediately, then publish an event to a queue (SQS/Kafka). Async workers pick up the job, generate variants (thumbnail, feed-size, full), write them back to S3 under deterministic keys (`/media/<post_id>/<variant>.jpg`). The app shows a blurhash placeholder until the CDN propagates the variants — usually under 5 seconds.

Feed storage: why Cassandra, not Postgres

A user's feed is a time-ordered list of post IDs for their followees. This is a perfect Cassandra use case: partition by user_id, cluster by timestamp DESC, with a bounded row size (keep last 1000 post IDs). Random reads (get feed for user X) are a single partition scan — O(1) regardless of total data size. Postgres with a timeline table hits cross-shard JOIN hell at Instagram's user count. The feed store holds IDs only — metadata is fetched separately and assembled by the API layer.

The celebrity problem: hybrid fan-out

Fan-out-on-write writes a post ID to every follower's feed inbox at publish time — great for average users (< 10K followers), but for accounts with millions of followers it's a write amplification bomb. The hybrid solution: maintain a "celebrity" flag on accounts above a follower threshold. Celebrity posts are not fanned out; instead, at feed-read time, the service fetches the last N posts from all celebrity followees and merges them into the pre-built feed. This shifts cost from write-time to read-time, but read-time is cheap (parallel fetches from a small celebrity post cache).

Sharding strategy: by user, not by content

User ID sharding for the social graph and metadata means all of a user's posts, followers, and follows live on the same shard — enabling local joins for profile pages and self-feed generation. Content (media) is not sharded — it lives in object storage keyed by content ID. The problem: cross-shard queries for things like 'all posts from user A, B, C' (building a feed) require scatter-gather across shards. This is why the pre-built feed cache is so important — it eliminates the scatter-gather on the hot path.

Ranking: from reverse-chronological to ML-scored

Early Instagram served feeds in reverse-chronological order — trivial to implement (sort by timestamp), but users missed posts when they were offline. The move to a ranked feed (2016) used signals like: predicted likes/comments, relationship strength (how often you interact), recency, and post type (video vs. photo). The ranking model runs as an online inference call over ~100-500 feed candidates, adding ~50-100ms to feed latency. Cache ranked feed snapshots (invalidated on new post or model update) to avoid ranking on every scroll.

What breaks at scale

Hot user shards: a shard hosting a celebrity account handles disproportionate read traffic — their posts' metadata is read by millions of feed-loading users. Mitigate with read replicas per shard and an aggressive metadata cache (Redis) in front of the shard. CDN origin storms: when a new post from a celebrity goes viral, millions of users simultaneously request a URL that isn't in CDN cache yet — this is the cache stampede. Prevent it with request coalescing at the CDN (only one request to origin per cache miss, queue others) and pre-warming for scheduled/predictable high-traffic events.

In production

Instagram's original architecture used PostgreSQL sharded by user ID, Cassandra for feeds, and Django — scaling to 1M users with 3 engineers. After the Facebook acquisition, the feed moved to a purpose-built ranking system with ML-scored candidates. The real engineering challenge is the dual-write consistency problem: when a user follows someone, their feed must retroactively include recent posts from the new followee — you can't re-scan all historical posts, so systems maintain a time-bounded backfill window (last 24-48h posts from new followee are injected into the feed on follow) and rely on the user scrolling to discover older content.

Common mistakes

Related System Design Library

Part of System Design Library on SystemLore — system design interview prep with 148 deep topics, interactive diagrams, and a practice game. Practice this one →