Photo/video sharing with feeds, at billions of media objects.
Open the interactive version → diagrams, practice & moreRequirements
Functional
- Upload media
- Feed of follows
- Likes/comments
- Stories
- Explore
Non-functional
- Fast media delivery
- Durable storage
- Read-heavy
Scale
Billions of photos; global users
The approach
Media → object store + global CDN; metadata + graph in sharded DB; feed via fan-out like Twitter; image processing (resize/transcode) async via queue + workers.
Key components
Upload → object store + queue → processing workers · metadata DB (sharded) · feed cache · CDN
Numbers that matter
- Instagram serves ~100 billion photos viewed per day — almost entirely from CDN edge caches, with origin S3 hit rates under 1% for popular content
- A celebrity post to 50M followers at fan-out-on-write would create 50M DB writes in seconds — the threshold to switch to pull-on-read fan-out is typically ~1M followers
- Instagram stores images at multiple resolutions (thumbnail ~150px, feed ~640px, full ~1080px) — transcoding is done once async; the CDN caches each variant independently
- Feed generation latency budget: pre-built feed cache hit returns in ~50ms; a cache miss requiring live candidate ranking must still complete in < 500ms or users see loading spinners
Senior deep-dive
The feed is the hardest part — not media storage; object storage + CDN is commodity infrastructure, but generating a fresh ranked feed for 500M DAU in under 500ms requires a hybrid fan-out architecture.
Fan-out on write (push) for most users, fan-out on read (pull) for celebrities — a celebrity with 50M followers can't fan out on write (50M writes per post); their posts are injected at read time into a pre-built follower feed.
Media pipeline is async and idempotent — uploads succeed when the original lands in S3; resizing/transcoding/blurhash generation happen in a separate queue so the user gets their post ID immediately, and variants appear within seconds.
Media upload: the pipeline, not the storage
The naive approach — upload to S3, then synchronously resize, then return success — creates 2-10 second latencies. The correct pattern: accept upload, write original to S3, return post ID immediately, then publish an event to a queue (SQS/Kafka). Async workers pick up the job, generate variants (thumbnail, feed-size, full), write them back to S3 under deterministic keys (`/media/<post_id>/<variant>.jpg`). The app shows a blurhash placeholder until the CDN propagates the variants — usually under 5 seconds.
Feed storage: why Cassandra, not Postgres
A user's feed is a time-ordered list of post IDs for their followees. This is a perfect Cassandra use case: partition by user_id, cluster by timestamp DESC, with a bounded row size (keep last 1000 post IDs). Random reads (get feed for user X) are a single partition scan — O(1) regardless of total data size. Postgres with a timeline table hits cross-shard JOIN hell at Instagram's user count. The feed store holds IDs only — metadata is fetched separately and assembled by the API layer.
The celebrity problem: hybrid fan-out
Fan-out-on-write writes a post ID to every follower's feed inbox at publish time — great for average users (< 10K followers), but for accounts with millions of followers it's a write amplification bomb. The hybrid solution: maintain a "celebrity" flag on accounts above a follower threshold. Celebrity posts are not fanned out; instead, at feed-read time, the service fetches the last N posts from all celebrity followees and merges them into the pre-built feed. This shifts cost from write-time to read-time, but read-time is cheap (parallel fetches from a small celebrity post cache).
Sharding strategy: by user, not by content
User ID sharding for the social graph and metadata means all of a user's posts, followers, and follows live on the same shard — enabling local joins for profile pages and self-feed generation. Content (media) is not sharded — it lives in object storage keyed by content ID. The problem: cross-shard queries for things like 'all posts from user A, B, C' (building a feed) require scatter-gather across shards. This is why the pre-built feed cache is so important — it eliminates the scatter-gather on the hot path.
Ranking: from reverse-chronological to ML-scored
Early Instagram served feeds in reverse-chronological order — trivial to implement (sort by timestamp), but users missed posts when they were offline. The move to a ranked feed (2016) used signals like: predicted likes/comments, relationship strength (how often you interact), recency, and post type (video vs. photo). The ranking model runs as an online inference call over ~100-500 feed candidates, adding ~50-100ms to feed latency. Cache ranked feed snapshots (invalidated on new post or model update) to avoid ranking on every scroll.
What breaks at scale
Hot user shards: a shard hosting a celebrity account handles disproportionate read traffic — their posts' metadata is read by millions of feed-loading users. Mitigate with read replicas per shard and an aggressive metadata cache (Redis) in front of the shard. CDN origin storms: when a new post from a celebrity goes viral, millions of users simultaneously request a URL that isn't in CDN cache yet — this is the cache stampede. Prevent it with request coalescing at the CDN (only one request to origin per cache miss, queue others) and pre-warming for scheduled/predictable high-traffic events.
In production
Instagram's original architecture used PostgreSQL sharded by user ID, Cassandra for feeds, and Django — scaling to 1M users with 3 engineers. After the Facebook acquisition, the feed moved to a purpose-built ranking system with ML-scored candidates. The real engineering challenge is the dual-write consistency problem: when a user follows someone, their feed must retroactively include recent posts from the new followee — you can't re-scan all historical posts, so systems maintain a time-bounded backfill window (last 24-48h posts from new followee are injected into the feed on follow) and rely on the user scrolling to discover older content.
Common mistakes
- Resizing images on the fly per request
- One giant unsharded metadata DB
- Serving media from origin instead of CDN