Academy · Real-World System Designs

Design video streaming (YouTube)

Upload, store, transcode, and stream video to millions globally on every device and network.

The problem

Accept video uploads, then store, transcode, and stream them to millions of viewers worldwide, on every device and under every network condition.

The idea

Async transcoding pipeline + object storage + CDN delivery; metadata kept separate from blobs.

How it works

Uploads are resumable and chunked. Each finished upload lands in object storage and enqueues a transcode job. Workers build an ABR (adaptive bitrate) ladder: the same video at many bitrates and resolutions, each segmented into chunks of roughly 2–10 s (HLS/DASH) so the player can switch renditions mid-stream as bandwidth changes. Chunks are served from a CDN with an ISP-embedded edge tier, so the origin stays out of the hot path and handles only cache misses. Metadata (title, owner, view count) lives in a DB + cache, separate from the blobs. View counts are aggregated asynchronously through a queue rather than incremented synchronously on every view.
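To make the last point concrete, here is a minimal sketch of queue-based view counting. It is an illustration under assumptions, not how any particular service implements it: `ViewCounter`, `record_view`, and `flush` are hypothetical names, the `deque` stands in for a durable message queue, and the `Counter` stands in for the view-count column in the metadata DB.

```python
from collections import Counter, deque


class ViewCounter:
    """Buffers view events and applies them to the store in batches."""

    def __init__(self) -> None:
        self.queue: deque = deque()     # assumption: stand-in for a durable queue
        self.store: Counter = Counter() # assumption: stand-in for the metadata DB

    def record_view(self, video_id: str) -> None:
        # Hot path: enqueue only; no database write happens per view.
        self.queue.append(video_id)

    def flush(self) -> int:
        # Worker path: drain the queue and collapse events per video.
        batch: Counter = Counter()
        while self.queue:
            batch[self.queue.popleft()] += 1
        for video_id, delta in batch.items():
            self.store[video_id] += delta  # one write per video, not per view
        return len(batch)                  # number of videos updated
```

A worker would call `flush()` on a timer or once the queue reaches a size threshold. The displayed count lags slightly behind reality, which is usually an acceptable price for removing a write from every playback request.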

The tradeoff

Transcoding is heavy and async, so a video isn't instantly watchable — you trade immediacy for delivery efficiency. *Storage and egress dominate cost* (one source becomes many renditions × many chunks), which is why hot content is pushed to ISP edges and cold content tiers to cheaper storage. The metadata/blob split lets each scale independently — small hot rows vs huge cold objects.
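A back-of-the-envelope calculation shows why storage and egress dominate. The sketch below is illustrative only: the ladder bitrates, the 6 s chunk length, the view count, and the watch fraction are all assumed numbers, and the function names are hypothetical.

```python
import math

# Assumed ladder: rendition name -> video bitrate in kbit/s (illustrative values).
LADDER_KBPS = {"240p": 400, "480p": 1000, "720p": 2500, "1080p": 5000}


def rendition_bytes(bitrate_kbps: int, duration_s: int) -> int:
    """Bytes needed for one rendition: bits per second x seconds / 8."""
    return bitrate_kbps * 1000 * duration_s // 8


def storage_bytes(duration_s: int) -> int:
    """Total stored bytes for one source video across the whole ladder."""
    return sum(rendition_bytes(b, duration_s) for b in LADDER_KBPS.values())


def segment_objects(duration_s: int, segment_s: int) -> int:
    """Number of chunk objects: chunks per rendition x renditions."""
    return math.ceil(duration_s / segment_s) * len(LADDER_KBPS)


def egress_bytes(views: int, avg_kbps: int, duration_s: int,
                 watch_fraction: float) -> int:
    """Bytes delivered if each view streams watch_fraction of the video."""
    return int(views * rendition_bytes(avg_kbps, duration_s) * watch_fraction)


ten_minutes = 600
print(storage_bytes(ten_minutes))                       # 667500000 (~0.67 GB)
print(segment_objects(ten_minutes, 6))                  # 400
print(egress_bytes(1_000_000, 2500, ten_minutes, 0.5))  # 93750000000000 (~94 TB)
```

Under these assumptions, one 10-minute upload becomes 400 objects and about 0.67 GB of storage, while a million half-watched views move roughly 94 TB. Transcoding runs once per upload, but egress recurs on every view.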

In the wild

YouTube, Netflix (which caches video inside ISP networks).

Interview deep dive

Flow

  1. Resumable chunked upload lands in object storage.
  2. Transcode workers build an ABR ladder of segmented renditions.
  3. Chunks serve from the CDN, ideally from an ISP-embedded edge; the origin handles only cache misses.
  4. Player adapts bitrate per chunk to current bandwidth.
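Steps 2 and 4 can be sketched together as a ladder plus a per-chunk selection rule. This is a simplified throughput-based heuristic under stated assumptions, not the algorithm any real player ships: the ladder values, the 0.8 safety factor, and the names `Rendition`, `pick_rendition`, and `play` are all hypothetical. Production players typically also consider buffer occupancy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# Assumed ABR ladder (illustrative values).
LADDER = [
    Rendition("240p", 400),
    Rendition("480p", 1000),
    Rendition("720p", 2500),
    Rendition("1080p", 5000),
]


def pick_rendition(ladder, measured_kbps: float, safety: float = 0.8) -> Rendition:
    """Pick the highest rendition that fits within a fraction of measured throughput."""
    budget = measured_kbps * safety
    affordable = [r for r in ladder if r.bitrate_kbps <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest rung rather than stalling.
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(affordable, key=lambda r: r.bitrate_kbps)


def play(ladder, throughput_samples_kbps):
    """Choose a rendition independently for each chunk, one throughput sample per chunk."""
    return [pick_rendition(ladder, t).name for t in throughput_samples_kbps]


print(play(LADDER, [6500, 3000, 900, 300, 8000]))
# ['1080p', '480p', '240p', '240p', '1080p']
```

Because every rendition is cut at the same chunk boundaries, the player can change quality between chunks without restarting playback.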

Watch for

Interviewer trap

Name ABR chunking explicitly, and state that egress and storage, not compute, are the dominant costs.