System Design Library

YouTube / Netflix (video)

Upload, store, transcode and stream video to millions globally on any device/network.

Requirements

Functional

  • Upload
  • Transcode to many bitrates
  • Adaptive streaming
  • Recommendations
  • Views/likes

Non-functional

  • Smooth global playback
  • Massive storage & egress

Scale

Exabytes; billions of views

The approach

Upload → object store + queue → transcoding workers produce HLS/DASH renditions (multiple bitrates) → served via CDN near users. Metadata lives in a DB plus cache; viewers should not stream directly from origin, which only fills CDN cache misses.
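The flow above can be sketched end to end with in-memory stand-ins. This is a minimal illustration, not a production design: `object_store`, `metadata_db`, `handle_upload`, `run_worker_once`, and the key layout are all hypothetical names, and the "encoder" just prefixes the raw bytes.

```python
import queue

# In-memory stand-ins (assumptions) for object storage, the metadata DB,
# and the transcode job queue.
object_store = {}
metadata_db = {}
transcode_queue = queue.Queue()


def handle_upload(video_id, raw_bytes, title):
    """Store the raw file, record metadata, enqueue a job, return at once."""
    object_store[f"raw/{video_id}"] = raw_bytes
    metadata_db[video_id] = {"title": title, "status": "processing", "renditions": []}
    transcode_queue.put(video_id)
    return metadata_db[video_id]["status"]


def run_worker_once(renditions=("480p", "720p")):
    """Take one job off the queue and write one fake segment per rendition."""
    video_id = transcode_queue.get()
    raw = object_store[f"raw/{video_id}"]
    for rendition in renditions:
        # A real worker would invoke an encoder here; this is a placeholder.
        object_store[f"hls/{video_id}/{rendition}/seg0.ts"] = b"encoded:" + raw
    metadata_db[video_id]["renditions"] = list(renditions)
    metadata_db[video_id]["status"] = "ready"
    transcode_queue.task_done()
    return video_id
```

The uploader gets a response as soon as the raw bytes are stored and the job is queued; the video only flips to "ready" once a worker has finished.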

Key components

Upload → object store + queue → transcode workers → CDN · metadata DB + cache · recsys

Senior deep-dive

Adaptive bitrate is the core UX win: the player switches rendition to match bandwidth, so a slow network lowers quality instead of stalling playback.

That means pre-transcoding many versions (resolutions × bitrates): heavy, async work triggered by upload and, for popular content, never done while a viewer waits.

Storage and egress dominate the bill — serve everything from a CDN near users; Netflix even pushes caches inside ISPs (Open Connect).

Adaptive bitrate streaming (HLS/DASH)

The video is split into short segments (2–10s) at multiple bitrates, described by a manifest. The player reads the manifest and picks each segment's bitrate from current bandwidth — dropping quality to avoid a stall, climbing back when the network recovers. This client-side switching is the whole playback UX.
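A minimal sketch of throughput-based rendition selection follows. The ladder values, the `safety` factor, and the names `Rendition`, `pick_rendition`, and `simulate` are illustrative assumptions; real players typically also weigh buffer occupancy, which is left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# Hypothetical bitrate ladder, sorted from lowest to highest bitrate.
LADDER = [
    Rendition("240p", 400),
    Rendition("480p", 1200),
    Rendition("720p", 2800),
    Rendition("1080p", 5000),
]


def pick_rendition(ladder, measured_kbps, safety=0.8):
    """Pick the highest rendition that fits within a safety fraction of the
    measured throughput; fall back to the lowest rung if nothing fits."""
    budget = measured_kbps * safety
    chosen = ladder[0]
    for rendition in ladder:
        if rendition.bitrate_kbps <= budget:
            chosen = rendition
    return chosen


def simulate(throughput_samples_kbps):
    """Choose a rendition per segment from one throughput sample each."""
    return [pick_rendition(LADDER, t).name for t in throughput_samples_kbps]
```

For example, `simulate([6500, 3000, 900, 300, 4000])` steps down from 1080p as throughput falls and climbs back to 720p once it recovers.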

Transcoding is heavy, async, and parallel

On upload a transcode farm produces every rendition (resolutions × codecs × bitrates) — CPU/GPU-intensive work that must be off the upload path (queue + workers). It is embarrassingly parallel: split the video, transcode segments concurrently, reassemble. Never make the uploader or viewer wait on it.
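The split, transcode, reassemble pattern might look like the sketch below. It assumes a toy model where a video is a list of frames and `transcode_segment` is a placeholder for a real encoder; a thread pool stands in for what would be a fleet of worker processes or machines. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(frames, segment_len):
    """Cut the frame list into fixed-size segments (the last may be shorter)."""
    return [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]


def transcode_segment(segment, rendition):
    """Placeholder encoder: tags each frame with the target rendition."""
    return [f"{frame}@{rendition}" for frame in segment]


def transcode_video(frames, renditions, segment_len=4, workers=4):
    """Transcode every (rendition, segment) pair concurrently, then reassemble."""
    segments = split_into_segments(frames, segment_len)
    jobs = [(r, i) for r in renditions for i in range(len(segments))]

    def run(job):
        rendition, index = job
        return rendition, index, transcode_segment(segments[index], rendition)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run, jobs))

    # Reassemble each rendition in segment order, whatever order jobs finished in.
    output = {r: [] for r in renditions}
    for rendition, index, encoded in sorted(results, key=lambda x: (x[0], x[1])):
        output[rendition].extend(encoded)
    return output
```

Each job depends only on its own segment and target rendition, which is what makes the work easy to spread across as many workers as the budget allows.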

The CDN is the product at scale

With billions of views, egress bandwidth is the dominant cost and the latency lever. Pre-position popular content on CDN edges near users; never stream from origin. Netflix's Open Connect goes further — appliances inside ISPs — because the cheapest byte is the one served closest to the viewer.
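As a rough illustration of why edge caching cuts origin traffic, here is a small LRU edge cache with pre-positioning. The `EdgeCache` class, its `prewarm` method, and the LRU eviction policy are assumptions for this sketch; real CDNs use more elaborate placement and eviction.

```python
from collections import OrderedDict


class EdgeCache:
    """Toy LRU cache for video segments at one CDN edge (hypothetical)."""

    def __init__(self, capacity, fetch_from_origin):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin  # callable: key -> bytes
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _insert(self, key, data):
        self.store[key] = data
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

    def prewarm(self, keys):
        """Pre-position popular segments before any viewer asks for them."""
        for key in keys:
            if key not in self.store:
                self._insert(key, self.fetch_from_origin(key))

    def get(self, key):
        """Serve from the edge when possible; otherwise fill from origin."""
        if key in self.store:
            self.store.move_to_end(key)
            self.hits += 1
            return self.store[key]
        self.misses += 1
        data = self.fetch_from_origin(key)
        self._insert(key, data)
        return data
```

Every hit is a byte that never leaves the origin, so the hit ratio on popular content is the number that drives egress cost.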

Storage tiering for the long tail

A tiny fraction of videos get most views. Keep hot content on fast storage + CDN; tier cold content to cheaper storage with fewer renditions. You don't need every rendition of an unwatched video pre-made — generate cold renditions lazily on first demand.
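One way to picture popularity tiering plus lazy rendition generation is sketched below. The threshold, the two ladders, and the names `plan_storage` and `RenditionStore` are illustrative assumptions rather than values any real service is known to use.

```python
FULL_LADDER = ("240p", "480p", "720p", "1080p")
COLD_LADDER = ("480p",)  # assumption: cold videos keep a single rendition


def plan_storage(views_last_30d, hot_threshold=1000):
    """Decide the storage tier and which renditions to keep pre-made."""
    if views_last_30d >= hot_threshold:
        return {"tier": "hot", "renditions": FULL_LADDER}
    return {"tier": "cold", "renditions": COLD_LADDER}


class RenditionStore:
    """Generates a missing rendition on first request, then reuses it."""

    def __init__(self, transcode):
        self.transcode = transcode  # callable: (video_id, rendition) -> bytes
        self.stored = {}

    def get(self, video_id, rendition):
        key = (video_id, rendition)
        if key not in self.stored:
            # First demand for a cold rendition pays the transcode cost once.
            self.stored[key] = self.transcode(video_id, rendition)
        return self.stored[key]
```

The trade-off is that the first viewer of a cold rendition waits for a transcode (or is served a lower rendition in the meantime), in exchange for not storing output that nobody watches.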

Upload, metadata, and counts

Upload is a resumable, chunked transfer to object storage; metadata (title, owner) lives in a DB + cache. View and like counts are high-volume and approximate — aggregate them asynchronously rather than incrementing a row per view.
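Asynchronous count aggregation can be sketched as an in-memory buffer that is flushed in batches. The `ViewCounter` class, its `flush_every` parameter, and the `totals` dict standing in for the database are hypothetical; a real system would more likely buffer events in a log or stream and aggregate them in a separate job.

```python
from collections import Counter


class ViewCounter:
    """Buffers view events and writes per-video deltas in batches."""

    def __init__(self, flush_every=1000):
        self.flush_every = flush_every
        self.pending = Counter()
        self.pending_events = 0
        self.totals = {}    # stand-in for the counts table
        self.db_writes = 0  # how many row updates were issued

    def record_view(self, video_id):
        """Cheap in-memory increment; no database write per view."""
        self.pending[video_id] += 1
        self.pending_events += 1
        if self.pending_events >= self.flush_every:
            self.flush()

    def flush(self):
        """Apply one update per video that had views in this batch."""
        for video_id, delta in self.pending.items():
            self.totals[video_id] = self.totals.get(video_id, 0) + delta
            self.db_writes += 1
        self.pending.clear()
        self.pending_events = 0
```

Counts lag by at most one batch, which is the approximation the text accepts in return for far fewer writes.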

What breaks at scale

The hard parts are transcode throughput/cost, CDN egress economics, and storage for the rendition explosion. Tier storage by popularity, generate cold renditions lazily, push caching as close to viewers as possible. Recommendations and search are separate systems on top — the spine is transcode + CDN.

In production

YouTube and Netflix both run ingest → object storage → async transcode farm → HLS/DASH segments → global CDN. Netflix's Open Connect puts its own caching appliances inside ISP networks so popular titles are served from within the viewer's ISP rather than across the wider internet. The interesting engineering is the transcode pipeline and CDN economics, not the upload.
