System Design Library

Dropbox / File Sync

Sync files across a user's devices reliably and efficiently, including huge files.

Requirements

Functional

  • Upload/download
  • Sync across devices
  • Versioning
  • Sharing
  • Conflict handling

Non-functional

  • Bandwidth-efficient
  • Durable
  • Handles multi-GB files

Scale

Petabytes; large files

The approach

Chunk files (e.g. 4MB), content-address each chunk (hash), dedupe, and only transfer changed chunks. Metadata service tracks file→chunk lists & versions; chunks live in object storage.

Key components

Client (chunker/differ) → metadata service + chunk store (object storage) · notification/sync service

Senior deep-dive

Content-addressing is the design — hashing chunks by content means identical chunks (same file copied, common library files) are stored exactly once, making deduplication automatic and transfers delta-efficient.

The chunk transfer protocol is the product: only chunks absent from the server are uploaded; a file rename or move with no content change uploads zero bytes — this is only possible because metadata (file→chunk list) and data (chunks) are decoupled.

Conflict resolution is the hardest UX problem: last-write-wins (by timestamp) loses data; Dropbox's approach is to create a conflict copy rather than silently overwrite — operationally safe but leaves cleanup to the user.

Chunking and content addressing: the foundation

Files are split into chunks, typically fixed-size blocks of about 4MB (the block size Dropbox documents for its content hash). An alternative is content-defined chunking, which uses a rolling hash such as Rabin fingerprinting to cut at boundaries determined by the content, so inserting bytes near the start of a file does not shift and invalidate every subsequent chunk. Each chunk's SHA-256 hash is its storage key. Upload means: compute hashes locally, send the list to the server, the server replies with the hashes it doesn't have, and the client uploads only those. Bandwidth then scales with the amount of new content rather than with file size.
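The upload handshake described above can be sketched roughly as follows. This is a minimal illustration, not Dropbox's actual protocol: `BlockServer`, `missing`, `put`, and `upload` are hypothetical names, the server is an in-memory dictionary, and chunks are fixed-size.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB, the example chunk size used in the text


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_hash(chunk: bytes) -> str:
    """The chunk's SHA-256 digest doubles as its storage key."""
    return hashlib.sha256(chunk).hexdigest()


class BlockServer:
    """Hypothetical in-memory stand-in for the chunk store (hash -> bytes)."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def missing(self, hashes: list[str]) -> set[str]:
        """Return the subset of hashes the server does not hold yet."""
        return {h for h in hashes if h not in self.blocks}

    def put(self, digest: str, chunk: bytes) -> None:
        """Store a chunk, rejecting data that does not match its key."""
        if chunk_hash(chunk) != digest:
            raise ValueError("chunk does not match its hash")
        self.blocks[digest] = chunk


def upload(server: BlockServer, data: bytes,
           size: int = CHUNK_SIZE) -> tuple[list[str], int]:
    """Upload only missing chunks; return (ordered hash list, bytes sent)."""
    chunks = split_chunks(data, size)
    hashes = [chunk_hash(c) for c in chunks]
    needed = server.missing(hashes)
    sent = 0
    for digest, chunk in zip(hashes, chunks):
        if digest in needed:
            server.put(digest, chunk)
            needed.discard(digest)  # a chunk repeated within the file is sent once
            sent += len(chunk)
    return hashes, sent
```

The returned hash list is what the metadata service would record as the file's manifest; uploading the same content a second time sends zero bytes because every hash is already known.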

Metadata vs. block service: two separate concerns

The metadata service maps (user, path) → (file version, ordered list of chunk hashes). It's a small, transactional store — must support atomic version updates. The block service (object store) maps hash → bytes. It's append-only and immutable — chunks are never modified, only added or garbage collected. Decoupling these means the metadata service can be a Postgres cluster (for ACID) while the block store scales to exabytes independently. Most sync bugs live in the metadata service's version reconciliation logic.
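A minimal sketch of the atomic version update, assuming an in-memory store guarded by a lock in place of a transactional database. `MetadataStore`, `Manifest`, `VersionConflict`, and `commit` are hypothetical names, and the `known_blocks` set stands in for a lookup against the block service.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Manifest:
    version: int
    chunk_hashes: tuple[str, ...]


class VersionConflict(Exception):
    """Raised when the file changed since the client's base version."""


class MetadataStore:
    """Maps (user, path) -> Manifest with compare-and-set version updates."""

    def __init__(self, known_blocks: set[str]) -> None:
        self._lock = threading.Lock()  # stands in for a database transaction
        self._files: dict[tuple[str, str], Manifest] = {}
        self._known_blocks = known_blocks  # stands in for the block service

    def get(self, user: str, path: str) -> Optional[Manifest]:
        with self._lock:
            return self._files.get((user, path))

    def commit(self, user: str, path: str, base_version: int,
               chunk_hashes: list[str]) -> Manifest:
        """Publish a new version only if the base version is still current
        and every referenced chunk has already been stored."""
        with self._lock:
            current = self._files.get((user, path))
            current_version = current.version if current else 0
            if base_version != current_version:
                raise VersionConflict(
                    f"expected version {base_version}, found {current_version}")
            absent = [h for h in chunk_hashes if h not in self._known_blocks]
            if absent:
                raise ValueError(f"{len(absent)} chunk(s) not uploaded yet")
            manifest = Manifest(current_version + 1, tuple(chunk_hashes))
            self._files[(user, path)] = manifest
            return manifest
```

Because the manifest is replaced in one step, readers see either the old version or the new one, never a partial chunk list. A client that receives `VersionConflict` would re-fetch the current manifest and decide whether to retry or treat the edit as a conflict.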

Delta sync: the client-side algorithm

When a file changes, the client re-chunks it and computes hashes. It then diffs the new hash list against the old hash list stored in local state; only chunks whose hashes are new need uploading. For large files with small edits, this is transformative: editing a 1GB video's metadata (no pixel changes) uploads only the chunk or chunks that contain the metadata, a few MB instead of 1GB. The trap: Rabin fingerprinting adds CPU cost on the client. For frequently changed small files the hashing overhead dominates, so use fixed-size chunks there and content-defined chunking only for large files.
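The difference between fixed-size and content-defined chunking shows up clearly when a single byte is inserted at the front of a file. The sketch below is illustrative only: it uses a simple rolling byte sum over a sliding window as a stand-in for Rabin fingerprinting, and the window, mask, and size limits are arbitrary small values chosen for the demonstration. All function names are hypothetical.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
                           min_size: int = 32, max_size: int = 512) -> list[bytes]:
    """Cut where a rolling sum of the last `window` bytes matches `mask`.

    The rolling sum is a simplified substitute for a Rabin fingerprint:
    the cut decision depends only on nearby content, not on file offset.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        length = i - start + 1
        at_boundary = (rolling & mask) == mask
        if (length >= min_size and at_boundary) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def hashes(chunks: list[bytes]) -> list[str]:
    return [hashlib.sha256(c).hexdigest() for c in chunks]


def chunks_to_upload(old_hashes: list[str], new_hashes: list[str]) -> list[str]:
    """Diff the hash lists: keep new hashes absent from the old list, once each."""
    known = set(old_hashes)
    return list(dict.fromkeys(h for h in new_hashes if h not in known))


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20_000))
edited = b"X" + original  # one byte inserted at the front shifts everything

fixed_delta = chunks_to_upload(hashes(fixed_chunks(original)),
                               hashes(fixed_chunks(edited)))
cdc_delta = chunks_to_upload(hashes(content_defined_chunks(original)),
                             hashes(content_defined_chunks(edited)))
```

With fixed-size chunks every chunk after the insertion point changes, so `fixed_delta` covers the whole file. With content-defined boundaries the chunker resynchronizes shortly after the edit, so `cdc_delta` should contain only a small number of hashes. The price is the per-byte rolling computation, which is the CPU cost mentioned above.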

Conflict handling: don't silently destroy data

Two users (or one user on two devices) edit the same file offline and both sync. Last-write-wins silently destroys one version, which is unacceptable for documents. Dropbox's model: detect the conflict at sync time (the server version has changed since the client's last known version), keep both versions, rename the loser to `filename (Conflicted Copy from Device on Date).ext`, and present both to the user. This is safe but leaves cleanup work to the user. Richer schemes, such as vector clocks or a full per-file revision history, preserve more information about concurrent edits but are operationally heavier.
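The conflict-copy rule can be sketched in a few lines. This is an illustrative model, not Dropbox's implementation: `SyncServer`, `push`, and `conflicted_copy_name` are hypothetical names, versions are plain integers, and the date format in the file name is an assumption.

```python
from datetime import date
from pathlib import PurePosixPath


def conflicted_copy_name(path: str, device: str, on: date) -> str:
    """Build 'name (Conflicted Copy from Device on Date).ext' next to the original."""
    p = PurePosixPath(path)
    name = f"{p.stem} (Conflicted Copy from {device} on {on.isoformat()}){p.suffix}"
    return str(p.with_name(name))


class SyncServer:
    """Hypothetical server state: path -> (version, content)."""

    def __init__(self) -> None:
        self.files: dict[str, tuple[int, bytes]] = {}

    def push(self, path: str, base_version: int, content: bytes,
             device: str, today: date) -> str:
        """Store the content and return the path it was stored under.

        If the server version moved past the client's base version, the
        incoming edit is kept as a conflict copy instead of overwriting.
        """
        version, _ = self.files.get(path, (0, b""))
        if base_version == version:
            self.files[path] = (version + 1, content)
            return path
        copy_path = conflicted_copy_name(path, device, today)
        # Simplification: a second conflict from the same device on the
        # same day would overwrite this copy.
        self.files[copy_path] = (1, content)
        return copy_path
```

For example, if a desktop and a laptop both edit version 1 of `/docs/report.docx`, whichever pushes first becomes version 2 and the other edit lands in a separate conflict-copy file, so neither set of changes is lost.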

Notification channel: how clients know something changed

Polling for changes (for example every 30 seconds) adds up to a full polling interval of latency to every change and wastes requests when nothing has changed, which hurts collaborative workflows. Dropbox uses a long-polling notification server (originally called 'notify.dropbox.com'): the client holds an open HTTP connection; when a change occurs, the server responds to wake the client, which then fetches the delta from the metadata service. This is not a sync channel, just a wake signal. The actual change data comes from a separate authenticated API call. This pattern avoids pushing change data over a stateful channel while removing most of the polling latency.
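The split between the wake signal and the delta fetch can be modeled in-process with a condition variable. This is a sketch under simplifying assumptions: a real deployment would hold an HTTP connection open rather than block a thread, and `NotificationHub`, `ChangeLog`, and their methods are hypothetical names.

```python
import threading


class NotificationHub:
    """Carries no change data; it only tells waiters that something changed."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._cursor = 0  # incremented once per published change

    def publish_change(self) -> int:
        with self._cond:
            self._cursor += 1
            self._cond.notify_all()  # wake every held "long-poll" request
            return self._cursor

    def wait_for_change(self, since: int, timeout: float) -> bool:
        """Block until the cursor passes `since` or the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._cursor > since,
                                       timeout=timeout)


class ChangeLog:
    """Stand-in for the metadata service's change feed."""

    def __init__(self, hub: NotificationHub) -> None:
        self._hub = hub
        self._lock = threading.Lock()
        self._entries: list[str] = []

    def record(self, path: str) -> None:
        with self._lock:
            self._entries.append(path)
        self._hub.publish_change()  # signal only after the change is readable

    def fetch_delta(self, cursor: int) -> tuple[int, list[str]]:
        """Separate call that returns the actual changes after `cursor`."""
        with self._lock:
            return len(self._entries), self._entries[cursor:]


def sync_once(hub: NotificationHub, log: ChangeLog, cursor: int,
              timeout: float = 30.0) -> tuple[int, list[str]]:
    """One client iteration: wait for a wake signal, then fetch the delta."""
    if hub.wait_for_change(cursor, timeout):
        return log.fetch_delta(cursor)
    return cursor, []  # timed out; the client simply re-issues the long poll
```

The client loop calls `sync_once` repeatedly, carrying the returned cursor forward. A timeout is not an error; it only bounds how long one request is held open.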

What breaks at scale

Garbage collection is the hardest operational problem: when a file is deleted, or every reference to a chunk is removed, the block service still holds those bytes. Maintaining exact reference counts on a multi-exabyte store with billions of chunks is slow and error-prone, and collection must run continuously without blocking reads. Dropbox uses a mark-and-sweep approach with a separate GC pipeline. The second failure mode is hot files: a shared team folder edited simultaneously by 50 people creates 50 concurrent version-bump requests to the metadata service for the same file path. Serialize these with optimistic locking plus retry, or with a per-file coordinator.
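A mark-and-sweep pass over chunk references might look like the sketch below. It is a toy model, not Dropbox's pipeline: manifests and blocks are plain dictionaries, and the grace period (which protects chunks that were uploaded but whose manifest has not been committed yet) is an assumption added for the example. `mark` and `collect_garbage` are hypothetical names.

```python
def mark(manifests: dict[str, list[str]]) -> set[str]:
    """Mark phase: collect every chunk hash referenced by any live manifest."""
    live: set[str] = set()
    for chunk_hashes in manifests.values():
        live.update(chunk_hashes)
    return live


def collect_garbage(manifests: dict[str, list[str]],
                    blocks: dict[str, bytes],
                    uploaded_at: dict[str, float],
                    now: float,
                    grace_seconds: float = 3600.0) -> list[str]:
    """Sweep phase: delete unreferenced chunks older than the grace period.

    Returns the hashes that were removed. Chunks with no recorded upload
    time are treated as brand new and kept, which is the conservative choice.
    """
    live = mark(manifests)
    removed = []
    for digest in list(blocks):  # copy the keys so deletion is safe
        too_new = now - uploaded_at.get(digest, now) < grace_seconds
        if digest not in live and not too_new:
            del blocks[digest]
            uploaded_at.pop(digest, None)
            removed.append(digest)
    return removed
```

Unlike reference counting, this never needs a counter to be updated on the write path; the cost moves to a background scan of all manifests, which is why a production system would run it as a separate pipeline.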

In production

Dropbox originally built on top of Amazon S3 for chunk storage and later migrated to its own exabyte-scale object store (Magic Pocket) to reduce costs. Box and Google Drive use similar chunking approaches, but dedup scope is a design choice that varies: deduplicating only within one user's storage is simpler and leaks less information, while deduplicating globally across all users saves more space for identical chunks. The real engineering challenge is the metadata service: the chunk manifest (file → ordered list of chunk hashes) must support transactional version updates. If a sync fails mid-upload, the manifest must not reflect partial state, which requires two-phase commit or optimistic locking on the file version.
