Dropbox / File Sync
Sync files across a user's devices reliably and efficiently, including huge files.
Requirements
Functional
- Upload/download
- Sync across devices
- Versioning
- Sharing
- Conflict handling
Non-functional
- Bandwidth-efficient
- Durable
- Handles multi-GB files
Scale
Petabytes; large files
The approach
Chunk files (e.g. 4MB), content-address each chunk (hash), dedupe, and only transfer changed chunks. Metadata service tracks file→chunk lists & versions; chunks live in object storage.
Key components
Client (chunker/differ) → metadata service + chunk store (object storage) · notification/sync service
Numbers that matter
- 4MB chunk size is the Dropbox default — balances metadata overhead (smaller chunks = more entries in the chunk manifest) against transfer efficiency (larger chunks = more re-upload on small edits)
- A typical office document changes < 5% of its bytes between saves; chunking means only 1-2 out of 25 chunks (4MB each in a 100MB file) need re-uploading per save
- Dropbox stores exabytes of user data; their deduplication ratio across all users is estimated at ~30-40% storage saved from cross-user identical chunk dedup alone
- The sync client detects remote changes either by polling the metadata service or through a long-poll / push notification channel (Dropbox uses a dedicated notification server); the polling interval was ~30s early on, and the notification channel now makes change detection effectively real-time
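The chunk arithmetic in the second bullet can be checked with a short calculation. This is a minimal sketch that assumes fixed-size chunks and an in-place overwrite that does not shift later bytes; the function name `chunks_touched` is illustrative and not part of any Dropbox tool.

```python
import math

MB = 1024 * 1024
CHUNK_SIZE = 4 * MB  # the 4MB chunk size quoted above


def chunks_touched(file_size: int, edit_offset: int, edit_length: int,
                   chunk_size: int = CHUNK_SIZE) -> tuple[int, int]:
    """Return (total_chunks, chunks_overlapping_the_edit).

    Assumes fixed-size chunks and an overwrite of edit_length > 0 bytes
    that leaves all other bytes at their original offsets.
    """
    total = math.ceil(file_size / chunk_size)
    first = edit_offset // chunk_size
    last = (edit_offset + edit_length - 1) // chunk_size
    return total, last - first + 1


# A 100MB file with a 3MB overwrite starting at the 10MB mark.
total, touched = chunks_touched(100 * MB, edit_offset=10 * MB, edit_length=3 * MB)
print(total, touched)  # 25 chunks in total, 2 of them overlap the edit
```

A 3MB edit is 3% of the file, yet it can straddle a chunk boundary, which is why the estimate is "1-2 chunks" rather than exactly one.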
Senior deep-dive
Content-addressing is the design — hashing chunks by content means identical chunks (same file copied, common library files) are stored exactly once, making deduplication automatic and transfers delta-efficient.
The chunk transfer protocol is the product: only chunks absent from the server are uploaded; a file rename or move with no content change uploads zero bytes — this is only possible because metadata (file→chunk list) and data (chunks) are decoupled.
Conflict resolution is the hardest UX problem: last-write-wins (by timestamp) loses data; Dropbox's approach is to create a conflict copy rather than silently overwrite — operationally safe but leaves cleanup to the user.
Chunking and content addressing: the foundation
Files are split into chunks, and each chunk's SHA-256 hash is its storage key. The simple scheme uses fixed-size chunks (4MB by default; Dropbox's documented content hash, for example, is computed over fixed 4MB blocks). The alternative is content-defined chunking, where a rolling hash such as a Rabin fingerprint picks the boundaries from the content itself, so inserting bytes near the start of a file does not shift and invalidate every subsequent chunk. Upload means: compute hashes locally, send the list to the server, server replies with which hashes it doesn't have, client uploads only those. Bandwidth then scales with the amount of new content rather than with file size.
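The upload handshake described above can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not Dropbox's actual protocol: `BlockServer`, `missing`, `put`, and `upload` are hypothetical names, and the demo uses 8-byte chunks so the behavior is easy to follow.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int) -> list[tuple[str, bytes]]:
    """Split data into fixed-size chunks and pair each with its SHA-256 hex digest."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]


class BlockServer:
    """Hypothetical content-addressed block store: hash -> bytes."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def missing(self, hashes: list[str]) -> set[str]:
        """Tell the client which of the offered hashes are not stored yet."""
        return {h for h in hashes if h not in self.blocks}

    def put(self, digest: str, chunk: bytes) -> None:
        """Store a chunk after verifying that it matches its claimed hash."""
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError("chunk does not match its hash")
        self.blocks[digest] = chunk


def upload(server: BlockServer, data: bytes, chunk_size: int) -> tuple[list[str], int]:
    """Upload only absent chunks; return (manifest, bytes actually sent)."""
    chunks = chunk_hashes(data, chunk_size)
    manifest = [h for h, _ in chunks]
    needed = server.missing(manifest)
    sent = 0
    for digest, chunk in chunks:
        if digest in needed:
            server.put(digest, chunk)
            needed.discard(digest)  # a chunk repeated inside the file is sent once
            sent += len(chunk)
    return manifest, sent


server = BlockServer()
v1 = b"a" * 8 + b"b" * 8 + b"c" * 8
v2 = b"a" * 8 + b"X" * 8 + b"c" * 8   # only the middle chunk changed
print(upload(server, v1, chunk_size=8)[1])  # 24: first upload sends everything
print(upload(server, v2, chunk_size=8)[1])  # 8: only the changed chunk is sent
print(upload(server, v2, chunk_size=8)[1])  # 0: identical content sends nothing
```

The last call is the rename or copy case: the manifest is resubmitted, but no chunk bytes cross the wire.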
Metadata vs. block service: two separate concerns
The metadata service maps (user, path) → (file version, ordered list of chunk hashes). It's a small, transactional store — must support atomic version updates. The block service (object store) maps hash → bytes. It's append-only and immutable — chunks are never modified, only added or garbage collected. Decoupling these means the metadata service can be a Postgres cluster (for ACID) while the block store scales to exabytes independently. Most sync bugs live in the metadata service's version reconciliation logic.
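The atomic version update can be illustrated with a compare-and-swap on the version column. This is a sketch using SQLite in place of a Postgres cluster; the table layout, the `commit_version` function, and the convention that `base_version=0` means "file does not exist yet" are all assumptions made for the example.

```python
import json
import sqlite3


def open_metadata_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the metadata store."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE files (
               user_id      TEXT NOT NULL,
               path         TEXT NOT NULL,
               version      INTEGER NOT NULL,
               chunk_hashes TEXT NOT NULL,      -- JSON list, order matters
               PRIMARY KEY (user_id, path))"""
    )
    return db


def commit_version(db: sqlite3.Connection, user_id: str, path: str,
                   base_version: int, chunk_hashes: list[str]) -> bool:
    """Commit a new manifest only if the file is still at base_version.

    Returns False when another writer got there first (a stale base_version).
    """
    manifest = json.dumps(chunk_hashes)
    with db:  # one transaction: commits on success, rolls back on exception
        if base_version == 0:
            cur = db.execute(
                "INSERT OR IGNORE INTO files VALUES (?, ?, 1, ?)",
                (user_id, path, manifest),
            )
        else:
            cur = db.execute(
                "UPDATE files SET version = version + 1, chunk_hashes = ? "
                "WHERE user_id = ? AND path = ? AND version = ?",
                (manifest, user_id, path, base_version),
            )
        return cur.rowcount == 1


db = open_metadata_db()
print(commit_version(db, "alice", "/notes.txt", 0, ["h1"]))        # True: created as v1
print(commit_version(db, "alice", "/notes.txt", 1, ["h1", "h2"]))  # True: v1 -> v2
print(commit_version(db, "alice", "/notes.txt", 1, ["h3"]))        # False: base is stale
```

A writer that gets `False` back re-reads the current version and either retries or falls into conflict handling; the manifest never reflects a half-applied update because the check and the write happen in one statement.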
Delta sync: the client-side algorithm
When a file changes, the client re-chunks it and computes hashes. It then diffs the new hash list against the old hash list stored in local state; only hashes that are new need uploading. For large files with small edits, this is transformative: editing a 1GB video's metadata (no pixel changes) uploads only the chunk or two that contain the metadata, a few MB instead of 1GB. The trap: content-defined chunking (Rabin fingerprinting) adds CPU cost on the client. For frequently changed small files the chunking and hashing overhead dominates, so use fixed-size chunks there and content-defined chunking only for large files.
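The hash-list diff, and the reason content-defined boundaries matter for it, can be shown on synthetic data. This sketch is not Rabin fingerprinting: it substitutes a simpler gear-style rolling hash, omits the minimum and maximum chunk sizes a real chunker would enforce, and uses tiny chunks (about 256 bytes) so it runs quickly. All function names are illustrative.

```python
import hashlib
import random

_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # one random value per byte value


def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, bits: int = 8) -> list[bytes]:
    """Cut wherever the top `bits` bits of a rolling hash are zero.

    The hash depends only on the previous 32 bytes, so boundaries follow the
    content rather than absolute offsets. Average chunk size is about 2**bits.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def hashes(chunks: list[bytes]) -> list[str]:
    return [hashlib.sha256(c).hexdigest() for c in chunks]


def to_upload(old_hashes: list[str], new_hashes: list[str]) -> list[str]:
    """Delta sync: hashes in the new manifest that the old manifest lacks."""
    known = set(old_hashes)
    return [h for h in dict.fromkeys(new_hashes) if h not in known]


old = bytes(_rng.getrandbits(8) for _ in range(64 * 1024))
new = b"!" + old  # a one-byte insertion at the front shifts every later byte

fixed_delta = to_upload(hashes(fixed_chunks(old, 256)), hashes(fixed_chunks(new, 256)))
cdc_delta = to_upload(hashes(content_defined_chunks(old)), hashes(content_defined_chunks(new)))
print(len(fixed_delta), "chunks to upload with fixed-size chunking")
print(len(cdc_delta), "chunks to upload with content-defined chunking")
```

With fixed-size chunks the insertion changes every chunk, so the whole file is re-uploaded; with content-defined boundaries only the chunks near the insertion differ. The price is the per-byte rolling-hash loop, which is the CPU cost noted above.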
Conflict handling: don't silently destroy data
Two users (or one user on two devices) edit the same file offline and both sync. Last-write-wins destroys one version silently, which is unacceptable for documents. Dropbox's model: detect conflicts at sync time (the server version changed since the client's last known version), keep both, rename the loser to something like `filename (Device's conflicted copy YYYY-MM-DD).ext`, and present both to the user. This is safe but leaves cleanup work to the user. Version vectors (vector clocks) can distinguish truly concurrent edits from stale ones more precisely, but they are operationally heavier.
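The detect-and-keep-both rule can be sketched as a version check at upload time. Everything here is illustrative: the in-memory `server` dict, the `sync_upload` function, and the exact conflict-copy name format are assumptions, not Dropbox's implementation.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath


@dataclass
class ServerFile:
    version: int
    chunk_hashes: list[str]


def conflicted_copy_path(path: str, device: str, on: date) -> str:
    """Build a sibling path for the losing version (name format is an assumption)."""
    p = PurePosixPath(path)
    name = f"{p.stem} ({device}'s conflicted copy {on.isoformat()}){p.suffix}"
    return str(p.with_name(name))


def sync_upload(server: dict[str, ServerFile], path: str, base_version: int,
                chunk_hashes: list[str], device: str, today: date) -> str:
    """Apply an upload; return the path where the content ended up.

    base_version is the server version the client last saw (0 = new file).
    If the server has moved on since then, the upload becomes a conflict copy
    instead of overwriting the newer server version.
    """
    current = server.get(path)
    current_version = current.version if current else 0
    if base_version == current_version:
        server[path] = ServerFile(current_version + 1, chunk_hashes)
        return path
    loser = conflicted_copy_path(path, device, today)
    server[loser] = ServerFile(1, chunk_hashes)
    return loser


server: dict[str, ServerFile] = {"/docs/report.docx": ServerFile(1, ["h1"])}
day = date(2024, 1, 15)
# Desktop and laptop both edited version 1 while offline; desktop syncs first.
print(sync_upload(server, "/docs/report.docx", 1, ["h2"], "desktop", day))
print(sync_upload(server, "/docs/report.docx", 1, ["h3"], "laptop", day))
```

The first upload bumps the file to version 2; the second arrives with a stale base version and lands next to it as `/docs/report (laptop's conflicted copy 2024-01-15).docx`, so neither edit is lost.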
Notification channel: how clients know something changed
Polling for changes (every 30 seconds) adds up to 30 seconds of latency, which is too slow for collaborative workflows, and wastes requests when nothing has changed. Dropbox uses a long-polling notification server (originally called 'notify.dropbox.com'): the client holds an open HTTP connection; when a change occurs, the server responds to wake the client, which then fetches the delta from the metadata service. This is not a sync channel, just a wake signal. The actual change data comes from a separate authenticated API call. This pattern avoids the complexity of a stateful push protocol while eliminating polling latency.
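The split between wake signal and data fetch can be modeled in-process with a condition variable. This is a sketch, with threads standing in for HTTP connections; `NotificationServer`, `MetadataService`, and `sync_once` are hypothetical names, and the cursor is simply a count of changes seen so far.

```python
import threading


class NotificationServer:
    """Carries no file data: it only answers 'has anything changed since cursor?'."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._cursor = 0

    def publish(self) -> None:
        with self._cond:
            self._cursor += 1
            self._cond.notify_all()  # wake every held long-poll

    def long_poll(self, client_cursor: int, timeout: float) -> bool:
        """Block until a change newer than client_cursor exists, or time out."""
        with self._cond:
            return self._cond.wait_for(lambda: self._cursor > client_cursor,
                                       timeout=timeout)


class MetadataService:
    """Holds the actual change log; clients fetch deltas from here."""

    def __init__(self, notifier: NotificationServer) -> None:
        self._lock = threading.Lock()
        self._log: list[str] = []
        self._notifier = notifier

    def record_change(self, path: str) -> None:
        with self._lock:
            self._log.append(path)
        self._notifier.publish()  # signal only after the change is readable

    def changes_since(self, cursor: int) -> tuple[list[str], int]:
        with self._lock:
            return self._log[cursor:], len(self._log)


def sync_once(notifier: NotificationServer, metadata: MetadataService,
              cursor: int, timeout: float) -> tuple[list[str], int]:
    """One client iteration: wait for a wake signal, then fetch the delta."""
    if not notifier.long_poll(cursor, timeout):
        return [], cursor  # timed out; the client simply re-issues the poll
    return metadata.changes_since(cursor)


notifier = NotificationServer()
metadata = MetadataService(notifier)
result = {}
client = threading.Thread(
    target=lambda: result.update(out=sync_once(notifier, metadata, 0, timeout=5.0)))
client.start()
metadata.record_change("/docs/report.docx")  # another device commits a change
client.join()
print(result["out"])  # (['/docs/report.docx'], 1)
```

Because the wake signal carries no payload, the notification tier needs no per-file authorization logic; a missed or duplicate wake is harmless since the client always re-reads from its cursor.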
What breaks at scale
Garbage collection is the hardest operational problem: when a file is deleted or a new version drops some chunks, the block service still holds those bytes. Because of dedup, a chunk may be referenced by many files and users, so it can be reclaimed only once no manifest references it. A reference-counting GC on a multi-exabyte store with billions of chunks is slow and must run continuously without blocking reads. Dropbox uses a mark-and-sweep approach with a separate GC pipeline. The second failure mode is hot files: a shared team folder edited simultaneously by 50 people creates 50 concurrent version-bump requests to the metadata service for the same file path. Serialize these with optimistic locking plus retry, or with a per-file coordinator.
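A mark-and-sweep pass over a content-addressed store can be sketched as follows. This is a toy, single-process illustration, not Dropbox's GC pipeline: `StoredChunk`, `mark_live`, and `sweep` are hypothetical names, and the grace period (which protects chunks uploaded moments ago whose manifest has not been committed yet) is an assumption added for the example.

```python
from dataclasses import dataclass


@dataclass
class StoredChunk:
    data: bytes
    uploaded_at: float  # seconds since some epoch


def mark_live(manifests: dict[str, list[str]]) -> set[str]:
    """Mark phase: collect every chunk hash referenced by any file manifest."""
    live: set[str] = set()
    for chunk_hashes in manifests.values():
        live.update(chunk_hashes)
    return live


def sweep(store: dict[str, StoredChunk], live: set[str],
          now: float, grace_seconds: float) -> list[str]:
    """Sweep phase: delete chunks that are unreferenced and older than the grace period.

    Returns the deleted hashes in sorted order.
    """
    doomed = sorted(
        h for h, chunk in store.items()
        if h not in live and now - chunk.uploaded_at >= grace_seconds
    )
    for h in doomed:
        del store[h]
    return doomed


store = {
    "a": StoredChunk(b"...", uploaded_at=0.0),
    "b": StoredChunk(b"...", uploaded_at=0.0),    # shared by two files (dedup)
    "c": StoredChunk(b"...", uploaded_at=0.0),    # orphaned long ago
    "d": StoredChunk(b"...", uploaded_at=990.0),  # just uploaded, manifest pending
}
manifests = {"/x.bin": ["a", "b"], "/y.bin": ["b"]}

deleted = sweep(store, mark_live(manifests), now=1000.0, grace_seconds=60.0)
print(deleted)        # ['c']
print(sorted(store))  # ['a', 'b', 'd']
```

Chunk `b` survives because one reference is enough, and `d` survives only because of the grace period; without it, a sweep racing an in-flight upload would delete data the client is about to reference.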
In production
Dropbox originally built on top of Amazon S3 for chunk storage and later migrated to its own exabyte-scale object store (Magic Pocket) to reduce costs. Box and Google Drive use similar chunking approaches but with different dedup scopes (Google Drive reportedly deduplicates within a user's Drive; Dropbox deduplicates globally across all users for identical chunks). The real engineering challenge is the metadata service: the chunk manifest (file → ordered list of chunk hashes) must support transactional version updates. If a sync fails mid-upload, the manifest must not reflect partial state, so the client commits the new manifest only after every chunk is stored, and the commit itself is guarded by two-phase commit or optimistic locking on the file version.
Common mistakes
- Whole-file uploads on any change
- No dedup (storage blowup)
- Silent last-write-wins on conflicts