System Design Library

Collaborative Spreadsheet

Realtime multi-user spreadsheet with formulas that recalculate.

Open the interactive version → diagrams, practice & more

Requirements

Functional

Concurrent cell edits
Formula evaluation
Live cursors
Recalc on change

Non-functional

Low latency
Convergence
Fast recalc

Scale

Many editors; large sheets

The approach

CRDT/OT for concurrent cell edits (like Docs); a dependency graph (cell → dependents) drives incremental recalculation — only affected cells recompute; server sequences and broadcasts.

Key components

CRDT cell sync · dependency-graph recalc engine · WS

Numbers that matter

Google Sheets supports up to 10 million cells per spreadsheet; formula recalculation on a large model with 50k interdependent formulas takes 100–500ms on their servers, requiring incremental dirty-cell propagation rather than full recompute
Dependency graph fanout: a single source cell (e.g. a tax rate constant) referenced by 5,000 formulas triggers 5,000 recalculations on change — without topological batching, this cascades into seconds of compute
Typical collaborative editing sessions have 2–5 concurrent editors with edit events every 50–200ms per active user — the ops rate is modest but the formula recalc cost is not
WebSocket message size for a cell op is ~100–200 bytes (cell reference + value + version vector); at 5 editors × 10 ops/sec that is <10KB/s per document — trivially small bandwidth

Senior deep-dive

Formula dependency graphs, not CRDT conflict resolution, are the unsolved hard problem in collaborative spreadsheets — concurrent edits to interdependent cells require incremental recalculation across a directed acyclic graph, and cycles crash it.

Cell edits are structurally simpler than text edits for CRDT because each cell is an independent LWW-register (last-write-wins on a cell key); the hard concurrency is not 'did A and B both type into the same cell' but 'did A insert a row while B was editing a formula that referenced a row number'.

Row/column insertion breaks absolute references, requiring either relative reference semantics everywhere or a structural transform layer that updates all affected formulas atomically — this is the real OT/CRDT complexity in spreadsheets.

Cell-as-LWW-register is the CRDT foundation

Unlike text documents (where character-level CRDTs are needed), spreadsheet cells are independent registers: each cell `(row, col)` holds a single value, and concurrent writes to the same cell can be resolved by last-write-wins (wall-clock or logical timestamp). Two users simultaneously setting A1 to different values — one wins, and that's acceptable behavior in collaborative spreadsheets (unlike text, where both keystrokes must be preserved). This dramatically simplifies the CRDT model. The complexity surfaces only at structural operations: insert/delete row/column, which invalidate relative references across the entire sheet.

Structural ops (insert/delete row) are the real OT problem

When user A inserts a row above row 5 while user B is editing a formula `=SUM(A5:A10)`, the formula must become `=SUM(A6:A11)`. This is a transform that must be applied to all formulas whose ranges intersect the insertion point. The OT transformation function for `InsertRow(5)` against `EditCell(A5, formula)` must shift the formula's range reference. Named ranges and structured references (`Table1[Column]`) avoid this problem by decoupling from positional addresses — a compelling reason to push users toward named ranges in collaborative contexts. Without transforms, concurrent structural ops produce broken formulas silently.

Dependency graph drives incremental recalculation

Formula recalculation must be incremental, not full-sheet. The system maintains a directed acyclic graph (DAG) of cell dependencies: an edge from A1 to B1 means B1's formula references A1. On edit of A1, the engine performs a topological sort of all reachable dependents and recalculates in topological order. This avoids redundant recalculation (if B1 and C1 both depend on A1, and D1 depends on both B1 and C1, A1's change triggers B1→D1 and C1→D1 computed once, not twice). Cycle detection must run before committing any formula edit — a DFS looking for back edges, run client-side before sending the op.

Server sequences ops; client applies optimistically

The architecture follows the OT server-as-sequencer model: clients apply ops locally for instant feedback (optimistic), send the op to the server, which assigns a global sequence number and transforms it against any concurrent ops. The server then broadcasts the transformed op and its sequence number to all clients. Clients that receive a sequence number lower than their local state must transform their local pending ops against the server op — this is the classic OT buffer. The formula recalculation result is computed server-side authoritatively and sent to all clients to ensure consistency; clients don't trust their local formula eval for collaboration.

Volatile functions break the dependency graph contract

Functions like `=NOW()`, `=RAND()`, and `=TODAY()` return different values on every evaluation — they have no stable dependencies and must be recalculated on every sheet recalculation pass, not just when a dependency changes. In a collaborative system, this means all clients must receive the server-computed value for volatile cells to stay in sync; local evaluation would diverge instantly. The system marks volatile cells explicitly in the dependency graph and re-evaluates them on a cadence (e.g. every 1 second for NOW() in real-time mode). Allowing too many volatile cells in a large sheet degrades recalculation from incremental to effectively full-sheet.

What breaks at scale

The production failure mode is formula fan-out storms: a user edits a single 'master rate' cell that 10,000 formula cells depend on, triggering a recalculation cascade that takes 5 seconds and blocks all other edits during that time. The mitigation is recalculation batching (coalesce multiple dirty-cell events into one DAG traversal per tick) and async recalculation (send speculative 'calculating...' to clients, compute in background, broadcast results). The second failure mode is concurrent structural-op conflicts that produce formula corruption: two users simultaneously insert rows at the same position, and the transform function wasn't commutative. Conflict-free structural ops require operation commutativity testing — a non-obvious correctness requirement that most teams skip until a user reports corrupted data.

In production

Google Sheets uses an OT-based architecture where a central server sequences all operations and computes authoritative formula results; the client applies a local optimistic update and reconciles on the server's ack. Notion's database (table) view is simpler — cells are independent properties without formulas, so LWW-per-cell CRDTs work cleanly. Airtable avoids the formula dependency problem by limiting formula complexity and recalculating server-side synchronously on every write. The real challenge nobody talks about: circular reference detection at edit time — when a user types `=A1` in a cell that A1's formula already depends on, the system must detect the cycle before committing the op and reject it with an error, not crash the recalculation engine.

Common mistakes

Recomputing the entire sheet on any edit
No cycle detection
Last-write-wins on cells

Related System Design Library

Part of System Design Library on SystemLore — system design interview prep with 148 deep topics, interactive diagrams, and a practice game. Practice this one →