Google Docs (collaborative editor)
Many users editing one document simultaneously, seeing each other's keystrokes live and converging.
Requirements
Functional
- Concurrent editing
- Live cursors/presence
- History/undo
- Offline edits
Non-functional
- Low-latency sync
- Convergence (all clients agree)
Scale
Dozens of editors/doc
The approach
Use Operational Transformation (OT) or CRDTs (conflict-free replicated data types) to merge concurrent edits deterministically. A per-document server (or a Durable Object, DO) sequences ops and broadcasts them; clients apply the transformed ops and converge.
Key components
Client ⇄ WS ⇄ per-doc collaboration server · op log/store · presence
Numbers that matter
- Google Docs reportedly handles ~100 simultaneous editors per document before performance degrades — the bottleneck is operation broadcast fan-out, not OT computation.
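- Transforming one incoming op is O(n) in the number of concurrent in-flight operations it must be transformed against. At 10 concurrent users sending 10 ops/second each, the server sees 100 incoming ops/second, each transformed only against the ops still in flight, which is well within a single server's capacity.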
- CRDT metadata overhead for a sequence CRDT (like LSEQ or RGA) can be 10–100× the size of the raw text — a 10KB document may have 100KB–1MB of CRDT state, which constrains storage and sync approaches.
- WebSocket message round-trips between client and the per-doc server must stay under ~100ms for edits to feel responsive — beyond this users perceive lag between keystrokes and seeing their own text appear.
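The fan-out and metadata figures above can be checked with a few lines of arithmetic. This is a rough sketch: the function names are illustrative, and it assumes every op is relayed individually to every other editor with no batching.

```python
def fanout_messages_per_second(editors: int, ops_per_editor_per_s: int) -> int:
    """Outbound messages/s when each op is relayed to every other editor."""
    return editors * ops_per_editor_per_s * (editors - 1)


def crdt_state_bytes(text_bytes: int, overhead_factor: int) -> int:
    """Approximate CRDT state size for a given raw-text size and overhead multiple."""
    return text_bytes * overhead_factor


small = fanout_messages_per_second(10, 10)    # 10 editors, 10 ops/s each
large = fanout_messages_per_second(100, 10)   # 100 editors, 10 ops/s each
print(small, large)                           # 900 99000

ten_kb = 10 * 1024
print(crdt_state_bytes(ten_kb, 10), crdt_state_bytes(ten_kb, 100))  # 102400 1024000
```

Incoming ops grow linearly with editors, but outbound messages grow roughly quadratically, which is why broadcast is the bottleneck rather than the transform itself.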
Senior deep-dive
Concurrent edit convergence is the core hard problem: if two users insert a character at the same position, naive last-write-wins corrupts the document.
OT requires a central server to sequence operations; CRDTs remove the central sequencer but carry higher metadata overhead and trickier text semantics. The per-document server (or Durable Object) is the sequencing point — it must be stateful, sticky per document, and able to replay its operation log to rebuild document state after a crash.
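A minimal sketch of what "transform" means for the same-position insert case, assuming insert-only ops and a hypothetical `site` id used purely as a tie-breaker. Real OT systems also handle deletes and rich-text attributes; this only shows why both orders converge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical client id, used only to break position ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `other` has already been applied."""
    other_goes_first = other.pos < op.pos or (
        other.pos == op.pos and other.site < op.site
    )
    if other_goes_first:
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


doc = "ac"
a = Insert(1, "X", site=1)  # user A types X between a and c
b = Insert(1, "Y", site=2)  # user B types Y at the same position, concurrently

via_a_first = apply(apply(doc, a), transform(b, a))
via_b_first = apply(apply(doc, b), transform(a, b))
print(via_a_first, via_b_first)  # aXYc aXYc
```

Both application orders produce the same text because the tie-break is deterministic; last-write-wins would have dropped one of the two characters.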
OT vs CRDT: the actual decision criteria
Choose OT when you need a central server anyway (simpler metadata, smaller wire format, mature tooling such as ShareDB, commonly paired with the Quill editor). Choose CRDTs when you need peer-to-peer or offline-first semantics (e.g. local-first apps, offline mobile). OT's weakness is that transformation functions are notoriously hard to get right for complex document types (tables, embedded objects), and bugs here cause silent document corruption. CRDTs eliminate transformation but introduce tombstone bloat and GC complexity.
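To make the CRDT side concrete, here is a heavily simplified RGA-style sequence sketch. It assumes causal delivery of ops and uses illustrative names (`Rga`, `Elem`); it is not how any particular library is implemented. It shows two things: concurrent inserts converge without a transform function, and deleted characters linger as tombstones.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Id = Tuple[int, int]  # (Lamport counter, site id), compared lexicographically


@dataclass
class Elem:
    id: Id
    char: str
    deleted: bool = False  # tombstone: deleted elements stay in the list


class Rga:
    def __init__(self, site: int) -> None:
        self.site = site
        self.clock = 0
        self.elems: List[Elem] = []

    def _index(self, id: Id) -> int:
        return next(i for i, e in enumerate(self.elems) if e.id == id)

    def integrate_insert(self, after: Optional[Id], id: Id, char: str) -> None:
        self.clock = max(self.clock, id[0])
        i = 0 if after is None else self._index(after) + 1
        # Skip elements with a larger id so every replica picks the same order.
        while i < len(self.elems) and self.elems[i].id > id:
            i += 1
        self.elems.insert(i, Elem(id, char))

    def local_insert(self, after: Optional[Id], char: str) -> Id:
        self.clock += 1
        id = (self.clock, self.site)
        self.integrate_insert(after, id, char)
        return id

    def delete(self, id: Id) -> None:
        self.elems[self._index(id)].deleted = True

    def text(self) -> str:
        return "".join(e.char for e in self.elems if not e.deleted)


a, b = Rga(site=1), Rga(site=2)
id_a = a.local_insert(None, "a")
b.integrate_insert(None, id_a, "a")
id_c = a.local_insert(id_a, "c")
b.integrate_insert(id_a, id_c, "c")

id_x = a.local_insert(id_a, "X")    # concurrent inserts after "a"
id_y = b.local_insert(id_a, "Y")
a.integrate_insert(id_a, id_y, "Y")  # exchange ops, in different orders
b.integrate_insert(id_a, id_x, "X")

a.delete(id_x)
b.delete(id_x)
print(a.text(), b.text(), len(a.elems))  # aYc aYc 4
```

The visible text is three characters but the replica still holds four elements, each with an id. That residue is the tombstone bloat that garbage collection has to deal with.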
The per-document server is your consistency boundary
In an OT system, the per-document server (or Durable Object) is the only thing that assigns global sequence numbers — if two users submit op #5 simultaneously, the server picks an order and transforms accordingly. This server must be single-writer per document (no horizontal scale for writes). For very large documents (think Wikipedia articles), this can be a bottleneck; the mitigation is document splitting (sections as independent CRDT/OT units) so multiple server instances can handle different sections.
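The sequencing role can be sketched as a single in-memory object per document. This is an assumption-laden illustration: `DocServer`, `submit`, and `base_seq` are hypothetical names, ops are insert-only, and persistence and broadcast are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical client id, used only to break position ties


def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `other`."""
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


@dataclass
class DocServer:
    text: str = ""
    log: List[Insert] = field(default_factory=list)  # log[i] has sequence number i + 1

    def submit(self, op: Insert, base_seq: int) -> Tuple[int, Insert]:
        """Commit `op`, which the client created after seeing sequence `base_seq`."""
        # Transform against everything committed since the client's base revision.
        for committed in self.log[base_seq:]:
            op = transform(op, committed)
        self.text = self.text[:op.pos] + op.text + self.text[op.pos:]
        self.log.append(op)
        return len(self.log), op  # assigned sequence number and the op to broadcast


server = DocServer(text="ac")
seq_a, op_a = server.submit(Insert(1, "X", site=1), base_seq=0)
seq_b, op_b = server.submit(Insert(1, "Y", site=2), base_seq=0)  # same base, arrives second
print(seq_a, seq_b, server.text)  # 1 2 aXYc
```

Because `submit` runs in one place per document, arrival order at the server is the global order. That is the single-writer constraint described above.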
Presence and cursor broadcasting
Cursor positions and selections are ephemeral presence data, not document operations, so treat them separately from OT/CRDT to avoid polluting the operation log. Broadcast cursor updates via a separate pub/sub channel per document with no persistence. Cursor positions must still be transformed against incoming ops: if user A's cursor is at position 50 and user B inserts 5 chars at position 10, A's cursor shifts to 55. This is a simplified version of the transform applied to document ops.
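A small sketch of the cursor shift described above. The function name and the `(kind, pos, length)` op shape are assumptions for illustration, as is the choice that a remote insert exactly at the cursor pushes the cursor right.

```python
def shift_cursor(cursor: int, op_kind: str, op_pos: int, op_len: int) -> int:
    """Return the cursor position after a remote op has been applied."""
    if op_kind == "insert":
        # Text inserted at or before the cursor pushes it right.
        return cursor + op_len if op_pos <= cursor else cursor
    if op_kind == "delete":
        if op_pos >= cursor:
            return cursor  # deletion entirely after the cursor
        # Deletion before the cursor pulls it left; a cursor inside the
        # deleted range collapses to the start of that range.
        return max(op_pos, cursor - op_len)
    raise ValueError(f"unknown op kind: {op_kind}")


print(shift_cursor(50, "insert", 10, 5))  # 55
print(shift_cursor(50, "delete", 48, 5))  # 48
```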
Persistence: operation log vs document snapshots
Never store only the latest document state. Store the full operation log so you can replay, debug, and implement undo. Snapshots (materialized document state at sequence N) act as checkpoints so replay doesn't start from op #1. A good pattern: snapshot every 1,000 ops, keep the full log for 30 days, compress older logs. Undo applies an inverse operation instead of replaying history, which is O(1) rather than O(n) in the log length; in a shared document the inverse must itself be transformed against ops that arrived after the one being undone.
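Snapshot-plus-log recovery and inverse-op undo can be sketched as follows. The `Op` shape is an assumption: delete ops carry the removed text so they can be inverted. The undo shown here is the single-user case with no later concurrent ops to transform against.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    kind: str  # "insert" or "delete"
    pos: int
    text: str  # inserted text, or the text that was removed (keeps the op invertible)


def apply(doc: str, op: Op) -> str:
    if op.kind == "insert":
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + len(op.text):]


def invert(op: Op) -> Op:
    return Op("delete" if op.kind == "insert" else "insert", op.pos, op.text)


def rebuild(snapshot_text: str, snapshot_seq: int, log: List[Op]) -> str:
    """Replay only the ops after the snapshot; log[i] has sequence number i + 1."""
    doc = snapshot_text
    for op in log[snapshot_seq:]:
        doc = apply(doc, op)
    return doc


log = [Op("insert", 0, "hello"), Op("insert", 5, " world"), Op("delete", 0, "hello ")]

from_scratch = rebuild("", 0, log)
from_snapshot = rebuild("hello world", 2, log)  # snapshot taken at sequence 2
undone = apply(from_snapshot, invert(log[-1]))  # undo the last op with its inverse
print(from_scratch, "|", from_snapshot, "|", undone)  # world | world | hello world
```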
Offline and reconnect: the hardest case
A client offline for T minutes has local ops based on server state at sequence S. On reconnect, it must rebase its local ops against all server ops from S to S+N (N ops happened while offline). With OT this is O(local_ops × N) transformations. The practical limit is a few hundred ops. Beyond that, the client should discard local ops and show a conflict UI rather than attempting a transformation that may fail silently. CRDTs handle this more gracefully but still have merge complexity for deletions.
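The reconnect rebase can be sketched as a nested loop over local and server ops, with a cap that triggers the conflict path. This assumes insert-only ops with a `site` tie-breaker; `max_transforms` and `RebaseTooLarge` are hypothetical names, and the default threshold is arbitrary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical client id, used only to break position ties


class RebaseTooLarge(Exception):
    """Raised when the rebase would exceed the transform budget."""


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


def rebase(local_ops: List[Insert], server_ops: List[Insert],
           max_transforms: int = 10_000) -> List[Insert]:
    """Rewrite offline ops so they apply on top of the current server state."""
    if len(local_ops) * len(server_ops) > max_transforms:
        raise RebaseTooLarge("fall back to a conflict UI")
    rebased = []
    remote = list(server_ops)
    for op in local_ops:
        next_remote = []
        for r in remote:
            # Transform both sides so the next local op sees a consistent remote history.
            next_remote.append(transform(r, op))
            op = transform(op, r)
        remote = next_remote
        rebased.append(op)
    return rebased


base = "abc"
local = [Insert(0, "X", site=1), Insert(1, "Y", site=1)]   # typed offline: "XYabc"
server = [Insert(3, "!", site=2), Insert(0, ">", site=2)]  # happened meanwhile: ">abc!"

server_doc = base
for op in server:
    server_doc = apply(server_doc, op)
for op in rebase(local, server):
    server_doc = apply(server_doc, op)
print(server_doc)  # XY>abc!
```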
What breaks at scale
A viral shared document with 500 simultaneous editors overwhelms the broadcast fan-out: each op must be sent to 499 other WebSocket connections. The per-doc server becomes CPU-bound on serialization and network I/O. Fix with hierarchical fan-out (relay servers per region each holding a subset of connections) and op batching (coalesce 10ms of ops into one message). The second failure: large documents with long history cause slow initial load — mitigate with lazy loading (load the visible viewport's content first, load history only on request).
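Op batching can be sketched as grouping timestamped ops into fixed windows before broadcast. The function name and the `(timestamp_ms, op)` input shape are assumptions; a real server would flush on a timer rather than batch after the fact.

```python
from typing import Any, List, Optional, Tuple


def batch_ops(timed_ops: List[Tuple[int, Any]], window_ms: int = 10) -> List[List[Any]]:
    """Group (timestamp_ms, op) pairs, sorted by time, into windows of `window_ms`."""
    batches: List[List[Any]] = []
    window_start: Optional[int] = None
    for ts, op in timed_ops:
        if window_start is None or ts - window_start >= window_ms:
            batches.append([])  # start a new outbound message
            window_start = ts
        batches[-1].append(op)
    return batches


ops = [(0, "a"), (3, "b"), (9, "c"), (10, "d"), (25, "e")]
batches = batch_ops(ops)
print(batches)  # [['a', 'b', 'c'], ['d'], ['e']]

recipients = 499  # 500 editors, each op goes to everyone else
print(len(ops) * recipients, len(batches) * recipients)  # 2495 1497
```

Batching trades up to one window of added latency for fewer sends, so the window should stay well under the ~100ms responsiveness budget.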
In production
Google Docs uses Operational Transformation over a central per-document server: each client sends ops to the server, which assigns a global sequence number and broadcasts transformed ops to all other clients. Figma evaluated OT and instead chose a CRDT-inspired approach for its multiplayer canvas, finding it simpler for spatial objects where operations naturally commute. Cloudflare Durable Objects are a common choice for per-document stateful servers (one DO per doc, sticky WebSocket connections, in-memory state plus durable storage behind it). The real engineering challenge is handling offline edits and reconnection: a client that goes offline for 30 minutes accumulates ops that must be rebased against potentially thousands of server-side ops on reconnect, and this is where OT's transformation complexity is most painful.
Common mistakes
- Last-write-wins on the whole doc
- No per-doc sequencing (divergence)
- Ignoring offline reconciliation