Collaborative Whiteboard (Miro)
Realtime multi-user canvas of shapes/strokes that converges.
Requirements
Functional
- Concurrent shape edits
- Live cursors
- Infinite canvas
- Persistence
Non-functional
- Low latency
- Convergence
Scale
Many users per board
The approach
CRDT for spatial objects (each shape an independent, commutative object → fewer conflicts than text); per-board server sequences/broadcasts; viewport-based loading for infinite canvas.
Key components
Client ⇄ WS ⇄ per-board server · object store · presence
Numbers that matter
- ~30–60ms is the usable round-trip budget for stroke latency — above that, the pen feels disconnected; below 30ms it's indistinguishable from local.
- A 10,000-object board with per-object CRDT state clocks in at ~5–15 MB uncompressed; on reconnect, send a diff against the client's last-seen state, not a full sync.
- Miro reports ~99.9% of collaborative sessions involve fewer than 50 concurrent users per board — design for the burst, not the median.
- WebSocket frames for cursor positions are typically 10–50 bytes each; at 50 users × 60 fps that's 3,000 frames/s, or up to ~150 KB/s inbound per board before fan-out (each frame is then relayed to the other 49 users). Conflate to ~15 fps to cut that by 4×.
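The cursor-bandwidth figure above can be checked with a few lines of arithmetic. This is a rough sketch: the function name is hypothetical, the 50-byte frame is the upper end of the range quoted above, and protocol overhead (WebSocket headers, TCP/TLS) is ignored.

```python
def cursor_bandwidth(users: int, fps: int, frame_bytes: int) -> dict:
    """Estimate cursor traffic for one board, in bytes per second."""
    inbound = users * fps * frame_bytes   # every client sends to the server
    outbound = inbound * (users - 1)      # server relays each frame to every other client
    return {"inbound": inbound, "outbound": outbound}


full = cursor_bandwidth(users=50, fps=60, frame_bytes=50)
conflated = cursor_bandwidth(users=50, fps=15, frame_bytes=50)

print(full["inbound"] / 1000)         # 150.0 KB/s into the server
print(conflated["inbound"] / 1000)    # 37.5 KB/s after conflating to 15 fps
print(full["outbound"] / 1_000_000)   # 7.35 MB/s of fan-out at 60 fps
```

The fan-out number is the one that hurts: inbound traffic grows linearly with users, outbound roughly quadratically.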
Senior deep-dive
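CRDTs are the right model for spatial objects: shapes are independent commutative entities, not a linear text stream, so concurrent edits to different shapes merge with no ambiguity, and concurrent edits to the same property of one shape resolve with a deterministic rule such as last-writer-wins.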
The infinite canvas is the ops problem nobody talks about: loading all objects for a 10k-element board on join kills bandwidth — viewport-based loading with a spatial index (R-tree / quadtree) gates what you sync.
Broadcast order matters as much as the merge rule: a per-board sequencer ensures clients apply deltas in the same order, so order-sensitive state such as shape layering converges without rollback.
CRDT vs OT: not ideology, it's object topology
Text editors traditionally use OT because characters have a strict linear order and concurrent inserts need positional transformation. Whiteboard objects are independent: moving shape A and resizing shape B never conflict, so a per-object CRDT (an OR-Set for which objects exist, plus last-writer-wins registers for each object's properties) handles concurrency without a sequencer. The caveat: z-index (layering) is a shared ordered list and still needs OT or a sequencer for that one attribute.
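To make the per-object idea concrete, here is a minimal sketch of one shape whose properties are independent last-writer-wins registers. The `Shape` class, the `(counter, replica id)` stamp, and the property names are assumptions for illustration, not any product's actual data model; object creation and deletion, and z-order, are out of scope.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

Stamp = Tuple[int, str]  # (Lamport counter, replica id); the id breaks ties


@dataclass
class Shape:
    """One whiteboard object; each property is an independent LWW register."""
    props: Dict[str, Tuple[Stamp, Any]] = field(default_factory=dict)

    def assign(self, key: str, value: Any, stamp: Stamp) -> None:
        # Keep the write with the larger stamp; older writes are ignored.
        current = self.props.get(key)
        if current is None or stamp > current[0]:
            self.props[key] = (stamp, value)

    def merge(self, other: "Shape") -> None:
        # Merging is commutative and idempotent: order of delivery is irrelevant.
        for key, (stamp, value) in other.props.items():
            self.assign(key, value, stamp)

    def value(self, key: str) -> Any:
        return self.props[key][1]


a, b = Shape(), Shape()
a.assign("x", 10, (1, "a"))          # replica a moves the shape
b.assign("width", 200, (1, "b"))     # replica b resizes it concurrently
a.assign("fill", "red", (2, "a"))
b.assign("fill", "blue", (2, "b"))   # same property, same counter: replica id decides

a.merge(b)
b.merge(a)
print(a.props == b.props, a.value("fill"))  # True blue
```

The move and the resize touch different registers and both survive; only the two writes to `fill` compete, and both replicas pick the same winner.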
Viewport gating: the scalability lever nobody demos
Naive implementations sync the entire board state on join — fine at 100 objects, catastrophic at 50,000. Production systems use a spatial index (R-tree or quadtree) server-side to send only objects intersecting the client's viewport plus a margin. As the user pans, the server streams in newly visible objects and prunes unseen ones. Object IDs + bounding boxes are the subscription unit, not the full object payload.
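The viewport query and the pan diff can be sketched as follows. To keep the example short it swaps in a uniform grid for the R-tree or quadtree named above; the `GridIndex` class, the 256-unit cell size, and the object IDs are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, Set, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
CELL = 256.0  # assumed cell size in canvas units


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Uniform-grid spatial index mapping cells to object IDs."""

    def __init__(self) -> None:
        self.cells: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
        self.boxes: Dict[str, Box] = {}

    def _cells_for(self, box: Box) -> Iterator[Tuple[int, int]]:
        x0, y0, x1, y1 = box
        for cx in range(int(x0 // CELL), int(x1 // CELL) + 1):
            for cy in range(int(y0 // CELL), int(y1 // CELL) + 1):
                yield (cx, cy)

    def insert(self, obj_id: str, box: Box) -> None:
        self.boxes[obj_id] = box
        for cell in self._cells_for(box):
            self.cells[cell].add(obj_id)

    def query(self, viewport: Box, margin: float = 0.0) -> Set[str]:
        """IDs of objects intersecting the viewport grown by the margin."""
        x0, y0, x1, y1 = viewport
        grown = (x0 - margin, y0 - margin, x1 + margin, y1 + margin)
        hits: Set[str] = set()
        for cell in self._cells_for(grown):
            for obj_id in self.cells.get(cell, ()):
                if intersects(self.boxes[obj_id], grown):
                    hits.add(obj_id)
        return hits


index = GridIndex()
index.insert("a", (0, 0, 100, 100))
index.insert("b", (1000, 1000, 1100, 1100))
index.insert("c", (5000, 5000, 5100, 5100))

before = index.query((0, 0, 800, 600), margin=100)
after = index.query((600, 600, 1400, 1200), margin=100)  # user panned
stream_in, prune = after - before, before - after
print(stream_in, prune)  # {'b'} {'a'}
```

Only IDs cross the subscription boundary here; the full payload for `b` would be fetched from the object store once it enters the set, and `c` is never touched.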
Per-board sequencer: cheap coordination that buys correctness
Even with CRDTs, clients need to agree on event ordering for layering (z-order), undo stacks, and cursor presence. A lightweight per-board sequencer — a single process or a durable log partition — stamps each delta with a monotonic sequence number. Clients buffer out-of-order deltas and apply in-order. This is not a single point of failure if the sequencer is a leader in a small Raft group or backed by a durable log like Kafka.
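The stamp-then-buffer behavior described above might look like this in miniature. `Sequencer` and `OrderedClient` are illustrative names, the sequencer is a single in-memory counter rather than a Raft leader or log partition, and deltas are plain strings.

```python
import itertools
from typing import Any, Dict, List, Tuple


class Sequencer:
    """Stamps each delta for one board with a monotonic sequence number."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def stamp(self, delta: Any) -> Tuple[int, Any]:
        return (next(self._counter), delta)


class OrderedClient:
    """Buffers out-of-order deltas and applies them strictly in sequence."""

    def __init__(self) -> None:
        self.next_seq = 1
        self.pending: Dict[int, Any] = {}
        self.applied: List[Any] = []

    def receive(self, seq: int, delta: Any) -> None:
        if seq < self.next_seq:
            return  # duplicate or already applied
        self.pending[seq] = delta
        # Drain every delta that is now contiguous with what was applied.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


seq = Sequencer()
d1, d2, d3 = seq.stamp("raise A"), seq.stamp("raise B"), seq.stamp("lower A")

client = OrderedClient()
for stamped in (d2, d3, d1):   # network delivers out of order
    client.receive(*stamped)
print(client.applied)  # ['raise A', 'raise B', 'lower A']
```

A real client would also need a timeout that requests missing sequence numbers instead of buffering forever.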
Reconnection diffing: where bugs actually live
On reconnect, the client has a last-seen sequence number and a local CRDT state. The server must produce a diff from that sequence to HEAD without replaying gigabytes of history. Periodic snapshots (e.g., every 1,000 ops) let the server reconstruct state at any checkpoint. Without snapshots, you replay from the beginning — fine in demos, ruinous in production boards with years of history.
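One way to picture the snapshot-plus-tail approach: the server keeps the latest snapshot and only the ops after it, then answers a reconnect with either a short delta or the snapshot plus the tail. This is a sketch under assumptions: `BoardLog`, `catch_up`, and the tiny snapshot interval are hypothetical, and state is a plain dict of properties rather than real CRDT state.

```python
import copy
from typing import Any, Dict, List, Optional, Tuple

Op = Tuple[int, str, Dict[str, Any]]  # (seq, object id, changed properties)


class BoardLog:
    """Keeps the latest snapshot plus only the ops that came after it."""

    def __init__(self, snapshot_every: int = 1000) -> None:
        self.snapshot_every = snapshot_every
        self.head = 0
        self.snapshot_seq = 0
        self.snapshot: Dict[str, Dict[str, Any]] = {}
        self.tail: List[Op] = []          # ops with seq > snapshot_seq
        self.state: Dict[str, Dict[str, Any]] = {}

    def append(self, obj_id: str, props: Dict[str, Any]) -> int:
        self.head += 1
        self.state.setdefault(obj_id, {}).update(props)
        self.tail.append((self.head, obj_id, props))
        if self.head % self.snapshot_every == 0:
            # Checkpoint: older ops are no longer needed for catch-up.
            self.snapshot = copy.deepcopy(self.state)
            self.snapshot_seq = self.head
            self.tail.clear()
        return self.head

    def catch_up(self, last_seen: int) -> Tuple[str, Optional[dict], List[Op]]:
        if last_seen >= self.snapshot_seq:
            # Client is recent enough: send only the ops it missed.
            return ("delta", None, [op for op in self.tail if op[0] > last_seen])
        # Client predates the retained log: send snapshot plus the tail.
        return ("snapshot", copy.deepcopy(self.snapshot), list(self.tail))


log = BoardLog(snapshot_every=3)
for obj_id, x in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5)]:
    log.append(obj_id, {"x": x})

print(log.catch_up(4))  # ('delta', None, [(5, 'a', {'x': 5})])
print(log.catch_up(1)[0])  # snapshot
```

Neither path replays from sequence 1, which is the property that matters for boards with long histories.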
Presence and cursors: fire-and-forget with TTL
Cursor positions are ephemeral, high-frequency, and loss-tolerant — exactly wrong for a durable log. Route them through a separate ephemeral pub/sub channel (a dedicated Redis pub/sub topic per board or a WebSocket fan-out sidecar) distinct from the durable delta log. TTL-based expiry (2–5 seconds) cleans up stale cursors without explicit disconnect signals, which are unreliable over flaky connections.
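TTL-based cleanup needs very little machinery. The sketch below is an in-memory stand-in for whatever pub/sub layer carries the cursor frames; `PresenceMap`, the 3-second TTL, and the injectable clock are assumptions made so the example is self-contained and deterministic.

```python
import time
from typing import Callable, Dict, Tuple


class PresenceMap:
    """Last-known cursor per user; entries expire if not refreshed within the TTL."""

    def __init__(self, ttl_seconds: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.cursors: Dict[str, Tuple[float, float, float]] = {}  # user -> (x, y, seen_at)

    def update(self, user_id: str, x: float, y: float) -> None:
        # Fire-and-forget: the newest frame simply overwrites the previous one.
        self.cursors[user_id] = (x, y, self.clock())

    def live(self) -> Dict[str, Tuple[float, float]]:
        """Drop stale cursors, then return the ones still considered present."""
        now = self.clock()
        self.cursors = {u: c for u, c in self.cursors.items() if now - c[2] <= self.ttl}
        return {u: (c[0], c[1]) for u, c in self.cursors.items()}


now = [0.0]                      # fake clock so the example is deterministic
presence = PresenceMap(ttl_seconds=3.0, clock=lambda: now[0])

presence.update("alice", 1, 2)   # t = 0
now[0] = 2.0
presence.update("bob", 5, 6)     # t = 2
now[0] = 4.0                     # alice has been silent for 4 s, bob for 2 s
print(presence.live())           # {'bob': (5, 6)}
```

No disconnect event is ever processed: alice disappears purely because her frames stopped arriving.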
What breaks at scale
Hot boards with hundreds of concurrent editors expose two failure modes: the sequencer becomes a throughput bottleneck (solution: shard by region of the canvas, accept that cross-region z-order is eventually consistent), and the spatial index update rate exceeds the indexer's write capacity when objects are dragged at high frequency. In practice, conflate object-move events to ~15 fps server-side before indexing, and batch spatial index writes to avoid per-event overhead.
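Server-side conflation can be as simple as keeping only the latest position per object and flushing on an interval. This is a sketch, not a production throttle: `MoveConflater` is a hypothetical name, time is passed in as integer milliseconds to keep the example deterministic, and a real implementation would also flush on a timer so the final position is not left waiting for the next event.

```python
from typing import Dict, Optional, Tuple


class MoveConflater:
    """Keeps only the newest position per object and releases batches at a fixed rate."""

    def __init__(self, interval_ms: int = 66) -> None:  # roughly 15 flushes per second
        self.interval_ms = interval_ms
        self.latest: Dict[str, Tuple[float, float]] = {}
        self.last_flush: Optional[int] = None

    def offer(self, obj_id: str, x: float, y: float,
              now_ms: int) -> Optional[Dict[str, Tuple[float, float]]]:
        """Record a move; return a batch for the indexer once the interval has elapsed."""
        self.latest[obj_id] = (x, y)   # newer positions overwrite older ones
        if self.last_flush is None:
            self.last_flush = now_ms
        if now_ms - self.last_flush >= self.interval_ms:
            batch, self.latest = self.latest, {}
            self.last_flush = now_ms
            return batch
        return None


conflater = MoveConflater(interval_ms=66)
batches = []
for i in range(60):                       # 60 drag events, one every 16 ms
    batch = conflater.offer("shape-1", float(i), 0.0, now_ms=i * 16)
    if batch is not None:
        batches.append(batch)

print(len(batches))   # 11 index writes instead of 60
print(batches[0])     # {'shape-1': (5.0, 0.0)}
```

Each batch is a single write to the spatial index covering every object that moved in the window, which addresses both the per-event overhead and the write rate.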
In production
Miro uses a per-board presence service with a sequencing server and CRDT-per-object model; Figma took a different route: a CRDT-inspired, last-writer-wins-per-property model (not OT) over WebSockets, with the server as the single source of truth for ordering. The real engineering challenge is reconnection: when a client goes offline for 30 seconds and rejoins, the server must diff the client's CRDT state against its own snapshot and produce a minimal delta without replaying the entire history. That is where most production whiteboard bugs live.
Common mistakes
- Loading the entire board at once
- Locking the whole canvas
- Treating it like a text editor (over-engineering OT)