System Design Library

Collaborative Whiteboard (Miro)

Realtime multi-user canvas of shapes/strokes that converges.


Requirements

Functional

Concurrent shape edits
Live cursors
Infinite canvas
Persistence

Non-functional

Low latency
Convergence

Scale

Many users per board

The approach

CRDT for spatial objects (each shape an independent, commutative object → fewer conflicts than text); per-board server sequences/broadcasts; viewport-based loading for infinite canvas.

Key components

Client ⇄ WS ⇄ per-board server · object store · presence

Numbers that matter

~30–60ms is the usable round-trip budget for stroke latency — above that, the pen feels disconnected; below 30ms it's indistinguishable from local.
A 10,000-object board with per-object CRDT state comes to roughly 5–15 MB uncompressed, so a reconnecting client should receive a delta against its last known state, not a full resync.
Miro reportedly sees ~99.9% of collaborative sessions with fewer than 50 concurrent users per board, so the median case is easy; size the system for the rare burst, not the median.
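WebSocket frames for cursor positions are typically 10–50 bytes each; at 50 users × 60 fps that's 3,000 frames/s, or up to ~150 KB/s inbound per board at the 50-byte end, before the server fans each frame out to the other 49 clients. Conflate to ~15 fps to cut that by 4×.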
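The cursor-bandwidth figure above can be checked with a few lines of arithmetic. This is a rough sketch: the function name is made up for illustration, the 50-byte frame is the top of the stated range, and protocol framing overhead is ignored. The egress number assumes a plain broadcast in which the server relays every frame to every other client on the board.

```python
def cursor_bandwidth(users: int, fps: int, frame_bytes: int) -> tuple[int, int]:
    """Return (ingress, egress) in bytes per second for one board.

    Ingress: every user sends `fps` cursor frames per second to the server.
    Egress: the server relays each frame to the other `users - 1` clients.
    """
    ingress = users * fps * frame_bytes
    egress = ingress * (users - 1)
    return ingress, egress


# Worst case from the figures above: 50 users, 50-byte frames.
raw_in, raw_out = cursor_bandwidth(users=50, fps=60, frame_bytes=50)
conflated_in, conflated_out = cursor_bandwidth(users=50, fps=15, frame_bytes=50)

print(raw_in, raw_out)              # 150000 7350000
print(conflated_in, conflated_out)  # 37500 1837500
```

Under these assumptions the inbound rate matches the ~150 KB/s quoted above, while the fan-out side is roughly 49 times larger, which is where conflation pays off most.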

Senior deep-dive

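CRDTs are the right model for spatial objects: shapes are independent entities, not a linear text stream, so concurrent edits to different shapes commute, and concurrent edits to the same property of one shape can be settled deterministically (for example, last-writer-wins).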

The infinite canvas is the ops problem nobody talks about: loading all objects for a 10k-element board on join kills bandwidth — viewport-based loading with a spatial index (R-tree / quadtree) gates what you sync.

A single broadcast order does most of the remaining work: a per-board sequencer ensures every client applies deltas in the same order, so shape layering and other order-sensitive state converge without rollback.

CRDT vs OT: not ideology, it's object topology

Text editors traditionally use OT because characters have a strict linear order and concurrent inserts need positional transformation. Whiteboard objects are independent: moving shape A and resizing shape B never conflict. An OR-Set tracks which objects exist (concurrent add/remove), and a last-writer-wins register per property settles concurrent edits to the same shape, all without a sequencer. The caveat: z-index (layering) is a shared ordered list and still needs OT or a sequencer for that one attribute.
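A minimal sketch of why per-object state merges cleanly, assuming each shape property is a last-writer-wins register stamped with a (logical clock, client id) pair. That choice is an assumption made here for illustration: an OR-Set covers which objects exist, not their property values, and the `Shape` class and `Stamp` type below are hypothetical rather than taken from any CRDT library.

```python
from dataclasses import dataclass, field

# Hypothetical stamp: (logical clock, client id). The client id breaks ties.
Stamp = tuple[int, str]


@dataclass
class Shape:
    # property name -> (stamp, value)
    props: dict[str, tuple[Stamp, object]] = field(default_factory=dict)

    def set(self, name: str, value: object, stamp: Stamp) -> None:
        """Apply a write only if it is newer than what we already hold."""
        current = self.props.get(name)
        if current is None or stamp > current[0]:
            self.props[name] = (stamp, value)

    def merge(self, other: "Shape") -> None:
        """Merge another replica's state; the order of merges does not matter."""
        for name, (stamp, value) in other.props.items():
            self.set(name, value, stamp)

    def value(self, name: str) -> object:
        return self.props[name][1]


# Two replicas edit the same shape concurrently.
a, b = Shape(), Shape()
a.set("x", 100, (1, "alice"))    # alice moves the shape
b.set("width", 40, (1, "bob"))   # bob resizes it: a different property
b.set("x", 250, (2, "bob"))      # bob also moves it, with a later clock

left, right = Shape(), Shape()
left.merge(a)
left.merge(b)                    # a then b
right.merge(b)
right.merge(a)                   # b then a
assert left.props == right.props # both orders converge to the same state
```

Both merge orders end with `x` at 250 and `width` at 40. The same trick does not work for z-order, because layering is a relationship between objects, not a property of one.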

Viewport gating: the scalability lever nobody demos

Naive implementations sync the entire board state on join — fine at 100 objects, catastrophic at 50,000. Production systems use a spatial index (R-tree or quadtree) server-side to send only objects intersecting the client's viewport plus a margin. As the user pans, the server streams in newly visible objects and prunes unseen ones. Object IDs + bounding boxes are the subscription unit, not the full object payload.
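The viewport query can be sketched with a uniform grid standing in for the R-tree or quadtree. The grid is a deliberate simplification chosen to keep the example short, not what a production system would necessarily use; `GridIndex`, `CELL`, and the object ids are all hypothetical.

```python
from collections import defaultdict

CELL = 512  # grid cell size in canvas units; an arbitrary illustrative choice

Box = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def cells(box: Box):
    """Yield every grid cell that a bounding box overlaps."""
    x0, y0, x1, y1 = box
    for cx in range(int(x0 // CELL), int(x1 // CELL) + 1):
        for cy in range(int(y0 // CELL), int(y1 // CELL) + 1):
            yield cx, cy


class GridIndex:
    def __init__(self) -> None:
        self.boxes: dict[str, Box] = {}
        self.grid: dict[tuple[int, int], set[str]] = defaultdict(set)

    def insert(self, obj_id: str, box: Box) -> None:
        self.boxes[obj_id] = box
        for cell in cells(box):
            self.grid[cell].add(obj_id)

    def query(self, viewport: Box, margin: float = 0.0) -> set[str]:
        """Return ids of objects whose box intersects the padded viewport."""
        x0, y0, x1, y1 = viewport
        px0, py0, px1, py1 = x0 - margin, y0 - margin, x1 + margin, y1 + margin
        hits: set[str] = set()
        for cell in cells((px0, py0, px1, py1)):
            for obj_id in self.grid.get(cell, ()):
                bx0, by0, bx1, by1 = self.boxes[obj_id]
                # Standard axis-aligned rectangle overlap test.
                if bx0 <= px1 and bx1 >= px0 and by0 <= py1 and by1 >= py0:
                    hits.add(obj_id)
        return hits


index = GridIndex()
index.insert("sticky-1", (100, 100, 300, 200))
index.insert("arrow-3", (1950, 900, 2100, 1000))
index.insert("frame-7", (5000, 5000, 5400, 5300))

before = index.query((0, 0, 1920, 1080))
after = index.query((0, 0, 1920, 1080), margin=100)
print(sorted(before))  # ['sticky-1']
print(sorted(after))   # ['arrow-3', 'sticky-1']
```

When the user pans, the server can diff two query results: ids in the new set but not the old are streamed in, and ids in the old set but not the new are pruned.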

Per-board sequencer: cheap coordination that buys correctness

Even with CRDTs, clients need to agree on event ordering for layering (z-order) and undo stacks. A lightweight per-board sequencer (a single process or a durable log partition) stamps each delta with a monotonic sequence number. Clients buffer out-of-order deltas and apply them in order. This is not a single point of failure if the sequencer is the leader of a small Raft group or is backed by a durable log like Kafka.
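The client side of that contract, buffering out-of-order deltas and applying them in sequence, might look like the sketch below. The `DeltaBuffer` class is hypothetical, and the `applied` list is a stand-in for actually mutating the canvas.

```python
class DeltaBuffer:
    """Applies sequenced deltas strictly in order, holding back early arrivals."""

    def __init__(self, last_applied: int = 0) -> None:
        self.last_applied = last_applied
        self.pending: dict[int, object] = {}
        self.applied: list[object] = []  # stand-in for mutating the canvas

    def receive(self, seq: int, delta: object) -> None:
        if seq <= self.last_applied:
            return  # duplicate or already applied; safe to drop
        self.pending[seq] = delta
        # Drain every delta that is now contiguous with what we have applied.
        while self.last_applied + 1 in self.pending:
            self.last_applied += 1
            self.applied.append(self.pending.pop(self.last_applied))


buf = DeltaBuffer()
buf.receive(2, "raise shape B")
buf.receive(3, "lower shape A")
print(buf.applied)   # [] because delta 1 has not arrived yet
buf.receive(1, "create shape A")
print(buf.applied)   # ['create shape A', 'raise shape B', 'lower shape A']
```

A real client would also bound the pending buffer and request a resync if a gap persists, which this sketch leaves out.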

Reconnection diffing: where bugs actually live

On reconnect, the client has a last-seen sequence number and a local CRDT state. The server must produce a diff from that sequence to HEAD without replaying gigabytes of history. Periodic snapshots (e.g., every 1,000 ops) bound the work: the log can be truncated behind the latest snapshot, and a client that is too far behind receives that snapshot plus the deltas after it. Without snapshots, you replay from the beginning, which is fine in demos and ruinous on production boards with years of history.
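One way to picture the server's decision on reconnect is the sketch below. It assumes the server keeps a list of retained deltas sorted by sequence number, at least one snapshot, and every delta newer than the latest snapshot; `plan_catch_up` and `CatchUp` are hypothetical names, and the dict payloads are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatchUp:
    snapshot_seq: Optional[int]    # None means "keep your local state"
    snapshot: Optional[dict]
    deltas: list[tuple[int, dict]]


def plan_catch_up(
    last_seen: int,
    log: list[tuple[int, dict]],   # retained (seq, delta) pairs, sorted by seq
    snapshots: dict[int, dict],    # seq -> full board state at that seq
) -> CatchUp:
    # Fast path: the retained log still covers everything the client missed.
    if log and last_seen + 1 >= log[0][0]:
        tail = [(s, d) for s, d in log if s > last_seen]
        return CatchUp(None, None, tail)
    # Slow path: the client is behind the retained log, so send the newest
    # snapshot and only the deltas that came after it.
    snap_seq = max(snapshots)
    tail = [(s, d) for s, d in log if s > snap_seq]
    return CatchUp(snap_seq, snapshots[snap_seq], tail)


log = [(seq, {"op": seq}) for seq in range(2001, 2006)]
snapshots = {1000: {"objects": 17}, 2000: {"objects": 42}}

brief = plan_catch_up(2003, log, snapshots)
stale = plan_catch_up(1500, log, snapshots)
print(brief.snapshot_seq, [s for s, _ in brief.deltas])  # None [2004, 2005]
print(stale.snapshot_seq, len(stale.deltas))             # 2000 5
```

Either way the cost is bounded by the snapshot interval rather than by the age of the board.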

Presence and cursors: fire-and-forget with TTL

Cursor positions are ephemeral, high-frequency, and loss-tolerant, which makes them a poor fit for a durable log. Route them through a separate ephemeral pub/sub path (a dedicated Redis pub/sub channel per board or a WebSocket fan-out sidecar) distinct from the durable delta log. TTL-based expiry (2–5 seconds) cleans up stale cursors without explicit disconnect signals, which are unreliable over flaky connections.
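TTL-based cleanup needs very little machinery. The sketch below keeps an in-memory table of last-seen cursors and drops any that have not been refreshed within the TTL; the `Presence` class is hypothetical, and the clock is injectable only so the behaviour can be demonstrated without sleeping.

```python
import time


class Presence:
    def __init__(self, ttl_seconds: float = 3.0, clock=time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        # user -> (x, y, time of last update)
        self.cursors: dict[str, tuple[float, float, float]] = {}

    def update(self, user: str, x: float, y: float) -> None:
        """Record the latest cursor position; older positions are discarded."""
        self.cursors[user] = (x, y, self.clock())

    def live(self) -> dict[str, tuple[float, float]]:
        """Drop cursors older than the TTL and return the remaining ones."""
        now = self.clock()
        self.cursors = {
            user: c for user, c in self.cursors.items() if now - c[2] <= self.ttl
        }
        return {user: (x, y) for user, (x, y, _) in self.cursors.items()}


fake_now = [0.0]                      # fake clock for the demonstration
presence = Presence(ttl_seconds=3.0, clock=lambda: fake_now[0])
presence.update("alice", 10, 20)
fake_now[0] = 2.0
presence.update("bob", 5, 6)
fake_now[0] = 4.0                     # alice last seen 4 s ago, bob 2 s ago
print(presence.live())                # {'bob': (5, 6)}
```

No disconnect message is needed: a client that silently drops off simply stops refreshing and ages out.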

What breaks at scale

Hot boards with hundreds of concurrent editors expose two failure modes: the sequencer becomes a throughput bottleneck (solution: shard by region of the canvas, accept that cross-region z-order is eventually consistent), and the spatial index update rate exceeds the indexer's write capacity when objects are dragged at high frequency. In practice, conflate object-move events to ~15 fps server-side before indexing, and batch spatial index writes to avoid per-event overhead.
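Server-side conflation amounts to keeping only the newest position per object and flushing the batch on a timer. A minimal sketch, assuming the caller drives it with timestamps in seconds; `MoveConflater` is a hypothetical name and the returned batch is what would be handed to the spatial index in one write.

```python
class MoveConflater:
    def __init__(self, fps: int = 15) -> None:
        self.interval = 1.0 / fps
        self.latest: dict[str, tuple[float, float]] = {}
        self.next_flush = 0.0

    def on_move(self, obj_id: str, x: float, y: float) -> None:
        # Overwrite: for indexing, only the newest position matters.
        self.latest[obj_id] = (x, y)

    def flush_if_due(self, now: float) -> dict[str, tuple[float, float]]:
        """Return the pending batch if the interval has elapsed, else {}."""
        if now < self.next_flush or not self.latest:
            return {}
        batch, self.latest = self.latest, {}
        self.next_flush = now + self.interval
        return batch


conflater = MoveConflater(fps=15)
for step in range(60):                    # 60 drag events for one shape
    conflater.on_move("shape-1", float(step), 0.0)

first = conflater.flush_if_due(now=1.0)
print(first)                              # {'shape-1': (59.0, 0.0)}

conflater.on_move("shape-1", 60.0, 0.0)
too_soon = conflater.flush_if_due(now=1.01)
later = conflater.flush_if_due(now=1.07)
print(too_soon, later)                    # {} {'shape-1': (60.0, 0.0)}
```

Sixty drag events collapse into one index write, so the indexer's load depends on the number of moving objects, not on pointer sample rate.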

In production

A Miro-style design pairs a per-board presence service with a sequencing server and a CRDT-per-object model. Figma has publicly described a related approach: not OT, but a CRDT-inspired design over WebSockets in which the server is the single source of truth for ordering and conflicts are resolved per property with last-writer-wins. The real engineering challenge is reconnection: when a client goes offline for 30 seconds and rejoins, diffing its CRDT state against the server's snapshot to produce a minimal delta, without replaying the entire history, is where most production whiteboard bugs live.

Common mistakes

Loading the entire board at once
Locking the whole canvas
Treating it like a text editor (over-engineering OT)

Part of System Design Library on SystemLore.