Consensus: Raft & Paxos

The problem

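Many machines must agree on one value, or one order of log entries, while surviving crashes and network partitions. This is provably hard: the FLP impossibility result shows that in a fully asynchronous network no deterministic algorithm can guarantee it will reach agreement if even one node can crash, so practical protocols rely on timeouts to make progress.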

The idea

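Consensus algorithms such as Raft and Multi-Paxos use majority quorums to elect a leader and agree on the order of a replicated log. They never return conflicting results, and they make progress whenever a majority of nodes is up and able to talk to each other.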

How it works

Raft (designed to be the understandable one): the nodes elect a leader; the leader replicates each entry to a majority before committing it; if the leader fails, a new election runs. Any two majorities overlap in at least one node, so two conflicting decisions can never both be made, and at most one side of a partition can make progress.
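To make the overlap argument concrete, here is a brute-force check in Python for small clusters. It is an illustrative sketch, not Raft code; the helpers `majority`, `majorities`, `all_majorities_overlap`, and `sides_that_can_progress` are hypothetical names chosen for this example.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def majorities(nodes: list[str]) -> list[frozenset[str]]:
    """Every subset of `nodes` large enough to be a majority."""
    q = majority(len(nodes))
    return [
        frozenset(c)
        for size in range(q, len(nodes) + 1)
        for c in combinations(nodes, size)
    ]


def all_majorities_overlap(nodes: list[str]) -> bool:
    """True if every pair of majorities shares at least one node."""
    qs = majorities(nodes)
    return all(a & b for a, b in combinations(qs, 2))


def sides_that_can_progress(side_a: set[str], side_b: set[str]) -> list[str]:
    """Which sides of a two-way partition still hold a majority of all nodes."""
    n = len(side_a) + len(side_b)
    return [name for name, side in (("A", side_a), ("B", side_b))
            if len(side) >= majority(n)]


if __name__ == "__main__":
    for n in range(1, 8):
        nodes = [f"n{i}" for i in range(n)]
        print(n, all_majorities_overlap(nodes))  # True for every n
    print(sides_that_can_progress({"n0", "n1", "n2"}, {"n3", "n4"}))  # ['A']
    print(sides_that_can_progress({"n0", "n1"}, {"n2", "n3"}))        # []
```

Exhaustive enumeration only scales to small clusters, but it shows both halves of the claim: majorities always intersect, and a 2/2 split of a 4-node cluster leaves neither side able to commit.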

The tradeoff

Strongly consistent and partition-safe, but writes need a round-trip to a majority (latency), and the minority side loses availability.
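The latency cost follows from simple counting: the leader's own copy is one vote, so it only waits for the fastest followers that complete a majority. A minimal sketch, assuming the leader persists locally with zero delay and each follower's ack arrives after one fixed round-trip (real clusters add fsync and queuing); `commit_latency_ms` is a hypothetical helper.

```python
def commit_latency_ms(follower_rtts_ms: list[float]) -> float:
    """Earliest commit time under the simplified model described above.

    With n = followers + 1 nodes, a majority is n // 2 + 1 votes; the leader
    supplies one, so it needs acks from the n // 2 fastest followers.
    """
    n = len(follower_rtts_ms) + 1
    acks_needed = n // 2
    if acks_needed == 0:
        return 0.0  # single-node cluster: the leader alone is a majority
    return sorted(follower_rtts_ms)[acks_needed - 1]


if __name__ == "__main__":
    # 5-node cluster: one follower nearby, three in remote regions (made-up numbers).
    print(commit_latency_ms([2.0, 40.0, 45.0, 90.0]))  # 40.0
```

In this model the 90 ms follower never delays a commit; in a 5-node cluster the floor is set by the second-fastest follower.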

In the wild

etcd, Consul, CockroachDB, TiDB (via its TiKV storage layer), and Kafka's KRaft controller quorum all replicate their state with Raft, a member of the Paxos family.

Interview deep dive

Flow

Elect a leader for a new term (a numbered election epoch) by winning votes from a majority.
Leader appends each entry to its log and replicates it.
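An entry commits once a majority (leader included) has durably stored it. In Raft the leader only counts replicas for entries from its own term; earlier entries commit along with them.
On leader loss a new election runs. Voters reject candidates whose logs are less up to date than their own, and every committed entry sits on a majority that overlaps the voting majority, so the new leader already holds every committed entry.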
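The flow hinges on two rules: when a node grants its vote, and when a leader treats an entry as committed. The Python sketch below models both as described in the Raft paper, but it is a simplified illustration: it omits RPCs, timeouts, log repair, and the requirement to persist `current_term` and `voted_for` before replying, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    term: int
    command: str


@dataclass
class Node:
    """A tiny slice of Raft state: enough for voting and the commit rule."""
    current_term: int = 0
    voted_for: Optional[str] = None
    log: list[Entry] = field(default_factory=list)  # log[i] holds index i + 1
    commit_index: int = 0

    def last_log(self) -> tuple[int, int]:
        """(last term, last index); (0, 0) for an empty log."""
        if not self.log:
            return (0, 0)
        return (self.log[-1].term, len(self.log))

    def handle_request_vote(self, term: int, candidate: str,
                            last_log_term: int, last_log_index: int) -> bool:
        """Grant at most one vote per term, and only to an up-to-date candidate."""
        if term < self.current_term:
            return False  # stale candidate from an old term
        if term > self.current_term:
            self.current_term, self.voted_for = term, None  # newer term: vote is free again
        # Election restriction: higher last term wins; same term, longer log wins.
        up_to_date = (last_log_term, last_log_index) >= self.last_log()
        if up_to_date and self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False

    def advance_commit(self, match_index: dict[str, int], cluster_size: int) -> int:
        """Leader side: commit the highest index stored on a majority (leader included).

        Only entries from the current term are committed by counting replicas;
        earlier entries become committed along with them.
        """
        for n in range(len(self.log), self.commit_index, -1):
            replicas = 1 + sum(1 for m in match_index.values() if m >= n)
            if replicas > cluster_size // 2 and self.log[n - 1].term == self.current_term:
                self.commit_index = n
                break
        return self.commit_index


if __name__ == "__main__":
    leader = Node(current_term=2, log=[Entry(1, "x=1"), Entry(2, "x=2")])
    # Index 1 (term 1) is on 3 of 5 nodes, but old-term entries are not committed by counting.
    print(leader.advance_commit({"f1": 2, "f2": 1, "f3": 0, "f4": 0}, cluster_size=5))  # 0
    # Once the term-2 entry reaches a majority, both entries commit.
    print(leader.advance_commit({"f1": 2, "f2": 2, "f3": 0, "f4": 0}, cluster_size=5))  # 2
```

Comparing `(last_term, last_index)` tuples encodes the up-to-date check in one expression, and the current-term check in `advance_commit` is what stops a new leader from committing an old entry that could still be overwritten.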

Watch for

Writes need a majority round-trip — a latency floor you can't avoid.
The minority side of a partition can't make progress (no availability).
Use odd cluster sizes (3/5/7); an even size adds a node without adding fault tolerance (4 nodes need a quorum of 3 and survive 1 failure, the same as 3 nodes).
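The odd-size rule is quorum arithmetic: n nodes need n // 2 + 1 votes, so the cluster survives n minus that many crashes. A short Python sketch (the helper names are illustrative) prints the numbers for 3 to 7 nodes.

```python
def quorum(n: int) -> int:
    """Votes needed for a majority of n nodes."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Crashed nodes the cluster can lose while a majority remains."""
    return n - quorum(n)


for n in range(3, 8):
    print(f"n={n}  quorum={quorum(n)}  tolerates={tolerated_failures(n)}")
# n=3  quorum=2  tolerates=1
# n=4  quorum=3  tolerates=1
# n=5  quorum=3  tolerates=2
# n=6  quorum=4  tolerates=2
# n=7  quorum=4  tolerates=3
```

Going from 3 to 4 or from 5 to 6 nodes raises the quorum without raising the failures tolerated, so the extra node adds replication cost but no resilience.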

Interviewer trap

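Explain safety from quorum overlap: two majorities of the same n nodes have more than n members between them, so they share at least one node. That node casts only one vote per term and carries every committed entry into the next election, so there are no split decisions.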

Part of Academy on SystemLore: system design interview prep with 148 deep topics, interactive diagrams, and a practice game.