Academy · Scaling & Load Balancing

Balancing algorithms

"Spread the load" is vague. Spread it how?

Open the interactive version → diagrams, practice & more

The problem

"Spread the load" is vague. Spread it how?

The idea

Different algorithms suit different workloads: round-robin, least-connections, weighted, consistent-hashing.

How it works

Round-robin is fair only when requests are uniform. Least-connections adapts when durations vary widely, but a single global load view is hard at scale. Power-of-two-choices (P2C) — sample two random backends, pick the less loaded — gets near-optimal balance with almost no coordination, which is why modern proxies default to it. Consistent hashing routes a key to the same backend for cache affinity; the "bounded loads" variant caps any node's share so a hot key can't swamp it.

The tradeoff

Smarter algorithms need fresher backend state, itself a distributed problem — a stale least-connections view sends a herd to one "idle" node. Consistent hashing trades perfect balance for stability and locality. Match algorithm to workload: uniform→round-robin, variable→P2C/least-conn, cache-affine→consistent hashing.

In the wild

CDNs and distributed caches lean on consistent hashing so a node joining/leaving moves minimal keys.

Interview deep dive

Flow

  1. Classify the workload: uniform, variable-duration, or cache-affine.
  2. Pick round-robin, P2C/least-connections, or consistent hashing.
  3. Feed the balancer fresh backend load/health signals.
  4. Watch per-node load; cap shares if one key runs hot.

Watch for

Interviewer trap

Tie the algorithm to the workload, and name P2C as the cheap near-optimal default.

Related Academy

Part of Academy on SystemLore — system design interview prep with 148 deep topics, interactive diagrams, and a practice game. Practice this one →