Balancing algorithms
"Spread the load" is vague. Spread it how?
Open the interactive version → diagrams, practice & moreThe problem
"Spread the load" is vague. Spread it how?
The idea
Different algorithms suit different workloads: round-robin, least-connections, weighted, consistent-hashing.
How it works
Round-robin is fair only when requests are uniform. Least-connections adapts when durations vary widely, but a single global load view is hard at scale. Power-of-two-choices (P2C) — sample two random backends, pick the less loaded — gets near-optimal balance with almost no coordination, which is why modern proxies default to it. Consistent hashing routes a key to the same backend for cache affinity; the "bounded loads" variant caps any node's share so a hot key can't swamp it.
The tradeoff
Smarter algorithms need fresher backend state, itself a distributed problem — a stale least-connections view sends a herd to one "idle" node. Consistent hashing trades perfect balance for stability and locality. Match algorithm to workload: uniform→round-robin, variable→P2C/least-conn, cache-affine→consistent hashing.
In the wild
CDNs and distributed caches lean on consistent hashing so a node joining/leaving moves minimal keys.
Interview deep dive
Flow
- Classify the workload: uniform, variable-duration, or cache-affine.
- Pick round-robin, P2C/least-connections, or consistent hashing.
- Feed the balancer fresh backend load/health signals.
- Watch per-node load; cap shares if one key runs hot.
Watch for
- Stale load views make least-connections stampede one node.
- Round-robin ignores request cost — bad for mixed durations.
- Plain consistent hashing still lets a hot key overload its node (use bounded loads).
Interviewer trap
Tie the algorithm to the workload, and name P2C as the cheap near-optimal default.