Load balancers
The problem
You have ten app servers. How does a user's request reach a healthy, not-overloaded one?
The idea
A load balancer sits in front and distributes incoming requests across your servers.
How it works
What matters at a senior level is the layering and the lifecycle, not just the balancing algorithm. Globally, DNS or anycast routes a user to the nearest region. Inside a region, an L4 LB (routes on IP/port, fast, connection-oriented) fronts an L7 LB that parses HTTP, routes by path or header, retries failed requests, and terminates TLS. Health checks (active probes plus passive ejection on errors) take dead backends out of rotation, and connection draining lets in-flight requests finish before a backend is pulled for a deploy. The LB itself must be redundant: an HA pair sharing a floating (virtual) IP, or many LB nodes advertising the same anycast address.
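The health-check half of that lifecycle can be sketched as a small state machine. This is a minimal illustration, not how any particular LB implements it: the class name `BackendHealth` and the threshold values (3 failures to eject, 2 successes to re-admit) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Tracks one backend's health from a stream of probe results."""
    fail_threshold: int = 3  # assumed: consecutive failures before ejection
    rise_threshold: int = 2  # assumed: consecutive successes before re-admission
    healthy: bool = True
    _fails: int = 0
    _passes: int = 0

    def record_probe(self, ok: bool) -> bool:
        """Feed one probe result; return whether the backend is in rotation."""
        if ok:
            self._passes += 1
            self._fails = 0
            if not self.healthy and self._passes >= self.rise_threshold:
                self.healthy = True
        else:
            self._fails += 1
            self._passes = 0
            if self.healthy and self._fails >= self.fail_threshold:
                self.healthy = False
        return self.healthy


health = BackendHealth()
for result in [False, False, False, True, True]:
    print(result, "->", health.record_probe(result))
```

Requiring several consecutive failures is what keeps a single slow probe from ejecting a backend. Passive ejection could feed the same method with the outcomes of real requests instead of probes.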
The tradeoff
L7 is smarter (content routing, retries, mTLS) but parses every request, which costs more CPU and widens the blast radius if it falls over. Tight health checks detect failure fast but can eject a backend that is only briefly slow and shove its load onto the rest, which can cascade. And when a backend dies, its connections re-establish elsewhere all at once: a thundering herd the survivors must absorb, so you size for N-1 (the fleet minus one server must still carry peak load).
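The N-1 rule is simple arithmetic, sketched here under stated assumptions: the traffic figures (9,000 requests/s at peak, 1,000 requests/s per server) are made up for illustration, and the function names are hypothetical.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float,
                   tolerated_failures: int = 1) -> int:
    """Servers required so the fleet still carries peak load after failures."""
    base = math.ceil(peak_rps / per_server_rps)  # servers needed with no failures
    return base + tolerated_failures


def survivor_utilization(peak_rps: float, per_server_rps: float,
                         servers: int, failed: int = 1) -> float:
    """Fraction of capacity the surviving servers run at during peak."""
    return peak_rps / ((servers - failed) * per_server_rps)


# Assumed example numbers, not measurements.
print(servers_needed(9000, 1000))            # 10
print(survivor_utilization(9000, 1000, 10))  # 1.0
print(survivor_utilization(9000, 1000, 9))   # 1.125
```

With ten servers, losing one leaves the other nine exactly at capacity. With nine, one failure pushes the survivors to 112.5% and the cascade begins. In practice you would likely leave extra headroom for the reconnect surge as well.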
In the wild
NGINX, HAProxy, Envoy, AWS ALB/NLB. Every big site runs layers of them.
Interview deep dive
Flow
- Client resolves to a regional VIP via DNS or anycast.
- An L4 LB forwards the connection into the L7 pool.
- L7 terminates TLS and picks a backend by least-connections.
- On deploy the backend drains: new requests skip it, in-flight finish.
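The last two steps of the flow, least-connections selection and draining, can be sketched together. This is a toy in-memory model under assumptions: the `Backend` and `LeastConnPool` names are hypothetical, and a real LB tracks connections at the socket level rather than with a counter.

```python
class Backend:
    def __init__(self, name: str):
        self.name = name
        self.active = 0        # in-flight requests on this backend
        self.draining = False  # True once it has been marked for removal


class LeastConnPool:
    def __init__(self, backends):
        self.backends = list(backends)

    def acquire(self) -> Backend:
        """Pick the non-draining backend with the fewest in-flight requests."""
        candidates = [b for b in self.backends if not b.draining]
        if not candidates:
            raise RuntimeError("no backend available")
        chosen = min(candidates, key=lambda b: b.active)
        chosen.active += 1
        return chosen

    def release(self, backend: Backend) -> None:
        """Call when a request finishes."""
        backend.active -= 1

    def drain(self, backend: Backend) -> None:
        """Stop sending new requests; in-flight ones keep running."""
        backend.draining = True

    def safe_to_remove(self, backend: Backend) -> bool:
        return backend.draining and backend.active == 0


a, b = Backend("a"), Backend("b")
pool = LeastConnPool([a, b])
first = pool.acquire()   # tie on zero connections, so the first backend wins
pool.drain(a)
second = pool.acquire()  # "a" is draining, so this goes to "b"
print(first.name, second.name, pool.safe_to_remove(a))
pool.release(first)
print(pool.safe_to_remove(a))
```

A deploy script would poll something like `safe_to_remove` (usually with a timeout) before stopping the process.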
Watch for
- Least-connections beats round-robin when request durations vary widely.
- A health check that probes a shared dependency (for example, the database) fails on every backend at once when that dependency goes down, ejecting the whole fleet.
- Size for N-1 so one failure never pushes survivors past capacity.
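The first point above can be checked with a toy simulation. The workload is an assumption picked to make the effect visible: one request arrives per tick, alternating between a slow one (10 ticks) and a fast one (1 tick), across two backends. The function name `peak_in_flight` is hypothetical.

```python
def peak_in_flight(durations, n_backends, policy):
    """Return the highest number of concurrent requests any one backend held."""
    in_flight = [[] for _ in range(n_backends)]  # finish ticks per backend
    peak = 0
    for t, duration in enumerate(durations):  # one request arrives per tick
        # Retire requests that have finished by tick t.
        in_flight = [[f for f in finishes if f > t] for finishes in in_flight]
        if policy == "round_robin":
            chosen = t % n_backends
        elif policy == "least_connections":
            chosen = min(range(n_backends), key=lambda i: len(in_flight[i]))
        else:
            raise ValueError(f"unknown policy: {policy}")
        in_flight[chosen].append(t + duration)
        peak = max(peak, max(len(f) for f in in_flight))
    return peak


workload = [10, 1] * 10  # assumed pattern: slow, fast, slow, fast, ...
print(peak_in_flight(workload, 2, "round_robin"))        # 5
print(peak_in_flight(workload, 2, "least_connections"))  # 3
```

Round-robin sends every slow request to the same backend, which piles up five at once while the other sits nearly idle. Least-connections spreads them, and the worst backend never holds more than three.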
Interviewer trap
Name the L4-vs-L7 split, the health-check and draining lifecycle, and how the LB stays redundant.