Academy · Scaling & Load Balancing

Load balancers

The problem

You have ten app servers. How does a user's request reach a healthy, not-overloaded one?

The idea

A load balancer sits in front of your servers and distributes incoming requests across them, sending traffic only to backends it believes are healthy.
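As a minimal sketch of the idea, the simplest distribution policy is round-robin: hand each request to the next server in the list that is currently marked healthy. The class name `RoundRobinBalancer`, the `app-N` server names, and the `healthy` set are illustrative assumptions, not any real load balancer's API.

```python
class RoundRobinBalancer:
    """Toy round-robin picker that skips servers marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)  # assumption: health is tracked elsewhere
        self._next = 0  # index of the next server to try

    def pick(self):
        # Try each server at most once, starting from the rotating cursor.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer([f"app-{i}" for i in range(1, 4)])
lb.healthy.discard("app-2")  # pretend a health check failed
picks = [lb.pick() for _ in range(4)]  # alternates between app-1 and app-3
```

Round-robin ignores how busy each server is, which is why real deployments often prefer load-aware policies such as least-connections.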

How it works

The senior detail is the layers and the lifecycle, not just the algorithm. Globally, DNS or anycast routes a user to the nearest region. Inside a region, an L4 LB (routes on IP and port, fast, connection-oriented) fronts an L7 LB that parses HTTP, routes by path or header, retries failed requests, and terminates TLS. Health checks (active probes plus passive ejection on errors) take dead backends out of rotation, and connection draining lets in-flight requests finish before a backend is pulled for a deploy. The LB itself must be redundant: an HA pair sharing a floating (virtual) IP, or many instances behind anycast.
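The health-check lifecycle can be sketched as a small state tracker, assuming a backend is ejected after a run of consecutive failures and readmitted after a run of consecutive successful probes. The class `BackendHealth`, its method names, and the threshold values are hypothetical choices for illustration; real load balancers expose similar knobs under their own names.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Tracks one backend's health from active probes and live traffic."""

    fail_threshold: int = 3  # consecutive failures before ejection (assumed)
    rise_threshold: int = 2  # consecutive probe passes before readmission (assumed)
    healthy: bool = True
    _fails: int = 0
    _passes: int = 0

    def record_probe(self, ok: bool) -> None:
        """Active check: result of a periodic probe sent by the LB."""
        if ok:
            self._passes += 1
            self._fails = 0
            if not self.healthy and self._passes >= self.rise_threshold:
                self.healthy = True
        else:
            self._fail()

    def record_response(self, status: int) -> None:
        """Passive check: 5xx responses on real traffic count as failures."""
        if status >= 500:
            self._fail()
        else:
            self._fails = 0

    def _fail(self) -> None:
        self._fails += 1
        self._passes = 0
        if self._fails >= self.fail_threshold:
            self.healthy = False


b = BackendHealth()
for status in (502, 503, 500):  # three server errors in a row on live traffic
    b.record_response(status)
ejected = not b.healthy
b.record_probe(True)
still_out = not b.healthy  # one passing probe is not enough to readmit
b.record_probe(True)
readmitted = b.healthy
```

Requiring several consecutive results in each direction is one way to keep a single slow response from flapping a backend in and out of rotation.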

The tradeoff

L7 is smarter (content routing, retries, mTLS) but parses every request, which costs more CPU and widens the blast radius if it falls over. Tight health checks detect failure fast but can eject a backend that is only briefly slow and shove its load onto the rest, which can cascade. And when a backend dies, its connections re-establish elsewhere all at once: a thundering herd the survivors must absorb. So you size for N-1, meaning the remaining servers can carry peak load with one backend gone.
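The N-1 sizing rule is simple arithmetic, sketched here with made-up numbers: 9,000 requests per second spread evenly over ten servers, each assumed to handle at most 1,000 requests per second. The function names and figures are illustrative assumptions, and the model assumes load spreads evenly across survivors.

```python
import math


def load_per_survivor(total_rps: float, servers: int, failed: int = 1) -> float:
    """Per-server load once `failed` servers drop out and the rest absorb it."""
    survivors = servers - failed
    if survivors <= 0:
        raise ValueError("no servers left")
    return total_rps / survivors


def servers_needed(peak_rps: float, capacity_rps: float, tolerate: int = 1) -> int:
    """Smallest fleet that still carries peak load with `tolerate` servers down."""
    return math.ceil(peak_rps / capacity_rps) + tolerate


before = load_per_survivor(9000, 10, failed=0)  # 900.0 rps each
after = load_per_survivor(9000, 10, failed=1)   # 1000.0 rps each
fleet = servers_needed(9000, 1000)              # 10 servers
```

In this example each server runs at 90% of its assumed capacity in normal operation, and losing one pushes the other nine exactly to their limit, so ten is the smallest fleet that survives a single failure at peak.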

In the wild

NGINX, HAProxy, Envoy, and AWS ALB (L7) and NLB (L4). Large sites typically run several layers of them.

Interview deep dive

Flow

  1. Client resolves to a regional VIP via DNS or anycast.
  2. An L4 LB forwards the connection into the L7 pool.
  3. L7 terminates TLS and picks a backend by least-connections.
  4. On deploy the backend drains: new requests skip it, in-flight finish.
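Steps 3 and 4 of the flow can be sketched together: pick the backend with the fewest in-flight requests, and skip any backend that is draining while still letting its existing requests finish. `LeastConnPool` and its methods are hypothetical names for this sketch, and the bookkeeping is deliberately simplified (no locking, no timeouts).

```python
class LeastConnPool:
    """Toy L7 backend pool: least-connections selection plus draining."""

    def __init__(self, backends):
        self.active = {name: 0 for name in backends}  # in-flight requests per backend
        self.draining = set()

    def pick(self) -> str:
        # Only backends that are not draining may receive new requests.
        candidates = [b for b in self.active if b not in self.draining]
        if not candidates:
            raise RuntimeError("no backends accepting new requests")
        chosen = min(candidates, key=lambda b: self.active[b])
        self.active[chosen] += 1
        return chosen

    def release(self, backend: str) -> None:
        # Called when a request finishes, including on a draining backend.
        self.active[backend] -= 1

    def drain(self, backend: str) -> None:
        self.draining.add(backend)

    def safe_to_remove(self, backend: str) -> bool:
        # A draining backend can be pulled once its in-flight count hits zero.
        return backend in self.draining and self.active[backend] == 0


pool = LeastConnPool(["a", "b"])
first, second = pool.pick(), pool.pick()     # one request lands on each backend
pool.drain("a")                              # deploy starts on "a"
third = pool.pick()                          # new work skips "a"
removable_early = pool.safe_to_remove("a")   # one request is still in flight
pool.release("a")
removable_late = pool.safe_to_remove("a")    # in-flight work has finished
```

A production drain would normally also carry a deadline, so that a stuck request cannot block the deploy indefinitely.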

Watch for

Interviewer trap

Name the L4-vs-L7 split, the health-check and draining lifecycle, and how the LB stays redundant.
