
The stateless app tier

Adding servers only helps if they're interchangeable.


The problem

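Adding servers only helps if they're interchangeable. A node that keeps sessions or other per-user data in local memory forces those users' requests back to that same node, and it cannot be replaced without losing that data.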

The idea

A horizontally-scaled, stateless app tier behind a load balancer is the workhorse pattern of the web.

How it works

Identical stateless nodes sit behind the load balancer (LB); shared state lives in caches and databases; autoscaling adjusts the node count. Because nodes are disposable, you can deploy by rolling update or blue-green switch without taking the service down, and a blue-green deploy rolls back quickly by routing traffic back to the previous version. The catch is downstream: each new app node opens its own DB connections, so scaling the tier out can exhaust the database's connection limit, which is why a connection pooler (for example PgBouncer) belongs between the app tier and the database.
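The connection arithmetic can be made concrete with a small sketch. Every number and function name below is hypothetical, chosen only for illustration; the pooler is modeled very simply as a fixed cap on server-side connections, which is an assumption rather than a description of how PgBouncer behaves in every mode.

```python
def total_db_connections(nodes: int, pool_size_per_node: int) -> int:
    """Worst case: every app node fills its own connection pool."""
    return nodes * pool_size_per_node


def max_nodes_before_exhaustion(
    max_connections: int, pool_size_per_node: int, reserved: int = 0
) -> int:
    """How many app nodes fit before the database runs out of connections."""
    usable = max_connections - reserved
    return max(usable // pool_size_per_node, 0)


def db_connections_with_pooler(
    nodes: int, pool_size_per_node: int, pooler_server_conns: int
) -> int:
    """Simplified model: the pooler caps what actually reaches the database."""
    return min(total_db_connections(nodes, pool_size_per_node), pooler_server_conns)


# Hypothetical setup: database allows 100 connections, 3 held back for admins,
# and each app node keeps a pool of 10.
print(max_nodes_before_exhaustion(100, 10, reserved=3))   # 9 nodes fit
print(total_db_connections(20, 10))                        # 200 wanted by 20 nodes
print(db_connections_with_pooler(20, 10, 50))              # 50 reach the database
```

Under these assumed numbers the tier hits the connection limit at 9 nodes, long before CPU is likely to be the constraint. With a pooler in between, the database sees a bounded number of connections no matter how far the app tier scales out.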

The tradeoff

Scaling the app tier is the easy 10% — it just pushes the bottleneck onto whatever it depends on: first the database (reads → replicas/cache, writes → sharding), then the connection pool, then downstream services. Each layer you scale reveals the next; the app tier is rarely the real ceiling.
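One way to picture how the bottleneck moves is to treat each layer as having a capacity in requests per second and to take the minimum. This is a rough sketch with made-up layer names and capacities, not measurements from any real system.

```python
def bottleneck(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the layer with the lowest capacity and that capacity (req/s)."""
    name = min(capacities, key=lambda layer: capacities[layer])
    return name, capacities[name]


# Hypothetical capacities in requests per second.
layers = {"app_tier": 2_000, "database": 5_000, "downstream_service": 8_000}
print(bottleneck(layers))   # ('app_tier', 2000)

# Scale the app tier out 5x: the ceiling moves to the database.
layers["app_tier"] = 10_000
print(bottleneck(layers))   # ('database', 5000)

# Add replicas and a cache in front of the database: the ceiling moves again.
layers["database"] = 20_000
print(bottleneck(layers))   # ('downstream_service', 8000)
```

Each fix raises overall throughput only up to the next-lowest layer, which is the pattern the tradeoff describes.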

In the wild

A typical web backend: ALB → fleet of containers → Redis + Postgres.
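The request path above can be sketched with in-memory stand-ins. `SharedStore` plays the role of a shared store such as Redis, `RoundRobinBalancer` plays the role of the load balancer, and all class names are hypothetical. The point is only that no node keeps per-user state, so any node can serve any request.

```python
import itertools


class SharedStore:
    """In-memory stand-in for a shared store such as Redis (illustrative only)."""

    def __init__(self) -> None:
        self._data: dict[str, int] = {}

    def incr(self, key: str) -> int:
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


class AppNode:
    """A stateless node: it holds a reference to the store, not user data."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store

    def handle(self, user_id: str) -> dict:
        visits = self.store.incr(f"visits:{user_id}")
        return {"served_by": self.name, "visits": visits}


class RoundRobinBalancer:
    """Hands each request to the next node in turn."""

    def __init__(self, nodes: list[AppNode]) -> None:
        self._cycle = itertools.cycle(nodes)

    def route(self, user_id: str) -> dict:
        return next(self._cycle).handle(user_id)


store = SharedStore()
lb = RoundRobinBalancer([AppNode(f"node-{i}", store) for i in range(3)])
responses = [lb.route("alice") for _ in range(4)]
for response in responses:
    print(response)
```

The visit count keeps increasing even though each request lands on a different node, because the counter lives in the shared store rather than on the node.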

Interview deep dive

Flow

  1. Run identical stateless nodes behind the load balancer.
  2. Autoscale the count on CPU / request-rate / queue depth.
  3. Deploy by rolling or blue-green; roll back by switching traffic back to the previous version.
  4. When the DB strains, add replicas, cache, or a pooler.
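Steps 2 and 3 above can be sketched in a few lines. The scaling rule here is a simple proportional one (scale the node count by the ratio of the observed metric to its target), which is an assumption: the source does not name a specific autoscaling policy. `BlueGreenRouter` is likewise a hypothetical stand-in for whatever switches traffic between two environments.

```python
import math


def desired_nodes(
    current_nodes: int,
    current_metric: float,
    target_metric: float,
    min_nodes: int = 1,
    max_nodes: int = 100,
) -> int:
    """Proportional rule: ceil(current_nodes * current / target), clamped."""
    raw = math.ceil(current_nodes * current_metric / target_metric)
    return max(min_nodes, min(max_nodes, raw))


class BlueGreenRouter:
    """Tracks which of two environments receives live traffic."""

    def __init__(self, live: str, idle: str) -> None:
        self.live = live
        self.idle = idle

    def swap(self) -> None:
        # Deploying and rolling back are the same operation: flip the pointer.
        self.live, self.idle = self.idle, self.live


# 4 nodes at 90% CPU with a 60% target: scale out.
print(desired_nodes(4, current_metric=90, target_metric=60))   # 6
# 6 nodes at 30% CPU with a 60% target: scale in.
print(desired_nodes(6, current_metric=30, target_metric=60))   # 3

router = BlueGreenRouter(live="v1", idle="v2")
router.swap()          # deploy: v2 goes live
print(router.live)     # v2
router.swap()          # roll back: v1 goes live again
print(router.live)     # v1
```

The same function works for request rate or queue depth as the metric, as long as the current value and the target use the same unit.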

Watch for

Interviewer trap

After scaling the app tier, immediately name the next bottleneck — usually the database.
