System Design Library

API Gateway

A single entry point that authenticates, routes, rate-limits and observes all API traffic.


Requirements

Functional

Auth/authz
Routing to services
Rate limiting
Request/response transform
Logging/metrics

Non-functional

Low added latency
Highly available
Horizontally scalable

Scale

All ingress traffic

The approach

Stateless gateway fleet behind an LB; terminates TLS, validates tokens (JWT/opaque + cache), applies rate limits, routes to backends by path/host, emits telemetry.
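The approach says the gateway "applies limits" but does not name an algorithm. A token bucket is one common choice. Below is a minimal single-node sketch, assuming limits are keyed by client ID and the clock is injectable for testing; `TokenBucket` and `RateLimiter` are hypothetical names. Because the counters live in one process, a fleet-wide limit would need a shared store or per-node quotas.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Refills `rate` tokens per second and holds at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so a short burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """Hypothetical per-client limiter: one bucket per client ID, local to this node."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        # Unbounded for brevity; a real gateway would evict idle buckets.
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

A `False` from `allow` would typically become an HTTP 429 response before the request reaches any backend.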

Key components

LB → gateway fleet → microservices · auth cache · limiter

Numbers that matter

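A JWT RS256 (RSA-2048) signature verification costs on the order of tens of microseconds of CPU on a modern core (RSA signing is far more expensive), which still adds up to a few cores at 100K req/s; with a local LRU cache of verified tokens (hit rate >95%), most requests skip the signature check and the effective cost falls roughly in proportion to the miss rate
A single Nginx/Envoy gateway node can handle on the order of 50K-100K req/s at typical API payload sizes, depending on core count and enabled plugins; because gateways are stateless, horizontal scaling is straightforward
TLS handshake overhead: a full TLS 1.3 handshake costs 1 RTT (~10-50ms depending on client proximity) on top of the TCP handshake; session resumption via session tickets (PSK) skips certificate transfer and verification, and only 0-RTT early data removes the handshake round trip for returning clients, at the cost of replay risk (use it only for idempotent requests)
Route configuration propagation via push-based xDS (Envoy's control plane API) can reach all nodes in under 1 second in a 1000-node fleet; polling-based approaches are bounded by their poll interval and are typically 10-30× slower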

Senior deep-dive

The gateway is a trust boundary for all services, not just a router. Verify every request's authenticity here and forward the verified identity, so backends can rely on it instead of each re-doing the check; push auth, rate limiting, and TLS termination into the gateway, or you'll re-implement them in every service.

JWT verification is CPU-bound: cache the parsed and verified result (keyed by a hash of the token) with a TTL that never exceeds the token's remaining lifetime. A local in-process LRU cache cuts JWT verification CPU by 90%+ at high RPS.
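A minimal sketch of such a cache, assuming a caller-supplied `verify` function that performs the full signature check and returns a claims dict with a numeric `exp` (both the class and the callback are hypothetical). Entries expire at the token's own `exp`, so a cached result never outlives the token, and keys are SHA-256 digests so raw tokens are not kept as dictionary keys.

```python
import hashlib
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Tuple

Claims = Dict[str, Any]


class VerifiedTokenCache:
    """LRU cache of verified JWT claims, keyed by SHA-256 of the raw token."""

    def __init__(self, verify: Callable[[str], Claims], max_entries: int = 100_000,
                 clock: Callable[[], float] = time.time) -> None:
        self.verify = verify  # expensive path: full signature check (assumed supplied)
        self.max_entries = max_entries
        self.clock = clock
        self.entries: "OrderedDict[str, Tuple[float, Claims]]" = OrderedDict()

    def get_claims(self, token: str) -> Claims:
        key = hashlib.sha256(token.encode()).hexdigest()
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None:
            expires_at, claims = hit
            if now < expires_at:
                self.entries.move_to_end(key)  # mark as most recently used
                return claims
            del self.entries[key]  # token expired: drop and fall through
        claims = self.verify(token)  # slow path on a miss
        expires_at = float(claims["exp"])
        if expires_at <= now:
            raise ValueError("token expired")
        self.entries[key] = (expires_at, claims)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return claims
```

One consequence of this design: a cache hit skips signature verification but should not skip revocation, so a blocklist check (if you keep one) still has to run on every request.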

Config propagation lag is the hidden availability risk: when you push a new route or revoke a signing key, the change must reach every gateway node within seconds. Use a push-based control plane (not polling), and keep an in-process last-known-good snapshot to fall back on when the control plane is unreachable.
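The fallback-snapshot idea can be sketched as a small holder that validates each pushed route table before swapping it in, and optionally persists the last-known-good copy for cold starts. The JSON schema (path prefix to upstream name), the class, and the `on_push` hook are assumptions for illustration; a real xDS client receives typed resources over gRPC instead.

```python
import json
from pathlib import Path
from typing import Dict, Optional


class RouteConfigStore:
    """Active route table plus an optional last-known-good snapshot on disk (hypothetical)."""

    def __init__(self, snapshot_path: Optional[Path] = None) -> None:
        self.snapshot_path = snapshot_path
        self.routes: Dict[str, str] = {}
        if snapshot_path is not None and snapshot_path.exists():
            # Cold start: serve the last-known-good table until the control plane pushes.
            self.routes = self._validate(json.loads(snapshot_path.read_text()))

    @staticmethod
    def _validate(cfg: object) -> Dict[str, str]:
        if not isinstance(cfg, dict) or not cfg:
            raise ValueError("route table must be a non-empty JSON object")
        for prefix, upstream in cfg.items():
            if not prefix.startswith("/") or not isinstance(upstream, str) or not upstream:
                raise ValueError(f"invalid route {prefix!r}")
        return dict(cfg)

    def on_push(self, raw: str) -> bool:
        """Apply a pushed table; on any validation error keep serving the current one."""
        try:
            new_routes = self._validate(json.loads(raw))  # JSONDecodeError is a ValueError
        except ValueError:
            return False
        # Single reference swap: in CPython, readers see the old or the new table, never a mix.
        self.routes = new_routes
        if self.snapshot_path is not None:
            tmp = self.snapshot_path.with_suffix(".tmp")
            tmp.write_text(json.dumps(new_routes))
            tmp.replace(self.snapshot_path)  # atomic rename on POSIX filesystems
        return True
```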

Auth: the two models and when to use each

Opaque tokens (random strings verified against a token store) require a network call on every request, which is manageable if the token store is Redis with sub-millisecond latency. JWTs (self-contained, cryptographically signed) allow local verification with no network call, but revocation is hard: you can't invalidate a JWT before its expiry without a blocklist, and consulting a blocklist on every request is effectively the opaque-token model again. Use JWTs for short-lived access tokens (15-60 min expiry) and add blocklist entries only for logout and compromised-account events.
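Local verification with a small revocation blocklist might look like the sketch below. RS256 needs an RSA implementation that Python's standard library does not provide (libraries such as PyJWT with `cryptography` supply one), so to stay self-contained this sketch swaps in HS256 (HMAC-SHA256 with a shared secret). The function names and the `jti`-based blocklist are assumptions, not a specific library's API; `sign_hs256` exists only to make the example runnable.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional, Set


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))  # restore padding


def sign_hs256(claims: Dict[str, Any], secret: bytes) -> str:
    """Issuer side, included only so the example is self-contained."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, revoked_jtis: Set[str],
                 now: Optional[float] = None) -> Dict[str, Any]:
    """Verify with no network call; the only shared data is the small blocklist."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # never let the token choose its own algorithm
        raise ValueError("unexpected alg")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):  # constant-time compare
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if (time.time() if now is None else now) >= claims["exp"]:
        raise ValueError("expired")
    if claims.get("jti") in revoked_jtis:  # populated only on logout / account compromise
        raise ValueError("revoked")
    return claims
```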

Stateless gateway: what that actually means

A stateless gateway holds no per-client session state between requests: each request is fully self-describing (token, headers, body). This enables horizontal scaling without sticky sessions. The trap: caches are not session state, but they still need invalidation. Your JWT verification cache, route table snapshot, and rate limit counters (if local) are all state that must be synchronized or rebuilt. Design the gateway so that losing any one node loses no user-visible state, only cached data that can be re-derived.

Routing: path-based, host-based, and the versioning trap

Path-prefix routing (`/v1/users → user-service`) is simple but bakes API versioning into URL paths. Header-based routing (an `API-Version: 2` header) is cleaner but requires clients to set the header. Host-based routing (`v2.api.example.com`) lets you version at the DNS level. The trap: mature APIs end up with several versioning schemes at once, and the gateway must handle all of them. Treat routing config as code (stored in git, deployed via CI), not as admin-UI config; UI-configured routing is a change-management nightmare.
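A sketch of a matcher that supports all three schemes at once, assuming header names are lower-cased during parsing and routes are held in memory (the `Route` type and `match` function are hypothetical). Prefix matching is segment-aware so `/v1/users` does not capture `/v1/usersettings`, and routes with host or header constraints win over bare path prefixes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Route:
    upstream: str
    path_prefix: str = "/"
    host: Optional[str] = None         # host-based: exact Host match
    api_version: Optional[str] = None  # header-based: API-Version match


def _prefix_matches(prefix: str, path: str) -> bool:
    # Segment-aware: "/v1/users" matches "/v1/users/42" but not "/v1/usersettings".
    prefix = prefix.rstrip("/")
    return prefix == "" or path == prefix or path.startswith(prefix + "/")


def match(routes: List[Route], host: str, path: str,
          headers: Dict[str, str]) -> Optional[str]:
    """Return the upstream of the most specific matching route, or None (a 404)."""
    best: Optional[Route] = None
    best_score: Tuple[int, int] = (-1, -1)
    for route in routes:
        if route.host is not None and route.host != host.lower():
            continue
        # Assumes header names were lower-cased when the request was parsed.
        if route.api_version is not None and headers.get("api-version") != route.api_version:
            continue
        if not _prefix_matches(route.path_prefix, path):
            continue
        # Host/header constraints outrank prefix length; ties go to the earlier route.
        score = (int(route.host is not None) + int(route.api_version is not None),
                 len(route.path_prefix))
        if score > best_score:
            best, best_score = route, score
    return best.upstream if best else None
```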

Observability: the gateway is your best telemetry point

Every request passes through the gateway, making it the ideal place to emit structured access logs, metrics, and trace spans. Attach a trace ID to every request (propagate it if present, generate it if not). Log latency, status code, upstream service, route, and client ID. Never log request bodies at the gateway, since they contain PII and secrets; log only metadata and headers, with credentials such as `Authorization` and `Cookie` redacted. The gateway's latency histogram (P50/P95/P99) is the first place ops looks during an incident.

Circuit breaking and graceful degradation

When a backend service starts returning 5xx or timing out, the gateway should open a circuit breaker (stop sending it requests for N seconds, then let a probe through to test recovery) rather than piling up connections to a failing service. This is fail-fast behavior at the infrastructure level, complementary to client-side exponential backoff: without it, a slow downstream service exhausts the gateway's connection pool, and the failure cascades to requests bound for healthy backends. Configure per-upstream timeout budgets: not just connection timeouts, but also request timeouts, with a deadline header propagated to the backend.
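A minimal per-upstream breaker with closed, open, and half-open states, plus a helper that turns a propagated deadline into a per-call timeout. The failure threshold, the single-probe half-open policy, and all names are assumptions; the helper also assumes the deadline arrives as an absolute timestamp (a hypothetical header format; relative timeouts sidestep clock skew). Envoy exposes related behavior through its own outlier detection and circuit breaker settings.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Closed -> open after `threshold` consecutive failures; after `cooldown_s`
    one probe is allowed (half-open); a success closes, a failure reopens."""

    def __init__(self, threshold: int = 5, cooldown_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None
        self.probing = False

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows
        if self.clock() - self.opened_at < self.cooldown_s or self.probing:
            return False  # open (or probe already in flight): fail fast, e.g. 503
        self.probing = True  # half-open: exactly one probe goes through
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None
        self.probing = False

    def record_failure(self) -> None:
        self.failures += 1
        self.probing = False
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # (re)open and restart the cooldown


def upstream_timeout(deadline_s: float, now_s: float, per_upstream_cap_s: float) -> float:
    """Remaining request budget, capped by the per-upstream timeout."""
    remaining = deadline_s - now_s
    if remaining <= 0:
        raise TimeoutError("deadline already passed; don't call the upstream")
    return min(remaining, per_upstream_cap_s)
```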

What breaks at scale

Config propagation becomes your deployment bottleneck: with 1000 gateway nodes, a new route table, cert rotation, or rate limit change should look atomic to users, yet it lands on nodes in an eventually consistent way. A 30-second propagation window means some nodes serve the old config while others serve the new one, so design around it with backward-compatible config changes (add before remove, never delete without a deprecation period). The second failure mode: the gateway becomes a latency amplifier. Every added plugin (auth + rate limit + transform + log + trace) adds overhead, so profile your middleware chain and set a hard budget (e.g. < 5ms gateway overhead at P99).
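The "never delete without a deprecation period" rule is easy to enforce mechanically as a CI check on the route table before it is pushed. A sketch, assuming a hypothetical schema in which each named route records when it was deprecated (or `None` if it is still live); `breaking_changes` is an illustrative name.

```python
from typing import Any, Dict, List


def breaking_changes(old: Dict[str, Dict[str, Any]], new: Dict[str, Dict[str, Any]],
                     now: float, min_deprecation_s: float) -> List[str]:
    """Reasons the new table is unsafe while old and new configs coexist on the fleet.

    Hypothetical schema: route name -> {"upstream": str, "deprecated_at": float | None}.
    """
    problems: List[str] = []
    for name, route in old.items():
        if name in new:
            continue  # kept (or modified in place): not a removal
        deprecated_at = route.get("deprecated_at")
        if deprecated_at is None:
            problems.append(f"{name}: removed without a deprecation period")
        elif now - deprecated_at < min_deprecation_s:
            problems.append(f"{name}: deprecation period not yet elapsed")
    return problems
```

A CI job would fail the deploy whenever the returned list is non-empty; additions never trigger it, which is what makes "add before remove" the cheap path.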

In production

Kong, AWS API Gateway, and Apigee are the dominant commercial gateways; internally, companies like Uber (Heimdall), Netflix (Zuul → Envoy), and Lyft (where Envoy originated) built their own. For service-mesh-based approaches, Istio with Envoy as its sidecar proxy is a common default. The real engineering challenge is the plugin/middleware ordering problem: auth → rate limiting → routing sounds obvious, but once you add logging, request transformation, circuit breaking, and A/B routing, the order of middleware execution becomes a source of subtle security bugs. Auth must always run before any business logic or caching middleware.
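A sketch of an ordering guard, assuming middleware are composed outermost-first and identified by hypothetical names. The toy `auth` and `cache` middleware show why the rule matters: auth runs first, so an unauthenticated request is rejected before it can reach a cached response, and the cache key includes the authenticated user so one user's response is never served to another.

```python
from typing import Callable, Dict, List, Optional, Tuple

Request = Dict[str, Optional[str]]
Handler = Callable[[Request], str]
Middleware = Callable[[Handler], Handler]

# Middleware that can serve or alter a response must never see unauthenticated traffic.
MUST_RUN_AFTER_AUTH = {"cache", "transform", "ab_routing"}


def build_chain(chain: List[Tuple[str, Middleware]], terminal: Handler) -> Handler:
    """Compose outermost-first; refuse orderings where auth runs too late."""
    names = [name for name, _ in chain]
    if "auth" not in names:
        raise ValueError("auth middleware missing")
    for i, name in enumerate(names):
        if name in MUST_RUN_AFTER_AUTH and i < names.index("auth"):
            raise ValueError(f"{name} runs before auth")
    handler = terminal
    for _, mw in reversed(chain):  # wrap innermost first so chain[0] runs first
        handler = mw(handler)
    return handler


def make_auth(valid_tokens: Dict[str, str]) -> Middleware:
    """Stand-in for real token verification: maps token -> user ID."""
    def auth(next_h: Handler) -> Handler:
        def handle(req: Request) -> str:
            user = valid_tokens.get(req.get("token") or "")
            if user is None:
                return "401"
            return next_h({**req, "user": user})
        return handle
    return auth


def make_cache(store: Dict[Tuple[Optional[str], Optional[str]], str]) -> Middleware:
    def cache(next_h: Handler) -> Handler:
        def handle(req: Request) -> str:
            key = (req.get("user"), req.get("path"))  # identity is part of the key
            if key not in store:
                store[key] = next_h(req)
            return store[key]
        return handle
    return cache
```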

Common mistakes

Stuffing business logic into the gateway
Per-request auth DB lookups
No circuit breaking to failing backends


Part of System Design Library on SystemLore: system design interview prep with 148 deep topics, interactive diagrams, and a practice game.