Rate Limiting

Capping how many requests a client can make in a window to protect a service from overload or abuse.

Definition

Rate limiting caps how many requests a client can make within a given time window, protecting a service from overload or abuse.

How it works

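Rate limiting is typically implemented with a token bucket (which allows short bursts) or with sliding-window counters. The counters usually live in a shared store such as Redis so the limit holds across many servers. When a client exceeds its limit, the service returns HTTP 429 (Too Many Requests) with a Retry-After header telling the client how long to wait. Rate limiting protects against scrapers, runaway clients, and thundering herds, and is usually enforced at the gateway or load balancer.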
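
To make the token-bucket idea concrete, here is a minimal single-process sketch in Python. It is an illustration under stated assumptions, not a production design: the names TokenBucket, try_acquire, and handle_request are hypothetical, the state lives in local memory rather than in a shared store such as Redis, and the sketch is not thread-safe. The bucket refills at a steady rate up to a fixed capacity, so a client can burst up to the capacity and is then held to the refill rate.

```python
import math
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Hypothetical in-memory token bucket (one instance per client)."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity        # maximum burst size, in tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # injectable so the bucket can be tested
        self.tokens = capacity          # start full: an initial burst is allowed
        self.last_refill = clock()

    def _refill(self) -> None:
        # Add the tokens earned since the last call, capped at capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def try_acquire(self, cost: float = 1.0) -> Tuple[bool, float]:
        """Return (allowed, seconds until enough tokens are available)."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0.0
        return False, (cost - self.tokens) / self.refill_rate


def handle_request(bucket: TokenBucket) -> Tuple[int, Dict[str, str]]:
    """Return an (HTTP status, headers) pair for one incoming request."""
    allowed, retry_after = bucket.try_acquire()
    if allowed:
        return 200, {}
    # Retry-After is given here in whole seconds, rounded up.
    return 429, {"Retry-After": str(math.ceil(retry_after))}
```

With TokenBucket(capacity=2, refill_rate=1.0), two back-to-back requests pass, a third is rejected with status 429 and a Retry-After header, and requests succeed again once about a second has elapsed. Holding the limit across many servers would mean moving the token count and timestamp into the shared store and updating them atomically, which this sketch does not attempt.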

Part of the Glossary on SystemLore: system design explained with 148 deep topics, interactive diagrams, and a build-it-yourself game.