Definition
Capping how many requests a client can make within a given time window, to protect a service from overload or abuse.
How it works
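Typically implemented with a token bucket (which allows short bursts up to the bucket's capacity) or with sliding-window counters, usually kept in a shared store such as Redis so the limit holds across many servers. When a client exceeds the limit, the service returns HTTP 429 (Too Many Requests) with a Retry-After header telling the client how long to wait. Rate limiting protects against scrapers, runaway clients and thundering herds; the limit is commonly enforced at the gateway or load balancer.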
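The token-bucket approach can be sketched in a few lines of Python. This is a minimal, single-process illustration, not a production limiter: the names `TokenBucket`, `try_acquire` and `handle_request`, the default capacity and refill rate, and the in-memory `buckets` dictionary are all assumptions made for the example. A real deployment would typically keep the counters in a shared store and update them atomically.

```python
import math
import time


class TokenBucket:
    """Hypothetical in-memory token bucket for one client."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = float(capacity)        # maximum burst size
        self.refill_rate = float(refill_rate)  # tokens added per second
        self.clock = clock                     # injectable so tests can control time
        self.tokens = self.capacity            # start full, so an initial burst is allowed
        self.last_refill = clock()

    def try_acquire(self):
        """Return (allowed, seconds_until_a_token_is_available)."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0.0
        return False, (1.0 - self.tokens) / self.refill_rate


def handle_request(buckets, client_id, capacity=5, refill_rate=1.0,
                   clock=time.monotonic):
    """Return (status_code, headers) for one request from client_id.

    `buckets` is a plain dict standing in for the shared store (assumption).
    """
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(capacity, refill_rate, clock)
    allowed, wait = buckets[client_id].try_acquire()
    if allowed:
        return 200, {}
    # Retry-After carries whole seconds, so round the wait up (at least 1).
    return 429, {"Retry-After": str(max(1, math.ceil(wait)))}
```

Each client gets its own bucket, so one noisy client cannot use up another's allowance. The capacity sets how large a burst is tolerated and the refill rate sets the sustained request rate; both values here are placeholders to be tuned per service.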