Definition
The time a single operation takes end to end, that is, how long one request waits for its response.
How it works
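Measure it as a distribution (for example p50 and p99), never just an average, because a mean hides the slow tail. It is usually dominated by network round-trips, queueing, and the slowest dependency in a fan-out. Reduce it with caching, moving data and compute closer to users (CDN, edge), making fewer round-trips, and indexing. Latency and throughput are different axes: improving one can hurt the other. Batching, for instance, raises throughput but makes individual requests wait longer.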
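A minimal Python sketch of two of the points above: why percentiles say more than the mean, and why a parallel fan-out is only as fast as its slowest dependency. The sample latencies, the nearest-rank percentile method, and the helper names `percentile` and `fan_out_latency` are assumptions chosen for illustration; they do not come from any particular monitoring tool.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Multiply before dividing so integer percentiles stay exact.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def fan_out_latency(dependency_latencies_ms):
    """A request that calls its dependencies in parallel completes only
    when the slowest one has answered."""
    return max(dependency_latencies_ms)


# Hypothetical workload: 98 fast requests (10 ms) and 2 slow ones (1000 ms).
latencies_ms = [10.0] * 98 + [1000.0] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f} ms  p50={p50_ms:.1f} ms  p99={p99_ms:.1f} ms")

# Hypothetical fan-out to three backends; the 240 ms one sets the total.
print(f"fan-out={fan_out_latency([12.0, 15.0, 240.0]):.1f} ms")
```

In this made-up data set the mean (29.8 ms) describes no real request: the typical request takes 10 ms and the tail takes 1000 ms, which only the p50 and p99 pair reveals.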