What is Latency?

Definition

The time a single operation takes end to end — how long one request waits for its response.

How it works

Measure it as a distribution (p50/p99), never just an average. Dominated by network round-trips, queueing and the slowest dependency in a fan-out. Cut it with caching, putting data/compute closer to users (CDN, edge), fewer round-trips, and indexing. Latency and throughput are different axes — improving one can hurt the other.

Common questions

The time a single operation takes end to end — how long one request waits for its response.

How does Latency work?

Measure it as a distribution (p50/p99), never just an average. Dominated by network round-trips, queueing and the slowest dependency in a fan-out. Cut it with caching, putting data/compute closer to users (CDN, edge), fewer round-trips, and indexing. Latency and throughput are…

What is Latency used for in system design?