Latency

The time a single operation takes end to end — how long one request waits for its response.

Definition

Latency is the time a single operation takes end to end, measured from the moment a request is issued until its response arrives.
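As a rough illustration of the definition, the sketch below times one operation from start to finish. The names `measure_latency` and `fake_request` are hypothetical, and the 20 ms sleep is an assumed stand-in for a real request, not a measurement of any actual system.

```python
import time


def measure_latency(operation):
    """Run `operation` once and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = operation()
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_request():
    # Hypothetical stand-in for a real request: sleeps for about 20 ms.
    time.sleep(0.02)
    return "ok"


result, seconds = measure_latency(fake_request)
print(f"{result}: {seconds * 1000:.1f} ms")
```

A single timing like this is only one sample; in practice many samples would be collected and summarized as a distribution.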

How it works

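Measure latency as a distribution (for example p50 and p99), never as an average alone, because a mean hides the slow tail. It is usually dominated by network round-trips, queueing, and the slowest dependency in a fan-out. Reduce it with caching, by moving data and compute closer to users (CDN, edge), by making fewer round-trips, and by indexing. Latency and throughput are different axes, and improving one can hurt the other: batching, for instance, raises throughput but makes each item wait longer.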
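Two of these points can be checked with a little arithmetic. The sketch below uses made-up numbers to show why an average misleads and why a fan-out is only as fast as its slowest dependency. The `percentile` helper uses the nearest-rank method, which is one of several common definitions, and the dependency names and timings are assumptions chosen for illustration.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in milliseconds: 98 fast requests and 2 slow ones.
samples = [10] * 98 + [1000] * 2

mean = statistics.mean(samples)  # pulled upward by the two slow requests
p50 = percentile(samples, 50)    # what a typical request sees
p99 = percentile(samples, 99)    # what the slow tail sees

# Hypothetical dependencies of one request, with latencies in milliseconds.
dependencies = {"auth": 5, "profile": 12, "recommendations": 180}

# Called in parallel (a fan-out), the request waits for the slowest dependency.
parallel_ms = max(dependencies.values())
# Called one after another, every round-trip adds to the total.
sequential_ms = sum(dependencies.values())

print(f"mean={mean} ms, p50={p50} ms, p99={p99} ms")
print(f"parallel={parallel_ms} ms, sequential={sequential_ms} ms")
```

With these numbers the mean is 29.8 ms, yet no request actually took about 30 ms: half finish in 10 ms or less, while the p99 is 1000 ms. The fan-out completes in 180 ms rather than 197 ms, but speeding up `auth` or `profile` would not improve it at all.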

Part of the Glossary on SystemLore.