
Latency vs Throughput

Latency is how long one request takes; throughput is how many requests you handle per unit time. Optimizing one can hurt the other.



Overview

Latency is a duration (this request took 80 ms); throughput is a rate (we serve 50k requests/second). They are related but distinct. Batching raises throughput, because fixed per-call costs are amortized across many requests, but it adds latency, because each request waits for its batch to fill and then for the whole batch to finish. A system can also have great average latency and poor tail latency (p99) that ruins user experience. You design for both, and you measure latency at percentiles, not averages.
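The batching trade-off can be sketched with a toy cost model. Everything here is an assumption for illustration: each batch pays a fixed overhead (`OVERHEAD_MS`, standing in for something like a network round trip or a disk flush) plus a per-request cost (`PER_ITEM_MS`). The hypothetical `batch_stats` helper also ignores the time a request spends waiting for its batch to fill, so real latency would be higher still.

```python
# Toy cost model; all numbers are made up for illustration.
OVERHEAD_MS = 10.0   # assumed fixed cost paid once per batch
PER_ITEM_MS = 1.0    # assumed marginal cost per request in the batch


def batch_stats(batch_size: int) -> tuple[float, float]:
    """Return (latency_ms, throughput_rps) for one batch size.

    Latency is the time to process a full batch, since every request in the
    batch completes only when the batch does. Time spent waiting for the batch
    to fill is ignored, so this underestimates real latency.
    """
    batch_time_ms = OVERHEAD_MS + batch_size * PER_ITEM_MS
    throughput_rps = batch_size / (batch_time_ms / 1000.0)
    return batch_time_ms, throughput_rps


for b in (1, 10, 100):
    latency, throughput = batch_stats(b)
    print(f"batch={b:>3}  latency={latency:6.1f} ms  throughput={throughput:7.1f} req/s")
```

With these made-up numbers, a batch of 1 gives 11 ms and about 91 req/s, while a batch of 100 gives 110 ms and about 909 req/s: roughly ten times the throughput for ten times the latency.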

Latency vs Throughput: key differences

	Latency	Throughput
Measures	Time per request	Volume per unit time
Unit	ms, seconds	req/s, MB/s
Improve by	Caching, fewer hops, faster code	Parallelism, batching, more nodes
Tension	Batching can hurt it	Batching helps it
Watch	Tail (p95/p99), not average	Saturation/backpressure

When to prioritize latency

You care about responsiveness (interactive apps, APIs), where each request must feel fast, especially at the tail.

When to prioritize throughput

You care about total capacity (pipelines, batch jobs, ingestion), where finishing the most work per unit time matters most.

Verdict

They are not either/or — define targets for both (e.g. p99 latency < 200 ms at 30k req/s). Beware optimizations like batching that buy throughput by sacrificing latency.
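One way to hold a system to both targets at once is to compute them from the same measurement window, since a latency figure measured at a lighter load says little about whether the p99 target holds at 30k req/s. A minimal sketch, assuming raw per-request latencies are available; `percentile` (nearest-rank definition) and `meets_target` are hypothetical helpers, and the default thresholds simply mirror the example target above.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def meets_target(
    latencies_ms: list[float],
    window_s: float,
    p99_limit_ms: float = 200.0,   # assumed target: p99 < 200 ms
    min_rps: float = 30_000.0,     # assumed target: at least 30k req/s
) -> tuple[bool, float, float]:
    """Check latency and throughput targets against one measurement window."""
    p99 = percentile(latencies_ms, 99)
    rps = len(latencies_ms) / window_s  # completed requests per second
    return (p99 < p99_limit_ms and rps >= min_rps), p99, rps


# Hypothetical window: 60,000 requests in 2 s, 1,000 of them slow (250 ms).
ok, p99, rps = meets_target([40.0] * 59_000 + [250.0] * 1_000, window_s=2.0)
print(ok, p99, rps)
```

In this made-up window, throughput meets 30k req/s but p99 is 250 ms, so the check fails: hitting one target does not imply hitting the other.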

Common questions

What is the difference between latency and throughput?

Latency is how long a single operation takes; throughput is how many operations complete per second. A system can have high throughput but poor (tail) latency, and vice versa.
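The "high throughput but poor tail latency" case can arise purely from queueing. A toy model under hypothetical assumptions: one FIFO worker, a fixed service time, and a burst of requests that all arrive at t = 0. `simulate_burst` and `nearest_rank` are illustrative names, not any library's API.

```python
import math


def nearest_rank(sorted_vals: list[float], p: float) -> float:
    """Nearest-rank percentile of an already-sorted list."""
    return sorted_vals[max(1, math.ceil(p * len(sorted_vals) / 100)) - 1]


def simulate_burst(n_requests: int, service_ms: float) -> tuple[float, float, float]:
    """One FIFO worker; all requests arrive at t = 0 with a fixed service time.

    Returns (throughput_rps, p50_ms, p99_ms).
    """
    # Request i (0-based) waits for the i requests ahead of it, then is served,
    # so it completes at (i + 1) * service_ms. Arrival is 0, so that is its latency.
    latencies = [(i + 1) * service_ms for i in range(n_requests)]  # already sorted
    makespan_s = latencies[-1] / 1000.0
    throughput = n_requests / makespan_s
    return throughput, nearest_rank(latencies, 50), nearest_rank(latencies, 99)


thr, p50, p99 = simulate_burst(100, 1.0)
print(f"throughput={thr:.0f} req/s  p50={p50:.0f} ms  p99={p99:.0f} ms")
```

With 100 requests at 1 ms each, the worker sustains 1,000 req/s, yet p50 is 50 ms and p99 is 99 ms: most of each request's latency is time spent waiting behind the rest of the burst, not time being served.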

Why measure latency at p99 instead of average?

Averages hide the slow tail. If 1% of requests take 2s, the average can still look fine while many users have a bad experience — p95/p99 capture that.
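A small numeric example of the tail hiding behind the average. The sample is hypothetical: 1,000 requests, 985 at 50 ms and 15 (1.5%) at 2 s. `nearest_rank` is an illustrative helper using the nearest-rank percentile definition; `statistics.mean` is from Python's standard library.

```python
import math
from statistics import mean


def nearest_rank(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of values <= it."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(p * len(ordered) / 100)) - 1]


# Hypothetical sample: 985 fast requests (50 ms) and 15 slow ones (2 s).
latencies_ms = [50.0] * 985 + [2000.0] * 15

print(f"mean={mean(latencies_ms):.2f} ms")
print(f"p50={nearest_rank(latencies_ms, 50):.0f} ms")
print(f"p99={nearest_rank(latencies_ms, 99):.0f} ms")
```

This prints a mean of 79.25 ms, a p50 of 50 ms and a p99 of 2000 ms. The sample uses 1.5% slow requests rather than exactly 1% because, under the nearest-rank definition, exactly 10 slow requests out of 1,000 would put p99 on the boundary at the last fast value (50 ms); p99.9 would still catch them.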

Part of Comparisons on SystemLore: system design explained with 148 deep topics, interactive diagrams, and a build-it-yourself game.