Latency vs Throughput

Latency is how long one request takes; throughput is how many requests you handle per unit time. Optimizing one can hurt the other.

Overview

Latency is a duration (this request took 80 ms); throughput is a rate (we serve 50k requests/second). They are related but distinct: batching raises throughput but adds latency; a system can have great average latency and poor tail latency (p99) that ruins user experience. You design for both, and you measure latency at percentiles, not averages.
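The batching trade-off can be made concrete with a toy model. The sketch below is illustrative only: the function name `batch_stats`, the fixed per-batch overhead, the per-item cost, and the steady arrival rate are all assumptions chosen for the example, and the model ignores queueing.

```python
def batch_stats(batch_size, arrival_rate, overhead_s=0.010, per_item_s=0.001):
    """Return (max_throughput_per_s, worst_case_latency_s) for a toy batching model.

    Assumptions (illustrative, not measured): every batch pays a fixed
    overhead plus a per-item cost, and items arrive at a steady rate.
    """
    # Time to process one full batch.
    service_time = overhead_s + batch_size * per_item_s
    # Items completed per second when batches run back to back.
    throughput = batch_size / service_time
    # The first item in a batch waits for the remaining items to arrive.
    fill_wait = (batch_size - 1) / arrival_rate
    latency = fill_wait + service_time
    return throughput, latency


for size in (1, 10, 100):
    tput, lat = batch_stats(size, arrival_rate=50.0)
    print(f"batch={size:>3}  capacity={tput:7.1f} items/s  worst-case latency={lat * 1000:7.1f} ms")
```

Under these assumed numbers, larger batches amortize the fixed overhead, so capacity rises, while the first item in each batch waits longer for the batch to fill, so its latency rises too.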

Latency vs Throughput: key differences

           | Latency                          | Throughput
Measures   | Time per request                 | Volume per unit time
Unit       | ms, seconds                      | req/s, MB/s
Improve by | Caching, fewer hops, faster code | Parallelism, batching, more nodes
Tension    | Batching can hurt it             | Batching helps it
Watch      | Tail (p95/p99), not average      | Saturation/backpressure

When to prioritize Latency

You care about responsiveness (interactive apps, APIs), where each request must feel fast, especially at the tail.

When to prioritize Throughput

You care about total capacity (pipelines, batch jobs, ingestion), where finishing the most work matters most.

Verdict

They are not either/or — define targets for both (e.g. p99 latency < 200 ms at 30k req/s). Beware optimizations like batching that buy throughput by sacrificing latency.
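A combined target like the one above can be checked against a measurement window in a few lines. This is a minimal sketch, not a monitoring implementation: the `Window` structure, the `meets_targets` function, and the sample data are hypothetical, and p99 is computed with the standard library's `statistics.quantiles`.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Window:
    """One measurement window (hypothetical structure for this sketch)."""
    duration_s: float
    latencies_ms: list  # one entry per completed request


def meets_targets(window, p99_target_ms=200.0, throughput_target_rps=30_000.0):
    """Check the latency and throughput targets together, not separately."""
    throughput_rps = len(window.latencies_ms) / window.duration_s
    # quantiles(..., n=100) returns 99 cut points; the last one is p99.
    p99_ms = statistics.quantiles(window.latencies_ms, n=100, method="inclusive")[98]
    return p99_ms < p99_target_ms and throughput_rps >= throughput_target_rps


# Hypothetical one-second windows.
healthy = Window(1.0, [40] * 29_700 + [180] * 300)      # fast tail, enough volume
slow_tail = Window(1.0, [40] * 29_000 + [900] * 1_000)  # enough volume, slow tail
underloaded = Window(1.0, [40] * 10_000)                # fast, but too little volume

for name, window in [("healthy", healthy), ("slow_tail", slow_tail), ("underloaded", underloaded)]:
    print(name, meets_targets(window))
```

The point of the single boolean is that a window passes only when both conditions hold: a fast system measured at low load, or a high-volume system with a slow tail, both fail the check.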

Common questions

What is the difference between latency and throughput?

Latency is how long a single operation takes; throughput is how many operations complete per second. A system can have high throughput but poor (tail) latency, and vice versa.

Why measure latency at p99 instead of average?

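Averages hide the slow tail. If 2% of requests take 2 s, the average can still look fine while many users have a bad experience; p99 captures that. A percentile only exposes a slow group that is larger than the fraction above it, so p99 needs more than 1% of requests to be slow and p95 needs more than 5%.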
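A small worked example shows the effect. The sample below is hypothetical: 2% of requests are slow (slightly more than 1%, so that p99 lands clearly inside the slow tail), and `percentile` is a simple nearest-rank helper written for this sketch rather than a library function.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 980 fast requests and 20 slow ones (2%).
latencies_ms = [50] * 980 + [2000] * 20

print("mean:", statistics.mean(latencies_ms))
print("p50: ", percentile(latencies_ms, 50))
print("p95: ", percentile(latencies_ms, 95))
print("p99: ", percentile(latencies_ms, 99))
```

With this sample the mean is 89 ms and both p50 and p95 are 50 ms, while p99 is 2000 ms. Only the high percentile reveals that some users wait two seconds.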
