Overview
Latency is a duration (this request took 80 ms); throughput is a rate (we serve 50k requests/second). They are related but distinct: batching raises throughput but adds latency; a system can have great average latency and poor tail latency (p99) that ruins user experience. You design for both, and you measure latency at percentiles, not averages.
Latency vs Throughput: key differences
| | Latency | Throughput |
|---|---|---|
| Measures | Time per request | Volume per unit time |
| Unit | ms, seconds | req/s, MB/s |
| Improve by | Caching, fewer hops, faster code | Parallelism, batching, more nodes |
| Tension | Batching can hurt it | Batching helps it |
| Watch | Tail (p95/p99), not average | Saturation/backpressure |
When to prioritize Latency
You care about responsiveness (interactive apps, APIs), where each request must feel fast, especially at the tail.
When to prioritize Throughput
You care about total capacity (pipelines, batch jobs, ingestion), where finishing the most work matters most.
Verdict
They are not either/or — define targets for both (e.g. p99 latency < 200 ms at 30k req/s). Beware optimizations like batching that buy throughput by sacrificing latency.
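The batching trade-off can be sketched with a toy cost model. This is a minimal illustration, not a benchmark: the function name `batch_stats`, the 5 ms fixed overhead, and the 0.1 ms per-item cost are all assumed values chosen for the example, and the model ignores the extra time items spend waiting for a batch to fill.

```python
def batch_stats(batch_size, fixed_overhead_ms=5.0, per_item_ms=0.1):
    """Toy model: each batch pays a fixed overhead plus a per-item cost.

    All parameters are illustrative assumptions, not measurements.
    """
    # Every item in a batch completes only when the whole batch completes.
    batch_latency_ms = fixed_overhead_ms + batch_size * per_item_ms
    # Items completed per second by one worker running batches back to back.
    throughput_per_s = batch_size / (batch_latency_ms / 1000.0)
    return batch_latency_ms, throughput_per_s


for size in (1, 10, 100):
    latency_ms, throughput = batch_stats(size)
    print(f"batch={size:>3}  latency={latency_ms:6.1f} ms  "
          f"throughput={throughput:8.0f} items/s")
```

Under these assumptions, larger batches amortize the fixed overhead, so throughput climbs, but every item also waits longer for its batch to finish, so latency climbs with it.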
Common questions
What is the difference between latency and throughput?
Latency is how long a single operation takes; throughput is how many operations complete per second. A system can have high throughput but poor (tail) latency, and vice versa.
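As a rough sketch, both numbers can be taken from the same run: time each call for latency, and divide the call count by total elapsed time for throughput. The helper `measure` and the stand-in workload `sample_operation` are hypothetical names introduced for this example.

```python
import time


def measure(operation, calls=1000):
    """Run `operation` sequentially.

    Returns (per-call latencies in seconds, overall throughput in calls/s).
    """
    latencies = []
    start = time.perf_counter()
    for _ in range(calls):
        t0 = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return latencies, calls / elapsed


def sample_operation():
    # Hypothetical stand-in for real work.
    sum(range(1000))


latencies, throughput = measure(sample_operation)
print(f"mean latency: {sum(latencies) / len(latencies) * 1e6:.1f} us")
print(f"throughput:   {throughput:.0f} calls/s")
```

With a single sequential worker like this one, throughput is roughly the inverse of mean latency. The two metrics diverge once requests run concurrently or are batched.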
Why measure latency at p99 instead of average?
Averages hide the slow tail. If 1% of requests take 2 s while the rest take 50 ms, the mean is about 70 ms and looks healthy, yet 1 in 100 requests is 40 times slower than the rest. Tail percentiles (p95, p99, and beyond) make that visible.
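A small sketch makes the gap between mean and tail concrete. The sample below is synthetic and assumed for illustration (980 requests at 50 ms, 20 at 2000 ms, so the slow share is 2% and sits clearly above the p99 cutoff), and `percentile` is a hand-written nearest-rank helper rather than a library function.

```python
import statistics

# Synthetic sample (assumed for illustration): 98% fast, 2% slow.
latencies_ms = [50.0] * 980 + [2000.0] * 20


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100.

    Returns the smallest sample such that at least pct% of samples
    are less than or equal to it.
    """
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} ms  p50={p50_ms} ms  p99={p99_ms} ms")
# mean=89.0 ms  p50=50.0 ms  p99=2000.0 ms
```

The mean of 89 ms suggests a healthy service, while p99 reports the 2-second experience that the slowest requests actually get.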