Glossary

Tail Latency

The slowest requests (p99, p99.9), which matter far more than the average for user experience.

Definition

Tail latency is the response time of the slowest requests a system serves, usually reported as high percentiles such as p99 or p99.9. For user experience it matters far more than the average.
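To make the percentile notation concrete, here is a minimal sketch that computes the mean, p50, and p99 of a latency sample. The `percentile` helper (nearest-rank method) and the sample numbers are illustrative assumptions, not taken from any particular monitoring tool.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical sample: 980 fast requests at 20 ms, 20 slow ones at 2000 ms.
latencies_ms = [20.0] * 980 + [2000.0] * 20

mean_ms = statistics.mean(latencies_ms)  # pulled up by the tail, yet far below it
p50_ms = percentile(latencies_ms, 50)    # the typical request
p99_ms = percentile(latencies_ms, 99)    # the tail

if __name__ == "__main__":
    print(f"mean={mean_ms:.1f}ms p50={p50_ms:.0f}ms p99={p99_ms:.0f}ms")
```

In this sample the mean sits near 60 ms, which describes neither the typical 20 ms request nor the 2 s tail. That gap is why the percentiles are reported separately.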

How it works

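Averages hide pain: if p50 is 20ms but p99 is 2s, roughly one request in a hundred takes 2s or more. Fan-out makes it worse: a page that calls 100 services has about a 63% chance (1 − 0.99^100, assuming independent latencies) of hitting at least one service's p99 tail, so the slowest backend often sets the page's response time. Common causes: GC pauses, queueing, retries, cold caches. Tame it with timeouts, hedged requests (send a backup copy of a slow request and use whichever reply arrives first), and load shedding. Optimise the tail, not the mean.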
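Two of the ideas above can be sketched in a few lines: the fan-out arithmetic, and a hedged request with an overall timeout. This is an illustration under stated assumptions rather than a production client. `fanout_tail_probability`, `hedged_call`, and `flaky_backend` are hypothetical names, backend latencies are assumed independent, and the first attempt to finish wins even if it fails.

```python
import asyncio


def fanout_tail_probability(n_backends, tail_fraction=0.01):
    """Chance that at least one of n independent calls lands in its tail
    (tail_fraction=0.01 means 'slower than p99')."""
    return 1 - (1 - tail_fraction) ** n_backends


async def hedged_call(make_call, hedge_after_s, timeout_s):
    """Start make_call(); if it has not finished after hedge_after_s,
    start a second attempt and return whichever finishes first.
    Raises asyncio.TimeoutError if nothing finishes within timeout_s."""

    async def race():
        tasks = [asyncio.create_task(make_call())]
        try:
            done, _ = await asyncio.wait(tasks, timeout=hedge_after_s)
            if not done:
                # Primary is slow: send the hedge and race the two attempts.
                tasks.append(asyncio.create_task(make_call()))
                done, _ = await asyncio.wait(
                    tasks, return_when=asyncio.FIRST_COMPLETED
                )
            return next(iter(done)).result()
        finally:
            for task in tasks:
                task.cancel()  # no-op for attempts that already finished

    return await asyncio.wait_for(race(), timeout=timeout_s)


async def demo():
    calls = 0

    async def flaky_backend():
        # Hypothetical backend: the first attempt stalls, later ones are fast.
        nonlocal calls
        calls += 1
        attempt = calls
        await asyncio.sleep(1.0 if attempt == 1 else 0.01)
        return attempt

    # Hedge after 50 ms, give up entirely after 500 ms.
    return await hedged_call(flaky_backend, hedge_after_s=0.05, timeout_s=0.5)


if __name__ == "__main__":
    print(f"P(hit a p99 tail across 100 calls) = {fanout_tail_probability(100):.3f}")
    print(f"answered by attempt {asyncio.run(demo())}")
```

The hedge delay is the main design choice. Setting it near the backend's usual p95 or p99 keeps the extra load small, because only the slowest few percent of requests ever trigger a second attempt. Hedging is only safe for requests that can be repeated without side effects.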
