Definition
The latency of the slowest requests, usually reported as high percentiles such as p99 or p99.9. These matter far more than the average for user experience.
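To make the percentile notation concrete, here is a minimal sketch that computes the mean, p50, and p99 of a small made-up sample. The `percentile` helper and the latency values are illustrative assumptions; the helper uses the nearest-rank method, which is only one of several common percentile definitions.

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [20] * 98 + [2000] * 2

mean_ms = statistics.mean(latencies_ms)   # pulled up by the two slow requests
p50_ms = percentile(latencies_ms, 50)     # the typical request
p99_ms = percentile(latencies_ms, 99)     # the tail

print(f"mean={mean_ms}ms p50={p50_ms}ms p99={p99_ms}ms")
```

In this sample the mean sits between the two groups and describes neither the typical request nor the slow ones, which is why the percentiles are reported separately.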
How it works
Averages hide pain: if p50 is 20ms but p99 is 2s, one request in a hundred is painfully slow. Fan-out makes it worse: a page that calls 100 services, each with a 1% chance of a slow response, hits at least one tail on roughly 63% of loads (1 - 0.99^100 ≈ 0.63, assuming the services are independent). Causes: GC pauses, queueing, retries, cold caches. Tame it with timeouts, hedged requests, and load shedding. Optimise the tail, not the mean.
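As an illustration of two of the mitigations named above, the following is a rough sketch of a hedged request combined with an overall timeout, using Python's asyncio. The names `hedged_call` and `make_fake_backend`, and all the delay values, are hypothetical and chosen only for the demo. It assumes the request is safe to send twice (idempotent), and it returns the result of whichever attempt finishes first, even if that result is an error.

```python
import asyncio

async def hedged_call(make_request, hedge_after_s, timeout_s):
    """Send one request; if it is still pending after hedge_after_s, send a
    second copy and return whichever finishes first. Fail after timeout_s."""
    tasks = [asyncio.ensure_future(make_request())]

    async def race():
        # Give the first attempt a head start.
        done, _ = await asyncio.wait(tasks, timeout=hedge_after_s)
        if not done:
            # First attempt looks slow: launch the hedge and race the two.
            tasks.append(asyncio.ensure_future(make_request()))
            done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        return next(iter(done)).result()

    try:
        return await asyncio.wait_for(race(), timeout=timeout_s)
    finally:
        # Cancel whichever attempts are still running (no-op for finished ones).
        for task in tasks:
            task.cancel()

def make_fake_backend():
    """Hypothetical backend: the first call lands in the tail, later calls are fast."""
    calls = 0

    async def fake_backend():
        nonlocal calls
        calls += 1
        attempt = calls
        await asyncio.sleep(1.0 if attempt == 1 else 0.01)
        return attempt

    return fake_backend

# The first attempt would take 1s; the hedge fires after 50ms and wins.
winner = asyncio.run(hedged_call(make_fake_backend(), hedge_after_s=0.05, timeout_s=0.5))
print(f"answered by attempt {winner}")
```

The hedge delay is the main design choice: set it too low and every request is doubled, adding load that can itself cause queueing; a delay near a high percentile of normal latency sends the extra request only for calls that are already in the tail.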