What is Tail Latency?

Definition

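Tail latency is the latency of the slowest requests, usually quoted as high percentiles such as p99 or p99.9 (the time within which 99% or 99.9% of requests complete). For user experience it matters far more than the average.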
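To make the percentile terms concrete, here is a minimal Python sketch using the nearest-rank definition. This is one of several percentile conventions; monitoring systems often interpolate or estimate from histograms instead. The `percentile` helper and the latency samples are hypothetical illustrations, not measurements.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in ms: 980 fast requests and 20 slow ones (2%).
latencies = [20.0] * 980 + [2000.0] * 20

print("mean :", statistics.mean(latencies))   # mean : 59.6
print("p50  :", percentile(latencies, 50))    # p50  : 20.0
print("p99  :", percentile(latencies, 99))    # p99  : 2000.0
print("p99.9:", percentile(latencies, 99.9))  # p99.9: 2000.0
```

The mean of 59.6 ms looks harmless, yet 2% of these requests take 2 s; only the p99 and p99.9 figures expose that.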

How it works

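Averages hide pain: if p50 is 20ms but p99 is 2s, about one request in a hundred takes 2s or more, and a user who makes many requests will hit that tail regularly. Fan-out makes it worse: a page that fans out to 100 services must wait for the slowest reply, so if each call independently has a 1% chance of landing in its tail, the page is slow about 63% of the time (1 - 0.99^100 ≈ 0.63).

Causes: GC pauses, queueing, retries, cold caches. Tame it with timeouts, hedged requests, and load shedding. Optimise the tail, not the mean.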
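A hedged request sends a duplicate only when the first attempt is running late, then keeps whichever reply arrives first. Below is a minimal asyncio sketch of that idea, assuming the call is idempotent (both copies may reach the backend). The names `hedged`, `flaky_backend`, and `demo`, and the 50 ms hedge threshold, are hypothetical choices for illustration.

```python
import asyncio


async def hedged(call, hedge_after):
    """Run call(); if it has not finished after hedge_after seconds, start a
    duplicate and return whichever copy finishes first. Assumes call is
    idempotent, since both copies may reach the backend."""
    first = asyncio.create_task(call())
    done, _ = await asyncio.wait({first}, timeout=hedge_after)
    if done:
        return first.result()  # fast path: no hedge was needed
    second = asyncio.create_task(call())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop the slower copy so it frees its resources
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


async def demo():
    attempts = 0

    async def flaky_backend():
        # Hypothetical backend: the first attempt hits a slow replica
        # (say, one stuck in a GC pause); later attempts are fast.
        nonlocal attempts
        attempts += 1
        attempt = attempts
        await asyncio.sleep(0.5 if attempt == 1 else 0.01)
        return f"answer from attempt {attempt}"

    loop = asyncio.get_running_loop()
    start = loop.time()
    result = await hedged(flaky_backend, hedge_after=0.05)
    return result, loop.time() - start, attempts


if __name__ == "__main__":
    result, elapsed, attempts = asyncio.run(demo())
    print(f"{result} in {elapsed:.2f}s after {attempts} attempts")
```

The demo should answer in roughly 0.06 s rather than 0.5 s. A production version would also cap the total wait with a timeout and fall back to the other copy if the winner fails. A common rule of thumb is to hedge only after about the p95 latency, so at most roughly 5% of requests send a duplicate.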
