# The Second Law of Latency: Latency distributions are NEVER NORMAL

In the First Law of Latency we established that latency should always be discussed in the language of statistical distributions. Now we are going to dive deeper into which parts of that language we should — and should not — be using.

It is commonplace to see “average latency” presented in various forms when discussing the latency of a system. This is wrong thinking that can lead to incorrect decision making because Latency Distributions are NEVER NORMAL.

# The Laws

1. There is no THE LATENCY of the system
2. Latency distributions are NEVER NORMAL
3. DON’T LIE when measuring latency (most tools do… and that’s not ok)
4. DON’T LIE when presenting latency (most presentations do… and that’s not ok)
5. You can’t capacity plan without a LATENCY REQUIREMENT
6. You’re probably not asking for ENOUGH 9s.

Averaging is always the wrong way to aggregate latencies, but everyone does it. Let’s start a grassroots movement to stop making this mistake.

Average is an example of a summary statistic. Summary statistics are lightweight tools that let us describe the gist of the data. Average, in particular, tells us where to find one center of a dataset. This particular center is most useful if the data happens to be normally distributed.
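A quick sketch of why the average only behaves like a meaningful "center" for symmetric data (the numbers below are made up for illustration):

```python
import statistics

# Roughly symmetric data: the mean sits at the center,
# right on top of the median.
symmetric = [8, 9, 10, 11, 12]
print(statistics.mean(symmetric), statistics.median(symmetric))  # mean == median == 10

# Right-skewed data, like real latencies: one slow tail value
# drags the mean far away from where most of the data lives.
skewed = [10, 10, 10, 10, 500]
print(statistics.mean(skewed), statistics.median(skewed))  # mean is 108, median is 10
```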

The second law of latency is: Latency distributions are NEVER NORMAL.

“But the average still tells me something!”

Technically, average does tell you something, but for non-normally distributed data, the average is just a random number somewhere between the minimum and the maximum observed values. We need to be on stronger footing than this to make decisions.

Let me illustrate. Say we have a sensibly defined latency requirement: ≤100ms 99% of the time.

What if our average latency is 1 second: are we violating our requirement?

Not necessarily. An average of 1 second can be produced by a handful of extreme outliers while 99% of requests still complete in well under 100ms. Such a system is meeting our latency requirement (≤100ms 99% of the time).

What if our average latency is 10ms: are we meeting our requirement?

Not necessarily. An average of 10ms can coexist with more than 1% of requests taking far longer than 100ms. Such a system is not meeting our latency requirement (≤100ms 99% of the time).

Thus, a system with a high average latency can be meeting our 99% requirement, and a system with a low average latency can be failing our 99% requirement. This demonstrates that average is useless for evaluating our 99% requirement.
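The two scenarios above can be sketched with hypothetical data. The workloads, their values, and the `p99` helper below are illustrative, not measurements from any real system:

```python
import math
import statistics

def p99(samples):
    """99th percentile using the nearest-rank definition."""
    ordered = sorted(samples)
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]

# Hypothetical workload A: 99% of requests take 50 ms, 1% are
# extreme outliers. The average is 1 second, yet p99 is 50 ms,
# so the "<=100 ms, 99% of the time" requirement is met.
workload_a = [0.050] * 990 + [95.05] * 10

# Hypothetical workload B: 98% of requests take 5 ms, 2% take
# 255 ms. The average is only 10 ms, yet p99 is 255 ms, so the
# same requirement is violated.
workload_b = [0.005] * 980 + [0.255] * 20

print(statistics.mean(workload_a), p99(workload_a))  # high mean, passing p99
print(statistics.mean(workload_b), p99(workload_b))  # low mean, failing p99
```

The averages alone would rank workload B as the healthier system; the 99th percentile tells the opposite story.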

“But what if my latency requirement was specified as an average?”

You need better requirements. We create requirements so that downstream systems can plan and so that we can guarantee a level of experience for our users. Creating an average latency requirement is useless because you have no idea how meeting that requirement relates to the experience of people who use your system.

When we guarantee the experience of 99% of requests we’re saying it’s okay for up to 1% of requests to be worse than that. If we guarantee an average latency then we have no idea what percent of requests will be worse than our guarantee (because latency is not normally distributed). Thus, we have no idea how meeting our average latency guarantee impacts our users.
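One way to see this: for a normal distribution, exactly 50% of observations fall above the mean, but for a heavy-tailed distribution the fraction is something else entirely, and it depends on the shape of the tail. A sketch with hypothetical log-normal latencies (the distribution and its parameters are made up for illustration):

```python
import random
import statistics

random.seed(1)

# Hypothetical latencies drawn from a log-normal distribution,
# a common stand-in for heavy right-tailed latency data.
samples = [random.lognormvariate(-3.0, 1.0) for _ in range(100_000)]

mean = statistics.mean(samples)
worse = sum(s > mean for s in samples) / len(samples)

# For normally distributed data this would be ~50%; here roughly
# 31% of requests are slower than the mean, and the exact fraction
# depends entirely on the (unknown) shape of the tail.
print(f"{worse:.1%} of requests are slower than the mean")
```

Guaranteeing that `mean` stays below some threshold says nothing about how many requests land in that slower-than-average group, or how much slower they are.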

“But I know something is broken when average latency jumps up in a time series.”

No, you don’t. In fact, you tend to know even less when you smear (aggregate) your data out over time. This is true even if you smear the data with 99% quantiles. I will explain why in the Fourth Law of Latency: DON’T LIE when presenting latency.

Remember the second Law of Latency: Latency distributions are NEVER NORMAL. Because of this, averaging latencies is always the wrong choice.

The average latency is just a random number somewhere between the minimum and maximum latency and no sensible decisions can come from that information.

Next week we’ll talk about the third Law of Latency: DON’T LIE when measuring latency and we’ll unmask a real global conspiracy known as “coordinated omission”.