Your SLAs based on Kanban Lead Times might be dangerous

Apr 14, 2021

It is common practice to base Service Level Agreements (SLAs) on your current Kanban lead time distribution. The idea is simple: let’s say we know that 90% of our lead times are 16 days or less. We then feel confident basing an SLA on that data: we will deliver within 16 days nine times out of ten. It is rare to be able to forecast a delivery date with 90 percent certainty, and Kanban helps you with that, as described e.g. here.
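As a minimal sketch of how such a percentile-based SLA value can be read off a list of lead times: the helper name `percentile_nearest_rank`, the nearest-rank definition of a percentile, and the ten lead times are all assumptions made for illustration, not data or tooling from this article.

```python
import math

def percentile_nearest_rank(values, pct):
    """Smallest observed value such that at least pct percent of values are <= it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical lead times in days, for illustration only
lead_times = [2, 3, 3, 5, 6, 8, 9, 12, 15, 21]

sla_days = percentile_nearest_rank(lead_times, 90)
print(f"90% of items finished within {sla_days} days")
```

Other percentile definitions (for example with interpolation between neighbouring values) can give slightly different results on small samples.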

How tall is the average American?

There is one statistical flaw in this approach, though. The lead time data we have is a sample, drawn from a (hypothetical) population of possible lead times in your Kanban system. It is like randomly choosing 100 men from New York and measuring how tall they are: the mean of their heights will be somewhat close to the mean height of all American men, but it will not be exactly the same.

In statistics that’s an old story: from your sample you want to compute a confidence interval for the population. If the mean height of the hundred male New Yorkers is, say, 1.79 m, then we can be 90% sure that the mean height of all American men is, let’s say, between 1.76 m and 1.83 m. It is unlikely to be exactly 1.79 m for the whole population of male Americans.
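For readers who want to see the mechanics, here is a rough sketch of a 90% confidence interval for a mean, using the normal approximation (mean plus or minus z times the standard error). The heights are synthetic values generated for the example, and the function name `mean_confidence_interval` is made up; it will not reproduce the illustrative numbers above.

```python
import math
import random
import statistics

def mean_confidence_interval(values, confidence=0.90):
    """Two-sided confidence interval for the mean (normal approximation)."""
    n = len(values)
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    # z-value that leaves (1 - confidence) / 2 in each tail
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err

# Synthetic heights in metres for 100 people (assumed mean 1.79, sd 0.07)
rng = random.Random(7)
heights = [rng.gauss(1.79, 0.07) for _ in range(100)]

low, high = mean_confidence_interval(heights)
print(f"90% confidence interval for the mean: {low:.3f} m to {high:.3f} m")
```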

Experiment: 90th percentile Lead Time in sample vs. population

In order to see how far apart the sample value and population value might be, I tried to come close to a population value with two data sets:

300 randomly selected values from a Weibull distribution (shape 1.5)
82 lead time values taken from a Kanban team in my company (Q1 2021).
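A synthetic data set like the first one could be drawn roughly as follows. The shape of 1.5 comes from the text; the scale of 10, the rounding to whole days, and the seed are assumptions, since the article does not state them. Note that Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second.

```python
import random

def draw_weibull_lead_times(n=300, shape=1.5, scale=10.0, seed=42):
    """Draw n Weibull-distributed lead times, rounded to whole days."""
    rng = random.Random(seed)
    # weibullvariate(alpha, beta): alpha is the scale, beta is the shape
    return [round(rng.weibullvariate(scale, shape)) for _ in range(n)]

sample = draw_weibull_lead_times()
print(len(sample), min(sample), max(sample))
```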

A simple approach to estimate population values from a sample is the bootstrap method. I included the samples in the appendix, if you want to play around with them. The sample data looks like this:

Bootstrapping!

Now let’s do the bootstrapping. 100,000 times over, we draw a resample of the same size as the original (i.e. 82 or 300 draws, with replacement) and note its 90th percentile. This results in a distribution of 100,000 values for the 90th percentile, and we can assume that the “real” population value of the 90th percentile lies somewhere within this distribution.
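The resampling loop can be sketched in a few lines of plain Python. This is not necessarily the code used for this article: the function names, the nearest-rank percentile, and the short placeholder sample are assumptions. Swap in the values from the appendix and raise `iterations` to 100_000 to follow the procedure described above.

```python
import math
import random

def percentile_nearest_rank(values, pct):
    """Smallest observed value such that at least pct percent of values are <= it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def bootstrap_percentiles(sample, pct=90, iterations=10_000, seed=1):
    """Resample with replacement and collect the percentile of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [
        percentile_nearest_rank(rng.choices(sample, k=n), pct)
        for _ in range(iterations)
    ]

# Placeholder lead times in days; replace with the appendix values
sample = [1, 2, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 19, 22, 32, 48]

estimates = bootstrap_percentiles(sample)
print("Sample 90th percentile:", percentile_nearest_rank(sample, 90))
print("Bootstrap estimates range from", min(estimates), "to", max(estimates))
```

Because every resample is drawn from the observed values only, each bootstrap estimate is itself one of the observed lead times.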

The values resulting from 100,000 iterations are pretty stable; here they are:

(It is, by the way, pure chance that the values of the two samples are somewhat similar.)

The charts might illustrate it better: The estimated population value for the 90th percentile is a probability distribution.

A high-penalty SLA based on sample percentiles: Dangerous

What we see in the table above: the 90th percentile value of the sample and the estimated population value (with 90% certainty) differ. For the Kanban team there is a 3-day difference, for the Weibull data a 1-day difference.

That means we run a real risk of breaking an SLA based on the value of 16 days. If there is a serious penalty for breaking the SLA, this might be painful. In light of the estimated population value, the value of 16 days (based on the sample alone) has a 50/50 chance of being too optimistic.
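One way to put numbers on this risk is to ask what share of the bootstrap estimates lies above the SLA value, and which value 90% of the estimates do not exceed. The sketch below uses made-up bootstrap results and hypothetical helper names (`share_above`, `upper_bound`); it does not reproduce the figures from this article.

```python
import math

def share_above(estimates, threshold):
    """Fraction of bootstrap estimates strictly greater than the threshold."""
    return sum(1 for e in estimates if e > threshold) / len(estimates)

def upper_bound(estimates, certainty=90):
    """Nearest-rank value that `certainty` percent of the estimates do not exceed."""
    ordered = sorted(estimates)
    rank = math.ceil(certainty * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Made-up bootstrap results in days, NOT the article's numbers
estimates = [14, 15, 15, 16, 16, 16, 18, 19, 19, 21]

risk = share_above(estimates, threshold=16)
safer_sla = upper_bound(estimates, certainty=90)
print(f"Share of estimates above 16 days: {risk:.0%}")
print(f"Value not exceeded by 90% of estimates: {safer_sla} days")
```

Whether the larger value is the right basis for an SLA depends on how costly a breach is compared with promising a longer delivery time.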

So, bottom line: Know your data, know the statistical pitfalls, and then make good decisions. Which is good advice in many cases, I guess.

Appendix

Values “Kanban Team, Q1 2021”:
1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 12, 12, 12, 12, 12, 14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 19, 21, 22, 22, 32, 48

Values “Weibull, shape 1.5”:
6, 1, 5, 10, 9, 6, 5, 2, 3, 16, 3, 7, 5, 4, 4, 3, 12, 3, 14, 11, 10, 0, 5, 6, 8, 4, 4, 12, 4, 7, 5, 19, 14, 8, 18, 19, 8, 4, 6, 4, 3, 17, 12, 5, 8, 1, 6, 3, 7, 6, 2, 5, 5, 5, 10, 13, 5, 1, 9, 7, 3, 5, 19, 6, 29, 3, 6, 7, 7, 3, 9, 1, 12, 19, 2, 9, 3, 1, 13, 1, 8, 3, 6, 13, 31, 12, 10, 9, 3, 6, 9, 16, 19, 20, 20, 3, 3, 2, 15, 8, 22, 7, 3, 6, 9, 2, 5, 7, 15, 6, 22, 14, 7, 4, 4, 13, 8, 15, 16, 10, 4, 11, 9, 5, 15, 12, 7, 10, 12, 6, 9, 9, 7, 3, 2, 9, 27, 17, 7, 9, 4, 3, 11, 12, 18, 3, 5, 9, 7, 13, 11, 6, 7, 30, 17, 13, 1, 4, 10, 4, 10, 8, 1, 9, 7, 10, 1, 5, 6, 2, 11, 12, 7, 0, 5, 7, 11, 1, 16, 5, 10, 19, 17, 0, 5, 4, 6, 7, 17, 5, 6, 2, 9, 6, 1, 8, 10, 12, 13, 18, 10, 24, 9, 2, 2, 9, 5, 13, 7, 12, 8, 5, 3, 4, 10, 8, 10, 1, 9, 9, 14, 3, 14, 3, 13, 12, 8, 7, 0, 6, 3, 4, 29, 8, 6, 12, 11, 6, 2, 8, 22, 21, 5, 12, 12, 3, 2, 6, 11, 1, 5, 14, 10, 3, 14, 12, 4, 17, 2, 14, 11, 7, 11, 11, 6, 8, 6, 9, 3, 5, 3, 8, 12, 3, 7, 5, 6, 10, 3, 5, 2, 2, 11, 25, 16, 5, 3, 10, 2, 27, 11, 3, 3, 10, 4, 12, 3, 12, 4, 16

If you have worked on estimating population values for flow metrics, let’s get in touch. I’m curious.