Nicolas Vandeput
May 26

The article below is an extract from my book Data Science for Supply Chain Forecast.

“I shall not today attempt further to define this kind of material (…), and perhaps I could never succeed in intelligibly doing so. But I know it when I see it.”
Potter Stewart

In 1964, Potter Stewart was a United States Supreme Court Justice. He wasn’t discussing outliers but whether the movie The Lovers was or wasn’t obscene.

As you work on forecasts, you will notice that your dataset contains outliers. And even though I know it when I see it might be the only practical definition, these outliers pose a real threat to supply chains. These high (or low) points will trigger over-reactions in your forecast or in your safety stocks, ultimately resulting in (at best) manual corrections or (at worst) dead stock, losses and a nasty bullwhip effect. Yet when you look at blogs, books, articles or software on forecasting, the question of outlier detection is often avoided. This is a pity. Outlier detection is serious business.

These outliers pop up all the time in modern supply chains. They are mostly due to two main causes:

Mistakes & errors These are obvious outliers. If you spot such errors or encoding mistakes, it calls for process improvement in order to prevent them from happening again.
Exceptional demand Even though some demand observations are real, that does not mean they are not exceptional and should not be cleaned or smoothed. This kind of exceptional sale is actually not so uncommon in supply chains. Think about promotions, marketing, strange customer behaviours or destocking. Typically, you might not want your forecast to take into account the exceptional -80% sale you made last year to get rid of an old, nearly obsolete inventory.

If you can spot outliers and smooth them out, you will make a better forecast. I have seen numerous examples where the forecast error was reduced by a couple of percentage points just thanks to outlier cleaning. And the bigger the dataset, the more important it is to automate this detection and cleaning. Let's see how we can do this.

In this article, we will discuss three and a half ideas to spot these outliers and put them back to a reasonable level.

Idea #1 — Winsorization

As we said, an outlier is an exceptionally high or low value. Based on this simple definition, a first idea to detect outliers would be to simply cut down the top x highest and lowest points of the dataset. Let’s see how this would work on the two (dummy) datasets on the tables below.

This first technique simply shrinks the top/bottom x% of our historical demand values down (or up) to the limit of the xth percentile.

The xth percentile is a value below which x% of the observations in a group fall. For example, 99% of the demand observations for a product will be lower than its 99th percentile.

This technique of simply shrinking the demand down to a certain percentile is called winsorization. The name comes from Charles P. Winsor, a statistician from the first half of the 20th century.

If we look at the 1st and 99th percentile on our two dummy datasets above, this is what we obtain:

On this table we see that in both datasets all the low values would be increased up to 4.4. You can see on the figure below that this cuts a part of our dataset. The high values would be decreased down to 16.6 on the dataset without outliers (see figure 10.1) and down to 70.9 for the dataset with an outlier (see figure 10.2).

You might have noticed that the winsorization didn't give us round results such as 4 or 5; instead we got 4.4. As there is no observation that cuts the dataset exactly at the 99th percentile, a linear interpolation is made between the two closest values. This is how we obtain these numbers instead of round ones.
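You can see this interpolation in action with a quick NumPy check (the series below is illustrative, not the article's dummy dataset):

```python
import numpy as np

# A small demand history with no observation sitting exactly
# at the 99th percentile:
demand = np.array([5, 8, 9, 10, 11, 12, 13, 14, 15, 16])

# NumPy linearly interpolates between the two closest observations,
# which is why winsorization limits are rarely round numbers.
print(np.percentile(demand, 99))  # falls between 15 and 16
```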

So, are we happy with this technique?
No we’re not.
- We have spotted fake outliers on a dataset without outliers.
- On the dataset with outliers, we haven’t sufficiently reduced the outlier (it went from 100 to 70.9).

Of course, one could simply decrease the upper limit of the winsorization from 99% to 95% to further reduce the outlier in dataset #2, but unfortunately this would also affect dataset #1. This is not a good solution. One could also propose to remove the lower limit so that we don't increase our demand to 4.4. But what if we have periods with missing demand? Shouldn't we clean these as well?

Do It Yourself
Excel You can easily get the different percentiles of a range of cells in Excel by using the formula =PERCENTILE.INC(range,limit). You'll have to use this formula once for the upper limit (with a value around 0.95–0.99) and once for the lower limit (with a value around 0.01–0.05).
Python We can easily winsorize our dataset in Python thanks to NumPy. We can compute the different percentiles of an array thanks to the np.percentile(array,percentile) function.

import numpy as np
higher_limit = np.percentile(array, 99)
lower_limit = np.percentile(array, 1)

Note that the percentile function takes a percentile expressed as a value between 0 and 100 and not a ratio (i.e. a value between 0 and 1) like in Excel.

We can then simply cut the array to these lower and higher limits thanks to the function np.clip(array,min,max):

array = np.clip(array, a_min=lower_limit, a_max=higher_limit)
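Putting the two snippets together, here is a minimal winsorization sketch; the demand series, the 5%/95% limits and the helper's name are illustrative choices of mine:

```python
import numpy as np

def winsorize(demand, lower_pct=1, upper_pct=99):
    """Clip demand to the given lower/upper percentiles."""
    lower_limit = np.percentile(demand, lower_pct)
    upper_limit = np.percentile(demand, upper_pct)
    return np.clip(demand, a_min=lower_limit, a_max=upper_limit)

demand = np.array([12, 9, 13, 11, 12, 10, 100])  # 100 is an obvious outlier
cleaned = winsorize(demand, lower_pct=5, upper_pct=95)
print(cleaned.max())  # the outlier is shrunk, but only down to the 95th percentile
```

Note how, just as in the discussion above, even a 95% limit leaves the outlier at a level that is still far too high compared with the rest of the series.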

Idea #2 — Standard deviation

As we just saw, winsorization wasn't the perfect way to exclude outliers, as it takes out the high and low values of a dataset even if they aren't exceptional per se.

Another approach would be to look at the demand variation around the historical average and exclude the values that are exceptionally far from this average.
Let's define the demand standard deviation as:

σ = √( Σ (dₜ − μ)² / n )

where dₜ is the demand observed in period t, μ is the average demand and n is the number of demand observations we have.

If we assume that our data is normally distributed around the historical mean, we can compute the probability for the demand to be between two thresholds. These two thresholds will be centered on the demand average (μ) with a spread of x times the standard deviation (σ) in both directions. The more chaotic the demand (i.e. the bigger σ), the wider the thresholds. The exact mathematics involved here are beyond the scope of this article, and unfortunately, more often than not, the assumption of normality is not strictly respected.

For example, we have a 98% probability to be within the range: demand average +/- 2.33 times the standard deviation (as in the figure above). So if we wanted to remove the top 1% of both high and low values, we would restrict the demand to μ +/- 2.33 σ.

Note that this means we have a 99% probability to be lower than μ + 2.33 σ, and a 99% probability to be higher than μ - 2.33 σ.
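The 2.33 factor is simply the inverse of the standard normal cumulative distribution at 99%, which you can check with SciPy:

```python
from scipy.stats import norm

# z-score below which 99% of a standard normal distribution lies
z = norm.ppf(0.99)
print(round(z, 2))  # 2.33

# so mu +/- 2.33*sigma covers roughly the central 98%
coverage = norm.cdf(z) - norm.cdf(-z)
print(round(coverage, 2))  # 0.98
```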

If we applied this to our example datasets (see the first two tables), we would get these limits:

Let’s see how these new normal limits behave compared to the winsorization limits.

This is already much better than the results we got with winsorization:
- On the dataset without outliers (see figure 10.4), we don’t change any demand observation (perfect! — just as we want).
- On the dataset with an outlier, we don’t change the low-demand points but only the actual outlier (see figure 10.5).

Still, even though we reduce the outlier to a more manageable amount (47.9) than with the winsorization (70.9), it might not be enough yet.

So, are we happy now?
Not quite yet.

As you might remember, we assumed the demand to be distributed around the historical mean. This is fine for a product with a flat demand, but the real limitation arises when you have a product with a trend or a seasonality. For example, in the seasonal table below, the highest (or lowest) points are no longer the outliers you want to remove.

You can see how winsorization and normalization work on this seasonal demand on the figure below.

It simply doesn't make sense: both techniques flag the seasonal peaks as outliers and miss the real outlier, which is Y2 M11.

We will solve this with our next technique.

Do It Yourself
Excel You can compute the standard deviation of a range of cells thanks to the formula =STDEV.P(range). As always, you can compute the mean thanks to =AVERAGE(range). Once you have these two, you can compute the higher and lower limits thanks to =NORM.INV(percentile, mean, stdev). Typically, you will want the high percentile to be around 0.99 and the low one around 0.01.
Python You can calculate the standard deviation via np.std(array) for an array-like (e.g. a list, a DataFrame etc.) or for a DataFrame directly via the method .std(). So that if you have a DataFrame df you can simply type:

m = df.mean()
s = df.std()

We will then use the SciPy library to compute the normal probabilities, and the .clip method on our DataFrame to cap it to our limits.

from scipy.stats import norm
# Print the probabilities of each demand observation
print(norm.cdf(df.values, m, s).round(2))
limit_high = norm.ppf(0.99, m, s)
limit_low = norm.ppf(0.01, m, s)
df = df.clip(lower=limit_low, upper=limit_high)
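As a quick sanity check of this method, here is a minimal sketch on two illustrative series of my own (one well-behaved, one with a single spike); the normal limits leave the first untouched and only reduce the spike, just as described above. The helper name `normal_limits` is mine:

```python
import numpy as np
from scipy.stats import norm

def normal_limits(demand, pct=0.99):
    """Clip demand to the mu +/- z*sigma thresholds."""
    m, s = np.mean(demand), np.std(demand)
    return np.clip(demand, norm.ppf(1 - pct, m, s), norm.ppf(pct, m, s))

flat = np.array([10.0, 12, 9, 11, 10, 13, 11, 10])
spiky = np.array([10.0, 12, 9, 11, 10, 13, 11, 100])

print(normal_limits(flat))   # unchanged: no point is ~2.33 sigma from the mean
print(normal_limits(spiky))  # only the 100 is reduced
```

Note, however, that the spike also inflates the mean and standard deviation used to build the limits, so it is only partially reduced; this is exactly the bias discussed in the "Go the extra mile!" section at the end.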

Idea #3 — Error standard deviation

The second idea we had to flag outliers was to compare each observation against the mean of the demand. We saw that it didn’t make sense if we had a trend or a seasonality as the difference between an observation and the historical mean wasn’t relevant.

Well, let's go back to the definition of an outlier: an outlier is a value that you didn't expect. Just like the Spanish Inquisition in the Monty Python sketch. That is to say, an outlier is a value far away from your prediction (i.e. your forecast). To spot outliers, we will therefore analyze the forecast error and see which periods are exceptionally wrong. To do that, we'll use the standard deviation approach we used previously.

Let’s take back the seasonal example we made above. We will compare the historical demand to a simple (but seasonal) forecast we have for it.

If we computed the error we have for such a forecast (which is simply an average of the historical demand per period), we would obtain a mean error of -0.4 and a standard deviation of 3.2 (both of course heavily impacted by the error we have for Y2 M11). If we took a 99% confidence interval around this mean, we would restrict the forecast errors to -0.4 +/- 2.33 x 3.2, i.e. roughly the range [-8, 7]. You can see in the figure below how these limits around the forecast perfectly fit the seasonal demand.

We can now correct our outlier from Y2 M11. The demand was 19 but the forecast was 5 for this period. The maximum acceptable value is then 5 + 7 = 12. This means that we can replace the outlier of Y2 M11 (19) by this new value (12).

With this smarter detection method — analyzing the forecast error deviation instead of simply the demand variation around the mean — we will be able to flag outliers much more precisely and reduce them back to a plausible amount. As you can see on the figure above, normalization and winsorization couldn’t achieve any meaningful results for this seasonal demand.

The fine-tuning of this method (how many standard deviations should you take as a limit?) is, of course, left to you to experiment with…

Do It Yourself
Python If you have a pandas DataFrame with one column as the forecast and another one as the demand (the typical output from our exponential smoothing models), we can use this code:

from scipy.stats import norm

df["Error"] = df["Forecast"] - df["Demand"]
m = df["Error"].mean()
s = df["Error"].std()
limit_high = norm.ppf(0.99, m, s) + df["Forecast"]
limit_low = norm.ppf(0.01, m, s) + df["Forecast"]
df["Updated"] = df["Demand"].clip(lower=limit_low, upper=limit_high)
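As a quick check of the snippet above, here is a toy seasonal series of my own (the values, including a spike of 19 against a forecast of 5 in the style of Y2 M11, are illustrative and not the article's table):

```python
import pandas as pd
from scipy.stats import norm

# Two years of the same seasonal pattern, with one spike in year 2
season = [5, 7, 12, 16, 12, 7, 5, 3, 2, 2, 5, 8]
demand = season + season[:]
demand[22] = 19  # the outlier, hidden in a low-demand month (forecast is 5)
df = pd.DataFrame({"Demand": demand})
df["Forecast"] = season * 2  # a simple seasonal forecast

df["Error"] = df["Forecast"] - df["Demand"]
m, s = df["Error"].mean(), df["Error"].std()
limit_high = norm.ppf(0.99, m, s) + df["Forecast"]
limit_low = norm.ppf(0.01, m, s) + df["Forecast"]
df["Updated"] = df["Demand"].clip(lower=limit_low, upper=limit_high)

# Only the spiked period moves; the seasonal peaks (16) are left alone.
changed = df.index[df["Updated"] != df["Demand"]].tolist()
print(changed)  # [22]
```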

Go the extra mile!

If you think back to our idea of analyzing the forecast error and setting a threshold of acceptable errors, we actually still have a minor issue: the threshold we compute is based on the dataset including the outliers. The outlier drives the error variation upward, so the acceptable threshold is biased and overestimated. To correct this, one can shrink the outlier not to the threshold calculated on the original demand dataset, but to a limit calculated on a dataset excluding this specific outlier. Here's the recipe:

  1. Populate a first forecast against the historical demand.
  2. Compute the error, the error mean and the error standard deviation.
  3. Compute the lower & upper acceptable thresholds (based on the error mean and standard deviation).
  4. Identify outliers just as explained previously.
  5. Re-compute the error mean and standard deviation, excluding the outliers.
  6. Update the lower & upper acceptable thresholds based on these new values.
  7. Update the outlier values based on the new thresholds.

If we take back our seasonal example from above, we initially had a forecast error mean of -0.4 and a standard deviation of 3.2. If we remove the point Y2 M11, we obtain an error mean of -0.1 and a standard deviation of 2.3. The thresholds around the forecast now become [-5.3, 5.2]. Our outlier in Y2 M11 would then be updated to 10 (instead of 12 with our previous technique).

Do It Yourself
We’ll take back our code from our previous idea and add a new step to update the error mean and standard deviation values.

from scipy.stats import norm

df["Error"] = df["Forecast"] - df["Demand"]
m = df["Error"].mean()
s = df["Error"].std()
# Flag the outliers based on the first error distribution
prob = norm.cdf(df["Error"], m, s)
outliers = (prob > 0.99) | (prob < 0.01)
# Re-compute the error mean & standard deviation without the outliers
m2 = df["Error"][~outliers].mean()
s2 = df["Error"][~outliers].std()
limit_high = norm.ppf(0.99, m2, s2) + df["Forecast"]
limit_low = norm.ppf(0.01, m2, s2) + df["Forecast"]
df["Updated"] = df["Demand"].clip(lower=limit_low, upper=limit_high)

About the author

Nicolas Vandeput is a supply chain data scientist specialized in demand forecasting & inventory optimization.
In 2016, he founded
SupChains, his consultancy company; two years later, he co-founded SKU Science, a smart online platform for supply chain management.
If you are interested in forecasting and machine learning, you can buy his book Data Science for Supply Chain Forecast.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem
