Where to build your warehouse: the difference between the mean, median and mode

Jacob Unna
Published in FullStackAI
6 min read · Oct 24, 2019

At school, most of us study three measures of centrality for a dataset: the mean, the median and the mode. Take the dataset 1, 1, 4, 12, 20:

  • The mean is (1+1+4+12+20)/5 = 7.6;
  • The median is the middle — in this case, third — element, which is 4;
  • The mode is the most frequently occurring element, which is 1.
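To check the arithmetic, here is a minimal sketch using Python's standard-library `statistics` module (our choice of tool; the article itself contains no code):

```python
from statistics import mean, median, mode

# The example dataset from the text
data = [1, 1, 4, 12, 20]

print(mean(data))    # 7.6  (38 / 5)
print(median(data))  # 4    (third of five sorted values)
print(mode(data))    # 1    (the only value that appears twice)
```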

When I learned about the mean, it needed no further explanation. As a measure of the centre of a dataset, it felt intuitive.

The motivations behind the median and the mode were hazier. I understood them well enough for end-of-year exams, but I wasn’t convinced of their utility in the real world. The trouble with the median is that if I changed that 20 to 2,000, the median wouldn’t even notice. This felt most unsatisfactory. The mode is even worse. It doesn’t tell you anything about the dataset as a whole; it just obsesses over one element that happens to occur more often than the others.
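The complaint about the median is easy to reproduce. This sketch, again using Python's `statistics` module, swaps the 20 for 2,000 and compares how the two measures react:

```python
from statistics import mean, median

original = [1, 1, 4, 12, 20]
changed = [1, 1, 4, 12, 2000]  # the largest value replaced by 2,000

# The median only looks at the middle value, so it does not move.
print(median(original), median(changed))  # 4 4

# The mean is pulled along by every value, including the outlier.
print(mean(original), mean(changed))      # 7.6 403.6
```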

In this article, we will see why placing blind faith in the mean is misguided. We will scratch below the surface of what a centrality measure is, and understand why the correct choice to describe your data may be the mean, the median, the mode, or something else altogether.

Building a warehouse

Photo by Marcin Jozwiak on Unsplash

Imagine you’re tasked with choosing where to build a warehouse for a chain of stores. Each store needs one truckload of goods delivered per day, and the truck must return to the warehouse before visiting another store.

Let’s simplify things and work in one dimension. Suppose our stores are positioned as follows:

Our store locations

One idea is to take the mean of all the store locations. The mean location comes out as 1.6:

Could placing the warehouse at the mean store location work?

Imagine that the cost of driving a truck n miles is £n, i.e. cost rises linearly with distance. Adding up the distance from this location to each store, we find it costs us £9.67 per day if the warehouse is at the mean location. Can we do better? It turns out we can. The following diagram shows the warehouse at the median location:

Placing the warehouse at the median store location reduces our transportation costs.

In this case, the cost comes out as £7.90 per day — and that is the best we can do. The median, therefore, is the measure of choice when your costs are linear.
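The article's store positions appear only in a figure, so the following sketch uses made-up positions (`stores` is hypothetical, not the article's data) to show the same effect. It totals one-way distances; counting the return trip would double every total without moving the best location. A brute-force grid search then checks that no location beats the median under a linear cost.

```python
# Hypothetical 1-D store positions (not the positions in the article's figure).
stores = [0, 1, 2, 3, 9]

def total_cost(warehouse, stores, cost):
    """Sum of cost(distance) from the warehouse to every store, one way."""
    return sum(cost(abs(warehouse - s)) for s in stores)

def linear(d):
    return d  # driving n miles costs £n

mean_loc = sum(stores) / len(stores)           # 3.0
median_loc = sorted(stores)[len(stores) // 2]  # 2 (odd number of stores)

print(total_cost(mean_loc, stores, linear))    # 12.0
print(total_cost(median_loc, stores, linear))  # 11

# Try every location from 0 to 9 in steps of 0.01 and keep the cheapest.
grid = [i / 100 for i in range(901)]
best = min(grid, key=lambda w: total_cost(w, stores, linear))
print(best)  # 2.0, i.e. the median
```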

So what is the mean for?

Suppose one day the CEO declares that from now on, all transportation should be done by aircraft instead of by truck. The cost of flying an aircraft n miles is no longer £n but £n². In this case, the costs are:

  • Using mean location: £19.79
  • Using median location: £25.59

We see that now the mean performs better, and in fact, when transporting by aircraft, the mean is the best we can do. The mean, therefore, is the measure of choice when your costs rise with the square of the distance.
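Repeating the comparison with a squared cost shows the roles swapping. As before, the store positions are hypothetical, and the grid search is only a sanity check:

```python
# Hypothetical 1-D store positions (not the positions in the article's figure).
stores = [0, 1, 2, 3, 9]

def total_cost(warehouse, stores, cost):
    """Sum of cost(distance) from the warehouse to every store, one way."""
    return sum(cost(abs(warehouse - s)) for s in stores)

def squared(d):
    return d ** 2  # flying n miles costs £n²

mean_loc = sum(stores) / len(stores)           # 3.0
median_loc = sorted(stores)[len(stores) // 2]  # 2

print(total_cost(mean_loc, stores, squared))    # 50.0
print(total_cost(median_loc, stores, squared))  # 55

# Try every location from 0 to 9 in steps of 0.01 and keep the cheapest.
grid = [i / 100 for i in range(901)]
best = min(grid, key=lambda w: total_cost(w, stores, squared))
print(best)  # 3.0, i.e. the mean
```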

What if the cost of n miles is neither £n nor £n²? In that case neither the mean nor the median would be appropriate, and a more sophisticated approach would be required.

Optimal warehouse location when n miles costs £n³.
Optimal warehouse location when n miles costs £exp(n).
Optimal warehouse location when n miles costs £cos(n). In this case, the “cost” can actually be negative.
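When the cost function is anything else, one blunt but general option is to search numerically. In this sketch the store positions are hypothetical, and the grid search and the `best_location` helper are our own illustration, not the method behind the article's figures. It only searches between the outermost stores, which is valid when cost never decreases with distance, so it would not apply to the £cos(n) case.

```python
import math

# Hypothetical 1-D store positions (not the positions in the article's figures).
stores = [0, 1, 2, 3, 9]

def total_cost(warehouse, stores, cost):
    """Sum of cost(distance) from the warehouse to every store, one way."""
    return sum(cost(abs(warehouse - s)) for s in stores)

def best_location(stores, cost, step=0.001):
    """Grid search between the leftmost and rightmost store.

    Assumes cost never decreases with distance, so no location outside
    the stores' span can beat the nearest end of the span.
    """
    lo, hi = min(stores), max(stores)
    n = round((hi - lo) / step)
    candidates = [lo + i * step for i in range(n + 1)]
    return min(candidates, key=lambda w: total_cost(w, stores, cost))

best_cubic = best_location(stores, lambda d: d ** 3)  # n miles costs £n³
best_exp = best_location(stores, math.exp)            # n miles costs £exp(n)

print(f"cubic: {best_cubic:.2f}, exponential: {best_exp:.2f}")
```

With these positions, both optima land to the right of the mean (3.0), and the exponential one sits further right: the faster cost grows with distance, the more the distant store at 9 dominates.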

Finally, what about the mode? The mode helps us when 0 miles costs £0 and n miles costs £1 for any n > 0. When is this the case? Suppose we are building a cell phone mast. For each of our customers, either the mast reaches them or it doesn’t. In this case, we would place it at the modal location of our customers.
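Under this all-or-nothing cost, a site that sits on no customer costs £1 for everyone, so it is enough to try each distinct customer position. A minimal sketch with hypothetical customer positions:

```python
from statistics import mode

# Hypothetical customer positions on a line.
customers = [2, 2, 2, 5, 7, 7]

def zero_one_cost(d):
    """£0 for zero distance, £1 for any positive distance."""
    return 0 if d == 0 else 1

def total_cost(site, points):
    return sum(zero_one_cost(abs(site - p)) for p in points)

# Any site away from every customer costs len(customers), which no
# customer position can exceed, so checking customer positions suffices.
best = min(set(customers), key=lambda site: total_cost(site, customers))

print(best, mode(customers))        # 2 2
print(total_cost(best, customers))  # 3
```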

Cost functions

In the previous section, we matched each of the mean, median and mode against a so-called cost function:

  • Mean cost function: f(x) = x²
  • Median cost function: f(x) = x
  • Mode cost function: f(x) = 0 if x = 0; 1 if x ≠ 0

In each of these cases, the chosen measure (mean, median or mode) is the best we can possibly do if the cost of x miles is £f(x). If the cost of x miles follows some other function, none of these measures is guaranteed to be the best choice.
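The claim that each measure is the best possible follows from a short calculation. Write x_i for the i-th of N store positions, w for the warehouse position and C(w) for the total cost:

```latex
% Squared cost
C(w) = \sum_{i=1}^{N} (w - x_i)^2, \qquad
C'(w) = 2\sum_{i=1}^{N} (w - x_i) = 0
\;\Longrightarrow\; w = \frac{1}{N}\sum_{i=1}^{N} x_i \quad \text{(the mean)}
% C''(w) = 2N > 0, so this stationary point is the minimum.

% Linear cost, for w not equal to any x_i
C(w) = \sum_{i=1}^{N} |w - x_i|, \qquad
C'(w) = \#\{i : x_i < w\} - \#\{i : x_i > w\}
% C falls while fewer than half the points lie left of w and rises once
% more than half do, so it is minimised at a median.

% Zero-one cost
C(w) = \sum_{i=1}^{N} \mathbf{1}[w \neq x_i] = N - \#\{i : x_i = w\}
% minimised by the most frequent value: the mode.
```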

A great thing about seeing these measures through the lens of cost functions is that it sheds light on the standard deviation formula.

Standard deviation formula

This is often understood as a generic measure of how “spaced out” data are, but we can now see it in a new light. The (population) standard deviation σ is the square root of our average cost per store when the warehouse sits at the mean and the cost function is f(x) = x². Put another way: if travelling n miles costs us £n², our total cost across N stores will be £Nσ², or £σ² per store.
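The link can be checked numerically. This sketch uses hypothetical store positions and `statistics.pstdev`, the population standard deviation (dividing by N). The relationship holds for that version, not for the sample standard deviation `statistics.stdev`, which divides by N − 1.

```python
import math
from statistics import pstdev

# Hypothetical 1-D store positions.
stores = [0, 1, 2, 3, 9]
N = len(stores)

mu = sum(stores) / N                        # warehouse at the mean: 3.0
total = sum((mu - s) ** 2 for s in stores)  # total cost when n miles costs £n²
average = total / N                         # cost per store

print(total)                                # 50.0
print(math.sqrt(average), pstdev(stores))   # both ≈ 3.1623
```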

It’s worth taking a moment to reflect on this. Whenever you use the mean and/or standard deviation, you are implicitly assuming that in some sense your “cost of transportation” is proportional to the distance squared.

Example: exam results

On an exam, the mean result among all candidates was 70%, but the median was only 50%. You score 60%. What should you make of this?

  • If intelligence is proportional to your grade, the median is the relevant measure. Good job.
  • If intelligence is proportional to your grade squared, the mean is the relevant measure. Too bad.

Which of these two is typically the case? That is a complicated question and beyond the scope of this article. But the point is that it’s worth pondering before deciding whether to use the mean, the median or some other measure. Are the questions all the same difficulty, or do they get harder? If they get harder, does the difficulty increase linearly? Are you trying to get an objective measure of intelligence against the entire population, or is your objective narrower? The answers help build up a model of how intelligence and grade are linked.

  • Suppose that intelligence is mostly proportional to your grade, but that, for some perverse reason, the very best candidates were bound to get the first 20% of the questions wrong. Then neither the mean nor the median is helpful.

Conclusion

Descriptive statistics can be like online dating profiles: technically accurate and yet pretty darn misleading. — Charles Wheelan

There is something attractive about the way the mean is affected by a change in each and every value in the dataset, and indeed for many real world phenomena the cost does in some sense increase with the square of the distance. But this isn’t always the case, and it’s important to know which to use when.


Software Engineer @ Deloitte Analytics & Cognitive