Mean vs Median vs Mode

Suryanarayanan
3 min read · Sep 29, 2021


Mean, median, and mode are all used to measure the central tendency of a given dataset. You may have heard these terms, or know a bit about them, without being sure when to use which. Or you may have questions like: What is central tendency? Why are there three ways to measure it? What are the mean, median, and mode? What makes them different? When should you use which? The answers to all of those questions are what we are going to discuss in this post.

What is central tendency?

Central tendency can be defined in the following ways:

  • a particular value in a dataset around which most of the data points lie.
  • a particular value that appears most frequently in a given dataset.

In the image below, the blue dotted line represents the central tendency of a dataset.

Source: https://www.statology.org/wp-content/uploads/2021/01/skew5.png

Why is a measure of central tendency so important?

  • It helps us find a representative value for the data.
  • It gives different perspectives on the data, depending on the measure used.
  • It is helpful for data imputation in data science projects.

Mean

The mean is the sum of all the data points divided by the total number of data points. The following data is the number of cigarettes smoked per day by 5 people: 2, 2, 2, 1, 4. To find the mean of the data, we do the following:

  • Sum up all the values: 2 + 2 + 2 + 1 + 4 = 11
  • Divide the sum by the total number of data points: 11 / 5 = 2.2

Now add another data point to our dataset: 2, 2, 2, 1, 4, 15. If we calculate the mean now, we get 26 / 6 ≈ 4.33. The mean has nearly doubled. This is because the new person's value is far higher than the rest (it is an outlier), and it pulls the mean toward it.

From the above we can conclude that the mean is sensitive to outliers, so before using the mean we should check whether our dataset has outliers. To overcome this issue, we can calculate the median instead.
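The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using the standard library's statistics module; the variable names (cigs_per_day, with_outlier, and so on) are made up for illustration.

```python
from statistics import mean

# Cigarettes per day for the first five people.
cigs_per_day = [2, 2, 2, 1, 4]
mean_without_outlier = mean(cigs_per_day)      # 11 / 5

# Add the sixth person, whose value (15) is an outlier.
with_outlier = cigs_per_day + [15]
mean_with_outlier = mean(with_outlier)         # 26 / 6

print(mean_without_outlier)
print(round(mean_with_outlier, 2))
```

A single extreme value is enough to move the mean from 2.2 to about 4.33, even though five of the six people report 4 or fewer.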

When to use:

  • when your dataset has few or no outliers.
  • when you have quantitative data.

Median

The median is another way of measuring central tendency. It gives us the middle value of a given dataset. We will use the same data we used for the mean. To find the median, we do the following:

  • Arrange the data in ascending order: 1, 2, 2, 2, 4, 15
  • If the total number of samples n is odd, take the ((n+1)/2)th term as the median.
  • If n is even, take the average of the (n/2)th term and the (n/2 + 1)th term as the median. If this formula confuses you, just remember that we take the two middle values and divide their sum by 2.
  • In our case the total number of data points is even (n = 6), so we take the two middle values, the 3rd and 4th terms, which are 2 and 2, and average them: (2 + 2) / 2 = 2. So the median is 2.

You can see that even with the outlier, the median is still 2 and has not shifted. This is the advantage of the median: it gives us the middle value of the data and is much less sensitive to outliers.
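The odd and even cases can be sketched in Python as follows. The helper name middle_value is hypothetical and only there to mirror the steps; in practice the standard library's statistics.median does the same job.

```python
from statistics import median

def middle_value(values):
    """Median computed by hand: sort, then pick or average the middle."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n+1)/2)th term, which is index n // 2 when counting from 0.
        return ordered[mid]
    # Even n: average of the (n/2)th and (n/2 + 1)th terms.
    return (ordered[mid - 1] + ordered[mid]) / 2

with_outlier = [2, 2, 2, 1, 4, 15]   # even number of points
without_outlier = [2, 2, 2, 1, 4]    # odd number of points

print(middle_value(with_outlier))
print(middle_value(without_outlier))
print(median(with_outlier))          # library result, for comparison
```

Both datasets give a median of 2, so the outlier that nearly doubled the mean leaves the median where it was.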

When to use:

  • when your dataset has outliers.
  • when you have ordinal data, that is, qualitative data whose categories can be ranked. The median needs an ordering, so it does not apply to unordered categories.

Mode

The mode measures central tendency as the most frequent value in a given dataset. To calculate the mode:

  • we just select the value that appears most frequently.

Let's calculate the mode for the same dataset: 2, 2, 2, 1, 4, 15

The most frequent value in the above dataset is 2, so the mode of the dataset is 2. A dataset can have more than one mode, or no mode at all (when no value repeats). If a dataset has two modes, it is called bimodal.
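As a rough illustration, the mode can be found with statistics.multimode (available from Python 3.8), which returns every value tied for the highest count. The bimodal and all-unique lists below are made-up examples, not data from this post.

```python
from statistics import multimode

# The cigarette data, including the outlier 15 from the mean section.
data = [2, 2, 2, 1, 4, 15]
modes = multimode(data)                 # only 2 appears three times

# Hypothetical bimodal dataset: 1 and 2 both appear twice.
bimodal_modes = multimode([1, 1, 2, 2, 3])

# Hypothetical dataset with no repeats: every value ties with count 1.
# multimode returns all of them, which is usually read as "no mode".
no_repeat_modes = multimode([1, 2, 3])

print(modes)
print(bimodal_modes)
print(no_repeat_modes)
```

multimode also works on non-numeric values such as strings, which is why the mode is usable on nominal data.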

When to use:

  • The mode can be applied to any data type.
  • It is the only measure of central tendency that makes sense for nominal data.

I hope you liked this post. I would love to hear your comments.
