M and M and M’s

Naveena Benjamin
Women Data Greenhorns
6 min read · Jul 14, 2018
3 M’s

I prefer to call them M and M and M’s. I just can’t get enough of them. Yes, that’s right! The “Mean”, the “Median” and the “Mode”. They always hang out together, and they are the best buddies you can find in any statistical problem you are trying to figure out. They are the folks who can give you a lot of insight into your data! So let’s dive in and figure out who they really are. These 3 M’s are referred to as the measures of central tendency, sometimes also called measures of central location.

The first ‘M’ : MEAN

In statistics, Mean is short for Arithmetic Mean.
Is this Mean different from Average that we learned in High school?

No.

Average and Arithmetic Mean are synonyms and they both refer to the same thing.

What is Mean?

Consider a data set that has n values. The Mean can then be written using a very simple formula:

Mean = (sum of all the values) / (number of values) = (x1 + x2 + ... + xn) / n
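As a minimal sketch, here is how the Mean of a data set could be computed in Python. The function name `arithmetic_mean` and the sample numbers are only illustrative choices, not part of any library.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


print(arithmetic_mean([2, 4, 6, 8]))  # 5.0
```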

The second ‘M’ : MEDIAN

Median simply means middle. The middlemost value in a data set, when the values are arranged in ascending order, is what we refer to as the Median. If the total number of values in a data set is odd, then

Median = the value at position (n + 1) / 2

When the total number of values in a data set is even, then

Median = the average of the values at positions n / 2 and (n / 2) + 1

where n is the total number of values in the data set. The Median separates the data set into a lower half and an upper half.
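The odd and even cases can be sketched in Python as follows: take the value at position (n + 1) / 2 when n is odd, and the average of the two middle values when n is even. The function name `median` and the sample lists are illustrative assumptions.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Position (n + 1) / 2 in 1-based counting is index n // 2 in 0-based Python.
        return ordered[middle]
    # Positions n / 2 and n / 2 + 1 (1-based) are indexes middle - 1 and middle.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```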

The third ‘M’ : MODE

The Mode is the most frequently occurring value in a data set.
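One way to find the most frequent value in Python is to count every value and keep the ones with the highest count. This is only a sketch: the helper name `modes` is made up for illustration, and it returns a list because a data set can have more than one value tied for the highest frequency.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```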

Let’s move on to examples to make sure we understand how to apply them. Consider the candies handed out to a class of grade 6 students.

We can see here that it is a completely random distribution of candies. Let’s first calculate the average number of candies the students have got. For this, we first find the total number of candies and divide by the number of students.

Count of candies per student

This gives the number of candies every student got, and if we sum them up we get 49. There are 7 students in total. So the Mean of this data, or the average number of candies per student, is 49 / 7 = 7.

Next, let’s find the Median of this data. Arranging the number of candies each kid got in ascending order, we get 6, 6, 7, 7, 7, 8, 8. That’s a total of seven values, which is an odd number. Hence, as per the formula for an odd number of values, the Median is the value at position (7 + 1) / 2 = 4. The value in the 4th position is 7, and so the Median is 7.
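The same two calculations can be checked with Python's built-in `statistics` module. The list below uses the seven values from the example in ascending order; the variable names are just illustrative.

```python
import statistics

# Candies per student, taken from the example above (sorted order).
candies_per_student = [6, 6, 7, 7, 7, 8, 8]

total = sum(candies_per_student)
mean = statistics.mean(candies_per_student)
median = statistics.median(candies_per_student)

print(total)   # 49
print(mean)    # 7
print(median)  # 7
```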

Now let’s find the Mode of this data. For that we will categorize the candies by color. The diagram below shows the count for every color.

Frequency of candies per color

There are more red candies than candies of any other color, with red occurring 10 times. The Mode is the value that occurs most often, not the count itself, so the Mode of the candy colors is red, with a frequency of 10.
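Here is a small sketch that reads off the most frequent color along with its count. Only the count of 10 for red comes from the example; every other color and count below is an invented placeholder, chosen so that the total still comes to 49 candies.

```python
from collections import Counter

# Only "red": 10 is from the example. The remaining colors and counts are
# hypothetical placeholders that add up to the same total of 49 candies.
color_counts = Counter(
    {"red": 10, "blue": 9, "green": 8, "yellow": 8, "orange": 7, "brown": 7}
)

# most_common(1) returns a list holding the single (value, frequency) pair
# with the highest frequency.
color, frequency = color_counts.most_common(1)[0]

print(color, frequency)            # red 10
print(sum(color_counts.values()))  # 49
```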

Now let’s move on to some serious stuff. What happens if we hand out more or fewer candies? How will the 3 M’s change?

When more candies are distributed among the same students, the average number of candies per student goes up. But if we keep the number of candies the same and increase the number of students, the Mean falls. And if any one student gets a lot more (or a lot fewer) candies than the rest, the Mean is greatly affected.

#1: Mean is sensitive to every value: changing any single data value changes the Mean.
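To see the Mean move, here is a quick sketch based on the candy data. The 7 extra candies and the new student who arrives with no candies are made-up scenarios for illustration.

```python
def mean(values):
    return sum(values) / len(values)


candies = [6, 6, 7, 7, 7, 8, 8]
print(mean(candies))  # 7.0

# Hypothetical: one student receives 7 extra candies (56 candies, 7 students).
more_candies = candies[:-1] + [candies[-1] + 7]
print(mean(more_candies))  # 8.0

# Hypothetical: a new student joins with no candies (49 candies, 8 students).
more_students = candies + [0]
print(mean(more_students))  # 6.125
```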

Keeping the number of students the same, if we give the last student 20 extra candies, the Median remains the same: whichever student receives them, the middle value of the ordered list is still 7. This shows that the Median is not strongly affected by outliers. This is why the Median gives good insight into things like the salaries of middle-class workers, the scores most students got in an exam and so on.

#2: Median is not severely affected by outliers.
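The sketch below repeats that experiment in Python, assuming the last student is one of those who started with 8 candies and now has 28.

```python
import statistics

candies = [6, 6, 7, 7, 7, 8, 8]
# Assumption: the student who gets 20 extra candies had 8 to begin with.
with_outlier = candies[:-1] + [candies[-1] + 20]

# The Median stays put, while the Mean is pulled up by the outlier.
print(statistics.median(candies), statistics.median(with_outlier))  # 7 7
print(statistics.mean(candies))                                      # 7
print(round(statistics.mean(with_outlier), 2))                       # 9.86
```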

The Mode is the easiest of the three to spot in any histogram. The highest frequency always shows up as the tallest bar and can be spotted by mere observation.

#3: Mode is easy to find in a histogram.
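Even a rough text histogram makes the point. This sketch draws one bar per distinct value in the candies-per-student data and then picks the value with the tallest bar; the variable names are illustrative.

```python
from collections import Counter

candies_per_student = [6, 6, 7, 7, 7, 8, 8]
frequencies = Counter(candies_per_student)

# One row per distinct value, one '#' per occurrence.
for value in sorted(frequencies):
    print(f"{value} | {'#' * frequencies[value]}")
# 6 | ##
# 7 | ###
# 8 | ##

# The value whose bar is tallest is the Mode.
tallest_bar = max(frequencies, key=frequencies.get)
print(tallest_bar)  # 7
```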

3M’s in Data Distribution

Let’s talk about the distribution of data. When data is plotted, it follows a pattern or curve that can be classified as a normal distribution, uniform distribution, bimodal distribution, positively skewed distribution or negatively skewed distribution. We will discuss just three of them here.

A normal distribution is one in which the values in a data set form a bell-shaped curve. There is zero skewness in a normal distribution, as the data is distributed symmetrically. Here the Mean, Median and Mode all lie at the center of the distribution, and this center divides the data values into an upper half and a lower half. GRE and SAT scores are examples where an approximately normal distribution comes into play.

Normal distribution
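As a rough check of that claim, the sketch below draws a random sample from a normal distribution and compares its Mean and Median. The center of 100, the spread of 15, the sample size and the seed are all arbitrary assumptions picked for illustration; the two numbers should land very close to each other and to the center.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Arbitrary illustrative parameters: center 100, standard deviation 15.
sample = [random.gauss(100, 15) for _ in range(10_000)]

sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)

print(round(sample_mean, 1), round(sample_median, 1))
print(abs(sample_mean - sample_median) < 1)
```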

There are a couple of cases where the distribution can be skewed. Skewing happens when the Mean is pulled either to the left or to the right of the Median by very low or very high values.

A positively skewed distribution has fewer values towards the positive side or, better said, a long thin tail towards the right. In a right/positively skewed distribution, the Mean typically lies to the right of the Median. Some real-life examples include household income, the value/mileage of used cars and so on.

Positively skewed distribution

A negatively skewed distribution has fewer values towards the negative side of the curve or, in other words, a thin tail towards the left. In a left/negatively skewed distribution, the Mean typically lies to the left of the Median. Some examples are the age at death of people in developed countries, the vocabulary build-up in a baby from 0 to 5 years of age, and the hours that a student puts in towards a fast-approaching exam where continuous assessment is absent.

Negatively skewed distribution
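Two tiny data sets show the Mean being pulled towards the tail. Both lists are invented purely for illustration: the incomes have one very high value (right tail) and the exam scores have one very low value (left tail).

```python
import statistics

# Hypothetical household incomes in thousands: one very high value on the right.
incomes = [30, 32, 35, 38, 40, 42, 45, 200]
# Hypothetical exam scores: one very low value on the left.
scores = [20, 78, 80, 82, 85, 88, 90, 92]

# Right-skewed: the Mean lands to the right of the Median.
print(statistics.mean(incomes), statistics.median(incomes))  # 57.75 39.0
# Left-skewed: the Mean lands to the left of the Median.
print(statistics.mean(scores), statistics.median(scores))    # 76.875 83.5
```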

Role of Mean and Median on any Data set

When we compare an individual value of the data set to the Mean, we can tell how far that value lies above or below the Mean. Relying on the Mean alone may not always be a good idea, because it is greatly affected by outliers, that is, values that are very high or very low. Comparing it with the Median, however, can tell us how symmetric the data set is.
If the Mean is close to the Median, the distribution is likely to be roughly symmetric, as a normal distribution is. But if there is a big difference, with the Mean either well below or well above the Median, we can detect skewness in the distribution. We can then investigate further what factors contributed to the skewness in the data.
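Both ideas can be sketched in a few lines of Python: first the distance of each value from the Mean, then a rough hint based on the gap between Mean and Median. The helper `skew_hint` and its `tolerance` of 0.1 standard deviations are assumptions made up for this sketch, not a standard statistical test.

```python
import statistics


def skew_hint(values, tolerance=0.1):
    """Rough hint only: compare the Mean-Median gap with the spread of the data."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    spread = statistics.pstdev(values)
    # The tolerance is an arbitrary cut-off chosen for illustration.
    if spread == 0 or abs(mean - median) <= tolerance * spread:
        return "roughly symmetric"
    return "right-skewed" if mean > median else "left-skewed"


candies = [6, 6, 7, 7, 7, 8, 8]

# How far each value lies above (+) or below (-) the Mean.
deviations = [value - statistics.mean(candies) for value in candies]
print(deviations)  # [-1, -1, 0, 0, 0, 1, 1]

print(skew_hint(candies))               # roughly symmetric
print(skew_hint(candies[:-1] + [28]))   # right-skewed
```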

The Median can give a fair picture of the distribution, as it is not greatly affected by outliers and lies in the middle of the entire data set.

Conclusion
With that, we have learned what the 3 M’s are and have taken our first steps into basic statistics. Thanks a lot for reading through this article.
