When do you use mean vs median?

R. Gupta
Published in Geek Culture
3 min read · Nov 7, 2022

Statistics Interview Question: Part 1

In this series, we cover statistics questions that are commonly asked in interviews for data scientist, data analyst, and business analyst roles. In data science, statistics is involved in every phase of the data analysis life cycle: data collection, data cleaning, data processing, data analysis, and deriving a solution to the problem at hand. You should therefore have a clear understanding of the important topics in statistics. Many interview questions focus on statistics, and this article series covers some of them. Today's question is:

When do you use mean vs median?

Before answering this question, let’s review what the mean and the median are.

Figure: Mean vs Median (image: https://cdn.wallstreetmojo.com/wp-content/uploads/2021/06/Mean-vs-Median.jpg)

In mathematics and statistics, the mean (or arithmetic mean) of a list of numbers is calculated by adding up all the values and dividing the sum by the number of values. The mean is arguably the best measure of central tendency for symmetric distributions. It takes every data point into account, and for well-behaved data it is fairly stable: it tends to change relatively little from one sample to the next.
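The definition translates directly into code. This is a minimal Python sketch; the function name `arithmetic_mean` is hypothetical, and in practice Python's standard `statistics.mean` does the same job.

```python
def arithmetic_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean requires at least one data point")
    return sum(values) / len(values)


# Every value contributes to the result, including extreme ones.
print(arithmetic_mean([2, 4, 5, 7, 8, 10, 12, 13, 83]))  # 16.0
```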

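In probability theory and statistics, the median is the value that separates the higher half of a sample, a population, or a probability distribution from the lower half, i.e. it divides the sorted data into two equal halves. When the number of values is even, the median is conventionally the average of the two middle values. The median depends only on the middle value (or values) of the sorted data, so the exact magnitudes of the other data points do not affect it.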
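A straightforward way to compute the median is to sort the data and pick the middle. The sketch below assumes a non-empty list of numbers; `sorted_median` is a hypothetical name, and Python's `statistics.median` follows the same even-count convention.

```python
def sorted_median(values):
    """Return the median by sorting a copy of the data and taking the middle."""
    if not values:
        raise ValueError("median requires at least one data point")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value splits the data into two halves.
        return ordered[mid]
    # Even count: average the two middle values (the usual convention).
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sorted_median([83, 2, 13, 4, 12, 5, 10, 7, 8]))  # 8
print(sorted_median([1, 2, 3, 4]))  # 2.5
```

Only the ordering matters here: replacing 83 with 8300 would leave the result unchanged.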

Say there are nine students in a class, and their test scores are 2, 4, 5, 7, 8, 10, 12, 13, and 83. To find the average (mean) score, we divide the sum of all the scores by nine: 144/9 = 16. As you can see, however, this mean of 16 is pulled upward by the extremely high score of 83 compared with the other scores. Eight of the nine students scored below the mean. In this case, therefore, the mean is not an accurate reflection of the sample’s central tendency.

The median, on the other hand, is the middle value of the sorted scores. With n = 9 values, that is position ceil(9/2) = 5 (equivalently, (n + 1)/2 = 5), so the median score is 8, which is a more robust summary of these scores.
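The arithmetic in this example can be checked with Python's standard `statistics` module. The sketch below simply recomputes the mean, the median, and how many scores fall below the mean.

```python
import statistics

# Test scores of the nine students from the example.
scores = [2, 4, 5, 7, 8, 10, 12, 13, 83]

mean_score = statistics.mean(scores)      # 144 / 9
median_score = statistics.median(scores)  # 5th value after sorting
below_mean = sum(1 for s in scores if s < mean_score)

# mean = 16, median = 8, and 8 of the 9 scores lie below the mean
print(mean_score, median_score, below_mean)
```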

When should we use mean and median?

The mean and the median both describe an entire dataset with a single value. For example, we often ask what the average score of the students is, what the average sales in a month are, or what percentage of students a particular student performed better than.

For normally distributed data, the mean is a good measure. However, as the example above shows, the mean is strongly affected by outliers (values that are much larger or smaller than the rest of the data), so the mean alone should not be used when the data contains outliers. When the data is roughly normally distributed and has no outliers, use the mean.
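To see the outlier effect in isolation, the sketch below takes the article's scores, computes both measures without the extreme value 83, and then again with it added back. It uses only Python's standard `statistics` module.

```python
import statistics

# The article's scores without the extreme value 83 ...
scores = [2, 4, 5, 7, 8, 10, 12, 13]
# ... and with the outlier added back.
scores_with_outlier = scores + [83]

for label, data in [("without 83", scores), ("with 83", scores_with_outlier)]:
    print(label, "mean:", statistics.mean(data), "median:", statistics.median(data))
```

A single extreme value moves the mean from 7.625 to 16, while the median moves only from 7.5 to 8.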

For skewed distributions, the median is usually the preferred measure of central tendency. The long tail of a skewed distribution pulls the mean toward it, so the mean is not a reliable summary there. The median is much less affected by extreme values and is easier to interpret as the "typical" value, which makes it better suited to describing the center of skewed data.
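A right-skewed distribution shows the same pattern on a larger scale. As an illustration (an assumption, not data from the article), the sketch draws a synthetic sample from an exponential distribution with rate 1, whose theoretical mean is 1 and median is ln 2 ≈ 0.693. The long right tail pulls the sample mean above the sample median.

```python
import math
import random
import statistics

# Synthetic right-skewed data: exponential distribution with rate 1.
rng = random.Random(42)  # fixed seed so the run is reproducible
sample = [rng.expovariate(1.0) for _ in range(10_000)]

mean_value = statistics.fmean(sample)
median_value = statistics.median(sample)

# The mean lands near 1 and the median near ln 2, so mean > median.
print(f"mean={mean_value:.3f} median={median_value:.3f} ln2={math.log(2):.3f}")
```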

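The median is also a robust measure for normal distributions, but we still use the mean there, because for a normal distribution the mean and the median are equal. One reason is that the median is more expensive to compute: a straightforward implementation must sort the data first (O(n log n)), whereas the mean needs only a single pass over the data (O(n)). Selection algorithms such as quickselect can find the median in linear expected time, but that is still more work than a running sum. In addition, for normally distributed data the sample mean fluctuates less from one sample to the next than the sample median does.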
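A full sort is not strictly required to find the median. The sketch below uses quickselect, a selection algorithm that finds the k-th smallest element in expected linear time by partitioning around a random pivot. The names `quickselect` and `median_select` are hypothetical, and this version favors clarity over speed (it builds new lists on each pass instead of partitioning in place).

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-based) in expected O(n) time."""
    items = list(values)  # work on a copy so the caller's data is untouched
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        num_pivots = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows                     # answer is among the smaller values
        elif k < len(lows) + num_pivots:
            return pivot                     # answer equals the pivot
        else:
            k -= len(lows) + num_pivots      # skip smaller values and pivots
            items = highs


def median_select(values):
    """Median via selection instead of sorting; averages the middle pair for even n."""
    n = len(values)
    if n == 0:
        raise ValueError("median requires at least one data point")
    if n % 2 == 1:
        return quickselect(values, n // 2)
    return (quickselect(values, n // 2 - 1) + quickselect(values, n // 2)) / 2


print(median_select([2, 4, 5, 7, 8, 10, 12, 13, 83]))  # 8
```

Even with selection, the median still needs more work per element than the mean, which only accumulates a sum and a count.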

My suggestion: always report the mean, and if the data is skewed, report the median as well.
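One way to put this advice into practice is sketched below. It always reports the mean and adds the median when the data looks skewed. Two assumptions, labeled here and in the code: skewness is measured with the Fisher-Pearson moment coefficient, and the cut-off of 0.5 is a hypothetical rule of thumb, not a value from this article. The helper names are also hypothetical.

```python
import statistics


def skewness(values):
    """Fisher-Pearson moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    if m2 == 0:
        return 0.0  # all values identical: no spread, so no skew
    return m3 / m2 ** 1.5


def summarize(values, skew_threshold=0.5):
    """Always report the mean; add the median when |skewness| exceeds the threshold.

    skew_threshold=0.5 is a hypothetical rule-of-thumb cut-off (an assumption).
    """
    summary = {"mean": statistics.mean(values)}
    if abs(skewness(values)) > skew_threshold:
        summary["median"] = statistics.median(values)
    return summary


print(summarize([2, 4, 5, 7, 8, 10, 12, 13, 83]))  # mean and median
print(summarize([1, 2, 3, 4, 5]))                  # mean only
```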

Thanks for reading this article and for your valuable time. If you liked the article, please clap, comment, and follow me on Medium to stay tuned for the next articles.


About the author: I am interested in learning new technologies, especially programming, AI, data science, and networking, and I love to explore new places.