Understanding Data Skewness and Its Effect on the Mean and Median

Saurabh Dorle
Published in Omni Data Science
Mar 1, 2023

Skewness is an essential measure of the shape of a probability distribution, which can significantly impact statistical analysis. It is a measure of the symmetry or asymmetry of a data set around its mean.

In this blog post, we will discuss what skewness is, how to calculate it, and its relation with mean and median.

What is Skewness?

Skewness measures how far a probability distribution departs from symmetry. If a distribution is symmetric (a normal distribution, for example), the mean and the median are the same, and the distribution has zero skewness. If a distribution is not symmetric, the mean and the median can differ, and the skewness is usually non-zero.

Types of Skewness:

There are two types of skewness:

Positive Skewness:

When the tail of the distribution is longer on the right side, the distribution is said to be positively skewed. Most observations are concentrated at the lower values, with a few extreme values stretching out to the right.

Negative Skewness:

When the tail of the distribution is longer on the left side, the distribution is said to be negatively skewed. Most observations are concentrated at the higher values, with a few extreme values stretching out to the left.

How to Calculate Skewness?

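There are various formulas to calculate skewness. One simple and commonly used formula is Pearson’s second coefficient of skewness (also called Pearson’s median skewness), which compares the mean with the median. Another common choice is the moment coefficient of skewness, which is based on the third standardized moment. The median-based formula is as follows:

Skewness = 3 * (mean - median) / standard deviation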

If the skewness value is zero, the distribution is symmetrical. If the skewness value is negative, the distribution is negatively skewed, and if the skewness value is positive, the distribution is positively skewed.
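
As a minimal sketch, the formula above can be computed with Python’s standard statistics module. The function name pearson_median_skewness and the three sample lists are hypothetical, and the sketch assumes the sample standard deviation (statistics.stdev); using the population standard deviation would change the magnitude of the result but not its sign.

```python
import statistics


def pearson_median_skewness(values):
    """Return 3 * (mean - median) / standard deviation for a list of numbers."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    std = statistics.stdev(values)  # assumption: sample standard deviation
    return 3 * (mean - median) / std


# Hypothetical data, chosen only to illustrate the sign of the result.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]    # long tail on the right
left_tailed = [-x for x in right_tailed]    # mirror image: long tail on the left
symmetric = [1, 2, 3, 4, 5]                 # mean equals median

print(pearson_median_skewness(right_tailed))  # positive value
print(pearson_median_skewness(left_tailed))   # negative value
print(pearson_median_skewness(symmetric))     # 0.0
```

The sign of the result is what matters for the interpretation given above: positive for a right tail, negative for a left tail, and zero when the mean and median coincide.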

Effect of Skewness on Mean and Median:

As mentioned earlier, skewness tells us about the asymmetry of the distribution. It also has an impact on the mean and the median.

If a distribution is symmetric, the mean and the median are the same. However, if a distribution is skewed, the mean and the median typically differ.

In a positively skewed distribution, the mean is usually greater than the median, and in a negatively skewed distribution, the mean is usually less than the median. This is because the mean uses the size of every value and is therefore pulled towards extreme values or outliers, whereas the median depends only on the middle position and is barely affected by them.

Let’s consider an example of positive skewness. Suppose we have a dataset of salaries of employees in a company, as shown below:

$30,000, $40,000, $50,000, $60,000, $70,000, $80,000, $90,000, $100,000, $1,000,000

In this example, the majority of salaries are clustered between $30,000 and $100,000, with only one extreme value of $1,000,000. This dataset is positively skewed since the tail of the distribution is longer on the right side.

If we calculate the mean and median of this dataset, we get:

Mean = ($30,000 + $40,000 + $50,000 + $60,000 + $70,000 + $80,000 + $90,000 + $100,000 + $1,000,000) / 9 = $1,520,000 / 9 ≈ $168,888.89

Median = $70,000

As we can see, the mean is much larger than the median because the extreme value of $1,000,000 pulls it towards the right. The median stays in the middle of the clustered salaries, so in this case it is a better measure of central tendency than the mean.
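
The salary arithmetic can be checked with a short Python sketch using the standard statistics module. The variable names are illustrative, and the step that drops the $1,000,000 salary is an extra comparison added here to show how strongly that single value drives the mean.

```python
import statistics

# Salaries from the example, in dollars.
salaries = [30_000, 40_000, 50_000, 60_000, 70_000,
            80_000, 90_000, 100_000, 1_000_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
print(round(mean_salary, 2))   # 168888.89
print(median_salary)           # 70000

# Same data without the single extreme salary (for comparison only).
without_outlier = salaries[:-1]
mean_without = statistics.mean(without_outlier)
median_without = statistics.median(without_outlier)
print(mean_without)            # 65000
print(median_without)          # 65000.0
```

Removing one value moves the mean from about $168,889 to $65,000, while the median only moves from $70,000 to $65,000.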

Now let’s consider an example of negative skewness. Suppose we have a dataset of test scores of students in a class, as shown below:

1, 60, 70, 80, 90, 100, 110, 120, 130

In this example, the majority of scores are clustered towards the higher end of the distribution, with one very low score of 1. This dataset is negatively skewed since the tail of the distribution is longer on the left side. If we calculate the mean and median of this dataset, we get:

Mean = (1 + 60 + 70 + 80 + 90 + 100 + 110 + 120 + 130) / 9 = 761 / 9 ≈ 84.56

Median = 90

As we can see, the mean is smaller than the median because the very low score of 1 pulls it towards the left. In this case, the median is a better measure of central tendency than the mean.
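
The same check can be sketched for the test scores. The variable names are illustrative; note that the list holds nine scores, so the sum is divided by nine.

```python
import statistics

# Test scores from the example.
scores = [1, 60, 70, 80, 90, 100, 110, 120, 130]

count = len(scores)                      # 9 values
mean_score = sum(scores) / count         # 761 / 9
median_score = statistics.median(scores)

print(count)                   # 9
print(round(mean_score, 2))    # 84.56
print(median_score)            # 90
print(mean_score < median_score)  # True: the mean sits to the left of the median
```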

Conclusion

In conclusion, data skewness is an important concept to understand when analyzing data. Skewed data can have a significant impact on the mean and median, with the mean being more influenced by extreme values and the median being more resistant to outliers.
