Skewness — How to measure it?

Jaya ndran
Jul 29, 2021


What is it, how do we measure it, and what is its significance?

To understand skewness, let's first look at what a normal distribution is.

Consider the following example of normally distributed data.

An example of a normal distribution (Bell Curve)

The bell curve above is also known as the normal distribution curve. Let us decode the symbols that can look like rocket science at first:
μ: mean of the data; (-1σ, 1σ): mean ± 1 standard deviation; (-2σ, 2σ): mean ± 2 standard deviations; (-3σ, 3σ): mean ± 3 standard deviations.

If that still seems unclear, let us derive the basic insights from the curve.

· The range (-1σ, 1σ) contains about 68% of the data.

· The range (-2σ, 2σ) contains about 95% of the data.

· The range (-3σ, 3σ) contains about 99.7% of the data.

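Statisticians call these percentages the empirical rule (or the 68-95-99.7 rule). It is often confused with Chebyshev's theorem, which gives weaker guarantees that hold for any distribution: at least 1 - 1/k² of the data lies within k standard deviations of the mean (for k > 1). Also, in a normal distribution the mean, median, and mode all coincide at the center of the bell curve.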
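To check the percentages above, here is a short sketch using only Python's standard library. It computes the exact share of a normal distribution lying within k standard deviations of the mean via the identity P(|Z| < k) = erf(k/√2), next to Chebyshev's lower bound 1 - 1/k², which holds for any distribution. The function names are illustrative, not from any library.

```python
import math

def fraction_within(k: float) -> float:
    """Exact share of a normal distribution within k standard deviations of the mean."""
    # For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

def chebyshev_lower_bound(k: float) -> float:
    """Chebyshev's minimum share within k standard deviations, valid for ANY distribution.

    The bound is only informative for k > 1 (at k = 1 it is 0).
    """
    return 1 - 1 / k**2

for k in (1, 2, 3):
    print(f"within +/-{k} sd: normal = {fraction_within(k):.4f}, "
          f"Chebyshev guarantees >= {chebyshev_lower_bound(k):.4f}")
```

This prints 0.6827, 0.9545 and 0.9973 for the normal curve against Chebyshev's 0.0000, 0.7500 and 0.8889, which shows how much stronger the normal-specific rule is.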

A perfectly normal distribution is hard to find in real-world scenarios: real datasets usually show some asymmetry in their distribution. This asymmetry is what we refer to as skewness.

Graph of left-skewed and right-skewed data

When the distribution has a long, low-frequency tail on the left-hand side of the graph, it is called negatively (left) skewed; when the long, low-frequency tail is on the right-hand side, it is called positively (right) skewed.
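A tiny illustration with made-up numbers (the data is hypothetical): in a right-skewed sample the long right tail pulls the mean above the median, and mirroring the data moves the tail to the left and flips that relationship.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, one large value forms a long right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 15]
# Negating every value mirrors the data, moving the long tail to the left.
left_skewed = [-x for x in right_skewed]

for name, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    relation = "above" if mean > median else "below"
    print(f"{name}: mean {mean:.3f} is {relation} median {median}")
```

It prints a mean of 4.222 above a median of 3 for the right-skewed list, and -4.222 below -3 for its mirror.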

Building a statistical or predictive model on skewed data can bias it toward the region where most of the data lies, which can hurt the quality of the model's output. Occasionally, skewness values are also used to obtain approximate probabilities of the distribution.

Measuring the skewness

There are a number of ways to measure the skewness of the data. Common methods include:

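1. Karl Pearson's median skewness (also called Pearson's second skewness coefficient):

Skewness = 3 (Mean-Median)/Standard Deviation

Note that this is not the formula used by Python's scipy library: scipy.stats.skew computes the moment-based coefficient described in method 4 below.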
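A minimal sketch of the median-based formula using only Python's built-in statistics module. The function name and sample data are hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption, since the formula does not say which standard deviation is meant.

```python
import statistics

def pearson_median_skewness(data):
    """Pearson's median skewness: 3 * (mean - median) / standard deviation.

    Assumption: the sample standard deviation (n - 1 in the denominator) is used.
    """
    sd = statistics.stdev(data)  # needs at least two values
    if sd == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return 3 * (statistics.mean(data) - statistics.median(data)) / sd

sample = [1, 2, 2, 3, 3, 3, 4, 5, 15]  # hypothetical right-skewed data
print(round(pearson_median_skewness(sample), 3))
```

The result is positive for this right-skewed sample; negating every value flips its sign, and a symmetric list such as 1 to 9 gives 0.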

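2. Karl Pearson's mode skewness (also called Pearson's first skewness coefficient):

Skewness = (Mean-Mode)/Standard Deviation

The factor of 3 belongs only to the median version: for moderately skewed distributions, Mean-Mode is roughly 3 (Mean-Median), which is how the two formulas are related.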
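A sketch of the mode-based formula, again with the standard library only. The function name and data are hypothetical and the sample standard deviation is an assumption. Because a dataset can have several equally common values, this version uses statistics.multimode and refuses to answer when the mode is ambiguous.

```python
import statistics

def pearson_mode_skewness(data):
    """Pearson's mode skewness: (mean - mode) / standard deviation.

    Assumption: the sample standard deviation is used. Raises ValueError when
    the data has more than one mode, because the formula then has no single answer.
    """
    modes = statistics.multimode(data)  # every value tied for most common
    if len(modes) != 1:
        raise ValueError(f"expected exactly one mode, found {len(modes)}")
    sd = statistics.stdev(data)
    if sd == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return (statistics.mean(data) - modes[0]) / sd

sample = [1, 2, 2, 3, 3, 3, 4, 5, 15]  # hypothetical right-skewed data, single mode 3
print(round(pearson_mode_skewness(sample), 3))
```

Raising an error for multimodal data is a design choice; silently picking one of the tied modes would make the result depend on the order of the data.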

3. Bowley's coefficient:

Skewness = [(Q3+Q1)-2(Median)]/(Q3-Q1)
where Q1 (the first quartile) is the value below which 25% of the data lies and Q3 (the third quartile) is the value below which 75% of the data lies.
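A sketch of Bowley's coefficient, assuming quartiles from Python's statistics.quantiles with its default "exclusive" method. Other tools use other quartile conventions, so their values can differ slightly. The function name and data are hypothetical.

```python
import statistics

def bowley_skewness(data):
    """Bowley's coefficient: [(Q3 + Q1) - 2 * Median] / (Q3 - Q1).

    Assumption: quartiles come from statistics.quantiles(data, n=4),
    whose middle cut point equals the median.
    """
    q1, median, q3 = statistics.quantiles(data, n=4)
    if q3 == q1:
        raise ValueError("Q3 equals Q1; Bowley's coefficient is undefined")
    return ((q3 + q1) - 2 * median) / (q3 - q1)

sample = [1, 2, 2, 3, 3, 3, 4, 5, 15]  # hypothetical right-skewed data
print(round(bowley_skewness(sample), 3))
```

For this sample Q1 = 2, the median is 3 and Q3 = 4.5, so it prints 0.2. Since the median always lies between Q1 and Q3, the coefficient stays between -1 and 1.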

4. Moment-based coefficient:
The formula below is the one used by the MS-Excel SKEW function.

Skewness = [n / ((n-1)(n-2))] × Σ((xi - Mean)/s)^3

where n = number of observations, xi = the i-th observation, and s = sample standard deviation.
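A stdlib-only sketch of two moment-based variants, assuming the textbook definitions. `moment_skewness` is the biased Fisher-Pearson coefficient g1 = m3 / m2^1.5, which is what scipy.stats.skew returns with its default bias=True. `excel_skew` implements n / ((n-1)(n-2)) × Σ((xi - Mean)/s)^3, the formula behind Excel's SKEW, which should match scipy.stats.skew(data, bias=False). Both function names are hypothetical.

```python
import math
import statistics

def moment_skewness(data):
    """Biased Fisher-Pearson coefficient g1 = m3 / m2**1.5 (central moments divided by n)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    if m2 == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return m3 / m2 ** 1.5

def excel_skew(data):
    """n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3), s = sample standard deviation."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least three values")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    if s == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

sample = [1, 2, 2, 3, 3, 3, 4, 5, 15]  # hypothetical right-skewed data
n = len(sample)
g1 = moment_skewness(sample)
print(round(g1, 3), round(excel_skew(sample), 3))
# The two differ only by a sample-size correction factor sqrt(n(n-1)) / (n-2):
print(math.isclose(excel_skew(sample), g1 * math.sqrt(n * (n - 1)) / (n - 2)))
```

The last line prints True. The correction factor shrinks toward 1 as n grows, so the two variants mostly disagree on small samples.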

As there are different methods to calculate skewness, the question of which one to use arises.


The short answer is: it depends on the dataset we use to build our models.

For example, some datasets have two or more modes (or no clear mode at all), which makes Pearson's mode-based formula unreliable. Bowley's coefficient looks only at the middle 50% of the data (between Q1 and Q3), so it can miss skewness that sits in the tails. And in some cases the skewness is so close to 0 that there is little point in trying hard to correct it.

The art of analysis therefore lies in selecting an appropriate method and then checking, through further analysis of the data, whether the chosen method really fits.

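The range of possible values depends on the method: Pearson's median skewness always lies between -3 and 3, Bowley's coefficient lies between -1 and 1, and the moment-based coefficient has no fixed bound. Whichever method is used, the sign is read the same way: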

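skewness = 0 : symmetric, as in a normal distribution (zero skewness alone does not prove the data is normal).
skewness > 0 : positively (right) skewed; the right tail is longer and most of the data sits on the left.
skewness < 0 : negatively (left) skewed; the left tail is longer and most of the data sits on the right.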
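The sign rules above can be written as a small helper. The `tolerance` parameter is a hypothetical knob for the case where skewness is so close to 0 that it is not worth correcting; there is no universal cut-off, so it has to be chosen per dataset.

```python
def describe_skewness(value, tolerance=0.0):
    """Translate a skewness value into the sign-based interpretation.

    `tolerance` is a hypothetical cut-off: values within +/- tolerance are
    reported as approximately symmetric.
    """
    if abs(value) <= tolerance:
        return "approximately symmetric"
    if value > 0:
        return "positively (right) skewed: longer right tail"
    return "negatively (left) skewed: longer left tail"

for v in (0.0, 0.05, 1.2, -0.8):
    print(v, "->", describe_skewness(v, tolerance=0.1))
```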
