Covariance Vs Pearson Correlation Coefficient Vs Spearman’s Rank Correlation Coefficient

Sjatin
Published in Analytics Vidhya
Feb 19, 2021 · 4 min read

Covariance :

Covariance is one of the most commonly used concepts in data analysis and data preprocessing. It quantifies the relationship between features in a dataset. In simple words, it tells us how two columns of a dataset vary together. With more than two columns, the covariances of every pair of columns are collected into a covariance matrix.

Photo taken from Google Cloud

The math behind covariance is closely related to that of variance. Covariance is defined for two variables, and we write it as Cov(x, y). For a sample of n pairs, Cov(x, y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1), while Var(x) = Σ (xᵢ − x̄)² / (n − 1). So the covariance of a variable with itself, Cov(x, x), is simply its variance, Var(x). This is how variance and covariance are related to each other.
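As a minimal sketch, assuming NumPy and a few made-up sample values, the sample covariance formula can be computed directly and compared with `np.cov`. The same function applied to (x, x) gives the variance, and with several columns `np.cov` returns the full covariance matrix, whose diagonal holds each column's variance. The helper name `sample_covariance` is illustrative only.

```python
import numpy as np

def sample_covariance(x, y):
    """Sample covariance: sum of (x_i - mean_x) * (y_i - mean_y), divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Illustrative data (made up for this example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

print(sample_covariance(x, y))  # 1.5
print(np.cov(x, y)[0, 1])       # the same value from NumPy's covariance matrix

# Cov(x, x) is the variance of x (ddof=1 matches the n - 1 denominator)
print(sample_covariance(x, x), np.var(x, ddof=1))  # 2.5 2.5

# With more than two columns, np.cov returns all pairwise covariances;
# rowvar=False means each column is a variable. The diagonal holds the variances.
data = np.column_stack([x, y, 10 - x])
print(np.cov(data, rowvar=False))
```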

However, covariance only tells us whether one column tends to increase (positive covariance) or decrease (negative covariance) as the other increases. It fails to tell us how strong that relationship is, because its magnitude depends on the units and scale of the data.
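A small sketch of why the size of the covariance can't be read as a strength, assuming NumPy and made-up height/weight values: converting the same heights from metres to centimetres multiplies the covariance by 100, even though the relationship itself has not changed.

```python
import numpy as np

# Illustrative (made-up) data: heights in metres and weights in kg
height_m = np.array([1.50, 1.60, 1.70, 1.80, 1.90])
weight_kg = np.array([50.0, 56.0, 65.0, 72.0, 80.0])

cov_m = np.cov(height_m, weight_kg)[0, 1]
cov_cm = np.cov(height_m * 100, weight_kg)[0, 1]  # same heights, expressed in centimetres

print(cov_m > 0)       # True: the sign gives a positive direction
print(cov_cm / cov_m)  # about 100: only the unit changed, yet the covariance grew 100x
```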

Pearson Correlation Coefficient :

To measure the strength of the relationship, we use the Pearson correlation coefficient. It tells us not only whether the relationship between two variables is positive or negative, but also how strong it is. It does this by dividing the covariance by the product of the two variables' standard deviations, ρ(x, y) = Cov(x, y) / (σx σy), which removes the dependence on units.
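A minimal sketch of that definition, assuming NumPy and made-up height/weight values (the helper `pearson` is hypothetical, written for illustration): the covariance is divided by both standard deviations, the result matches NumPy's `np.corrcoef`, and, unlike covariance, it does not change when the heights are converted from metres to centimetres.

```python
import numpy as np

def pearson(x, y):
    """Pearson's r: the covariance divided by the product of the two standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.cov(x, y)[0, 1]  # sample covariance (n - 1 denominator)
    # Use the same n - 1 convention for the standard deviations
    return cov / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Illustrative (made-up) data: heights in metres and weights in kg
height_m = np.array([1.50, 1.60, 1.70, 1.80, 1.90])
weight_kg = np.array([50.0, 56.0, 65.0, 72.0, 80.0])

r_m = pearson(height_m, weight_kg)
r_cm = pearson(height_m * 100, weight_kg)  # heights converted to centimetres

print(r_m)                    # close to 1: a strong positive relationship
print(np.isclose(r_m, r_cm))  # True: the unit change does not affect r
print(np.isclose(r_m, np.corrcoef(height_m, weight_kg)[0, 1]))  # True: matches NumPy's built-in
```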

Pearson's coefficient measures linear relationships. If the relationship between the variables is linear, the result is accurate; if the relationship is nonlinear, it can understate how strongly the variables are related.

Nonlinear data can sometimes be made linear with transformations such as taking the logarithm. But if the dataset is very large and contains many outliers, finding a suitable transformation can be a real pain, and there is a good chance of losing information in the conversion.
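To illustrate the log transformation idea, here is a sketch with made-up, noise-free data where y grows exponentially with x (real data is rarely this clean). Pearson's coefficient on the raw values falls well short of 1, but after taking log(y) the relationship is exactly linear and the coefficient becomes 1.

```python
import numpy as np

# Made-up, noise-free example: y grows exponentially with x (a nonlinear relationship)
x = np.arange(1, 11, dtype=float)
y = np.exp(x)

r_raw = np.corrcoef(x, y)[0, 1]          # Pearson on the raw values
r_log = np.corrcoef(x, np.log(y))[0, 1]  # Pearson after a log transform

print(r_raw)  # well below 1 (roughly 0.72)
print(r_log)  # 1.0 up to rounding: log(y) = x is exactly linear
```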

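The Pearson correlation coefficient always lies in the range -1 ≤ ρ ≤ 1: ρ = 1 indicates a perfect positive linear relationship, ρ = -1 a perfect negative one, and ρ = 0 no linear relationship.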
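A quick sketch with NumPy on made-up points shows the two ends of that range. Any exact straight line gives ρ = 1 or ρ = -1 regardless of how steep the slope is, because ρ measures how well the points fit a line, not the slope of the line.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r_steep = np.corrcoef(x, 2 * x + 3)[0, 1]      # increasing line
r_shallow = np.corrcoef(x, 0.01 * x)[0, 1]     # almost flat, but still a perfect line
r_negative = np.corrcoef(x, -5 * x + 1)[0, 1]  # decreasing line

print(r_steep, r_shallow, r_negative)  # 1.0, 1.0 and -1.0 up to rounding
```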

Picture taken from Wikipedia

From the picture above, it is evident that when the data follows a linear pattern, ρ is nonzero, and its sign shows the direction of the relationship. If the data is nonlinear, like in the last figure, ρ can be 0, which means there is no linear relationship between the x and y variables. But a ρ of 0 does not, without other evidence, show that x and y are unrelated: they may still be strongly related in a nonlinear way.
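A tiny made-up example makes this concrete. Below, y is completely determined by x (y = x²), yet because the points are symmetric around x = 0, the increasing and decreasing halves cancel out and Pearson's ρ is exactly 0.

```python
import numpy as np

x = np.arange(-3, 4, dtype=float)  # -3, -2, ..., 3 (symmetric around 0)
y = x ** 2                         # y depends perfectly on x, but not linearly

r = np.corrcoef(x, y)[0, 1]
print(r)  # 0.0: no linear relationship, even though y is a function of x
```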

Spearman’s Rank Correlation Coefficient :

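To capture relationships in nonlinear data, Spearman's rank correlation coefficient is used. It is similar to the Pearson correlation coefficient with a small variation, and it works well when the relationship is monotonic, that is, when one variable consistently increases (or consistently decreases) as the other increases.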

Instead of using the exact values from the columns, we first sort the values of each column in ascending order and assign them ranks (1 for the smallest, 2 for the next, and so on). These ranks are then used in the calculation, which is otherwise the same as Pearson's. It looks like this change shouldn't have much impact, but because ranks keep only the order of the values, the result measures how consistently one variable increases or decreases with the other. This lets us find the relationship between variables accurately even when the data is nonlinear, as long as it is monotonic.
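The ranking step can be sketched in a few lines of NumPy. This is an illustrative implementation, not a library API: `average_ranks` and `spearman` are hypothetical helper names, and tied values receive the average of the ranks they span, which is the usual convention. On made-up data where y = x³ (monotonic but not linear), Pearson on the raw values stays below 1 while Pearson on the ranks, i.e. Spearman, is 1.

```python
import numpy as np

def average_ranks(values):
    """Rank values in ascending order (1 = smallest); tied values share the average of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")  # indices that sort the values
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        # Extend j to the end of a run of tied values
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        # Sorted positions i..j hold ranks i+1..j+1; give each of them the average
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on ranks instead of raw values."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]

# Made-up monotonic but nonlinear data
x = np.arange(1, 11, dtype=float)
y = x ** 3

print(average_ranks([10, 20, 20, 30]))  # ranks 1, 2.5, 2.5, 4 (the tie shares ranks 2 and 3)
print(np.corrcoef(x, y)[0, 1])          # Pearson: below 1, since the curve is not a straight line
print(spearman(x, y))                   # Spearman: 1.0 up to rounding, since y always increases with x
```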

The picture below illustrates Spearman correlation.

Picture taken from Wikipedia

In the first picture, the data is nonlinear, but it is evident that the relationship is positive: y increases whenever x increases. So the Spearman correlation is 1, while the Pearson correlation is close to 1 but not exactly 1. Hence Pearson correlation understates the strength of a relationship that is nonlinear.

In the second picture, the data is linear, but there are some outliers, and these outliers affect the result. The effect is much stronger on Pearson correlation, whose value is 0.67, than on Spearman's correlation, whose value is 0.84, because ranks limit how much influence a single extreme value can have.

This is how Spearman's correlation can be more helpful than Pearson correlation, especially for nonlinear data or data with outliers.
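The outlier effect described above can be reproduced with a made-up example in NumPy. We start from a perfect straight line and replace the last point with an extreme value. That point is still the largest, so the ranks do not change and Spearman stays at 1, while Pearson drops sharply. The simple `ranks` helper here is hypothetical and assumes there are no tied values.

```python
import numpy as np

def ranks(values):
    """Ascending ranks starting at 1; this simple version assumes there are no tied values."""
    return np.argsort(np.argsort(values)) + 1

# Made-up data: a perfect straight line, then one extreme outlier in the last point
x = np.arange(1, 11, dtype=float)
y = x.copy()
y[-1] = 100.0  # still the largest value, so its rank is unchanged

pearson = np.corrcoef(x, y)[0, 1]
spearman = np.corrcoef(ranks(x), ranks(y))[0, 1]

print(pearson)   # pulled well below 1 (roughly 0.59) by a single outlier
print(spearman)  # 1.0 up to rounding: the order of the points is unchanged
```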

Conclusion :

After seeing all three methods, Spearman's correlation is often the more helpful choice. But computing both is informative: if Spearman's value is greater than Pearson's value, that suggests the correlation is monotonic but not linear. I hope this article helps you understand the difference between these correlations more easily.
