Covariance and Correlation Math and Python Code

Santhosh J
Published in Analytics Vidhya
3 min read · Aug 17, 2020

Statistical concepts are easy to confuse when you are first learning them. In this article, let us look at covariance and correlation in detail.

The first useful thing you can learn from data is how two variables are associated with each other. In layman's terms: how is one variable related to another?

Covariance and correlation are two widely used terms in statistics and probability theory. Both measure the relationship and the dependency between two variables.

Covariance

Covariance indicates the direction of the linear relationship between two variables. Covariance values are not standardized: they can be any real number, and they depend on the units of the variables.

Covariance is given by the formula:

Covariance formula for population and sample
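For reference, the standard textbook definitions can be written as below. The symbol names are assumed notation: N and n are the population and sample sizes, \mu_X and \mu_Y are the population means, and \bar{x} and \bar{y} are the sample means.

```latex
% Population covariance
\operatorname{Cov}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y)

% Sample covariance (note the n - 1 in the denominator)
\operatorname{Cov}(x, y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```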

Correlation

Correlation measures both the strength and direction of the linear relationship between two variables. Correlation values are standardized.

Correlation is given by the formula:

Correlation coefficient formula
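In the usual notation, the sample (Pearson) correlation coefficient and the sample standard deviations it relies on can be sketched as follows. The symbols S_x and S_y follow the names used later in this article; the rest is assumed standard notation.

```latex
% Sample correlation coefficient: covariance scaled by both standard deviations
r = \frac{\operatorname{Cov}(x, y)}{S_x \, S_y}

% Sample standard deviations
S_x = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
S_y = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (y_i - \bar{y})^2}
```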

We can quantify the strength of the relationship with correlation. A weak correlation has a coefficient close to 0, whereas a strong correlation has a coefficient close to -1 or +1. You can obtain the correlation coefficient of two variables by dividing their covariance by the product of their standard deviations. Dividing by the standard deviations scales the value into the fixed range of -1 to +1, so correlation describes the relationship itself and is not sensitive to the scale of the data.
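To see the scale point in practice, here is a minimal sketch. The data and the helper names `sample_cov` and `pearson_r` are made up for illustration. Multiplying X by 100 (say, converting metres to centimetres) multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import math
from statistics import mean


def sample_cov(x, y):
    """Sample covariance: sum of deviation products divided by n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_r(x, y):
    """Covariance divided by the product of the sample standard deviations."""
    # sample_cov(v, v) is the sample variance of v
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))


# Hypothetical data, for illustration only
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]
x_scaled = [v * 100 for v in x]  # same variable in a different unit

cov_original = sample_cov(x, y)
cov_scaled = sample_cov(x_scaled, y)
r_original = pearson_r(x, y)
r_scaled = pearson_r(x_scaled, y)

print(f"{cov_original:.2f} {cov_scaled:.2f}")  # 4.00 400.00
print(f"{r_original:.2f} {r_scaled:.2f}")      # 0.80 0.80
```

The covariance changes with the unit of measurement, while the correlation coefficient stays the same.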

Now let us look at a few differences between covariance and correlation:

Difference between Covariance and Correlation

Now let us calculate and understand with an example, so it stays in our mind.

Let us take a simple example with two variables, X and Y.

Substituting into the covariance equation, we get:

The sample covariance is 8.

Let us verify this in Python:

Our calculation and the Python calculation match.

Now let us calculate the correlation.

Substituting into the sample standard deviation formula, we get Sx and Sy as 1.58 and 5.21.

Substituting into the correlation equation, we get:

Correlation coefficient (r) = 0.9701

Let us verify this in Python:

The manual calculation and the result of Python's function match.

Conclusion

We looked at covariance and correlation separately, worked through an example by hand, and then compared the results with Python code.

We got a covariance of 8. The positive sign tells us that X and Y move in the same direction, but the magnitude by itself says little, because covariance is unbounded and depends on the scale of the data. The correlation coefficient came out as 0.97, which clearly indicates a strong positive correlation between X and Y.
