Covariance Vs. Correlation

Nidhifab
Aug 8, 2020


This is a quick and easy comparison between covariance and correlation. Both terms come up often in data preprocessing, and both measure the linear relationship between two variables.

Let us consider two random variables: the size of a house and the price of a house.

x = size of house

y = price of house

Taking a few values of size and price, we get the dataset below:

Size (x)     1200     1500     1800
Price (y)    $1000    $2000    $3000

Here we can clearly see that as the size of the house increases, its price increases too. We want to quantify this relationship between size and price. In other words, we want to measure how the two variables x and y vary together. This is what covariance and correlation do.
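For reference, a standard way to write the sample covariance of n paired observations (x_i, y_i) is shown below. Here μ_x and μ_y are the means of x and y. Some texts divide by n instead of n - 1, which gives the population covariance. The sign, and therefore the direction, is the same either way. The second line works the formula through for the three houses above.

```latex
\operatorname{cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)

\mu_x = 1500,\quad \mu_y = 2000,\qquad
\operatorname{cov}(x, y) = \frac{(-300)(-1000) + (0)(0) + (300)(1000)}{3 - 1} = 300\,000
```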

Covariance can be positive or negative.

If we plot price against size:

1) If price increases as size increases, then x above its mean (μ_x) tends to go with y above its mean (μ_y), and x below its mean tends to go with y below its mean. Either way, (x - μ_x) and (y - μ_y) have the same sign, so their product is positive. By the covariance formula, the covariance is then positive.

2) If price decreases as size increases, then x above its mean tends to go with y below its mean, and vice versa. The differences (x - μ_x) and (y - μ_y) then have opposite signs, their product is negative, and the covariance is negative.
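The two cases above can be checked numerically. The sketch below implements the sample covariance from scratch. Dividing by n - 1 is an assumption here; dividing by n gives the same sign. It applies the function to the house data and to a hypothetical reversed price list in which price falls as size grows.

```python
def covariance(x, y):
    """Sample covariance: sum of (x_i - mean_x) * (y_i - mean_y), divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


size = [1200, 1500, 1800]           # x: size of house
price = [1000, 2000, 3000]          # y: price in dollars, rises with size
price_falling = [3000, 2000, 1000]  # hypothetical: price falls as size grows

print(covariance(size, price))          # 300000.0  (positive)
print(covariance(size, price_falling))  # -300000.0 (negative)
```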

The sign of the covariance therefore tells us the direction of the relationship. However, the magnitude of the covariance tells us little about the strength of the relationship, i.e., how strongly positive or negative it is, because its value depends on the units of the variables. For this purpose we use another measure: correlation.

Correlation tells us both the direction and the strength of the relationship between two random variables.

In a nutshell, the differences between covariance and correlation are:

Covariance:

1) Covariance is a measure of how much two random variables vary together.

2) Lies between -infinity and +infinity.

3) Provides the direction of the relationship.

4) Depends on the scale (units) of the variables.

Correlation:

1) Correlation is a statistical measure that indicates how strongly two variables are linearly related.

2) Lies between -1 and +1.

3) Provides both the direction and the strength of the relationship.

4) Independent of the scale (units) of the variables.

This post should be helpful to beginners and to anyone who has just started learning statistics. I will update it with more information.

Do share it if you find it helpful.
