Covariance vs Correlation

Sarika Srivastava
Analytics Vidhya
Apr 14, 2021


This article explains covariance and correlation: what they are, how they are related, and how they differ from each other.

Covariance and correlation are two statistical measures that tell us about the relationship between two random variables.

Covariance

Covariance tells us how two random variables vary together, that is, how they change in relation to each other. Its sign indicates whether there is a linear relationship between the two variables and in which direction that relationship runs.

Covariance can distinguish three types of relationship between two variables:

  1. Positive trend: The two variables tend to move in the same direction. When the value of one variable increases, the other tends to increase as well. The value of covariance is positive in this case.
  2. Negative trend: The two variables tend to move in opposite directions. When the value of one variable increases, the other tends to decrease. The value of covariance is negative in this case.
  3. No trend: There is no linear relationship between the two variables. The value of covariance is close to 0 in this case.
Three types of relationships
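
The three cases can be reproduced with a small simulation. This is only a sketch on synthetic data: the variable names, the slopes of 2 and -2, and the helper `sample_cov` are assumptions made for illustration and do not come from the article's dataset.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]

y_pos = [2 * xi + e for xi, e in zip(x, noise)]    # tends to rise with x
y_neg = [-2 * xi + e for xi, e in zip(x, noise)]   # tends to fall as x rises
y_none = noise                                     # generated independently of x

def sample_cov(a, b):
    """Sample covariance (n - 1 denominator) of two equal-length lists."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)

cov_pos = sample_cov(x, y_pos)
cov_neg = sample_cov(x, y_neg)
cov_none = sample_cov(x, y_none)
print(cov_pos, cov_neg, cov_none)
```

The first value comes out clearly positive, the second clearly negative, and the third close to zero, matching the three trends listed above.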

The formula to calculate Covariance is given as

Covariance formula
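
The formula image is not reproduced in this text. A common way to write the sample covariance is sketched below. The n - 1 denominator is an assumption (it matches the default used by pandas `cov()`); the population version divides by n instead.

```latex
\operatorname{Cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)
```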

Where,

X and Y are two variables,

X bar and Y bar are the means of X and Y,

n is the total number of observations (data points).
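
To make these terms concrete, here is a minimal sketch that computes covariance by hand from the deviations around the two means. The numbers are hypothetical, and because the denominator convention (n or n - 1) is an assumption here, both variants are shown.

```python
# Hypothetical values, chosen only to keep the arithmetic easy to follow.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n   # mean of X
y_bar = sum(y) / n   # mean of Y

# Sum of the products of deviations from the means.
sum_of_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

cov_population = sum_of_products / n        # divides by n
cov_sample = sum_of_products / (n - 1)      # divides by n - 1

print(cov_population, cov_sample)
```

With these numbers the sum of products is 6, so the two variants give 1.2 and 1.5. The sign and the interpretation are the same either way; only the magnitude differs slightly.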

Problem with interpretation

As the formula above shows, covariance carries units: its unit is the product of the units of the two variables. This is why covariance values are sensitive to the scale of the data, which makes them difficult to interpret. The value of the covariance changes if we change the scale or unit of either variable. Let's see an example:

house_data.head()

house_data

In the above data, we have the size of the house in ft and in cm, and the target variable is price. Let's see the covariance between size and price on both scales, ft and cm:

house_data.cov()

covariance

We can tell from the positive values that the relationship between size and price is positive. However, as the scale of the size feature changes, the value of the covariance changes too, so it tells us nothing about the strength of the relationship.
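
The scale effect can be checked on a toy table. The sketch below uses made-up numbers, not the article's `house_data`; the frame name `demo` and its column names are assumptions chosen to mirror the example.

```python
import pandas as pd

# Hypothetical data, NOT the article's house_data.
demo = pd.DataFrame({
    "size_ft": [50.0, 60.0, 75.0, 90.0, 120.0],
    "price": [100.0, 130.0, 150.0, 200.0, 260.0],
})
demo["size_cm"] = demo["size_ft"] * 30.48  # 1 ft = 30.48 cm

cov_matrix = demo.cov()  # pairwise sample covariance
cov_ft = cov_matrix.loc["size_ft", "price"]
cov_cm = cov_matrix.loc["size_cm", "price"]

print(cov_ft, cov_cm, cov_cm / cov_ft)
```

The covariance with price grows by exactly the conversion factor 30.48 when size is expressed in cm, even though the underlying relationship has not changed.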

The value of covariance can range from -∞ to +∞. This problem can be fixed by standardizing the covariance so that it stays the same whatever the unit or scale of the variables. To do so, we divide the covariance by the product of the standard deviations of the two variables. The result is called the correlation coefficient.

Correlation formula
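
The formula image is not reproduced in this text. Following the description above, the (Pearson) correlation coefficient can be written as below; the symbols r, s_X and s_Y (the standard deviations of X and Y) are notation assumed here for illustration.

```latex
r_{XY} = \frac{\operatorname{Cov}(X, Y)}{s_X \, s_Y}
```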

Correlation

Correlation gives us the direction as well as the strength of the linear relationship between two random variables. It measures the extent to which the two variables are linearly related. Correlation is dimensionless, i.e. it is not measured in any unit but is a pure number. The value of the correlation coefficient ranges from -1 to 1.

A high absolute value indicates a strong relationship, and the sign of the value gives the direction of the relationship.

In the above figure we can see:

-1: a perfect negative linear relationship between the two variables, i.e. they move in opposite directions.

0: no linear relationship between the two variables.

+1: a perfect positive linear relationship between the two variables, i.e. they move in the same direction.
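
These three reference values can be reproduced with a few hand-picked points. The sketch below computes correlation as covariance divided by the product of the standard deviations; the helper names and the data are hypothetical.

```python
import math

def sample_cov(a, b):
    """Sample covariance (n - 1 denominator) of two equal-length lists."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)

def correlation(a, b):
    # The variance of a series is its covariance with itself, so the
    # denominator is the product of the two standard deviations.
    return sample_cov(a, b) / math.sqrt(sample_cov(a, a) * sample_cov(b, b))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [3 * xi + 2 for xi in x]      # exact increasing line
y_down = [-3 * xi + 2 for xi in x]   # exact decreasing line
y_flat = [2.0, 1.0, 4.0, 1.0, 2.0]   # symmetric pattern with no linear trend

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
r_flat = correlation(x, y_flat)
print(r_up, r_down, r_flat)
```

Up to floating-point rounding, the three results are +1, -1 and 0. Real data almost never hits these extremes exactly; values in between indicate weaker or stronger linear association.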

Now, let's check the correlation between size and price.

house_data.corr()

Correlation coefficient

The value of correlation is 0.87, which shows:

1. A strong, positive relationship between size and price.

2. Changing the scale of the variable size doesn't change the value of the correlation coefficient, which shows that it is dimensionless.
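
The second point can be verified on a toy table. As before, the numbers below are made up and are not the article's `house_data`, so the correlation they produce differs from 0.87; the frame and column names are assumptions.

```python
import pandas as pd

# Hypothetical data, NOT the article's house_data.
demo = pd.DataFrame({
    "size_ft": [50.0, 60.0, 75.0, 90.0, 120.0],
    "price": [100.0, 130.0, 150.0, 200.0, 260.0],
})
demo["size_cm"] = demo["size_ft"] * 30.48  # same sizes, different unit

corr_matrix = demo.corr()  # Pearson correlation is the default method
r_ft = corr_matrix.loc["size_ft", "price"]
r_cm = corr_matrix.loc["size_cm", "price"]

print(r_ft, r_cm)
```

Both entries are the same number: rescaling size multiplies the covariance and the standard deviation of size by the same factor, so the ratio is unchanged.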

Conclusion

Covariance and correlation are very closely related. Covariance is an unscaled measure of how two variables move together, and correlation is a scaled form of covariance. When it comes to choosing between them, correlation is usually the first choice, as it is not affected by a change in scale or units. The value of correlation ranges from -1 to 1, so it's easier to interpret. An important limitation of both covariance and correlation is that they measure only linear relationships.
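
The linear-only limitation is easy to see with a small example. In the sketch below, y is completely determined by x, yet the correlation is zero because the dependence is not linear. The helper `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation of two equal-length lists."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [xi ** 2 for xi in x]  # y depends on x exactly, but as a parabola

r = pearson_r(x, y)
print(r)
```

A correlation near zero therefore does not rule out a relationship; plotting the data is a useful complement to either measure.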

Thank You
