Covariance and Correlation Math and Python Code
Statistical concepts are easy to confuse when you are first learning them. Let us look at covariance and correlation in detail in this article.
The first useful thing you can learn from data is how two variables are associated with each other. In layman's terms: how is one variable related to another?
Covariance and correlation are two widely used terms in statistics and probability theory. Both measure the relationship and dependency between two variables.
Covariance
Covariance indicates the direction of the linear relationship between variables. Covariance values are not standardized (they can be any number).
Covariance is given by the formula:
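For reference, a common textbook way to write the sample covariance of n paired observations (x_i, y_i) is sketched below. Treat the n - 1 denominator as an assumption about the intended form; the population version divides by n instead.

```latex
\operatorname{cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

Here \bar{x} and \bar{y} are the means of the two variables. A positive sum means the variables tend to be above (or below) their means together; a negative sum means they tend to move in opposite directions.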
Correlation
Correlation measures both the strength and direction of the linear relationship between two variables. Correlation values are standardized.
Correlation is given by the formula:
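For reference, the Pearson correlation coefficient is usually written as the covariance divided by the product of the two standard deviations. The symbols r_{XY}, s_X and s_Y are notation assumed here for illustration, with the sample (n - 1) convention.

```latex
r_{XY} = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y},
\qquad
s_X = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
```

s_Y is defined the same way for Y. Because the same denominator convention appears in the covariance and in both standard deviations, it cancels, so r_{XY} is the same whether n or n - 1 is used throughout.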
We can quantify the strength of the relationship with correlation. A weak correlation has a coefficient close to 0, whereas a strong correlation has a coefficient close to -1 or +1. You obtain the correlation coefficient of two variables by dividing their covariance by the product of their standard deviations. Dividing by the standard deviations scales the value to the fixed range of -1 to +1, so correlation describes the relationship without being sensitive to the scale of the data.
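The scale argument can be sketched in a few lines of NumPy. The arrays `x` and `y` and the helper `cov_and_corr` are hypothetical, made up only for illustration and not taken from the example in this article. Rescaling one variable changes the covariance but leaves the correlation as it was.

```python
import numpy as np

# Hypothetical placeholder data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def cov_and_corr(a, b):
    """Return (sample covariance, Pearson correlation) of two 1-D arrays."""
    cov = np.cov(a, b)[0, 1]  # off-diagonal entry of the 2x2 matrix, n - 1 denominator
    corr = cov / (np.std(a, ddof=1) * np.std(b, ddof=1))
    return cov, corr

cov_xy, corr_xy = cov_and_corr(x, y)

# Express x in different units, e.g. metres converted to centimetres.
cov_scaled, corr_scaled = cov_and_corr(100 * x, y)

print(cov_xy, cov_scaled)    # covariance grows by the factor 100
print(corr_xy, corr_scaled)  # correlation is unchanged
```

This is why covariance values are hard to compare across datasets, while correlation coefficients can be compared directly.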
Now let us see a few differences between covariance and correlation:
Now let us calculate both with an example, so that the ideas stick.
Let us take a simple example: X and Y are two variables.
Substituting into the covariance equation, we get:
Let us verify this in Python:
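One way such a check might look is sketched below, comparing a from-scratch calculation with NumPy's `np.cov`. The lists `X` and `Y` are hypothetical placeholder values, not the data of the example above, so the printed result will differ from the article's. Substitute the real values to reproduce it. The helper name `sample_covariance` is also made up for this sketch.

```python
import numpy as np

def sample_covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must have the same length, at least 2")
    # Sum of products of deviations from the means, divided by n - 1.
    return np.sum((x - x.mean()) * (y - y.mean())) / (x.size - 1)

# Hypothetical placeholder values; replace with the X and Y of the example.
X = [2, 4, 6, 8, 10]
Y = [1, 3, 8, 9, 14]

by_hand = sample_covariance(X, Y)
by_numpy = np.cov(X, Y)[0, 1]  # np.cov returns the 2x2 covariance matrix

print(by_hand, by_numpy)
```

`np.cov` divides by n - 1 by default; passing `ddof=0` gives the population covariance instead, which is worth checking if a hand calculation and the library result disagree.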
Now let us calculate the correlation.
Substituting into the correlation equation, we get:
Let us verify this in Python:
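A minimal sketch of such a check follows, computing the Pearson coefficient from its definition and comparing it with `np.corrcoef`. As before, `X` and `Y` are hypothetical placeholder values rather than the article's data, and `pearson_correlation` is a helper name invented for this sketch.

```python
import numpy as np

def pearson_correlation(x, y):
    """Pearson correlation: covariance divided by the product of the std devs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # The 1 / (n - 1) factors of the covariance and of the two standard
    # deviations cancel, so only the raw sums are needed.
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Hypothetical placeholder values; replace with the X and Y of the example.
X = [2, 4, 6, 8, 10]
Y = [1, 3, 8, 9, 14]

by_hand = pearson_correlation(X, Y)
by_numpy = np.corrcoef(X, Y)[0, 1]  # off-diagonal entry of the correlation matrix

print(by_hand, by_numpy)
```

The two values should agree up to floating-point rounding, and both always fall between -1 and +1.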
Conclusion
We looked at covariance and correlation separately, worked through an example by hand, and then checked the results with Python code.
We got a covariance of 8, which is positive (covariance has no upper bound, so it can be arbitrarily large). We obtained a correlation coefficient of 0.97, which clearly indicates a strong positive correlation between X and Y.