Intuitive Explanation of Correlation and Covariance Using Linear Algebra

Engin Deniz Alpman
Published in Analytics Vidhya, Sep 23, 2019

Mathematics is so much fun that it would be rude to just memorize it without understanding it intuitively.

On the internet, you can find dozens of explanations of almost any concept you are looking for. But most of them are just mathematical formulas, and the so-called explanations amount to following every step and checking that each operation is valid. But that is not understanding. That is not learning. Learning should be intuitive; understanding comes naturally afterward.

Today we are going to look at two statistical concepts: (1) covariance and (2) correlation.

We are investigating the relation between two different parameters, which we will call X and Y. "Relation" in this context is not a cause-and-effect relation; it describes how a change in one parameter is accompanied by a change in the other (both increase or decrease together, one increases while the other decreases, etc.). (It is also important to note that the correlation we investigate here is Pearson's correlation, which captures only linear relationships, so two parameters with a strong non-linear relationship can still have a Pearson correlation of 0.)
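To see that limitation concretely, here is a minimal sketch using only the Python standard library. The helper name `pearson` and the data are hypothetical, chosen for illustration: Y = X² is completely determined by X, yet because the relationship is a parabola rather than a line, the Pearson correlation comes out as 0.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (standard formula)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]  # deviations of X from its mean
    dy = [b - my for b in y]  # deviations of Y from its mean
    num = sum(a * b for a, b in zip(dx, dy))
    # sqrt(sum dx^2 * sum dy^2) equals the product of the two norms
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Y is fully determined by X, but the relationship is not linear.
x = [-2, -1, 0, 1, 2]
y = [a ** 2 for a in x]  # [4, 1, 0, 1, 4]
print(pearson(x, y))  # 0.0
```

The positive and negative contributions on either side of the parabola's vertex cancel exactly, so a perfect but non-linear dependence is invisible to this metric.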

You can think of X as the number of apples and Y as the number of lemons sold in a supermarket. X and Y each consist of several measurements, one per sales day. So if the data comes from 10 days, X and Y will each contain 10 values, each corresponding to the sales on a different day. We want to know whether there is a relation between apple and lemon sales.

We will build intuitive understanding using linear algebra, so let's think of X and Y as vectors. The first values of X and Y correspond to axis 1 of our space, the second values to axis 2, and so on… With 10 days of data, X and Y are vectors in a 10-dimensional space.
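As a small illustration of this vector view, here is a sketch with made-up 10-day sales figures (the numbers and variable names are hypothetical). Each list is one vector, and day i's value is its component along axis i.

```python
# Hypothetical daily sales counts for 10 days (made-up numbers).
apples = [12, 15, 9, 20, 14, 11, 18, 16, 10, 13]  # X
lemons = [30, 34, 25, 41, 31, 28, 38, 35, 26, 29]  # Y

# Both vectors must live in the same space: one axis per day.
assert len(apples) == len(lemons)
n = len(apples)  # dimension of the space = number of days

for day, (x_i, y_i) in enumerate(zip(apples, lemons), start=1):
    print(f"axis {day}: X = {x_i}, Y = {y_i}")
print(f"X and Y are vectors in a {n}-dimensional space")
```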

This part is crucial. What does "X increased/decreased" mean? Are all values of X constant? How can we say that X increased? X is just the name of the measurements; it has no single constant value. On any given day, the number of apples sold can be bigger than, smaller than, or the same as on other days, but nothing has increased; we are just looking at different measurements inside X. So when we say a parameter increased or decreased, it is relative to the other measurements of that same parameter.

This suggests that every parameter has its own space in which "increasing", "decreasing" and "not changing" are meaningful. To make X and Y share the same space, we subtract each parameter's mean from it (element-wise), so that their increasing/decreasing/not-changing behavior has the same meaning for X and Y at the same time. (When X and Y are both above their means, they are behaving similarly; when X is above its mean but Y is below its mean, they are behaving in opposite directions.)
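The centering step can be sketched as follows, assuming the hypothetical helper names `center` and `behavior` and made-up data. Every raw sales value is positive, so raw signs say nothing; after subtracting the mean, the sign of each component tells us whether that day was above or below the parameter's own average, and the signs of X and Y can be compared directly.

```python
def center(v):
    """Subtract the mean of v from every component (element-wise)."""
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]


def behavior(x, y):
    """Label each day by how X and Y sit relative to their own means."""
    labels = []
    for dx, dy in zip(center(x), center(y)):
        if dx * dy > 0:
            labels.append("similar")   # both above, or both below, their means
        elif dx * dy < 0:
            labels.append("opposite")  # one above its mean, the other below
        else:
            labels.append("neutral")   # at least one equals its mean exactly
    return labels


# Hypothetical 5-day sales (made-up numbers).
apples = [10, 14, 8, 12, 6]
lemons = [20, 26, 18, 19, 17]
print(center(apples))            # [0.0, 4.0, -2.0, 2.0, -4.0]
print(behavior(apples, lemons))  # ['neutral', 'similar', 'similar', 'opposite', 'similar']
```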

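Hmm, we are investigating the relation between two vectors… Doesn't the dot product reward components that agree? Each term x_i·y_i is positive when the two components have the same sign and negative when they have opposite signs, so it could be helpful in our case.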
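A quick sketch of why the dot product fits, using hypothetical vectors that are already centered: components that agree in sign add to the sum, components that disagree subtract from it, and a mix of both can cancel out. The helper name `dot` is illustrative.

```python
def dot(u, v):
    """Sum of component-wise products."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical, already mean-centered toy vectors.
x = [-2, -1, 0, 1, 2]
agree = [-4, -2, 0, 2, 4]   # same sign as x on every axis
oppose = [4, 2, 0, -2, -4]  # opposite sign on every axis
mixed = [2, -1, 0, -1, 2]   # agrees on some axes, disagrees on others

print(dot(x, agree))   # 20
print(dot(x, oppose))  # -20
print(dot(x, mixed))   # 0
```

The size of the result also depends on how long the vectors are, which is why a normalized version of it is needed as a metric.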

But before moving on, let's think about the metric we are going to use to describe the relation between X and Y. What are the extreme cases?

  1. There is no correlation between the two variables: a change in the values of X is not accompanied by any systematic change in the values of Y (correlation = 0).
  2. There is a perfect positive correlation between X and Y: every time the value of X changes, it is accompanied by a proportional change in Y in the same direction (correlation = +1).
  3. There is a perfect negative correlation between X and Y: every time the value of X changes, it is accompanied by a proportional change in Y in the opposite direction (correlation = -1).
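Here is a rough numerical check of the three extreme cases, assuming the standard Pearson formula (computed as the normalized dot product of the mean-centered vectors) and made-up data. The function name `correlation` is hypothetical. A perfect linear relationship with positive slope gives +1, one with negative slope gives -1, and a symmetric pattern with no linear trend gives 0.

```python
import math


def correlation(x, y):
    """Normalized dot product of the mean-centered vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


x = [1, 2, 3, 4, 5]
print(correlation(x, [2 * a + 1 for a in x]))   # case 2: 1.0
print(correlation(x, [-3 * a + 5 for a in x]))  # case 3: -1.0
print(correlation(x, [4, 1, 0, 1, 4]))          # case 1: 0.0
```

Note that the slope (2 or -3) does not matter, only its sign: proportional changes in either direction hit the extremes.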

So we are looking for a metric whose values lie in the range [-1, 1]. These values, and how they relate to the relationship between the parameters, remind me of the cosine function. If two vectors point in the same direction, the angle between them is 0° and cos(0°) = +1; if they point in opposite directions, the angle between them is 180° and cos(180°) = -1; and if they have nothing in common and are perpendicular to each other, the angle between them is 90° and cos(90°) = 0.

So it seems reasonable to use the cosine of the angle between X and Y as a metric for correlation 😎
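The geometric side of this idea can be sketched with hypothetical 2-D vectors: recover the angle from cos θ = (u · v) / (|u||v|) using `math.acos` and check the three angles discussed above. The helper name `angle_degrees` is illustrative.

```python
import math


def angle_degrees(u, v):
    """Angle between u and v in degrees, from cos(theta) = u.v / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    # sqrt(|u|^2 * |v|^2) is the product of the two norms
    norms = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    cos_theta = max(-1.0, min(1.0, dot / norms))  # clamp rounding noise into [-1, 1]
    return math.degrees(math.acos(cos_theta))


print(angle_degrees([1, 2], [2, 4]))    # same direction: 0 degrees
print(angle_degrees([1, 2], [-1, -2]))  # opposite directions: 180 degrees
print(angle_degrees([1, 2], [-2, 1]))   # perpendicular: 90 degrees
```

The clamp is a defensive choice: floating-point rounding can push the ratio slightly outside [-1, 1], where `math.acos` would raise a `ValueError`.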

From the definition of the dot product, we know that:

$$X \cdot Y = \sum_{i=1}^{n} x_i y_i = \|X\|\,\|Y\|\cos\theta$$

where $\theta$ is the angle between X and Y.
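To check the identity X · Y = |X||Y| cos θ numerically, here is a sketch with two hypothetical 2-D vectors. In two dimensions the angle can be measured independently of the dot product, from each vector's direction via `math.atan2`, so the algebraic form (sum of products) and the geometric form (lengths times cosine) can be compared.

```python
import math

# Two 2-D vectors chosen for illustration.
X = (3.0, 1.0)
Y = (1.0, 2.0)

# Algebraic dot product: sum of component-wise products.
algebraic = X[0] * Y[0] + X[1] * Y[1]

# Geometric dot product: |X| |Y| cos(theta), with theta measured from each
# vector's direction using atan2 (this shortcut only works in 2-D).
theta = math.atan2(Y[1], Y[0]) - math.atan2(X[1], X[0])
geometric = math.hypot(*X) * math.hypot(*Y) * math.cos(theta)

print(algebraic, geometric)  # agree up to floating-point rounding
```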

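As I mentioned earlier, we have to subtract the means from the parameters so that they share the same space. We can treat the mean of a parameter as a vector whose components are all equal to that parameter's mean, i.e. $\bar{X} = (\mu_X, \mu_X, \ldots, \mu_X)$ and $\bar{Y} = (\mu_Y, \mu_Y, \ldots, \mu_Y)$. So the final dot product will be:

$$(X - \bar{X}) \cdot (Y - \bar{Y}) = \sum_{i=1}^{n} (x_i - \mu_X)(y_i - \mu_Y) = \|X - \bar{X}\|\,\|Y - \bar{Y}\|\cos\theta$$

where $\theta$ is now the angle between the mean-centered vectors.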
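The mean-vector construction can be sketched like this (helper names and data are hypothetical). It also shows that expanding the centered dot product gives sum(x_i * y_i) - n * mu_X * mu_Y, an identity that helps when matching the cosine formula to the usual computational formula for correlation.

```python
def mean_vector(v):
    """The mean of v repeated on every axis: (mu, mu, ..., mu)."""
    mu = sum(v) / len(v)
    return [mu] * len(v)


def subtract(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    """Sum of component-wise products."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 4-day sales.
X = [3, 5, 4, 8]
Y = [10, 14, 11, 17]

centered_dot = dot(subtract(X, mean_vector(X)), subtract(Y, mean_vector(Y)))

# The same quantity without explicit centering: sum(x*y) - n * mu_X * mu_Y.
n = len(X)
shortcut = dot(X, Y) - n * (sum(X) / n) * (sum(Y) / n)

print(centered_dot, shortcut)  # 20.0 20.0
```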

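We decided to use the cosine as the metric, so:

$$r_{XY} = \cos\theta = \frac{(X - \bar{X}) \cdot (Y - \bar{Y})}{\|X - \bar{X}\|\,\|Y - \bar{Y}\|}$$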
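To check that the cosine of the angle between the mean-centered vectors agrees with the commonly quoted computational form of Pearson's r, r = (nΣxy - ΣxΣy) / sqrt((nΣx² - (Σx)²)(nΣy² - (Σy)²)), here is a sketch with hypothetical data. The function names are illustrative, not from any library.

```python
import math


def cosine_correlation(x, y):
    """Cosine of the angle between the mean-centered vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    dot = sum(a * b for a, b in zip(dx, dy))
    return dot / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def textbook_correlation(x, y):
    """Computational Pearson formula built from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


apples = [12, 15, 9, 20, 14, 11, 18, 16, 10, 13]  # hypothetical daily sales
lemons = [30, 34, 25, 41, 31, 28, 38, 35, 26, 29]
print(cosine_correlation(apples, lemons), textbook_correlation(apples, lemons))
```

Both print the same value up to rounding; the textbook form is what you get after multiplying the numerator and denominator of the cosine by n and expanding the centered sums.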

That's it! Correlation is just the cosine of the angle between the mean-centered parameters that we are investigating. But this looks different from the formula commonly used for correlation. So before ending this article, I want to show you that what we have found is the same as the commonly used formula. I will also show how covariance can be calculated using linear algebra.
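A minimal sketch of covariance in the same linear-algebra terms, assuming the sample convention (the centered dot product divided by n - 1; the population convention divides by n instead). The data and the helper name `covariance` are hypothetical. Dividing the covariance by both sample standard deviations cancels the n - 1, which leaves exactly the cosine of the angle between the centered vectors.

```python
import statistics


def covariance(x, y):
    """Sample covariance: dot product of the mean-centered vectors over n - 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


apples = [12, 15, 9, 20, 14, 11, 18, 16, 10, 13]  # hypothetical daily sales
lemons = [30, 34, 25, 41, 31, 28, 38, 35, 26, 29]

cov = covariance(apples, lemons)
# The covariance of a vector with itself is its (sample) variance.
var_apples = covariance(apples, apples)
# Normalizing by both sample standard deviations gives the correlation.
r = cov / (statistics.stdev(apples) * statistics.stdev(lemons))
print(cov, var_apples, r)
```

Unlike the correlation, the covariance keeps the units of X times the units of Y, so its magnitude alone does not say how strong the relation is.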
