Correlation and Covariance

Chandler: Hi Joey!

Joey: Hi man…. (In a very feeble voice)

Chandler: You look so tense. Is everything OK?

Joey: I told you right? Last week my kid had an audition.

Chandler: Yeah, I kind of remember it, you were also rushing home early that evening.

Joey: Yes man… it seems he got through the audition, and now he will be performing in a play at his school’s annual day.

Chandler: That’s good news, right?

Joey: Yeah, but now his teacher is asking him to come up with his own script by tomorrow, and he has been bugging me since this morning. I also need to submit our monthly report to the VP by tonight, so I am kind of tired. Can you help me with the report?

Chandler: Sure! That’s what I am here for.

Joey: Thanks man… First I need to understand a few terms in order to complete this report. What is the difference between variance and covariance? I usually get confused between them.

Chandler: The variance of the population (σ²) is given by

$$ \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2 $$

where

μ = mean of the population

xi = the ith data point

n = number of data points in our population

Now let’s dig a bit deeper into this formula. (xi − μ) represents the deviation of the ith data point from the population mean, so the formula gives us the average of the squared distances of the data points from the mean. It measures how far our data is spread out. If the variance is zero, all our data points are the same. We usually take the square root of the variance so that the result is in the same units as the data; this is called the standard deviation, which is given by

$$ \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2} $$

One important caveat here is that we need to know the population mean μ (which is generally not known) to compute these numbers.
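To make the population variance and standard deviation described above concrete, here is a minimal Python sketch. The data values and the function name `population_variance` are made up for illustration, and the list is assumed to be the entire population, so its mean plays the role of μ.

```python
import math

def population_variance(data):
    """Average squared distance of each point from the mean (divides by n)."""
    n = len(data)
    mu = sum(data) / n  # population mean
    return sum((x - mu) ** 2 for x in data) / n

# Hypothetical data, treated here as the entire population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = population_variance(data)
std_dev = math.sqrt(variance)  # same units as the data

print("variance:", variance)           # variance: 4.0
print("standard deviation:", std_dev)  # standard deviation: 2.0

# If every data point is the same, the variance is zero.
print(population_variance([3, 3, 3]))  # 0.0
```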

Joey: Is there any other formula for finding out these numbers? Because I have seen people using a different formula than the one which you have mentioned.

Chandler: Yes, indeed!

As I mentioned, the above formulas require the population mean (μ). But if we don’t know the population mean, we can replace it with its estimate (i.e. the sample mean), which results in a modified formula.

We replace n by n−1 in the formula below to make the variance estimate unbiased.

$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 $$

where x̄ is the sample mean.
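A small Python sketch of the n−1 version may help here. The sample values are hypothetical and `sample_variance` is just an illustrative name; the result is compared against Python's standard `statistics` module, whose `variance` function divides by n−1 and whose `pvariance` function divides by n.

```python
import statistics

def sample_variance(data):
    """Variance estimate that divides by n - 1 and uses the sample mean."""
    n = len(data)
    x_bar = sum(data) / n  # sample mean, standing in for the unknown mu
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Hypothetical sample drawn from some larger population.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_variance(sample))       # 32 / 7, roughly 4.57
print(statistics.variance(sample))   # standard library, also divides by n - 1
print(statistics.pvariance(sample))  # divides by n instead, giving 4
```

Dividing by n−1 gives a slightly larger number than dividing by n, and the gap shrinks as the sample grows.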

Joey: So essentially, variance captures how much dispersion a single random variable possesses, am I correct?

Chandler: Absolutely!

Joey: So what is covariance then?

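Chandler: Variance is a measure of how much dispersion a single random variable possesses, whereas covariance is a measure of how much two random variables vary together. It is given by the formula below:

$$ \sigma_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) $$

where x̄ and ȳ are the sample means of X and Y.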

From the above equation, we can clearly see that each observation (xi, yi) contributes positively or negatively to the covariance. Let me explain this using the graph below.

Let us assume that I have a two-dimensional data set (variables X and Y), that both X and Y take only positive values, and that the means of X and Y are both 5. I have divided the region in the graph into four quadrants based on the means of X and Y. From the equation, we can see that a data point that falls in the 1st or the 3rd quadrant contributes positively to the covariance; otherwise, it contributes negatively. The magnitude of the contribution also depends on how far the data point is from x̄ and ȳ (the sample means of X and Y).
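The quadrant argument can be checked with a short Python sketch. The four points below are hypothetical, chosen only so that the means of X and Y are both 5 and each point sits in a different quadrant; `covariance_contributions` is an illustrative helper, not a library function.

```python
def covariance_contributions(xs, ys):
    """Return each point's term (x_i - x_bar) * (y_i - y_bar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]

# Hypothetical points chosen so that both means equal 5.
xs = [2, 4, 6, 8]
ys = [3, 7, 2, 8]

terms = covariance_contributions(xs, ys)
for x, y, t in zip(xs, ys, terms):
    sign = "positive" if t > 0 else "negative" if t < 0 else "zero"
    print(f"({x}, {y}) contributes {t:+.1f} ({sign})")

# Sample covariance: sum of the terms divided by n - 1.
sample_cov = sum(terms) / (len(xs) - 1)
print("sample covariance:", round(sample_cov, 3))  # 10 / 3, about 3.333
```

The points (2, 3) and (8, 8) lie in the 3rd and 1st quadrants and add to the covariance, while (4, 7) and (6, 2) subtract from it. The point farthest from both means, (8, 8), contributes the most.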

Another major difference between variance and covariance is the role each plays in visualizing data. For a one-dimensional dataset, the mean and variance are enough to sketch its shape (at least when the data is roughly bell-shaped).

If the dataset has more than one dimension, the mean and variance alone are not enough to visualize the geometry of the data. In this case we also need to know the orientation of the data, and covariance comes in handy.

The spread of the data along the x-axis is captured by the variance of X (σx²). Similarly, the spread along the y-axis is captured by the variance of Y (σy²). However, σx² and σy² alone don’t explain the diagonal orientation of the data. The diagonal orientation in the above figure is due to the positive covariance (σxy) between the x and y values. We know that the variables X and Y in the figure are positively related, since higher values of Y go together with higher values of X and vice versa. We can compactly represent both the variances and the covariance in a single object called the covariance matrix, which is given by

$$ \Sigma = \begin{bmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{yx} & \sigma_y^2 \end{bmatrix} $$

If X is positively related to Y, then Y is also positively related to X (verify it through the covariance formula and you will get the same answer for both). In other words, σxy = σyx. Therefore, the covariance matrix is always symmetric. The diagonal elements represent the variances and the off-diagonal elements represent the covariances. As mentioned before, the shape of two-dimensional data is summarized by its mean and its 2×2 covariance matrix. Similarly, an N×N covariance matrix captures the spread of N-dimensional data.
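As a rough illustration of the covariance matrix and its symmetry, here is a plain-Python sketch. The data is hypothetical, `covariance_matrix` is an illustrative helper rather than a library call, and the n−1 (sample) convention is assumed.

```python
def covariance_matrix(variables):
    """Sample covariance matrix for a list of equal-length variables."""
    n = len(variables[0])
    means = [sum(v) / n for v in variables]
    # Subtract each variable's mean from its values.
    centered = [[value - m for value in v] for v, m in zip(variables, means)]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
        for ci in centered
    ]

# Hypothetical two-dimensional data set.
xs = [2, 4, 6, 8]
ys = [3, 7, 2, 8]

sigma = covariance_matrix([xs, ys])
for row in sigma:
    print(row)

# Diagonal entries are the variances, off-diagonal entries the covariance.
print("symmetric:", sigma[0][1] == sigma[1][0])  # symmetric: True
```

Passing N variables to the same function returns an N×N matrix with the same layout.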

The above figure explains the shape of data for different covariance matrices.

Joey: That’s great, but then what is the difference between the covariance and correlation?

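Chandler: Correlation is a function of covariance, and it is given by

$$ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} $$

where σx and σy are the standard deviations of X and Y.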
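One practical consequence of dividing by the standard deviations is that correlation does not depend on the units of the variables, while covariance does. The sketch below illustrates this with hypothetical data; the helper names `covariance` and `correlation` are assumptions for this example.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))  # covariance of X with itself is its variance
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)

# Hypothetical data.
xs = [2, 4, 6, 8]
ys = [3, 7, 2, 8]

# Same X values expressed in a unit 100 times smaller (e.g. metres to centimetres).
xs_scaled = [100 * x for x in xs]

print("covariance :", covariance(xs, ys), "->", covariance(xs_scaled, ys))
print("correlation:", correlation(xs, ys), "->", correlation(xs_scaled, ys))
```

The covariance grows by a factor of 100 after rescaling, while the correlation stays the same (up to floating-point rounding) and always lies between −1 and 1.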

The table below gives the differences between correlation and covariance.

Joey: Can we use correlation in our data and make some interpretation?

Chandler: Yes Joey, we can very much use it to draw meaningful business insights.

In our case, suppose we want to check whether a higher number of clicks per session goes along with a higher number of products purchased. We simply find the correlation between these two variables.

Joey: That’s amazing, in fact our VP suspects that there should be some relation between these two variables.

Chandler: Okay, then let’s find out what the statistic tells us.

Correlation (Clicks per session, number of products purchased) = 0.001549

Aaahhhh… this is not very encouraging.

Joey: Is anything wrong?

Chandler: Since the correlation is close to zero, there seems to be no linear relationship between the number of products purchased and the clicks per session.

And the variance-covariance matrix for our case will be


X1= Number of products purchased, and

X2= Clicks per session
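A word of caution when reading a correlation near zero: it measures only the linear relationship between two variables. The toy Python sketch below uses made-up data (not our clicks and purchases data) in which Y is completely determined by X, yet the correlation comes out as zero. The helper name `correlation` is an assumption for this example.

```python
import math

def correlation(xs, ys):
    """Pearson correlation computed from the sample covariance and variances."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    return cov / math.sqrt(var_x * var_y)

# Toy data: Y is exactly X squared, so the two are clearly related,
# but the relationship is not linear.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

r = correlation(xs, ys)
print("correlation:", r)
```

So before concluding that two variables are unrelated, it is usually worth looking at a scatter plot as well.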

Joey: Cool then, I will inform our VP that the numbers don’t support his suspicion. It should help him make better decisions.

Chandler: Yeah you should do it as soon as possible.

Joey: Thanks Chandler, now can you help me in finalizing a script for my son’s play :P ?

Chandler: Hahahaaa…You are kidding right?

Joey: No, actually I am clueless. Do you at least know who can help me?

Chandler: Let me check.

(Chandler takes out a visiting card from his purse and hands it over to Joey who grabs it immediately and goes through it)

Joey: Estelle… who is she?

Chandler: She is the agent of one of my roommates, who is an aspiring actor. Maybe she can be of some help to you.

Joey: Thanks again Chandler. You have done more than what you are supposed to do. Take the rest of the day off and enjoy it with your girlfriend Monica.

Chandler: How do you know about Monica??

To be continued…..

The author of this blog is Balaji P, who is pursuing a PhD in reinforcement learning at IIT Madras.



GreyAtom is committed to building an educational ecosystem for learners to upskill & help them make a career in data science.

Shweta Doshi

Written by

I am an unapologetic idealist who believes that to gain quality education,we need to transform the way we teach & learn.I am the Co-Founder at


