Intuitive Explanation of Sum/Difference Of Random Uncorrelated/Correlated Variables

Engin Deniz Alpman
Published in Analytics Vidhya
5 min read · Sep 23, 2019

Wow, that title is a mess! That's because we are going to look at 4 concepts, and writing a separate story for each of them could cause even more mess. Anyway, I hope you made it here despite that title and are ready to do some math 😎

The mean of the sum of the random uncorrelated/correlated variables:

Before dealing with the variance of the random variables, let’s talk about their means.

Let's say that, on average, you expect to see 4 people with a green t-shirt in a day (let's call this X). So the average number of people with a green t-shirt one can expect to see in a day is 4. We also know that, on average, you expect to see 3 dogs in a day (let's call it Y).

So what is the expected total number of people with green t-shirts and dogs seen in a day? One can expect to see 4 green t-shirts and 3 dogs, so the expected value of their sum is just 4 + 3 = 7.

It doesn't matter whether the variables are uncorrelated or correlated: the mean of their sum is just the sum of their individual means.

The mean of the difference of the random uncorrelated/correlated variables:

It is just like the previous part, but we subtract the values instead of adding them, because it is like asking how many more instances of X I expect to see than of Y.

In our example it is 4 − 3 = 1. One can expect to see 1 more person with a green t-shirt than dogs in a day.
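Both facts about means are easy to verify numerically. Here is a quick NumPy sketch with made-up daily counts (chosen so the means come out to 4 and 3, as in the story):

```python
import numpy as np

# Made-up daily counts over a week: X = people in green t-shirts, Y = dogs
x = np.array([4, 5, 3, 4, 6, 2, 4])  # mean 4
y = np.array([3, 2, 4, 3, 3, 2, 4])  # mean 3

# Mean of the sum = sum of the means (holds with or without correlation)
print(np.mean(x + y), np.mean(x) + np.mean(y))  # 7.0 7.0

# Mean of the difference = difference of the means
print(np.mean(x - y), np.mean(x) - np.mean(y))  # 1.0 1.0
```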

The variance of the sum of the random uncorrelated variables:

In another story that I published, I mentioned subtracting the mean from a variable's values so that different variables share the same space. That point is crucial for understanding the following concepts, so it might be helpful to read that story first.

We will do the same thing for X and Y. Now that they are in the same space, we can add them to investigate the properties of their sum.

This also satisfies the "sharing the same space" idea: the vector s is the sum of the random variables X and Y, with its mean subtracted from it so that it shares the same space as the other vectors.

Before moving on, I want to show you that the variance is the squared magnitude of the centered vector divided by the number of measurements in it.

The formal definition of variance 😒 :

Var(X) = (1/N) · Σᵢ (xᵢ − mean(X))²

The linear algebra variance 👍

Var(X) = ‖x_c‖² / N, where x_c = X − mean(X) is the vector of centered measurements.
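To check that the two definitions agree, here is a quick sketch with made-up data (NumPy's np.var uses the same population convention, dividing by N):

```python
import numpy as np

# Made-up measurements
x = np.array([4.0, 5.0, 3.0, 4.0, 6.0, 2.0, 4.0])

# Center the vector: subtract the mean so it "shares the same space"
xc = x - x.mean()

# Variance as linear algebra: squared magnitude of the centered vector / N
var_linalg = np.dot(xc, xc) / len(x)

print(var_linalg, np.var(x))  # the two definitions agree
```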

So if we want to find the variance of X + Y, all we need to do is calculate the squared magnitude of the vector X + Y − (mean(X) + mean(Y)) in this space and divide it by N.

Since X and Y are uncorrelated, their centered vectors are perpendicular to each other, so we can use the Pythagorean theorem to calculate the magnitude of s:

‖s‖² = ‖x_c‖² + ‖y_c‖², where x_c = X − mean(X) and y_c = Y − mean(Y).

If we divide both sides by N, we will get:

Var(X + Y) = Var(X) + Var(Y)

That is the variance of the sum of the random variables!

The variance of the difference of the random uncorrelated variables:

All the "sharing the same space" machinery applies to this case too: the centered difference vector is the sum of two perpendicular centered vectors (one of them flipped), so the Pythagorean theorem gives the same squared magnitude.

Divide each side by N:

Var(X − Y) = Var(X) + Var(Y)
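Here is a toy check with two made-up zero-mean vectors whose dot product is 0, i.e. perpendicular and therefore uncorrelated:

```python
import numpy as np

# Two made-up zero-mean vectors with dot product 0 (perpendicular),
# i.e. the corresponding variables are uncorrelated
x = np.array([1.0, -1.0, 2.0, -2.0])
y = np.array([2.0, -2.0, -1.0, 1.0])

# Pythagorean theorem: |x + y|^2 = |x|^2 + |y|^2
print(np.dot(x + y, x + y), np.dot(x, x) + np.dot(y, y))  # 20.0 20.0

# Divide by N: Var(X + Y) = Var(X - Y) = Var(X) + Var(Y)
print(np.var(x + y), np.var(x - y), np.var(x) + np.var(y))  # 5.0 5.0 5.0
```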

The variance of the sum of the correlated variables:

If the variables are correlated, the angle between their centered vectors is not 90°.

But our goal is the same. We want to calculate the magnitude of the sum of these two vectors and divide by N to get the variance. To do that we will use the law of cosines:

‖s‖² = ‖x_c‖² + ‖y_c‖² + 2‖x_c‖‖y_c‖cos(θ)

where x_c and y_c are the centered vectors and θ is the angle between them.
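To see the law of cosines at work numerically, here is a small sketch (the vectors are made up and already centered; θ is taken as the angle between them):

```python
import numpy as np

# Two made-up zero-mean vectors that are NOT perpendicular,
# i.e. the corresponding variables are correlated
x = np.array([1.0, -1.0, 2.0, -2.0])
y = np.array([1.0, 1.0, 1.0, -3.0])

# cos(theta) from the dot product; theta = angle between x and y
cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Law of cosines for the sum vector s = x + y:
# |s|^2 = |x|^2 + |y|^2 + 2 |x| |y| cos(theta)
lhs = np.dot(x + y, x + y)
rhs = np.dot(x, x) + np.dot(y, y) \
    + 2 * np.linalg.norm(x) * np.linalg.norm(y) * cos_theta
print(lhs, rhs)  # the two sides agree
```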

We will use covariance in the final formula, since ‖x_c‖‖y_c‖cos(θ) is just the dot product of the centered vectors, and dividing that by N gives Cov(X, Y). (I explain covariance in more detail in another story.)

If we divide both sides by N, we get:

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)

That is the variance of the sum of the correlated variables!

The variance of the difference of the correlated variables:

Repeating the same argument with X − Y, the cross term flips sign:

Var(X − Y) = Var(X) + Var(Y) − 2Cov(X, Y)

That is the variance of the difference of the correlated variables!
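Both correlated formulas can be checked numerically. A sketch with made-up correlated samples (the population covariance, dividing by N, matches np.var's convention):

```python
import numpy as np

# Made-up correlated samples
x = np.array([2.0, 4.0, 4.0, 6.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Population covariance: average product of the centered values
cov = np.mean((x - x.mean()) * (y - y.mean()))

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
print(np.var(x + y), np.var(x) + np.var(y) + 2 * cov)  # 13.0 13.0

# Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y)
print(np.var(x - y), np.var(x) + np.var(y) - 2 * cov)  # 1.0 1.0
```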

(The sign of the covariance term can differ across other sources because of the choice of θ. Here θ is the angle between the two centered vectors; if you instead measure the angle to the flipped vector −y_c, the cosine, and hence the sign in front of the covariance, flips.)

Wow, that was an intense but intuitive derivation. We have derived the variance of the sum/difference of uncorrelated and correlated random variables just by using linear algebra.

Don’t memorize, stay intuitive 👍😎
