# Expectation & Variance of OLS Estimates

In one of my previous articles, I derived the OLS estimates for simple linear regression. Here, I'll dig a little deeper and explain some more features of these estimates.

As shown earlier, a simple regression model is expressed as:

Yᵢ = α + βXᵢ + εᵢ

Here α and β are the regression coefficients, i.e. the parameters that need to be estimated to understand the relation between Y and X. The subscript i on X and Y indicates that we are referring to a particular observation, a particular value associated with X and Y. εᵢ is the error term associated with observation i.

Using some mathematical rigour, the OLS (Ordinary Least Squares) estimates for the regression coefficients α and β were derived. Under the OLS method, we tried to find a function that minimized the sum of the squares of the differences between the true values of Y and the predicted values of Y. The following estimates were obtained for α and β:

β̂ = Σ(Xᵢ − X̅)(Yᵢ − Y̅) / Σ(Xᵢ − X̅)²

α̂ = Y̅ − β̂X̅

Here, α-hat is the estimate for α, and β-hat is the estimate for β. Before going further, it’s imperative to explore some basic concepts and properties of expectation and variance:

**Expectation of a random variable**

The expectation of a random variable X is much like its weighted average. If X has n possible outcomes X₁, X₂, X₃, …, Xₙ occurring with probabilities P₁, P₂, P₃, …, Pₙ, then the expectation of X (or its expected value) is defined as:

E(X) = X₁P₁ + X₂P₂ + … + XₙPₙ = ΣXᵢPᵢ
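
As a quick illustration (my own toy example, not from the original article), the weighted-average definition translates directly into code. Here a made-up loaded die plays the role of the discrete random variable:

```python
# A made-up discrete random variable: a loaded six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]  # probabilities must sum to 1

# E(X) = X1*P1 + X2*P2 + ... + Xn*Pn
expected_value = sum(x * p for x, p in zip(outcomes, probs))
print(expected_value)  # about 4.5 -- pulled above a fair die's 3.5 by the heavy face 6
```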

Properties of expectation of random variables:

1. The expectation of a constant is the constant itself, i.e.,

E(c) = c

2. For two random variables X & Y, the expectation of their sum is equal to the sum of their expectations. In other words,

E(X + Y) = E(X) + E(Y)

3. If Y = aX + b, then the expectation of Y is defined as:

E(Y) = aE(X) + b

4. If Y = a₁X₁ + a₂X₂ + … + aₙXₙ + b, then the expectation of Y is defined as:

E(Y) = a₁E(X₁) + a₂E(X₂) + … + aₙE(Xₙ) + b

5. If X and Y are two independent variables, then the expectation of their product is equal to the product of their expectations:

E(XY) = E(X)E(Y)

**Variance of a random variable**

The variance of a random variable X is defined as the expected value of the square of the deviation of X from its mean X̅. It shows how spread out the distribution of a random variable is. It is expressed as follows:

Var(X) = E[(X − X̅)²]

Properties of variance of random variables:

1. The variance of a constant is zero, i.e.,

Var(c) = 0

2. For two independent random variables X & Y, the variance of their sum is equal to the sum of their variances. In other words,

Var(X + Y) = Var(X) + Var(Y)

Note: In expectation, the above expression was true even if the random variables were not independent, but the expression for variance requires the random variables to be independent. For non-independent variables, the variance of the sum is expressed as follows:

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)

Where Cov(X, Y) is called the covariance of X & Y. Covariance is used to describe the relationship between two variables. It is defined as follows:

Cov(X, Y) = E[(X − X̅)(Y − Y̅)]

3. If Y = aX + b, then the variance of Y is defined as:

Var(Y) = a²Var(X)

4. If Y = a₁X₁ + a₂X₂ + … + aₙXₙ + b, where X₁, X₂, …, Xₙ are independent, then the variance of Y is defined as:

Var(Y) = a₁²Var(X₁) + a₂²Var(X₂) + … + aₙ²Var(Xₙ)

5. If Y = ΣCᵢXᵢ, then

Var(Y) = ΣCᵢ²Var(Xᵢ) + ΣΣCᵢCⱼCov(Xᵢ, Xⱼ)   (summing over i ≠ j)

If Cov(Xᵢ, Xⱼ) = 0 for all i ≠ j, we get

Var(Y) = ΣCᵢ²Var(Xᵢ)

This property may not seem very intuitive. However, it will play a major role in deriving the variance of β-hat.
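
To build some intuition for it, here is a small simulation (my own sketch, not part of the original derivation) that checks Var(ΣCᵢXᵢ) = ΣCᵢ²Var(Xᵢ) for independent variables, using made-up coefficients and variances:

```python
import random
import statistics

random.seed(4)

coeffs = [2.0, -1.0, 0.5]    # made-up constants C_i
variances = [1.0, 4.0, 9.0]  # made-up variances of three independent X_i (all zero-mean)

# Simulate Y = sum(C_i * X_i) many times and measure its variance.
samples = []
for _ in range(100_000):
    xs = [random.gauss(0, v ** 0.5) for v in variances]
    samples.append(sum(c * x for c, x in zip(coeffs, xs)))

theoretical = sum(c ** 2 * v for c, v in zip(coeffs, variances))  # 4 + 4 + 2.25 = 10.25
empirical = statistics.pvariance(samples)
print(theoretical, round(empirical, 2))  # empirical lands close to 10.25
```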

6. A very handy way to compute the variance of a random variable X:

Var(X) = E(X²) − [E(X)]²
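
A one-off numerical check of this identity, again on a made-up discrete distribution of my own:

```python
# Made-up discrete distribution.
outcomes = [0, 1, 2]
probs = [0.25, 0.5, 0.25]

mean = sum(x * p for x, p in zip(outcomes, probs))  # E(X) = 1.0

# Variance straight from the definition: E[(X - mean)^2]
var_definition = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probs))

# Variance via property 6: E(X^2) - [E(X)]^2
e_x_squared = sum(x ** 2 * p for x, p in zip(outcomes, probs))
var_shortcut = e_x_squared - mean ** 2

print(var_definition, var_shortcut)  # 0.5 0.5
```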

Now, we’ll use some of the above properties to get the expressions for expected value and variance of α-hat and β-hat:

**Expectation of β-hat**

As shown earlier,

β̂ = Σ(Xᵢ − X̅)(Yᵢ − Y̅) / Σ(Xᵢ − X̅)² … (Equation 1)

It is known that,

Yᵢ = α + βXᵢ + εᵢ … (Equation 2)

Taking mean on both sides,

Y̅ = α + βX̅ + ε̅ … (Equation 3)

Substituting the above equations in Equation 1,

β̂ = β + Σ(Xᵢ − X̅)(εᵢ − ε̅) / Σ(Xᵢ − X̅)² … (Equation 4)

Note: β-hat is the estimated value, while β is the true value of the regression coefficient. Now, we'll calculate the expectation of β-hat:

E(β̂) = E[β + Σ(Xᵢ − X̅)(εᵢ − ε̅) / Σ(Xᵢ − X̅)²]

As discussed above, β is the true value of the regression coefficient. This makes it a constant. The expectation of a constant is that constant itself (property 1A). We can now use property 3A to solve further:

E(β̂) = β + Σ(Xᵢ − X̅)E(εᵢ − ε̅) / Σ(Xᵢ − X̅)²

The above equation is based on an assumption that we've made throughout simple linear regression, i.e., that the expected value of the error term is always zero. This gives us the following simple expression:

E(β̂) = β … (Equation 5)

**Expectation of α-hat**

As shown earlier,

α̂ = Y̅ − β̂X̅ … (Equation 6)

Also, while deriving the OLS estimate for α-hat, we used the expression:

Y̅ = α̂ + β̂X̅

Substituting the value of Y̅ from equation 3 in the above equation, we get:

α̂ = α + ε̅ − (β̂ − β)X̅ … (Equation 7)

Calculating expectation of α-hat,

E(α̂) = E[α + ε̅ − (β̂ − β)X̅]

Using property 2A, we obtain the following equation:

E(α̂) = E(α) + E(ε̅) − E[(β̂ − β)X̅]

Here α, β, and X̅ are constants and can be separated using property 3A. As discussed earlier, based on our assumption, E(ε̅) = 0. Substituting E(β̂) from equation 5,

E(α̂) = α − X̅[E(β̂) − β] = α − X̅(β − β) = α … (Equation 8)

So, the expected values of both α-hat and β-hat are equal to their true values. Such a result seems quite familiar: it is the defining property of unbiased estimators. An unbiased estimator θ-hat for θ always satisfies:

E(θ̂) = θ
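
This unbiasedness can also be demonstrated empirically. The following Monte Carlo sketch (standard-library Python, with made-up true parameters α = 2, β = 3 of my own choosing) repeatedly generates data from the model and averages the OLS estimates:

```python
import random
import statistics

random.seed(0)

alpha_true, beta_true, sigma = 2.0, 3.0, 1.0
xs = [float(i) for i in range(1, 21)]  # fixed design points
x_bar = statistics.mean(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)

a_hats, b_hats = [], []
for _ in range(5_000):
    # Generate Y_i = alpha + beta*X_i + eps_i with eps_i ~ N(0, sigma^2).
    ys = [alpha_true + beta_true * x + random.gauss(0, sigma) for x in xs]
    y_bar = statistics.mean(ys)
    b_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a_hat = y_bar - b_hat * x_bar
    a_hats.append(a_hat)
    b_hats.append(b_hat)

# The averages of the estimates land very close to the true 2.0 and 3.0.
print(round(statistics.mean(a_hats), 2), round(statistics.mean(b_hats), 2))
```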

Hence, we have shown that OLS estimates are unbiased, which is one of the several reasons why they are used so much by statisticians. Moving on to variance:

**Variance of β-hat**

Using equation 1,

Var(β̂) = Var[Σ(Xᵢ − X̅)(Yᵢ − Y̅) / Σ(Xᵢ − X̅)²] … (Equation 9)

Now, a very important step:

Σ(Xᵢ − X̅)(Yᵢ − Y̅) = Σ(Xᵢ − X̅)Yᵢ − Y̅Σ(Xᵢ − X̅) … (Equation 10)

For any variable,

Σ(Xᵢ − X̅) = 0 … (Equation 11)

Thus, the second term of equation 10 gets cancelled, giving us:

Σ(Xᵢ − X̅)(Yᵢ − Y̅) = Σ(Xᵢ − X̅)Yᵢ

Substituting this in equation 9,

Var(β̂) = Var[Σ(Xᵢ − X̅)Yᵢ] / [Σ(Xᵢ − X̅)²]²

Note: The denominator of the expression is a constant, and therefore by property 3B, it gets squared when we take it out of the variance expression.

Substituting the value of Yᵢ from equation 2,

Var(β̂) = Var[Σ(Xᵢ − X̅)(α + βXᵢ + εᵢ)] / [Σ(Xᵢ − X̅)²]²

Let,

k = Σ(Xᵢ − X̅)(α + βXᵢ)

The above term is a constant. So, using property 1B, Var(k) = 0. Thus, we arrive at the following equation:

Var(β̂) = Var[Σ(Xᵢ − X̅)εᵢ] / [Σ(Xᵢ − X̅)²]²

We shall now use property 5B. In addition, we shall also use the assumption that Cov(εᵢ, εⱼ) = 0 (for i ≠ j). So, we get,

Var(β̂) = ΣVar[(Xᵢ − X̅)εᵢ] / [Σ(Xᵢ − X̅)²]²

Again, each (Xᵢ − X̅) is a constant, and by property 3B, it gets squared when we take it out of the variance:

Var(β̂) = Σ(Xᵢ − X̅)²Var(εᵢ) / [Σ(Xᵢ − X̅)²]²

By the constant variance assumption, Var(εᵢ) = σ² (a constant). So, we get:

Var(β̂) = σ²Σ(Xᵢ − X̅)² / [Σ(Xᵢ − X̅)²]² = σ² / Σ(Xᵢ − X̅)²

We also define:

Sₓₓ = Σ(Xᵢ − X̅)²

Finally, we arrive at:

Var(β̂) = σ² / Sₓₓ … (Equation 12)
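
This formula can be sanity-checked by simulation (my own sketch with made-up parameters, not part of the original derivation): the empirical variance of β-hat across many replications should match σ²/Sₓₓ.

```python
import random
import statistics

random.seed(1)

alpha_true, beta_true, sigma = 2.0, 3.0, 1.5
xs = [float(i) for i in range(1, 16)]  # fixed design points
x_bar = statistics.mean(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)

b_hats = []
for _ in range(20_000):
    ys = [alpha_true + beta_true * x + random.gauss(0, sigma) for x in xs]
    y_bar = statistics.mean(ys)
    b_hats.append(sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx)

theoretical = sigma ** 2 / sxx             # sigma^2 / S_xx
empirical = statistics.pvariance(b_hats)   # spread of beta-hat across replications
print(round(theoretical, 5), round(empirical, 5))  # the two agree closely
```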

Thus, after intensive mathematics, we got the variance of β-hat. Great feat! Now moving further:

**Variance of α-hat**

The variance of α-hat is defined as follows:

Var(α̂) = E[(α̂ − E(α̂))²] = E[(α̂ − α)²] … (Equation 13)

From equation 7,

α̂ − α = ε̅ − (β̂ − β)X̅

Substituting this expression in equation 13,

Var(α̂) = E[(ε̅ − (β̂ − β)X̅)²] = E[ε̅² − 2X̅ε̅(β̂ − β) + X̅²(β̂ − β)²]

Using property 2A, once again:

Var(α̂) = E(ε̅²) − 2X̅E[ε̅(β̂ − β)] + X̅²E[(β̂ − β)²] … (Equation 14)

Now, we’ll calculate E[ε-bar(β-hat — β)]. From equation 2,

From equation 11,

Also, by definition,

On substitution, we get,

Since,

From equation 11,

Substituting this in equation 14,

Var(α̂) = E(ε̅²) + X̅²E[(β̂ − β)²]

Using equation 12 and equation 15,

Var(α̂) = E(ε̅²) + X̅²σ²/Sₓₓ

Since E(ε̅) = 0, we have E(ε̅²) = Var(ε̅). Using property 3B,

E(ε̅²) = Var[(1/n)Σεᵢ] = (1/n²)Var(Σεᵢ)

The error terms ε₁, ε₂, …, εₙ are independent. Thus, using property 2B,

Var(Σεᵢ) = Var(ε₁) + Var(ε₂) + … + Var(εₙ)

Once again, by the constant variance assumption, Var(εᵢ) = σ² (a constant),

E(ε̅²) = (1/n²) · nσ² = σ²/n

Var(α̂) = σ²/n + X̅²σ²/Sₓₓ = σ²(1/n + X̅²/Sₓₓ)

And, that’s the expression we were trying to derive.

**Bonus Concept:** That was a very long derivation. Very tiring, I must say. Let's conclude it with a small and fun derivation: the covariance between α-hat and β-hat.

From equation 7,

α̂ − α = ε̅ − (β̂ − β)X̅

Using equation 16,

E[ε̅(β̂ − β)] = 0

By definition,

Cov(α̂, β̂) = E[(α̂ − α)(β̂ − β)] = E[ε̅(β̂ − β)] − X̅E[(β̂ − β)²] = −X̅E[(β̂ − β)²]

Using equation 12,

Cov(α̂, β̂) = −X̅σ²/Sₓₓ

Done!
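
One last simulation (my own sketch, made-up parameters again) verifies the covariance result, Cov(α-hat, β-hat) = −X̅σ²/Sₓₓ:

```python
import random
import statistics

random.seed(3)

alpha_true, beta_true, sigma = 2.0, 3.0, 1.0
xs = [float(i) for i in range(1, 11)]
x_bar = statistics.mean(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)

a_hats, b_hats = [], []
for _ in range(20_000):
    ys = [alpha_true + beta_true * x + random.gauss(0, sigma) for x in xs]
    y_bar = statistics.mean(ys)
    b_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b_hats.append(b_hat)
    a_hats.append(y_bar - b_hat * x_bar)

# Empirical covariance between the two estimators across replications.
ma, mb = statistics.mean(a_hats), statistics.mean(b_hats)
emp_cov = sum((a - ma) * (b - mb) for a, b in zip(a_hats, b_hats)) / len(a_hats)

theoretical = -x_bar * sigma ** 2 / sxx
print(round(theoretical, 4), round(emp_cov, 4))  # both negative and close together
```

The negative sign makes intuitive sense here: with a positive X̅, a replication that overestimates the slope tends to underestimate the intercept, and vice versa.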

**Conclusion**

The following table sums up all the main final results we derived:

| Quantity | Result |
| --- | --- |
| E(β̂) | β |
| E(α̂) | α |
| Var(β̂) | σ²/Sₓₓ |
| Var(α̂) | σ²(1/n + X̅²/Sₓₓ) |
| Cov(α̂, β̂) | −X̅σ²/Sₓₓ |

where Sₓₓ = Σ(Xᵢ − X̅)².