Understanding Ordinary Least Squares in Matrix Form with R

Bengi Koseoglu
5 min read · Mar 5, 2018


Linear regression is one of the most popular methods used in predictive analysis with continuous target variables, such as predicting household income or mortality rate. It models the relationship between variables by fitting a linear equation to the observed data while trying to minimize the difference between the values predicted by the model and the observed values. In other words, we want a model that fits our data as well as possible. I am not going to dive into linear regression or its model assumptions; rather, in this article I would like to talk about how the population parameters are estimated using matrix form. Nowadays we have tools and packages that do the job for us, which is great, but I believe it is also important to understand the math behind them, and that is why I want to write about it.

The statistical model of linear regression can be expressed with matrices. The target variable is an n×1 matrix, where n is the number of observations in our data. The independent variables (X's) form an n×k matrix, where k is the number of independent variables including the intercept. The betas form a k×1 matrix, and the error terms form an n×1 matrix.

Statistical Model in Matrix Form (Rosenfeld, 2013)
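Written out explicitly with the dimensions described above, the model y = Xβ + e has the following structure (sketched here in LaTeX):

$$
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{n \times 1}
=
\underbrace{\begin{bmatrix} 1 & x_{11} & \cdots & x_{1,k-1} \\ 1 & x_{21} & \cdots & x_{2,k-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{n,k-1} \end{bmatrix}}_{n \times k}
\underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{k-1} \end{bmatrix}}_{k \times 1}
+
\underbrace{\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}}_{n \times 1}
$$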

Here, the first column of X consists of 1's. By the rules of matrix multiplication, each row of X is multiplied element by element with the column of betas and the products are summed, so the first element of every row needs to be 1: it is the element multiplied with the intercept β₀, and β₀ is a constant that is not affected by the independent variables.
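To see why that column of 1's does the job, here is a tiny made-up example in R (the numbers are arbitrary and only illustrate the multiplication):

# made-up design matrix: a column of 1's plus one predictor, for 3 observations
Xd = cbind(1, c(2, 5, 7))
bd = c(10, 3)      # intercept = 10, slope = 3
Xd %*% bd          # each row gives 1*10 + x*3, so the intercept enters every prediction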

Now, as mentioned before, the goal of linear regression is to minimize the errors. What are errors? Errors are the differences between the predicted and actual values, and the quantity we minimize is the sum of squared errors ∑e², where e is (y - Xβ). We know that the errors form an n×1 matrix. So how do we calculate ∑e²? We cannot simply multiply e by e, because an n×1 matrix cannot be multiplied by another n×1 matrix. But we can multiply the transpose of e by e, since the transpose of e has dimensions 1×n while e is n×1. That is exactly what we are going to do. The transpose of e is written as e'.

∑e² = e'e
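As a quick numerical check of this identity in R, with an arbitrary made-up error vector:

e = matrix(c(1, -2, 3))   # an arbitrary 3x1 vector of errors
t(e) %*% e                # 1x1 matrix containing 1 + 4 + 9 = 14
sum(e^2)                  # the same sum of squared errors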

Now, we know that e is (y - Xβ), so let's put this into the equation.

∑e² = e'e = (y - Xβ)'(y - Xβ)

Let's expand the right-hand side of the equation, remembering that the transpose of a product reverses the order, so (y - Xβ)' = y' - β'X':

∑e² = e'e = (y' - β'X')(y - Xβ)

∑e² = e'e = y'y - β'X'y - y'Xβ + β'X'Xβ

Let's recall a matrix rule: "the transpose of a scalar is the scalar." A scalar can be treated as a 1×1 matrix, whose transpose is itself, and y'Xβ is a 1×1 matrix. How? y' is a 1×n matrix, X is an n×k matrix, and β is a k×1 matrix. If we multiply y' by X, we end up with a 1×k matrix; if we then multiply y'X by β, we get a 1×1 matrix. Since the transpose of a 1×1 matrix is itself, we can rewrite it as follows: y'Xβ = (y'Xβ)' = β'X'y.

∑e² = e'e = y'y - β'X'y - β'X'y + β'X'Xβ

∑e² = e'e = y'y - 2β'X'y + β'X'Xβ
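Here is a small R sketch (with made-up values, not the article's data) confirming that y'Xβ and β'X'y really are the same 1×1 scalar:

# made-up example: n = 3 observations, k = 2 (intercept + one predictor)
y0 = matrix(c(2, 4, 6))
X0 = matrix(c(1, 1, 1, 5, 7, 9), nrow = 3)
b0 = matrix(c(0.5, 1.5))
t(y0) %*% X0 %*% b0       # a 1x1 matrix
t(b0) %*% t(X0) %*% y0    # its transpose: the same 1x1 value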

So far so good; now comes the differentiation part. To find the β estimators that minimize the sum of squared residuals, we need to take the derivative of this expression with respect to β. We cannot take an ordinary derivative, since β is a matrix; we need matrix derivatives. The two rules we need are ∂(β'X'y)/∂β = X'y and ∂(β'X'Xβ)/∂β = 2X'Xβ (y'y does not depend on β, so it drops out). Setting the derivative equal to zero gives:

−2X'y + 2X'Xβ = 0

Here we would like to leave β alone on one side. I can rewrite the formula as follows, but the problem is that in matrix algebra there is no division. Therefore, in order to isolate β, I need to take the inverse of (X'X) and multiply both sides by it, because when you multiply a matrix by its inverse you get the identity matrix. In the end, the left-hand side of the equation is left with only β.

X'Xβ = X' y

(X'X)ˉ¹X'Xβ = (X'X)ˉ¹X'y

β = (X'X)ˉ¹X'y

And this is the result! You just did it manually. The least squares coefficient estimators are found by taking the inverse of (X transpose times X) and multiplying it by X transpose times y. Not so hard, is it? Let's do this with R.

First, let's create a small data set in R with 8 observations, two independent variables X₁ and X₂, and an intercept. After creating the X₁ and X₂ variables, we bind them into a matrix together with a column of 1's, since the formula we have been analysing works on matrices.

# target variable as an 8x1 matrix
y=matrix(c(52,45,58,50,61,50,62,49))
# two independent variables
X1=c(4,1,-1,0,5,5,6,4)
X2=c(1,1,0,0,1,1,0,0)
# design matrix: a column of 1's for the intercept, followed by X1 and X2
X=matrix(c(rep(1, 8),X1,X2),nrow=8)

We want to calculate (X'X)ˉ¹X'y and find the β estimates. In R, you take the transpose of a matrix with t(X), do matrix multiplication with %*%, and take the inverse with the solve function.

solve(t(X) %*% X) %*% t(X) %*% y

Let's check our answer using the built-in linear regression function called lm. For the lm function, we don't use the design matrix as input; we specify the columns separately in a formula, as shown below.

lm(y~X1+X2)

As can be seen, our matrix formula works, and we get the same coefficient estimates.
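If you want to compare the two results programmatically rather than by eye, one possible check (beta_hat and fit are just names introduced here; coef and all.equal are base R functions) is:

beta_hat = solve(t(X) %*% X) %*% t(X) %*% y   # manual OLS estimates
fit = lm(y ~ X1 + X2)                         # built-in estimates
all.equal(as.numeric(beta_hat), as.numeric(coef(fit)))   # TRUE if they agree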

Lastly, we can use our (X'X)ˉ¹X'y formula to test the no-perfect-collinearity assumption of linear regression. Let's examine an example of perfect multicollinearity by assuming that X₂ is exactly twice X₁.

# same target variable as before
y=matrix(c(52,45,58,50,61,50,62,49))
X1=c(4,1,-1,0,5,5,6,4)
# X2_2 is exactly twice X1, so the two predictors are perfectly collinear
X2_2=X1*2
X=matrix(c(rep(1, 8),X1,X2_2),nrow=8)

Let’s apply our function and see the results.

model2=solve(t(X) %*% X) %*% t(X) %*% y

I got an error. The error message tells us that our matrix is singular and cannot be inverted. If our independent variables are perfectly correlated, (X'X)ˉ¹ cannot be calculated, so the formula breaks down exactly when the no-perfect-collinearity assumption is violated.
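If you want to detect this situation before (or instead of) hitting the error, one option is to check the rank of X'X or to wrap the computation in tryCatch (a sketch using base R's qr and tryCatch):

# the third column of X is exactly twice the second, so X'X has rank 2 instead of 3
qr(t(X) %*% X)$rank
# capture solve's singularity error as a message instead of stopping the script
tryCatch(solve(t(X) %*% X) %*% t(X) %*% y,
         error = function(e) conditionMessage(e))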

I understand OLS estimation better this way, and I hope it helps you as well.

References

Rosenfeld, M. (2013, October 22). [Digital image]. Retrieved from https://web.stanford.edu/~mrosenfe/soc_meth_proj3/matrix_OLS_NYU_notes.pdf
