Multiple Linear Regression from scratch using the Ordinary Least Squares method

Khush Joshi
5 min read · Mar 14, 2024


What is Multiple Regression?

Multiple regression is a regression algorithm in which we have multiple input columns (the columns whose data we use for prediction) and a single output column (the column whose values we predict).

The dataframe that we are going to use in this blog is ‘Startups,’ and it has four columns. Among them, ‘R&D Spend,’ ‘Administration,’ and ‘Marketing Spend’ are input columns, while ‘Profit’ is our output column.
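Before diving into the math, here is a minimal sketch of how the data could be loaded and split. The file name ‘50_Startups.csv’ and the 80/20 train/test split are assumptions for illustration, not details from the original dataset description; adjust them to match your copy of the data.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed file name; replace with the path to your copy of the Startups data
df = pd.read_csv("50_Startups.csv")

x = df[["R&D Spend", "Administration", "Marketing Spend"]].values
y = df["Profit"].values

# Hold out 20% of the rows for testing (assumed split ratio)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)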

The mathematics behind Multiple Regression

The equation of the hyperplane is:

y = B0 + B1*X1 + B2*X2 + … + Bm*Xm

Here, the X values are the input columns, and each B (beta) represents the rate of change of the output with respect to its corresponding X; B0 is the intercept.

Readers can think of each B as the weight (importance) of a particular column. If B1 is larger than B2, the output depends more strongly on X1 than on X2 (assuming the columns are on comparable scales).

For general purposes, let’s take a dataset with n rows and m input columns

So for the i-th row, our prediction equation looks like:

ŷi = B0 + B1*xi1 + B2*xi2 + … + Bm*xim

Here, ŷ (denoted as yi from here on) is the predicted output for the respective row.
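As a quick sanity check, this is what evaluating that equation looks like in code for a single row. The coefficient and feature values below are made up purely for illustration; they are not fitted values from the Startups data.

# Hypothetical coefficients (B0 is the intercept); not fitted values
b0, b1, b2, b3 = 50000.0, 0.8, 0.05, 0.02

# One hypothetical row: R&D Spend, Administration, Marketing Spend
x1, x2, x3 = 150000.0, 120000.0, 300000.0

y_hat = b0 + b1 * x1 + b2 * x2 + b3 * x3
print(y_hat)  # predicted Profit for this row under the assumed coefficients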

To achieve the optimum result, we aim to position our hyperplane as close as possible to all the points in the coordinate system. To determine the best-fit hyperplane, we need to find the values of B.

We have n such equations (one per row), so we can do one of two things:

1. Solve for the B values by working through each equation individually.

2. Use matrices, which are the standard tool for solving systems of linear equations.

We will go with the second approach.

So now we can write all n equations at once in matrix form:

Yi = X B

here,

1. Yi is the n x 1 matrix (column vector) of predicted outputs.

2. X is the n x (m+1) matrix of input values, where each row holds the datapoints of one row of the dataset and the first column is all 1s (to account for the intercept B0).

3. B is the (m+1) x 1 matrix of the respective B (beta) values, with B0 on top.
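Here is a small sketch of this matrix form in NumPy. The values of x and B below are arbitrary and only illustrate the shapes involved:

import numpy as np

# 4 rows, 2 input columns (arbitrary example values)
x = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])

# Prepend a column of 1s so that B0 (the intercept) is handled by the same product
X = np.insert(x, 0, 1, axis=1)          # shape (4, 3)

B = np.array([[10.0], [2.0], [-1.0]])   # shape (3, 1): B0, B1, B2 (hypothetical)

Yi = X @ B                              # shape (4, 1): one prediction per row
print(Yi)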

Finding Error

e = (Y − Yi)

Here, the error vector e is the actual output column (Y) minus the predicted values (Yi).

The issue with ‘e’ is that it contains both positive and negative values. When we add them to find the total error, the positive and negative values neutralize each other. Therefore, we either need to square all the error values or take the absolute value of each ‘e’ value.

We take the square of all the error values (rather than the absolute value) because:

  1. we want to penalize outliers more heavily
  2. the |x| curve is not differentiable at x = 0

The short snippet below illustrates why the raw errors cannot simply be summed.
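This is a minimal sketch, using made-up numbers, of how positive and negative errors cancel out and how squaring avoids that:

import numpy as np

# Hypothetical errors: two overestimates and two underestimates
e = np.array([5.0, -5.0, 3.0, -3.0])

print(e.sum())         # 0.0  -> looks like a perfect model, which is misleading
print((e ** 2).sum())  # 68.0 -> squared errors cannot cancel, so the total is meaningful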

The total error is:

E = e^T e = (Y − Yi)^T (Y − Yi)

where e^T is the transpose of e. Expanding the product gives:

E = Y^T Y − Y^T Yi − Yi^T Y + Yi^T Yi

The two middle terms are actually equal:

(Y^T Yi) = (Yi^T Y)

Proof:

Y^T has shape 1 x n and Yi has shape n x 1, so (Y^T Yi) has shape 1 x 1. A 1 x 1 matrix is a scalar, and a scalar is equal to its own transpose.

Also, by the transpose-of-a-product rule,

(Y^T Yi)^T = (Yi^T Y)

hence we can say (Y^T Yi) = (Yi^T Y).
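A quick NumPy check of this identity on random vectors (purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
Y  = rng.normal(size=(5, 1))   # stand-in for the actual values
Yi = rng.normal(size=(5, 1))   # stand-in for the predicted values

# Both products are 1x1 matrices and they are equal
print(Y.T @ Yi)
print(Yi.T @ Y)
print(np.allclose(Y.T @ Yi, Yi.T @ Y))  # True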

So now, substituting Yi = XB, the value of E becomes:

E = Y^T Y − 2 Yi^T Y + Yi^T Yi = Y^T Y − 2 B^T X^T Y + B^T X^T X B

We need to minimize the value of E to obtain the best-fit hyperplane. Since E is dependent on B, we can use differentiation to find the value of B for which E is minimum

The derivative of (XB)^T (XB) = B^T (X^T X) B with respect to B is 2 B^T (X^T X), according to the rule that the derivative of x^T A x with respect to x is 2 x^T A, where A is a symmetric matrix. Here A = X^T X.

We can prove that (X^T X) is a symmetric matrix.

Proof: using the transpose-of-a-product rule,

(X^T X)^T = X^T (X^T)^T = X^T X

Hence (X^T X) is symmetric, so the differentiation rule above applies.

Setting the derivative of E with respect to B to zero gives the normal equation X^T X B = X^T Y, so the final value of B is:

B = (X^T X)^-1 X^T Y

From this equation, we can compute the values of B and plug them into the equation of the hyperplane to get the best-fit hyperplane.
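As a sanity check before the class implementation below, here is a minimal sketch of the closed-form formula in NumPy, compared against np.linalg.lstsq. The matrices X and Y here are just illustrative random data:

import numpy as np

rng = np.random.default_rng(1)
X = np.insert(rng.normal(size=(50, 3)), 0, 1, axis=1)  # 50 rows, 3 features + bias column
Y = rng.normal(size=(50, 1))

# Closed-form OLS solution: B = (X^T X)^-1 X^T Y
B_closed = np.linalg.inv(X.T @ X) @ X.T @ Y

# NumPy's least-squares solver should agree (and is more numerically stable)
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(B_closed, B_lstsq))  # True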

Multiple Linear Regression Implementation in Python

import numpy as np

class myLr:
    def __init__(self):
        self.coeff = None        # B1 ... Bm
        self.intercept = None    # B0

    def fit(self, x_train, y_train):
        # Prepend a column of 1s so the intercept is estimated along with the coefficients
        x_train = np.insert(x_train, 0, 1, axis=1)
        # B = (X^T X)^-1 X^T Y
        p1 = np.linalg.inv(np.dot(x_train.transpose(), x_train))
        p2 = np.dot(x_train.transpose(), y_train)
        B = np.dot(p1, p2)
        self.intercept = B[0]
        self.coeff = B[1:]

    def predict(self, values):
        output = np.dot(values, self.coeff) + self.intercept
        return output

mylr = myLr()
mylr.fit(x_train, y_train)
print(mylr.predict(x_test))

The fit function is the practical implementation of the formula for B that we derived above; predict then applies the hyperplane equation to new rows.
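One practical note, not part of the derivation above: np.linalg.inv can become numerically unstable when the input columns are highly correlated. A common alternative is the Moore-Penrose pseudo-inverse, which gives the same B in the well-behaved case. A minimal sketch of that variant (the function name fit_via_pinv is hypothetical):

import numpy as np

def fit_via_pinv(x_train, y_train):
    # Same closed-form OLS, but using the pseudo-inverse of the design matrix
    X = np.insert(x_train, 0, 1, axis=1)
    B = np.linalg.pinv(X) @ y_train
    return B[0], B[1:]   # intercept, coefficients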

Comparing Our Linear Regression and Scikit-learn Linear Regression

We will use mean_absolute_error, mean_squared_error, and r2_score to compare our linear regression against Scikit-learn's LinearRegression.

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Fit Scikit-learn's model on the same training data for comparison
lr = LinearRegression()
lr.fit(x_train, y_train)

val1 = []
val2 = []
val3 = []

# Mean absolute error for both models
val1.append(mean_absolute_error(y_test, lr.predict(x_test)))
val1.append(mean_absolute_error(y_test, mylr.predict(x_test)))

# Mean squared error for both models
val2.append(mean_squared_error(y_test, lr.predict(x_test)))
val2.append(mean_squared_error(y_test, mylr.predict(x_test)))

# R^2 score for both models
val3.append(r2_score(y_test, lr.predict(x_test)))
val3.append(r2_score(y_test, mylr.predict(x_test)))

values = {
    "mean_absolute_error": val1,
    "mean_squared_error": val2,
    "r2_score": val3
}

pd.DataFrame(values, index=["LinearRegression", "MyLinearRegression"])

From the resulting table, we can conclude that both models produce essentially the same results.

Data Visualization of the dataset with the plane drawn using our Linear Regression

import plotly.express as px
import plotly.graph_objects as go

# Refit the model on two input columns only, so the fitted plane can be drawn in 3D
inputs = np.hstack((df["R&D Spend"].values.reshape(-1, 1), df["Administration"].values.reshape(-1, 1)))
mylr.fit(inputs, df["Profit"].values.reshape(-1, 1))

# Build a 40 x 40 grid over the first 40 rows of the two input columns
xGrid, yGrid = np.meshgrid(df["R&D Spend"][:40].values, df["Administration"][:40].values)

# Flatten the grid into (1600, 2) points and predict Profit for each of them
final = np.vstack((xGrid.ravel().reshape(1, 1600), yGrid.ravel().reshape(1, 1600))).T
z = mylr.predict(final).reshape(40, 40)

# Scatter the first 40 datapoints and overlay the predicted plane
fig = px.scatter_3d(df.iloc[:40, :], x="R&D Spend", y="Administration", z="Profit")
fig.add_trace(go.Surface(x=df["R&D Spend"][:40].values, y=df["Administration"][:40].values, z=z))
fig.show()

The plane in the resulting figure is generated from our model's predictions; it is the best-fit hyperplane passing through our datapoints.
