ML From Scratch — Linear Regression

Vivian Ouyang
Aug 12, 2021


I will write a series of short articles illustrating how to implement ML algorithms from scratch, along with a brief look at each algorithm’s pros and cons.

Pros:

  • Runs very fast
  • Has a solid statistical foundation
  • Easy to explain to audiences without a technical background, since they can see the coefficient for each feature.

Cons:

  • Requires spending a lot of time on feature engineering
  • Relies on statistical assumptions: a linear relationship, independence of residuals, homoscedasticity of residuals, and normality of residuals.
  • Features need to be scaled to speed up the computation.

Introduction to the Theory

Let’s assume we have m samples and n features.

  • Expression

We can start with the following formula:
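
ŷ = w1*x1 + w2*x2 + … + wn*xn + b

Here ŷ denotes the predicted value.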

Each wi is the coefficient for the corresponding feature, x1, x2, …, xn are the feature values, and b is the bias. We can rewrite b as w0*x0, where w0 = b and x0 is always 1. Thus the formula can be transformed to:
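
ŷ = w0*x0 + w1*x1 + w2*x2 + … + wn*xn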

For each sample i, we can write the vector formula
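
ŷi = w^T * xi

where w = (w0, w1, …, wn)^T and xi is the feature vector of sample i with the constant x0 = 1 included.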

The scalar expansion is written as:
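
ŷi = w0*xi0 + w1*xi1 + w2*xi2 + … + wn*xin

where xij is the value of feature j for sample i and xi0 = 1.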

The matrix multiplication is written as:
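
ŷi = [xi0, xi1, …, xin] * [w0, w1, …, wn]^T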

Then we extend to all m samples.
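
ŷ = X*w

where X is the m×(n+1) matrix whose i-th row holds the feature values of sample i (including x0 = 1), w is the (n+1)×1 coefficient vector, and ŷ is the m×1 vector of predictions.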

  • Loss and cost function

For regression, we use the squared loss function
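
loss(yi, ŷi) = (ŷi − yi)^2

where yi is the true target and ŷi is the prediction for sample i.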

The cost function is
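
J(w) = 1/(2m) * Σ (ŷi − yi)^2, summed over i = 1, …, m

The 1/2 factor is a convention that cancels when we take the derivative.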

We can further write it in matrix form as
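
J(w) = 1/(2m) * (X*w − y)^T * (X*w − y)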

This is a convex function, so gradient descent will converge to the global minimum.

  • Gradient Descent

We can write the gradient descent update in matrix form
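
w := w − (alpha/m) * X^T * (X*w − y)

This update is repeated for every epoch, where alpha is the learning rate.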

Implementation:

We use the famous Boston housing price dataset from sklearn.

  • Step 1: Get the train and test datasets

In this implementation, we need to import MinMaxScaler to scale the features and train_test_split to create the train/test datasets.
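
A sketch of this step (test_size and random_state below are illustrative choices, and load_boston is only available in older scikit-learn releases):

import numpy as np
from sklearn.datasets import load_boston
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# load the data and scale every feature to [0, 1]
boston = load_boston()
X = MinMaxScaler().fit_transform(boston.data)
y = boston.target

# hold out 20% of the 506 samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)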

Let’s look at the dimensions of the train dataset
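
print(X_train.shape)  # (404, 13)
print(y_train.shape)  # (404,) — conceptually a 404×1 column vector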

You can consider X_train’s shape as 404×13, and y_train as a column vector of 404 target values (404×1).

  • Step 2: Define the LinearRegression class and set initial values

To simplify the implementation, we only set three main parameters: alpha is the learning rate, epoch is the number of gradient descent iterations, and fit_bias corresponds to the bias b in the formula above. cost_record is a list that stores the cost after every iteration. Our goal is to find the parameters with the smallest cost.
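
A minimal sketch of the constructor, assuming fit_bias is a flag that controls whether the constant column x0 = 1 is added for the bias (the notebook linked below has the full version):

class LinearRegression:
    def __init__(self, alpha=0.01, epoch=1000, fit_bias=True):
        self.alpha = alpha        # learning rate
        self.epoch = epoch        # number of gradient descent iterations
        self.fit_bias = fit_bias  # whether to include the bias term b (as x0 = 1)
        self.cost_record = []     # cost value recorded after every iteration
        self.w = None             # coefficient vector, learned in fit()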

  • Step 3: Define the predict() function

We write the predict function as the dot product of the feature matrix X and the coefficient vector w.
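
Continuing the class above, a sketch of predict(); here I assume the column of 1s for the bias is added as the first column, matching x0 in the formulas above (it only needs to be consistent with fit()):

    def predict(self, X):
        X = np.asarray(X)
        if self.fit_bias:
            # prepend x0 = 1 so the bias w0 is handled by the same dot product
            X = np.hstack([np.ones((X.shape[0], 1)), X])
        return np.dot(X, self.w)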

  • Step 4: Define the fit() function

This is the most difficult part. We need to use gradient descent to update w. One tricky part is the 1-D array: applying the transpose .T to a 1-D array does not change its shape. One example:
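
a = np.array([1.0, 2.0, 3.0])
print(a.shape)    # (3,)
print(a.T.shape)  # (3,) — transposing a 1-D array leaves its shape unchanged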

Below is the fit function
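
A sketch of fit() consistent with the update line discussed below; the bias column handling mirrors predict() above, and the cost uses the 1/(2m) factor from the theory section:

    def fit(self, X_train, y_train):
        X_train = np.asarray(X_train)
        y_train = np.asarray(y_train).ravel()  # keep y as a 1-D array
        if self.fit_bias:
            X_train = np.hstack([np.ones((X_train.shape[0], 1)), X_train])
        m = X_train.shape[0]
        self.w = np.zeros(X_train.shape[1])    # initialize all coefficients to 0
        for _ in range(self.epoch):
            y_pred = np.dot(X_train, self.w)
            # gradient of J(w) = 1/(2m) * sum((y_pred - y)^2) is (1/m) * X^T (y_pred - y)
            self.w -= (self.alpha/m * np.dot((y_pred-y_train).T,X_train)).T
            self.cost_record.append(np.sum((y_pred - y_train) ** 2) / (2 * m))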

The tricky part happens with self.w. I add .T in

self.w -= (self.alpha/m * np.dot((y_pred-y_train).T,X_train)).T

But you can also write it as

self.w -= self.alpha/m * np.dot((y_pred-y_train).T,X_train)

Both are the same: the result of np.dot here is a 1-D array, so the extra transpose does not change anything.

  • Step 5: Define the save and load model functions (sketched below)
  • Step 6: Show the final output
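
For Step 5, one simple option is to persist the fitted model with pickle (this assumes import pickle at the top of the file; save_model and load_model are illustrative names, not necessarily the ones used in the notebook):

    def save_model(self, path):
        # write the whole fitted estimator to disk
        with open(path, 'wb') as f:
            pickle.dump(self, f)

    @staticmethod
    def load_model(path):
        # read a previously saved estimator back from disk
        with open(path, 'rb') as f:
            return pickle.load(f)

And for Step 6, a possible way to train the model and show the final output (the alpha and epoch values are illustrative):

model = LinearRegression(alpha=0.1, epoch=1000)
model.fit(X_train, y_train)
y_test_pred = model.predict(X_test)
print(model.w)                # learned coefficients (w0 = bias first, then one per feature)
print(model.cost_record[-1])  # final training cost
print(y_test_pred[:5])        # predictions for the first few test samples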

We can also plot the cost function over the training epochs.
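
For example, with matplotlib (using the model trained above):

import matplotlib.pyplot as plt

plt.plot(model.cost_record)
plt.xlabel('epoch')
plt.ylabel('cost')
plt.title('Training cost per epoch')
plt.show()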

GitHub Link:

https://github.com/oyww710/ML_Scratch_Practice/blob/main/linear_regression.ipynb
