Writing Multivariate Linear Regression from Scratch

Anchit Jain · Data Science 101 · May 7, 2018

One of the toughest parts of every data scientist’s journey is to really understand what happens under the hood of popular libraries like scikit-learn when they implement various machine learning algorithms.

This is part of my multi-post series on implementing various machine learning algorithms from scratch, without using machine learning libraries like scikit-learn, PySpark, etc.

The first algorithm I am going to discuss is the most basic one: multivariate linear regression.

What is linear regression? Linear regression is a simple prediction technique that predicts a dependent variable (Y) using its linear relationship to one or more independent variables (X).

For example, say I have data on the number of rooms in each house, and using that single input I want to predict the price of a house. That is not easy, but by the end of this article you will surely have a handle on it. When I have many more features (size of the bedrooms, number of rooms, distance from the city centre, etc.) with which to predict the price of the house, I call it multivariate linear regression.

Now let me show you a glimpse of the data using a scatter plot. This graph will give you a basic sense of what the data looks like when plotted.

In the above graph you can see blue dots, where each dot represents a value with respect to the X and Y axes. I want to draw a line on the graph in such a way that it passes as close as possible to all of the blue dots. Speaking technically, I want to find the slope of a line such that the vertical distance between each dot and the line is as small as possible. This is called minimising the root-mean-square error.

Hopefully I have been clear so far. Moving ahead with my current data set: I have a few columns named “size of room”, “number of bedrooms”, and “price”. Based on the size and the number of bedrooms, I want to predict the price of the house. For your ease, I have broken the entire process into steps.

STEP 1. Reading and Normalising the data.

Before we start on any problem, we need to read and analyse the data. At first glance this step seems very easy, but it can be very painful if not taken care of.

Why do we need to normalise the data? Because some of our features might lie in the range 0–1 and others in 0–1000. Feeding in the data as-is can lead to a poor fit, since features with larger ranges dominate the gradient updates.

Reading and Normalising the data
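Since the original code gist is not embedded here, below is a minimal sketch of this step. The file name data.csv and the column names are assumptions on my part; adjust them to your own data set.

```python
import numpy as np
import pandas as pd

# Read the data set (the file name and column names here are assumptions).
data = pd.read_csv("data.csv", names=["size", "bedrooms", "price"])

# Mean normalisation: centre each column and divide by its standard deviation,
# so every feature ends up on a comparable scale.
data = (data - data.mean()) / data.std()

# Split into inputs X and target y, with a leading column of ones
# so the intercept can be learned as just another parameter.
X = np.c_[np.ones(len(data)), data[["size", "bedrooms"]].values]
y = data["price"].values
```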

Normalise: to rescale the data so that the features are comparable to one another.

Before Normalising
After Normalising

As you can see, the data is now scaled to a limited range, which makes it easy to visualise on a scatter plot.

STEP 2. Setting the hyper-parameters.

Before we jump into training, we need to understand the role of hyper-parameters in our model. They matter when we tune the model to minimise the cost function. Here our model is nothing but the mathematical equation of a straight line, y = mx + c, where x is the given set of inputs, m is the slope of the line, c is the intercept, and y is the output (which is predicted).

The learning rate and the number of iterations are the hyper-parameters that play a vital role in tuning our model. The learning rate controls how much we adjust the weights of our model with respect to the loss gradient, and the number of iterations is how many times we repeat that adjustment.
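As a concrete example, they might be set in code like this; the particular values below are assumptions of mine, not the ones from the original gist:

```python
# Hyper-parameters (example values; tune these for your own data).
learning_rate = 0.01   # alpha: the step size of each gradient update
iterations = 100       # how many passes of tuning we run
```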

STEP 3. Defining the hypothesis.

In linear regression, we have the training set and the hypothesis. We already have the training set from above, and our hypothesis will be:

hθ(x) = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ (equivalent to y = mx + c)

where the θ’s are the parameters and hθ(x) is the predicted y.
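In vectorised form this hypothesis is just a matrix product. A minimal sketch, assuming X already carries the leading column of ones added earlier:

```python
# One parameter per column of X: the intercept plus one weight per feature.
theta = np.zeros(X.shape[1])

# The hypothesis h(x) for every training example at once:
# each prediction is theta_0 + theta_1 * x_1 + theta_2 * x_2.
predictions = X @ theta
```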

STEP 4. Calculating the cost function.

The objective of the machine learning exercise is to find the values of these θ’s so that the function h shown above stays close to the actual y values of the training examples. Speaking in mathematical terms, we want to minimise the squared difference between h(x) and the corresponding value of y. We will call this our cost function.

In simple words, it is a function that assigns a cost wherever the model deviates from the observed data. In this case, our cost is the sum of squared errors. The goal of any supervised learning exercise is to minimise whichever cost we choose. Our cost function can be written as the equation below, and this is how we calculate it.

J(θ) = (1/2m) · Σᵢ₌₁…ₘ ( hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾ )²

Cost function equation

This equation is nothing but the sum of the squared differences between the predicted y and the actual y, divided by twice the number of training examples, m. From the above equation, our goal is to minimise J.

Plotting J against the number of iterations will give you a clearer understanding of this function.

Cost (squared error) vs. number of iterations

It can easily be seen from the above graph that the cost falls steeply during the first couple of iterations and is already close to its minimum by around iteration 2, so most of the fitting happens very early in training.

Let me quickly summarise what we have learnt so far: we have read and normalised the data, set the hyper-parameters, and defined the hypothesis and the cost function.

Now let us see what this cost function looks like in code.

Cost function
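The original gist is not shown here, so below is a minimal NumPy sketch of the cost function. The name compute_cost and its signature are my own, not necessarily the author’s:

```python
def compute_cost(X, y, theta):
    """Half of the mean squared error: J(theta) as defined above."""
    m = len(y)              # number of training examples
    errors = X @ theta - y  # h(x) - y for every example at once
    return (errors ** 2).sum() / (2 * m)
```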

STEP 5. Gradient Descent.

What does it mean, and where does it come from? As you can see from the cost function graph, our ultimate goal is to move towards the bottom-most point of the graph. A way to achieve this is the gradient descent algorithm. As its name suggests, we iterate the procedure below until convergence.

Repeat until convergence: θⱼ := θⱼ − α · (1/m) · Σᵢ₌₁…ₘ ( hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) · xⱼ⁽ⁱ⁾, updating every θⱼ simultaneously

Gradient Descent Algorithm

Here α is the learning rate, which we multiply by the derivative, or gradient, of J. The gradient descent method is not confined to the linear regression model; it can be used in any model where we need an iterative optimisation algorithm to find the minimum of a function.

Gradient Descent function
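Again, the original gist is not embedded here; this is a sketch of batch gradient descent under the same assumptions as before, reusing the compute_cost helper defined above (the function and variable names are mine):

```python
def gradient_descent(X, y, theta, learning_rate, iterations):
    """Run batch gradient descent, recording the cost at every step."""
    m = len(y)
    cost_history = []
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m      # dJ/dtheta for all j at once
        theta = theta - learning_rate * gradient  # simultaneous update of every theta_j
        cost_history.append(compute_cost(X, y, theta))
    return theta, cost_history
```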

Let’s do our obligatory analysis of the above code.

Gradient Descent

From the above graph: our aim is to start at some initial point and keep iterating in such a way that we finally land on the minimum point of the graph. This is achieved by tuning our model’s learning rate and number of iterations.

So before I wind up, let us summarise what we have learnt so far.

  1. Reading and analysing the data set.
  2. Normalising the data set.
  3. Training our model on the normalised data.
  4. Calculating the cost function.
  5. Minimising the cost function with gradient descent.
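Putting the sketches above together, a hypothetical end-to-end run might look like this (the hyper-parameter values are illustrative):

```python
theta = np.zeros(X.shape[1])
theta, cost_history = gradient_descent(X, y, theta,
                                       learning_rate=0.01, iterations=100)

print("Learned parameters:", theta)    # theta_0, theta_1, theta_2
print("Final cost:", cost_history[-1])
```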

You can access the complete code and the data set here.

Thank you for your patience… Claps (echoing)!
