Linear Regression from Scratch

Alekhyo Banerjee · Analytics Vidhya · Jul 27, 2020 · 4 min read

Linear Regression needs no introduction. It’s the ‘Hello World’ of the Machine Learning world. Let’s dive into the use cases of Linear Regression and the mathematical intuition behind it.

We are in a world driven by data. As commonly said, ‘Data is the new oil’. One of the most important types of data analysis is regression. Regression aims to build a mathematical model that can be used to predict the values of a dependent variable based upon the values of an independent variable. If you have ever taken an introductory statistics course in college, likely the final topic you covered was linear regression.

Regression is a fundamental problem in machine learning, and regression problems appear in a diverse range of research areas and applications, including time-series analysis, control and robotics, optimization, and deep learning.

Linear regression is one of the simplest supervised learning algorithms in our toolkit. Linear models make a prediction using a linear function of the input features. It is a common and useful method of making predictions when the target is a quantitative value (e.g., home price, salary).

There are two types of linear regression, simple (univariate) linear regression and multiple (multivariate) linear regression:

  1. Simple (univariate) linear regression: a single independent variable is used to predict the value of a dependent variable.
  2. Multiple (multivariate) linear regression: two or more independent variables are used to predict the value of a dependent variable.

The only difference lies in the number of independent variables. In both cases, there is only a single dependent variable. We will focus on univariate linear regression for the sake of understanding and simplicity.

Which columns should be treated as independent and which as dependent?

The independent variables of a dataset are not predicted by the model; they are the inputs we provide. The dependent variable, also known as the ‘target’, is the observation we want to predict, and its value depends on the independent variables. Let’s understand this with an example.

Here, the Experience column is the independent variable and the Salary column is the dependent variable.

Let’s visualise the data by plotting it on a graph.
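As a stand-in for the table and plot, here is a minimal sketch with a small hypothetical Experience/Salary dataset (the numbers are invented purely for illustration) drawn as a scatter plot with matplotlib:

```python
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration:
# years of experience (independent) vs. salary in thousands (dependent).
experience = [1.1, 2.0, 3.2, 4.5, 5.1, 6.8, 8.2, 9.5]
salary = [39.3, 43.5, 57.0, 61.1, 66.0, 83.1, 98.2, 112.6]

plt.scatter(experience, salary)
plt.xlabel("Experience (years)")
plt.ylabel("Salary (thousands)")
plt.title("Experience vs. Salary")
plt.show()
```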

If we can find a best-fitting line y = f(x), we can predict the value of y corresponding to any value of x.

The equation of a straight line is given by:

y = m*x + b,

where m is the slope and b is the intercept.

You might be wondering: many different lines can be fitted for different values of the slope and intercept, so how do we choose ‘the one’ line among them all?

To tackle this uncertainty, we introduce the concept of the sum of squared errors, or sum of squared residuals, given by the formula

SSE = Σ (yᵢ − ŷᵢ)²

The residual of a point is the difference between the observed value of the dependent variable (y) and the predicted value (ŷ). An observation above the predicted value gives a positive residual and one below gives a negative residual, so we sum the squares of the residuals: squaring stops the signs from cancelling out and penalises large residuals more heavily.

The line with the minimum sum of squared residuals is the BEST FIT LINE.
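As a rough sketch, reusing the hypothetical data above, the sum of squared residuals for any candidate slope and intercept can be computed like this (the function name and the two candidate lines are my own choices):

```python
def sum_squared_residuals(x, y, m, b):
    """Sum of squared residuals of the line y_hat = m*x + b over the data."""
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))

# Compare two candidate lines on the hypothetical data above:
print(sum_squared_residuals(experience, salary, m=8.0, b=30.0))
print(sum_squared_residuals(experience, salary, m=5.0, b=50.0))
```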

Is it really necessary to fit a straight line? Why can’t we fit a polynomial curve of some higher degree that passes through all the data points? Isn’t that preferable?

No, such a model would suffer from overfitting. The term overfitting refers to a model that fits the data it was trained on very well but generalizes poorly, meaning that when faced with new values, the model yields poor results.
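A quick illustration of this, again on the hypothetical data above: a degree-7 polynomial can pass through all eight points exactly (here fitted with numpy.polyfit, which is my choice of tool for the sketch), yet it behaves badly on inputs it has not seen.

```python
import numpy as np

# Degree 7 lets the polynomial pass through all 8 hypothetical points exactly.
coeffs = np.polyfit(experience, salary, deg=7)
poly = np.poly1d(coeffs)

print(poly(5.1))   # essentially exact on a training point
print(poly(12.0))  # extrapolation to an unseen x can be wildly off
```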

Our objective is to find values of the slope and intercept such that the sum of squared residuals is minimum. The simplest and most common approach is the Ordinary Least Squares (OLS) method.

The formulas for the slope and intercept under Ordinary Least Squares are:

m = Σ (xᵢ − x̄)(yᵢ − ȳ) / Σ (xᵢ − x̄)²

b = ȳ − m·x̄

where x̄ and ȳ are the means of the independent and dependent variables respectively.

Let’s get into the code section:
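The sketch below is one possible from-scratch implementation of the OLS formulas above in plain Python; the function name fit_ols and the reuse of the hypothetical experience/salary lists are assumptions made for this example.

```python
def fit_ols(x, y):
    """Return slope m and intercept b that minimise the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    m = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
        (xi - x_mean) ** 2 for xi in x
    )
    b = y_mean - m * x_mean
    return m, b

m, b = fit_ols(experience, salary)
print(f"slope = {m:.3f}, intercept = {b:.3f}")
```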

Now our model is trained. Let’s predict.
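A possible prediction step with the fitted slope and intercept (the helper predict and the sample input are illustrative):

```python
def predict(x_new, m, b):
    """Predict the dependent value for a new independent value using the fitted line."""
    return m * x_new + b

# e.g. predicted salary (in thousands) for 7 years of experience on the hypothetical data
print(predict(7.0, m, b))
```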

Ta-da! We are through with the basics of linear regression.

CONCLUSION:

The Ordinary Least Squares method is easy to understand and implement, but it takes its own sweet time to train the model: it is computationally expensive, and it is sensitive to unusual data points, so outliers can skew the results. Gradient descent is another algorithm for fitting linear regression that is computationally faster and saves a lot of calculation time; we shall cover it in an upcoming article.
