Introduction To Linear Regression in Machine Learning

Rishabh Jain
6 min read · Jun 1, 2020


What is linear regression?

Before discussing linear regression, let us first discuss what regression is. Regression is used to predict the outcome of an event based on the relationships between variables obtained from the dataset. There are different types of regression techniques, such as logistic regression and linear regression.

Linear Regression

Linear regression is a supervised machine learning technique in which the output is continuous and has a constant slope. It is used to predict values within a continuous range, for example the sales of a product. The line fitted through the data points is referred to as the best-fit straight line. There are two types of linear regression: single-variable (simple regression) and multivariable (multivariable regression). The difference between them is the number of features.

In single-variable regression there is only one feature (column), while multivariable regression has more than one feature, but the mathematics behind both is the same.

Let’s discuss simple linear regression. Simple regression uses the traditional slope-intercept form, where m and b are the variables our algorithm will try to “learn” to produce the most accurate predictions. x represents our input data and y represents our prediction.

y = mx + b

m is the slope of the line.

b is called the bias (the intercept).

As you can see, this is the equation of a line; the goal of our algorithm is to find the best values of m and b.
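As a quick illustration, here is a minimal sketch of that prediction step in Python (the names predict, m and b are just for this example, not code from the article):

```python
def predict(x, m, b):
    """Return the predicted y for input x, given slope m and bias b."""
    return m * x + b

# Example: with m = 2 and b = 1, an input of 3 gives 2*3 + 1 = 7.
print(predict(3, m=2, b=1))  # 7
```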

Let’s go through two important concepts to understand linear regression better.

Cost Function

The cost function helps us find the best possible values of m and b. We convert this search problem into a minimization problem where we minimize the error between the predicted value and the actual value. Our goal is to minimize the mean squared error (MSE) to improve the accuracy of our model. It is called mean squared error because we take the mean of the squared errors over all the rows; if we instead divide by twice the number of data points, it is often called the half mean squared error.

Mathematical formula:

MSE = (1/N) · Σᵢ (yᵢ − (m·xᵢ + b))²

At first, we take random values of m and b. The difference between the predicted value and the actual value measures the error. We square this error difference, sum over all data points (rows), and then divide by the total number of data points. This gives the average squared error over all the data points, which is why this cost function is known as the Mean Squared Error (MSE) function. We can also divide by twice the number of data points, which gives the half mean squared error function.
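A minimal sketch of this cost function in Python, assuming x and y are NumPy arrays of the same length (the function name mse is illustrative):

```python
import numpy as np

def mse(x, y, m, b):
    """Mean squared error of the line y = m*x + b over all data points."""
    y_pred = m * x + b                 # predictions for every row
    return np.mean((y - y_pred) ** 2)  # average of the squared error differences

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.1, 5.9])
print(mse(x, y, m=2.0, b=0.0))         # small, since y is roughly 2x
```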

Gradient Descent

Gradient descent is a method to update m and b so as to reduce the cost function. At first, we take some values of m and b and then change them iteratively to reduce the cost. What gradient descent does is adjust the values in the direction that reduces the cost function.

Gradient descent working

Suppose there is a U-shaped curve and you start from the topmost point on either side. You want to get to the bottom, but you can only take discrete steps.

3D visualizations of Gradient Descent

Here you have two options: take small steps, which will eventually lead you down but take a long time, or take bigger steps, which can carry you to the other side so you overshoot the goal. In gradient descent there is a parameter called the learning rate which decides at what rate it goes down (descends).

Sometimes the cost function can be non-convex, in which case you could settle at a local minimum, but for linear regression it is always a convex function.

You must be wondering how we update our values to reduce the cost function. To find the gradients, we take partial derivatives of the cost function with respect to m and b. The calculus behind this is:

Partial derivatives:

∂MSE/∂m = (−2/N) · Σᵢ xᵢ · (yᵢ − (m·xᵢ + b))
∂MSE/∂b = (−2/N) · Σᵢ (yᵢ − (m·xᵢ + b))

Updating the values:

m = m − α · (∂MSE/∂m)
b = b − α · (∂MSE/∂b)

A smaller learning rate gets you closer to the minimum but takes more time to reach it; a larger learning rate converges sooner, but there is a chance you overshoot the minimum.

Here alpha is the learning rate.

One more parameter is the number of iterations, which determines how many times the update step is repeated over the dataset.

If we are on the left side of the U-shaped curve, the gradient is negative, so subtracting it increases the overall value; on the right side the gradient is positive, so subtracting it decreases the value.
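Putting the update rules together, here is a rough sketch of batch gradient descent for m and b (the variable names and the learning-rate/iteration values are illustrative, not the article's exact code):

```python
import numpy as np

def gradient_descent(x, y, lr=0.05, iterations=5000):
    """Fit m and b with plain (batch) gradient descent on the MSE cost."""
    m, b = 0.0, 0.0                               # starting values
    n = len(x)
    for _ in range(iterations):
        y_pred = m * x + b
        dm = (-2 / n) * np.sum(x * (y - y_pred))  # partial derivative w.r.t. m
        db = (-2 / n) * np.sum(y - y_pred)        # partial derivative w.r.t. b
        m -= lr * dm                              # step against the gradient
        b -= lr * db
    return m, b

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
print(gradient_descent(x, y))                     # approaches (2.0, 1.0)
```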

Implementation (code)

There are two ways to implement linear regression: either write a custom model yourself or use the scikit-learn library to import the linear regression model.

Let’s start with the custom function.

Using scikit-learn we will import make_regression, which creates data for regression.

Generating the regression dataset
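A minimal sketch of generating such a dataset (the n_samples, noise and random_state values are arbitrary choices for illustration):

```python
from sklearn.datasets import make_regression

# Generate a simple one-feature regression dataset.
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)
print(X.shape, y.shape)  # (100, 1) (100,)
```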

Using the matplotlib library we can visualize the dataset.

Data Visualization
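One possible way to plot it, assuming X and y come from the make_regression snippet above:

```python
import matplotlib.pyplot as plt

plt.scatter(X, y)                 # one point per (feature, target) pair
plt.xlabel("feature (x)")
plt.ylabel("target (y)")
plt.title("Generated regression dataset")
plt.show()
```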

Custom model of Linear Regression

Fig (a): Initializing and defining the parameters
Fig (b): Cost function and gradient function
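A rough sketch of what such a custom model could look like, assuming X has shape (n_samples, 1); the class and method names here are illustrative, not the article's original code:

```python
import numpy as np

class CustomLinearRegression:
    def __init__(self, lr=0.05, iterations=2000):
        self.lr = lr                      # learning rate (alpha)
        self.iterations = iterations      # number of update steps
        self.m = 0.0                      # slope, initialized to zero
        self.b = 0.0                      # bias, initialized to zero

    def cost(self, x, y):
        """Mean squared error for the current m and b."""
        y_pred = self.m * x + self.b
        return np.mean((y - y_pred) ** 2)

    def gradients(self, x, y):
        """Partial derivatives of the MSE with respect to m and b."""
        n = len(x)
        y_pred = self.m * x + self.b
        dm = (-2 / n) * np.sum(x * (y - y_pred))
        db = (-2 / n) * np.sum(y - y_pred)
        return dm, db

    def fit(self, X, y):
        x = X.ravel()                     # work with a 1-D feature vector
        for _ in range(self.iterations):
            dm, db = self.gradients(x, y)
            self.m -= self.lr * dm        # gradient descent updates
            self.b -= self.lr * db
        return self

    def predict(self, X):
        return self.m * X.ravel() + self.b
```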

Visualize the Predicted Line after model training

Predicted Output
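One way to fit that sketch and draw the predicted line over the data, reusing X and y from make_regression and the CustomLinearRegression class above (hyperparameter values are arbitrary):

```python
import matplotlib.pyplot as plt

model = CustomLinearRegression(lr=0.05, iterations=2000).fit(X, y)

order = X.ravel().argsort()                         # sort points so the line draws cleanly
plt.scatter(X, y, label="data")
plt.plot(X.ravel()[order], model.predict(X)[order], color="red", label="predicted line")
plt.legend()
plt.show()
```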

Using the scikit-learn method

You may find that this code is shorter than the custom model. We still need to remember the mathematics behind it so we can adapt the code to the dataset, because sometimes the pre-defined library model may not give better accuracy than the custom model.

Importing the scikit-learn library
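A minimal sketch of the scikit-learn route, reusing the X and y generated earlier with make_regression:

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, y)                        # expects X of shape (n_samples, n_features)
print(model.coef_, model.intercept_)   # the learned slope and bias
```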

Visualizing the dataset

After fitting the model, we will visualize the dataset together with the predicted line.

Predicted Output
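A possible way to plot the fitted line against the data, assuming the LinearRegression model fitted above:

```python
import matplotlib.pyplot as plt

order = X.ravel().argsort()                         # sort points so the line draws cleanly
plt.scatter(X, y, label="data")
plt.plot(X.ravel()[order], model.predict(X)[order], color="red", label="predicted line")
plt.legend()
plt.show()
```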

The same approach applies to multivariable regression: the number of features increases, but the cost function and the rest of the procedure remain the same.

THANK YOU
