
# What is Linear Regression?

Linear regression models are used to show or predict the relationship between two variables or factors. Linear regression is a fundamental and routinely used type of predictive analysis. The overall idea of regression is to examine two things:

(1) how well a set of predictor variables predicts the outcome variable, and (2) which variables in particular are significant predictors of the outcome variable.

Before diving deeper into linear regression, let’s understand what regression means in machine learning. Regression models a target value based on independent predictors. Depending on the number of independent variables and the type of relationship between the independent and dependent (outcome) variables, we apply different regression techniques.

Many techniques exist for fitting a linear regression model because the model is so well studied. Of these, Ordinary Least Squares and Gradient Descent are the most common to study and implement.

Simple linear regression is a type of regression in which there is a single independent variable and a linear relationship between the independent and dependent variables. It is defined by the formula y = c + b*x, where y = estimated dependent variable score, c = constant, b = regression coefficient, and x = score on the independent variable.

The red line in the above graph is referred to as the best-fit straight line. Based on the given data points, we try to plot a line that models the points the best. The goal of the linear regression algorithm is to find the best values for c and b.
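To make "finding the best values for c and b" concrete, here is a minimal sketch that uses the standard closed-form least-squares estimates for a single input. The function name `fit_simple_linear_regression` and the sample data are made up for illustration; they are not from any library.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (c, b) for the line y = c + b*x that minimizes the squared error."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - b * mean_x
    return c, b


# Hypothetical data lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
c, b = fit_simple_linear_regression(xs, ys)
prediction = c + b * 5.0  # predicted y for a new x of 5
```

Because the sample points sit exactly on a line, the recovered c and b match the line that generated them; with noisy data they would be the values that minimize the total squared error.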

Ordinary Least Squares can be used to estimate the values of the coefficients, including when we have more than one input. Given a regression line through the data, we calculate the distance from each data point to the regression line, square it, and sum all of the squared errors together. This sum is the quantity that Ordinary Least Squares seeks to minimize. It is unusual to implement the Ordinary Least Squares procedure yourself unless as an exercise in linear algebra. It is more likely that you will call a procedure in a linear algebra library. This procedure is very fast to calculate.
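As an illustration of calling a linear algebra library rather than implementing the procedure by hand, the sketch below uses NumPy's `np.linalg.lstsq` on a small two-input dataset. The data are invented for this example, and the column of ones is added so that the first coefficient plays the role of the constant c.

```python
import numpy as np

# Hypothetical data: two inputs, generated exactly from y = 1 + 2*x1 + 3*x2
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the constant c.
X_design = np.column_stack([np.ones(len(X)), X])

# lstsq returns the coefficients that minimize the sum of squared errors.
coeffs, _, rank, _ = np.linalg.lstsq(X_design, y, rcond=None)

# Sum of squared errors: distance from each point to the fit, squared and summed.
residuals = y - X_design @ coeffs
sse = float(np.sum(residuals ** 2))
```

Here `coeffs` holds the constant followed by one coefficient per input, and `sse` is the quantity that the procedure minimizes.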

Gradient Descent can be used when there are one or more inputs: it optimizes the values of the coefficients by iteratively minimizing the error of the model on your training data. The sum of the squared errors is calculated over all pairs of input and output values. A learning rate is used as a scale factor, and the coefficients are updated in the direction that reduces the error. The process is repeated until a minimum sum of squared errors is achieved or no further improvement is possible. Gradient descent is often taught using a linear regression model because it is relatively straightforward to understand. In practice, it is useful when you have a very large dataset, either in the number of rows or the number of columns, that may not fit into memory.
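The loop described above can be sketched for the single-input model y = c + b*x as follows. This is a minimal illustration, not a tuned implementation: the function name, the zero starting values, the default learning rate, and the stopping tolerance are all assumptions chosen for this example, and the gradient is averaged over the data so that the step size does not depend on the number of rows.

```python
def gradient_descent(xs, ys, learning_rate=0.05, max_iters=10000, tol=1e-12):
    """Fit y = c + b*x by batch gradient descent on the squared error."""
    c, b = 0.0, 0.0  # assumed starting point; random starting values are also common
    n = len(xs)
    prev_sse = float("inf")
    for _ in range(max_iters):
        # Error of the current model on every (input, output) pair.
        errors = [(c + b * x) - y for x, y in zip(xs, ys)]
        sse = sum(e * e for e in errors)
        # Stop when the sum of squared errors no longer improves.
        if prev_sse - sse < tol:
            break
        prev_sse = sse
        # Gradient of the mean squared error with respect to c and b.
        grad_c = 2.0 * sum(errors) / n
        grad_b = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        # The learning rate scales each step taken against the gradient.
        c -= learning_rate * grad_c
        b -= learning_rate * grad_b
    return c, b


# Hypothetical data lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
c, b = gradient_descent(xs, ys)
```

A learning rate that is too large makes the updates overshoot and the error grow, while one that is too small makes convergence slow, so in practice this value usually needs some experimentation.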

Regularization methods seek both to minimize the sum of the squared errors of the model on the training data and to reduce the complexity of the model. These methods are effective when there is collinearity in your input values and ordinary least squares would overfit the training data.
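As one example of such a method, the sketch below uses ridge (L2) regularization, which adds a penalty on the squared size of the coefficients. Ridge is chosen here purely as an illustration; the function name `ridge_fit`, the penalty strength `alpha`, and the synthetic, nearly collinear data are all assumptions made for this example.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge fit: minimize squared error + alpha * sum(w**2)."""
    # Center the data so that the constant c is not penalized.
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the coefficients w.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    c = y_mean - x_mean @ w
    return c, w


# Hypothetical data with two nearly collinear inputs
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)  # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=50)

_, w_ols = ridge_fit(X, y, alpha=0.0)    # alpha = 0 reduces to ordinary least squares
_, w_ridge = ridge_fit(X, y, alpha=1.0)  # penalized fit with smaller coefficients
```

With collinear inputs the unpenalized coefficients can be large and offset each other, whereas the penalized fit tends to spread the effect across the two inputs with smaller values.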

Finally, for implementing the algorithm, we have two options: we can either import the linear regression model from the scikit-learn library and use it directly, or we can write our own regression model based on the equations above. Let’s go ahead and use scikit-learn. The GitHub repository for the same will be added soon on my account.
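Until the repository is available, here is a minimal sketch of the scikit-learn route, assuming scikit-learn and NumPy are installed. The data are invented for illustration and are not the dataset used in this article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data lying exactly on y = 1 + 2x.
# scikit-learn expects the inputs as a 2D array: one row per sample, one column per feature.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X, y)

c = model.intercept_   # the constant c in y = c + b*x
b = model.coef_[0]     # the regression coefficient b
predicted = model.predict(np.array([[5.0]]))[0]  # prediction for a new x of 5
```

After fitting, `intercept_` and `coef_` correspond to c and b in the formula y = c + b*x, and `predict` applies that formula to new inputs.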

However, linear regression is a vast topic, and it is difficult to cover all of it here. You can improve the model in various ways, such as by detecting collinearity and by transforming predictors to fit nonlinear relationships. This article is meant to get you started with simple linear regression.
