Linear Regression (LR) Model

Akshay Singh
5 min read · Jun 17, 2019

In this article I'll give you a basic understanding of linear regression. Before giving a formal definition of regression, let's first understand what it is and how it works. Let's start with a practical example. Look at this data set, which is related to CO2 emissions from different cars. It includes engine size, number of cylinders, fuel consumption, and CO2 emissions for various automobile models. The question is: given this data set, can we predict the CO2 emission of a car using other fields, such as engine size or cylinders?

We can use regression methods to predict a continuous value, such as CO2 emission, from other variables. Indeed, regression is the process of predicting a continuous value. In regression there are two types of variables: a dependent variable (the response variable) and one or more independent variables (the predictor variables). The key point is that the dependent variable must be continuous; it cannot be discrete. The independent variables, however, can be measured on either a categorical or a continuous scale. Broadly, there are two types of regression models: simple regression and multiple regression. Simple regression is when one independent variable is used to estimate a dependent variable. It can be either linear or non-linear.

For example, predicting CO2 emission from engine size is a simple regression. Whether a regression is linear depends on the nature of the relationship between the independent and dependent variables. When more than one independent variable is present, the process is called multiple linear regression.

Now let's move on to understand how simple linear regression works!

Let's return to our data set of CO2 emissions, with its engine size, cylinders, fuel consumption and CO2 emission columns for various car models. Given this data set, can we predict the CO2 emission of a car using another field such as engine size? Quite simply, yes. Linear regression is the approximation of a linear model used to describe the relationship between two or more variables. In simple linear regression there are two variables: a dependent variable and an independent variable, for example when predicting CO2 emission from engine size. When more than one independent variable is present, the process is called multiple linear regression, for example when predicting CO2 emission from both the engine size and the number of cylinders. Our focus in this article is on simple linear regression.

Now the question is: how do we know this data set is well suited to simple linear regression? To find out, we can plot our variables. We show engine size as the independent variable and emission as the target value that we would like to predict. A scatter plot clearly shows the relation between the variables, where changes in one variable explain, or possibly cause, changes in the other. It also indicates that these variables are linearly related.

With linear regression you can fit a line through the data. How do we use this line for prediction now?

Let us assume for a moment that the line is a good fit for the data. We can then use it to predict the emission of an unknown car. For example, for a sample car with engine size 2.4, reading the value off the line gives an emission of about 214. Now, let's talk about what the fitting line actually is. We're going to predict the target value y (CO2 emission), in our case using the independent variable engine size, represented by x1. Theta 0 and theta 1 are the parameters of the line that we must adjust. Theta 1 is known as the slope or gradient of the fitting line, and theta 0 is known as the intercept.
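Written out as an equation, the fitting line described above can be expressed as follows. The symbol \hat{y} for the predicted emission is notation introduced here for illustration; theta 0, theta 1 and x1 are as defined in the text.

```latex
\hat{y} = \theta_0 + \theta_1 x_1
```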

Now take a look at the figure, which shows how to find the best fit, that is, the best values of theta 0 and theta 1.

A residual error is the difference between an actual value and the value predicted by the line. The mean of all residual errors shows how poorly the line fits the whole data set. Mathematically, this is expressed by the mean squared error (MSE). Our objective is to find the line for which the mean of all these errors is minimized. To minimize the MSE, we must find the best parameters theta 0 and theta 1.
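To make the MSE concrete, here is a minimal Python sketch that computes it for two candidate lines. The function name and the three toy points are made up for illustration and are not the article's data set; the point is only that a better-fitting line yields a smaller MSE.

```python
def mean_squared_error(xs, ys, theta0, theta1):
    """Average of the squared residuals for the line y = theta0 + theta1 * x."""
    residuals = [y - (theta0 + theta1 * x) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)


# Toy points, invented for illustration only (not the article's data set).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]

mse_a = mean_squared_error(xs, ys, 0.0, 2.0)  # candidate line y = 2x
mse_b = mean_squared_error(xs, ys, 0.0, 1.0)  # candidate line y = x

# The line y = 2x passes closer to the points, so its MSE is smaller.
print(mse_a, mse_b)
```

Of the two candidates, the one with the lower MSE is the better fit; linear regression looks for the theta 0 and theta 1 that make this number as small as possible.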

Now let's see how to calculate these parameters.

The xi and yi in the equation mean that we repeat these calculations across all values in our data set; i refers to the i-th value of x or y. X bar is the average engine size in our data set, and y bar is the average CO2 emission. Note that we have nine rows here, rows 0 to 8. First we calculate the average of x1 and the average of y, then we plug them into the slope equation to find theta 1. Theta 0 then follows from theta 1 and the two averages. These are the two parameters of the line, where theta 0 is also called the bias coefficient and theta 1 is the coefficient for the engine size column. The equation shown in the figure is called the simple linear regression equation. Now let's check it against our data set to cross-verify the model.
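The calculation described above can be sketched in plain Python using the standard least-squares formulas for a single predictor. The function name is hypothetical, and the four data points are synthetic: they are generated from the line y = 125 + 39x (the parameter values used in the article's worked example) so that the fit recovers those values. They are not the nine rows of the real data set.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (theta0, theta1) that minimize the MSE for y = theta0 + theta1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n  # average of the independent variable
    y_bar = sum(ys) / n  # average of the dependent variable

    # Slope: sum((xi - x_bar) * (yi - y_bar)) / sum((xi - x_bar) ** 2)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    theta1 = s_xy / s_xx

    # Intercept: the fitted line passes through the point (x_bar, y_bar).
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1


# Synthetic points lying exactly on y = 125 + 39 * x (illustration only).
engine_sizes = [1.0, 2.0, 3.0, 4.0]
emissions = [125 + 39 * x for x in engine_sizes]

theta0, theta1 = fit_simple_linear_regression(engine_sizes, emissions)
print(round(theta0, 6), round(theta1, 6))
```

Because the synthetic points lie exactly on a line, the fit returns that line's intercept and slope. On real data the points scatter around the line, and the same formulas give the line with the smallest MSE.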

Now let's plug in the 9th row of our data set and calculate the CO2 emission for a car with an engine size of 2.4. So Co2Emission = 125 + 39 × 2.4. Therefore, we can predict that the CO2 emission for this specific car would be 218.6.
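The same arithmetic as a short Python sketch. The function and constant names are hypothetical; the parameter values 125 and 39 and the engine size 2.4 come from the example above.

```python
THETA0 = 125.0  # intercept from the worked example
THETA1 = 39.0   # slope (coefficient of engine size) from the worked example


def predict_co2(engine_size, theta0=THETA0, theta1=THETA1):
    """Predict CO2 emission from engine size with a fitted line."""
    return theta0 + theta1 * engine_size


prediction = predict_co2(2.4)
print(f"{prediction:.1f}")  # prints 218.6
```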

Let's talk a bit about why linear regression is so useful. Quite simply, it is the most basic regression method to use and understand.

Pros of linear regression:

  • Very fast
  • No parameter tuning required
  • Easy to understand and highly interpretable

Applications of regression:

  • Sales forecasting
  • Satisfaction analysis
  • Price estimation
  • Employment income

Some frequently used regression algorithms:

  • Ordinal regression
  • Poisson regression
  • Linear and polynomial regression
  • Decision forest regression
  • Neural network regression
  • Bayesian linear regression
  • KNN (K-nearest neighbours)

I hope this article helps you understand how linear regression actually works! To learn more, follow the link below to a notebook that uses the same data set as the example above; working with the real data should help you understand it better.

https://github.com/aks9639/Akshay_ML/blob/master/Akshay__Singh__/Day_7/Simple_linear_regression.ipynb
