Regression: Linear Regression

In this blog, we will discuss our first machine learning model: regression.

Deepanshu Anand
CodeX
5 min read · Jul 17, 2022

Photo by Mohammad Rahmani on Unsplash

Regression is a supervised learning technique to model the relationship between features (the independent variables in the data) and the target (the dependent variable in the data). It helps us to understand how the value of the dependent variable is changing with the independent variable. It predicts continuous values.

For example, suppose we need to predict the temperature. We take past data containing independent variables such as altitude, location, and the month of the year, along with the dependent variable, temperature. We then model the relationship between them and predict the temperature when a new set of independent variables is given.

Regression plots the dependent variable against the independent variables and gives us a line or curve that fits the data points, which we then use to make predictions.

Regression

Underfitting

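  • It is a condition in which the model is unable to capture the relationship between the independent and dependent variables, so it performs poorly even on the training data. It usually occurs when the model is too simple for the data or the features are not informative enough.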

Overfitting

  • It is a condition in which the model tries to fit every data point of the training data, including its noise, and as a result performs poorly on unseen (test) data.
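
The two conditions can be illustrated with a minimal sketch. It is not part of the original walkthrough: it assumes synthetic data drawn from y = x² plus a little noise, and it swaps in polynomial fits via NumPy's `polyfit` purely to vary model complexity. A degree-1 fit is too simple for this data (underfitting), while a degree-9 fit on only 12 training points has enough freedom to chase the noise (overfitting).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Synthetic data (an assumption for illustration): y = x^2 plus small noise.
    x = rng.uniform(-1.0, 1.0, size=n)
    y = x ** 2 + rng.normal(0.0, 0.05, size=n)
    return x, y

def mse(y_true, y_pred):
    # Mean of the squared differences between actual and predicted values.
    return float(np.mean((y_true - y_pred) ** 2))

x_train, y_train = make_data(12)    # small training set
x_test, y_test = make_data(200)     # unseen data

results = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_err = mse(y_train, np.polyval(coeffs, x_train))
    test_err = mse(y_test, np.polyval(coeffs, x_test))
    results[degree] = (train_err, test_err)
    print(f"degree={degree}  train MSE={train_err:.5f}  test MSE={test_err:.5f}")
```

The thing to look for in the output is the gap between the training error and the test error: an underfit model is poor on both, while an overfit model typically looks very good on the training data and worse on the unseen data.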

LINEAR REGRESSION

Linear regression is one of the simplest and most common supervised learning models. As the name suggests, it is a regression model that finds a linear relationship between the dependent variable (target) and the independent variables (features).

It fits a straight line to the data points when the dependent and independent variables are plotted on a Cartesian plane.

Simple Linear Regression

Consider a case where there is only one independent variable (feature) and we need to predict a value (target). Mathematically, a linear relation between the feature and the target is written as f(x) = mx + c.

With a small change of notation, we can write f(x1) = (w1)(x1) + (w0), where x1 is the independent variable (feature), w1 is the weight of the feature, and w0 is the intercept, i.e. the value of f when the feature is zero.
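
As a minimal sketch of this formula in Python (the function name `predict_simple` and the weight values are hypothetical, chosen only for illustration):

```python
def predict_simple(x1, w1, w0):
    """Return f(x1) = w1 * x1 + w0 for a single feature value."""
    return w1 * x1 + w0

# Hypothetical weights, not learned from any data.
w1, w0 = 2.0, 1.0

print(predict_simple(0.0, w1, w0))  # feature is zero, so this is the intercept w0
print(predict_simple(3.0, w1, w0))  # 2.0 * 3.0 + 1.0
```

Training a linear regression model amounts to finding the values of w1 and w0 that make these predictions match the data as closely as possible.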

Multiple Linear Regression

In this case there is more than one feature, and the model relates the target to them linearly as

f(x1, x2, x3, x4, …) = w0 + (w1)(x1) + (w2)(x2) + (w3)(x3) + …

Here w1, w2, w3, … wn are the weights of the features x1, x2, x3, … xn, and w0 is the intercept.
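
With several features, the same relation can be sketched as a dot product between the feature values and the weights. The example below assumes NumPy; the function name `predict_multiple` and all numbers are hypothetical.

```python
import numpy as np

def predict_multiple(X, w, w0):
    """Return w0 + w1*x1 + w2*x2 + ... for every row of X.

    X has shape (n_samples, n_features), w has shape (n_features,).
    """
    return w0 + X @ w

# Two samples with three features each (illustrative values only).
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])
w = np.array([0.5, -1.0, 2.0])  # hypothetical weights w1, w2, w3
w0 = 4.0                        # hypothetical intercept

predictions = predict_multiple(X, w, w0)
print(predictions)  # second sample has all features zero, so it gives w0
```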

COST FUNCTION

Different values of the w's give different linear relations, so our task is to find the best-fit line, the one with the least error.

The cost function measures how well our model is performing on the data, and the coefficients or weights (w) are chosen to minimize it. For linear regression, we will use the Mean Squared Error cost function.

Residual: the vertical distance between the actual value y and the predicted value f(x), i.e. y − f(x).

If the data points are far from the regression line, the residuals are large and so is the error. If the data points are close to the regression line, the residuals are small and so is the error. This is how we choose a regression line.

We use the Mean Squared Error function to measure how well the fitted line matches the data. As its name suggests, it is the mean of the squared residuals.

MSE = (sum of squared residuals) / N = (1/N) Σ (y_i − f(x_i))², where N is the total number of data points.

The line for which the Mean Squared Error is lowest is considered the best-fit line, and it gives our most accurate model.
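
To make the comparison concrete, here is a small sketch that computes the MSE of two candidate lines on the same data. The data and both candidate lines are made up for illustration; the helper `mean_squared_error` is written by hand here to mirror the formula.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Mean of the squared residuals (actual minus predicted)."""
    residuals = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(residuals ** 2))

# Illustrative data that lies exactly on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Candidate line A: w1 = 2, w0 = 1 (matches the data).
mse_good = mean_squared_error(y, 2.0 * x + 1.0)
# Candidate line B: w1 = 1, w0 = 0 (a poor fit).
mse_bad = mean_squared_error(y, 1.0 * x + 0.0)

print(mse_good, mse_bad)  # the line with the lower MSE is the better fit
```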

LINEAR REGRESSION USING PYTHON

We will use the scikit-learn library, a freely available machine learning library that features various classification, regression, and clustering algorithms.

  • Importing the library and linear regression model
  • Importing tools for manipulating the dataset
  • Loading the dataset, for example from a CSV file
  • Training the model
  • Predicting the target values
  • Measuring the error of the predictions
  • From the value of the root mean squared error (the square root of the MSE) we can judge how well our model fits: the lower the value, the better
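
The steps above can be sketched as follows. This is a minimal example under stated assumptions, not the original code from this post: the dataset is generated synthetically instead of being read from a CSV file, and the column names `area`, `rooms`, and `price` are hypothetical. With a real file you would load the data with `pd.read_csv` instead.

```python
# Importing the library and the linear regression model
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Importing tools for manipulating the dataset
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CSV file (assumption for illustration).
# With a real dataset: data = pd.read_csv("your_file.csv")  # hypothetical name
rng = np.random.default_rng(0)
area = rng.uniform(50, 200, size=200)
rooms = rng.integers(1, 6, size=200)
noise = rng.normal(0, 5, size=200)
data = pd.DataFrame({
    "area": area,
    "rooms": rooms,
    "price": 3.0 * area + 10.0 * rooms + 20.0 + noise,
})

X = data[["area", "rooms"]]  # features (independent variables)
y = data["price"]            # target (dependent variable)

# Hold out 20% of the rows as unseen test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Training the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predicting the target values
y_pred = model.predict(X_test)

# Measuring the error: root mean squared error on the test data
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("weights:", model.coef_, "intercept:", model.intercept_)
print("RMSE:", rmse)
```

Here `model.coef_` holds the learned weights w1, w2, … and `model.intercept_` holds w0. The RMSE is in the same units as the target, which makes it easier to interpret than the MSE.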

I hope you now understand what linear regression is and how we measure the accuracy of the model.

Stay connected to learn more about machine learning models.

Goodwill to all, I am Deepanshu Anand currently pursuing my B.tech in AI and Data Science. I am a cyber geek and share a major interest in MLOps.