Linear Regression : Concept and Working

Published in ML_with_Arpit_Pathak, Jun 2, 2020

Hello readers! This is where our core journey into the machine learning world begins. In this blog, we will try to understand the core concept of machine learning, regression analysis, and how the Linear Regression algorithm works.

Core Concept of ML

We all have a basic idea that a machine learning model is trained on past data and its known outputs so that it can predict outputs for future inputs of a similar type. But how does the model get trained on the data? How does it learn that a given input leads to the output we provided during training? The core concept below answers these questions.

The core concept of machine learning is to find the weight and the bias. The data on which we train the model contains the independent variable(s) (X) and one output variable (y). The model takes these inputs together with their known outputs and works out how X relates to y, that is, how much a change in X affects y. This is known as the weight: the strength of the effect of X on y. However, y is almost never explained by X alone, and some part of y remains even when X contributes nothing. This is known as the bias: the value of y when the contribution of X is zero, for example when X itself is zero.
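
To make the two terms concrete, here is a minimal sketch, assuming NumPy is available. The data is made up and generated from a known line, so we can check that fitting a straight line gives back the weight and the bias we started with. `np.polyfit` is just one convenient way to fit a line, not the only one.

```python
import numpy as np

# Made-up toy data generated from y = 3 + 2x,
# so the true bias is 3 and the true weight is 2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 + 2.0 * x

# np.polyfit with degree 1 fits a straight line and
# returns the coefficients as [weight, bias].
w, b = np.polyfit(x, y, 1)

print(f"weight w = {w:.2f}, bias b = {b:.2f}")
```

Because the toy data lies exactly on a line, the fitted values should come out very close to 2 and 3. With real data they would only be estimates.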

Next comes the accuracy of the machine learning model. No machine learning model built on real-world data can claim 100% accuracy. There are always flaws in the model: it can be precise, but it cannot be perfectly accurate. These flaws, or errors, in the model are known as the residuals or the loss. The goal of a machine learning programmer is to obtain a model with minimum loss and maximum accuracy.
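
As a small illustration of residuals, the sketch below compares some made-up observations with the predictions of an assumed line y = 3 + 2x. The numbers and the line are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# Made-up observations; the "model" is the assumed line y = 3 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y_actual = np.array([5.5, 6.5, 9.0, 11.5])

w, b = 2.0, 3.0            # assumed weight and bias
y_predicted = b + w * x    # what the model predicts for each x

# Residual = actual value minus predicted value, one per data point.
residuals = y_actual - y_predicted
print(residuals)
```

A residual of zero means the prediction was exact for that point. Positive and negative residuals mean the model predicted too low or too high.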

Regression Analysis

Regression analysis is a process for finding the relationship between the independent variable(s) and the dependent variable. It also helps to figure out the overall impact of the independent variable(s) on the dependent variable.

Let us take a simple example of a company with some employees. Suppose the salaries of the employees are decided on the basis of their experience. In this case, “experience” is the independent variable and “salary” is the dependent variable. Hence, if we want to predict the salary of an employee in the future, we first have to find the weight and the bias that relate experience to salary. In other words, we have to find out how much impact a change in experience has on the salary of the employee.

There are many techniques used for regression analysis, but all of them depend on three basic factors:

Number of independent variables (one or more than one)
Shape of the regression line (linear or curved)
Type of dependent variable (continuous or categorical)

Let us now look at the first approach to training a machine learning model: linear regression.

Linear Regression

Linear regression is an approach that establishes a relationship between the independent variable(s) and the dependent variable by finding the best-fit regression line.

Core Formula

The core formula that works in linear regression is:

y = b + wx

Here, ‘y’ is the dependent variable and ‘x’ is the independent variable. This is the simple equation of a straight line that we have studied in statistics. If we plot this kind of equation, the graph comes out as a straight line.

This figure explains the equation y = b + wx. Here, w represents the coefficient, that is, the impact of a change in x on y (the slope of the line), and b represents the value of y when x has no impact on y (the point where the line crosses the y-axis).

The need for b can be explained with the above example of employee salaries (y) based on experience (X). Suppose a fresher joins the company with zero experience (X = 0). If the equation were y = wx, the salary given to that employee would be zero. This is not realistic: a fresher is also paid a salary, and that salary becomes the bias (b) in the equation.
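
The fresher example can be sketched in a few lines of plain Python. The weight and the bias used here are hypothetical numbers picked only for illustration, and the two function names are our own.

```python
# Hypothetical numbers chosen only for illustration.
w = 5000.0    # assumed salary increase per year of experience
b = 20000.0   # assumed starting salary of a fresher (the bias)

def salary_without_bias(experience):
    # y = wx: a fresher with zero experience would earn nothing.
    return w * experience

def salary_with_bias(experience):
    # y = b + wx: the bias covers the salary at zero experience.
    return b + w * experience

fresher_experience = 0
print(salary_without_bias(fresher_experience))
print(salary_with_bias(fresher_experience))
```

Without the bias the predicted salary at zero experience is zero. With the bias it equals the assumed starting salary.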

Now that we have understood the base equation, let us dig into the core concept of linear regression.

Core Concept

The core concept of linear regression comes down to the type of the dependent variable and the shape of the regression line. In linear regression, the dependent variable is always a continuous value and the regression line is always a straight line. Now let us see how the regression line is calculated.

In the above figure, we can see that the points are plotted on the graph and a regression line is drawn. Ideally, a regression line would pass through all the data points on the plot. In the above figure, however, not every point falls on the regression line. The vertical distance of each point from the regression line, that is, the difference between the actual and the predicted value of y, is known as the residual, loss, or error.

Conceptually, plotting the regression line is a trial-and-error process. First we draw a candidate regression line that passes as close to as many points as possible. Then we calculate the loss, or error, of that line by summing the absolute distance of each point from the regression line. This process is repeated until the least error is obtained, and the line with the least error is the best-fit regression line.
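
The trial-and-error idea can be sketched as a brute-force search: try many candidate lines, measure the total absolute distance for each, and keep the best one. The data, the search ranges and the step size below are all assumptions made for this illustration. In practice, libraries usually find the line with a closed-form least-squares solution or with gradient descent rather than an exhaustive search like this.

```python
import numpy as np

# Made-up data that roughly follows y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

def total_abs_error(w, b):
    # Sum of absolute vertical distances between the points and the line.
    return np.sum(np.abs(y - (b + w * x)))

best_w, best_b, best_err = None, None, float("inf")

# Try every candidate (w, b) on a coarse grid (assumed ranges, step 0.1).
for w in np.linspace(0.0, 4.0, 41):
    for b in np.linspace(0.0, 3.0, 31):
        err = total_abs_error(w, b)
        if err < best_err:
            best_w, best_b, best_err = w, b, err

print(f"best w = {best_w:.1f}, best b = {best_b:.1f}, error = {best_err:.2f}")
```

The search should settle on a weight near 2 and a bias near 1, which matches how the toy data was constructed.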

To calculate the error in the regression line, we have three methods:

1. Mean Absolute Error (MAE): The MAE method takes the absolute difference between each actual value of y in the data and the value predicted by the regression line, sums these differences, and divides the sum by the total number of values to get the average, or mean, absolute error.

2. Mean Squared Error (MSE): The MSE method squares the difference between each actual value of y and its predicted value, sums the squares, and divides the sum by the total number of values to get the mean squared error.

3. Root Mean Squared Error (RMSE): The RMSE method is the same as the MSE method except for a final step in which the square root of the MSE is taken.
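
The three error measures can be sketched with NumPy as below. Each one compares the actual values of y with the values predicted by the regression line. The function names and the sample numbers are our own and are used only for illustration.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average of the absolute differences between actual and predicted values.
    return np.mean(np.abs(y_true - y_pred))

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between actual and predicted values.
    return np.mean((y_true - y_pred) ** 2)

def root_mean_squared_error(y_true, y_pred):
    # Square root of the MSE, which brings the error back to the units of y.
    return np.sqrt(mean_squared_error(y_true, y_pred))

# Made-up actual and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

print("MAE :", mean_absolute_error(y_true, y_pred))
print("MSE :", mean_squared_error(y_true, y_pred))
print("RMSE:", root_mean_squared_error(y_true, y_pred))
```

For these numbers the differences are 1, 0, -1 and -2, so the MAE is 4 / 4 = 1.0, the MSE is 6 / 4 = 1.5, and the RMSE is the square root of 1.5.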

In many cases, MSE is preferred over MAE when fitting the line, because squaring penalizes large errors more heavily. MAE, on the other hand, is less sensitive to outliers.
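
One practical difference between the two measures is how they react to a single large error. The sketch below uses two made-up sets of errors that differ in only one value. The short helper functions are our own.

```python
import numpy as np

def mae(errors):
    # Mean of the absolute errors.
    return np.mean(np.abs(errors))

def mse(errors):
    # Mean of the squared errors.
    return np.mean(errors ** 2)

# Made-up prediction errors: the second set has one large error.
small_errors = np.array([1.0, 1.0, 1.0, 1.0])
one_big_error = np.array([1.0, 1.0, 1.0, 10.0])

print("MAE:", mae(small_errors), "->", mae(one_big_error))
print("MSE:", mse(small_errors), "->", mse(one_big_error))
```

Here the MAE grows from 1.0 to 3.25, while the MSE grows from 1.0 to 25.75, so a model trained to minimize MSE is pushed much harder to avoid large misses.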

Types of Linear Regression

Based on the number of independent variables (X), linear regression can be classified into two types:

Simple Linear Regression : Consists of only one independent variable (X).

y = b + wx

Multiple Linear Regression : Consists of more than one independent variable.

y = b + w1*x1 + w2*x2 + w3*x3 + …
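
The multiple-variable form can be sketched with NumPy's least-squares solver. The data below is made up and generated from y = 1 + 2*x1 + 3*x2, so the bias and the weights we expect back are known in advance. Adding a column of ones to carry the bias is a common trick, and this is only one possible way to fit such a model.

```python
import numpy as np

# Made-up data with two independent variables (x1, x2).
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
# Targets generated from y = 1 + 2*x1 + 3*x2 (bias 1, weights 2 and 3).
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the bias b.
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of A @ [b, w1, w2] = y.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b, w1, w2 = coef

print(f"b = {b:.2f}, w1 = {w1:.2f}, w2 = {w2:.2f}")
```

Since the toy targets lie exactly on the chosen plane, the solver should return values very close to 1, 2 and 3.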

So this is all about the theoretical concept of linear regression. I hope it was informative for you. Thank you for reading!