Understand Linear Regression in a simple way

Mayur Gargade
Published in VisionNLP
Jan 21, 2023 · 4 min read

Hi, I’m Mayur Gargade, working as a Data Scientist at VisionNLP: https://medium.com/visionnlp

I will try to explain simple linear regression in quite a simple way. Let us understand what the linear regression model is, with the attached handwritten notes as an explanation.

I studied linear regression for almost 4 years during my academics, and I am now implementing the same model on real-world data.

What is Regression Analysis?

  • Regression analysis is the statistical technique of estimating the unknown value of one variable (the dependent, or target, variable) from the known values of other variables (the independent variables).
  • In other words, regression analysis helps us understand how the dependent variable changes with changes in the independent variables.
  • In machine learning, regression analysis is a supervised learning technique used to predict numeric, continuous variables.

When can we use this model?

The linear regression model is used specifically when your response variable (i.e. output variable) has a numeric data type.

Examples of Linear Regression:

  1. House price prediction (where the price is numeric, e.g. $1000.50)
  2. Car price prediction and so on.

Difference Between Correlation and Regression:

Degree and nature of the relationship:
Correlation measures the degree of the relationship between X and Y.
Regression studies the nature of the relationship between the variables, so that it can predict the value of one variable from the value of the other.
Prediction:
Correlation doesn’t help in making predictions.
Regression enables us to make predictions using the regression line.
Symmetry:
Correlation coefficients are symmetric: rxy = ryx.
Regression coefficients are not symmetric, i.e. bxy ≠ byx.
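The symmetry point above is easy to verify numerically. The sketch below uses hypothetical toy values (not from the article’s notes) to show that the correlation coefficient is the same in both directions, while the two regression slopes differ:

```python
import numpy as np

# Hypothetical toy data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Correlation is symmetric: r_xy equals r_yx
r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]

# Regression slopes are not symmetric:
#   b_yx (slope of Y on X) = cov(x, y) / var(x)
#   b_xy (slope of X on Y) = cov(x, y) / var(y)
cov_xy = np.cov(x, y)[0, 1]
b_yx = cov_xy / np.var(x, ddof=1)
b_xy = cov_xy / np.var(y, ddof=1)

print(f"r_xy = {r_xy:.4f}, r_yx = {r_yx:.4f}")  # identical
print(f"b_yx = {b_yx:.3f}, b_xy = {b_xy:.3f}")  # different
```

A side fact worth noticing: the product of the two slopes equals the squared correlation, b_yx · b_xy = r².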

Types of Linear Regression

  • Simple Linear Regression
  • Multiple Linear Regression

E.g. if you want to increase sales, you may have to increase marketing cost. Here marketing cost is the independent variable (X) and sales is the dependent variable (Y).

Simple Linear Regression

Now let’s understand simple linear regression together. Regression line: the regression line shows the average relationship between two variables. It is also called the line of best fit. If two numerical variables X and Y are given, then there are two regression lines:

  • Regression line of X on Y
  • Regression line of Y on X.

In the attached image you can see that the linear regression line is always straight (though not necessarily passing through the origin). The equation for simple linear regression (Y on X):

Y = b0 + b1X + e

- Y is the dependent variable
- X is the independent variable
- b1 is the coefficient of the independent variable (the slope)
- b0 is the intercept (bias) term
- e is the error term

Ordinary Least Squares Method:

In linear regression, our goal is to find the best-fit line. Read the image notes carefully to understand how we can find the best-fit line using the cost function and the ordinary least squares method.
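The ordinary least squares estimates have a closed form, and the idea can be sketched in a few lines of NumPy. The data values below are hypothetical, chosen so that the fitted line comes out cleanly as Y = 5 + 2X:

```python
import numpy as np

# Hypothetical sample: marketing cost (X) vs sales (Y)
X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Y = np.array([25.0, 44.0, 67.0, 84.0, 105.0])

# OLS estimates that minimize the sum of squared residuals:
#   b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
#   b0 = y_mean - b1 * x_mean
x_mean, y_mean = X.mean(), Y.mean()
b1 = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

Y_pred = b0 + b1 * X  # points on the best-fit line
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 5.00, b1 = 2.00
```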

Model Performance:

Now that we can calculate our best-fit line, meaning we have fit our model to the given data, we must test the model with performance metrics. For linear regression, we use the following metrics to see whether our fitted model is good or not.

Metrics to evaluate a linear regression model:

  1. Mean Absolute Error (MAE): This is the simplest of all the metrics. It is measured by taking the average of the absolute differences between the actual values and the predictions. The lower the MAE, the better your model’s performance.

  2. Root Mean Square Error (RMSE): The root mean square error is measured by taking the square root of the average of the squared differences between the predictions and the actual values. RMSE penalizes large errors more heavily, because it squares the errors before averaging them.
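Both metrics are one-liners in NumPy. The actual and predicted values below are hypothetical, just to show the calculations:

```python
import numpy as np

# Hypothetical actual values and model predictions
y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.5, 5.5, 7.0, 11.0])

# MAE: average of absolute errors
mae = np.mean(np.abs(y_true - y_pred))

# RMSE: square root of the average squared error
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")  # MAE = 0.625, RMSE = 0.661
```

Note that RMSE is always at least as large as MAE; a big gap between them signals that a few predictions have unusually large errors.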

Assumptions of Linear Regression:

Linear Relationship: A linear relationship is assumed between the dependent variable and the independent variables.

Random Error (e):
Errors/residuals must be normally distributed with mean 0 and constant variance σ². Random errors are independent of each other.

Homoscedasticity: The variance around the regression line is the same for all predicted values.

Multicollinearity: The absence of multicollinearity is assumed in the model, meaning that the independent variables are not too highly correlated with each other.

Multiple Linear Regression

Multiple linear regression is another type of linear regression. It is used specifically when we have more than one independent variable (i.e. input variables).

Ordinary Least Squares Estimation for Multiple Linear Regression
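With several independent variables, OLS is most easily computed in matrix form. The sketch below uses hypothetical data generated exactly from y = 5 + 2·x1 + 3·x2, so the recovered coefficients are exact:

```python
import numpy as np

# Hypothetical data: two independent variables
# (e.g. marketing cost and store size), five observations
X = np.array([[10.0, 1.0],
              [20.0, 2.0],
              [30.0, 2.0],
              [40.0, 3.0],
              [50.0, 4.0]])
y = np.array([28.0, 51.0, 71.0, 94.0, 117.0])  # from y = 5 + 2*x1 + 3*x2

# Prepend a column of ones so the first coefficient is the intercept b0
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of X_design @ coefs ≈ y
coefs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = coefs
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

The same formula Y = b0 + b1X + e simply gains one term per extra variable: Y = b0 + b1X1 + b2X2 + e.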

Stay tuned to read more ML stuff.

Follow https://medium.com/@mayur_ml
