Supervised Learning: Linear & Multiple Regression Algorithms

Helooooooooooooo! Today let’s cook Linear Regression.

Krushna Thakkar
Operations Research Bit
12 min read · May 13, 2024


Photo by C D-X on Unsplash

Q1: What is Linear Regression?

Linear regression analysis is used to predict the value of one variable in labeled data based on the value of another variable. In linear regression, you have a dependent variable (the variable you are trying to predict) and one or more independent variables (the variables used to make predictions). The relationship between the dependent variable and each independent variable is assumed to be linear, meaning that a change in an independent variable is associated with a proportional change in the dependent variable.

Assume these are the types of taste a dish can end up with.
Let’s start with

R²: It is the percentage of the variation in Y explained by X. The technique we use in Simple Linear Regression is Least Squares Regression: a method that estimates the parameters of a linear regression model by minimizing the sum of the squared differences between the observed values of the dependent variable and the values predicted by the linear equation. Minimizing that sum of squared differences maximizes R². This means the best-fit line found by Least Squares Regression has the smallest squared residuals, which yields a high R² and therefore a high success rate. (It will come up again in the Theoretical Explanation.)
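To make this concrete, here is a minimal sketch that fits a least-squares line with NumPy and computes R² by hand; the height/weight numbers are made up purely for illustration.

import numpy as np

# Made-up data: x = height (cm), y = weight (kg)
x = np.array([150, 160, 165, 170, 180, 185])
y = np.array([50, 56, 61, 65, 72, 77])

# Least squares: slope m and intercept c that minimize the squared residuals
m, c = np.polyfit(x, y, deg=1)
y_pred = m * x + c

# R² = 1 − (residual sum of squares / total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print("R²:", 1 - ss_res / ss_tot)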

Also, let me introduce Z-Score

Z-score: It is a preprocessing technique used to transform the features of a dataset so that they have a mean of 0 and a standard deviation of 1. This process ensures that all features are on the same scale. In linear regression, standardization is often applied to the features (independent variables) of the dataset. By standardizing the features, we ensure that they have comparable scales, which can help the optimization algorithm converge faster and prevent certain features from dominating the model simply because of their scale. MISCONCEPTION: Standardization doesn’t change the relationship between the features and the target variable (dependent variable) in linear regression. It only scales the features to have a mean of 0 and a standard deviation of 1, making it easier to interpret the coefficients of the regression model. (It will come up again in the Practical Explanation.)

Formula: z = (x − μ) / σ, where μ is the feature’s mean and σ is its standard deviation.
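As a quick sketch with made-up numbers, standardizing a feature by hand looks like this:

import numpy as np

heights = np.array([150.0, 160.0, 165.0, 170.0, 180.0, 185.0])
z = (heights - heights.mean()) / heights.std()
print(z.mean().round(10), z.std())  # ~0 and 1 after standardization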

Finally, we reach α (the Learning Rate).

α: It is a crucial hyperparameter that determines the step size at which the model iteratively updates its parameters during training. The learning rate affects the convergence behavior of the algorithm. If the learning rate is too small, the algorithm may converge very slowly, requiring a large number of iterations to reach the optimal solution. If it is too large, the algorithm may overshoot the optimal solution and fail to converge, or it may oscillate around the minimum. Therefore a common choice is a value like 0.001. Selecting an appropriate learning rate is crucial for ensuring convergence to the optimal solution within a reasonable number of iterations. (I’ll go in-depth on its use in the Convergence Algorithm.)

Formula: θj := θj − α · ∂J(θ)/∂θj (each parameter steps downhill by α times the gradient of the cost).

Another taste type is MAE.

Mean Absolute Error (MAE) is a metric used to evaluate the performance of a regression model. It measures the average absolute difference between the predicted values and the actual values in a dataset. Here’s how it’s calculated: for each data point in the dataset, find the absolute difference between the predicted value (denoted ŷᵢ) and the actual value (denoted yᵢ); sum up all these absolute differences; then divide this sum by the total number of data points to get the mean. A lower MAE indicates that the model is better at predicting the target variable.

Formula: MAE = (1/n) · Σ |yᵢ − ŷᵢ|, summed over all n data points.
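A minimal sketch of that calculation (the arrays are made up):

import numpy as np

y_true = np.array([50, 56, 61, 65])
y_pred = np.array([52, 55, 63, 64])
print("MAE:", np.mean(np.abs(y_true - y_pred)))  # average absolute difference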

Example: an R² of 0.70 means education level explains 70% of the variability in income. Linear Regression is a straightforward task, and here are some simple steps:

1- Theoretically:

  • We build a 2D plot with x and y axes representing two variables, say x = height and y = weight. Then we draw a simple line passing through the data points.
  • We find each residual (the distance between a point and the line); squaring the residuals and adding them up gives the sum of squared errors, from which we calculate R².
  • We tilt the line and calculate R² again, comparing it with the previous value: if the fit improved, we keep tilting in that direction; if it got worse, we tilt back the other way.
  • We keep doing this 4-5 times, and from the values calculated we pick the line with the smallest squared residuals as the best line. R² tells us how good a guess our line is.
  • Plot the line in the form y = mx + c, where m is the slope and c is the y-intercept. (A minimal sketch of this trial-and-error procedure follows this list.)
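Here is a minimal sketch of that procedure, using the same made-up height/weight numbers as before: try several slopes for a line anchored at the data’s mean point and keep the one with the smallest squared residuals.

import numpy as np

x = np.array([150, 160, 165, 170, 180, 185])
y = np.array([50, 56, 61, 65, 72, 77])

best_m, best_sse = None, float("inf")
for m in np.linspace(0.1, 2.0, 50):       # "tilt" the line step by step
    c = y.mean() - m * x.mean()           # keep the line anchored at the mean point
    sse = np.sum((y - (m * x + c)) ** 2)  # sum of squared residuals
    if sse < best_sse:
        best_m, best_sse = m, sse

print(f"best line: y = {best_m:.3f}x + {y.mean() - best_m * x.mean():.3f}")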

2- Practically: We still need some mathematical explanation to get a feel for the Best Fit Line. For the practical part, I’ll crop in screenshots from the Jupyter Notebook as well as attach a link to it.

Let’s dive into the mathematical approach of the Linear Regression algorithm:

Here are my notes:

According to this, we have a plot, a formula, and a little explanation of the values in the formula.

The plot is between Weight and Height. The formula is hθ(X) = θ0 + θ1X1, where hθ(X) is the hypothesis function, θ0 is the intercept, θ1 is the slope, and X1 is the independent variable.

Cost Function: It is a measure of how well a model performs, judged by its predictions. In linear regression, the cost function typically measures the difference between the predicted values of the dependent variable and the actual observed values. The Mean Squared Error (MSE) is commonly chosen as the cost function in linear regression. WHY?

Because: It has a smooth and well-defined shape that facilitates optimization using techniques like gradient descent. This property ensures that optimization algorithms can find the global minimum efficiently, and this global minimum corresponds to the best-fit line. The MSE also has a clear statistical interpretation: it is the average squared difference between the predicted and observed values of the dependent variable.
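In the hθ notation above, this cost function is commonly written as (the 1/2m scaling is a convention that simplifies the gradient):

J(θ0, θ1) = (1/2m) · Σ (hθ(x[i]) − y[i])², summed over the m training examples.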

Let’s use a hypothetical dataset to feel what the Cost Function is:

Need an Explanation?

Here we have a very, very simple dataset that starts from the origin (0, 0), so θ0 is 0, since it is the y-intercept. Therefore we have,

hθ(x) = θ1x1 to solve (right-side picture).

Now, we change the value of the slope θ to get hθ(x), plot those points on the graph in different colors, and check which fit the best. Afterward, we decide to create a gradient descent graph of θ vs. J(θ), the cost. We iteratively update the value of θ using gradient descent and plot the resulting cost values on the graph. The goal is to find the value of θ that minimizes the cost function (e.g., MSE) and brings hθ(x[i]) closest to the actual Y values. We observe the graph of θ vs. J(θ) and identify the point where the curve reaches its minimum value. This point represents the global minimum of the cost function and corresponds to the best-fitting line for the dataset. The best-fitting line obtained from the gradient descent process represents the linear regression model for this dataset: it is the model that predicts Y values based on the X values.
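A minimal sketch of that manual search, assuming a made-up dataset through the origin: evaluate the cost J(θ) over a grid of slopes and pick the minimum.

import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([1.0, 2.0, 3.0])   # made-up data through the origin, so θ0 = 0
m = len(X)

thetas = np.linspace(0.0, 2.0, 41)
costs = [np.sum((t * X - Y) ** 2) / (2 * m) for t in thetas]  # J(θ) for each slope

print("θ that minimizes J:", thetas[int(np.argmin(costs))])   # -> 1.0 for this data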

Now this seems tedious: we have to change the slope again and again to find the values, gain insights by plotting the gradient descent graph, find the minimum, and then read off the coordinates. Researchers don’t sit idle, so they developed an algorithm called the Convergence Algorithm.

Convergence Algorithm: The convergence algorithm defines when to stop the iterative process of gradient descent. It typically involves monitoring the change in the cost function between iterations. When the change becomes sufficiently small, indicating that the algorithm has converged, it stops.

Let’s dive into the Convergence Algorithm and its Math Intuition. After this finally, we’ll see some real implementation with Python.

Convergence Algorithm: It says we loop, updating θ on each iteration, until we find the global minimum, which gives us the best-fit line. Each update applies the rule from earlier: θj := θj − α · ∂J(θ)/∂θj, repeated until we converge.

Notes:

These are the best and easiest notes for understanding the Convergence Algorithm, and I don’t think it’s too tricky to make it through.
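To tie the pieces together, here is a minimal sketch of gradient descent with a convergence check, assuming the same made-up data as above; the learning rate and tolerance values are illustrative choices.

import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([1.0, 2.0, 3.0])
m = len(X)

theta = 0.0    # start with slope 0 (θ0 fixed at 0 for this toy data)
alpha = 0.1    # learning rate
tol = 1e-9     # stop when the cost barely changes between iterations
prev_cost = float("inf")

for i in range(10_000):
    error = theta * X - Y
    cost = np.sum(error ** 2) / (2 * m)     # J(θ)
    if abs(prev_cost - cost) < tol:         # convergence check
        break
    theta -= alpha * np.sum(error * X) / m  # θ := θ − α·∂J/∂θ
    prev_cost = cost

print(f"converged after {i} iterations, θ = {theta:.6f}")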

Theoretical Conclusion:

1- Gradient Descent

2- Convergence Algorithm

3- Learning Rate

4- R²

5- Z Score

6- Cost Function

7- Least Square Method

Practical: Implementation with Jupyter.

How do we move? STEPS
Referring to my first Blog

I’ll Code From Scratch -> Very Basic (I’ll pick up some pace)

-> Imports: Pandas, Numpy, Matplotlib

-> Dataset Loading and Displaying: Pandas

-> Scatter Plot: Remember it as a Chopping board or base for our Regression problem.

-> Correlation Matrix: using Pearson Correlation

-> There are going to be several steps here. I encourage you to check the Jupyter Notebook, because Multiple Regression is what the world will be asking for, not solutions to simple problems. I am going to write out all the steps, which contain the flow of the algorithm for Simple Linear Regression following the earlier context.

1- Divide the data into dependent and independent columns (Height and Weight)

NOTE: Independent features should be in the form of a DataFrame (2D), not a Series. On the other hand, the target feature should be in the form of a Series, because the model expects a single 1D column of output values to predict.

Example:

df_indfeature1 = df[['Weight']]  # type(df_indfeature1) -> DataFrame
df_depnfeature1 = df['Height']   # type(df_depnfeature1) -> Series

2- Apply Train Test Split

3- Preprocessing: Standardization: the Z-Score concept with StandardScaler

4- Apply Standardization to X_train data.

NOTE: We apply fit_transform() only to the training set and transform() to the test set. Since we have already computed the Z-Score parameters on the training data, just transforming the test data works better than doubling the process by fitting again. Fitting the scaler on the test data can lead to Data Leakage (I will cover this in the future).

5- Apply Machine Learning Algorithm- Linear Regression

6- Fit the model- model.fit()

7- Evaluation: Predict the model
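Putting steps 1-7 together, here is a minimal sketch of the scikit-learn flow; the small DataFrame stands in for the notebook’s dataset, and the split size is illustrative.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Made-up data standing in for the notebook's Height/Weight dataset
df = pd.DataFrame({'Weight': [50, 56, 61, 65, 72, 77],
                   'Height': [150, 160, 165, 170, 180, 185]})

X = df[['Weight']]  # DataFrame (2D): independent feature
y = df['Height']    # Series (1D): target feature

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on training data only
X_test = scaler.transform(X_test)        # reuse training mean/std, no leakage

regression = LinearRegression()
regression.fit(X_train, y_train)
y_pred = regression.predict(X_test)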

-> Coefficient or slope and Intercept:

print("Coefficient or slope:",regression.coef_)
print("Intercept:",regression.intercept_)

# Answer = Coefficient or slope: [1.01654416]
# Intercept: 83.34382656470417

8- Best Fit Line

9- Let’s check Performance.

We cannot directly check performance the way we apply train-test split or fit the model; there are different metrics to measure different aspects of any machine learning model. Some of the metrics are: Accuracy, Precision, Recall, F1 Score, Specificity, ROC-AUC, Confusion Matrix, MAE, MSE, RMSE, R Squared, MAPE, Explained Variance Score, etc.

The metrics used for Simple Linear Regression are MSE, RMSE, R², and MAPE. Because ours is a simple Regression Model, we only need R² to assess the model’s performance. (We’ll go over the others in Multiple Linear Regression.)
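A minimal sketch of those metrics with scikit-learn, continuing from the y_test and y_pred variables in the sketch above:

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error, r2_score

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mape = mean_absolute_percentage_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.3f}, RMSE={rmse:.3f}, MAPE={mape:.3f}, R²={r2:.3f}")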

Analysis: LHS = the summation from Least Squares Regression; RHS = the Linear Regression model’s predictions.

LHS

RHS: model.predict

LHS = RHS: this means our Least Squares Regression, combined with the Convergence Algorithm, turned out to match the predictions of the Linear Regression algorithm.

Performance Prediction with R²:

Let’s use the OLS Linear Regression technique.
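As a sketch, assuming the same X_train and y_train from above, the statsmodels version looks like this:

import statsmodels.api as sm

X_ols = sm.add_constant(X_train)  # add the intercept column OLS expects
ols_model = sm.OLS(y_train, X_ols).fit()
print(ols_model.summary())        # coefficients, R², standard errors, p-values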

Multiple Regression Algorithm

Multiple regression is a statistical technique used to understand the relationship between a dependent variable and two or more independent variables. It extends the concept of simple linear regression, which involves predicting a dependent variable based on one independent variable, to cases where multiple independent variables are involved.
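Extending the earlier hypothesis function to n independent variables:

hθ(X) = θ0 + θ1X1 + θ2X2 + … + θnXn, where θ0 is the intercept and each θj is the coefficient of the feature Xj.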

Code:

-> Imports: Pandas, Numpy, Matplotlib

-> Dataset Loading and Displaying: Pandas

-> Drop Columns: Pandas

We drop columns that are of no use, like 'Unnamed'.

-> Check null values in the dataset: Pandas

Actually, we always deal with missing values, but since this is a basic example of Multiple Linear Regression, we have no null values for now.

-> Visualize the data features

-> Correlation amongst the feature variables.

-> Differentiate Independent and dependent variable

-> Apply Train-test split

-> Visualize features against the target variable with regplot(), Seaborn

-> Standardization , sklearn

-> Apply Linear Regression

-> Evaluate Performance Metrics

-> Check R² Score
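Here is a minimal sketch of that flow; the CSV file name and the 'target' column are placeholders for whatever dataset the notebook uses.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

df = pd.read_csv('data.csv')                           # placeholder file name
df = df.drop(columns=['Unnamed: 0'], errors='ignore')  # drop unusable columns
print(df.isnull().sum())                               # check null values

X = df.drop(columns=['target'])  # all features; 'target' is a placeholder name
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

regression = LinearRegression()
regression.fit(X_train, y_train)
print("R² Score:", r2_score(y_test, regression.predict(X_test)))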

Model Summary:

Interview Questions on Linear Regression:

Let’s move to the last part of this post, the most important -> Interview Questions

Q1: What is linear regression, and how does it work?

Q2: What are the assumptions of a linear regression model?

Q3: What are outliers? How do you detect and treat them? How do you deal with outliers in a linear regression model?

Q4: How do you determine the best fit line for a linear regression model?

Q5: What is the difference between simple and multiple linear regression?

Q6: What is linear Regression Analysis?

Q7: What is multicollinearity and how does it affect linear regression analysis?

Q8: What is the difference between linear regression and logistic regression?

Q9: What are the common types of errors in linear regression analysis?

Q10: What is the difference between a dependent and independent variable in linear regression?

Q11: What is an interaction term in linear regression and how is it used?

Q12: What is the difference between biased and unbiased estimates in linear regression?

Q13: How do you measure the strength of a linear relationship between two variables?

Q14: What is the difference between linear regression and non-linear regression?

Q15: What are the common techniques used to improve the accuracy of a linear regression model?

Q16: What is a residual in linear regression and how is it used in model evaluation?

Q17: What is the difference between a parametric and non-parametric regression model?

Q18: What are the assumptions of the ordinary least squares method for linear regression?

Q19: How do you determine the significance of a predictor variable in a linear regression model?

Q20: What is the role of a dummy variable in linear regression analysis?

Q21: What is heteroscedasticity?

Q22: What is the difference between a categorical and continuous variable in linear regression?

Q23: What is the impact of correlated predictor variables on linear regression analysis?

Q24: How do you evaluate the goodness of fit of a linear regression model?

Q25: What is the role of a regression coefficient in linear regression analysis?

Q26: What is a prediction interval in linear regression and how is it used?

Q27: How to find RMSE and MSE?

Q28: How do you test for autocorrelation in a linear regression model?

Q29: What are the common challenges faced when building a linear regression model?

Q30: Can you explain the concept of collinearity and how it affects a linear regression model?

Q31: What is the importance of the F-test in a linear model?

Q32: What is the primary difference between R squared and adjusted R squared?

Q33: How would we implement Linear Regression Algorithm using SQL?
