Linear Regression Model for Beginners

Published in AIMLTutorial · 5 min read · Jul 31, 2024

When I started learning about machine learning, linear regression was one of the first things (after the basics) that I learned.

Before that, I was really confused (to be honest, quite scared) by these terms, but I can tell you now that it’s not as difficult as it may sound at first.

Everything is hard before it’s easy — Goethe J.W.

In this article, you will learn the basics of Linear Regression and the main steps involved in building a Linear Regression model.

You’ll learn about:

What is Linear Regression?
What is Supervised Learning?
What are the steps for building a Linear Regression model?
Data Collection
Data Exploration & Pre Processing
Model Training
Model Evaluation
Model Interpretation
Model Tuning
Model Deployment and Model Monitoring.

[Figure: Linear Regression Model Steps]

What is Linear Regression?

In the simplest terms, Linear Regression is a statistical model that describes the linear relationship between two or more variables, e.g. x and y.

A linear relationship tells us how y changes when we change x.

A Simple Linear Regression model is expressed using the equation below.

y = mx + c

where y is the dependent variable
x is the independent variable
c is the intercept
m is the slope

As you can see from the equation above, once we know m and c, we can predict the value of y for any given x.
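To make the equation concrete, here is a minimal sketch in Python. The slope and intercept values are made up purely for illustration; in a real project the model estimates them from data.

```python
# Hypothetical slope and intercept, chosen only for illustration
m = 2.0   # slope: y changes by 2 for every unit change in x
c = 1.0   # intercept: the value of y when x is 0


def predict(x):
    """Return y for a given x using y = m*x + c."""
    return m * x + c


# Predict y for a few values of x
predictions = [predict(x) for x in [0, 1, 5]]
print(predictions)  # [1.0, 3.0, 11.0]
```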

We use Linear Regression to predict the response (dependent) variable based on one or more explanatory (independent) variables, e.g. predicting house prices based on the number of rooms, square footage, lot size etc.

Supervised Learning

In Machine Learning, when we are given both the independent variables (features that may influence the target) and the dependent variable (the target), and we train a model to learn from this labeled data and make predictions on new, unseen data, we call it Supervised Learning.

Since Linear Regression learns from such labeled data to make predictions, it falls under the Supervised Learning category.

Linear Regression Model Steps

The first and foremost step, even before applying any algorithm or training a model, is to understand the goal of the project.

What are you going to predict?
What insights is the business trying to get from this model?

It’s best to keep the project goals in mind throughout the model building process.

Data Collection

Collect all the data that has both independent variables and the dependent variables. You can extract the data from various sources such as databases, files etc.

import pandas as pd

# read data from csv
data = pd.read_csv("mydata.csv")

Data Exploration and Pre-Processing

In Data Exploration, also known as EDA (Exploratory Data Analysis), you:

Review the data — see data summary, understand independent & dependent variables
Handle duplicates, missing values
Create graphs to understand the variables (Univariate analysis) and their relationships with other variables (Multivariate analysis)
Identify patterns, outliers, correlation between variables
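The exploration steps above can be sketched with pandas. This is only an illustration: the small DataFrame stands in for mydata.csv, the TV and Radio columns are hypothetical names, and dropping rows is just one of several ways to handle duplicates and missing values.

```python
import pandas as pd

# Small made-up dataset standing in for mydata.csv (values are illustrative)
data = pd.DataFrame({
    "TV":    [230.1, 44.5, 17.2, 151.5, 151.5, None],
    "Radio": [37.8, 39.3, 45.9, 41.3, 41.3, 10.8],
    "Sales": [22.1, 10.4, 9.3, 18.5, 18.5, 7.2],
})

# Review the data: summary statistics for every numeric column
print(data.describe())

# Count fully duplicated rows and missing values per column
n_duplicates = data.duplicated().sum()
n_missing = data.isna().sum()

# One simple option: drop duplicate rows, then drop rows with missing values
clean = data.drop_duplicates().dropna()

# Pairwise correlation between the numeric variables
corr = clean.corr()
print(corr)
```

For the graphs, pandas plotting helpers such as clean.hist() or a scatter plot of each variable against Sales are a common starting point.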

After this, we pre-process the data for the Linear Regression algorithm.

Encode the categorical variables (label encoding/one hot encoding)
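As a rough sketch of one hot encoding, pandas provides get_dummies. The Region column below is a hypothetical categorical variable used only for illustration.

```python
import pandas as pd

# Hypothetical data with one categorical column ("Region")
data = pd.DataFrame({
    "Region": ["North", "South", "East", "South"],
    "Sales":  [10.0, 12.5, 9.0, 11.0],
})

# One hot encode "Region"; drop_first=True drops one category so the
# remaining dummy columns are not perfectly collinear with the intercept
encoded = pd.get_dummies(data, columns=["Region"], drop_first=True)
print(encoded.columns.tolist())
```

Dropping one category is a common choice for linear regression because keeping all of them makes the dummy columns add up to a constant, which duplicates the intercept.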

Model Building & Training

Model Building includes several steps:

Define independent variables & target variables

# defining the explanatory (independent) variable
X = data.drop('Sales', axis=1)

# defining the response (dependent) variable
y = data['Sales']

Split the dataset into a training set and a test set. The training set is used to fit the model; the test set is held back and used to check how well the model predicts data it has not seen.

from sklearn.model_selection import train_test_split

# splitting the data in 70:30 ratio for train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X,    # independent variables
    y,    # dependent variable
    test_size=0.30,    # specifying the size of the test set
    random_state=42    # specifying a seed value for reproducibility
)

Fit the Linear Regression model on the training set

from sklearn.linear_model import LinearRegression

# Linear Regression Model
model = LinearRegression()

# fit the model on training set
model.fit(X_train, y_train)

Make predictions on the test data

# model prediction on the test set
y_pred = model.predict(X_test)

Model Evaluation

Linear regression models can be evaluated using several metrics such as:

RMSE (Root Mean Squared Error)
MAE (Mean Absolute Error)
MAPE (Mean Absolute Percentage Error)
R-Squared
Adjusted R-Squared

Which metrics to choose usually depends on the goal you want to achieve. Many times you’ll be reviewing R-squared, which is also known as the ‘Coefficient of Determination’.

You will evaluate the model’s performance on both the training set and the test set and compare the performance.
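Here is a minimal sketch of computing these metrics on both sets with scikit-learn. The data is synthetic and the evaluate helper is a hypothetical name introduced for this example. Adjusted R-squared is computed by hand from R-squared, the number of rows n and the number of features k.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3*x1 - 2*x2 + 50 + noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 50 + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)
model = LinearRegression().fit(X_train, y_train)


def evaluate(model, X, y):
    """Return common regression metrics for one data split."""
    pred = model.predict(X)
    n, k = X.shape
    r2 = r2_score(y, pred)
    return {
        "RMSE": np.sqrt(mean_squared_error(y, pred)),
        "MAE": mean_absolute_error(y, pred),
        "MAPE": mean_absolute_percentage_error(y, pred),
        "R2": r2,
        "Adj_R2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
    }


train_scores = evaluate(model, X_train, y_train)
test_scores = evaluate(model, X_test, y_test)
print("Train:", train_scores)
print("Test: ", test_scores)
```

If the training scores are much better than the test scores, the model is likely overfitting; similar scores on both sets suggest it generalizes reasonably well.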

Residual plots, Q-Q plots and the Durbin-Watson test are some of the techniques that you can apply to check the model’s assumptions, such as normally distributed and uncorrelated residuals.
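As a small illustration, the Durbin-Watson statistic can be computed directly from the residuals with NumPy. The actual and predicted values below are made up; statsmodels also offers a durbin_watson function if you prefer a library call.

```python
import numpy as np

# Hypothetical actual and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5, 10.5])

# Residuals are the differences between actual and predicted values
residuals = y_true - y_pred

# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by the sum of squared residuals. Values near 2 suggest
# little autocorrelation; values toward 0 or 4 suggest positive or
# negative autocorrelation respectively.
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print(dw)
```

The residuals here alternate in sign, so the statistic comes out well above 2, which points to negative autocorrelation.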

Model Interpretation

Going back to the y = mx + c equation, our Linear Regression model will help us find the slope ‘m’ and the intercept ‘c’. For multiple linear regression, you will get one coefficient per independent variable.

# printing the linear regression coefficients of the fitted model
print(
    "Slope:", model.coef_,
    "Intercept:", model.intercept_,
)

The slope and the intercept together define the fitted Linear Regression equation, which helps you understand the impact of each variable on the target variable.

This is when you may choose to drop some variables that do not have a significant impact on predicting the target variable.
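One way to read the coefficients is to pair each one with its column name. The sketch below uses synthetic data with hypothetical feature names, where the Noise column is deliberately unrelated to the target.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data with hypothetical feature names; "Noise" has no effect on y
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "TV": rng.normal(size=300),
    "Radio": rng.normal(size=300),
    "Noise": rng.normal(size=300),
})
y = 4.0 * X["TV"] + 1.5 * X["Radio"] + 10.0 + rng.normal(scale=0.1, size=300)

model = LinearRegression().fit(X, y)

# Pair each coefficient with its column name, largest absolute value first
coefs = pd.Series(model.coef_, index=X.columns).sort_values(
    key=abs, ascending=False
)
print(coefs)
print("Intercept:", model.intercept_)
```

Here the coefficient for Noise lands close to zero, which marks it as a candidate to drop. Comparing coefficient sizes this way is only fair when the features are on similar scales, as they are in this synthetic example.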

Model Tuning

Based on the performance metrics and model interpretation, you may need to fine-tune the model further. This may include applying transformations to the data, dropping variables etc. You can also search for the best model parameters using Grid Search or Random Search.
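A minimal Grid Search sketch with scikit-learn's GridSearchCV follows. Plain LinearRegression has very few settings, so this example only compares fitting with and without an intercept on synthetic data; treat it as an illustration of the mechanics rather than a recommended search space.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data (an assumption for illustration) with a non-zero intercept
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 2))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 3.0 + rng.normal(scale=0.3, size=150)

# Candidate parameter values to try
param_grid = {"fit_intercept": [True, False]}

# 5-fold cross-validation, scoring each candidate by R-squared
search = GridSearchCV(LinearRegression(), param_grid, cv=5, scoring="r2")
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

RandomizedSearchCV follows the same pattern but samples a fixed number of parameter combinations instead of trying them all.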

Every time you make changes, you will need to review the model performance again on the training set and the test set.

Finally you will be able to choose a model that will be best suited for your goal.

Model Deployment

This is where your hard work pays off and you deploy the model to production.
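How you deploy depends heavily on your environment. One common first step, shown here as a sketch, is to serialize the fitted model so another process can load it and make predictions. This example uses Python's built-in pickle module and toy data.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data following y = 2x + 1, for illustration only
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
model = LinearRegression().fit(X, y)

# Serialize the fitted model to bytes; in practice you would write these
# bytes to a file or a model store
blob = pickle.dumps(model)

# Later, in the serving process: restore the model and predict
loaded = pickle.loads(blob)
prediction = loaded.predict(np.array([[5.0]]))
print(prediction)
```

Only unpickle files you trust, and keep the scikit-learn version the same between training and serving.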

Model Monitoring

Your model will not always give the same performance: as new data arrives, it can drift away from the data the model was trained on. Monitoring the model’s performance over time can reveal this degradation, which may require retraining or adjusting the model.
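A very simple monitoring check might compare the error on each new batch of data against a baseline. In the sketch below the baseline, the tolerance and the weekly batches are all hypothetical values chosen for illustration.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root Mean Squared Error between actual and predicted values."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical baseline RMSE measured on the test set at deployment time
baseline_rmse = 1.0
# Hypothetical tolerance: flag a batch if its RMSE exceeds 1.5x the baseline
tolerance = 1.5

# Made-up batches of (actual values, model predictions)
batches = {
    "week_1": ([10.0, 12.0, 14.0], [11.0, 11.0, 15.0]),
    "week_2": ([10.0, 12.0, 14.0], [13.0, 9.0, 17.0]),
}

# Collect the names of batches whose error crossed the threshold
alerts = [
    name
    for name, (y_true, y_pred) in batches.items()
    if rmse(y_true, y_pred) > baseline_rmse * tolerance
]
print(alerts)  # ['week_2']
```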

To summarize, you learned about the basics of linear regression, supervised learning, and the main steps of building a Linear Regression model: data collection, data exploration & pre-processing, model building & training, model evaluation, model interpretation, model tuning, model deployment and finally model monitoring.

I kept this article not very technical on purpose, as it’s mainly for beginners in Linear Regression. Want to get more technical about Linear Regression? Read about LinearRegression on scikit-learn.