Published in

Understanding a Linear Regression Algorithm with Example.

source- pinterest

DataSet and Notebook used in this article can be found here:

Let’s start on Linear Regression with a few scenarios:

  • Finance companies predicting the top factors that cause a customer to default on a loan.
  • Sports companies analyzing which variations of training have an effect on player performance.
  • Factors affecting the economic growth of a country.
  • Predicting stock prices.
Linear Regression Graph
  • Using historical data we plot the time someone spends in an app on the x-axis, and how much they also spent on purchases via the app on the y-axis.
  • We then plot a line that is as close to all our points as much as possible called the line of best fit.
  • We will then use this line of best of it to help us predict the amount. If we have an observation that is not in the line of best fit, then we will extrapolate this line to find our predicted value.

Equation Breakdown


  • Decide whether, in their new marketing and financial strategy whether to focus their efforts and resources on their mobile app experience or their website.
  • Identify what factors influence customer yearly spend most.

# Detecting Continous variables

# We then need to check for missing values

  • Getting rid of customers with a lot of missing values in their columns.
  • Getting rid of the whole attribute or remove the whole column.
  • Setting the missing values to some value (zero, the mean, the median, etc.).

# Detecting the Target Variable

# Detecting the Linear Relationship

# Detecting Outliers

# Checking for normal distribution of our datapoints.

# Detecting Correlation

# Order our independent variables in order of correlation to the dependent variable.

# Checking the Correlation between the Independent variables

# Defining our Independent variables X and Dependent Variable y

# Split Data into Training and Test Data

# Normalize the data

# Train the model on training data(fitting the model)

# Using Stats Model

import statsmodels.api as sm

# Summary of model performance using statsmodel


# Conclusion for our coefficients and business problem?

# Hypothesis testing and P- value

# What else can we do to make our model performance better? Answer: Hyperparameter tuning



Data Scientists must think like an artist when finding a solution when creating a piece of code. ⚪️ Artists enjoy working on interesting problems, even if there is no obvious answer ⚪️ 🔵 Follow to join our 28K+ Unique DAILY Readers 🟠

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Joan Ngugi

Big Data & Analytics, Data Science, Machine Learning, Data Engineering |