Linear Regression — Residual Plot Comparison for Ads data

DevTechie
4 min read · Jul 22, 2024


Problem statement: In our last article, we learned how to evaluate whether Linear Regression is the right choice for our dataset. In this and upcoming articles, we will dive deeper into each feature and see whether Linear Regression is the right choice for each one.

Recall from our last article that if the residual plot looks random, with no pattern such as a straight line or a parabola, Linear Regression is likely a good choice. Let’s calculate mean_absolute_error and root_mean_squared_error, and analyze the residual plot and the distribution graph, to evaluate model performance for TV and Newspaper separately. Let’s get going.
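Before we touch the real data, here is a minimal sketch of what these quantities measure, written in plain Python so the formulas are visible. The spend and sales numbers are made up for illustration and are not taken from the advertising dataset. The helper names fit_line, mae_by_hand and rmse_by_hand are hypothetical; in practice scikit-learn's metric functions would normally be used instead.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mae_by_hand(y_true, y_pred):
    """Mean absolute error: average size of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse_by_hand(y_true, y_pred):
    """Root mean squared error: penalizes large residuals more than MAE does."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

# Made-up illustration data (NOT the advertising dataset).
spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(spend, sales)
predictions = [slope * s + intercept for s in spend]

# A residual is the actual value minus the predicted value.
residuals = [t - p for t, p in zip(sales, predictions)]

mae = mae_by_hand(sales, predictions)
rmse = rmse_by_hand(sales, predictions)

print("residuals:", [round(r, 3) for r in residuals])
print("MAE:", round(mae, 4), "RMSE:", round(rmse, 4))
```

In this toy example the residuals switch sign from point to point rather than bending into a curve, which is the kind of random scatter we hope to see in a residual plot. RMSE is never smaller than MAE, and a large gap between the two suggests a few big misses.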

TV

Import our libraries and load our data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

file_path = r'/Users/Downloads/advertising.csv'
df_ad_data = pd.read_csv(file_path)
df_ad_data.head()

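Create the X matrix for the TV data (the double brackets keep it a two-dimensional DataFrame, which is the shape scikit-learn expects for features)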

X_tv = df_ad_data[["TV"]]
X_tv.head()

Get the y vector (the Sales column)

y_tv = df_ad_data['Sales']
y_tv.head()

Separate out our training data and test data

from sklearn.model_selection import train_test_split…
