Sales prediction using a linear regression model

Arnav Saxena
5 min read · Jun 22, 2022

Analyzing and predicting sales from a given advertising budget for TV, radio, and newspapers.

Photo by Carlos Muza on Unsplash

Hello! In this project I built a prediction model for sales analysis. We feed the model the advertising budgets for TV, radio, and newspapers, and it forecasts the likely sales. The machine learning method I chose is linear regression, and the programming was done in a Jupyter notebook.

Dataset Description:

The advertising dataset captures the sales revenue generated with respect to advertisement costs across numerous platforms like radio, TV, and newspapers. Find my Kaggle notebook here.

Data:

Feature variables:

  • TV: advertising dollars spent on TV.
  • Radio: advertising dollars spent on Radio.
  • Newspaper: advertising dollars spent on Newspaper.

Target variable:

Sales: the sales generated by the advertising spend.

Step 1: Import the required libraries and dataset.

The dataset I chose for this exercise is a CSV file, so I used pd.read_csv from the pandas library to load it. As the picture below shows, the dataset contains 4 columns: TV, radio, newspaper, and sales.
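As a minimal sketch of this loading step: the snippet below reads CSV text with pd.read_csv. The rows are made-up placeholder values (not taken from the real dataset), the column casing is assumed from the description above, and the file name mentioned in the comment is hypothetical.

```python
import io

import pandas as pd

# Placeholder rows for illustration only; these are NOT values from the real dataset.
# Column names and their casing are assumed from the dataset description.
csv_text = """TV,Radio,Newspaper,Sales
100.0,20.0,30.0,12.0
200.0,10.0,5.0,18.0
50.0,40.0,60.0,9.5
150.0,25.0,15.0,15.0
"""

# With the real file this would look like pd.read_csv("advertising.csv"),
# where the file name is hypothetical.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first few rows
```

Reading from an in-memory string keeps the example self-contained; in the notebook you would pass the path of the downloaded CSV instead.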

Step 2: Check for null values in the dataset and data inspection.

After loading the data, the next step is to check the dataset for null values and duplicate rows.

Checking for null values in the dataset.
Data description
Data Information
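These inspection steps can be sketched roughly as follows. The small DataFrame is invented for illustration, with one missing value and one duplicate row added on purpose so the checks have something to find. The exact calls used in my notebook may differ.

```python
import pandas as pd

# Tiny made-up frame for illustration: one Radio value is missing
# and the last row deliberately repeats the row before it.
df = pd.DataFrame({
    "TV": [100.0, 200.0, 50.0, 50.0],
    "Radio": [20.0, None, 40.0, 40.0],
    "Newspaper": [30.0, 5.0, 60.0, 60.0],
    "Sales": [12.0, 18.0, 9.5, 9.5],
})

null_counts = df.isnull().sum()               # missing values per column
duplicate_rows = int(df.duplicated().sum())   # rows that repeat an earlier row

print(null_counts)
print("duplicate rows:", duplicate_rows)
print(df.describe())  # count, mean, std, min, quartiles, max per column
df.info()             # column dtypes and non-null counts
```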

Step 3: Exploratory Data Analysis (EDA).

In EDA we are going to look for relationships between the features and the target variable.

Scatterplot between TV and sales (EDA)
Scatterplot between radio and sales (EDA)
Scatterplot between newspaper and sales (EDA)
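A rough sketch of how scatterplots like these can be drawn with plain matplotlib. The data is a made-up placeholder and the plotting library is my assumption here; seaborn or pandas plotting would work just as well.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder values, not rows from the real dataset.
df = pd.DataFrame({
    "TV": [100.0, 200.0, 50.0, 150.0],
    "Radio": [20.0, 10.0, 40.0, 25.0],
    "Newspaper": [30.0, 5.0, 60.0, 15.0],
    "Sales": [12.0, 18.0, 9.5, 15.0],
})

features = ["TV", "Radio", "Newspaper"]

# One panel per feature, all sharing the Sales axis so they are easy to compare.
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
for ax, col in zip(axes, features):
    ax.scatter(df[col], df["Sales"])
    ax.set_xlabel(col)
    ax.set_title(f"{col} vs Sales")
axes[0].set_ylabel("Sales")
fig.tight_layout()
```

In a Jupyter notebook the figure is displayed when the cell finishes running.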

Distplot:

A distplot shows the univariate distribution of the data (the distribution of a single variable) plotted against density.

Distplot for TV (EDA)
Distplot for radio (EDA)
Distplot for the newspaper (EDA)
Distplot for the sales (EDA)
Pair plot between TV, radio, and newspaper with respect to sales (EDA)
Heatmap (EDA)
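The distribution plot and the heatmap can be approximated with plain matplotlib, as in the sketch below. This is a stand-in rather than the exact plotting calls from my notebook: a density-normalised histogram plays the role of the distplot, and the correlation matrix is drawn as a colour grid. The data is again a made-up placeholder.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder values, not rows from the real dataset.
df = pd.DataFrame({
    "TV": [100.0, 200.0, 50.0, 150.0],
    "Radio": [20.0, 10.0, 40.0, 25.0],
    "Newspaper": [30.0, 5.0, 60.0, 15.0],
    "Sales": [12.0, 18.0, 9.5, 15.0],
})

fig, (ax_hist, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))

# Univariate distribution of one column, normalised so the bar areas sum to 1.
ax_hist.hist(df["TV"], bins=5, density=True)
ax_hist.set_xlabel("TV")
ax_hist.set_ylabel("Density")

# Pairwise Pearson correlations drawn as a colour grid.
corr = df.corr()
image = ax_heat.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax_heat.set_xticks(range(len(corr.columns)))
ax_heat.set_xticklabels(corr.columns, rotation=45)
ax_heat.set_yticks(range(len(corr.columns)))
ax_heat.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax_heat)
fig.tight_layout()
```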

Step 4: Statistical Tasks

Standard Deviation

Standard deviation (std) measures how much the values vary around the mean.

Correlation

Correlation (corr) measures the strength and direction of the relationship between variables.

Variance

Variance (var) measures dispersion: it is based on the squared deviation of every data point from the mean, so it takes the spread of all data points into account.

Mean

The mean is the average value of the data.

Median

The median is the middle value of the data once it is sorted.
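All five statistics are available as pandas DataFrame methods. The sketch below runs them on a made-up placeholder frame. Note that pandas computes the sample standard deviation and variance by default (ddof=1), and that corr uses Pearson correlation unless told otherwise.

```python
import pandas as pd

# Made-up placeholder values, not rows from the real dataset.
df = pd.DataFrame({
    "TV": [100.0, 200.0, 50.0, 150.0],
    "Radio": [20.0, 10.0, 40.0, 25.0],
    "Newspaper": [30.0, 5.0, 60.0, 15.0],
    "Sales": [12.0, 18.0, 9.5, 15.0],
})

std = df.std()        # sample standard deviation per column (ddof=1)
var = df.var()        # sample variance per column (ddof=1)
mean = df.mean()      # average per column
median = df.median()  # middle value per column
corr = df.corr()      # pairwise Pearson correlation matrix

print("std:\n", std)
print("var:\n", var)
print("mean:\n", mean)
print("median:\n", median)
print("corr:\n", corr)
```

The variance is the square of the standard deviation, which is a quick sanity check on the output.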

Step 5: Linear regression model building and prediction.

Model Building and splitting dataset.
Linear Regression output for test and train data.
Difference between actual data and predicted data.
Accuracy of linear regression on the dataset.
Regression graph.

The regression graph is drawn from the train data, and the blue model line is drawn from the test data and the predicted values. As we can see, most of the red dots lie on or near the line, which suggests that the model has found a good best-fit line.
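The model-building step can be sketched with scikit-learn as below. This is an illustration under assumptions, not the exact notebook code: the data is synthetic (sales are generated from an invented linear rule plus noise), the 70/30 split and random seeds are arbitrary choices, and I am taking "accuracy" to mean the R² score on the test data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: Sales follows an invented linear rule plus noise.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3.0 + 0.05 * df["TV"] + 0.1 * df["Radio"] + rng.normal(0, 1.0, n)

X = df[["TV", "Radio", "Newspaper"]]  # features
y = df["Sales"]                       # target

# Hold out 30% of the rows for testing (split ratio is an arbitrary choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Actual vs predicted values on the test rows, with their difference.
comparison = pd.DataFrame({"actual": y_test, "predicted": y_pred})
comparison["difference"] = comparison["actual"] - comparison["predicted"]

r2 = r2_score(y_test, y_pred)
print(comparison.head())
print("R^2 on test data:", round(r2, 3))
print("coefficients:", dict(zip(X.columns, model.coef_)))
```

Comparing the fitted coefficients across features is one way to see which advertising channel the model relies on most, keeping in mind that the numbers here come from synthetic data.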

Conclusion:

In a nutshell, TV advertising is the best predictor of sales. This project is a good starting point, especially for understanding the relevance of Python as well as statistics.

Finally…

I really hope this article has been a great read and a source of inspiration for everyone thinking of pursuing a career in the field of data science.

Please comment with suggestions and feedback. I am still learning. Please help me improve so that I can help you in turn, by upgrading my writing skills as well as my knowledge and by presenting my work better in my subsequent articles.

Thank you and Happy coding :)

Photo by Pete Pedroza on Unsplash

Arnav Saxena

Data scientist, AI enthusiast, and self-help writer sharing insights on using data science and AI for good.