How to build a sales forecast in R

Anita Owens
Geek Culture
Published in
4 min readAug 22, 2021

--

Can we use linear regression to optimize daily sales of our supermarket product?

Photo by Yogendra Singh on Unsplash

We will use linear regression to determine the impact of shelf location on sales of Oreos. This dataset is from Wayne L. Winston’s Marketing Analytics: Data-Driven Techniques with Microsoft Excel. Chapter 10 is all about forecasting. If you are an analyst who works with any type of marketing data you should really check out his books! I will replicate the forecast exercise done originally in Excel, but in R.

Our Oreos dataset has 3 columns:
1. Observation number
2. Height in feet
3. Sales

Inspect Oreos dataset

We build our linear regression model with the lm function. We only have 1 dependent variable (Sales) and one independent variable (Height.in.feet). We will assume that Sales are in number of units. We will treat height as a factor variable which is a type of categorical variable. This will affect how we interpret the output of our regression model. We ignore the first column (Observation) for building our model.

Build linear regression model

We call the summary function on our model and interpret the output.

Print model summary
Model output

Intercept
The intercept is the value of the dependent variable (y) when all of the independent variables (x) are zero. In this case, since we have only one categorical variable as a predictor, the intercept is the mean value of Oreo sales at a shelf height of five feet (the reference group on the intercept).

Coefficients
A positive coefficient indicates that as the value of the independent variable increases (height), the mean of the dependent variable (sales) also tends to increase. Both of our coefficients are positive indicating positive sales.

At 5 feet height, sales are 28.5 units, all other variables remaining at the same level. Now we interpret our 6 and 7 feet height coefficients in relation to the 5 feet shelf location (on our intercept).

At 6 feet, sales increase to 34 units (an increase of 5), but at 7 feet sales are decreased to 15.75 units (a decrease of 12.75 units).

In summary, 6 feet is a better shelf height for our Oreos when it comes to sales.

P-values
P-values are < .05 for both of our coefficients. Both of our coefficients are statistically significant as indicated by the significant codes. Three stars for 6 feet and two stars for 7 feet.

R-square
Our regression model can explain about 90% of the variation in the unit sales of Oreos. This is a very high r-square for a model that only has one predictor (independent variable). In cases like this, proceed with caution.

Predictions
Can we use this output to predict Oreo sales based on shelf height? Yes we can! We simply use the predict function.

First, we create a data frame (in this case a tibble) of explanatory variables. We will use our original dataset.

Create data frame for our explanatory data

Then we call the predict function with our first argument being our model output, and the second argument our explanatory data.

Create predictions using predict function

What we end up with is a vector of our sales predictions.

Vector of our sales predictions

We will then add our forecasted sales and error column to our explanatory data frame and save to a new object called prediction_oreo.

Regression is a process that has errors and based on the estimated regression model, we can calculate the differences between the actual observed Y values (Sales), and the values predicted from our regression model (forecast_sales). These differences are called residuals so we will also add a column for our residuals called errors. Then, we print the results.

Prediction data frame

Finally, we will summarize our forecasted data grouping by our shelf location variable (Height.in.feet).

A summary of our forecasted data

Our forecast suggests that in order to maximize sales of our product, we should make sure they are located at 6 feet.

Recap

1. We used linear regression to explore the relationship between Oreo sales and shelf height.
2. We built a data frame to forecast sales based on shelf height.
3. We have concluded that shelf height has an impact on sales and that sales are optimal at 6 feet.

Full code can be found here: Github.

--

--

Anita Owens
Geek Culture

Analytics engineer, mentor and lecturer in analytics. The glue person bridging the gap between data and the business. https://www.linkedin.com/in/anitaowens/