Retail Sales Analytics through Time Series Forecasting Using RNNs

Kevin MacIver
12 min read · Nov 29, 2019


Okay! Let’s do this..

So, I prepared this story to familiarize myself with RNNs (Recurrent Neural Networks) and multivariate time series analysis.

The dataset is available at the following link:

As we can see, the data is divided into three “csv” files:

  • Stores;
  • Features;
  • Sales;

Here is what an ERD (Entity Relationship Diagram) would look like:

ERD for Retail data

So the first thing to do is to join the tables so that we have one table with all the data. For that we can use any tool that supports joins on relational data, for example SQL, Python, R, Hive, etc.

For this example I used Hive, but since the data is not that big, any other tool should work fine.
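Purely as an illustration, here is a minimal pandas sketch of the same join; the column names are assumptions based on the usual layout of this dataset, not a copy of the Hive code:

```python
# Minimal pandas sketch of the join (the author used Hive).
# Assumed layout: stores.csv (Store, ...), features.csv (Store, Date,
# ..., IsHoliday), sales.csv (Store, Dept, Date, Weekly_Sales, IsHoliday).
import pandas as pd

stores = pd.read_csv("stores.csv")
features = pd.read_csv("features.csv")
sales = pd.read_csv("sales.csv")

# sales -> features on (Store, Date), then -> stores on Store.
# Both sales and features carry an IsHoliday flag, so keep one copy.
df = (sales
      .merge(features.drop(columns=["IsHoliday"], errors="ignore"),
             on=["Store", "Date"], how="left")
      .merge(stores, on="Store", how="left"))
```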

Now that we have a single data table, let's begin our initial EDA (exploratory data analysis). For that I used R.

As we can see, some of the attributes do not have the correct type.

Now let's check how the NaN values are distributed across the attributes.
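For readers following along in Python rather than R, a hypothetical equivalent of these two checks might look like this (column names are assumptions):

```python
# Hypothetical Python equivalent of the R checks: fix the attribute
# types, then count NaN values per column.
df["Date"] = pd.to_datetime(df["Date"])         # read in as strings
df["IsHoliday"] = df["IsHoliday"].astype(bool)  # read in as object

print(df.dtypes)        # verify the attribute types
print(df.isna().sum())  # NaN count per attribute
```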

As we can see, the NaN values are concentrated in the Markdown features. Here is where it gets a bit tricky: as the dataset documentation specifies, the Markdowns only began to be recorded after Nov 2011, so most of the rows don't have markdowns.

Here is where I split the analysis. First, I'm going to analyze the data from Nov 2011 onward, just to visualize any relationship between markdowns and weekly sales. Second, I'm going to use the whole dataset (excluding the Markdown columns) and apply Machine Learning to generate a model to forecast future sales.

Analyzing Markdown Data

Even after subsetting the data, we can see that NaN values are still present.

So now we have the problem of what to do with these NaN values. One approach could be to fill them with the mean, but since we are dealing with several departments, stores, and weeks, that approach may be too naive for this case.

The dataset documentation states that Markdowns are known to affect sales, so let's group by Department and use the median of the observed values to fill the NaN.

We repeat the process for all Markdown columns and then fill the NaN values, as sketched below.
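A minimal pandas sketch of that fill, assuming the five Markdown columns are named MarkDown1 through MarkDown5:

```python
# Sketch of the median fill: for each Markdown column, replace NaN
# with the median of that column within the same Department.
markdown_cols = ["MarkDown1", "MarkDown2", "MarkDown3",
                 "MarkDown4", "MarkDown5"]  # assumed column names

for col in markdown_cols:
    dept_median = df.groupby("Dept")[col].transform("median")
    df[col] = df[col].fillna(dept_median)
```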

Okay, now the data is clean and we can begin to plot some visualizations.

The visualization above shows the distribution of the weekly sales for all stores and departments, followed by the Markdown applied in that week. We can see that in some cases, when all Markdowns had high values, weekly sales showed some increase. However, the effect is not pronounced enough to conclude whether Markdowns have a strong influence on sales.

Other interesting visualizations were:

A total of 11 Departments accounted for 50% of the Revenue
A total of 14 Stores accounted for 50% of the Revenue

Analyzing Data for Forecasting Modeling

Now let’s deal with the data as a whole, dropping the Markdown columns.

I decided to reduce the scope of the analysis, focusing only on the data related to the top stores and departments, i.e., those that contribute the most to the company's revenue.

Before we begin dealing with preparing the data for modeling we need first to stop and think about our model and what we want to predict at the end. 🤔

The first thing to keep in mind is that we are dealing with time series. Second, we need to think about how store sales are influenced and what influences them. For example, can we build a single model that predicts independently of store and department? Or do we need a model for each store, or one for each department?

We need to keep in mind that, at the end of the day, Machine Learning algorithms for supervised problems are all about finding a generalization function. So a single model may struggle to find one generalization function that works for all stores and departments. Therefore, we could generate a model either for each store or for each department.

If we plot the correlation between stores for the same department and between departments within the same store, we can see that departments tend to have a stronger correlation (see the sketch after the figures below).

Correlation of Weekly Sales of a Specific Department within each Store
Correlation of Weekly Sales of a Specific Store within each Department
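As an illustration, a minimal pandas sketch of this correlation check; the department number and column names are assumptions based on the dataset layout:

```python
# Sketch of the correlation check: pivot weekly sales so each store is
# a column, then correlate the stores' series for a single department.
# (Swapping "Store" and "Dept" gives the department-vs-department view.)
dept_df = df[df["Dept"] == 13]  # department number is illustrative
by_store = dept_df.pivot_table(index="Date", columns="Store",
                               values="Weekly_Sales")
print(by_store.corr())  # store-vs-store correlation matrix
```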

Therefore, it seems better to build a model for each Department; we will then use these models to predict future revenue for each department and store.

Preparing the data

Considering what we decided earlier, we need to subset and filter the data in a way that is convenient for generating each model. Therefore, we group the data first by Department and then by Store, ending up with a nested list structure: Departments -> Stores -> Dataframe (containing the weekly sales and other information).

With the data prepared, we can now export it and continue the work in Python.

Loading Libraries and Preparing to read data
Reading data as Nested List
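The loading code above is shown as screenshots; purely as an illustration, a hypothetical sketch of building the same nested structure, assuming the R step exported one CSV per (department, store) pair with names like "dept13_store1.csv" (the file layout is my assumption, not the author's actual export):

```python
# Hypothetical reconstruction of the nested structure
# (Departments -> Stores -> DataFrame) from per-pair CSV exports.
import glob
import re
import pandas as pd

data = {}
for path in glob.glob("export/dept*_store*.csv"):
    m = re.search(r"dept(\d+)_store(\d+)", path)
    dept, store = int(m.group(1)), int(m.group(2))
    data.setdefault(dept, {})[store] = pd.read_csv(path)
```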

Now let's begin to prepare the data for the Neural Network. Since we are dealing with a time series, we should use a type of Neural Network that takes the order of the input data into account. For that we have the Recurrent Neural Network (RNN), Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM). To avoid making this story even longer, I'm not going to go into the details of each one. If you're interested, check out this material:

Also, I would like to acknowledge Magnus Erik Hvass Pedersen, whose videos, especially #23, helped me a lot with the Python code.

So, getting back to work. After reading the data, we need to create the target data.

We need to decide how many periods into the future our model should predict. If we look closely at the data, we see that for each department and store we have 143 weekly sales entries. So if we wanted to predict one year ahead, we would need to shift our target data by one year (roughly 52 weeks). Doing that reduces the data we have left for training and testing, as we will see below.

We can see how the shifted data moves the weekly values up for the first date

Now, since we shifted the target data, there are NaN values at the end.

As I mentioned before, in order to have input and target values, we create the target values by copying them from the input and then performing the shift operation. The number of shift steps represents how far into the future you want your model to predict based on the present data. Choosing a larger shift also implies "losing" more of the data you have for training and testing. As we can see above, I chose 12 weeks, since the number of weekly records is not that high.
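A minimal sketch of that step, where `store_df` stands for one department/store DataFrame (numeric features only) and the target column name is an assumption:

```python
# Copy the target column from the input and shift it 12 weeks back, so
# row t pairs the inputs at week t with the sales observed at week t+12.
shift_steps = 12
target_cols = ["Weekly_Sales"]  # assumed target column

df_targets = store_df[target_cols].shift(-shift_steps)

# The last `shift_steps` rows of the target are now NaN, so we drop
# them from both input and target when converting to numpy arrays.
x_data = store_df.values[:-shift_steps]
y_data = df_targets.values[:-shift_steps]
```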

Now, to be able to feed the data to the Neural Network, we need to convert it to numpy arrays.

Now we can divide the data into train and test sets. Since we are dealing with a time series, we can't use a random split, because that would break the temporal order of the data. What we can do instead is hold out the last periods of each time series for testing. For this analysis, I chose the last 12 steps as my test set.
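A sketch of that chronological split, continuing from the arrays above:

```python
# Chronological split: hold out the last 12 steps of the series for
# testing instead of using a random split, preserving the time order.
num_test = 12
num_train = len(x_data) - num_test

x_train, x_test = x_data[:num_train], x_data[num_train:]
y_train, y_test = y_data[:num_train], y_data[num_train:]
```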

Create Recurrent Neural Network
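A minimal Keras sketch of the recurrent model; the post uses a GRU, but the layer size and output activation here are illustrative choices, not the author's exact setup:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense

num_x_signals = x_train.shape[1]  # number of input features
num_y_signals = y_train.shape[1]  # number of target signals

model = Sequential([
    # return_sequences=True -> the GRU emits an output at every time
    # step, so the model maps a sequence of inputs to a sequence of
    # 12-weeks-ahead sales values
    GRU(units=128, return_sequences=True,
        input_shape=(None, num_x_signals)),
    Dense(num_y_signals, activation="sigmoid"),  # targets scaled to [0, 1]
])
```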

Loss Function

We will use Mean Squared Error (MSE) as the loss function. MSE measures how closely the model’s output matches the true output signals.

From Magnus Erik Hvass Pedersen

Compile, Callbacks and Optimizer
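A hedged sketch of this step; the optimizer, learning rate, patience values and checkpoint filename are all illustrative, not the author's exact ones:

```python
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)

# MSE loss, as stated above; RMSprop is an assumed optimizer choice
model.compile(optimizer=RMSprop(learning_rate=1e-3), loss="mse")

callbacks = [
    EarlyStopping(monitor="val_loss", patience=5, verbose=1),
    ModelCheckpoint("dept13_checkpoint.h5", monitor="val_loss",
                    save_best_only=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=3),
]
```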

Okay, now we need to create a generator that will produce the batches used to train the model. Remember that in the end we want one model for each Department. For this story I'm only going to do one Department, but the same process just needs to be repeated for the rest of them.

Let’s create a function to generate train and test batches.

It's important to remember that when we create the batches for training and testing, we need to normalize the data; that's what the MinMaxScaler() functions are for.
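A sketch of the generator with MinMax scaling, in the spirit of Pedersen's tutorial #23; the scalers are fit on the training data only, and the batch sizes in the usage comment are illustrative:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x_scaler = MinMaxScaler().fit(x_train)
y_scaler = MinMaxScaler().fit(y_train)
x_train_scaled = x_scaler.transform(x_train)
y_train_scaled = y_scaler.transform(y_train)

def batch_generator(batch_size, sequence_length):
    """Yield endless batches of random training sub-sequences."""
    while True:
        x_batch = np.zeros((batch_size, sequence_length, num_x_signals))
        y_batch = np.zeros((batch_size, sequence_length, num_y_signals))
        for i in range(batch_size):
            # random starting point within the training series
            idx = np.random.randint(num_train - sequence_length)
            x_batch[i] = x_train_scaled[idx:idx + sequence_length]
            y_batch[i] = y_train_scaled[idx:idx + sequence_length]
        yield x_batch, y_batch

# e.g. model.fit(batch_generator(64, 24), steps_per_epoch=100,
#                epochs=50, callbacks=callbacks, validation_data=...)
```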

As we can see, I called the function train_dept instead of train_all_dept, since we are doing only one Department (“Dept 13”).
It took 21 epochs before the Early Stopping callback was activated.

Visualizing Model Training and Validation

Left graphs are for train data; right graphs are for test data

Evaluating Model

Let's evaluate the model using RMSE (Root Mean Squared Error).
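A hedged sketch of the computation, where `y_pred` stands for the model's predictions on the test window, inverse-transformed back to the original sales scale:

```python
# RMSE: the square root of the mean squared difference between the
# true and the predicted weekly sales.
import numpy as np
from sklearn.metrics import mean_squared_error

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"RMSE: {rmse:,.2f}")
```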

But how do these values tell us how well the model did?

Well, we could compare them to other models or algorithms. Tableau has a forecast function, so let's compare Tableau's results to ours.

The first thing to do is to filter the data we want to forecast in Tableau, so that the forecast period matches our test period.

After that, we need to select the forecast model.

Tableau Forecast Option

Let's use additive Trend and Season. After that, we can see the RMSE in the “Describe Forecast” option.

Tableau Forecast Model Result

Now let’s compare Tableau’s RMSE with our model’s.

Let's visualize the boxplot distribution of the RMSE.

So we can conclude that our GRU model achieved better results than Tableau's forecast function for this data. The boxplot also shows better stability, since the range of values is lower. 😁

Forecasting for Department

Now that our model is trained, we want to forecast future revenue. With the data we have, we can forecast 12 weeks ahead, but if we want to go further, we need some extra information.

If we examine the data, we see that for each store some attributes are fixed, while others are themselves time series (Temperature, CPI, Unemployment and Fuel Price). So if we want to predict further than 12 weeks, we need predictions for those series too, since they are inputs to our model.

We could build separate models for those predictions too, but I decided to use Tableau's forecast for CPI and Unemployment, and past values from the same dates for Temperature and Fuel_Price. Once those data were generated, we loaded them into Python as well.

Now let's use the whole dataset to run the first 12-week prediction.

Create a Dataframe for visualization and future use

Create the functions to predict the values according to our trained model.

Now let's create a support dataframe to generate the data for Holidays and the remaining input features.

First Quarter Forecast

Now that we have the prediction for the 12-week look-ahead, we need to complete the rest of the input data to be able to predict another 12 weeks into the future.

Visualize the new data, consisting of the 12-week prediction and the new values for the rest of the input features

Concatenating the new prediction data above with the previous input data

We repeat this process until we reach the desired number of forecast periods, keeping in mind that accuracy degrades the further into the future we go, since errors accumulate.

Process for forecasting 4 quarters (48 weeks) ahead
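A conceptual sketch of that rolling loop. Here `future_inputs` is a hypothetical stand-in for the support data built above (a list of four arrays holding the Tableau forecasts for CPI/Unemployment and the past-value proxies for Temperature/Fuel_Price, in the same column order as the training inputs); the helper function is my reconstruction, not the author's exact code:

```python
import numpy as np

def predict_12_weeks(model, history):
    # scale, add a batch dimension, predict, return the last 12 steps
    x = x_scaler.transform(history)[np.newaxis, ...]
    y_scaled = model.predict(x)[0]
    return y_scaler.inverse_transform(y_scaled)[-12:]

history = x_data.copy()
forecasts = []
for quarter in range(4):                      # 4 x 12 weeks = 48 weeks
    y_hat = predict_12_weeks(model, history)
    forecasts.append(y_hat)
    # append predicted sales plus the forecast exogenous inputs, so the
    # next pass predicts from our own predictions (errors compound here)
    new_rows = np.hstack([y_hat, future_inputs[quarter]])
    history = np.vstack([history, new_rows])

full_forecast = np.vstack(forecasts)          # shape: (48, num_y_signals)
```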

Results

In the end, the final 48-week forecast for Department 13 looks like this:

48 week Forecast Result

Since there is no actual data for our forecast period, we can't really measure how accurate our forecast is.

Conclusions

The idea behind this story was to work a bit with time series and Recurrent Neural Networks. Our model presented reasonably good results during training and testing.

For the forecast, we have to consider that, since we didn't have a long enough time range in our training data, we had to limit how far into the future our model could predict. Because of that, when trying to forecast further ahead, we end up predicting from our own predictions, which gradually loses accuracy as errors stack up.

I found that Recurrent Neural Networks (by that I mean RNN, GRU and LSTM) are fascinating tools for working with time series.

Recommendations

This story ended up being very long, but there are still many points where we could improve our analysis. Here I will list some of them:

  • Perform the training for the rest of the Departments;
  • Try other neural network architectures, such as LSTM, and see how they perform;
  • Try other layer shapes;
  • Perform feature selection to see if it decreases our RMSE.

Hope everyone enjoyed the ride!

Thanks again!
