Walmart Sales Time Series Forecasting using Deep Learning

Abhinav Dubey · Nerd For Tech · May 30, 2021

This blog covers different machine learning and deep learning models for forecasting time series sales data, using libraries such as TensorFlow, Keras, pandas, and scikit-learn. You can find the complete code, models, plots, and datasets on my GitHub.

Walmart is an American multinational retail corporation. In 2014, Walmart released this dataset as a recruiting challenge; I am pretty late for that, but I am hopeful :)

Let’s go over a brief definition of a time series.

Time Series

A time series is a series of data points recorded at even intervals in time, e.g. weather records, sales records, economic and stock-market data, rainfall data, and much more. The examples alone give a sense of why analysing a time series and forecasting (predicting) future values matters.

Dataset

The dataset is available on Walmart’s own Kaggle account. The competition, Walmart Recruiting — Store Sales Forecasting, can be downloaded from https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting.

The complete dataset is divided into three parts:-

  1. train.csv — This is the historical training data, which covers 2010–02–05 to 2012–11–01.
  2. features.csv — This file contains additional data related to the store, department, and regional activity for the given dates.
  3. stores.csv — This file contains anonymized information about the 45 stores, indicating the type and size of the store.

Machine Learning Models

  • Linear Regression Model
  • Random Forest Regression Model
  • K Neighbors Regression Model
  • XGBoost Regression Model
  • Keras Deep Neural Network Regressor Model

Data Preprocessing

First of all, we have to handle the missing values from the dataset.

Handling Missing Values

  • The CPI and Unemployment columns of the features dataset each had 585 null values.
  • MarkDown1 had 4158 null values.
  • MarkDown2 had 5269 null values.
  • MarkDown3 had 4577 null values.
  • MarkDown4 had 4726 null values.
  • MarkDown5 had 4140 null values.

All missing values were filled using fillna() with the median of the respective column.
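The median fill can be sketched as follows; the toy DataFrame below is a hypothetical stand-in for a few columns of the real features.csv, not the actual data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for part of the features dataset (same column names).
features = pd.DataFrame({
    "CPI": [211.0, np.nan, 214.0, 215.0],
    "Unemployment": [8.1, 7.9, np.nan, 7.8],
    "MarkDown1": [np.nan, 100.0, 150.0, np.nan],
})

# Replace each column's missing values with that column's median.
for col in ["CPI", "Unemployment", "MarkDown1"]:
    features[col] = features[col].fillna(features[col].median())
```

After this loop, no column contains nulls; the median is preferred over the mean here because markdown columns are heavily skewed.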

Merging Datasets

  • The main dataset was merged with the stores dataset.
  • The resulting dataset was merged with the features dataset.
  • The combined dataset has 421570 rows and 15 attributes.
  • The Date column was converted to the DateTime data type.
  • The Date attribute was set as the index of the combined dataset.
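The merge steps above can be sketched with minimal stand-ins for the three CSVs (the rows below are hypothetical, shaped like the real files):

```python
import pandas as pd

# Minimal stand-ins for train.csv, stores.csv, and features.csv.
train = pd.DataFrame({"Store": [1, 1], "Dept": [1, 2],
                      "Date": ["2010-02-05", "2010-02-05"],
                      "Weekly_Sales": [24924.5, 50605.3]})
stores = pd.DataFrame({"Store": [1], "Type": ["A"], "Size": [151315]})
features = pd.DataFrame({"Store": [1], "Date": ["2010-02-05"],
                         "Temperature": [42.31], "IsHoliday": [False]})

# Merge stores on Store, then features on (Store, Date),
# then convert Date and use it as the index.
data = train.merge(stores, on="Store", how="left")
data = data.merge(features, on=["Store", "Date"], how="left")
data["Date"] = pd.to_datetime(data["Date"])
data = data.set_index("Date")
```

Left joins keep every training row even if a store or feature record is missing.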

Splitting Date Column

  • Using the Date column, three more columns are created: Year, Month, and Week.
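With a DateTime column, the split is a one-liner per feature via the `.dt` accessor (ISO week via `isocalendar()` is an assumption; any week convention works similarly):

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2010-02-05", "2012-11-01"])})

# Derive calendar features from the Date column.
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Week"] = df["Date"].dt.isocalendar().week.astype(int)
```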

Aggregate Weekly Sales

  • The median, mean, max, min, and std of Weekly_Sales are calculated and added as separate columns.
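One way to broadcast per-group summary statistics back as feature columns is `groupby(...).transform`; the grouping by (Store, Dept) is an assumption about how the aggregates were computed, and the numbers are toy data.

```python
import pandas as pd

df = pd.DataFrame({"Store": [1, 1, 2, 2],
                   "Dept": [1, 1, 1, 1],
                   "Weekly_Sales": [100.0, 200.0, 300.0, 500.0]})

# Per-(Store, Dept) statistics, broadcast back to every row of the group.
grp = df.groupby(["Store", "Dept"])["Weekly_Sales"]
for stat in ["median", "mean", "max", "min", "std"]:
    df[stat] = grp.transform(stat)
```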

Outlier Detection and Other Abnormalities

  • The MarkDown columns were summed into a single Total_MarkDown column.
  • Outliers were removed using the z-score.
  • After outlier removal: 375438 rows and 20 columns.
  • Negative weekly sales were removed.
  • After removal: 374247 rows and 20 columns.

Plot of Negative and Zero Weekly Sales
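A minimal sketch of both filters, assuming the common |z| < 3 cutoff (the threshold is not stated in the post) and toy sales values:

```python
import numpy as np
import pandas as pd

# 13 typical weeks, one negative week, one extreme spike.
df = pd.DataFrame({"Weekly_Sales": [100.0] * 13 + [-50.0, 10000.0]})

# Keep rows whose Weekly_Sales z-score is within 3 standard deviations.
z = (df["Weekly_Sales"] - df["Weekly_Sales"].mean()) / df["Weekly_Sales"].std()
df = df[np.abs(z) < 3]

# Drop non-positive weekly sales.
df = df[df["Weekly_Sales"] > 0]
```

The spike is dropped by the z-score filter and the negative week by the positivity filter, leaving only the typical rows.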

One-hot-encoding

  • The Store, Dept, and Type columns were one-hot-encoded using the get_dummies() method.
  • After one-hot encoding, the number of columns becomes 145.
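The encoding itself is a single pandas call; with 45 stores and ~80 departments, the column count grows quickly, which is how the dataset reaches 145 columns. Toy example:

```python
import pandas as pd

df = pd.DataFrame({"Store": [1, 2], "Dept": [1, 3], "Type": ["A", "B"],
                   "Weekly_Sales": [100.0, 200.0]})

# One binary column per category value of Store, Dept, and Type.
encoded = pd.get_dummies(df, columns=["Store", "Dept", "Type"])
```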

Data Normalization

  • Numerical columns normalized using MinMaxScaler in the range 0 to 1.
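Min-max scaling maps each column linearly onto [0, 1]; a sketch with two hypothetical numeric columns:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"Temperature": [30.0, 50.0, 70.0],
                   "Fuel_Price": [2.5, 3.0, 3.5]})

# Scale each numeric column to the range [0, 1].
scaler = MinMaxScaler(feature_range=(0, 1))
cols = ["Temperature", "Fuel_Price"]
df[cols] = scaler.fit_transform(df[cols])
```

Fit the scaler on the training split only and reuse it on the test split to avoid leaking test statistics.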

Recursive Feature Elimination

  • A Random Forest Regressor with 23 estimators was used to calculate feature ranks and importance.
  • Features selected to retain —

mean, median, Week, Temperature, max, CPI, Fuel_Price, min, std, Unemployment, Month, Total_MarkDown, Dept_16, Dept_18, IsHoliday, Dept_3, Size, Dept_9, Year, Dept_11, Dept_1, Dept_5, Dept_56

  • No. of attributes after feature elimination — 24
Correlation Matrix represented as Heatmap
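Recursive feature elimination with a random forest can be sketched as below; the synthetic data (where only the first two features carry signal) is an assumption for illustration, not the Walmart data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Target depends only on the first two features; the rest are noise.
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# RFE repeatedly fits the estimator and drops the least important feature.
selector = RFE(RandomForestRegressor(n_estimators=23, random_state=0),
               n_features_to_select=2)
selector.fit(X, y)
```

`selector.support_` marks the retained features and `selector.ranking_` gives the elimination order.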

Splitting Dataset

  • Dataset was split into 80% for training and 20% for testing.
  • Target feature — Weekly_Sales
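The 80/20 split is the standard scikit-learn call (the fixed `random_state` is an assumption for reproducibility):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder feature matrix and target standing in for the real dataset.
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```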

Linear Regression Model

  • Linear Regressor Accuracy — 92.28%
  • Mean Absolute Error — 0.030057
  • Mean Squared Error — 0.0034851
  • Root Mean Squared Error — 0.059
  • R2 — 0.9228
  • LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
Actual Vs Predicted
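A sketch of fitting a linear regressor and computing the same metrics (MAE, MSE, RMSE, R²) on synthetic data; note that the `normalize` parameter shown in the printout above has since been removed from scikit-learn's LinearRegression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.05, size=100)

model = LinearRegression()
model.fit(X, y)
pred = model.predict(X)

mae = mean_absolute_error(y, pred)
mse = mean_squared_error(y, pred)
rmse = np.sqrt(mse)
r2 = r2_score(y, pred)
```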

Random Forest Regression Model

  • Random Forest Regressor Accuracy — 97.889%
  • Mean Absolute Error — 0.015522
  • Mean Squared Error — 0.000953
  • Root Mean Squared Error — 0.03087
  • R2 — 0.9788
  • n_estimators — 100
  • RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse', max_depth=None, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None, oob_score=False, random_state=None, verbose=0, warm_start=False)
Actual Vs Predicted
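The random forest setup follows directly from the parameters listed above (100 estimators, otherwise defaults); the nonlinear synthetic target here is a stand-in for the sales data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 4))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.05, size=300)

# An ensemble of 100 decision trees averaged together.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
r2 = r2_score(y, model.predict(X))
```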

K Neighbors Regression Model

  • K Neighbors Regressor Accuracy — 91.9726%
  • Mean Absolute Error — 0.0331221
  • Mean Squared Error — 0.0036242
  • Root Mean Squared Error — 0.060202
  • R2 — 0.91992
  • Neighbors — 1
  • KNeighborsRegressor(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=None, n_neighbors=1, p=2, weights='uniform')
Actual Vs Predicted
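With `n_neighbors=1` (as listed above), the model simply predicts the target of the single closest training point; a tiny sketch:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 10.0, 20.0, 30.0])

# k = 1: prediction is the label of the nearest training sample.
model = KNeighborsRegressor(n_neighbors=1)
model.fit(X, y)
pred = model.predict([[1.2]])
```

Here 1.2 is closest to the training point 1.0, so the prediction is that point's target, 10.0.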

XGBoost Regression Model

  • XGBoost Regressor Accuracy — 94.21152%
  • Mean Absolute Error — 0.0267718
  • Mean Squared Error — 0.0026134
  • Root Mean Squared Error — 0.05112
  • R2 — 0.94211
  • Learning Rate — 0.1
  • n_estimators — 100
  • XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, importance_type='gain', learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread=None, objective='reg:linear', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=None, subsample=1, verbosity=1)
Actual Vs Predicted

Custom Deep Learning Keras Regressor

  • Deep Neural Network accuracy — 90.50328%
  • Mean Absolute Error — 0.033255
  • Mean Squared Error 0.003867
  • Root Mean Squared Error — 0.06218
  • R2 — 0.9144106
  • Built using the Keras regressor wrapper around a deep neural network
  • Kernel Initializer — normal
  • Optimizer — adam
  • Input layer with 23 input dimensions, 64 output dimensions, and ReLU activation
  • 1 hidden layer with 32 nodes
  • Output layer with 1 node
  • Batch Size — 5000
  • Epochs — 100
Actual Vs Predicted
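The architecture described above (23 inputs → Dense 64 ReLU → Dense 32 → Dense 1, normal initializer, adam optimizer) can be sketched directly in Keras; the random training data and the 2-epoch fit are placeholders, not the article's 100-epoch run.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# 23 input features -> 64 -> 32 -> 1 regression output.
model = keras.Sequential([
    keras.Input(shape=(23,)),
    layers.Dense(64, kernel_initializer="normal", activation="relu"),
    layers.Dense(32, kernel_initializer="normal", activation="relu"),
    layers.Dense(1, kernel_initializer="normal"),
])
model.compile(optimizer="adam", loss="mean_squared_error")

# Placeholder data with the right shapes; replace with the scaled features.
X = np.random.rand(100, 23)
y = np.random.rand(100)
model.fit(X, y, epochs=2, batch_size=50, verbose=0)
```

The large batch size in the article (5000) trades gradient noise for speed on the ~375k-row dataset.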

Comparing Models

  • Linear Regressor Accuracy — 92.280797%
  • Random Forest Regressor Accuracy — 97.889071%
  • K Neighbors Regressor Accuracy — 91.972603%
  • XGBoost Accuracy — 94.211523%
  • DNN Accuracy — 90.503287%

Get the complete code, models, and plots on my GitHub account.
