Part-II: Rossmann Pharmaceutical Sales Prediction: a Deep Learning Approach

7 min read · Sep 12, 2022

In my previous post, we covered the first part of Rossmann Pharmaceutical Sales Prediction: a Deep Learning Approach, along with a brief introduction to LSTMs. In this post, Part II, we turn to deep learning and the LSTM implementation. Before diving into the implementation, it is worth taking a brief look at what deep learning is and how it works.

The objective of this post is to provide standalone examples of a time series problem that you can use as a template with the data provided in the previous post.

After completing this tutorial, you will know:

What deep learning is and when a network counts as deep.
How to develop LSTM models for time series forecasting.

Happy reading!

Figure: How deep learning works.

If you are already familiar with deep learning, skip this section and go directly to the practical part.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
Deep learning, also known as deep neural learning or deep neural networks, is a subset of machine learning in artificial intelligence (AI). It is based on artificial neural networks, loosely modeled on the workings of the human brain, that can learn from unstructured or unlabeled data, including without supervision [7].

Although there is no universal convention for how many layers make a network deep, deep learning is distinguished from the more common single-hidden-layer neural networks by its depth, that is, the number of node layers through which data must pass in a multistep process of pattern recognition. The representations are deep (hence the buzzword deep learning) because they are not created in a single step but built up in stages from shallower representations, or layers. These layers may contain hundreds of neural units, and the number of connections ranges in the thousands. Hence, it is worth asking questions like “What is the minimum number of layers in a deep neural network?” or “At which depth does shallow learning end and deep learning begin?”

Most researchers in the field agree that deep learning involves multiple nonlinear layers (Mikel L. Forcada [8] and Hinton et al. [7]), whereas most early neural networks, such as the first perceptron, are shallow: they are composed of one input layer, one output layer, and at most one hidden layer. By this view, a network with more than three layers, counting the input and output layers, is called “deep”. So deep is not just an arbitrary buzzword meant to make algorithms seem profound and hard to understand; in practice it means more than one hidden layer. In deep learning networks, each layer of nodes trains on a distinct set of features based on the previous layer’s output. The more layers a neural net has, the more complex the features its nodes can recognize. A Multi-Layer Perceptron (MLP) with four or more layers (including input and output) is called a Deep Neural Network [6,7,8]. Beyond the name, the literature distinguishes three levels of depth: deep, very deep, and extremely deep.

Deep: According to Hinton et al. [9], one of the earliest deep neural networks has three densely connected hidden layers. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels.

Very deep: Schmidhuber [10] considers learning with a Credit Assignment Path (CAP) depth > 10 to be very deep learning, whereas Simonyan and Zisserman [8] describe very deep convolutional networks with 16 to 19 weight layers.

Extremely deep: In 2016, He et al. introduced extremely deep residual networks with 50 up to 1000+ layers [11]. Schmidhuber [10] also notes that both feedforward neural networks (FNNs, acyclic) and recurrent neural networks (RNNs, cyclic) have won competitions. In a sense, RNNs are the deepest of all NNs: they are general computers, more powerful than FNNs, and can in principle create and process memories of arbitrary sequences of input patterns.
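To make the three levels concrete, here is a minimal sketch that maps a hidden-layer count to the informal labels used above. The function name `depth_label` and the exact cut-offs (more than one hidden layer for deep, 16 for very deep, 50 for extremely deep) are my assumptions, read off the cited examples; as noted earlier, there is no universal convention.

```python
def depth_label(hidden_layers: int) -> str:
    """Map a hidden-layer count to an informal depth label.

    The thresholds are assumptions taken from the examples in the text,
    not an agreed standard.
    """
    if hidden_layers < 0:
        raise ValueError("hidden_layers must be non-negative")
    if hidden_layers <= 1:
        return "shallow"
    if hidden_layers < 16:
        return "deep"
    if hidden_layers < 50:
        return "very deep"
    return "extremely deep"


# Try a few layer counts similar to the networks mentioned above.
for n in (1, 3, 16, 152):
    print(n, depth_label(n))
```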

That is more than enough background on deep learning for this post; it is intended for beginner readers.
Next, we’ll see the simplest way of implementing LSTMs, again with beginners in mind.

7. Building Deep Learning Models with sklearn Pipelines (LSTM)

You need the following modules and libraries (imports). If you have not installed TensorFlow yet, please install it before proceeding.

# Importing modules
import dvc.api
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
from matplotlib import ticker
from statsmodels.tsa.stattools import adfuller, acf, pacf
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout
import mlflow
from IPython.display import Markdown, display, Image
import warnings
warnings.filterwarnings('ignore')
sys.path.append(os.path.abspath(os.path.join('..')))

Once you have imported the required modules and libraries, you need to load the data from DVC. (I assume you have already set up everything related to DVC. If not, install DVC and complete the setup first.)

#Importing the collected Data
path = 'data/train_store.csv'
repo = 'https://github.com/amdework21/Rossmann-pharmaceutical-sales-prediction'
rev = 'v3-scaled'
data_url = dvc.api.get_url(path=path, repo=repo, rev=rev)
scaled = pd.read_csv(data_url)

Now let’s look at the model-ready data by running scaled.head().

scaled.head()

The data has 26 columns.

The scaled data attributes and sample records

We now have model-ready data, so the next thing to decide is which hyperparameters to use. For instance, here I use:

Training hyperparameters used for the training

You can adjust the hyperparameters as per your needs and your computer’s capability.

Before training the model, we have to split our data into train and test sets.
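One way to keep the hyperparameters in a single place is a small configuration object, sketched below. The class name `TrainConfig` is hypothetical, and the `window` value is a placeholder I made up, since the look-back length is not stated in the text. The other defaults mirror the values that appear in the model and fit code later in this post.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the training hyperparameters."""
    window: int = 48        # assumed look-back length; not stated in the post
    batch_size: int = 50    # same value as the fit() call later in the post
    epochs: int = 700       # same value as the fit() call later in the post
    lstm_units: int = 50    # same value as the LSTM layers later in the post
    dropout: float = 0.2    # same value as the Dropout layers later in the post


config = TrainConfig()
# A lighter configuration for a slower machine (values are illustrative).
smaller = TrainConfig(epochs=50, batch_size=32)

print(asdict(config))
print(asdict(smaller))
```

Because the dataclass is frozen, a run cannot silently change a value halfway through; a different setting means creating a new `TrainConfig`.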

Splitting dataset into train and test sets
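For time series, the split is usually chronological rather than shuffled, so that the test set lies strictly after the training set. The sketch below shows that idea under my own assumptions: the helper name `chronological_split`, the variable name `Xtrain`, and the 20% test fraction are not taken from the original split, which may have been done differently.

```python
def chronological_split(series, test_fraction=0.2):
    """Split an ordered sequence into (train, test) without shuffling.

    Works on anything that supports len() and slicing, such as a list
    or a NumPy array.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = int(round(len(series) * test_fraction))
    split_at = len(series) - n_test
    return series[:split_at], series[split_at:]


# Toy data standing in for the scaled sales values.
values = list(range(10))
Xtrain, Xtest = chronological_split(values, test_fraction=0.2)
print(Xtrain)
print(Xtest)
```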

Once everything is set up, you just need to run this line to start training your model.

model1, his = t.train(EPOCHS)

Then you’ll get something like:

Showing model starting training with the given hyperparameters.

Wait till your model has completed its training. Then you will get a loss plot like the following.

A plot showing loss and val_loss for training and validation data respectively.
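Besides looking at the plot, you can read the best epoch directly from the recorded losses. This sketch assumes the returned `his` is a Keras `History` object, whose `history` attribute is a dict of per-epoch lists; the helper name `best_epoch` and the numbers in `example_history` are made up for illustration.

```python
def best_epoch(history_dict, key="val_loss"):
    """Return (1-based epoch, value) where the given loss is lowest."""
    losses = history_dict[key]
    idx = min(range(len(losses)), key=losses.__getitem__)
    return idx + 1, losses[idx]


# Illustrative numbers only; with a real run you would pass his.history.
example_history = {
    "loss": [0.9, 0.5, 0.3, 0.2],
    "val_loss": [1.0, 0.6, 0.45, 0.5],
}
epoch, value = best_epoch(example_history)
print(f"lowest val_loss {value} at epoch {epoch}")
```

If val_loss starts rising while loss keeps falling, as in the toy numbers, training past that epoch is likely overfitting.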

The next step is initializing the model (in our case, the LSTM model).

# Initialize LSTM model
m = Sequential()
m.add(LSTM(units=50, return_sequences=True, input_shape=(xin.shape[1], 1)))
m.add(Dropout(0.2))
m.add(LSTM(units=50))
m.add(Dropout(0.2))
m.add(Dense(units=1))
m.compile(optimizer='adam', loss='mean_squared_error', metrics=['mse', 'mae'])
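To get a feel for the size of this network, you can count its trainable parameters by hand. The sketch below assumes Keras defaults (biases enabled) and one feature per time step, as in input_shape=(xin.shape[1], 1). The helper names are mine. An LSTM layer has four gates, each with an input kernel, a recurrent kernel, and a bias.

```python
def lstm_params(units, input_dim):
    """Trainable parameters of one LSTM layer with biases (4 gates)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    """Trainable parameters of one Dense layer with biases."""
    return units * (input_dim + 1)


first_lstm = lstm_params(units=50, input_dim=1)    # fed by the raw window
second_lstm = lstm_params(units=50, input_dim=50)  # fed by the first LSTM
output = dense_params(units=1, input_dim=50)       # single predicted value
total = first_lstm + second_lstm + output

print(first_lstm, second_lstm, output, total)
```

Dropout layers add no parameters, so the total should agree with what m.summary() reports for the model above.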

Fitting the LSTM Model.

# Fit LSTM model
history = m.fit(xin, next_X, epochs=700, batch_size=50, verbose=0)
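The model definition and the fit call use xin and next_X, but the listing does not show how they are built from the training data. A plausible sketch, assuming they follow the same sliding-window scheme as the test-set code in the Prediction section, is given here. The helper name `make_windows` and the toy series are hypothetical.

```python
def make_windows(series, window):
    """Turn an ordered series into (inputs, targets) for 1-step-ahead training.

    Each input is `window` consecutive values; its target is the next value.
    """
    inputs, targets = [], []
    for i in range(window, len(series)):
        inputs.append(list(series[i - window:i]))
        targets.append(series[i])
    return inputs, targets


# Toy series standing in for the scaled training values.
toy = [10, 11, 12, 13, 14, 15]
xin_list, next_X_list = make_windows(toy, window=3)
print(xin_list)
print(next_X_list)
```

Before passing these to the LSTM, convert them with np.array and reshape the inputs to (samples, window, 1), exactly as the test-set code does.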

Prediction

Here we use the neural network to predict which store(s) will have greater sales. Predictive modeling is often performed using curve and surface fitting, time series regression, or machine learning approaches. Here we use Long Short-Term Memory (LSTM) as the ML predictive model.

If you run the following Python script, you will get the plot for the test data.

# Store "window" points as a sequence
xin = []
next_X1 = []
for i in range(window, len(Xtest)):
    xin.append(Xtest[i-window:i])
    next_X1.append(Xtest[i])

# Reshape data to format for LSTM
xin, next_X1 = np.array(xin), np.array(next_X1)
xin = xin.reshape((xin.shape[0], xin.shape[1], 1))

# Predict the next value (1 step ahead)
X_pred = m.predict(xin)

# Plot prediction vs actual for test data
plt.figure(figsize=(20, 10))
plt.title('LSTM MODEL PREDICTING 1 STEP AHEAD')
plt.xticks(rotation=90)
plt.grid()
plt.plot(X_pred, ':', label='LSTM')
plt.plot(next_X1, '--', label='Actual')
plt.legend()

Final LSTM prediction on testing dataset
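A plot gives a visual impression; to put a number on how close the predictions are, you can compute the same error measures the model was compiled with. This is a minimal sketch in plain Python with made-up values. With the real arrays you would pass next_X1 and a flattened X_pred (for example X_pred.ravel(), since predict returns one column).

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Illustrative values only, not results from the Rossmann data.
actual = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]
print("MAE:", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

Keep in mind that on scaled data these errors are in scaled units, not in sales.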

As you have seen above, the LSTM model performs well on time series data for prediction.

8. Conclusion

Long Short-Term Memory networks (LSTMs) can be applied to time series forecasting. There are many types of LSTM models that can be used for each specific type of time series forecasting problem. In this post, however, we have covered only the basic one.

In this tutorial, you have seen an overview of deep learning and an LSTM implementation.

That is all for this post.

Thank you for reading.
I’ll see you in the next post.
Have a nice time.

9. References

1. Forecasting Rossmann Store Leading 6-month Sales: https://cs229.stanford.edu/proj2015/192_report.pdf
2. Datasets: Rossmann Store Sales (kaggle.com)
3. Pharma sales data analysis and forecasting | Kaggle
4. Understanding LSTM Networks — colah’s blog
5. LSTM | Introduction to LSTM | Long Short Term Memor (analyticsvidhya.com)
6. https://towardsdatascience.com/intuitive-understanding-of-attention-mechanism-in-deep-learning-6c9482aecf4f
7. What Is Deep Learning? | How It Works, Techniques & Applications — MATLAB & Simulink (mathworks.com)
8. Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).
9. Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. “A fast learning algorithm for deep belief nets.” Neural computation 18.7 (2006): 1527–1554.
10. Schmidhuber, Jürgen. “Deep learning in neural networks: An overview.” Neural networks 61 (2015): 85–117.
11. He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

Written by Amdework Asefa
