Multivariate Time Series Forecasting using RNN(LSTM)

Soubhik Khankary
Jan 27, 2022


I was trying to forecast the future values of a variable that depends not only on its own previous values but also on the previous or current values of other variables. Such a problem is called a multivariate time series forecasting problem.

EXAMPLE:

Let us consider a shop that sells two Indian snacks, samosa and kachori. The shopkeeper wants to forecast the number of samosas he must prepare the next day to meet customer demand. Here is a realistic example.

Samosa (available: yes), kachori (available: yes):

Consider a customer who came intending to buy 10 samosas, but because kachoris were also available in the shop, he ended up ordering 5 samosas and 5 kachoris. The sales of samosas dipped because kachoris were available.

Samosa (available: yes), kachori (available: no):

Now consider the same customer coming to the same shop intending to buy 5 samosas and 5 kachoris, but because kachoris were unavailable he ended up buying 10 samosas. The sales of samosas increased because kachoris were unavailable. The same could happen the other way round.

In the cases above, the sales of samosas depend not only on their own previous sales but also on the current and past sales of kachoris. Hence it becomes a multivariate time series problem. I hope that is clear now.

PROBLEM STATEMENT:

Now we are going to forecast the ‘Open’ price of the stock of company ‘XYZ’, which depends on several other features: the ‘High’, ‘Low’ and ‘Close’ prices. If univariate forecasting with LSTM is not clear to you, please refer to this blog: https://medium.com/@soubhikkhankary28/univariate-time-series-forecasting-using-rnn-lstm-32702bd5cf4

The dataset is the same one I have used before for forecasting future values. We will try to forecast the values of ‘Open’ from the other variables mentioned above. We have data from Jan 2012 to Dec 2016.

A quick look at the first rows of the data set in Excel (head):

Top rows in Excel.

And the bottom rows of the data set in Excel (tail):

Bottom rows of the data set

Let’s start coding in Python.

Importing the necessary basic modules:

I have split the stock dataset into a train dataset and a test dataset. The train dataset is the one on which I fit the RNN layers; the test dataset is the one on which I then test the model.
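A chronological split could be sketched as follows. The synthetic frame, its values and the 80/20 cut-off are assumptions made for illustration; the post does not state where the split was made.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the stock data (hypothetical values, not the real dataset).
rng = np.random.default_rng(0)
dates = pd.date_range("2012-01-02", periods=100, freq="B")
df = pd.DataFrame(
    rng.uniform(300, 800, size=(100, 4)),
    columns=["Open", "High", "Low", "Close"],
    index=dates,
)

# A time series is split in order: the earliest rows train, the latest rows test.
split_at = int(len(df) * 0.8)  # assumed 80/20 cut-off
train_df = df.iloc[:split_at]
test_df = df.iloc[split_at:]

print(train_df.shape, test_df.shape)
```

Shuffling before splitting is avoided here because it would let the model see values from the future of the test period.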

It seems the ‘Close’ column is of object type, so we need to convert it to float type.
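One way to do the conversion is sketched below. It assumes the column was read as text because of thousands separators (for example "1,008.64"); the post does not say why the column is of object type, and the rows here are made up.

```python
import pandas as pd

# Hypothetical rows: 'Close' holds text because of the thousands separators.
df = pd.DataFrame(
    {
        "Open": [325.25, 331.27, 789.63],
        "Close": ["663.59", "666.45", "1,008.64"],
    }
)

# Strip the separators, then cast the column to float.
df["Close"] = df["Close"].str.replace(",", "", regex=False).astype(float)

print(df["Close"].dtype)  # float64
```

If the file is loaded with `pandas.read_csv`, passing `thousands=","` at load time is an alternative that avoids the manual conversion.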

With neural networks it is important to scale the values in the data frame. I am using StandardScaler here to scale the values of my train dataset. Please use two different scaler objects: one for the input columns and one for the output column, so that the predictions can later be scaled back on their own.
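The two-scaler idea can be sketched with scikit-learn as follows. The choice of all four columns as inputs and the random training frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical training frame standing in for the real stock data.
rng = np.random.default_rng(1)
train_df = pd.DataFrame(
    rng.uniform(300, 800, size=(50, 4)),
    columns=["Open", "High", "Low", "Close"],
)

input_cols = ["Open", "High", "Low", "Close"]  # assumed input features
target_col = ["Open"]                          # the column to forecast

# One scaler per role, so the target can be inverse-transformed by itself.
input_scaler = StandardScaler()
target_scaler = StandardScaler()

X_train_scaled = input_scaler.fit_transform(train_df[input_cols])
y_train_scaled = target_scaler.fit_transform(train_df[target_col])

# The target scaler undoes its own scaling without needing the other columns.
restored = target_scaler.inverse_transform(y_train_scaled)
print(np.allclose(restored, train_df[target_col].to_numpy()))
```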

Now I have to create an array of the previous values for each column, as I did in my previous post, where I explained this step in detail. Please refer to the example there to understand it easily.
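A minimal sketch of this step is shown here. The helper name `make_lagged_rows`, the window length of 3 and the toy data are all hypothetical; the post's own code may build the array differently.

```python
import numpy as np

def make_lagged_rows(features, target, n_steps):
    """Pair the previous n_steps rows of all features with the next target value.

    features: array of shape (n_rows, n_features)
    target:   array of shape (n_rows,)
    Returns X of shape (n_rows - n_steps, n_steps * n_features) and y of
    shape (n_rows - n_steps,).
    """
    X, y = [], []
    for i in range(n_steps, len(features)):
        X.append(features[i - n_steps:i].ravel())  # flatten the window into one row
        y.append(target[i])                        # value to predict for day i
    return np.array(X), np.array(y)

# Toy data: 10 days, 4 features; the first column plays the role of 'Open'.
features = np.arange(40, dtype=float).reshape(10, 4)
target = features[:, 0]

X, y = make_lagged_rows(features, target, n_steps=3)
print(X.shape, y.shape)  # (7, 12) (7,)
```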

Reshape the array into 3-D, with one axis each for samples, time steps and features. This is necessary to fit the RNN model.
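The reshape can be sketched with NumPy as follows, assuming each flat row stores the time steps one after another with all feature values of a day kept together. The sizes are hypothetical.

```python
import numpy as np

n_samples, n_steps, n_features = 7, 3, 4  # hypothetical sizes

# Flat layout: each row holds n_steps consecutive days, n_features values per day.
X_flat = np.arange(n_samples * n_steps * n_features, dtype=float).reshape(
    n_samples, n_steps * n_features
)

# 3-D layout expected by recurrent layers: (samples, time steps, features).
X_3d = X_flat.reshape(n_samples, n_steps, n_features)

print(X_3d.shape)  # (7, 3, 4)
```

The reshape only regroups the values; `X_3d[0, 0]` is simply the first `n_features` values of the first flat row.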

Now it is easy to build the RNN model: two stacked LSTM layers followed by one dense layer.
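Such an architecture might look like the sketch below. It assumes Keras (via TensorFlow); the helper name `build_model`, the 50 units per layer, the dropout rate of 0.2, the window of 60 time steps, and the Adam optimizer with mean squared error loss are illustrative choices, not values taken from the post.

```python
from tensorflow.keras.layers import LSTM, Dense, Dropout, Input
from tensorflow.keras.models import Sequential

def build_model(n_steps, n_features, units=50, dropout_rate=0.2):
    """Two stacked LSTM layers with dropout, followed by a single dense output."""
    model = Sequential(
        [
            Input(shape=(n_steps, n_features)),
            # return_sequences=True passes the full sequence on to the next LSTM.
            LSTM(units, return_sequences=True),
            Dropout(dropout_rate),
            LSTM(units),
            Dropout(dropout_rate),
            Dense(1),  # one value: the scaled 'Open' price
        ]
    )
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

model = build_model(n_steps=60, n_features=4)  # hypothetical window and feature count
model.summary()
```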

SMALL EXPLANATION:

I have used two LSTM layers and dropped out a few neurons to make sure that my model doesn’t overfit. Feel free to experiment with different optimizers and loss functions if required. You should be able to write your own functions to tune the hyperparameters.

Let’s fit the model. I have used 100 epochs and a batch size of 32.
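A fitting call with these settings could be sketched as follows, again assuming Keras. The random training arrays and the deliberately tiny network are stand-ins so that the example runs quickly; only the 100 epochs and the batch size of 32 come from the text.

```python
import numpy as np
from tensorflow.keras.layers import LSTM, Dense, Dropout, Input
from tensorflow.keras.models import Sequential

# Random stand-in for the scaled, windowed training data (hypothetical sizes).
rng = np.random.default_rng(2)
n_steps, n_features = 5, 4
X_train = rng.normal(size=(64, n_steps, n_features)).astype("float32")
y_train = rng.normal(size=(64, 1)).astype("float32")

# Small version of the two-LSTM architecture, kept tiny for speed.
model = Sequential(
    [
        Input(shape=(n_steps, n_features)),
        LSTM(8, return_sequences=True),
        Dropout(0.2),
        LSTM(8),
        Dropout(0.2),
        Dense(1),
    ]
)
model.compile(optimizer="adam", loss="mean_squared_error")

# 100 epochs with a batch size of 32, as in the post.
history = model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=0)
print(len(history.history["loss"]))  # 100
```

`history.history["loss"]` holds one loss value per epoch, which is handy for checking whether training has flattened out.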

Preparation of Test Dataset:

Rescaling the pandas dataframe:

Reshaping the test data frame in the same way as we did for the training dataset.
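The two test-preparation steps can be sketched together as below. The sketch assumes the scaler fitted on the training data is reused (only `transform`, no refitting) and that the last `n_steps` training rows are prepended so the first test day has a full history; the post does not confirm either detail, and all arrays are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
train = rng.uniform(300, 800, size=(40, 4))  # hypothetical training rows
test = rng.uniform(300, 800, size=(10, 4))   # hypothetical test rows
n_steps = 5                                  # hypothetical window length

# The scaler is fitted on the training data only.
input_scaler = StandardScaler().fit(train)

# Prepend the last n_steps training rows so every test day has n_steps predecessors.
history = np.vstack([train[-n_steps:], test])
history_scaled = input_scaler.transform(history)  # transform only, never refit on test

# One window of shape (n_steps, n_features) per test day.
X_test = np.array(
    [history_scaled[i - n_steps:i] for i in range(n_steps, len(history_scaled))]
)
print(X_test.shape)  # (10, 5, 4)
```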

Predict the output for the Open price and rescale it back to the original values using the inverse_transform method.
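The rescaling step might look like this sketch. To keep it light, a random array stands in for the output of `model.predict(X_test)`; the training prices are hypothetical too.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)

# Scaler for the target column, fitted on hypothetical training 'Open' prices.
open_train = rng.uniform(300, 800, size=(40, 1))
target_scaler = StandardScaler().fit(open_train)

# Stand-in for model.predict(X_test): scaled predictions of shape (n_samples, 1).
scaled_pred = rng.normal(size=(10, 1))

# Bring the predictions back to the original price scale.
predicted_open = target_scaler.inverse_transform(scaled_pred)
print(predicted_open.shape)  # (10, 1)
```

This is where the separate output scaler pays off: it was fitted on one column, so it accepts the single-column prediction array directly.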

Convert the predicted values into a DataFrame:

Finally, concatenate the actual and predicted values in the same dataframe so they can be compared side by side.
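Both steps, wrapping the predictions in a DataFrame and joining them with the actual values, can be sketched with pandas as follows. The prices, dates and column names `Actual_Open` and `Predicted_Open` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical actual prices and model output for five trading days.
dates = pd.date_range("2016-12-05", periods=5, freq="B")
actual = pd.Series([778.8, 788.4, 786.1, 795.3, 806.4], index=dates, name="Open")
predicted = np.array([[780.1], [785.9], [789.0], [793.2], [801.7]])

# Wrap the (n_samples, 1) prediction array in a DataFrame aligned on the same dates.
predicted_df = pd.DataFrame(predicted, index=actual.index, columns=["Predicted_Open"])

# Put actual and predicted values side by side.
comparison = pd.concat([actual.rename("Actual_Open"), predicted_df], axis=1)
print(comparison)
```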

Plot the actual and forecasted values together on one graph to visualize the differences.
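A plot of this kind could be produced with matplotlib as sketched here. The data is randomly generated, and the column names and output file name are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; this line can be dropped in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical actual and predicted prices for 20 trading days.
rng = np.random.default_rng(5)
dates = pd.date_range("2016-12-01", periods=20, freq="B")
actual = 780 + np.cumsum(rng.normal(size=20))
comparison = pd.DataFrame(
    {
        "Actual_Open": actual,
        "Predicted_Open": actual + rng.normal(scale=2.0, size=20),
    },
    index=dates,
)

# Draw both series on the same axes so the gaps between them are visible.
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(comparison.index, comparison["Actual_Open"], label="Actual Open")
ax.plot(comparison.index, comparison["Predicted_Open"], label="Predicted Open")
ax.set_xlabel("Date")
ax.set_ylabel("Open price")
ax.set_title("Actual vs. predicted Open price")
ax.legend()
fig.autofmt_xdate()
fig.savefig("actual_vs_predicted.png")
```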

I see the forecast was quite close to the actual values. I feel we all did a great job in the end. I hope I was able to explain well how this forecasting works. I will be coming up with other methods that are easier to implement and use. Thanks again!

In case of questions, please comment and I will be happy to help you out.


Soubhik Khankary

Data engineer by profession, teaching computers with statistics, and in love with learning never-ending math.