Time Series and Prediction (Part-2)

A Comprehensive Study of Time Series and Prediction with and without deep learning algorithms.

Tirth1306
Analytics Vidhya
12 min read · Jul 16, 2020


Cheers! You made it here. This is my second article on Time Series and prediction. If you have not read the first one, please read that article here. In this part, we are going to apply some deep learning algorithms to our time series. First of all, we have to divide our data into features and labels.

Preparing features and labels

features and labels (Source — Coursera)

In this case, our feature is effectively a number of consecutive values in the series, with our label being the next value. We'll call the number of values that we treat as our feature the window size: we take a window of the data and train an ML model to predict the next value. So, for example, if we take our time-series data 30 days at a time, we'll use those 30 values as the feature and the next value as the label. Then, over time, we'll train a neural network to match the 30 features to the single label.

preparing window dataset

Let's, for example, use the tf.data.Dataset class to create some data: we'll make a range of 10 values. We'll then use dataset.window to expand the dataset using windowing. Its parameters are the size of the window and how much we want to shift by each time, so we'll set a window size of 5 with a shift of 1. We can add a further parameter to window called drop_remainder; if we set this to true, it will truncate the data by dropping any incomplete windows, which means it will only give us windows of exactly five items. The good news is that converting to a NumPy array is super easy: we just call the .numpy method on each item in the dataset. The next task is to split the data into features and labels. For each item in the list, it makes sense for all of the values except the last one to be the feature, and the last one to be the label. You should also shuffle your data before training, which is possible using the shuffle method. Finally, we can batch the data with the batch method; it takes a size parameter, in this case 2. A sketch of these steps is shown below.
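Here is a minimal sketch of those windowing steps in TensorFlow; the exact snippet in the original image may differ slightly, but the idea is the same:

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)                              # a range of 10 values
dataset = dataset.window(5, shift=1, drop_remainder=True)        # windows of 5, shifted by 1
dataset = dataset.flat_map(lambda window: window.batch(5))       # each window becomes a tensor of 5 values
dataset = dataset.map(lambda window: (window[:-1], window[-1]))  # all but the last value are the feature, the last is the label
dataset = dataset.shuffle(buffer_size=10)                        # shuffle before training
dataset = dataset.batch(2).prefetch(1)                           # batch in groups of 2
for x, y in dataset:
    print(x.numpy(), y.numpy())
```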

Applying On my Synthetic Data — The first step is to create a dataset from the series using tf.data.Dataset, passing the series to its from_tensor_slices method. We then use the dataset's window method, based on our window_size, to slice the data up into the appropriate windows, each one shifted by one time step, and keep them all the same size by setting drop_remainder to true. We then flatten the data out to make it easier to work with. Once it's flattened, it's easy to shuffle: you call shuffle and pass it the shuffle buffer size. Using a shuffle buffer speeds things up a bit. For example, if you have 100,000 items in your dataset but set the buffer to a thousand, it will fill the buffer with the first thousand elements, pick one of them at random, and then replace it with the 1,001st element before randomly picking again, and so on. This way, with super large datasets, the random element is chosen from a smaller pool, which effectively speeds things up. The shuffled windows are then split into the 'x', which is all of the elements except the last, and the 'y', which is the last element.

Code to generate a windowed dataset from our synthetic data
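Since the original helper is shown as an image, here is a sketch of what a windowed_dataset function along these lines can look like (the parameter names are my own):

```python
import tensorflow as tf

def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    # Create a dataset from the raw series, then slice it into overlapping windows
    dataset = tf.data.Dataset.from_tensor_slices(series)
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    # Flatten each window into a single tensor of window_size + 1 values
    dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))
    # Shuffle, then split each window into features (all but last) and label (last)
    dataset = dataset.shuffle(shuffle_buffer)
    dataset = dataset.map(lambda window: (window[:-1], window[-1]))
    return dataset.batch(batch_size).prefetch(1)
```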

Prediction

We create an empty list of forecasts and then iterate over the series, taking slices of window size, predicting on them, and appending the results to the forecast list. We had split our time series into training and validation data, treating everything before a certain time as training and the rest as validation. So we'll just take the forecasts after the split time and load them into a NumPy array for charting.

code for predicting the next values
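A sketch of that forecasting loop; series, window_size, split_time, and the trained model are assumed to come from the earlier steps:

```python
import numpy as np

forecast = []
for time in range(len(series) - window_size):
    # Predict the value that follows each window of window_size points
    forecast.append(model.predict(series[time:time + window_size][np.newaxis]))

# Keep only the forecasts that fall in the validation period and flatten for charting
forecast = forecast[split_time - window_size:]
results = np.array(forecast)[:, 0, 0]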
1. Using Single-Layer Neural Network — Now that we have a windowed dataset, we can start training neural networks with it. Let's start with a super simple one that's effectively a linear regression. We'll measure its accuracy, and then we'll work from there to improve it.
training our model with a single-layer neural network (window_size=20)
prediction Using Single-Layer Neural Network

We'll then define the model, compile it, and fit it to the data we generated. When it's done, it will print out the layer weights, which provide the coefficients for the x values of the linear regression, as well as the bias value of the regression. We get these by calling the get_weights method on the layer. We can now plot the predictions for each of the elements in the validation split of the series; the predictions are in orange and the actual values are in blue. Finally, we can measure the mean absolute error between the validation data and the predicted results. I got a mean absolute error of 5.2505198.
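A sketch of that single-layer model is below; the exact hyperparameters (learning rate, batch size, shuffle buffer) are assumptions on my part:

```python
import tensorflow as tf

window_size = 20
dataset = windowed_dataset(x_train, window_size, batch_size=32, shuffle_buffer=1000)

# A single dense unit acting as a linear regression over the window
l0 = tf.keras.layers.Dense(1, input_shape=[window_size])
model = tf.keras.models.Sequential([l0])

model.compile(loss="mse",
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-6, momentum=0.9))
model.fit(dataset, epochs=100, verbose=0)

# 20 coefficients (one per window value) plus the bias term
print("Layer weights:", l0.get_weights())
```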

2. Using Deep Neural Network (DNN) — Now let's take it to the next step with a DNN to see if we can improve our model's accuracy. It's not that much different from the linear regression model we saw earlier; this is a relatively simple deep neural network with three layers. So let's unpack it line by line. I have kept my model simple with three layers of 10, 10, and 1 neurons. The input shape is the size of the window, and we activate each layer using a relu function. We then compile the model as before with a mean squared error loss function and a stochastic gradient descent optimizer. Finally, we fit the model over 100 epochs, and after a few seconds of training, we'll see results that look like this. It's still pretty good, and the mean absolute error of 4.567452 is lower than before, so it's a step in the right direction.
For further reading, refer to Explore more on DNN.

training our model with the deep neural network with 3 layers
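A sketch of that three-layer DNN, assuming the same windowed dataset as before (the learning rate is an assumption):

```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, input_shape=[window_size], activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1)
])

model.compile(loss="mse",
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-6, momentum=0.9))
model.fit(dataset, epochs=100, verbose=0)
```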

But it's also somewhat of a stab in the dark, particularly with the optimizer's learning rate. Wouldn't it be nice if we could pick the optimal learning rate instead of the one we chose? We might learn more efficiently and build a better model. Let's look at a technique that uses callbacks. I've added a callback to tweak the learning rate using a learning rate scheduler; you can see that code below. This function will be called as a callback at the end of each epoch, and it changes the learning rate to a value based on the epoch number. So in epoch 1, the learning rate is 1e-8 * 10**(1/20), and by the time we reach epoch 100, it'll be 1e-8 * 10**(100/20). This happens on every epoch because we set it in the callbacks parameter of model.fit.

code to change the learning rate after every epoch
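A sketch of that callback; the schedule grows the learning rate from 1e-8 by a factor of 10 every 20 epochs:

```python
import tensorflow as tf

# Learning rate grows with the epoch number: 1e-8 * 10**(epoch / 20)
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10 ** (epoch / 20))

model.compile(loss="mse",
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-8, momentum=0.9))
history = model.fit(dataset, epochs=100, callbacks=[lr_schedule], verbose=0)
```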
loss per epoch against the learning rate per epoch

After training with this, we can plot the loss per epoch against the learning rate per epoch using the code below, and we'll see a chart like this. The y-axis shows the loss for that epoch and the x-axis shows the learning rate. We then try to pick the lowest point on the curve where it's still relatively stable, and that's right around 7 * 10**-6.

plotting loss against different learning rate
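A sketch of that plotting code, assuming history is the object returned by model.fit above; the axis limits are an assumption and may need adjusting to your loss range:

```python
import numpy as np
import matplotlib.pyplot as plt

# Reconstruct the learning rate used at each of the 100 epochs
lrs = 1e-8 * (10 ** (np.arange(100) / 20))
plt.semilogx(lrs, history.history["loss"])
plt.axis([1e-8, 1e-3, 0, 300])
plt.xlabel("learning rate")
plt.ylabel("loss")
plt.show()
```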

So let's set that as our learning rate and then retrain. Here's the same neural network code with the updated learning rate, and we'll also train it for a bit longer. Let's check the results after training for 500 epochs.
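A sketch of the retraining step with the learning rate picked from the chart:

```python
import tensorflow as tf

# Fix the learning rate at roughly the stable low point of the curve (about 7e-6)
optimizer = tf.keras.optimizers.SGD(learning_rate=7e-6, momentum=0.9)
model.compile(loss="mse", optimizer=optimizer)
history = model.fit(dataset, epochs=500, verbose=0)
```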

loss per epoch against the number of epochs

On first inspection, it looks like we're probably wasting our time training beyond maybe 10 epochs, but that's somewhat skewed by the fact that the earlier losses were so high. If we crop them off and plot the loss from epoch 10 onwards, the chart tells us a different story.

loss per epoch, cropped to epochs after the first 10
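A sketch of the cropped plot, simply skipping the first 10 entries of the loss history:

```python
import matplotlib.pyplot as plt

loss = history.history["loss"]
# Plot only from epoch 10 onwards so early, very large losses don't dominate the scale
plt.plot(range(10, len(loss)), loss[10:])
plt.xlabel("epoch")
plt.ylabel("loss")
plt.show()
```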

We can see that the loss was continuing to decrease even after 500 epochs, which shows that our network is learning very well indeed. And the mean absolute error of 4.48277 is significantly lower than earlier.

3. Using Recurrent Neural Network (RNN) — A Recurrent Neural Network, or RNN, is a neural network that contains recurrent layers, which are designed to sequentially process sequences of inputs. RNNs are pretty flexible and able to process all kinds of sequences, including text. Here we'll use them to process the time series. We will build an RNN that contains two recurrent layers and a final dense layer, which serves as the output. With an RNN, you can feed in batches of sequences, and it will output a batch of forecasts. One difference is that the full input shape when using RNNs is three-dimensional: the first dimension is the batch size, the second is the number of time steps, and the third is the dimensionality of the input at each time step. To know more about RNN please visit here.

To know more refer to these articles: Article1, Article2.
Refer to this link to implement RNN in TensorFlow.

RNN cell (Source — Coursera)

So, for example, if we have a window size of 30 time steps and we're batching them in sizes of four, the shape will be 4 × 30 × 1, and at each time step the memory cell input will be a four by one matrix, like above. The cell will also take as input the state matrix from the previous step, although in the first step this will be zero. Now, in some cases you might want to input a sequence but not output one, and just get a single vector for each instance in the batch. This is typically called a sequence-to-vector RNN, but in reality all you do is ignore all of the outputs except the last one. When using Keras in TensorFlow, this is the default behavior, so if you want a recurrent layer to output a sequence, you have to specify return_sequences=True when creating the layer. You'll need to do this when you stack one RNN layer on top of another.

Source — Coursera

So in my RNN there are two recurrent layers, and the first has return_sequences=True set. It outputs a sequence, which is fed to the next recurrent layer. The next layer does not have return_sequences set to True, so it outputs to a single dense layer.

I'd also like to add a couple of new layers that use the Lambda type. The first Lambda layer helps us with dimensionality: if you recall, an RNN expects three dimensions, batch size, number of time steps, and series dimensionality. Using the Lambda, we just expand the array by one dimension. Similarly, scaling up the outputs by 100 using a Lambda layer can help training. The default activation function in the RNN layers is tanh(), the hyperbolic tangent, which outputs values between -1 and 1. Since the time series values are usually in the tens, like the 40s, 50s, 60s, and 70s, scaling the outputs up to the same ballpark can help with learning.

training model using Recurrent Neural Network (RNN)
Prediction Using Recurrent Neural Network (RNN)

Here's the code to train the neural network. I've set the optimal learning rate and picked 400 epochs to train for. Once it's trained, I can use it to forecast the validation range and plot the results. On my plot, I can see that my prediction isn't too bad other than this plateau, which is going to hurt my MAE. But despite that, my MAE is only about 6.41, so it's not too bad.
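Since the original code is shown as an image, here is a sketch of an RNN along those lines; the number of units, the loss function, and the learning rate are assumptions on my part:

```python
import tensorflow as tf

tf.keras.backend.clear_session()
model = tf.keras.models.Sequential([
    # Expand each window from [batch, time] to [batch, time, 1] for the RNN layers
    tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1), input_shape=[None]),
    tf.keras.layers.SimpleRNN(40, return_sequences=True),   # first layer outputs a sequence
    tf.keras.layers.SimpleRNN(40),                           # second layer outputs a vector
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 100.0)              # scale outputs up to the data's ballpark
])

model.compile(loss=tf.keras.losses.Huber(),                  # loss choice is an assumption
              optimizer=tf.keras.optimizers.SGD(learning_rate=5e-5, momentum=0.9),
              metrics=["mae"])
model.fit(dataset, epochs=400, verbose=0)
```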

4. Using Long Short-Term Memory (LSTM) — What LSTMs add is a cell state that is kept throughout the life of training, so state is passed from cell to cell, time step to time step, and can be better maintained. This means that data from earlier in the window can have a greater impact on the overall projection than in the case of RNNs. The state can also be bidirectional, so that it can move both forward and backward. To know more about LSTMs click here.

Refer to this link to implement LSTMs using TensorFlow.
Refer to this article to learn the structure of LSTMs.

code to train a model using Long Short-Term Memory (LSTM)
Prediction Using Long Short-Term Memory (LSTM)

First of all is tf.keras.backend.clear_session, which clears any internal variables. That makes it easy for us to experiment without models impacting later versions of themselves. After the Lambda layer that expands the dimensions for us, I've added a single LSTM layer with 32 cells. I've also made it bidirectional to see the impact that has on the prediction. Then we add a second layer, and note that we had to set return_sequences=True on the first one for this to work. We train on this and then look at the chart. It's now tracking much better and closer to the original data: maybe not keeping up with the sharp increase, but at least it's tracking close. It also gives us a mean absolute error of 5.28722, which is a lot better and shows that we're heading in the right direction.
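A sketch of that model; the output scaling and the learning rate are assumptions carried over from the RNN setup:

```python
import tensorflow as tf

tf.keras.backend.clear_session()   # clear internal variables between experiments
model = tf.keras.models.Sequential([
    tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1), input_shape=[None]),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 100.0)
])

model.compile(loss="mse",
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-6, momentum=0.9))
model.fit(dataset, epochs=100, verbose=0)
```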

5. Using Convolution — A convolutional network is a deep learning algorithm that can take in an input (classically an image), assign importance through learnable weights and biases to various aspects of it, and differentiate one aspect from another. Here we apply the same idea in one dimension, sliding a filter along the time series.

To go deeper into convolutions, please watch this video.

Or refer to this Article.

code for training model using convolutions and LSTMs
Prediction Using Convolution

One important note is that we got rid of the Lambda layer that reshaped the input for us to work with the LSTMs, so we're specifying an input shape on the Conv1D layer here. This requires us to update the windowed_dataset helper function that we've been working with all along: we simply use tf.expand_dims in the helper function to expand the dimensions of the series before we process it. Now, adding two bidirectional LSTM layers on top of the convolution layer and then passing the output to the dense layer, we get a mean absolute error of around 4.985901.
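A sketch of the updated helper and the convolution-plus-LSTM model; the filter count, kernel size, learning rate, and epoch count are assumptions on my part:

```python
import tensorflow as tf

def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    # Expand the series to shape [length, 1] so the Conv1D layer gets a channel dimension
    series = tf.expand_dims(series, axis=-1)
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    ds = ds.shuffle(shuffle_buffer)
    ds = ds.map(lambda w: (w[:-1], w[-1]))
    return ds.batch(batch_size).prefetch(1)

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=32, kernel_size=5, strides=1, padding="causal",
                           activation="relu", input_shape=[None, 1]),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 100.0)
])

model.compile(loss="mse",
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-5, momentum=0.9))
model.fit(dataset, epochs=500, verbose=0)
```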

Now, to reduce the loss further, one hint is to explore the batch size and make sure it's appropriate for your data. In this case it's worth experimenting with different batch sizes: by trying batch sizes both larger and smaller than the original 32, you can get better results. To know more about choosing an appropriate batch size, refer to this video.

So by combining CNNs and LSTMs we’ve been able to build our best model yet, despite some rough edges that could be refined.

Using Real Data (Sunspots)

Source — cosmos.esa

I have also done sunspot prediction using the CSV data available on Kaggle. Check out the complete code on Github.

Tirth Patel — Computer Science and Engineering Student, Nirma University.

LinkedIn | Instagram | Github
