From data preparation to parameter tuning using Tensorflow for training with RNNs

Padmaja Kulkarni
Published in Analytics Vidhya
4 min read · Apr 16, 2020

Source: https://www.liberaldictionary.com/forecast/

The focus of this notebook is to make you confident in data preparation and parameter tuning using Tensorflow.

Recurrent Neural Networks (RNNs) are a powerful tool for prediction when time-series data is involved. When I first started reading about them, I came across a blog post that said: “The best way to learn RNNs is to use them.” Going directly to the mathematical side of RNNs can be quite daunting. Although the math is essential, being able to use RNNs with tools like Tensorflow will build confidence, and I would recommend starting there.

In this article, we will go through:

  1. Data Generation
  2. Data Preparation using Tensorflow
  3. Model training and parameter tuning
  4. Prediction

Let’s start! The complete code can be accessed from here.

1. Data Generation

Before generating the dataset, let’s write the imports.

Throughout this article, I am using time-series data that I created with the following code. The data is seasonal with a period of 90 days, and I have added some noise to it as well. Let’s also split it into training and test data.
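The original listing is not reproduced here, so the following is only a minimal sketch of how such a series might be generated. The function name `make_seasonal_series`, the series length, the amplitude, the noise level, and the split point are all assumptions chosen for illustration; only the 90-day period comes from the text.

```python
import numpy as np


def make_seasonal_series(n_days=720, period=90, amplitude=40.0,
                         noise_std=3.0, seed=42):
    """Return (time, values) for a noisy sine wave with the given period.

    All default values except `period` are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    time = np.arange(n_days)
    seasonal = amplitude * np.sin(2.0 * np.pi * time / period)
    noise = rng.normal(0.0, noise_std, size=n_days)
    return time, (seasonal + noise).astype(np.float32)


time, series = make_seasonal_series()

# Keep the time order when splitting: the first part is used for training,
# the remainder for testing (split point is an assumption).
split_time = 540
train_series, test_series = series[:split_time], series[split_time:]
print(train_series.shape, test_series.shape)  # (540,) (180,)
```

The split is chronological rather than random, so the test series lies strictly after the training series in time.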

Time-series data

2. Data Preparation

Once we have the data, let’s start preparing it for the RNN.

An RNN architecture takes the temporal ordering of the data into account and can use it to predict a future Value. Here we have a univariate time series with a single variable, Value. From this data, we want to create training examples that contain m values from the past and predict the immediate future Value, i.e., the one at time m+1. Each data point is therefore a series of (m × 1) values. Let’s do that using the following code. For this example, let’s use a window size of 30. We will learn here how to use Tensorflow’s data functions. But before that, the following key points are essential.

  1. To make sure that the algorithm does not learn from the ordering of the examples, always shuffle the windows (the order of the values inside each window must stay intact).
  2. Try a couple of window sizes and keep the one that works best.
  3. Here, since we know that the seasonality of the data is 90 days, a window of this size or larger should work.
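To make the idea of "m past values, one future Value" concrete, here is a small NumPy-only sketch on a toy series. The helper name `make_windows` is hypothetical and is not part of the article's code; it only illustrates the shape of the data an RNN expects.

```python
import numpy as np


def make_windows(series, window_size):
    """Split a 1-D series into (features, targets) for next-step prediction."""
    series = np.asarray(series, dtype=np.float32)
    # Each row holds window_size inputs followed by the value to predict.
    rows = np.lib.stride_tricks.sliding_window_view(series, window_size + 1)
    features = rows[:, :-1, np.newaxis]  # shape: (n_windows, window_size, 1)
    targets = rows[:, -1]                # shape: (n_windows,)
    return features, targets


# Toy series 0, 1, ..., 9 with m = 3
features, targets = make_windows(np.arange(10), window_size=3)
print(features.shape, targets.shape)  # (7, 3, 1) (7,)
```

The first example here is the window 0, 1, 2 with target 3, the second is 1, 2, 3 with target 4, and so on. Shuffling would reorder these examples, not the values inside them.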

Here is the code to get our data into a time-series format using Tensorflow. Let’s first have a look at how the various data APIs of TensorFlow manipulate our data.

Understanding Tensorflow’s data API

Note that in the function timeSeriesDataset(), we add one to the window size. The extra data point is later used as the value to predict, as shown in the next lines of code. Here, we also split the dataset into batches. A batch size that is a power of two is commonly recommended, since it tends to fit the memory layout of the hardware better. I am using 32, as our dataset is not very large.
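The body of timeSeriesDataset() is not reproduced in this text, so the following is a reconstruction under assumptions rather than the author's exact code. It uses standard `tf.data` calls (`window`, `flat_map`, `shuffle`, `map`, `batch`, `prefetch`); the argument names and the shuffle buffer size are assumed.

```python
import numpy as np
import tensorflow as tf


def timeSeriesDataset(series, window_size, batch_size, shuffle_buffer=1000):
    """Sketch: turn a 1-D series into shuffled (window, next value) batches."""
    # Add a feature axis so each window has shape (window_size, 1)
    series = tf.expand_dims(tf.cast(series, tf.float32), axis=-1)
    ds = tf.data.Dataset.from_tensor_slices(series)
    # window_size + 1: the extra point becomes the prediction target
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    # Each window is itself a small dataset; flatten it into one tensor
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    # Shuffle whole windows so the model does not learn from their order
    ds = ds.shuffle(shuffle_buffer)
    # Split each window into inputs (all but last) and target (last)
    ds = ds.map(lambda w: (w[:-1], w[-1]))
    return ds.batch(batch_size).prefetch(1)


# Toy check on the series 0..9 with a window of 3 and batches of 2
demo = timeSeriesDataset(np.arange(10, dtype=np.float32), window_size=3, batch_size=2)
for x, y in demo.take(1):
    print(x.shape, y.shape)  # (2, 3, 1) (2, 1)
```

For the article's setting, the call would presumably use a window size of 30 and a batch size of 32 on the training series.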

Now, we have our training dataset ready!

3. Model training and parameter tuning

Let’s train a basic RNN model. Here we use two RNN layers with 64 units each. Note that the size of the last layer of the model is the same as the size of our output variable, i.e., 1. In the code, I use the LearningRateScheduler callback, which changes the learning rate based on the epoch number.
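As a rough sketch of the setup described above: two `SimpleRNN` layers with 64 units, a single-unit `Dense` output, and a `LearningRateScheduler` callback. The sweep formula, the loss, and the optimizer are assumptions, since the original listing is not shown; `build_model` and `sweep_schedule` are hypothetical names.

```python
import tensorflow as tf

WINDOW_SIZE = 30  # window length used in this article


def build_model(window_size=WINDOW_SIZE):
    """Two SimpleRNN layers of 64 units followed by a single-unit output."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(window_size, 1)),
        # return_sequences=True so the second RNN layer receives a sequence
        tf.keras.layers.SimpleRNN(64, return_sequences=True),
        tf.keras.layers.SimpleRNN(64),
        tf.keras.layers.Dense(1),
    ])


def sweep_schedule(epoch, lr=None):
    """Assumed sweep: start at 1e-8 and grow tenfold every 20 epochs."""
    return 1e-8 * 10 ** (epoch / 20)


lr_callback = tf.keras.callbacks.LearningRateScheduler(sweep_schedule)

model = build_model()
model.compile(
    loss="mae",  # assumption: mean absolute error as the training loss
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-8, momentum=0.9),
)
# Hypothetical usage, given a prepared training dataset:
# history = model.fit(train_dataset, epochs=100, callbacks=[lr_callback])
```

With such a sweep, plotting `history.history["loss"]` against the learning rate of each epoch on a logarithmic x-axis would give a learning rate vs. loss curve.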

Let’s see what the learning rate vs. loss graph looks like so that we can choose the optimal learning rate.

Graph Learning Rate vs Loss

Now we pick a learning rate from the region where the graph looks stable and train the model again. Here, I am choosing 3 × 10⁻³.
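Retraining with the chosen rate might look like the sketch below. To stay short and self-contained, it feeds NumPy arrays built from a small synthetic series instead of the article's dataset, and it trains for only a few epochs; the series, the optimizer, the loss, and the epoch count are assumptions.

```python
import numpy as np
import tensorflow as tf

WINDOW = 30

# Synthetic stand-in for the training series (assumed: 90-day sine plus noise)
rng = np.random.default_rng(0)
t = np.arange(400)
series = (10 * np.sin(2 * np.pi * t / 90) + rng.normal(0, 1, t.size)).astype("float32")

# Rows of WINDOW inputs plus one target value
rows = np.lib.stride_tricks.sliding_window_view(series, WINDOW + 1)
x_train = rows[:, :-1, np.newaxis].copy()  # (n_windows, WINDOW, 1)
y_train = rows[:, -1:].copy()              # (n_windows, 1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(WINDOW, 1)),
    tf.keras.layers.SimpleRNN(64, return_sequences=True),
    tf.keras.layers.SimpleRNN(64),
    tf.keras.layers.Dense(1),
])
model.compile(
    loss="mae",
    # Fixed learning rate picked from the stable region of the sweep
    optimizer=tf.keras.optimizers.SGD(learning_rate=3e-3, momentum=0.9),
)

# fit() shuffles the examples (whole windows) by default for array inputs
history = model.fit(x_train, y_train, batch_size=32, epochs=5, verbose=0)
losses = history.history["loss"]  # one loss value per epoch
print(len(losses))  # 5
```

In practice the number of epochs would be much larger than five; the short run here only shows where the fixed learning rate goes and where the per-epoch loss values end up.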

Let’s write some code to see how our loss evolves!

We can plot the graph and see how, with the optimal learning rate, our loss goes down.

Mean absolute error for our model training

4. Prediction

Use the test series to predict and test the model.
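One possible way to organize this step is sketched below. The helpers `forecast_next_values` and `mean_absolute_error` are hypothetical names, and a simple "repeat the last value" baseline stands in for the trained model so that the example runs on its own.

```python
import numpy as np


def forecast_next_values(predict_fn, series, window_size):
    """Predict each value from the window_size values that precede it."""
    series = np.asarray(series, dtype=np.float32)
    # Drop the final window: there is no later value to compare it against
    windows = np.lib.stride_tricks.sliding_window_view(series, window_size)[:-1]
    inputs = windows[:, :, np.newaxis].copy()  # (n_windows, window_size, 1)
    predictions = np.asarray(predict_fn(inputs)).reshape(-1)
    actual = series[window_size:]
    return predictions, actual


def mean_absolute_error(predictions, actual):
    return float(np.mean(np.abs(predictions - actual)))


def persistence(batch):
    """Stand-in model (assumption): predict that the last value repeats."""
    return batch[:, -1, :]


toy_series = np.arange(50, dtype=np.float32)
pred, actual = forecast_next_values(persistence, toy_series, window_size=30)
print(mean_absolute_error(pred, actual))  # 1.0
```

With a trained Keras model, `model.predict` could be passed in place of `persistence`. Note that the first `window_size` points of a series have no prediction unless earlier values are prepended. The figure reported next comes from the author's trained model, not from this toy baseline.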

Mean absolute error on the test set is: 4.3656874

Congratulations! You have successfully built and evaluated your first RNN model.

Now you should try it on a real dataset, e.g., solar flare data, and see if you can make predictions with a model similar to the one I used in this article. Good luck!
