
Time-Series Analysis Using Recurrent Neural Networks in Tensorflow

Welcome back to the next tutorial in our TensorFlow series!

If you are a new visitor, do check out our other tutorials here. So let’s get started!!

Show your support by subscribing to our newsletter!

In this post, we will discuss what recurrent neural networks are and how they work. We will also code up a simple time-series problem to better understand how an RNN works.

A recurrent neural network is a deep neural network which has, as the name suggests, recurring inputs to the hidden layer, i.e. the output from a hidden layer is fed back into itself. Take a look at this picture to better understand the structure:

This is a typical RNN cell. Note that the images on the right are not multiple layers, but the same layer unrolled in time, where the outputs are fed back into the hidden layer. Now, you must be wondering why we are discussing recurrent neural networks at all. To understand this, you have to familiarize yourself with the concept of neural memory.

Neural memory is the ability imparted to a model to retain the input from previous time steps when the input is sequential. In simple terms, when our problem is associated with a sequence of data such as a sentence or a time-series or the lyrics of a song, the model has to remember the previous states of the input to function.

Let’s consider a scenario: if you are asked to sing the lyrics of a song starting from the middle, it would be a bit difficult, right? It’s not because you don’t remember the song, but simply because your brain is not conditioned to sing the lyrics in that order. The brain needs to know the previous lyrics to continue from the middle. This is called conditional memory, and it is what makes RNNs special.

Recurrent neural networks can remember the state of an input from previous time-steps, which helps them make decisions at future time-steps. Watch the animation below carefully and make sure you understand it. (Shout-out to the iamtrask blog.)

In this GIF, we can see the hidden layer unrolled over four time-steps. At each time-step the hidden layer incorporates some features from the current input (red, green, purple) as well as from its state at the previous time-steps (blue, red, green).

Now, there is a slight problem with this concept. The model is trained using the backpropagation algorithm, in which we accumulate the gradients at the output layer and pass them back through the entire network. Because we use sigmoid and tanh activation functions and the number of time-steps in real-world problems is huge, the gradients tend to become infinitesimally small (almost 0). This is called the vanishing gradient problem.

To overcome this, an improvement on RNNs was developed, called the Long Short-Term Memory network (LSTM). LSTM cells address the gradient problem by introducing several gates which help the model decide what information to keep and what to forget.

Basic RNN Cell
LSTM Cell

The image above is a basic RNN cell, where h(t-1) is the output from the previous time-step and is passed into the tanh activation along with the current input x(t). The math behind this cell is as follows:

h(t) = tanh(h(t-1)*W(recurr) + x(t)*W(feed) + b)
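
To make the recurrence concrete, here is a minimal NumPy sketch of this single step; the function name, shapes, and random weights are illustrative assumptions, not code from the post.

```python
import numpy as np

def rnn_step(h_prev, x_t, W_recurr, W_feed, b):
    # h(t) = tanh(h(t-1)·W(recurr) + x(t)·W(feed) + b)
    return np.tanh(h_prev @ W_recurr + x_t @ W_feed + b)

# Example: 3 input features, 5 hidden units, unrolled over 4 time-steps.
rng = np.random.default_rng(0)
W_recurr = rng.normal(size=(5, 5))
W_feed = rng.normal(size=(3, 5))
b = np.zeros(5)

h = np.zeros(5)                        # initial hidden state
for x_t in rng.normal(size=(4, 3)):    # 4 time-steps of input
    h = rnn_step(h, x_t, W_recurr, W_feed, b)
```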

Now, if you notice the LSTM cell, it seems a lot more complicated than the RNN cell, but essentially it’s just a combination of four layers:

  1. Forget Gate Layer
  2. Store Gate Layer
  3. New Cell State Layer
  4. Output Layer

Also, we have a new variable C(t-1), which is the cell state from the previous time-step.

The first layer is responsible for deciding what information to retain from the previous cell state and what information to forget or remove. The forget gate outputs a value between 0 and 1 for each element of the cell state: if f(t) is close to 1 we keep the information, and if f(t) is close to 0 we forget it.

f(t)=sigmoid(W(f)*[h(t-1),x(t)]+b(f))

The second layer has an input gate, which decides how much new information to store, and computes a set of new candidate values. The candidate values are the pieces of information that seem relevant and are to be added to the cell state; they are computed from the current input together with the previous output.

i(t)=sigmoid(W(i)*[h(t-1),x(t)]+b(i))

C(t^)=tanh(W(c)*[h(t-1),x(t)]+b(c))

The third layer calculates the new cell state which is used to calculate the output of this time-step and is also passed on to the next cell. The new cell state is calculated using the information acquired from the previous two layers.

C(t)=f(t) * C(t-1) + i(t) * C(t^)

The output layer makes use of all this information gathered over the last three layers to produce an output which has features from the current time-step and also from several previous time-steps.

o(t)=sigmoid(W(o)*[h(t-1),x(t)]+b(o))

h(t)=o(t) * tanh(C(t))
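
Putting the four equations together, here is a small NumPy sketch of one LSTM step; it is only meant to mirror the formulas above, and all names and shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, C_prev, x_t, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    # Each weight matrix acts on the concatenation [h(t-1), x(t)].
    concat = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ concat + b_f)      # forget gate
    i_t = sigmoid(W_i @ concat + b_i)      # input (store) gate
    C_cand = np.tanh(W_c @ concat + b_c)   # new candidate values C(t^)
    C_t = f_t * C_prev + i_t * C_cand      # new cell state
    o_t = sigmoid(W_o @ concat + b_o)      # output gate
    h_t = o_t * np.tanh(C_t)               # output / new hidden state
    return h_t, C_t
```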

Now, several variations of these LSTM cells have been developed over the years. One popular, more recent variant is the Gated Recurrent Unit (GRU), which combines the gates into a simpler structure.

But, enough with the theory. Let’s get on with some coding!!

We will solve a simple problem: we generate a sine wave, feed a batch of it to the RNN, and ask the network to predict the next value in the batch, i.e. the value one time-step ahead. This is a really simple problem (this is a beginner’s tutorial) as we are only looking one time-step ahead, but the same implementation can be applied to predict data several time-steps ahead.

First we will create a class TimeSeriesData which we will use to generate the sinusoidal data points and also generate batches for training.

The __init__() defines x_data and y_true which are the training points on the x-axis and y-axis respectively.

Plotting the sinusoidal wave
sine wave

next_batch() is a utility function which picks a random batch of points from the generated sine wave. First, we create a random starting point random_start and convert it into a point on the sinusoidal wave (ts_start). Next, we create the batch on the x-axis and y-axis, i.e. batch_ts and y_batch. The if-else conditional lets the caller either return just the sine-wave values for the current time-step and the next time-step, or return them along with the x-axis batch values as well.
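
The original gist is not reproduced here, but based on the description above, the class might look roughly like this (parameter names such as num_points, xmin, and xmax are assumptions):

```python
import numpy as np

class TimeSeriesData():

    def __init__(self, num_points, xmin, xmax):
        self.xmin = xmin
        self.xmax = xmax
        self.num_points = num_points
        self.resolution = (xmax - xmin) / num_points
        # Training points on the x-axis and the corresponding sine values on the y-axis.
        self.x_data = np.linspace(xmin, xmax, num_points)
        self.y_true = np.sin(self.x_data)

    def next_batch(self, batch_size, steps, return_batch_ts=False):
        # Random starting point, scaled into the wave's x-range (random_start -> ts_start).
        random_start = np.random.rand(batch_size, 1)
        ts_start = random_start * (self.xmax - self.xmin - (steps * self.resolution))
        # x-axis batch: steps + 1 consecutive points, giving both the "current"
        # series and the series shifted one time-step ahead.
        batch_ts = ts_start + np.arange(0.0, steps + 1) * self.resolution
        y_batch = np.sin(batch_ts)
        # Return (current series, one-step-ahead series), optionally with the x values.
        if return_batch_ts:
            return (y_batch[:, :-1].reshape(-1, steps, 1),
                    y_batch[:, 1:].reshape(-1, steps, 1),
                    batch_ts)
        else:
            return (y_batch[:, :-1].reshape(-1, steps, 1),
                    y_batch[:, 1:].reshape(-1, steps, 1))
```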

Plotting the batch from the sine wave
Batch from sine wave

The .flatten() call is used to collapse the batch arrays into one-dimensional vectors so that we can plot the points.

Now, we can plot the generated batch on top of the sine wave for better visualization.
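
A hypothetical usage of the sketch above, overlaying one random 30-step batch on the full wave (the constructor arguments are assumed values):

```python
import matplotlib.pyplot as plt

ts_data = TimeSeriesData(num_points=250, xmin=0, xmax=10)
y_current, y_next, ts = ts_data.next_batch(batch_size=1, steps=30, return_batch_ts=True)

plt.plot(ts_data.x_data, ts_data.y_true, label='sine wave')
plt.plot(ts.flatten()[1:], y_next.flatten(), '*', label='single batch')
plt.legend()
plt.show()
```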

Now, that we have our training data, we declare some variables and placeholders for convenience.

Notice that we have num_outputs=1 but the number of neurons is set to 100. So, we use OutputProjectionWrapper(), a utility wrapper that projects the output of each time-step down to a single output value. Next, we define the type of RNN cell to be used and invoke tf.nn.dynamic_rnn() to produce the results from our recurrent network. You can experiment with the cell type, but I am using the GRU cell for this tutorial.
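
The gist itself is not shown here; a sketch of this setup against the TensorFlow 1.x API might look as follows (the 100 neurons, single output, GRU cell, and 30 time-steps are from the text; the remaining constants are assumptions):

```python
import tensorflow as tf  # TensorFlow 1.x API, as used in this post

num_inputs = 1               # one feature: the sine value at each time-step
num_time_steps = 30          # as stated in the post
num_neurons = 100            # units in the recurrent layer
num_outputs = 1              # one predicted value per time-step
learning_rate = 0.001        # assumed
num_train_iterations = 4000  # assumed; the post later reduces this to 2000
batch_size = 1               # assumed

# Placeholders for the input batch and the one-step-ahead targets.
X = tf.placeholder(tf.float32, [None, num_time_steps, num_inputs])
y = tf.placeholder(tf.float32, [None, num_time_steps, num_outputs])

# A 100-unit GRU cell, wrapped so each time-step is projected down to a single output value.
cell = tf.contrib.rnn.OutputProjectionWrapper(
    tf.contrib.rnn.GRUCell(num_units=num_neurons),
    output_size=num_outputs)

# Unroll the recurrent network over the input sequence.
outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
```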

Now, we define the loss, optimizer and the training function for our model.
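
A minimal sketch of that step, assuming mean-squared-error loss and the Adam optimizer:

```python
loss = tf.reduce_mean(tf.square(outputs - y))   # mean squared error
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train = optimizer.minimize(loss)

init = tf.global_variables_initializer()
```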

We also need one more variable, train_inst, which represents the training instance we will use to test and visualize the model’s predictions.
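
One way to define it, reusing the ts_data object from the earlier sketch (the x-range starting at 5 is an arbitrary assumption):

```python
# num_time_steps + 1 consecutive x-values: the first 30 sine values are fed to the
# model and the last 30 act as the one-step-ahead target.
train_inst = np.linspace(5, 5 + ts_data.resolution * (num_time_steps + 1), num_time_steps + 1)
```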

Now, we simply run the session and print the loss after every 100 steps. We store the predictions in y_pred. Remember that we are considering 30 time-steps for this experiment.
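
Continuing the sketch, the training loop might look like this:

```python
with tf.Session() as sess:
    sess.run(init)

    for iteration in range(num_train_iterations):
        X_batch, y_batch = ts_data.next_batch(batch_size, num_time_steps)
        sess.run(train, feed_dict={X: X_batch, y: y_batch})

        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, '\tMSE:', mse)

    # Feed the sine values of the fixed training instance and collect the predictions.
    X_new = np.sin(train_inst[:-1]).reshape(-1, num_time_steps, num_inputs)
    y_pred = sess.run(outputs, feed_dict={X: X_new})
```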

To visualize our results we use the following block of code:
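
A plausible version of that plotting code, given the variables defined above:

```python
import matplotlib.pyplot as plt

plt.title('Testing the model')

# Training instance: the sine values that were fed into the model.
plt.plot(train_inst[:-1], np.sin(train_inst[:-1]), 'bo', markersize=15, alpha=0.5,
         label='training instance')
# Target: the same wave shifted one time-step ahead.
plt.plot(train_inst[1:], np.sin(train_inst[1:]), 'ko', markersize=10, label='target')
# Predictions produced by the trained network.
plt.plot(train_inst[1:], y_pred[0, :, 0], 'r.', markersize=10, label='predictions')

plt.xlabel('time')
plt.legend()
plt.show()
```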

This is the final output visualization of our model. The training instance indicates the batch from the current time-step. The target represents the batch from the next time-step. And, the predictions are the points that were predicted by our model for the next time-step. So, essentially the closer your prediction points are to the target, the better your model will be.

Running the same code for fewer iterations or with a slower learning rate may result in some outliers among the prediction points. This is the output when the number of iterations is reduced to 2000.

As you may notice, the initial prediction points are a little off. This indicates that our model couldn’t retain the information from 30 time-steps ago all that accurately.

You can use the same implementation to predict points further ahead in the series by simply editing the if-else condition in the next_batch() method of our class so that it steps further ahead in time and returns the appropriate outputs.

That’s all!! Till next time.
