Using the LSTM API in TensorFlow (3/7)

Erik H
2 min read · Nov 18, 2016


Dear reader,

This article has been republished at Educaora and has also been open sourced. Unfortunately TensorFlow 2.0 changed the API, so the code is broken for later versions. Any help to bring the tutorials up to date is greatly appreciated. I also recommend looking into PyTorch.

In the previous post we modified our code to use the TensorFlow native RNN API. Now we will build a variant of an RNN called a “Recurrent Neural Network with Long Short-Term Memory”, or RNN-LSTM. This architecture was pioneered by Jürgen Schmidhuber, among others. One problem with a plain RNN over long time-dependencies (when truncated_backprop_length is large) is the “vanishing gradient problem”. One way to counter this is to use a state that is “protected” and “selective”. The RNN-LSTM remembers, forgets and chooses what to pass on and output depending on the current state and input.

Since this is primarily a practical tutorial I won’t go into more detail about the theory. I recommend reading this article again, continuing with the section on “Modern RNN architectures”. After you have done that, read and look at the figures on this page. Notice that the last-mentioned resource uses vector concatenation in its calculations.

In the previous article we didn’t have to allocate the internal weight matrix and bias; that was done automatically by TensorFlow “under the hood”. An LSTM RNN has many more “moving parts”, but by using the native API it will remain just as simple.

Different state

An LSTM has a “cell state” and a “hidden state”. To account for this, you need to remove _current_state on line 79 in the previous script and replace it with this:
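Below is a sketch of what that replacement could look like; it assumes the batch_size and state_size variables and the numpy-as-np import from the earlier parts of this series.

```python
# Sketch: the training loop now tracks the two parts of the LSTM state
# separately (batch_size and state_size are assumed from the previous script).
_current_cell_state = np.zeros((batch_size, state_size))
_current_hidden_state = np.zeros((batch_size, state_size))
```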

TensorFlow uses a data structure called LSTMStateTuple internally for its LSTMs, where the first element in the tuple is the cell state and the second is the hidden state. So you need to change line 28, where the init_state placeholder is declared, to these lines:
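A sketch of the new declarations, keeping the naming from the previous script (batch_size and state_size are assumed to be defined earlier):

```python
# Two placeholders, one per part of the LSTM state, wrapped in an LSTMStateTuple.
cell_state = tf.placeholder(tf.float32, [batch_size, state_size])
hidden_state = tf.placeholder(tf.float32, [batch_size, state_size])
init_state = tf.nn.rnn_cell.LSTMStateTuple(cell_state, hidden_state)
```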

Changing the forward pass is now straightforward: you just change the function call to create an LSTM and supply the initial state-tuple on lines 38–39.
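A sketch of the modified forward pass, written against the TF 1.x graph-mode API (in the TF 0.x releases this series was originally written for, tf.nn.static_rnn was called tf.nn.rnn); inputs_series is assumed to be the list of per-time-step inputs from the previous script:

```python
# Swap the vanilla RNN cell for an LSTM cell and pass the initial state tuple.
cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, state_is_tuple=True)
states_series, current_state = tf.nn.static_rnn(cell, inputs_series, initial_state=init_state)
```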

The states_series will be a list of hidden states as tensors, and current_state will be an LSTMStateTuple containing both the cell state and the hidden state at the last time-step, as shown below:

Outputs of the previous states and the last LSTMStateTuple

So current_state returns the cell and hidden state in a tuple. They should be separated after the calculation and supplied to the placeholders in the run function on line 90.
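A sketch of the adjusted run call in the training loop; tensor and placeholder names are assumed to follow the earlier parts of this series:

```python
# Run one training step, feeding the two state parts into their placeholders.
_total_loss, _train_step, _current_state, _predictions_series = sess.run(
    [total_loss, train_step, current_state, predictions_series],
    feed_dict={
        batchX_placeholder: batchX,
        batchY_placeholder: batchY,
        cell_state: _current_cell_state,
        hidden_state: _current_hidden_state,
    })

# Split the returned LSTMStateTuple so both parts can be fed back on the next batch.
_current_cell_state, _current_hidden_state = _current_state
```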

Whole program

This is the full code for creating an RNN with Long Short-Term Memory.
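Since the original gist is not reproduced here, below is a condensed sketch of how the pieces fit together. It follows the toy “echo” task and hyperparameters used earlier in this series (treat those values as assumptions), targets the TF 1.x graph-mode API rather than the TF 0.x API the series was written for, and omits the plotting code.

```python
import numpy as np
import tensorflow as tf

# Hyperparameters assumed from the earlier parts of this series.
num_epochs = 100
total_series_length = 50000
truncated_backprop_length = 15
state_size = 4
num_classes = 2
echo_step = 3
batch_size = 5
num_batches = total_series_length // batch_size // truncated_backprop_length

def generateData():
    # Random binary input; the target is the input shifted by echo_step.
    x = np.array(np.random.choice(2, total_series_length, p=[0.5, 0.5]))
    y = np.roll(x, echo_step)
    y[0:echo_step] = 0
    return x.reshape((batch_size, -1)), y.reshape((batch_size, -1))

batchX_placeholder = tf.placeholder(tf.float32, [batch_size, truncated_backprop_length])
batchY_placeholder = tf.placeholder(tf.int32, [batch_size, truncated_backprop_length])

# The LSTM carries a cell state and a hidden state, wrapped in an LSTMStateTuple.
cell_state = tf.placeholder(tf.float32, [batch_size, state_size])
hidden_state = tf.placeholder(tf.float32, [batch_size, state_size])
init_state = tf.nn.rnn_cell.LSTMStateTuple(cell_state, hidden_state)

# Output layer weights; the LSTM's internal weights are created by the cell itself.
W2 = tf.Variable(np.random.rand(state_size, num_classes), dtype=tf.float32)
b2 = tf.Variable(np.zeros((1, num_classes)), dtype=tf.float32)

# Split the batch into a list of per-time-step inputs and labels.
inputs_series = tf.split(batchX_placeholder, truncated_backprop_length, axis=1)
labels_series = tf.unstack(batchY_placeholder, axis=1)

# Forward pass: the native API allocates the LSTM's internal variables for us.
cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, state_is_tuple=True)
states_series, current_state = tf.nn.static_rnn(cell, inputs_series, initial_state=init_state)

logits_series = [tf.matmul(state, W2) + b2 for state in states_series]
predictions_series = [tf.nn.softmax(logits) for logits in logits_series]

losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
          for logits, labels in zip(logits_series, labels_series)]
total_loss = tf.reduce_mean(losses)
train_step = tf.train.AdagradOptimizer(0.3).minimize(total_loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch_idx in range(num_epochs):
        x, y = generateData()
        # Zero both parts of the LSTM state at the start of each epoch.
        _current_cell_state = np.zeros((batch_size, state_size))
        _current_hidden_state = np.zeros((batch_size, state_size))
        for batch_idx in range(num_batches):
            start_idx = batch_idx * truncated_backprop_length
            end_idx = start_idx + truncated_backprop_length
            batchX = x[:, start_idx:end_idx]
            batchY = y[:, start_idx:end_idx]
            _total_loss, _train_step, _current_state, _predictions_series = sess.run(
                [total_loss, train_step, current_state, predictions_series],
                feed_dict={
                    batchX_placeholder: batchX,
                    batchY_placeholder: batchY,
                    cell_state: _current_cell_state,
                    hidden_state: _current_hidden_state,
                })
            # Carry both parts of the LSTM state over to the next batch.
            _current_cell_state, _current_hidden_state = _current_state
            if batch_idx % 100 == 0:
                print("Epoch", epoch_idx, "Step", batch_idx, "Loss", _total_loss)
```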

Next step

In the next article we will create a multi-layered or “deep” recurrent neural network, also with long short-term memory.
