Using the Multilayered LSTM API in TensorFlow (4/7)

In the previous article we learned how to use the TensorFlow API to create a recurrent neural network with Long Short-Term Memory (LSTM). In this post we will make that architecture deep by introducing an LSTM with multiple layers.

One thing to notice is that every layer of the network needs its own hidden state and cell state. Typically the input to the next LSTM layer is the previous state for that particular layer, together with the hidden activations of the “lower” or previous layer. There is a good diagram in this article.
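To make the data flow concrete, here is a minimal NumPy sketch of one time step of a stacked LSTM (all names, sizes and the weight layout are illustrative assumptions, not the article's code): each layer updates its own (cell, hidden) state, and the layer above consumes the hidden output of the layer below.

```python
import numpy as np

# Illustrative sizes (input size equals state size so every layer is uniform).
num_layers, batch_size, state_size = 3, 5, 4

def lstm_step(x, c, h, W, b):
    # One LSTM step: the four gates are computed jointly from [x, h].
    z = np.concatenate([x, h], axis=1) @ W + b
    i, f, o, g = np.split(z, 4, axis=1)
    sigm = lambda a: 1.0 / (1.0 + np.exp(-a))
    c_new = sigm(f) * c + sigm(i) * np.tanh(g)   # updated cell state
    h_new = sigm(o) * np.tanh(c_new)             # updated hidden state
    return c_new, h_new

rng = np.random.default_rng(0)
x = rng.normal(size=(batch_size, state_size))    # input at this time step

# One (cell, hidden) state pair per layer, plus per-layer weights.
states = [(np.zeros((batch_size, state_size)), np.zeros((batch_size, state_size)))
          for _ in range(num_layers)]
Ws = [rng.normal(scale=0.1, size=(2 * state_size, 4 * state_size))
      for _ in range(num_layers)]
bs = [np.zeros(4 * state_size) for _ in range(num_layers)]

layer_input = x
for l in range(num_layers):
    c, h = states[l]
    c, h = lstm_step(layer_input, c, h, Ws[l], bs[l])
    states[l] = (c, h)
    layer_input = h  # the layer above consumes this layer's hidden output
```

Note how the per-layer states stay private to each layer, while only the hidden activations travel upward.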

We could continue to store the states for each layer in many LSTMStateTuples, but that would require a lot of overhead. Moreover, you can only feed data to the placeholders through the feed_dict as Python lists or NumPy arrays (not as LSTMStateTuples), so we would still have to convert between the datatypes. Why not store the whole state of the network in one big tensor? To do this, the first thing we want to do is replace _current_cell_state and _current_hidden_state on lines 81–82 with the more generic:

You also have to declare the new setting num_layers = 3 at the beginning of the file, though you may choose any number of layers. The “2” refers to the two states, the cell state and the hidden state. So for each layer and each sample in a batch, we have both a cell-state and a hidden-state vector of size state_size.

Now modify lines 93 to 103 (the run function and the separation of the state tuple) back to the original statement, since the state is now stored in a single tensor.

Next, change lines 28 to 30 from the previous post:

To a single placeholder containing the whole state.
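The placeholder itself is shown as an image in the original; a sketch, written against tf.compat.v1 as a stand-in for the TF 1.x-era tf.placeholder call used in this series, might look like:

```python
import tensorflow.compat.v1 as tf  # stand-in for the TF 1.x API of this series
tf.disable_eager_execution()

num_layers, batch_size, state_size = 3, 5, 4  # settings from earlier parts

# A single placeholder for the whole stacked state:
# [layer, cell-or-hidden (2), batch, state]
init_state = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
```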

Since the TensorFlow Multilayer-LSTM-API accepts the state as a tuple of LSTMStateTuples, we need to unpack the state tensor into this structure. For each layer in the state we then create an LSTMStateTuple, and put these in a tuple, as shown below. Add this just after the init_state placeholder.
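A sketch of this unpacking step (again using tf.compat.v1 names; the old API called tf.unstack "tf.unpack", and the placeholder line simply repeats the one defined above so the snippet is self-contained):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

num_layers, batch_size, state_size = 3, 5, 4

# The init_state placeholder from above, repeated for self-containment.
init_state = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])

# Split the big tensor into one [2, batch, state] tensor per layer, then wrap
# each layer's cell- and hidden-state in an LSTMStateTuple.
state_per_layer_list = tf.unstack(init_state, axis=0)  # tf.unpack in the old API
rnn_tuple_state = tuple(
    tf.nn.rnn_cell.LSTMStateTuple(state_per_layer_list[idx][0],  # cell state
                                  state_per_layer_list[idx][1])  # hidden state
    for idx in range(num_layers)
)
```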

The forward pass on lines 40 and 41 should be changed to this:
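The new forward pass is also an image in the original; a self-contained sketch follows, with two hedged substitutions: tf.nn.static_rnn stands in for the old tf.nn.rnn, and one LSTMCell per layer stands in for the article's repeated single cell (newer TensorFlow versions reject sharing one cell object across layers).

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

num_layers, batch_size, state_size = 3, 5, 4
truncated_backprop_length = 15  # as in earlier parts of this series

batchX_placeholder = tf.placeholder(tf.float32, [batch_size, truncated_backprop_length])
init_state = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])

# Unpack the input columns into a list of [batch_size, 1] tensors, and the big
# state tensor into a tuple of per-layer LSTMStateTuples (as shown earlier).
inputs_series = tf.split(batchX_placeholder, truncated_backprop_length, axis=1)
state_per_layer_list = tf.unstack(init_state, axis=0)
rnn_tuple_state = tuple(
    tf.nn.rnn_cell.LSTMStateTuple(s[0], s[1]) for s in state_per_layer_list)

# Stack the layers: separate cell objects here; the article used [cell] * num_layers.
cells = [tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
         for _ in range(num_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=True)

# tf.nn.rnn in the old API; static_rnn is its direct successor.
states_series, current_state = tf.nn.static_rnn(
    cell, inputs_series, initial_state=rnn_tuple_state)
```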

The multi-layered LSTM is created by first making a single LSTMCell, and then duplicating this cell in an array and supplying it to the MultiRNNCell API call. The forward pass uses the usual tf.nn.rnn; let’s print the output of this function, the states_series and current_state variables.

Output of the previous states and the last LSTMStateTuples

Take a look at the tensor names between single quotes; we see that the RNN is unrolled 15 times. In the states_series all outputs have the name “Cell2”, which means that the list contains the hidden states of the last LSTM layer. Furthermore, the LSTMStateTuples in current_state give you the whole state of all layers in the network. “Cell0” refers to the first layer, “Cell1” to the second and “Cell2” to the third and final layer; “h” and “c” refer to the hidden and cell state respectively.

Whole program

This is the whole self-contained script, just copy and run.

Next step

In the next article we will speed up the graph creation by not splitting up our inputs and labels into a Python list.