In the previous article we learned how to use the TensorFlow API to create a recurrent neural network with Long Short-Term Memory (LSTM). In this post we will make that architecture deep, introducing an LSTM with multiple layers.
One thing to notice is that for every layer of the network we will need a hidden state and a cell state. Typically the input to the next LSTM layer will be the previous state for that particular layer, as well as the hidden activations of the “lower” or previous layer. There is a good diagram in this article.
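To make the layering concrete, here is a minimal NumPy sketch of one time step through a stacked LSTM. It is only an illustration under assumed shapes: the helper `lstm_step` and the random weights are made up, and the real model of course uses TensorFlow's API rather than hand-rolled NumPy.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, c, h, W, b):
    """One LSTM time step for a single layer.
    x: (batch, input_size), c and h: (batch, state_size)."""
    z = np.concatenate([x, h], axis=1) @ W + b   # all four gates at once
    i, f, o, g = np.split(z, 4, axis=1)          # input, forget, output, candidate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return c_new, h_new

batch_size, input_size, state_size, num_layers = 5, 3, 4, 3
rng = np.random.default_rng(0)

# every layer keeps its own (cell state, hidden state) pair
states = [(np.zeros((batch_size, state_size)),
           np.zeros((batch_size, state_size))) for _ in range(num_layers)]

x = rng.standard_normal((batch_size, input_size))
for l in range(num_layers):
    in_size = input_size if l == 0 else state_size
    W = rng.standard_normal((in_size + state_size, 4 * state_size)) * 0.1
    b = np.zeros(4 * state_size)
    c, h = states[l]
    c, h = lstm_step(x, c, h, W, b)
    states[l] = (c, h)
    x = h  # the hidden activations feed the layer above

print(x.shape)  # output of the top layer
```

Note how each layer updates its own `(c, h)` pair, while only the hidden activations `h` are passed upward as input to the next layer.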
We could continue to store the states for each layer in many LSTMStateTuples, but that would require a lot of overhead. Moreover, you can only feed data to the placeholders through the feed_dict as Python lists or NumPy arrays anyway (not as LSTMStateTuples), so we would still have to convert between the datatypes. Why not store the whole state for the network in one big tensor? In order to do this, the first thing we want to do is replace _current_hidden_state on lines 81–82 with the more generic:
You also have to declare the new setting
num_layers = 3 at the beginning of the file, though you may choose any number of layers. The “2” refers to the two states, cell state and hidden state. So for each layer and each sample in a batch, we have both a cell state and a hidden state vector, each of size state_size.
Now change lines 93 to 103 (the run function and the separation of the state tuple) back to the original single statement, since the state is now stored in one tensor.
Next, change lines 28 to 30 from the previous post to a single placeholder containing the whole state:
Since the TensorFlow multilayer LSTM API accepts the state as a tuple of LSTMStateTuples, we need to unpack the state tensor into this structure. For each layer in the state we then create an
LSTMStateTuple, and put these in a tuple, as shown below. Add this just after the state placeholder.
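The TensorFlow code is not reproduced here, but the unpacking logic can be sketched in NumPy as follows. `LSTMStateTuple` is imitated with a namedtuple; in the actual graph you would split the placeholder along the layer axis (e.g. with `tf.unstack`) and build TensorFlow's own `LSTMStateTuple`s instead. All shape values are assumptions matching the earlier sketches.

```python
import numpy as np
from collections import namedtuple

# stand-in for TensorFlow's LSTMStateTuple
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])

num_layers, batch_size, state_size = 3, 5, 4
init_state = np.zeros((num_layers, 2, batch_size, state_size))

# split the big tensor along the layer axis, then pair up (c, h) per layer
state_per_layer = [init_state[l] for l in range(num_layers)]
rnn_tuple_state = tuple(
    LSTMStateTuple(layer[0], layer[1]) for layer in state_per_layer
)

print(len(rnn_tuple_state), rnn_tuple_state[0].c.shape)
```

The result is a tuple with one `(c, h)` pair per layer, which is exactly the structure the multilayer LSTM API expects.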
The forward pass on lines 40 and 41 should be changed to this:
The multi-layered LSTM is created by first making a single
LSTMCell, then duplicating this cell in a list and supplying it to the
MultiRNNCell API call. The forward pass uses the usual
tf.nn.rnn; let’s print the output of this function, the
states_series and current_state variables.
Take a look at the tensor names between single quotes: we see that the RNN is unrolled 15 times. In the
states_series all outputs have the name “Cell2”, which means that we get the hidden state of the last LSTM layer in the list. Furthermore, the LSTMStateTuple in the
current_state gives you the whole state of all the layers in the network. “Cell0” refers to the first layer, “Cell1” to the second and “Cell2” to the third and final layer; “h” and “c” refer to the hidden and cell state.
This is the whole self-contained script; just copy and run.
In the next article we will speed up the graph creation by not splitting up our inputs and labels into a Python list.