GSoC 2018: Backpropagation through time in an LSTM Network — Part IV

How the backpropagation process works inside an LSTM cell.

In this blog post, I'll discuss how backpropagation through time works in an LSTM cell and how it is currently implemented in TMVA's LSTM layer design. We call it 'backpropagation through time' because the same backward computation is repeated at each timestep; looked at carefully, each step closely resembles the ordinary backpropagation process.

Here is a pictorial representation of an LSTM cell:

Pictorial representation of an LSTM cell with its four interacting gates.

Backpropagation in an LSTM cell differs from that in a vanilla RNN because, at each timestep of the backward pass, the gradients of the input gate, forget gate and output gate must also be computed. Here are the mathematical equations for reference:

Updating each gate value during the backpropagation process in an LSTM cell.
Final parameter update during the backward pass in an LSTM cell. dW represents the input-weight gradients, dU the state-weight gradients and db the bias gradients.
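
For readers who cannot make out the figures, the gradients they describe can be written out as follows. This is the standard LSTM backward formulation; the exact symbols in the original figures may differ slightly. Here iₜ, fₜ and oₜ are the input-, forget- and output-gate activations, c̃ₜ the candidate value, cₜ the cell state, hₜ the hidden state and xₜ the input at timestep t:

  dcₜ = dhₜ ⨀ oₜ ⨀ (1 − tanh²(cₜ)) + dcₜ₊₁ ⨀ fₜ₊₁
  doₜ = dhₜ ⨀ tanh(cₜ) ⨀ oₜ ⨀ (1 − oₜ)
  dc̃ₜ = dcₜ ⨀ iₜ ⨀ (1 − c̃ₜ²)
  diₜ = dcₜ ⨀ c̃ₜ ⨀ iₜ ⨀ (1 − iₜ)
  dfₜ = dcₜ ⨀ cₜ₋₁ ⨀ fₜ ⨀ (1 − fₜ)

The parameter gradients are then accumulated over all timesteps; for each gate g ∈ {i, f, o, c̃}:

  dW_g += dgₜ ⨂ xₜ,  dU_g += dgₜ ⨂ hₜ₋₁,  db_g += dgₜ

and the gradient flowing back to the previous hidden state is dhₜ₋₁ = U_iᵀ ⋅ diₜ + U_fᵀ ⋅ dfₜ + U_oᵀ ⋅ doₜ + U_c̃ᵀ ⋅ dc̃ₜ, plus any contribution from the loss at timestep t − 1.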

The current implementation covers the final update of the internal parameters: state weights, input weights and biases. The per-gate gradient computation is still in progress, since it requires many additional parameters to be passed from the Backward() method, and the forward-pass feature in the current work is producing a high error during testing, which directly affects the backpropagation feature of the LSTM.

Extras:

  • Above, ⨀ denotes the element-wise product, or Hadamard product.
  • Inner products are represented as ⋅
  • Outer products are represented as ⨂

The backward pass in the TBasicLSTMLayer class has been implemented via the Backward() method, which initialises tensor variables to store the gradients of the hidden state, input gate, forget gate, output gate and candidate value. These tensors are then passed to CellBackward(), which performs the final update of the parameters.
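
To make the per-timestep computation concrete, below is a minimal, self-contained C++ sketch of one LSTM timestep's forward and backward pass, using scalars (a one-dimensional state) so the gate-gradient equations above are easy to follow. All variable names are illustrative; this is not the actual TBasicLSTMLayer code, whose Backward() and CellBackward() operate on tensors and have different signatures.

#include <cmath>
#include <cstdio>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

int main() {
    // Illustrative parameters: input weight W, state weight U, bias b per gate.
    double Wi = 0.6, Ui = 0.4, bi = 0.1;   // input gate
    double Wf = 0.7, Uf = 0.3, bf = 0.2;   // forget gate
    double Wo = 0.5, Uo = 0.2, bo = 0.1;   // output gate
    double Wc = 0.8, Uc = 0.1, bc = 0.0;   // candidate

    double x = 1.0, hprev = 0.5, cprev = 0.3; // input and previous states

    // ---- Forward pass (one timestep) ----
    double i = sigmoid(Wi * x + Ui * hprev + bi);
    double f = sigmoid(Wf * x + Uf * hprev + bf);
    double o = sigmoid(Wo * x + Uo * hprev + bo);
    double cbar = std::tanh(Wc * x + Uc * hprev + bc);
    double c = f * cprev + i * cbar;      // new cell state
    double h = o * std::tanh(c);          // new hidden state

    // ---- Backward pass (one timestep) ----
    double dh = 1.0;      // gradient flowing into h from the loss / next step
    double dcnext = 0.0;  // gradient flowing into c from the next timestep

    double tc = std::tanh(c);
    double dc = dcnext + dh * o * (1.0 - tc * tc);   // cell-state gradient
    double dout = dh * tc * o * (1.0 - o);           // output-gate gradient
    double dcbar = dc * i * (1.0 - cbar * cbar);     // candidate gradient
    double di = dc * cbar * i * (1.0 - i);           // input-gate gradient
    double df = dc * cprev * f * (1.0 - f);          // forget-gate gradient
    double dcprev = dc * f;                          // flows to timestep t-1

    // Parameter gradients: the outer products collapse to plain products
    // in this scalar example.
    double dWi = di * x,    dUi = di * hprev,    dbi = di;
    double dWf = df * x,    dUf = df * hprev,    dbf = df;
    double dWo = dout * x,  dUo = dout * hprev,  dbo = dout;
    double dWc = dcbar * x, dUc = dcbar * hprev, dbc = dcbar;

    std::printf("h = %f\n", h);
    std::printf("dWi=%f dUi=%f dbi=%f\n", dWi, dUi, dbi);
    std::printf("dWf=%f dUf=%f dbf=%f\n", dWf, dUf, dbf);
    std::printf("dWo=%f dUo=%f dbo=%f\n", dWo, dUo, dbo);
    std::printf("dWc=%f dUc=%f dbc=%f dcprev=%f\n", dWc, dUc, dbc, dcprev);
    return 0;
}

Over a full sequence, this backward step would be run from the last timestep to the first, with dcprev and the hidden-state gradient feeding into the previous step and the parameter gradients accumulated along the way.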

Resource: https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9. This is a good numerical LSTM example for verifying the layer design. The current work supports only the CPU architecture.
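
One simple way to verify such a backward pass, in the same spirit as the numerical example above, is a finite-difference check: perturb one weight, recompute the forward pass, and compare the numerical derivative with the analytic gradient. Below is a hypothetical sketch of such a check for the input-gate weight from the previous example; it is not part of TMVA's actual test suite.

#include <cmath>
#include <cstdio>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One LSTM timestep's forward pass returning h; Wi (the input-gate
// input weight) is the parameter under test, everything else is fixed
// to the same values as in the previous sketch.
double forward(double Wi) {
    double x = 1.0, hprev = 0.5, cprev = 0.3;
    double i = sigmoid(Wi * x + 0.4 * hprev + 0.1);
    double f = sigmoid(0.7 * x + 0.3 * hprev + 0.2);
    double o = sigmoid(0.5 * x + 0.2 * hprev + 0.1);
    double cbar = std::tanh(0.8 * x + 0.1 * hprev + 0.0);
    double c = f * cprev + i * cbar;
    return o * std::tanh(c);
}

int main() {
    const double Wi = 0.6, eps = 1e-6;
    // Central difference approximates dh/dWi.
    double numeric = (forward(Wi + eps) - forward(Wi - eps)) / (2.0 * eps);
    std::printf("numeric dh/dWi = %.8f\n", numeric);
    // With dh = 1, this value should match the analytic dWi printed by
    // the previous sketch, confirming the gate-gradient equations.
    return 0;
}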

In my next story, I'll share the results of the LSTM layer once the standard tests are working for both the forward and backward pass.