LSTM — Introduction in simple words

Amit Singh Rathore · Nerd For Tech · Sep 19, 2020


LSTM (Long Short-Term Memory) is an improvement over the Recurrent Neural Network (RNN) that addresses the RNN's failure to learn when more than 5–10 discrete time steps separate the relevant input events from the target signals (the vanishing/exploding gradient problem). LSTM does so by introducing a memory unit called the "cell state". Let's look at the diagram below to understand LSTM's basic building blocks.

LSTM Simplified

In the above diagram, the central tanh activation function together with the hidden state and the input constitutes a basic RNN cell. LSTM adds further layers on top of this as an improvement. In the diagram below, the cell state is the horizontal line that runs along the top. This cell state is what puts the "long" in LSTM: it carries information or context across many discrete steps (up to hundreds).

Long and short term memory in LSTM

The long-term memory "cell state" behaves much like the conveyor belt in an automated sorting machine, where parcels are added and removed along the way.

Conveyor belt in an automated sorting machine (image source: Misumi blog)

Note that only two updates happen to the "cell state": an element-wise multiplication (forgetting) and an element-wise addition (writing new information). This keeps the computation along the cell state path minimal, which stabilizes training and reduces the chance of vanishing or exploding gradients.
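
To make those two updates concrete, the cell state update can be written out with the conventional symbols (f_t for the forget gate, i_t for the input gate, \tilde{c}_t for the candidate values, and \odot for element-wise multiplication):

c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t

One multiplication and one addition are all that happen on this path.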

In the diagram below we can see how an LSTM cell can be broken into layers for better understanding. There are three layers.

different layers in LSTM

Forget layer: This layer filters or removes information/memory from the previous cell state based on the current input and the previous hidden state. This is done via a sigmoid activation function, which squashes its input to values between 0 and 1. When the previous cell state is multiplied element-wise by this output, values close to 0 drop the corresponding memory while values close to 1 let it pass through unchanged.

element-wise multiplication which results in filtering
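
As a minimal sketch of this filtering step (the NumPy arrays, the shapes and the weight names W_f, b_f below are illustrative choices, not taken from the article):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
h_prev = rng.standard_normal(4)      # previous hidden state (size 4, arbitrary)
x_t    = rng.standard_normal(3)      # current input (size 3, arbitrary)
c_prev = rng.standard_normal(4)      # previous cell state

W_f = rng.standard_normal((4, 7))    # forget-gate weights (hypothetical values)
b_f = np.zeros(4)                    # forget-gate bias

# Sigmoid squashes to (0, 1); multiplying the cell state by it keeps or drops memory
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
c_filtered = f_t * c_prev            # ~0 drops the entry, ~1 passes it through
```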

Input layer: This again has gating logic (a sigmoid) that filters out unwanted information from the current input. It also has a modulator that keeps the candidate values between -1 and 1, which is achieved using a tanh activation function. The gated candidate is then added to the cell state.

Activation functions tanh and Sigmoid
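
Continuing the same sketch, the input layer combines a sigmoid gate with the tanh "modulator" described above (again, the shapes and the names W_i, W_c are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hx = np.concatenate([rng.standard_normal(4), rng.standard_normal(3)])  # [h_prev, x_t]

W_i, b_i = rng.standard_normal((4, 7)), np.zeros(4)   # input-gate weights (hypothetical)
W_c, b_c = rng.standard_normal((4, 7)), np.zeros(4)   # candidate/modulator weights

i_t     = sigmoid(W_i @ hx + b_i)   # gate in (0, 1): how much of each candidate to keep
c_tilde = np.tanh(W_c @ hx + b_c)   # candidate values scaled to (-1, 1)
update  = i_t * c_tilde             # this gated candidate is added to the cell state
```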

Output layer: This layer takes the current input, the previous hidden state and the updated cell state, and produces the new hidden state (the cell's output). Again a tanh is applied to the cell state to keep its values in the range -1 to 1 before it is gated by a sigmoid.
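
And a sketch of the output step, assuming c_t is the cell state already updated by the forget and input layers (names and shapes are again illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hx  = rng.standard_normal(7)         # concatenated [h_prev, x_t], as before
c_t = rng.standard_normal(4)         # updated cell state

W_o, b_o = rng.standard_normal((4, 7)), np.zeros(4)   # output-gate weights (hypothetical)

o_t = sigmoid(W_o @ hx + b_o)        # decide which parts of the cell state to expose
h_t = o_t * np.tanh(c_t)             # new hidden state: gated, rescaled cell state
```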

Note: Each of these layers (gates) also introduces its own bias term.

Mathematically, each layer can be summarized by the equations shown in the diagram below.

equations for each layer
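
Since that diagram is not reproduced here, the standard form of these equations is given below (x_t is the current input, h_{t-1} the previous hidden state, c_{t-1} the previous cell state, \sigma the sigmoid function and \odot element-wise multiplication):

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)            (forget layer)
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)            (input gate)
\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)     (candidate / modulator)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t         (cell state update)
o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)            (output gate)
h_t = o_t \odot \tanh(c_t)                              (hidden state / output)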

If you are interested in implementing an LSTM, you may want to go through this blog.

Happy learning!
