A basic introduction to Long Short-Term Memory Networks

Meghna Asthana PhD MSc DIC · Published in Analytics Vidhya · 3 min read · Apr 7, 2020

“You never change things by fighting the existing reality.
To change something, build a new model that makes the existing model obsolete.”
Buckminster Fuller

In the previous article of the series, we talked about the drawbacks of RNNs, namely vanishing and exploding gradients. These can be overcome by Long Short-Term Memory (LSTM) networks.

LSTM networks are special RNN cells designed to learn the long-term dependencies that standard RNNs struggle to capture. Inspired by the ability of the biological brain to remember information over long time spans, the LSTM network attempts to incorporate this ability in an artificial network. To achieve this, it passes a cell state from one time step to the next, and the flow of information into and out of this cell state is controlled by four gates.

Outline of LSTM architecture [1]

The first gate is the Forget Gate, which decides how much of the old information is kept for the next step. A sigmoid activation is applied to the current input and the previous hidden state, and the result is multiplied element-wise with the Old Cell State.

Mathematical representation of Forget Gate f
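
In the usual notation, with x_t the current input, h_{t-1} the previous hidden state, σ the sigmoid function, and W_f and b_f the gate's weight matrix and bias, the forget gate takes the standard form:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)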

The Input Gate and the New Candidate gate collectively determine the information that will be added to the cell state for the next step. The Input Gate (a sigmoid function) is multiplied element-wise with the New Candidate gate (a tanh function) to create a temporary tensor, the Scaled new candidate. Because the cell state is updated additively rather than through repeated multiplication, this approach helps mitigate the vanishing gradient problem.

Mathematical representation of Input Gate i and New Candidate Gate g
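
Using the same notation, with W_i, W_g, b_i and b_g the corresponding weights and biases, the input gate and new candidate take the standard form:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
g_t = tanh(W_g · [h_{t-1}, x_t] + b_g)

and the Scaled new candidate is their element-wise product, i_t ⊙ g_t.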

The New Cell State is calculated by summing the Scaled new candidate and the Scaled Old Cell State, so that the important features in the data are carried forward. The final gate is the Output Gate, which is again a sigmoid function. To generate the output, the New Cell State is passed through a tanh function and multiplied element-wise with the Output Gate to obtain the Hidden State. This can then be passed through a further non-linearity, such as a sigmoid or softmax layer, to generate the final output [2].

Mathematical representation of Output Gate o
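
With the same notation, the output gate, the New Cell State and the Hidden State take the standard form:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)

Putting the four gates together, a minimal NumPy sketch of a single LSTM cell step might look like the code below. The stacked weight layout, the helper names and the random parameters are assumptions made for this illustration, not taken from the references.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step (illustrative sketch).

    W has shape (4 * hidden, hidden + inputs) and b has shape (4 * hidden,),
    stacking the forget, input, candidate and output parameters.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b   # pre-activations for all four gates
    f = sigmoid(z[0 * hidden:1 * hidden])       # forget gate
    i = sigmoid(z[1 * hidden:2 * hidden])       # input gate
    g = np.tanh(z[2 * hidden:3 * hidden])       # new candidate
    o = sigmoid(z[3 * hidden:4 * hidden])       # output gate
    c_new = f * c_prev + i * g                  # new cell state
    h_new = o * np.tanh(c_new)                  # hidden state
    return h_new, c_new

# Usage with random parameters: 3 input features, 5 hidden units
rng = np.random.default_rng(0)
x_t = rng.standard_normal(3)
h_prev, c_prev = np.zeros(5), np.zeros(5)
W = rng.standard_normal((4 * 5, 5 + 3)) * 0.1
b = np.zeros(4 * 5)
h_t, c_t = lstm_step(x_t, h_prev, c_prev, W, b)
```

Stacking the four gates into a single matrix multiplication, as above, is a common implementation convenience; it computes the same four equations as writing each gate separately.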

This concludes our series on Artificial Neural Networks. If you have made it this far, thank you for supporting my work. I will be writing content in other niches too, so stay tuned!

[1] Kostadinov, S., 2018. Recurrent Neural Networks with Python Quick Start Guide, 1st ed. Packt Publishing.

[2] Julian, D., 2018. Deep Learning with PyTorch Quick Start Guide: Learn to Train and Deploy Neural Network Models in Python. Birmingham: Packt Publishing.
