LSTM Made Easy

Shubham Deshmukh
3 min read · Jul 28, 2020

LSTM is one of the most interesting architectures in the Deep Learning field. It enables us to learn long-range dependencies in sequences.

In this post, we will go through the different gates involved in the LSTM, the overall architecture, and an implementation of the LSTM from scratch using PyTorch.

An LSTM makes small modifications to the information passed to it via multiplications and additions. Information flows through a mechanism known as the cell state. In this way, LSTMs can selectively remember or forget things far more effectively than vanilla RNNs and HMMs.

Notation used in this post: x_t is the input at time step t, h_t is the hidden state, C_t is the cell state, f_t, i_t, and o_t are the forget, input, and output gate activations, W and b are each gate's weight matrix and bias vector, and σ denotes the sigmoid function.

The very first step is to decide what information to throw away from the cell state. This decision is made by a sigmoid layer called the forget gate layer.

Forget gate (source: https://medium.com/turing-talks/turing-talks-27-modelos-de-predi%C3%A7%C3%A3o-lstm-df85d87ad210)
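Written out, with [h_{t-1}, x_t] denoting the concatenation of the previous hidden state and the current input, the forget gate is:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

Each component of f_t lies between 0 ("completely forget this part of the cell state") and 1 ("completely keep it").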

In the second step, we decide what new information to store in the cell state. First, a sigmoid layer called the input gate layer decides which values to update. Next, a tanh layer creates a vector of new candidate values, C̃_t, that could be added to the state. Lastly, we combine these two to create an update to the cell state.

Input gate (source: https://medium.com/turing-talks/turing-talks-27-modelos-de-predi%C3%A7%C3%A3o-lstm-df85d87ad210)
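In equations, the input gate and the candidate values are:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)

and the cell state update combines them with the forget gate:

C_t = f_t * C_{t-1} + i_t * C̃_t

where * is element-wise multiplication.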

Finally, we decide what to output. A sigmoid layer (the output gate) decides which parts of the cell state we are going to expose. We then pass the cell state through tanh and multiply it by the output gate's activation, so that we only output the parts we decided to.

Output gate (source: https://medium.com/turing-talks/turing-talks-27-modelos-de-predi%C3%A7%C3%A3o-lstm-df85d87ad210)
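In equations:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)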

The final equations before coding, collected in one place:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
C_t = f_t * C_{t-1} + i_t * C̃_t
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)

Let's code: implementing an LSTM from scratch is indeed important for really understanding it 👨‍💻.
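Below is a minimal sketch of a single LSTM cell written from scratch in PyTorch, directly mirroring the equations above. The class and attribute names (LSTMCellScratch, W_f, and so on) are illustrative choices for this post, not a standard API.

import torch
import torch.nn as nn

class LSTMCellScratch(nn.Module):
    """A single LSTM cell, implemented from scratch to mirror the gate equations."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Each gate has its own weights and bias, acting on [h_{t-1}, x_t].
        self.W_f = nn.Linear(input_size + hidden_size, hidden_size)  # forget gate
        self.W_i = nn.Linear(input_size + hidden_size, hidden_size)  # input gate
        self.W_C = nn.Linear(input_size + hidden_size, hidden_size)  # candidate values
        self.W_o = nn.Linear(input_size + hidden_size, hidden_size)  # output gate

    def forward(self, x_t, h_prev, c_prev):
        combined = torch.cat([h_prev, x_t], dim=1)  # [h_{t-1}, x_t]
        f_t = torch.sigmoid(self.W_f(combined))     # what to forget
        i_t = torch.sigmoid(self.W_i(combined))     # what to update
        c_tilde = torch.tanh(self.W_C(combined))    # candidate cell values
        c_t = f_t * c_prev + i_t * c_tilde          # new cell state
        o_t = torch.sigmoid(self.W_o(combined))     # what to expose
        h_t = o_t * torch.tanh(c_t)                 # new hidden state
        return h_t, c_t

To process a sequence, we unroll the cell over the time steps, carrying (h, c) forward:

cell = LSTMCellScratch(input_size=10, hidden_size=20)
x = torch.randn(32, 5, 10)           # (batch, seq_len, features)
h = torch.zeros(32, 20)
c = torch.zeros(32, 20)
for t in range(x.size(1)):
    h, c = cell(x[:, t, :], h, c)    # h and c each have shape (32, 20)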

Conclusion

LSTMs have achieved remarkable results. The equations look quite intimidating at first, but going through them step by step makes them much more approachable. Despite being computationally intensive, LSTMs have been one of the favorite techniques for text and even time-series problems.

References: https://medium.com/turing-talks/turing-talks-27-modelos-de-predi%C3%A7%C3%A3o-lstm-df85d87ad210
