LSTM Made Easy
The LSTM (Long Short-Term Memory) is one of the most interesting architectures in deep learning. It enables us to learn from longer sequences.
In this post, we will go through the different gates involved in an LSTM, its architecture, and an implementation of an LSTM from scratch using PyTorch.
LSTMs make small modifications to the information passed to them, through multiplications and additions. The information flows through a mechanism known as the cell state. In this way, LSTMs can selectively remember or forget things far better than RNNs and HMMs can.
The very first step is to decide what information we have to throw away from the cell state. This decision is made by the forget gate layer, which applies a sigmoid function.
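In the usual notation, where $x_t$ is the current input, $h_{t-1}$ is the previous hidden state, and $W_f$, $b_f$ are the forget gate's learned weights and bias, this step is:

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
```

A value of $f_t$ near 0 means "forget this part of the cell state"; a value near 1 means "keep it".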
In the second step, we decide what new information to store in the cell state. This starts with the input gate layer, a sigmoid layer that decides which values to update. A tanh layer then produces candidate values that could be added to the state. Lastly, we combine these two to create the update to the cell state.
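In symbols (with $f_t$ being the forget gate's output from the previous step, and $\odot$ denoting element-wise multiplication), this gives:

```latex
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```

The cell update keeps the fraction of the old state that $f_t$ allows through, and adds the new candidate values scaled by how much $i_t$ wants to update each one.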
Finally, we apply a sigmoid layer that decides which parts of the cell state we will feed forward as output. The cell state is passed through tanh and multiplied by the output of this output gate layer, giving the new hidden state.
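In symbols (with $C_t$ the current cell state and $\odot$ element-wise multiplication):

```latex
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
```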
The final equations before coding:
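Collecting all the gate computations above into one place, the full per-time-step LSTM update is (standard notation: $\sigma$ is the sigmoid, $\odot$ element-wise multiplication, and each $W$, $b$ pair is a learned weight matrix and bias):

```latex
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
```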
Let’s code: starting from scratch is indeed important 👨‍💻.
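Here is a minimal sketch of an LSTM cell in PyTorch, following the gate equations above. The class name, sizes, and the trick of computing all four gate pre-activations with one `nn.Linear` are illustrative choices, not a fixed recipe:

```python
import torch
import torch.nn as nn

class LSTMCellScratch(nn.Module):
    """A minimal from-scratch LSTM cell (illustrative, not optimized)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # One linear layer produces all four gate pre-activations at once,
        # acting on the concatenation [h_{t-1}, x_t]
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, x_t, state):
        h_prev, c_prev = state
        z = self.gates(torch.cat([h_prev, x_t], dim=1))
        f, i, g, o = z.chunk(4, dim=1)
        f = torch.sigmoid(f)           # forget gate
        i = torch.sigmoid(i)           # input gate
        g = torch.tanh(g)              # candidate cell values
        o = torch.sigmoid(o)           # output gate
        c_t = f * c_prev + i * g       # cell-state update
        h_t = o * torch.tanh(c_t)      # new hidden state
        return h_t, c_t

# Usage: run a toy sequence (batch of 2, input size 3, hidden size 5)
cell = LSTMCellScratch(input_size=3, hidden_size=5)
h = torch.zeros(2, 5)
c = torch.zeros(2, 5)
for t in range(4):                     # sequence length 4
    x_t = torch.randn(2, 3)
    h, c = cell(x_t, (h, c))
```

The loop over time steps is what makes it an RNN: the same cell is applied at every step, carrying `(h, c)` forward.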
Conclusion
LSTMs have achieved remarkable results. The equations look quite intimidating, but the step-by-step process makes them much more approachable. Despite being computationally intensive, LSTMs remain one of the favorite techniques for text and even time-series data.