Recurrent / LSTM layers explained in a simple way

Part of a series about different types of layers in neural networks

Assaad MOAWAD · Published in DataThings · Dec 4, 2019

This post is meant to be read after:

Introduction

For all the previously introduced layers, the same input always produces the same output, no matter how many times we repeat it. For instance, take a linear layer with f(x) = 2x. Each time we ask it to predict f(3), we get 6. So if we ask ten times in a row for the output when the input is 3, the network will always answer 6:

f(3)=6; f(3)=6; f(3)=6; f(3)=6; f(3)=6; …
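
A minimal sketch of this statelessness, in plain Python (the function name f mirrors the example above; nothing here is from a real library):

```python
# A stateless (feed-forward) layer: the output depends only on the input.
def f(x):
    return 2 * x

# Repeating the same input always yields the same output.
print([f(3) for _ in range(5)])  # [6, 6, 6, 6, 6]
```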

Now imagine we are training an algorithm to detect repetitions: we want f(3) = 0 the first time (no repetition detected), then f(3) = 1 the second time. We can’t achieve this behavior with non-recurrent layers, since by definition they always return the same output for the same input. A hacky workaround is to feed in a vector of 2 variables, so the layer can treat the first variable differently from the second: f([3; 0]) = 0 (no repetition detected) but f([3; 3]) = 1 (repetition detected). The downside of this hack is that it only works on a predefined, fixed sequence length.
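
Here is a sketch of that fixed-window hack, assuming the first element of the vector is the current input and the second is the previous one (an illustrative choice, not from the original post):

```python
# Fixed-window "hack": encode the current and previous inputs as a vector.
def f(window):
    current, previous = window
    return 1 if current == previous else 0

print(f([3, 0]))  # 0 -> no repetition detected
print(f([3, 3]))  # 1 -> repetition detected
```

The limitation is visible in the signature: the window has exactly two slots, so the function can never reason about longer (or unbounded) histories.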

Solution

To solve the problem introduced above, recurrent layers were invented. They are a family of layers that maintain an internal state. In the simplest form, a recurrent layer can be written as:

H(t) = f(H(t-1), X(t))
Y(t) = g(H(t))

where:

  • H is a hidden internal state that usually starts at 0.
  • f is a function that updates the internal state between sequence steps.
  • g is another function that uses the current internal state to calculate the output.
  • After each input X, H is updated using f, and then the output Y of the recurrent network is generated from this updated state using g.
  • So when we send the same input X = 3 several times, we may get different outputs Y, because the internal state changes after each step (see the sketch just below).
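
To make this concrete, here is a minimal hand-written recurrent cell in Python that solves the repetition-detection task, following the H(t) = f(H(t-1), X(t)), Y(t) = g(H(t)) scheme above. The state representation (a pair of previous input and repetition flag) and the helper names are illustrative assumptions, not a trained network:

```python
# A minimal hand-written recurrent cell (not a learned layer) that detects
# repetitions, following H(t) = f(H(t-1), X(t)) and Y(t) = g(H(t)).
def f(h, x):
    previous, _ = h
    # The new state stores the current input and whether it repeated the old one.
    return (x, previous is not None and x == previous)

def g(h):
    _, repeated = h
    return 1 if repeated else 0

def run(inputs):
    h = (None, False)  # H starts "empty" (the post uses 0)
    outputs = []
    for x in inputs:
        h = f(h, x)           # update the internal state with f
        outputs.append(g(h))  # generate the output from the updated state with g
    return outputs

print(run([3, 3, 3, 5, 5]))  # [0, 1, 1, 0, 1]
```

Unlike the fixed-window hack, this cell works on sequences of any length, because the history it needs is carried in the state H rather than in the shape of the input.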

