Recurrent / LSTM layers explained in a simple way

Part of a series about the different types of layers in neural networks

Assaad MOAWAD
Dec 4, 2019 · 4 min read


Introduction

For all the previously introduced layers, the same output is generated if we repeat the same input several times. For instance, if we have a linear layer f(x) = 2x, each time we ask it to predict f(3) we will get 6. So if we ask ten times in a row for the output when the input is 3, the network will always give 6:

f(3)=6; f(3)=6; f(3)=6; f(3)=6; f(3)=6; …

Now imagine we are training an algorithm to detect repetitions: we want f(3) = 0 the first time (no repetition detected), and f(3) = 1 the second time. We can’t achieve this behavior with non-recurrent layers, since by definition they always produce the same output for the same input. One workaround is to take a vector of two variables, so we can treat the first variable differently from the second: f([3; 0]) = 0 (no repetition detected) but f([3; 3]) = 1 (repetition detected). The downside of this hack is that it only works on a predefined, fixed sequence length.
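To make the limitation concrete, here is a minimal Python sketch (the function names are illustrative, not from the original post): a stateless layer cannot tell a first occurrence from a repeat, and the fixed-width workaround only handles sequences of exactly two steps.

```python
def stateless_layer(x):
    # A stateless layer: same input, same output, always.
    return 2 * x

print(stateless_layer(3))  # 6
print(stateless_layer(3))  # 6 again; no way to signal "repetition"

def detect_repetition_fixed(pair):
    # The fixed-length hack: compare the two entries of a 2-vector.
    # Works only for sequences of exactly length 2.
    first, second = pair
    return 1 if first == second else 0

print(detect_repetition_fixed([3, 0]))  # 0: no repetition
print(detect_repetition_fixed([3, 3]))  # 1: repetition detected
```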

Solution

In order to solve this problem, recurrent layers were invented. They are a family of layers that contain an internal state. In the simplest form, we can write a recurrent layer in the following way:

H(t) = f(H(t-1), X(t))
Y(t) = g(H(t))
  • H is a hidden internal state that usually starts at 0.
  • f is a function that updates the internal state between sequence steps.
  • g is another function that uses the current internal state to calculate the output.
  • After each input X, H is updated using f; then the output Y of the recurrent network is generated from this updated state using g.
  • So when we send the same input X = 3 several times, we might get different outputs Y, because the internal state changes after each step (see the sketch right after this list).
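Here is a minimal Python sketch of this update rule (the class and method names are illustrative assumptions, not from the post):

```python
class RecurrentCell:
    """A bare-bones recurrent cell: H(t) = f(H(t-1), X(t)), Y(t) = g(H(t))."""

    def __init__(self, f, g, h0=0):
        self.f = f      # state-update function
        self.g = g      # output function
        self.h = h0     # hidden internal state, usually starting at 0
        self._h0 = h0

    def step(self, x):
        # Update the state from the previous state and the new input,
        # then produce the output from the updated state.
        self.h = self.f(self.h, x)
        return self.g(self.h)

    def reset(self):
        # Resetting the state makes the cell forget all past inputs.
        self.h = self._h0
```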

As an illustration, consider the following simple choice of f and g:

H(t) = 2*H(t-1) + X(t)
Y(t) = -H(t) + 5
  • First we start with H = 0.
  • After the first X = 3, we get H = 2*0 + 3 = 3 and output Y = -3 + 5 = 2.
  • After the second X = 3, we get H = 2*3 + 3 = 9 and Y = -9 + 5 = -4.
  • After the third X = 3, we get H = 2*9 + 3 = 21 and Y = -21 + 5 = -16.
  • If we reset H = 0 and then ask for X = 3 again, we get H = 3 and Y = 2 once more.

And so on: we can easily see how the same repeated input yields different outputs. At any moment we can reset the internal state (H = 0), and the same output sequence will be generated again.
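Using the RecurrentCell sketch from above (again, illustrative code, not from the post), we can reproduce this trace:

```python
cell = RecurrentCell(f=lambda h, x: 2 * h + x,  # H(t) = 2*H(t-1) + X(t)
                     g=lambda h: -h + 5)        # Y(t) = -H(t) + 5

# Same input three times, three different outputs.
print([cell.step(3) for _ in range(3)])  # [2, -4, -16]

cell.reset()          # forget the history
print(cell.step(3))   # 2 again, just like the first step
```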

Basically, recurrent networks behave like non-recurrent ones if we reset the internal state after each step (sequence length = 1).

Properties

For recurrent networks, the same input sequence produces the same output sequence (the internal state is reset after each sequence, not after each input). For non-recurrent networks, the same input produces the same output.

Recurrent layers are very useful for everything related to sequences, where a single element matters less on its own than through its position in the sequence.

Consider text processing: if we see the letters H, O, U, S and want to predict the next letter, we will predict E. If we see the letters L, I, S, we will predict T. Both sequences have S as the last known letter before the prediction, but the history of the sequence matters more for the prediction than the last letter alone. This history is encoded in the internal state H, so when the letter S finally arrives, the two different histories (internal states) let us generate E as the next letter in the first case and T in the second.
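A hedged sketch of this idea with PyTorch’s nn.LSTM (the one-hot encoding and model sizes here are our own illustrative assumptions, not the post’s code): after feeding "HOUS" and "LIS" through the same LSTM, the final hidden states differ even though both sequences end in S, so a next-letter classifier built on that state can predict different letters.

```python
import torch
import torch.nn as nn

def encode(word):
    # Illustrative one-hot encoding over the 26 uppercase letters.
    x = torch.zeros(len(word), 1, 26)  # (seq_len, batch=1, input_size)
    for t, ch in enumerate(word):
        x[t, 0, ord(ch) - ord('A')] = 1.0
    return x

lstm = nn.LSTM(input_size=26, hidden_size=16)

# Feed both sequences through the same LSTM.
_, (h_house, _) = lstm(encode("HOUS"))
_, (h_lis, _) = lstm(encode("LIS"))

# Both sequences end with 'S', but their hidden states differ,
# because the state encodes the whole history of the sequence.
print(torch.allclose(h_house, h_lis))  # False (with overwhelming probability)
```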
