ABCs of Recurrent Neural Networks
RNNs are designed to learn from sequences of data by passing the hidden state from one step in the sequence to the next, combined with the current input. LSTMs are an improvement over plain RNNs, and are used when the network needs to balance remembering recent things against things from a long time ago.
Suppose there is a neural network that recognizes images, and the following image is fed to it. The network guesses that the image is most likely a dog with 80% probability, a wolf with 15% probability, and a goldfish with 5% probability.
But what if the image is actually a wolf? How would the network know?
Let us assume the images come from a show: the image before the wolf was a bear, and the one before that was a fox. In this case, that history is a hint that the last image is a wolf and not a dog. Each image is analysed by the same copy of the neural network, but the output of each copy is used as part of the input to the next one, which improves the accuracy.
Mathematically speaking, the previous output (the hidden state) and the current input vector are combined in a linear function, and the result is then squished by an activation function such as tanh. In this way the previous information is carried forward: the network effectively learns that the show is about wild animals, and that context is used to correctly predict that the image is a wolf and not a dog. And this is how a recurrent neural network works.
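As a minimal sketch of that single step in NumPy (the names rnn_step, W_x, W_h, b and the toy dimensions are illustrative choices, not from the original):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One RNN step: combine the input and the previous hidden state
    linearly, then squish with tanh to get the new hidden state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Toy dimensions: 4-dimensional input features, 3-dimensional hidden state.
rng = np.random.default_rng(0)
W_x = rng.normal(size=(3, 4))    # input-to-hidden weights
W_h = rng.normal(size=(3, 3))    # hidden-to-hidden weights
b = np.zeros(3)

h = np.zeros(3)                  # the "memory" starts out empty
x = rng.normal(size=4)           # stand-in features for one image
h = rnn_step(x, h, W_x, W_h, b)  # h now carries context into the next step
```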
But wait, it has a shortcoming.
Assume the bear appeared a while ago, and the two most recent images are a sheep and a rabbit. Based on those two alone, the network cannot tell whether the new image is a dog or a wolf, since sheep and rabbits are associated with both domestic and wild settings. So the information about being in the forest has to come all the way back from the bear.
But as information travels through the network it is repeatedly squished by sigmoid functions, and, even worse, training the network with backpropagation all the way back causes problems such as the vanishing gradient. By that point most of the information about the bear has been lost, because what the network stores is effectively short-term memory. This is where LSTMs (Long Short-Term Memory networks) play an important role.
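A quick numerical sketch of why the gradient vanishes: the derivative of the sigmoid is at most 0.25, so backpropagating through many steps multiplies many small factors together (the 20-step count below is just an illustrative choice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

grad = 1.0
for step in range(20):                    # 20 time steps back to the bear
    z = 0.0                               # z = 0 is the *best* case: the
    grad *= sigmoid(z) * (1 - sigmoid(z)) # sigmoid derivative peaks at 0.25
print(grad)                               # ~9.1e-13: the bear's signal is
                                          # effectively gone
```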
Summarizing, how an RNN works (a small code sketch follows the list):
Memory comes in and is combined with the current event.
The output comes out as a prediction of what the input is.
The output is also passed on as part of the input for the next iteration of the neural network.
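Putting those three steps together as a self-contained sketch (the feature vectors, the two-class dog/wolf readout, and all names are hypothetical stand-ins for illustration):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), np.zeros(3)
W_out = rng.normal(size=(2, 3))                  # hidden -> (dog, wolf) scores

images = [rng.normal(size=4) for _ in range(5)]  # stand-in image features
h = np.zeros(3)                                  # empty memory to start
for x in images:
    h = rnn_step(x, h, W_x, W_h, b)              # 1. memory combines with the event
    scores = W_out @ h
    probs = np.exp(scores) / np.exp(scores).sum()  # 2. output is a prediction
    # 3. h is carried forward as part of the next iteration's input
print(probs)                                     # prediction after the last image
```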
References:
Udacity