3 mins to understand LSTM — Long Short Term Memory

Oxygencube
2 min read · May 16, 2023


LSTM in 15 words

A recurrent neural network architecture used in conjunction with a gradient-based learning algorithm (i.e. optimization).

Why did I write this?

With the success of ChatGPT, many researchers and prompt engineers are discussing how to use it, get involved in personal growth, and respond to the revolution in the job market. A characteristic of neural networks is that they are a black box (like algorithmic trading in finance, or the link-analysis method introduced by Sergey Brin and Larry Page in 1998 (link)), and they are now evolving dramatically under Moore's law. I wrote this up to help myself revise the fundamentals of neural networks.

Main Components of LSTM

Assume a single-layer recurrent neural network, which has:

  • Input fed into the node (Input Gate)
  • A processing node with a kernel; take logistic regression as an example of a gradient-based learning method (well-known activation choices include sigmoid and softmax, and the Kalman filter is a related classic solution)
  • An output that sums the loss and multiplies the reward at each node (Output Gate)
  • The output fed back as input for the next step of the loop
  • A time lag implemented as a forget gate, which counteracts the vanishing gradient problem (Forget Gate); a minimal code sketch of these gates follows the figure below
(Figure: LSTM architecture diagram. Source: Udacity)
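To make the gates concrete, here is a minimal sketch of a single LSTM time step in NumPy. This is an illustration, not the original paper's notation: the function name lstm_step, the stacked weight layout, and the random toy weights are all my own assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold parameters for the four
    gates stacked in order: [input, forget, output, candidate]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # pre-activations for all four gates
    i = sigmoid(z[0*H:1*H])         # input gate: how much new info to write
    f = sigmoid(z[1*H:2*H])         # forget gate: how much old state to keep
    o = sigmoid(z[2*H:3*H])         # output gate: how much state to expose
    g = np.tanh(z[3*H:4*H])         # candidate cell update
    c = f * c_prev + i * g          # new cell state (the "memory")
    h = o * np.tanh(c)              # new hidden state, fed back into the loop
    return h, c

# Toy usage with random weights (illustration only)
rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
W = rng.normal(size=(4 * hidden_dim, input_dim))
U = rng.normal(size=(4 * hidden_dim, hidden_dim))
b = np.zeros(4 * hidden_dim)

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for t in range(5):                  # unroll over a short sequence
    x_t = rng.normal(size=input_dim)
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h)
```

Note how the forget gate f multiplies the previous cell state: because the cell state passes from step to step through this gated addition rather than through repeated matrix multiplications, gradients can survive over long time lags, which is exactly the vanishing gradient fix mentioned above.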

Up to this point, learning flows in a single direction (from left to right). In the next section, I will explore how bidirectional LSTMs work. (Note: ChatGPT itself is built on the Transformer architecture rather than LSTMs, but bidirectional context modeling is a related foundational idea.)
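As a preview, a bidirectional LSTM simply runs one pass left-to-right and a second, separately parameterized pass right-to-left, then combines the hidden states at each position. A hedged sketch, reusing the hypothetical lstm_step and np import from the block above:

```python
def bidirectional_lstm(xs, params_fwd, params_bwd, hidden_dim):
    """Run forward and backward passes over the sequence xs (a list of
    input vectors) and concatenate the hidden states at each step.
    Reuses numpy (np) and lstm_step from the sketch above."""
    def run(seq, params):
        W, U, b = params
        h = np.zeros(hidden_dim)
        c = np.zeros(hidden_dim)
        hs = []
        for x in seq:
            h, c = lstm_step(x, h, c, W, U, b)
            hs.append(h)
        return hs

    hs_fwd = run(xs, params_fwd)                # left to right
    hs_bwd = run(xs[::-1], params_bwd)[::-1]    # right to left, re-aligned
    return [np.concatenate([f, b]) for f, b in zip(hs_fwd, hs_bwd)]
```

Each output vector then carries context from both the past and the future of the sequence, which is what makes the bidirectional variant useful for tasks like tagging and classification.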

Reference

All credit to Sepp Hochreiter and Jürgen Schmidhuber: Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780.

Stats.StackExchange

More details are in the journal article here. If you are already familiar with neural networks, you can jump to Section 4 directly.
