3 mins to understand LSTM — Long Short Term Memory

Oxygencube
2 min read · May 16, 2023


LSTM in 15 words

A recurrent neural network architecture used in conjunction with a gradient-based learning algorithm (i.e. optimization).

Why did I write this?

With the success of ChatGPT, many researchers and prompt engineers are discussing how to use it, get involved in personal growth, and respond to the revolution in the job market. A characteristic of neural networks is that they are a black box (like algorithmic trading in finance, or the link-analysis method introduced by Sergey Brin and Larry Page in 1998 (link)), and they are now evolving dramatically under Moore's law. I wrote this up to help myself revise the fundamentals of neural networks.

Main Components of LSTM

Assume a single-layer recurrent neural network, which has:

  • Input fed into the node (Input Gate)
  • A processing node with a kernel; take logistic regression as an example of a gradient-based learning method (well-known activation choices include sigmoid and softmax, and the Kalman filter is a related classic solution)
  • An output that sums the loss and multiplies the reward at each node (Output Gate)
  • The output fed back as input for the next step of the loop
  • A time lag implemented as a forget gate, which counteracts the vanishing gradient problem (Forget Gate); a minimal code sketch of these gates follows the figure below
(Figure: LSTM architecture diagram. Source: Udacity)
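To make the gates concrete, here is a minimal sketch of a single LSTM time step in NumPy. This is an illustration, not the original paper's notation: the function name lstm_step, the stacked weight layout, and the random toy weights are all my own assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold parameters for the four
    gates stacked in order: [input, forget, output, candidate]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # pre-activations for all four gates
    i = sigmoid(z[0*H:1*H])         # input gate: how much new info to write
    f = sigmoid(z[1*H:2*H])         # forget gate: how much old state to keep
    o = sigmoid(z[2*H:3*H])         # output gate: how much state to expose
    g = np.tanh(z[3*H:4*H])         # candidate cell update
    c = f * c_prev + i * g          # new cell state (the "memory")
    h = o * np.tanh(c)              # new hidden state, fed back into the loop
    return h, c

# Toy usage with random weights (illustration only)
rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
W = rng.normal(size=(4 * hidden_dim, input_dim))
U = rng.normal(size=(4 * hidden_dim, hidden_dim))
b = np.zeros(4 * hidden_dim)

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for t in range(5):                  # unroll over a short sequence
    x_t = rng.normal(size=input_dim)
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h)
```

Note how the forget gate f multiplies the previous cell state: because the cell state passes from step to step through this gated addition rather than through repeated matrix multiplications, gradients can survive over long time lags, which is exactly the vanishing gradient fix mentioned above.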

Up to this point, learning flows in a single direction (from left to right). In the next section, I will explore how bidirectional LSTMs work. (Note: ChatGPT itself is built on the Transformer architecture rather than LSTMs, but bidirectional context modeling is a related foundational idea.)
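As a preview, a bidirectional LSTM simply runs one pass left-to-right and a second, separately parameterized pass right-to-left, then combines the hidden states at each position. A hedged sketch, reusing the hypothetical lstm_step and np import from the block above:

```python
def bidirectional_lstm(xs, params_fwd, params_bwd, hidden_dim):
    """Run forward and backward passes over the sequence xs (a list of
    input vectors) and concatenate the hidden states at each step.
    Reuses numpy (np) and lstm_step from the sketch above."""
    def run(seq, params):
        W, U, b = params
        h = np.zeros(hidden_dim)
        c = np.zeros(hidden_dim)
        hs = []
        for x in seq:
            h, c = lstm_step(x, h, c, W, U, b)
            hs.append(h)
        return hs

    hs_fwd = run(xs, params_fwd)                # left to right
    hs_bwd = run(xs[::-1], params_bwd)[::-1]    # right to left, re-aligned
    return [np.concatenate([f, b]) for f, b in zip(hs_fwd, hs_bwd)]
```

Each output vector then carries context from both the past and the future of the sequence, which is what makes the bidirectional variant useful for tasks like tagging and classification.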

Reference

All credit to Sepp Hochreiter and Jürgen Schmidhuber: Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780.

Stats.StackExchange

More details are in the journal article here. If you are already familiar with neural networks, you can jump to Section 4 directly.
