What is LSTM in AI? Explained in Everyday Language for AI Beginners

A. Zhang
AI for Absolute Beginners
7 min read · May 22, 2024


“Which is better, transformer or LSTM?”

This is a popular question in the new era of LLMs and AI. When I first entered the industry, I would Google "LSTM" every time I came across it, just to make sure I understood it correctly and had the letters in the right order… I'll introduce the transformer in a future blog, but let's talk about LSTM first.

First things first: LSTM stands for Long Short-Term Memory.

Keep in mind, it is L-S-T-M, not LTSM or LMTS. ;)

What is LSTM?

LSTM is a type of neural network used in AI, particularly useful for tasks that involve sequences of data, like speech or text.

Unlike the transformer, which was born in 2017, LSTMs are not new; they were first introduced way back in 1997 by Sepp Hochreiter and Jürgen Schmidhuber. LSTMs were designed to solve the vanishing gradient problem commonly encountered with traditional recurrent neural networks (RNNs). By incorporating a system of gates (forget, input, and output gates), LSTMs can maintain information over long stretches of a sequence, making them effective for tasks such as time-series prediction, speech recognition, and language modeling.
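To make the gates concrete, here is a minimal NumPy sketch of a single LSTM time step. The function name `lstm_step` and the packed weight layout are my own choices for illustration, not a standard API; real frameworks (e.g. PyTorch's `nn.LSTM`) handle all of this for you.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. x: input (n_in,); h_prev, c_prev: previous hidden
    and cell states (n_hidden,); W: packed weights of shape
    (n_in + n_hidden, 4 * n_hidden); b: biases (4 * n_hidden,)."""
    n_hidden = h_prev.shape[0]
    # One matrix multiply produces pre-activations for all four gates.
    z = np.concatenate([x, h_prev]) @ W + b
    f = sigmoid(z[:n_hidden])                  # forget gate: what to keep from c_prev
    i = sigmoid(z[n_hidden:2 * n_hidden])      # input gate: how much new info to write
    o = sigmoid(z[2 * n_hidden:3 * n_hidden])  # output gate: what to expose as h
    g = np.tanh(z[3 * n_hidden:])              # candidate cell update
    c = f * c_prev + i * g                     # new cell state (the long-term memory)
    h = o * np.tanh(c)                         # new hidden state (the short-term output)
    return h, c
```

Because the cell state `c` is updated additively (`f * c_prev + i * g`) rather than being squashed through an activation at every step, gradients can flow through many time steps without vanishing — which is exactly the problem LSTMs were designed to fix.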
