LSTM Unboxed: A Layman’s Journey into Intelligent Sequence Prediction

BeginnerCoder21
7 min read · Dec 10, 2023


Image from: https://neurosciencenews.com/files/2023/07/color-visual-stimuli-neurosinces.jpg

Here is another useful machine learning technique that people often find difficult to understand. In the vast realm of machine learning, there exists an intriguing method that goes by the name Long Short-Term Memory (LSTM). While the name might sound like a secret code from a spy movie, fear not! We’re here to demystify LSTM in a way that even your grandma could understand. So, buckle up as we embark on a journey through the fascinating world of LSTMs.

Let’s take predicting the weather as our real-life example. Meteorologists use historical weather data to forecast future conditions. In the realm of LSTMs, the algorithm plays the role of a weather guru who not only considers past data but also factors in the long-term patterns (like seasons) and short-term fluctuations (like sudden rain showers). Imagine you’re planning a weekend getaway. You’d want accurate weather predictions, right? LSTMs, with their ability to capture both short and long-term patterns, help meteorologists provide you with more reliable forecasts.

Breaking the Jargon

Long-Term Memory: Imagine you’re teaching your pet parrot to speak. The parrot learns phrases over time and retains them for a long duration. That’s the long-term memory in action.

Short-Term Memory: Now, consider the short-term memory as your goldfish. It can remember things for a very brief period — just a few seconds. Forgetful, yet adorable!

What is LSTM?

  • Stands for Long Short-Term Memory
  • It’s not exactly an algorithm on its own but rather a type of recurrent neural network (RNN) architecture — a special kind of algorithm used in machine learning.
  • LSTMs have a mechanism that allows them to selectively remember or forget information.
  • This is useful for tasks that involve recognizing patterns over different time scales (a small code sketch follows this list).
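
To make that concrete, here is a minimal sketch (my own illustration, assuming TensorFlow/Keras is installed; the shapes and unit counts are arbitrary choices) of an LSTM used as one layer inside a network, which is exactly how it shows up in practice:

# An LSTM is a layer inside a neural network, not a standalone algorithm
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(units=32, input_shape=(10, 1)))  # 10 time steps, 1 feature each
model.add(Dense(units=1))                       # one predicted value per sequence
model.compile(optimizer="adam", loss="mean_squared_error")

dummy_batch = np.random.rand(4, 10, 1)              # 4 toy sequences
print(model.predict(dummy_batch, verbose=0).shape)  # (4, 1)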

Real life example: Smart Autocorrect on Your Phone

Imagine you’re texting your friend on your smartphone. You start typing a sentence, and suddenly, you make a typo. Now, your phone’s autocorrect feature kicks in to fix it. Have you ever noticed how autocorrect not only corrects the mistake you just made but also seems to understand the context of what you’re typing?

Now, here LSTM comes into play:

  1. Learning from the Past: You’ve been texting for a while, and your phone has observed your typing habits. It’s like having a friend who knows your writing style and the kind of words you use.
  2. Remembering Context: Let’s say you type, “I’ll meet you at the resturant.” Oops! You misspelled “restaurant.” Now, a basic autocorrect might just fix that one word. But an LSTM-powered autocorrect is smarter.
  3. Considering Long-Term Context: LSTM doesn’t just fix the typo; it remembers the context. It understands that in your previous messages, you always use the correct spelling for “restaurant.” So, it predicts that you probably meant “restaurant” and not “resturant” based on your long-term writing style.
  4. Predicting the Next Word: Beyond just fixing typos, LSTM can even predict the next word you’re likely to type. It considers the words you’ve used before, not just in the current sentence but throughout your conversation (a toy sketch of this idea follows the list).
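
To make the autocorrect story concrete, here is a toy sketch (my own illustration, not a production autocorrect; the three-sentence "texting history" is made up) of an LSTM that learns to predict the next word from a handful of example messages:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# A tiny made-up corpus standing in for your texting history
corpus = [
    "i will meet you at the restaurant",
    "see you at the restaurant tonight",
    "meet me at the restaurant",
]

# Build a simple word index by hand (0 is reserved for padding)
words = sorted({w for line in corpus for w in line.split()})
word_to_id = {w: i + 1 for i, w in enumerate(words)}
id_to_word = {i: w for w, i in word_to_id.items()}
vocab_size = len(words) + 1

# Turn each sentence into (prefix -> next word) training pairs
X, y = [], []
for line in corpus:
    ids = [word_to_id[w] for w in line.split()]
    for i in range(1, len(ids)):
        X.append(ids[:i])
        y.append(ids[i])

# Left-pad the prefixes so they all have the same length
max_len = max(len(s) for s in X)
X = np.array([[0] * (max_len - len(s)) + s for s in X])
y = np.array(y)

model = Sequential()
model.add(Embedding(vocab_size, 8))                 # word ids -> dense vectors
model.add(LSTM(16))                                 # reads the prefix in order
model.add(Dense(vocab_size, activation="softmax"))  # score for each next word
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=300, verbose=0)

# Ask the model what most likely follows "at the"
prefix = [word_to_id[w] for w in "at the".split()]
prefix = np.array([[0] * (max_len - len(prefix)) + prefix])
next_id = int(model.predict(prefix, verbose=0).argmax())
print(id_to_word.get(next_id, "?"))  # expected: "restaurant"

After enough epochs on this tiny corpus, the model almost always completes “at the” with “restaurant”, because that continuation dominates its (very short) long-term history.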

Working of LSTM

Let us suppose that we have a stock dataset containing daily price details. Here is how an LSTM would typically be applied to it (a short preprocessing sketch follows the list):

  1. The first step is to gather historical stock data. This includes information such as opening prices, closing prices, high and low prices, and trading volumes over a specific period.
  2. LSTM is particularly effective in handling time series data, which is a sequence of data points ordered by time. In the case of stock data, each day’s stock information becomes a data point in the sequence.
  3. Before feeding the data into the LSTM model, it needs to be preprocessed. This involves normalizing the data to a scale that the algorithm can work with and dividing it into training and testing sets.
  4. The LSTM model consists of layers that allow it to learn patterns and relationships within the time series data. It has input gates, forget gates, and output gates that control the flow of information and help the model remember or forget certain aspects.
  5. During training, the LSTM model learns from the historical data. It adjusts its parameters to capture patterns, trends, and dependencies in the stock prices. The model’s ability to remember important information for a long time and discard less relevant details is crucial for predicting stock movements.
  6. Once trained, the LSTM model can be used to make predictions on new, unseen data. For stock forecasting, it takes in the historical stock prices and uses the learned patterns to predict future stock prices or trends.
  7. The accuracy of the model’s predictions is evaluated using the testing set. Fine-tuning may be necessary to improve performance, adjusting parameters or considering additional factors that could influence stock prices.
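
As a minimal sketch of steps 1-3 above (the column name and the random-walk prices are made-up placeholders, not a real dataset):

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# A hypothetical stock DataFrame with a fake closing-price random walk
df_stock = pd.DataFrame({"Close": np.cumsum(np.random.randn(500)) + 100})

# Normalize prices to the [0, 1] range the network trains well on
scaler = MinMaxScaler(feature_range=(0, 1))
prices = scaler.fit_transform(df_stock[["Close"]].values)

# Split chronologically, so the test set plays the role of "the future"
split = int(len(prices) * 0.8)
train, test = prices[:split], prices[split:]
print(train.shape, test.shape)  # (400, 1) (100, 1)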

Architecture of LSTM

Image from: https://miro.medium.com/v2/resize:fit:1156/1*laH0_xXEkFE0lKJu54gkFQ.png

The LSTM architecture involves several components that work together to capture and learn patterns in sequential data.

Cell State

  • Imagine a conveyor belt that runs along the entire length of the LSTM. This conveyor belt is the cell state, which acts as a long-term memory. It carries information from the beginning to the end of the sequence, allowing the LSTM to remember things for a more extended period.

Three Gates

  • Now, picture three gates like traffic lights on the conveyor belt: an input gate, a forget gate, and an output gate. These gates control the flow of information within the LSTM.
  • Input Gate: Decides what information to let into the cell state, like deciding which parts of a story are essential.
  • Forget Gate: Chooses what to remove or forget from the cell state, helping the LSTM discard less relevant information.
  • Output Gate: Determines what information from the cell state to use as the output, like deciding which parts of the story to share.

Hidden State

  • Alongside the cell state, there’s the hidden state, which is like a short-term memory or a quick note about the current state of things. It’s influenced by both the current input and the previous hidden state.

Mathematical Operations

  • Within each gate, there are mathematical operations that involve weights and biases. These operations adjust the information passing through the gates, allowing the LSTM to learn and adapt to different patterns in the data (a small numpy sketch follows below).
Image from: https://av-eks-blogoptimized.s3.amazonaws.com/Screenshot-from-2021-03-16-13-41-03.png
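
For the mathematically curious, here is a back-of-the-envelope numpy sketch of a single LSTM time step (the dimensions are arbitrary toy choices, and real implementations add more bookkeeping), showing how the three gates combine weights, biases, and the sigmoid/tanh functions:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy sizes: 3 input features, 4 memory units (arbitrary choices)
n_in, n_hid = 3, 4
rng = np.random.default_rng(0)

# One weight matrix per gate (plus one for the candidate values);
# each sees the current input x_t stacked with the previous hidden state
W_f, W_i, W_o, W_c = (rng.standard_normal((n_hid, n_in + n_hid)) for _ in range(4))
b_f, b_i, b_o, b_c = (np.zeros(n_hid) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W_f @ z + b_f)        # forget gate: what to erase
    i = sigmoid(W_i @ z + b_i)        # input gate: what to write
    c_tilde = np.tanh(W_c @ z + b_c)  # candidate new content
    c = f * c_prev + i * c_tilde      # updated cell state (the conveyor belt)
    o = sigmoid(W_o @ z + b_o)        # output gate: what to reveal
    h = o * np.tanh(c)                # new hidden state (the short-term note)
    return h, c

h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid))
print(h.shape, c.shape)  # (4,) (4,)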

Implementation in Code

# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Generate a sample weather dataset (temperature)
np.random.seed(42)
dates = pd.date_range(start="2022-01-01", end="2022-12-31", freq="D")
temperature = np.random.randint(60, 100, size=(len(dates),))

# Create a DataFrame
df = pd.DataFrame({"Date": dates, "Temperature": temperature})

# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
df["Temperature"] = scaler.fit_transform(df["Temperature"].values.reshape(-1, 1))

# Prepare data for LSTM: slide a window of seq_length values over the
# series; each window is one input sequence, and the value right after
# it is that sequence's target
def create_sequences(data, seq_length):
    sequences, targets = [], []
    for i in range(len(data) - seq_length):
        seq = data[i : i + seq_length]
        target = data[i + seq_length]
        sequences.append(seq)
        targets.append(target)
    return np.array(sequences), np.array(targets)

seq_length = 5
X, y = create_sequences(df["Temperature"].values, seq_length)

# Split data into training and testing sets
# (note: train_test_split shuffles the windows; for a strict time-series
# evaluation, a chronological split is often preferred)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build the LSTM model
model = Sequential()
model.add(LSTM(units=50, activation="relu", input_shape=(seq_length, 1)))
model.add(Dense(units=1))
model.compile(optimizer="adam", loss="mean_squared_error")

# Reshape the data for LSTM input
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))

# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test), verbose=2)

# Make predictions
y_pred = model.predict(X_test)

# Inverse transform the predictions and actual values
y_pred_actual = scaler.inverse_transform(y_pred)
y_test_actual = scaler.inverse_transform(y_test.reshape(-1, 1))

# Plot the results
plt.figure(figsize=(12, 6))
plt.plot(y_test_actual, label="Actual Temperature")
plt.plot(y_pred_actual, label="Predicted Temperature")
plt.title("LSTM Weather Prediction")
plt.xlabel("Time")
plt.ylabel("Temperature")
plt.legend()
plt.show()

Output: a line plot comparing the actual and predicted temperatures over the test set.
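
As a small follow-up of my own (reusing the variables defined in the script above), the trained model can also forecast one step past the end of the dataset from the most recent window:

# Forecast one day beyond the dataset using the last seq_length values
last_window = df["Temperature"].values[-seq_length:].reshape(1, seq_length, 1)
next_scaled = model.predict(last_window, verbose=0)
next_temp = scaler.inverse_transform(next_scaled)
print(f"Predicted next-day temperature: {next_temp[0, 0]:.1f}")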

Applications of LSTM

  1. Smart Typing (Autocorrect): LSTMs help your phone understand what you’re typing and fix mistakes. It’s like having a friend who knows exactly what you meant to say, even if you hit the wrong keys.
  2. Speech Recognition: When you talk to your virtual assistant, LSTMs help it understand your words. They remember the context and predict what you might say next, making your assistant smarter.
  3. Language Translation: Ever used an online translator? LSTMs power these tools by understanding the patterns in different languages, helping you communicate across borders.

Conclusion

And there you have it! Long Short-Term Memory, or as we affectionately call it, the Sherlock Holmes of the machine learning world — always remembering the important details while gracefully forgetting the algorithmic equivalent of where it left its keys. It’s the brainiac that predicts weather like a seasoned meteorologist and fixes your typos with the finesse of a grammar superhero. So, the next time someone mentions LSTMs, remember, it’s not just a fancy term; it’s the unsung hero making your tech life smoother. Now, go forth and impress your friends with your newfound knowledge of the clever algorithm that’s basically the memory maestro of the digital universe!

👉 Follow me on Medium | LinkedIn | Github | I’m excited to connect, and let me know if you want me to write a blog on any topic related to Data Science!
