DLOA (Part-24)-Long Short-Term Memory

Dewansh Singh
6 min read · May 27, 2023


Hey readers, hope you all are doing well, safe, and sound. I hope you have already read the previous blog, which briefly discussed the Gated Recurrent Unit (GRU) and its implementation. If you haven’t read it yet, you can go through this link. In this blog, we’ll be discussing Long Short-Term Memory (LSTM), how it works, and its basic implementation.

LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) architecture that is designed to address the vanishing gradient problem and capture long-term dependencies in sequential data. It was introduced by Hochreiter and Schmidhuber in 1997 and has since become widely used in various applications such as natural language processing, speech recognition, time series analysis, and more.

Introduction to LSTM:

LSTM networks are an extension of standard RNNs that introduce memory cells and gating mechanisms to enable the model to retain and update information over long sequences. The primary motivation behind LSTM is to overcome the limitations of traditional RNNs, which struggle with capturing dependencies that are several steps apart in a sequence.

Architecture of LSTM:

The key components of an LSTM unit are the memory cell, input gate, forget gate, and output gate. The memory cell serves as a long-term storage unit that can store information over a sequence, while the gates control the flow of information into and out of the memory cell.

LSTM Diagram
  • Memory Cell: The memory cell is responsible for retaining information over time. It has a self-connected recurrent connection that allows it to maintain its value over long sequences. The cell state represents the memory of the LSTM and is passed along from one time step to the next.
  • Input Gate: The input gate determines which information from the current time step should be stored in the memory cell. It takes input from the current time step and the previous hidden state and applies a sigmoid activation function to produce values between 0 and 1. These values control how much of the input should be stored in the memory cell.
LSTM-Input Gate
  • Forget Gate: The forget gate decides which information from the previous memory cell state should be discarded. It takes input from the current time step and the previous hidden state and applies a sigmoid activation function. The output is multiplied element-wise with the previous memory cell state, allowing the LSTM to selectively forget irrelevant information.
LSTM-Forget Gate
  • Output Gate: The output gate determines which information from the memory cell should be exposed to the next hidden state and the output of the LSTM. It takes input from the current time step and the previous hidden state and applies a sigmoid activation function. The memory cell state is passed through a tanh activation and multiplied element-wise by the output gate’s values, producing the new hidden state. In this way, the output gate controls the flow of information from the memory cell to the output.
LSTM-Output Gate
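
For reference, the gate descriptions above correspond to the standard LSTM equations, where $x_t$ is the input at time step $t$, $h_{t-1}$ the previous hidden state, $c_{t-1}$ the previous cell state, $\sigma$ the sigmoid function, $\odot$ element-wise multiplication, and each gate has its own weights $W$, $U$ and bias $b$:

i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)          (input gate)
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)          (forget gate)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)          (output gate)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)   (candidate cell state)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    (memory cell update)
h_t = o_t \odot \tanh(c_t)                          (hidden state)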

Working of LSTM:

The working of LSTM involves a series of steps that occur at each time step of the input sequence.

LSTM-Working Diagram
  1. Input Processing: At each time step, the LSTM receives the current input and the previous hidden state. For each gate, the input is multiplied by a weight matrix and the previous hidden state by another weight matrix; the two products are summed (together with a bias) and passed through a sigmoid activation to compute the input gate, forget gate, and output gate. A tanh activation over the same kind of sum produces the candidate cell state.
  2. Memory Cell Update: The input gate determines which values from the candidate cell state should be stored in the memory cell, while the forget gate determines which values from the previous memory cell state should be discarded. The forget gate is multiplied element-wise with the previous memory cell state, the input gate is multiplied element-wise with the candidate cell state, and the two products are added to form the new memory cell state.
  3. Hidden State Update: The output gate determines which values from the memory cell state should be exposed to the next hidden state and the output of the LSTM. The memory cell state is passed through the tanh activation function, and the output gate is multiplied element-wise with the output of the tanh activation. The resulting values form the hidden state, which is passed to the next time step and used for predictions or fed back into the LSTM for subsequent steps.

The above steps are repeated for each time step of the input sequence, allowing the LSTM to capture dependencies and retain information over long sequences.
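
To make these steps concrete, here is a minimal NumPy sketch of a single LSTM time step, written directly from the standard equations rather than taken from any library; the function name lstm_step and the stacked parameter layout (W, U, b holding all four gates at once) are illustrative choices, not part of any framework API:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W: (4*hidden, input_dim), U: (4*hidden, hidden), b: (4*hidden,)
    # Rows are stacked in the order: input gate, forget gate, output gate, candidate.
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b              # step 1: gate pre-activations
    i = sigmoid(z[0 * hidden:1 * hidden])     # input gate
    f = sigmoid(z[1 * hidden:2 * hidden])     # forget gate
    o = sigmoid(z[2 * hidden:3 * hidden])     # output gate
    g = np.tanh(z[3 * hidden:4 * hidden])     # candidate cell state
    c_t = f * c_prev + i * g                  # step 2: memory cell update
    h_t = o * np.tanh(c_t)                    # step 3: hidden state update
    return h_t, c_t

# Toy run over a random sequence of 7 time steps
rng = np.random.default_rng(0)
input_dim, hidden = 3, 5
W = rng.standard_normal((4 * hidden, input_dim))
U = rng.standard_normal((4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.standard_normal((7, input_dim)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (5,) (5,)

In practice you would rely on an optimized implementation such as tf.keras.layers.LSTM, which is exactly what the example below uses.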

Implementation

Here’s an example of implementing an LSTM model using the TensorFlow framework:

import tensorflow as tf
from tensorflow.keras import layers

# Define the LSTM model
def lstm_model(input_shape, num_classes):
    model = tf.keras.Sequential()

    # LSTM layer
    model.add(layers.LSTM(64, input_shape=input_shape))

    # Dense layer
    model.add(layers.Dense(64, activation='relu'))

    # Output layer
    model.add(layers.Dense(num_classes, activation='softmax'))

    return model

# Set the input shape and number of classes
input_shape = (timesteps, features) # Shape of the input sequence
num_classes = 10 # Number of output classes

# Build the LSTM model
model = lstm_model(input_shape, num_classes)

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, batch_size=32, epochs=10, validation_data=(X_val, y_val))

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print('Test Loss:', loss)
print('Test Accuracy:', accuracy)

In the above code:

  • We define a function lstm_model that takes input_shape (the shape of the input sequence) and num_classes (the number of output classes) as arguments.
  • Inside the function, we create an instance of the Sequential class from tf.keras to build our model.
  • We add an LSTM layer to the model using the add method. The layer has 64 units, and we specify the input_shape as the shape of our input sequence.
  • We add a fully connected (Dense) layer with 64 units and ReLU activation to introduce non-linearity into the model.
  • Finally, we add an output layer with num_classes units and the softmax activation function for multi-class classification.
  • We compile the model using the Adam optimizer, categorical cross-entropy loss, and accuracy as the evaluation metric.
  • The model is trained on the training data (X_train and y_train) and validated on the validation data (X_val and y_val).
  • After training, we evaluate the model on the test data (X_test and y_test) and print the test loss and accuracy.

Remember to replace X_train, y_train, X_val, y_val, X_test, and y_test with your actual data when implementing the code. Also, adjust the input shape and the number of classes based on your specific problem.
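
If you want a quick dry run of the snippet before plugging in real data, one option is to generate random placeholder arrays. This is only a sketch; the shapes (20 time steps, 8 features, 10 classes, 500/100 samples) are arbitrary and not tied to any real dataset:

import numpy as np
import tensorflow as tf

# Arbitrary placeholder shapes, purely for a dry run
timesteps, features, num_classes = 20, 8, 10

X_train = np.random.rand(500, timesteps, features)
y_train = tf.keras.utils.to_categorical(np.random.randint(num_classes, size=500), num_classes)
X_val = np.random.rand(100, timesteps, features)
y_val = tf.keras.utils.to_categorical(np.random.randint(num_classes, size=100), num_classes)
X_test = np.random.rand(100, timesteps, features)
y_test = tf.keras.utils.to_categorical(np.random.randint(num_classes, size=100), num_classes)

Training on random labels will of course not produce a meaningful model; the goal is only to check that the pipeline runs end to end.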

Conclusion

In conclusion, LSTM is a powerful variant of RNN that addresses the challenges of capturing long-term dependencies in sequential data. By introducing memory cells and gating mechanisms, LSTM models can effectively store and update information over time, making them suitable for various tasks that involve sequential data processing.

That’s it for now… I hope you liked my blog and got to know about Long Short-Term Memory (LSTM), how it works, and how to implement it.

In the next blog, I will be discussing the different types of Bidirectional RNN (BiRNN) in detail one by one.

If you feel my blogs are helpful, please share them with others.

Till then, stay tuned for the next blog…
