LSTM And GRU In Depth

Fraidoon Omarzai
3 min read · Jul 26, 2024


Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are designed to handle longer and more complex sequences without forgetting the important information.

Long Short-Term Memory (LSTM)

  • A type of RNN designed to address the vanishing gradient problem, which simple RNNs struggle with
  • LSTM is a special kind of RNN capable of learning long-term dependencies
  • The key to LSTMs is the cell state, which can carry information across many time steps, and three types of gates that regulate the flow of information into and out of the cell

Components of LSTM:

  1. Cell State (C_t): The memory of the network.
  2. Forget Gate (f_t): Decides what information to discard from the cell state.
  3. Input Gate (i_t): Decides what new information to add to the cell state.
  4. Output Gate (o_t): Decides what information to output based on the cell state.
  5. Hidden State (h_t): The output of the LSTM unit.

LSTM Equations:

  f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
  i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
  C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
  C_t = f_t ∗ C_{t-1} + i_t ∗ C̃_t
  o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
  h_t = o_t ∗ tanh(C_t)

where:

  • σ: denotes the sigmoid function
  • ∗: denotes element-wise multiplication
  • b: bias terms
  • W: weight matrices
  • x_t: the input at time t
  • [h_{t-1}, x_t]: the previous hidden state concatenated with the current input
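To make the equations concrete, here is a minimal sketch of a single LSTM step in plain NumPy. Everything here (the lstm_step function, the weight layout, the toy dimensions) is illustrative, not from any particular library:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # Each gate sees the previous hidden state and current input stacked together
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])          # forget gate: what to discard from C_{t-1}
    i_t = sigmoid(W["i"] @ z + b["i"])          # input gate: what new information to admit
    c_hat = np.tanh(W["c"] @ z + b["c"])        # candidate cell state C̃_t
    c_t = f_t * c_prev + i_t * c_hat            # new cell state C_t
    o_t = sigmoid(W["o"] @ z + b["o"])          # output gate: what to expose from the cell
    h_t = o_t * np.tanh(c_t)                    # new hidden state h_t
    return h_t, c_t

# Toy usage with hypothetical dimensions: input size 3, hidden size 4
rng = np.random.default_rng(0)
n_in, n_h = 3, 4
W = {k: rng.normal(scale=0.1, size=(n_h, n_h + n_in)) for k in "fico"}
b = {k: np.zeros(n_h) for k in "fico"}
h_t, c_t = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), W, b)
```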

Gated Recurrent Unit (GRU)

  • GRU is a variant of LSTM that aims to achieve similar performance with a simpler structure
  • GRUs combine the forget and input gates into a single update gate and merge the cell state and hidden state

Components of GRU:

  1. Reset Gate (r_t): Determines how much past information to forget.
  2. Update Gate (z_t): Controls the balance between the previous hidden state and the new candidate hidden state.
  3. Hidden State (h_t): The output of the GRU unit.

GRU Equations:

  z_t = σ(W_z · [h_{t-1}, x_t] + b_z)
  r_t = σ(W_r · [h_{t-1}, x_t] + b_r)
  h̃_t = tanh(W_h · [r_t ∗ h_{t-1}, x_t] + b_h)
  h_t = (1 − z_t) ∗ h_{t-1} + z_t ∗ h̃_t

where:

  • σ: denotes the sigmoid function
  • ∗: denotes element-wise multiplication
  • b: bias terms
  • W: weight matrices
  • x_t: the input at time t
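As with the LSTM, here is a minimal NumPy sketch of a single GRU step following the equations above; the gru_step function and toy dimensions are illustrative only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, b):
    z_in = np.concatenate([h_prev, x_t])                 # [h_{t-1}, x_t]
    z_t = sigmoid(W["z"] @ z_in + b["z"])                # update gate: how much to refresh
    r_t = sigmoid(W["r"] @ z_in + b["r"])                # reset gate: how much past to forget
    # Candidate hidden state sees the *reset* previous state: [r_t ∗ h_{t-1}, x_t]
    h_hat = np.tanh(W["h"] @ np.concatenate([r_t * h_prev, x_t]) + b["h"])
    h_t = (1 - z_t) * h_prev + z_t * h_hat               # interpolate old and candidate states
    return h_t

# Toy usage with hypothetical dimensions: input size 3, hidden size 4
rng = np.random.default_rng(0)
n_in, n_h = 3, 4
W = {k: rng.normal(scale=0.1, size=(n_h, n_h + n_in)) for k in "zrh"}
b = {k: np.zeros(n_h) for k in "zrh"}
h_t = gru_step(rng.normal(size=n_in), np.zeros(n_h), W, b)
```

Note there is no separate cell state: the hidden state h_t does double duty, which is where the GRU's simplicity comes from.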

Key Differences Between LSTM and GRU:

1. Gates:

  • LSTM has three gates (forget, input, output)
  • GRU has two gates (update and reset)

2. Complexity:

  • LSTM is more complex due to having an extra gate and a separate cell state
  • GRU is simpler, and thus faster to train and easier to implement (see the parameter-count sketch after this list)

3. Performance:

  • Both can perform similarly on various tasks, but specific tasks might favor one architecture over the other
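The complexity difference is easy to verify: an LSTM layer has four weight blocks (three gates plus the candidate cell state) against the GRU's three, so at the same sizes a GRU has roughly 3/4 as many parameters. A quick PyTorch check, with arbitrary example dimensions:

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

lstm = nn.LSTM(input_size=128, hidden_size=256)
gru = nn.GRU(input_size=128, hidden_size=256)

print(n_params(lstm))  # 395,264 -> 4 weight blocks
print(n_params(gru))   # 296,448 -> 3 weight blocks, ~3/4 of the LSTM
```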

Implementation Tips:

1. Choosing between LSTM and GRU:

  • If you need a simpler and faster model, start with GRU
  • For tasks requiring longer-term memory and more control over memory, use LSTM

2. Hyperparameters:

  • Number of layers and units per layer
  • Learning rate, batch size, dropout rates, etc.

3. Regularization:

  • Use techniques like dropout to prevent overfitting

4. Frameworks:

  • Popular deep learning frameworks like TensorFlow and PyTorch provide built-in support for LSTM and GRU layers
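For instance, a small Keras model stacking recurrent layers with dropout takes only a few lines; the shapes and hyperparameters below are placeholders for a hypothetical sequence-classification task, and swapping tf.keras.layers.LSTM for tf.keras.layers.GRU is a one-word change:

```python
import tensorflow as tf

# Hypothetical task: sequences of 100 steps x 16 features, 10 output classes
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 16)),
    tf.keras.layers.LSTM(64, return_sequences=True, dropout=0.2),  # or layers.GRU(64, ...)
    tf.keras.layers.LSTM(32, dropout=0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```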
