Why can't LSTM prevent gradient exploding?
There are many conflicting opinions online about gradient vanishing and gradient exploding in LSTM. Some say that LSTM can prevent both from happening; others say it cannot. I believe that LSTM can prevent gradient vanishing but not gradient exploding, and this post explains why.
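The intuition behind this claim can be sketched with toy numbers (all values here are assumptions for illustration, not measurements): along the LSTM cell state, the local gradient factor is the forget gate, which lies in (0, 1), so a long product of saturated forget gates shrinks slowly instead of collapsing to zero. In a vanilla RNN, by contrast, the recurrent weight multiplies the gradient directly at every step, so the product can blow up when its magnitude exceeds 1.

```python
import numpy as np

T = 100  # number of time steps in the toy sequence

# LSTM cell-state path: dc_t/dc_{t-1} = f_t, the forget gate, always in (0, 1).
# Assume the gates are saturated near 1 (a common regime with proper bias init).
forget_gates = np.full(T, 0.99)
lstm_grad = np.prod(forget_gates)   # 0.99**100, stays at a usable magnitude

# Vanilla RNN path (scalar toy case): dh_t/dh_{t-1} includes the recurrent
# weight W directly, so the product grows like W**T when |W| > 1.
w = 1.5                             # assumed: recurrent weight with |W| > 1
rnn_grad = np.prod(np.full(T, w))   # 1.5**100, explodes

print(lstm_grad)
print(rnn_grad)
```

This only illustrates the cell-state path; as the derivation later in the post shows, the full LSTM gradient also flows through other terms, which is why exploding gradients remain possible.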
When I studied back propagation in LSTM, there was one resource I found easy to understand. Most of the math formulas in this post come from that article.
https://towardsdatascience.com/only-numpy-deriving-forward-feed-and-back-propagation-in-long-short-term-memory-lstm-part-1-4ee82c14a652
And I'd like to thank everyone who gave me clues.
This is my previous post about LSTM.