Back-Propagation Through a Standard RNN Unit — a Mathematical Insight
The maths behind back-propagation is often overlooked. People simply don’t care about the equations because the framework looks after all of it. But I feel that knowing the maths does help in understanding the underlying circuitry, and once you are able to derive those equations yourself, you feel a real sense of achievement as well (at least, I did). So read on if you would like to get under the skin of back-propagation.
Prerequisites
I really hope that you are familiar with the basics of neural networks.
Assumptions
I assume that we are dealing with a single training example in the first half of the article. The equations stay more or less the same when we have multiple examples, and I will add a separate section covering that case.
I also assume that the input and the output sequences are of the same length.
Notation
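As a rough guide, the symbols used below follow the standard vanilla-RNN convention; the exact names here are a sketch of my own rather than anything canonical:

```latex
% Notation sketch (symbol names assumed; standard vanilla-RNN convention)
\begin{aligned}
& x^{\langle t \rangle} \in \mathbb{R}^{n_x} \quad \text{input at time step } t \\
& a^{\langle t \rangle} \in \mathbb{R}^{n_a} \quad \text{hidden state at time step } t \\
& \hat{y}^{\langle t \rangle} \in \mathbb{R}^{n_y} \quad \text{prediction at time step } t \\
& W_{ax} \in \mathbb{R}^{n_a \times n_x},\; W_{aa} \in \mathbb{R}^{n_a \times n_a},\; W_{ya} \in \mathbb{R}^{n_y \times n_a} \quad \text{weight matrices} \\
& b_a \in \mathbb{R}^{n_a},\; b_y \in \mathbb{R}^{n_y} \quad \text{bias vectors} \\
& T_x \quad \text{number of time steps (input length} = \text{output length)}
\end{aligned}
```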
Forward Propagation
This should be a prerequisite as well. If you are reading about back-propagation, I assume that you are already familiar with the forward pass. Nevertheless, I am including the forward-pass equations and the RNN unit block diagram here as well.
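For reference, a minimal sketch of the standard forward pass for a single example, assuming a tanh hidden activation, a softmax output layer, and a per-step loss summed over time:

```latex
% Forward pass of a vanilla RNN unit (sketch; activation and loss choices assumed)
\begin{aligned}
a^{\langle t \rangle} &= \tanh\!\left(W_{aa}\, a^{\langle t-1 \rangle} + W_{ax}\, x^{\langle t \rangle} + b_a\right) \\
\hat{y}^{\langle t \rangle} &= \mathrm{softmax}\!\left(W_{ya}\, a^{\langle t \rangle} + b_y\right) \\
\mathcal{L} &= \sum_{t=1}^{T_x} \mathcal{L}^{\langle t \rangle}\!\left(\hat{y}^{\langle t \rangle},\, y^{\langle t \rangle}\right)
\end{aligned}
```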
Back-Propagation (with a single example)
Let’s use computation graphs to walk through the process.
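As a sketch of where the computation graph leads, assuming the forward pass above (tanh hidden activation, softmax output with cross-entropy loss) and writing $dz_a^{\langle T_x + 1 \rangle} = 0$ for the step after the last one:

```latex
% Gradients for a single example (sketch under the assumptions stated above)
\begin{aligned}
\frac{\partial \mathcal{L}^{\langle t \rangle}}{\partial z_y^{\langle t \rangle}} &= \hat{y}^{\langle t \rangle} - y^{\langle t \rangle}
  && \text{softmax + cross-entropy} \\
\frac{\partial \mathcal{L}}{\partial W_{ya}} &= \sum_{t} \left(\hat{y}^{\langle t \rangle} - y^{\langle t \rangle}\right) a^{\langle t \rangle \top},
  \qquad \frac{\partial \mathcal{L}}{\partial b_y} = \sum_{t} \left(\hat{y}^{\langle t \rangle} - y^{\langle t \rangle}\right) \\
da^{\langle t \rangle} &= W_{ya}^{\top}\left(\hat{y}^{\langle t \rangle} - y^{\langle t \rangle}\right) + W_{aa}^{\top}\, dz_a^{\langle t+1 \rangle}
  && \text{gradient from the output and from step } t+1 \\
dz_a^{\langle t \rangle} &= \left(1 - \left(a^{\langle t \rangle}\right)^{2}\right) \odot da^{\langle t \rangle}
  && \text{tanh derivative} \\
\frac{\partial \mathcal{L}}{\partial W_{aa}} &= \sum_{t} dz_a^{\langle t \rangle}\, a^{\langle t-1 \rangle \top},
  \qquad \frac{\partial \mathcal{L}}{\partial W_{ax}} = \sum_{t} dz_a^{\langle t \rangle}\, x^{\langle t \rangle \top},
  \qquad \frac{\partial \mathcal{L}}{\partial b_a} = \sum_{t} dz_a^{\langle t \rangle}
\end{aligned}
```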
Back-Propagation (over a batch)
Let’s use m to denote the number of examples in a batch; the notation then gets modified to:
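A sketch of how the shapes change, assuming the m examples are stacked as columns (the weight and bias shapes are unchanged):

```latex
% Batched notation sketch (column-stacked examples assumed)
\begin{aligned}
X^{\langle t \rangle} &\in \mathbb{R}^{n_x \times m}, \qquad
A^{\langle t \rangle} \in \mathbb{R}^{n_a \times m}, \qquad
\hat{Y}^{\langle t \rangle} \in \mathbb{R}^{n_y \times m} \\
\mathcal{L} &= \frac{1}{m} \sum_{i=1}^{m} \sum_{t=1}^{T_x} \mathcal{L}^{(i)\langle t \rangle}
  \qquad \text{so every gradient picks up a sum over the batch axis and a } \tfrac{1}{m} \text{ factor}
\end{aligned}
```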
The rest stays the same.
Back-Propagation (the entire process)
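To tie everything together, here is a minimal NumPy sketch of the full forward and backward pass over a batch. The variable names and the tanh/softmax/cross-entropy choices are my assumptions, mirroring the equations above; it is an illustration, not a production implementation.

```python
import numpy as np

def rnn_forward_backward(X, Y, Waa, Wax, Wya, ba, by):
    """Full BPTT sketch for a vanilla RNN over a batch.

    Assumed shapes: X is (n_x, m, T_x), Y is one-hot (n_y, m, T_x),
    ba is (n_a, 1), by is (n_y, 1). Returns the mean loss and gradients.
    """
    n_x, m, T_x = X.shape
    n_a = Waa.shape[0]

    # ---- forward pass: cache hidden states and predictions per time step ----
    a = {0: np.zeros((n_a, m))}               # a[t] is the hidden state after step t
    y_hat = {}
    loss = 0.0
    for t in range(1, T_x + 1):
        x_t = X[:, :, t - 1]
        a[t] = np.tanh(Waa @ a[t - 1] + Wax @ x_t + ba)
        z_y = Wya @ a[t] + by
        z_y -= z_y.max(axis=0, keepdims=True)             # numerical stability
        y_hat[t] = np.exp(z_y) / np.exp(z_y).sum(axis=0, keepdims=True)
        loss -= np.sum(Y[:, :, t - 1] * np.log(y_hat[t] + 1e-12)) / m

    # ---- backward pass: walk the time steps in reverse (BPTT) ----
    grads = {name: np.zeros_like(p) for name, p in
             [("Waa", Waa), ("Wax", Wax), ("Wya", Wya), ("ba", ba), ("by", by)]}
    da_next = np.zeros((n_a, m))              # gradient flowing back from step t+1
    for t in range(T_x, 0, -1):
        dz_y = (y_hat[t] - Y[:, :, t - 1]) / m            # softmax + cross-entropy
        grads["Wya"] += dz_y @ a[t].T
        grads["by"] += dz_y.sum(axis=1, keepdims=True)
        da = Wya.T @ dz_y + da_next                       # from the output and from t+1
        dz_a = (1.0 - a[t] ** 2) * da                     # tanh derivative
        grads["Waa"] += dz_a @ a[t - 1].T
        grads["Wax"] += dz_a @ X[:, :, t - 1].T
        grads["ba"] += dz_a.sum(axis=1, keepdims=True)
        da_next = Waa.T @ dz_a                            # pass gradient on to step t-1
    return loss, grads
```

Comparing these gradients against a finite-difference estimate on small random shapes is a quick way to check that the derivation and the code agree.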
This was a long article, wasn’t it? I really wish Medium supported LaTeX, or if it does, I don’t know how to use it. Do let me know if you know how to use LaTeX on Medium. Comment if I went wrong somewhere with the equations or the explanation, or if you have any suggestions. If you have any questions, let me know in the comments as well.