Neural networks: backward propagation deep dive 103
Backward propagation is a tricky subject to explain. Let's give it a go here, showing the code and output data as we go.
So, to perform gradient descent or cost optimisation, we need to write a cost function which performs:

1. Forward propagation
2. Backward propagation

This article concentrates on (2), backward propagation.
So, we have simplified our neural network in figure 1 to show only the details needed to calculate S3, and after that S2, in one unit (circle) in each layer.
From part 2, we understand how our weights (thetas) were initialised, so to visualise the weights (θ) that figure 1 refers to, see figure 2.
Before we continue, recall from part 1 that our Y column contains the labels used to categorise our customers. To continue with back propagation, we need to reformat Y into a matrix whose width corresponds to the number of labels. In our case, we have 7 customer categories.
Figure 3, shows how Y is converted to a matrix yv and labels are now indicated as a binary in the appropriate column.
yv = [1:num_labels] == y;
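The same one-hot conversion can be sketched in NumPy. This is a hypothetical translation of the Octave line above, not code from the article; the example labels in `y` are made up for illustration, and `num_labels = 7` matches our 7 customer categories:

```python
import numpy as np

num_labels = 7                       # our 7 customer categories
y = np.array([3, 1, 7, 2])           # example labels, 1-indexed as in Octave

# Broadcasting [1..num_labels] against the column vector y reproduces
# the Octave expression  yv = [1:num_labels] == y;
yv = (np.arange(1, num_labels + 1) == y[:, None]).astype(int)
print(yv)
```

Each row of `yv` now has a single 1 in the column matching that customer's label, exactly as figure 3 shows.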
Calculating S3 in figure 1 is pretty easy: it is the error of the output layer, comparing our hypothesis in A3 with our actual category in yv. You can see the results in figure 4.
s3 = a3 - yv;
With S3 done, we can move on to S2, which is a little more tricky. We multiply S3 by Theta2, then element-wise by the sigmoid gradient of this layer's weighted input. That's why we kept Z2 in forward prop.
Note that we remove the bias column in back prop:
s2 = (s3*Theta2).*sigmoidGradient([ones(size(z2, 1), 1) z2]);
s2 = s2(:, 2:end);
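Both error terms can be sketched in NumPy as follows. This is an illustrative translation of the Octave lines above, not the article's code; the layer sizes (`m`, `hidden`, `labels`) and the random values standing in for the forward-prop results are all assumptions made purely to show the shapes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    return sigmoid(z) * (1 - sigmoid(z))

rng = np.random.default_rng(0)
m, hidden, labels = 4, 5, 7          # assumed sizes, not from the article

a3 = rng.random((m, labels))         # hypothesis from forward prop
yv = np.eye(labels)[rng.integers(0, labels, m)]  # one-hot ground truth
z2 = rng.random((m, hidden))         # kept from forward prop
Theta2 = rng.random((labels, hidden + 1))

# s3 = a3 - yv;
s3 = a3 - yv

# s2 = (s3*Theta2).*sigmoidGradient([ones(size(z2, 1), 1) z2]);
z2_bias = np.hstack([np.ones((m, 1)), z2])
s2 = (s3 @ Theta2) * sigmoid_gradient(z2_bias)

# s2 = s2(:, 2:end);  -- drop the bias column
s2 = s2[:, 1:]
print(s2.shape)                      # one error term per hidden unit
```

Note how the bias column is added to Z2 so the shapes line up with Theta2, and then stripped from S2 afterwards, mirroring the two Octave lines.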
The sigmoidGradient function, which computes the gradient of the sigmoid function, can be seen below.
function g = sigmoidGradient(z)
  g = sigmoid(z) .* (1 - sigmoid(z));
end
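For reference, the same function sketched in NumPy (a translation for illustration, not the article's code). A standard property of the sigmoid, not specific to this article, is that its gradient peaks at 0.25 when z = 0:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    # g'(z) = g(z) * (1 - g(z))
    return sigmoid(z) * (1 - sigmoid(z))

print(sigmoid_gradient(0.0))         # 0.25, the maximum of the gradient
```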
Figure 5 shows the output for S2.
That concludes this article on backward propagation. If you followed on from my first article in this series, then we still need to calculate the costs and gradients. Click here to move on to the 4th article in this series on neural networks.
Next up, we calculate the costs and gradients of our forward/backward props.
If you need a great course on machine learning, then this is the course for you.