Neural networks backward propagation deep dive 103

Shaun Enslin
Jul 8 · 3 min read

Backward propagation is a tricky subject to explain. Let's give it a go here, showing the code and output data as we go.

This is part 3 in my series on neural networks. You are welcome to start at part 1 or skip to part 5 if you just want the code.

So, to perform gradient descent or cost optimisation, we need to write a cost function which performs:

  1. Forward propagation
  2. Backward propagation
  3. Calculate cost & gradient

This article concentrates on (2) backward propagation.
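To see where these three steps fit together, below is a minimal sketch of what such a cost function could look like in Octave/MATLAB. The function name nnCostFunction and its argument list are my own illustrative assumptions (the exact signature used in this series may differ), and regularisation is left out for brevity:

function [J, grad] = nnCostFunction(Theta1, Theta2, X, y, num_labels)
  m = size(X, 1);

  % (1) forward propagation, keeping z2 for later
  a1 = [ones(m, 1) X];
  z2 = a1 * Theta1';
  a2 = [ones(m, 1) sigmoid(z2)];
  a3 = sigmoid(a2 * Theta2');

  % reformat y into a binary matrix (see "Reformatting Y" below)
  yv = ([1:num_labels] == y);

  % (2) backward propagation: the subject of this article
  s3 = a3 - yv;
  s2 = (s3 * Theta2) .* sigmoidGradient([ones(size(z2, 1), 1) z2]);
  s2 = s2(:, 2:end);

  % (3) cost & gradient: covered in the next article
  J = (1/m) * sum(sum(-yv .* log(a3) - (1 - yv) .* log(1 - a3)));
  Theta1_grad = (1/m) * (s2' * a1);
  Theta2_grad = (1/m) * (s3' * a2);
  grad = [Theta1_grad(:); Theta2_grad(:)];
end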

So, we have simplified our neural network in figure 1 to show only the details needed to calculate S3 and thereafter S2, for one unit (circle) in each layer.

Figure 1

Weights

From part 2, we understand how our weights (thetas) were initialised, so to visualise the weights (θ) that figure 1 refers to, see figure 2.

Figure 2

Reformatting Y

Before we continue: from part 1 you will understand our Y column, which contains the labels used to categorise our customers. To continue with back propagation, we need to reformat Y into a matrix with one column per label. In our case we have 7 categories for our customers.

Figure 3 shows how Y is converted to a matrix yv, with each label now indicated as a binary 1 in the appropriate column.

yv = [1:num_labels] == y;
Figure 3
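As a quick illustration of what that one line does (using a made-up y with 3 labels rather than our 7), the comparison broadcasts each label against the index vector and produces one binary row per customer:

y  = [2; 1; 3];          % example labels for 3 customers
yv = ([1:3] == y);       % broadcasts to a 3x3 binary matrix
% yv =
%    0  1  0
%    1  0  0
%    0  0  1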

Backward propagation

Calculating S3 in figure 1 is pretty easy: it is simply the error of our hypothesis in A3 compared with our actual categories in yv. You will see the results in figure 4.

s3 = a3 - yv;
Figure 4
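For a single customer in our 7-category case, that subtraction looks something like this (the numbers here are made up purely for illustration):

a3_row = [0.10 0.05 0.62 0.08 0.05 0.06 0.04];   % hypothesis for one customer
yv_row = [0    0    1    0    0    0    0];       % actual category is label 3
s3_row = a3_row - yv_row;                         % [0.10 0.05 -0.38 0.08 0.05 0.06 0.04]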

With S3 done, we can move on to S2. This one is a little more tricky: we apply our theta to S3 and multiply it by the sigmoid gradient from this layer. That's why we kept Z2 in forward prop.

Note that we remove the bias column in back prop.

% error for layer 2: propagate s3 back through Theta2 and scale by the
% sigmoid gradient of z2 (with a bias column added to match dimensions)
s2 = (s3*Theta2).*sigmoidGradient([ones(size(z2, 1), 1) z2]);
s2 = s2(:, 2:end);   % drop the bias column
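As an aside, an equivalent way to write this, which you may see in other implementations, drops the bias column from Theta2 up front instead of padding z2 and trimming afterwards:

s2 = (s3 * Theta2(:, 2:end)) .* sigmoidGradient(z2);   % same result as the two lines above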

The sigmoidGradient function, shown below, computes the gradient of the sigmoid function.

function g = sigmoidGradient(z)
% derivative of the sigmoid, g'(z) = g(z).*(1 - g(z)), applied element-wise
g = sigmoid(z).*(1-sigmoid(z));
end
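As a quick sanity check, the gradient peaks at 0.25 when z = 0 and shrinks towards zero for large positive or negative z:

sigmoidGradient(0)             % ans = 0.25000
sigmoidGradient([-10 0 10])    % ans = 0.000045  0.250000  0.000045 (approximately)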

Figure 5 shows the output for S2.

Figure 5

Conclusion

That concludes this article on backward propagation. If you followed on from my first article in this series, then we still need to calculate the costs and gradients. Click here to move onto the 4th article in this series on neural networks.

Next up, we calculate the costs and gradients of our forward/backward props.

If you need a great course on machine learning, then this is the course for you.
