Going Deep From Scratch.

Jeraldy Deus
Aug 9, 2019


In my previous post, which can be found here, I explained how to implement a two-layer neural network from scratch in Java and NumPy, working from the mathematical derivations all the way to code. Be sure to check it out.

In this post, I will extend the concept further. We are going to derive and implement a deep NN from scratch in NumPy. Using the same ideas, you can achieve the same in Java or any other programming language of your choice.

Note: This article assumes that you already know the basics of neural networks and how backpropagation and vectorization work. If not, start here.

Let's get into it...

Motivation

Let's begin with a three-layer Neural Network as shown below:

Three-layer NN

The vectorized forward propagation equations are as follows:
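For the three-layer network above, with input matrix X and parameters W1, b1, W2, b2, W3, b3, they can be written as:

Z1 = W1 X + b1,   A1 = g(Z1)
Z2 = W2 A1 + b2,  A2 = g(Z2)
Z3 = W3 A2 + b3,  A3 = g(Z3)  (the prediction)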

Where g(Z) is the activation function of your choice.

The computation graph for the backward pass will look like this:

Back prop. graph

Our aim is to compute the gradients of the loss (J) w.r.t. the learnable parameters, the W’s and b’s:

  1. dJ/dW3 and dJ/db3
  2. dJ/dW2 and dJ/db2
  3. dJ/dW1 and dJ/db1

I will be using the following notations (For easy mapping of eqns. and code):

  1. dJ/dW3 = dW3 and dJ/db3=db3
  2. dJ/dW2 = dW2 and dJ/db2=db2
  3. dJ/dW1 = dW1 and dJ/db1=db1
  4. Generally: dJ/dx = dx

Using the chain rule of differentiation, we compute the gradients of the loss (J) w.r.t. the learnable parameters (W’s and b’s).

  1. dJ/dW3 and dJ/db3

Note: g(Z) is an activation function of your choice

Layer 3 Backprop equations
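In the notation above (with m the number of training examples, * the element-wise product and ^T the transpose), the chain rule gives:

dZ3 = dA3 * g'(Z3)        (for a sigmoid output with cross-entropy loss this reduces to dZ3 = A3 - Y)
dW3 = (1/m) dZ3 A2^T
db3 = (1/m) sum(dZ3)      (summed over the examples axis)
dA2 = W3^T dZ3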

2. dJ/dW2 and dJ/db2

Layer 2 Backprop equations
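Repeating the same chain-rule steps one layer down:

dZ2 = dA2 * g'(Z2)
dW2 = (1/m) dZ2 A1^T
db2 = (1/m) sum(dZ2)
dA1 = W2^T dZ2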

  3. dJ/dW1 and dJ/db1 (Try it yourself)

Using the same principles, the backprop for the first layer becomes:
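In the same notation:

dZ1 = dA1 * g'(Z1)
dW1 = (1/m) dZ1 X^T
db1 = (1/m) sum(dZ1)

(X plays the role of A0 here, so there is no dA0 to propagate further back.)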

Reference Table 1

Generally:

Following the above equations, we can generalize the backpropagation algorithm for a deep neural network as follows.
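For any layer l, with A[0] = X and dZ[L] at the output layer coming from the loss (e.g. A[L] - Y for a sigmoid output with cross-entropy), we have:

dZ[l] = dA[l] * g'(Z[l])        (element-wise product)
dW[l] = (1/m) dZ[l] A[l-1]^T
db[l] = (1/m) sum(dZ[l])        (summed over the examples axis)
dA[l-1] = W[l]^T dZ[l]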

Reference Table 2

Gradient Flow

Now the fun part:

CODE

In case you get stuck in any section of the code, be sure to review these two tables again [Reference Table 1 and Reference Table 2].

Forward Propagation.

It's simple, as shown below: loop through the L layers and, for each layer, compute ZL and AL. AL is then passed as the input to the next layer, and so on.

We store AL and ZL for each layer in a cache dictionary for use during the backward pass.

And here is the forward_prop function.
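A minimal sketch of what it can look like, assuming the parameters live in a params dictionary keyed W1, b1, ..., WL, bL, and using a sigmoid activation everywhere (swap in any g you like):

import numpy as np

def sigmoid(Z):
    # g(Z) = 1 / (1 + e^(-Z)), applied element-wise
    return 1.0 / (1.0 + np.exp(-Z))

def forward_prop(X, params, L):
    # cache stores Z and A for every layer; the input acts as A0
    cache = {"A0": X}
    A = X
    for l in range(1, L + 1):
        Z = np.dot(params["W" + str(l)], A) + params["b" + str(l)]
        A = sigmoid(Z)
        cache["Z" + str(l)] = Z
        cache["A" + str(l)] = A
    return A, cache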

Backpropagation.

I have computed the backprop for the last layer separately, just to make it easier to loop through the remaining layers. I’m using a grads dictionary to store the gradients of each layer; these will be used to update the learnable parameters with gradient descent.
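A sketch along those lines, assuming a sigmoid output with cross-entropy loss (so dZ for the last layer is simply AL - Y) and the cache produced by forward_prop above:

import numpy as np

def back_prop(AL, Y, params, cache, L):
    m = Y.shape[1]  # number of training examples
    grads = {}

    # Last layer handled separately: sigmoid output + cross-entropy loss gives dZL = AL - Y
    dZ = AL - Y
    grads["dW" + str(L)] = (1.0 / m) * np.dot(dZ, cache["A" + str(L - 1)].T)
    grads["db" + str(L)] = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(params["W" + str(L)].T, dZ)

    # Remaining layers follow the generalized equations (Reference Table 2)
    for l in range(L - 1, 0, -1):
        dZ = sigmoid_backward(dA_prev, cache["Z" + str(l)])
        grads["dW" + str(l)] = (1.0 / m) * np.dot(dZ, cache["A" + str(l - 1)].T)
        grads["db" + str(l)] = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)
        dA_prev = np.dot(params["W" + str(l)].T, dZ)

    return grads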

And here is the sigmoid backward function
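A minimal version (names are mine) could be:

import numpy as np

def sigmoid_backward(dA, Z):
    # dZ = dA * g'(Z), where g'(Z) = s * (1 - s) for the sigmoid
    s = 1.0 / (1.0 + np.exp(-Z))
    return dA * s * (1.0 - s)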

Gradient Descent

Loop through each layer and perform the parameter update.

lr is the learning rate.

params is the dictionary holding the W’s and b’s.
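A sketch of the update step, using the grads dictionary returned by back_prop above:

def update_params(params, grads, lr, L):
    # Gradient descent: theta = theta - lr * dtheta, for every W and b
    for l in range(1, L + 1):
        params["W" + str(l)] = params["W" + str(l)] - lr * grads["dW" + str(l)]
        params["b" + str(l)] = params["b" + str(l)] - lr * grads["db" + str(l)]
    return params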

FULL WORKING CODE
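Tying the sketches above together, a minimal illustrative training loop could look like the following; the initialization scheme, layer sizes, learning rate and number of iterations are placeholder choices:

import numpy as np

def init_params(layer_dims):
    # layer_dims, e.g. [n_x, 4, 3, 1] for a three-layer network
    np.random.seed(1)
    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params

def train(X, Y, layer_dims, lr=0.1, num_iters=1000):
    # Uses forward_prop, back_prop and update_params defined above
    L = len(layer_dims) - 1
    params = init_params(layer_dims)
    for i in range(num_iters):
        AL, cache = forward_prop(X, params, L)
        cost = -np.mean(Y * np.log(AL + 1e-8) + (1 - Y) * np.log(1 - AL + 1e-8))
        grads = back_prop(AL, Y, params, cache, L)
        params = update_params(params, grads, lr, L)
        if i % 100 == 0:
            print("Cost after iteration {}: {:.4f}".format(i, cost))
    return params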

Conclusion

This is not an efficient implementation of a neural network, but my intention was to convey an intuitive understanding of the underlying concepts and to show how to translate them into code.

Did you find this article helpful? Did you spot any mistakes? (Quite possibly, since English is not my first language.) Have opinions or comments? Drop them below.

Or reach out to me at deusjeraldy{at}gmail.com

