Neural Network 06 — Deep L-layer Neural Network
So far we have been discussing the basics and the important theoretical concepts behind neural networks using a simple 2-layer architecture. From this lesson onwards, we will dive into the Deep Learning world. Let’s get started.
What is a Deep Neural Network?
The “deep” in a deep neural network refers to the presence of multiple hidden layers between the input layer and the output layer.
- Technically, Logistic Regression is a 1-layer neural network.
- Very deep networks can learn complex patterns that shallower models are often unable to.
Can we predict in advance exactly how deep our neural network should be? 🤔
Nope! It’s very hard to predict.
It’s basically a trial-and-error process at the beginning. We can start from a shallow network like Logistic Regression and keep adding layers until we are satisfied with the results.
Alternatively, we can treat the “number of layers” as a hyperparameter and try a variety of values for it.
Deep Neural Network notations
Let’s consider a 4-layer neural network.
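As a quick reference, here is the commonly used notation for such a 4-layer network:

- L = 4: the number of layers (hidden layers plus the output layer; the input layer is not counted)
- n[l]: the number of units in layer l, with n[0] = nx being the number of input features
- a[l] = g[l](z[l]): the activations of layer l, with a[0] = x and a[L] = ŷ (the prediction)
- w[l], b[l]: the parameters used to compute z[l] from a[l-1]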
Forward Propagation of a deep Neural Network
Let’s consider the following network with 4 layers. (NB: I haven’t shown every connection between neurons in these diagrams. You can consider them as fully connected (dense) layers.)
As we already know, we avoid explicit for-loops over training examples in deep learning. So, a vectorized implementation is crucial.
Vectorized version of forward prop.
However, we do still use a for-loop to iterate through the layers of the network; there is no way to avoid this one.
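To make this concrete, here is a minimal NumPy sketch of the vectorized forward pass. The function and variable names are just illustrative, and I’m assuming ReLU activations in the hidden layers, a sigmoid output, and a `parameters` dictionary holding W1, b1, …, WL, bL:

```python
# A minimal sketch of the vectorized forward pass (assumptions: ReLU hidden
# layers, sigmoid output, and a `parameters` dict holding W1, b1, ..., WL, bL).
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward_propagation(X, parameters):
    """X has shape (n[0], m); returns the output A[L] plus caches for backprop."""
    L = len(parameters) // 2          # each layer contributes one W and one b
    A = X                             # A[0] is the input
    caches = []
    for l in range(1, L + 1):         # the for-loop runs over layers, not examples
        A_prev = A
        W = parameters["W" + str(l)]
        b = parameters["b" + str(l)]
        Z = W @ A_prev + b            # Z[l] = W[l] A[l-1] + b[l]
        A = sigmoid(Z) if l == L else relu(Z)   # A[l] = g[l](Z[l])
        caches.append((A_prev, W, b, Z))
    return A, caches
```

Notice that the loop is over the L layers, not over the m training examples; every matrix operation inside the loop already processes all m examples at once.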
We already know that we are dealing with matrices everywhere in deep learning. So, understanding the dimensions of those matrices is very important.
Getting your matrix dimensions right
For one training example, the dimensions of the parameters are w[l] : (n[l], n[l-1]) and b[l] : (n[l], 1); consequently, z[l] and a[l] have dimensions (n[l], 1).
☝ I recommend referring to the above diagram several times until you understand those relationships really well.
Vectorized implementation
In the vectorized implementation we consider the full dataset of m training examples. The general rule for the dimensions of W[l] and b[l] still applies; the difference is that Z[l] and A[l] now have dimensions (n[l], m), stacking one column per training example instead of just one.
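Here is a quick sanity check of those dimension rules in NumPy (the specific layer sizes are made up for illustration):

```python
# A quick sanity check of the dimension rules, for a 4-layer network with
# made-up layer sizes and m = 100 training examples.
import numpy as np

m = 100
layer_dims = [3, 5, 5, 3, 1]     # n[0]..n[4]; n[0] = 3 input features, n[4] = 1 output unit

for l in range(1, len(layer_dims)):
    W = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    b = np.zeros((layer_dims[l], 1))
    assert W.shape == (layer_dims[l], layer_dims[l - 1])   # W[l] : (n[l], n[l-1])
    assert b.shape == (layer_dims[l], 1)                    # b[l] : (n[l], 1)

# Z[l] and A[l] stack one column per training example:
X = np.random.randn(layer_dims[0], m)                       # A[0] = X : (n[0], m)
W1 = np.random.randn(layer_dims[1], layer_dims[0]) * 0.01
b1 = np.zeros((layer_dims[1], 1))
Z1 = W1 @ X + b1
assert Z1.shape == (layer_dims[1], m)                       # Z[l], A[l] : (n[l], m)
```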
Why does deep learning work well?
Let’s build some intuition about why deep networks work.
The deep learning process flows through a simple -> complex hierarchy. Let’s see what that means using an image processing example.
- At the beginning of the network, the early layers detect simple patterns like the edges of the image. These are basically color/shade differences in the image. (We will learn more about edge detection sooner or later.)
- In the middle layers, the network learns to detect more complex patterns, like different features of the image (e.g. eyes, nose, lips, ears, …).
- And then at the deeper levels of the network, it detects even more complex patterns; in this particular example, those can be entire faces.
Another example is an audio processing task: the early layers might detect simple waveform features, the middle layers phonemes, and the deeper layers whole words or phrases.
So, a deep neural network with multiple hidden layers might be able to have the earlier layers learn these “low-level” simple features and then have the later, deeper layers put together the simpler things it has detected in order to detect more complex things.
Some mathematical functions are also much easier to compute with a deep network: circuit theory suggests there are functions that a relatively small but deep network can compute, while a shallow network would need exponentially more hidden units to compute the same thing.
Building blocks of a deep neural network
In neural networks and deep learning, the learning process consists of three main parts:
1. Forward propagation
2. Back propagation
3. Gradient descent
Let’s consider one layer of a deep neural network and see what each of these steps computes.
We can represent the above calculations in that layer as follows.
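Written out as equations (the standard single-example form, using the notation from above), the two building blocks for a single layer l are:

Forward (input a[l-1], output a[l], cache z[l]):
- z[l] = w[l] a[l-1] + b[l]
- a[l] = g[l](z[l])

Backward (input da[l] and the cached z[l], output da[l-1], dw[l], db[l]):
- dz[l] = da[l] * g[l]'(z[l])   (element-wise product)
- dw[l] = dz[l] a[l-1]ᵀ
- db[l] = dz[l]
- da[l-1] = w[l]ᵀ dz[l]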
For the whole neural network, we simply chain these building blocks together, one per layer:
Forward and Back Propagation implementation
Summary of forward and back propagation:
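As a sketch of what that summary looks like in code, here is a minimal NumPy version of the backward pass that matches the `forward_propagation()` sketch from earlier (ReLU hidden layers, sigmoid output with binary cross-entropy loss; all names are illustrative):

```python
# A minimal sketch of the vectorized backward pass, matching the
# forward_propagation() sketch above (ReLU hidden layers, sigmoid output,
# binary cross-entropy loss; all names are illustrative).
import numpy as np

def relu_backward(dA, Z):
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0                      # derivative of ReLU is 0 where Z <= 0, else 1
    return dZ

def backward_propagation(AL, Y, caches):
    """AL: output of the forward pass, shape (1, m); Y: labels, shape (1, m)."""
    grads = {}
    L = len(caches)
    m = Y.shape[1]

    # With a sigmoid output and cross-entropy loss, dZ[L] simplifies to AL - Y.
    dZ = AL - Y
    for l in reversed(range(1, L + 1)):
        A_prev, W, b, Z = caches[l - 1]
        grads["dW" + str(l)] = (1 / m) * dZ @ A_prev.T                       # dW[l]
        grads["db" + str(l)] = (1 / m) * np.sum(dZ, axis=1, keepdims=True)   # db[l]
        if l > 1:
            dA_prev = W.T @ dZ                               # dA[l-1] = W[l]^T dZ[l]
            _, _, _, Z_prev = caches[l - 2]                  # cache of layer l-1
            dZ = relu_backward(dA_prev, Z_prev)              # dZ[l-1]
    return grads
```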
Parameters and Hyperparameters
- Hyperparameters control how the parameters w and b are learned.
- In other words, the hyperparameters we choose determine the final values of the parameters w and b.
Parameters: w[1], b[1], w[2], b[2], w[3], b[3], …
Hyperparameters:
- Learning rate α
- Number of iterations
- Number of hidden layers L
- Number of hidden units n[1], n[2], …
- Choice of activation function
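To tie it together, here is a rough sketch of where the hyperparameters sit in a training loop. It reuses the `forward_propagation()` and `backward_propagation()` sketches from above, so it is not self-contained, and every name here is illustrative rather than a real API:

```python
# A rough sketch of where the hyperparameters sit in a training loop.
# It reuses forward_propagation() and backward_propagation() from the sketches
# above, so it is not self-contained; every name here is illustrative.
import numpy as np

# Hyperparameters: values we pick and tune ourselves.
learning_rate = 0.01                   # α
num_iterations = 1000
layer_dims = [3, 5, 5, 3, 1]           # L and n[1], n[2], ... are also hyperparameters

def initialize_parameters(layer_dims):
    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params

def train(X, Y):
    # Parameters: the w[l], b[l] values that gradient descent actually learns.
    parameters = initialize_parameters(layer_dims)
    L = len(layer_dims) - 1
    for i in range(num_iterations):
        AL, caches = forward_propagation(X, parameters)
        grads = backward_propagation(AL, Y, caches)
        for l in range(1, L + 1):      # gradient descent update step
            parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
            parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return parameters
```

The hyperparameters (learning_rate, num_iterations, layer_dims, the choice of activation) control how training behaves; the entries of `parameters` are what training actually learns.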
This is the successful completion of another lesson 😎. Hope you enjoyed it. 🙌 See you in the next lesson. Good Luck!!! Keep Learning!!! 👍