Forward propagation is an important part of neural networks. It's not as hard as it sounds ;-)
So, to perform gradient descent or cost optimisation, we need to write a cost function that performs several steps.
In this article, we are dealing with step (1): forward propagation.
In figure 1, we can see our network diagram with much of the detail removed. We will focus on one unit in level 2 and one unit in level 3. The same understanding then applies to all units. (PS: one unit is one of the circles in the diagram.)
Our goal in forward prop is to calculate A1, Z2, A2, Z3 & A3
Just so we can visualise the X features, see figure 2. For more info on the data, see part 1.
Initial weights (thetas)
As it turns out, this is quite an important topic for gradient descent. If you have not dealt with gradient descent, then check this article first. We can see above that we need 2 sets of weights (signified by Θ). We often still call these weights theta, and they mean the same thing.
We need one set of thetas for level 2 and a 2nd set for level 3. Each theta is a matrix of size(L) x (size(L-1) + 1), where size counts the units in a level without its bias unit and the extra column holds the weights for the bias unit of the previous level. Thus for above:
- Theta1 = 6x4 matrix
- Theta2 = 7x7 matrix
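As a quick check on those sizes, here is a minimal sketch. The layer sizes (3 input features, 6 hidden units, 7 output units) are inferred from the matrix sizes above, and the variable names are only for illustration.

```matlab
% Layer sizes inferred from the Theta sizes above (assumed, not counting bias units).
layerSizes = [3 6 7];   % input features, hidden units, output units

% Each Theta maps level L-1 (plus its bias unit) to level L.
thetaSizes = zeros(numel(layerSizes) - 1, 2);
for L = 2:numel(layerSizes)
    rows = layerSizes(L);
    cols = layerSizes(L - 1) + 1;   % +1 for the bias column
    thetaSizes(L - 1, :) = [rows cols];
    fprintf('Theta%d is %dx%d\n', L - 1, rows, cols);
end
```

With these assumed layer sizes, the loop reports 6x4 for Theta1 and 7x7 for Theta2, matching the list above.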
We now have to pick some initial thetas as our starting point. Here, epsilon comes to the rescue. Below is the MATLAB code to generate some small random numbers for our initial weights.
function weights = initializeWeights(inSize, outSize)
  % Uniform random values in [-epsilon, epsilon]; the extra column is for the bias unit.
  epsilon = 0.12;
  weights = rand(outSize, 1 + inSize) * 2 * epsilon - epsilon;
end
After running the above function with the sizes for each theta mentioned above, we get some good small random initial values as in figure 3. In figure 1, the weights shown correspond to row 1 of each of the matrices in figure 3.
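To make the initialisation concrete, here is a small sketch that applies the same formula directly, so it runs without the function file. The layer sizes are assumptions inferred from the 6x4 and 7x7 theta sizes, and because the values are random, your numbers will differ from any figure.

```matlab
% Assumed layer sizes, inferred from the 6x4 and 7x7 Theta sizes above.
inputSize = 3; hiddenSize = 6; outputSize = 7;
epsilon = 0.12;

% Same formula as initializeWeights: uniform values in [-epsilon, epsilon].
Theta1 = rand(hiddenSize, 1 + inputSize) * 2 * epsilon - epsilon;
Theta2 = rand(outputSize, 1 + hiddenSize) * 2 * epsilon - epsilon;

disp(size(Theta1));   % rows and columns: 6 and 4
disp(size(Theta2));   % rows and columns: 7 and 7
```

Since rand returns values between 0 and 1, multiplying by 2 * epsilon and subtracting epsilon keeps every weight within epsilon of zero.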
Now that we have our initial weights, we can go ahead and run gradient descent. However, this needs a cost function to calculate the cost and gradients as it goes along. Before we can calculate the cost, we need to perform forward propagation to calculate A1, Z2, A2, Z3 and A3 as per figure 1.
As per figure 1, let's calculate A1. You can see that it's pretty much the X features, with a bias column hard coded to “1” added in front (m is the number of training examples, i.e. the number of rows in X). Here is the MATLAB code to do this:
a1 = [ones(m, 1) X];
This gives you A1 as shown in figure 4. Take special note of the bias column “1” added on the front.
Great, that's A1 done, let's move on to A2. Before we get A2, we first run a hypothesis to calculate Z2. Once you have the hypothesis, you can run it through the sigmoid function to get A2. Again, as per figure 1, add the bias column to the front.
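Here is a tiny sketch of that step on made-up data. The values in X are hypothetical and only there to show the bias column appearing in front.

```matlab
% Hypothetical data: 2 training examples with 3 features each.
X = [5.1 3.5 1.4;
     6.2 2.9 4.3];
m = size(X, 1);           % number of training examples

a1 = [ones(m, 1) X];      % prepend the bias column of ones
disp(a1);
```

Each row of a1 now starts with 1, followed by the original features of that example.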
z2 = a1*Theta1';
a2 = [ones(size(z2, 1), 1) sigmoid(z2)];
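To see what the matrix multiplication does for a single unit, here is a sketch for one training example and one unit in level 2. All the numbers are hypothetical, and the names a1Row and thetaRow are just for illustration.

```matlab
% Hypothetical numbers for one training example and one hidden unit.
a1Row    = [1 0.5 -1.0 2.0];       % bias followed by 3 features
thetaRow = [0.1 -0.2 0.05 0.03];   % that unit's row of Theta1 (bias weight first)

z = a1Row * thetaRow';             % weighted sum: approximately 0.01
a = 1.0 / (1.0 + exp(-z));         % sigmoid of z: the unit's activation
```

The full z2 = a1*Theta1' line simply does this weighted sum for every example and every unit at once.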
You can see the sigmoid function below:
function g = sigmoid(z)
g = 1.0 ./ (1.0 + exp(-z));
Your results will be as per figure 5, taking note of the bias in column 1.
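To get a feel for what the sigmoid function above does, here is a quick check on a few values. It uses an anonymous-function version of the same formula so the snippet runs on its own.

```matlab
% Anonymous-function version of the sigmoid above, so this snippet is self-contained.
sigmoid = @(z) 1.0 ./ (1.0 + exp(-z));

z = [-10 0 10];
g = sigmoid(z);
disp(g);   % close to 0, exactly 0.5, close to 1
```

Whatever the input, the output is squashed into the range between 0 and 1, which is why every value in A2 and A3 (apart from the bias column) lies in that range.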
Ok, so we are almost there. Now on to A3: let's do the same as with A2, but this time we don't need to add the bias column.
z3 = a2*Theta2';
a3 = sigmoid(z3);
You may be asking, “why do we keep Z2 & Z3?” Well, we will need those in back propagation, so we may as well keep them handy ;-).
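Putting the steps together, here is a self-contained sketch of the whole forward pass. The number of examples and the random stand-in data are assumptions. The theta sizes match the 6x4 and 7x7 matrices from earlier.

```matlab
% Hypothetical sizes and data, chosen to match the 6x4 and 7x7 Theta sizes above.
m = 5;                          % number of training examples (assumed)
X = rand(m, 3);                 % 3 input features per example (random stand-in data)
epsilon = 0.12;
Theta1 = rand(6, 4) * 2 * epsilon - epsilon;
Theta2 = rand(7, 7) * 2 * epsilon - epsilon;
sigmoid = @(z) 1.0 ./ (1.0 + exp(-z));

% Forward propagation, level by level.
a1 = [ones(m, 1) X];                        % m x 4
z2 = a1 * Theta1';                          % m x 6
a2 = [ones(size(z2, 1), 1) sigmoid(z2)];    % m x 7
z3 = a2 * Theta2';                          % m x 7
a3 = sigmoid(z3);                           % m x 7, one column per output unit
```

Checking the sizes at each step is a cheap way to catch a missing bias column or a forgotten transpose.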
So, we have learnt the first 2 steps in neural networks, which are to:
- Initialise our weights (thetas)
- Perform forward propagation
You can move on to the next article in my series, which covers back propagation.
BTW, if you are looking for a great course on machine learning, I can highly recommend this course.