Artificial Neural Network — A Beginner's Perspective
Introduction:
Are you someone who is fascinated by terms like AI, ML, and deep learning? Then you have probably heard of neural networks (or artificial neural networks). Ask yourself: have you ever come across the word "neurons" before? Most likely yes, in your school Biology lessons. So let's proceed with our discussion from there.
Structure of ANN:
The structure of an ANN (Artificial Neural Network) is modeled on the structure of the biological neuron. Neurons transmit signals to other neurons, and the ANN is designed keeping in mind this model of a human neuron and the way it transmits signals. Now, let's have a look at the structure of an ANN.
To the left you can see an image of an ANN. The things to note are: the first layer is the input layer and the last layer is the output layer. All the layers between the input and output layers are called hidden layers. If the number of hidden layers is three or more, the network is often called a deep network. The arrows shooting out from one layer to another have weights attached to them.
Activation Function
The activation function is the function that determines whether and how the output fires. Look at the image below. This simple model is called a perceptron: it has two inputs and a single output. You can also see that each arrow is associated with a weight. These are called synaptic weights (more commonly just weights) and are chosen randomly in the beginning.
The sum of the inputs multiplied by their corresponding weights is fed into the circle. In this case, Input 1 is multiplied by Weight 1, Input 2 is multiplied by Weight 2, the two products are added, and the result is passed into the circle, which is nothing but the activation function. Mathematically,
Z = (Input1 * Weight1) + (Input2 * Weight2)
Let's say the result is stored in a variable called Z. Based on the value of Z, the output of the activation function is fired. Some commonly used activation functions are listed below (a small code sketch of them follows the list):
- Sigmoid
- ReLU (Rectified Linear Unit)
- tanh
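To make these concrete, here is a minimal Python sketch (assuming NumPy is available, which the article does not specify) that computes the weighted sum Z from the formula above and applies each of the three activation functions to it. The input and weight values are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive values through unchanged, clips negatives to 0
    return np.maximum(0.0, z)

def tanh(z):
    # Squashes any real number into the range (-1, 1)
    return np.tanh(z)

# Weighted sum from the formula above: Z = (Input1*Weight1) + (Input2*Weight2)
inputs = np.array([0.5, -1.2])   # illustrative inputs
weights = np.array([0.8, 0.3])   # chosen randomly in a real network
z = np.dot(inputs, weights)

print(sigmoid(z), relu(z), tanh(z))
```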
Working of an Activation Function
Let's consider the unit step function as the activation function for our discussion. Irrespective of your input, the output is going to be either 1 or 0. Wait!! "Irrespective of your input?" Really? What happens if both inputs are 0? This is where 'bias' comes into play. Let's add a bias of 1 in our case (so that we don't end up at 0). Then we can mathematically represent our model as follows:
Z = (Input1 * Weight1) + (Input2 * Weight2) + bias
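Here is a quick sketch of that perceptron with a unit step activation, following the formula above. The specific weight values are my own illustrative choices, not prescribed by the text:

```python
def unit_step(z):
    # Fires 1 if the weighted sum is positive, otherwise 0
    return 1 if z > 0 else 0

def perceptron(input1, input2, weight1, weight2, bias):
    z = (input1 * weight1) + (input2 * weight2) + bias
    return unit_step(z)

# With both inputs at 0, the bias alone decides the output
print(perceptron(0, 0, weight1=0.4, weight2=0.6, bias=1))  # -> 1
```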
That was a short introduction to the activation function. Now let's see what a cost function is.
Cost function
Now we have an output from our neural network, based on our activation function. But how do we know whether it is right or wrong, or how close we are to the right output? This is where the cost function comes in.
Let's say T is the true value and P is the predicted value. Then the error involved is (T - P). Our job is to reduce this error, and to achieve that we need to choose a proper cost function so that we can reduce the error quickly. One commonly used cost function is the cross-entropy function.
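As a sketch, here is how the raw error (T - P) and the binary cross-entropy cost (one common form of cross entropy) could be computed. The T and P values here are made up for illustration:

```python
import numpy as np

def cross_entropy(true, predicted, eps=1e-12):
    # Binary cross-entropy: penalizes confident wrong predictions heavily
    p = np.clip(predicted, eps, 1 - eps)  # avoid log(0)
    return -np.mean(true * np.log(p) + (1 - true) * np.log(1 - p))

T = np.array([1, 0, 1])        # true values
P = np.array([0.9, 0.2, 0.6])  # predicted values

print(T - P)                # raw error per prediction
print(cross_entropy(T, P))  # single number summarizing how wrong we are
```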
Now we know about the activation function and the cost function, which means we are ready to predict an output and see how far it is from the actual (true) value. So what is left? Learning from the errors is still missing.
Gradient Descent and Back Propagation
Having seen the activation function and the cost function, we will now see how the actual learning takes place inside an ANN. Gradient descent is an optimization algorithm for finding the minimum of a function (and why not let that function be our cost function?).
Remember, our cost function tells us how far we are from the actual result. So we apply gradient descent to our cost function in order to reduce the error. In other words, we apply gradient descent to the cost function to arrive at proper values for the synaptic weights, which were chosen randomly in the beginning.
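In code, one step of gradient descent is just nudging each weight against the slope of the cost. A minimal sketch follows; the learning rate and gradient values are illustrative, with the gradients assumed to come from back propagation (covered next):

```python
def gradient_descent_step(weights, gradients, learning_rate=0.1):
    # Move each weight a small step opposite to its gradient,
    # i.e. downhill on the cost surface
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

weights = [0.8, 0.3]     # randomly chosen starting weights
gradients = [0.5, -0.2]  # d(cost)/d(weight), supplied by back propagation
weights = gradient_descent_step(weights, gradients)
print(weights)  # -> [0.75, 0.32]
```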
Now let's see what back propagation is. Back propagation is used for calculating the error contribution of each neuron after a batch of data is processed. It relies on the chain rule to go back through the network and calculate the errors: back propagation computes the error at the output and distributes it back through the network's layers.
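To tie everything together, here is a toy end-to-end sketch of my own: a single sigmoid neuron trained with gradient descent on an AND gate, where back propagation reduces to one application of the chain rule (for sigmoid plus cross-entropy, the gradient of the cost with respect to Z simplifies to P - T). This is an illustration under those assumptions, not a full multi-layer implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # inputs
T = np.array([0., 0., 0., 1.])                          # true values (AND gate)

weights = rng.normal(size=2)  # chosen randomly in the beginning
bias = 0.0
lr = 0.5

for epoch in range(1000):
    z = X @ weights + bias            # weighted sum plus bias
    P = 1.0 / (1.0 + np.exp(-z))      # sigmoid activation
    error = P - T                     # how far we are from the truth
    # Back propagation via the chain rule: for sigmoid + cross-entropy,
    # d(cost)/d(z) = P - T, so the weight gradients follow directly
    grad_w = X.T @ error / len(X)
    grad_b = error.mean()
    weights -= lr * grad_w            # gradient descent step
    bias -= lr * grad_b

print(np.round(P))  # should approach [0. 0. 0. 1.] after training
```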