Neural Networks In a Nutshell.
I will do my best to give a clear view of how neural networks work, so that by the end of this article you have a solid understanding of the topic.
Reason To Focus On Neural Networks
Machine learning algorithms are now increasingly used to make predictions in areas such as cancer diagnosis and stock prices, and the number of neural network projects is growing rapidly. Neural networks sit at the core of many of the most influential machine learning projects.
Let's Learn About Them
Artificial Neural Networks Are Inspired By Biological Neural Networks.
Just like a biological neural network, an artificial neural network learns from the examples it encounters and updates its knowledge of the environment accordingly.
An artificial neural network is simply a set of mathematical algorithms that work together to perform operations on the input. These operations then produce an output.
These mathematically interconnected formulae are known as an artificial neural network (ANN).
Neural networks can help us model complex relationships in data. A trained network can use what it has learned to make predictions about new data. Neural networks can process images and even make complex decisions, such as how to drive a car or which financial trade to execute next.
Although neural networks can be sophisticated and can solve complex problems, they are often slower to train than many other machine learning algorithms. They can also end up overfitting the training data.
How Does it Work?
Let’s review this neural network
A neural network can contain multiple layers. The artificial neural network shown above has 5 layers:
- One Input layer
- One Output layer
- Three Hidden Layers
Ignoring the Nth input, there are 13 neurons in total:
- 2 input neurons
- 9 hidden neurons — 3 neurons within each hidden layer
- 2 output neurons
This is a feed-forward neural network as the data is flowing in one direction only, from the input layer to the output layer.
- Each neuron is connected to neurons in the adjacent layers via connections that play the role of synapses.
- Each neuron takes in input from one or more neurons, along with weights and a bias, which I will explain in detail later on.
What is a Neuron?
A neuron is a container that holds a mathematical function known as an activation function, its inputs (x1 and x2), a vector of weights (w1, w2) and a bias (b).
A neuron first computes the weighted sum of the inputs and adds the bias: z = w1*x1 + w2*x2 + b.
Think of the activation function as a mathematical operation that transforms this sum into the neuron's output. The output is then passed forward to the neurons in the subsequent layer.
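To make the description of a neuron concrete, here is a minimal sketch in plain Python. The function names, the choice of sigmoid as the activation and the numeric values are my own assumptions for illustration; they are not taken from the figure.

```python
import math


def sigmoid(z):
    """Squash a real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical values for x1, x2, w1, w2 and b
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)  # the weighted sum is 0 here, and sigmoid(0) is 0.5
```

Changing a weight or the bias changes the weighted sum, and therefore the value the neuron passes on.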
Understanding A Neural Network Layer
There will always be an input and an output layer, and there can be zero or more hidden layers in between. The learning of a neural network happens in its layers. The key point is that neurons are placed within layers and each layer has its purpose. The neurons within a given layer all perform the same kind of computation: they calculate the weighted sum of the inputs, add the bias and execute an activation function.
Let's Understand Each Layer Separately
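A layer can be sketched as a list of neurons that all receive the same inputs. The snippet below is only an illustration: `layer_forward`, the sigmoid activation and the weight values are assumptions I made up, not part of any particular library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases, activation=sigmoid):
    """Compute the output of every neuron in one layer.

    weights[j] holds the weights of neuron j and biases[j] is its bias.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


# A hypothetical hidden layer with 3 neurons fed by 2 inputs
hidden = layer_forward(
    inputs=[1.0, 2.0],
    weights=[[0.1, 0.2], [0.0, 0.0], [-0.3, 0.4]],
    biases=[0.0, 0.0, 0.1],
)
print(hidden)  # one value per neuron in the layer
```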
Input Layer
The input layer is responsible for receiving the inputs. There must always be one input layer in a neural network.
The input layer takes in the inputs and passes them on to the subsequent layers; in most networks it does not apply weights or an activation function itself. The output layer, in turn, produces the final results.
Output Layer
The output layer is responsible for producing the final result. There must always be one output layer in a neural network.
The output layer takes in the inputs which are passed in from the layers before it, performs the calculations via its neurons and then the output is computed.
In a complex neural network with multiple hidden layers, the output layer receives inputs from the previous hidden layer.
Hidden Layer
The introduction of hidden layers is what allows neural networks to outperform many other machine learning algorithms on complex problems. Hidden layers sit between the input and output layers, which is the primary reason they are called hidden: they are not visible to external systems and are “private” to the neural network.
There could be zero or more hidden layers in a neural network.
The larger the number of hidden layers, the longer the network takes to produce an output, and the more complex the problems it can potentially solve.
The neurons in a hidden layer simply calculate the weighted sum of their inputs, add the bias and execute an activation function.
Activation Function
An activation function is simply a mathematical function that takes in an input and produces an output. Some activation functions, such as a step function, only activate when the input reaches a specified threshold; others respond smoothly across their whole input range. The input in this instance is the weighted sum plus bias:
Input = (w1*x1 + w2*x2 + ... + wn*xn) + b
Any such thresholds are pre-defined in the function. It is this behavior that lets activation functions add non-linearity to the output, which in turn allows a neural network to solve non-linear problems. Non-linear problems are those where there is no direct linear relationship between the input and output.
To handle these complex scenarios, a number of activation functions have been introduced that can be applied to the inputs.
Let’s review a number of common activation functions. Before we dive into each activation function, have a look at this table. It shows how the values differ for the five well-known activation functions that I will explain in detail.
Each activation function has its own formula which is used to convert the input.
Linear Activation Function:
The linear activation function simply scales an input by a constant factor, implying that there is a linear relationship between the input and the output.
This is the mathematical formula:
Output = Y*X, where X is the input and Y is the constant factor
Sigmoid Activation Function:
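The sigmoid activation function is “S” shaped. It adds non-linearity to the output and returns a value between 0 and 1, which is often rounded to 0 or 1 when a binary decision is needed.
This is the mathematical formula:
Output = 1 / (1 + e^(-X))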
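Here is a small sketch of the sigmoid function, 1 / (1 + e^(-x)), in plain Python. The two-branch form is my own choice so that very large negative inputs do not overflow; it computes the same function.

```python
import math


def sigmoid(x):
    """Sigmoid written so that math.exp never receives a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


for x in (-5.0, 0.0, 5.0):
    print(x, sigmoid(x))  # the output rises smoothly from near 0 to near 1
```

Note that the output is a continuous value, not a hard 0 or 1. A binary decision only appears once you apply a cutoff such as 0.5 to it.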
Tanh Activation Function:
Tanh is closely related to the sigmoid activation function, so it can also be used to add non-linearity to the output. The output is within the range of -1 to 1. The tanh function rescales and shifts the result of the sigmoid activation function:
Output = 2 * sigmoid(2*X) - 1
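The relationship between tanh and sigmoid can be checked numerically. This sketch relies on the identity tanh(x) = 2 * sigmoid(2x) - 1; the helper name `tanh_from_sigmoid` is made up for the illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_sigmoid(x):
    """Rescale sigmoid from the range (0, 1) to the range (-1, 1)."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


for x in (-2.0, 0.0, 0.5, 2.0):
    # Compare against the standard library implementation
    print(x, tanh_from_sigmoid(x), math.tanh(x))
```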
Rectified Linear Unit Activation Function (RELU):
RELU is one of the most used activation functions, and it is often the preferred choice for hidden layers. The concept is very straightforward: negative inputs become 0 and positive inputs pass through unchanged. It also adds non-linearity to the output, but unlike sigmoid and tanh, the result can range from 0 to infinity.
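RELU is short enough to write in one line. A minimal sketch, with input values chosen only for illustration:

```python
def relu(x):
    """Return 0 for negative inputs and the input itself otherwise."""
    return max(0.0, x)


print([relu(x) for x in (-2.0, 0.0, 3.5)])  # prints [0.0, 0.0, 3.5]
```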
Softmax Activation Function:
Softmax is a generalization of the sigmoid activation function to more than two classes. It adds non-linearity to the output and is mainly used for classification problems where the result can belong to one of several classes.
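Softmax normalizes its inputs into values between 0 and 1 that sum to 1, so they can be read as class probabilities. The other activation functions have their own output ranges, as described above.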
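As a rough sketch, softmax exponentiates each score and divides by the total, so the results are positive and sum to 1. Subtracting the largest score first is a common numerical-stability trick that I have added here; the scores themselves are made up.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into values that sum to 1."""
    shift = max(scores)  # subtracting the max does not change the result
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes
probs = softmax([2.0, 1.0, 0.1])
print(probs)       # the largest score gets the largest share
print(sum(probs))  # approximately 1
```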
The weights along with the bias can change the way neural networks operate.
What is Bias?
Bias is simply a constant value (or a constant vector) that is added to the weighted sum of the inputs. Bias is used to offset the result.
The bias shifts the input of the activation function, and therefore its result, towards the positive or negative side.
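The effect of the bias is easiest to see by keeping the input and weight fixed and varying only the bias. This is a small illustration with a sigmoid activation and made-up numbers:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(x, w, b):
    """A single-input neuron: sigmoid of the weighted input plus the bias."""
    return sigmoid(w * x + b)


x, w = 1.0, 1.0
for b in (-2.0, 0.0, 2.0):
    # A larger bias pushes the output up, a smaller one pushes it down
    print(b, neuron_output(x, w, b))
```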
What Are Weights?
The weights are possibly the most important concept in a neural network. When inputs are transmitted between neurons, the weights are applied to the inputs and the result is passed into an activation function along with the bias. The weights essentially reflect how important each input is.
When a neural network is trained on the training set, it is initialized with a set of weights. These weights are then optimized during training, with the aim of finding the weights that minimize the error.
Weights are the coefficients of the equation you are trying to solve. A negative weight means that increasing the input decreases the weighted sum.
Learning Rate:
The learning rate determines the size of the steps by which we update the weights. The lower the learning rate, the longer it will take for the optimization algorithm to reach a minimum of the loss and converge. On the other hand, if the learning rate is too large, the algorithm may overshoot the minimum and never converge. Hence, the right balance is required.
The learning rate is used in the optimization algorithm to update the weights.
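To see the trade-off, here is a toy example that is not a neural network at all: plain gradient descent on the made-up loss (w - 3)^2, whose minimum is at w = 3. The three learning rates are arbitrary values picked to show slow progress, good progress and divergence.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize the toy loss (w - 3)^2 and return the final value of w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)  # derivative of (w - 3)^2
        w -= learning_rate * gradient
    return w


w_small = gradient_descent(learning_rate=0.01)  # still far from 3 after 50 steps
w_good = gradient_descent(learning_rate=0.1)    # very close to 3
w_large = gradient_descent(learning_rate=1.1)   # overshoots further on every step
print(w_small, w_good, w_large)
```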
Epoch:
Epoch is one of the input parameters of the learning algorithm. Think of an epoch as one iteration of a loop over the whole training set. The number of epochs determines how many times the learning algorithm goes through the training set while updating the weights.
If the number of epochs is 1, each sample in the training set is fed into the neural network once to update the weights.
If the number of epochs is 5, there will be 5 such loops over the training set.
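The loop structure can be sketched with a deliberately tiny model, y = w * x, instead of a full network. I am assuming one weight update per sample here; many training setups update once per batch instead. The data and the learning rate are made up.

```python
def train(samples, epochs, learning_rate=0.1):
    """Fit y = w * x, updating the weight once per sample."""
    w = 0.0
    updates = 0
    for _ in range(epochs):      # one epoch = one full pass over the samples
        for x, y in samples:
            error = w * x - y
            w -= learning_rate * 2.0 * error * x  # gradient of (w*x - y)^2
            updates += 1
    return w, updates


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated from y = 2 * x
w1, updates1 = train(samples, epochs=1)
w5, updates5 = train(samples, epochs=5)
print(w1, updates1)  # 3 updates after one epoch
print(w5, updates5)  # 15 updates after five epochs, w closer to 2
```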
Loss And Accuracy:
The loss function is also known as the cost function. It computes the error value.
To be precise, the loss function measures the error for a single training example, and the cost function is the average of the loss over the training examples. This is the function that the optimization algorithm is trying to minimize. There are a large number of loss functions available, such as mean squared error and binary cross-entropy.
The loss function essentially tells the neural network how far its predictions are from the actual values. The optimizer uses this information to adjust the weights and improve the accuracy.
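The two loss functions named above can be sketched in a few lines. The clipping value `eps` and the example predictions are my own assumptions; the clipping only guards against taking the logarithm of 0.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def binary_cross_entropy(actual, predicted, eps=1e-12):
    """Average negative log-likelihood for labels that are 0 or 1."""
    total = 0.0
    for a, p in zip(actual, predicted):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 or 1
        total += -(a * math.log(p) + (1 - a) * math.log(1.0 - p))
    return total / len(actual)


actual = [1, 0, 1]
good = [0.9, 0.1, 0.8]  # predictions close to the labels
bad = [0.4, 0.6, 0.3]   # predictions far from the labels
print(mean_squared_error(actual, good), mean_squared_error(actual, bad))
print(binary_cross_entropy(actual, good), binary_cross_entropy(actual, bad))
```

In both cases the better predictions give the lower value, which is what the optimizer tries to achieve.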
The neural network can then forward propagate the input data.
Forward Propagation:
The forward propagation process is also known as inference when it is used to make predictions. It is the simplest pass through a neural network: it takes in the inputs, processes them and passes them to the subsequent layers, all the way to the neurons of the output layer.
Each neuron applies the weights to the inputs along with the bias and computes the appropriate activation function.
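Forward propagation through several layers is just the single-layer computation repeated, with each layer's output becoming the next layer's input. The sketch below assumes a small 2-3-2 network with sigmoid activations everywhere and weights I invented for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, layers):
    """Pass the inputs through every layer in order.

    layers is a list of (weights, biases) pairs, from the first hidden
    layer to the output layer; weights[j] belongs to neuron j.
    """
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(neuron_weights, activations)) + bias)
            for neuron_weights, bias in zip(weights, biases)
        ]
    return activations


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 2 output neurons
layers = [
    ([[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]], [0.0, 0.1, -0.1]),
    ([[0.5, -0.5, 0.2], [-0.3, 0.8, 0.1]], [0.0, 0.0]),
]
outputs = forward([1.0, 2.0], layers)
print(outputs)  # one value per output neuron
```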
Back Propagation:
Backpropagation takes the difference between the predicted and actual values and uses it to improve the weights.
Firstly, the partial derivative of the error value with respect to each weight is calculated. The derivative, referred to as the gradient, is calculated starting from the last layer. It is then used to calculate the gradients of the previous layer, and the process is repeated until every weight in every layer has a gradient.
Each weight is then updated by subtracting its gradient, multiplied by the learning rate, from the weight so that the error decreases. Note: the error value is the difference between predicted and actual.
This process, as we move backward from the last to the first layer, is known as backpropagation. Dropout can also be applied during training.
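The backward pass is easier to follow on the smallest possible network: one hidden neuron feeding one output neuron, with no biases. Everything in this sketch (the squared-error loss, the sigmoid activations, the starting weights and the learning rate) is an assumption chosen to keep the arithmetic short.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x):
    h = sigmoid(w1 * x)  # hidden neuron
    p = sigmoid(w2 * h)  # output neuron (the prediction)
    return h, p


def loss(w1, w2, x, target):
    _, p = forward(w1, w2, x)
    return (p - target) ** 2


def backward(w1, w2, x, target):
    """Gradients of the loss with respect to w1 and w2 via the chain rule."""
    h, p = forward(w1, w2, x)
    # Last layer first: error signal at the output neuron
    delta_out = 2.0 * (p - target) * p * (1.0 - p)
    grad_w2 = delta_out * h
    # Reuse that signal to get the gradient of the earlier layer
    delta_hidden = delta_out * w2 * h * (1.0 - h)
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2


w1, w2, x, target, learning_rate = 0.5, -0.3, 1.0, 1.0, 0.5
grad_w1, grad_w2 = backward(w1, w2, x, target)
new_w1 = w1 - learning_rate * grad_w1  # step against the gradient
new_w2 = w2 - learning_rate * grad_w2
print(loss(w1, w2, x, target), loss(new_w1, new_w2, x, target))  # the loss goes down
```

Note that the update subtracts the gradient, scaled by the learning rate, from each weight.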
Dropout:
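Dropout randomly sets the outputs of some neurons to zero during training. This prevents the network from relying too heavily on any single neuron, which reduces overfitting and thus tends to improve the predictions of the network on new data.
Hope this article helped.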
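In its most common form, dropout zeroes neuron outputs (activations) rather than the stored weights. Below is a minimal sketch of the so-called inverted variant, which also rescales the surviving values so their expected size stays the same. The function name, the drop probability and the seed are assumptions for the example.

```python
import random


def dropout(activations, drop_probability, rng):
    """Zero each activation with the given probability and rescale the rest."""
    if not 0.0 <= drop_probability < 1.0:
        raise ValueError("drop_probability must be in [0, 1)")
    keep = 1.0 - drop_probability
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10
dropped = dropout(activations, drop_probability=0.5, rng=rng)
print(dropped)  # a mix of 0.0 (dropped) and 2.0 (kept and rescaled)
```

Dropout is only applied while training. When the network is used for prediction, all neurons stay active.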