Artificial Neural Network (ANN)

Dhruv Pamneja
6 min read · May 5, 2024


A core concept from deep learning that is heavily used in real industry problems is the artificial neural network (ANN). A neural network treats any given input as a set of neurons that pass information on to the next connected layers of neurons, applying certain functions along the way so as to arrive at a final outcome.

This concept is said to be inspired by the neurons and nervous system of human anatomy. Just as our body passes signals to the brain with the help of neurons and nerves, this architecture does the same at a fundamental level.

First, let us try to understand a single-layer neural network. This means that between the input and output, there is only one layer that processes the input and passes it on to the output layer. Generally, the layers between the input and output are called hidden layers, as they only perform certain processing and pass information on to the next layer or neurons. To get an idea, let us look at the image given below:

Single Layer Neural Network

Here, the inputs are passed to the network as data points in the form of x1, x2, x3, …, xn. Each incoming input neuron is then connected by a line to the next layer, which is the hidden layer.

Weights and Bias

Now let us understand what exactly happens between the input and hidden layer and how information is passed through them by looking at the visualisation below:

ANN working structure

Firstly, as we pass any neuron to the next hidden layer, we initialise weights alongside it and pass them to the layer as well. These weights may start out as random or neutral values, as their main purpose is to signal which input is to be processed and with what magnitude.

Again, the weights are initialised in the form of w1, w2, w3, …, wn. To put it simply, these weights inform which neurons are to be activated and pass information on to the next neuron/layer in the network.

Now, there may be a case where the dot product of the two vectors above is zero, such as when the weights are all assigned as zero. To avoid the situation where no input is passed to the next layer, we add a certain bias (b) into each hidden layer where information is passed, which can be interpreted mathematically as an intercept.

Now in the hidden layer, the summation of the weighted input data points is performed and the bias is added alongside. The equation for that is:

z = w1x1 + w2x2 + … + wnxn + b

This can also be viewed as the equation of a line such as y = mx + c, or as the dot product of the input and weight vectors, wT.x + b, where wT implies the transpose of w.
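To make this concrete, here is a minimal sketch of this weighted sum in Python with NumPy; the values chosen for x, w and b are purely illustrative:

```python
import numpy as np

# Illustrative inputs x1..xn, weights w1..wn and bias b
x = np.array([0.5, 1.2, 3.1])   # input data points
w = np.array([0.4, 0.3, 0.2])   # weights, e.g. randomly initialised
b = 0.5                         # bias, the "intercept"

# z = w1*x1 + w2*x2 + ... + wn*xn + b, i.e. the dot product wT.x + b
z = np.dot(w, x) + b
print(z)  # 1.68
```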

Activation Function

Next, we come to the activation function. This activation function basically decides whether or not a certain neuron, which has received information from the previous layer of neurons, should be activated and pass information ahead.

In essence, it makes the decision to act on the incoming data and decides the magnitude with which an action or decision should be taken by the following layers.

For example, let us take the equation of the activation function as follows:

σ(z) = 1 / (1 + e^(−z))

This activation function is known as the sigmoid activation function. Usually, it is used for binary classification tasks with the following logic:

ŷ = 1 if σ(z) > threshold, else ŷ = 0

The given threshold is a user-defined value to segregate the two classes; generally, it is equal to 0.5. Once the activation function gives a value, it passes it on to the next layer as shown above. Of course, the next layer has its own weights initialised and acts on the same logic to give the final output. Note that the threshold value depends on the type of data and the problem we are trying to solve, and can vary accordingly.
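As a rough sketch, the sigmoid and the thresholding logic above can be written in Python as follows (the function names here are illustrative):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real-valued input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def classify(z, threshold=0.5):
    # Binary decision: class 1 if the sigmoid output exceeds the threshold
    return 1 if sigmoid(z) > threshold else 0

print(sigmoid(1.68))   # ~0.843, using z from the earlier sketch
print(classify(1.68))  # 1
```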

To add to the above, the sigmoid function is used to introduce non-linearity into the training process. If there is no non-linear function between the inputs and outputs, the network only represents a linear transformation of the initial data points, which may not be able to predict the correct output.

Sigmoid’s non-linearity allows the network to capture complex relationships between the input features and the output, making the network more dynamic and accurate.
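This claim is easy to check numerically: two stacked linear layers collapse into a single linear map, whereas placing a sigmoid between them does not. A small sketch with random, illustrative matrices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
W1 = rng.normal(size=(4, 3))   # first layer weights
W2 = rng.normal(size=(1, 4))   # second layer weights
x = rng.normal(size=3)

# Two linear layers collapse into one linear map
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))          # True

# A sigmoid in between breaks that collapse
print(np.allclose(W2 @ sigmoid(W1 @ x), (W2 @ W1) @ x))   # False (in general)
```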

Forward Propagation and Loss Function

Now, in the above neural network, we have moved from left to right: the input is fed to the hidden layer and then passed on further to the output layer. This process is called Forward Propagation, as we move the inputs in sequential order to attain the final output.
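To make forward propagation concrete, the sketch below runs a full left-to-right pass through a single-hidden-layer network; the layer sizes and random initialisation are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.5, 1.2, 3.1])             # inputs x1, x2, x3

# Hidden layer: 4 neurons with random weights and zero biases
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
# Output layer: 1 neuron
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

# Forward propagation: weighted sum + activation at each layer
h = sigmoid(W1 @ x + b1)      # hidden layer output
y_hat = sigmoid(W2 @ h + b2)  # predicted value ŷ
print(y_hat)
```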

After this, there is another interesting aspect of this network, called the Loss function. At the end of the output layer, we get a predicted value ŷ, and we then use this function to compare the predicted value with the actual value y. A simple loss function can be as follows:

L(y, ŷ) = (y − ŷ)²

Our main aim is to minimise this loss function and make it as close to 0 as possible, as the values of the loss range from 0 to infinity.
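A quick sketch of this squared-error loss in Python:

```python
def squared_error(y, y_hat):
    # 0 when the prediction matches the actual value; grows towards infinity otherwise
    return (y - y_hat) ** 2

print(squared_error(1.0, 0.84))  # 0.0256
```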

Backward Propagation

Now, to minimise this loss, we use the concept of Backward Propagation. In this, we primarily aim to update the weights which we initialised previously. Using backward propagation, the weights are updated and assigned different values so as to predict the correct output, which in turn minimises the loss. The process of back propagation uses optimisers to ensure that each weight assigned to a connection between neurons is updated, and the process is repeated continuously. A few optimisers are as follows (a small sketch of one such update follows the list):

  • Gradient Descent
  • Adam Optimiser
  • RMSprop
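As a rough illustration using plain gradient descent (the first optimiser above), one training loop for the single sigmoid neuron from the earlier sketches could look like this; the learning rate and values are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 1.2, 3.1])   # inputs
w = np.array([0.4, 0.3, 0.2])   # initial weights
b, y = 0.5, 1.0                 # bias and actual value y
lr = 0.1                        # learning rate

for step in range(100):
    # Forward pass
    z = np.dot(w, x) + b
    y_hat = sigmoid(z)
    loss = (y - y_hat) ** 2

    # Backward pass: chain rule through loss -> sigmoid -> weighted sum
    dloss_dyhat = -2 * (y - y_hat)
    dyhat_dz = y_hat * (1 - y_hat)   # derivative of the sigmoid
    dz = dloss_dyhat * dyhat_dz

    # Gradient descent update on weights and bias
    w -= lr * dz * x
    b -= lr * dz

print(loss)  # the loss shrinks towards 0 as the weights are updated
```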

So to conclude, an ANN consists of interconnected neurons organised into layers. Firstly, inputs are processed through hidden layers using weighted sums and activation functions, with the data moving in a left-to-right flow to produce an output; this process is known as forward propagation. After that, the output is evaluated against an actual value using a loss function, following which we use optimisers to back-propagate and update the weights so as to minimise the loss.

ANNs are the core of higher-level architectures, which will be expanded on ahead. This concept allows us to solve many real-world problems such as image recognition, machine translation, chatbots, and object detection. A high-level view of a single-layer ANN can be as shown below:

ANN — Forward and Backward Propagation

Now, with the same conceptual notion as explained above, we can understand a multi-layer neural network and visualise it as follows:

Multi-Layer Neural Network
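Extending the earlier sketches, a multi-layer forward pass simply repeats the weighted-sum-plus-activation step once per layer. Below is a minimal illustration with two hidden layers (the layer sizes are assumptions for the example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
layer_sizes = [3, 4, 4, 1]  # input, two hidden layers, output

# One (weights, bias) pair per connection between consecutive layers
params = [(rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

a = np.array([0.5, 1.2, 3.1])  # input data points
for W, b in params:
    a = sigmoid(W @ a + b)     # weighted sum + activation, layer by layer
print(a)                       # final predicted output ŷ
```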

You can further read about ANNs in the research paper given here, which acts as an in-depth introduction to neural networks.

Credits

I would like to take the opportunity to thank Krish Naik for his deep learning series on his YouTube channel, which has allowed me to learn and present the above article. You can check out his YouTube channel here. Thanks for reading!

