Understanding Neural Networks

Srinivas Rao
7 min read · Mar 23, 2020


This article aims to simplify and deepen your understanding of neural networks. It is written for readers anywhere on the spectrum from ‘I have no clue about neural networks’ to ‘I have implemented the backpropagation algorithm’. We will go into some depth while keeping the explanations accessible. Let us begin.

What are neural networks?

Neural networks have become the go-to solution for many industrial problems. Several explanations have been offered for their effectiveness, ranging from a biological perspective to a probabilistic one. Neuroscientists are fascinated by neural networks because of their computational behaviour. The probabilistic narrative frames them as a search for latent variables: variables that are not directly observed but are instead inferred, through a mathematical model (in this case a neural network), from variables that are observed. These perspectives have supported the steady growth of the field, partly because they are not mutually exclusive.

The question that we want an answer to is, “What are neural networks?”. Neural networks are computational models partially inspired by the structure of neurons in the brain. Many neurons are stacked together to form the neural network. Each neuron is a computational unit that performs operations. What operations, you may ask?

Let us take a step back. Neural networks try to find patterns. At the end of the day, the data is converted to 1s and 0s. Computer vision deals with images, which are arrays of RGB numbers. Natural language processing deals with text, which is represented as word vectors so that the meaning of a word can be captured numerically. Neural networks attempt to make sense of these feature vectors. When we say that neurons are computational units, we mean the computation performed on the data to understand it better.
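As a rough, self-contained sketch of what such feature vectors look like (assuming NumPy; the shapes and values below are made up purely for illustration):

```python
import numpy as np

# A toy 4x4 RGB "image": height x width x 3 colour channels, values in [0, 255].
image = np.random.randint(0, 256, size=(4, 4, 3))

# Flatten it into a single one-dimensional feature vector the network can consume.
image_vector = image.reshape(-1) / 255.0   # shape (48,), scaled to [0, 1]

# A toy 3-dimensional "word vector" standing in for one word of text.
word_vector = np.array([0.21, -0.53, 0.88])

print(image_vector.shape, word_vector.shape)   # (48,) (3,)
```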


Neural networks are stacks of neurons connected together. Let us assume we use a neural network for a binary classification problem. The two classes must have some distinction for us to separate them, and those points of distinction define the decision boundary. Decision boundaries are the key: once the computer understands where the decision boundary lies, it can classify easily. We feed each input, say an image, as a one-dimensional vector to the neural network. Through its many layers of neurons, the network transforms the input through representations of varying dimensionality and finds a suitable decision boundary. The dimensionality of these layers is one of the few parameters we can control; understanding what the network has actually learned, however, remains an open problem. The question we ask ourselves after finishing an article, “how do I know that I have understood it?”, cannot be put to a neural network. There are several model evaluation strategies, but they reveal very little information relative to the size of the dataset at hand.
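To make the idea of a decision boundary concrete, here is a minimal sketch of the simplest possible case: a single linear unit whose boundary is the hyperplane w.x + b = 0. The weights below are illustrative placeholders, not learned values.

```python
import numpy as np

# A single neuron's decision rule: which side of the hyperplane w.x + b = 0
# does the input lie on? These weights are illustrative, not learned.
w = np.array([1.5, -2.0])
b = 0.25

def classify(x):
    return 1 if np.dot(w, x) + b > 0 else 0

print(classify(np.array([1.0, 0.2])))   # positive side of the boundary: class 1
print(classify(np.array([0.0, 1.0])))   # negative side of the boundary: class 0
```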

Why neural networks?

The end goal we are aiming at is generalisation. Can the model look at a banana that has blackened a bit and still recognise it as a banana? Traditional computational methods saturate in terms of generalisation. Neural networks, by contrast, possess this ability to generalise, which makes them the better choice. If they are so good and have existed for over 50 years, why did we not make use of them earlier?

The answer lies in the paragraph above. To generalise over the massive number of variations possible, the network first needs to see that such variations exist. Therefore, we need huge datasets that cover the majority of the possible variations.

Next comes computational power. Only in the past decade has the cloud democratised computing power. Before the onset of cloud computing, such computational ventures were limited to research groups at universities with plenty of funding. Since all of these pieces have now fallen into place, the rise of neural networks today appears more as a consequence than as a choice.

Mathematics in neural networks

Let us understand how neural networks work. A working knowledge of linear algebra is required to follow this section.

Neural networks work on the input data and try to come up with a decision boundary. Take a simple linear programming problem, for instance. Given the constraints, how do we minimise the cost? We write equations for the constraints and plot them on a graph, which gives us a decision boundary. This works out well here because the number of variables involved is small. As the number of variables grows, the dimensionality of the decision boundary grows with it. Therefore, for neural networks operating on high-dimensional data, it is difficult to understand these boundaries visually. That is where mathematical knowledge helps us.
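As a small, hedged illustration of the linear programming analogy, the sketch below solves a made-up two-variable problem with SciPy's linprog; the cost vector and constraint are invented purely for illustration.

```python
from scipy.optimize import linprog

# Minimise cost = 2*x1 + 3*x2 subject to x1 + x2 >= 4 and x1, x2 >= 0.
# linprog expects "<=" constraints, so x1 + x2 >= 4 becomes -x1 - x2 <= -4.
cost = [2, 3]
A_ub = [[-1, -1]]
b_ub = [-4]

result = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(result.x, result.fun)   # the optimum lies on the constraint boundary x1 + x2 = 4
```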

Source: https://www.youtube.com/watch?v=gbL3vYq3cPk

Forward Pass

Let our input be X. We define a weight matrix W that holds the weights of the layer. During training, these weights get updated.

First, we multiply the input X with the weight matrix W to get X.W, where the dot denotes the matrix product. We then pass this result through an activation function, which serves several purposes. Let us understand the importance of activation functions.

  1. Squashes the numbers into a bounded range, which keeps the computation manageable.
  2. Helps the network learn faster and more efficiently.
  3. Ensures non-linear decision boundaries for neural networks.

Activation functions take the product X.W as input and produce a squashed output. Let us consider a few examples of activation functions.

  1. Sigmoid function: f(x) = 1 / (1 + e^(-x))
  2. Tanh function: f(x) = tanh(x) = 2 / (1 + e^(-2x)) - 1
  3. Rectified Linear Unit (ReLU): f(x) = max(0, x)

Activation functions introduce non-linearity. The entire process of training is about finding the decision boundary, so we must understand the power and importance of these functions. Choosing a poorly suited activation function can contribute to over-fitting or even under-fitting.
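Here is a minimal NumPy sketch of the three activation functions listed above, applied element-wise to an example pre-activation vector:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes values into (-1, 1)

def relu(z):
    return np.maximum(0, z)           # zeroes out negative values

z = np.array([-2.0, 0.0, 3.0])        # an example pre-activation vector
print(sigmoid(z))
print(tanh(z))
print(relu(z))
```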

During the forward pass, the output of the first layer becomes the input to the second layer, and so on until the last layer. The output of the last layer is used to compute the loss function. The loss function, also known as the error function, tells us how accurate the model is. Since this is a case of supervised learning, we can compare the output the model produces for an input with the actual corresponding output. The difference between the actual and the predicted values tells us how far off the model is. This error needs to be minimised to increase the accuracy of the model. Let us look at the backpropagation algorithm in the next section to understand how the weights are updated.
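Putting these pieces together, below is a minimal sketch of a forward pass through two layers followed by a loss computation. The layer sizes, random weights, and the choice of mean squared error are illustrative assumptions, not prescriptions from this article.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative shapes: 4 input features, 3 hidden units, 1 output unit.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 1))

def forward(X):
    h = np.maximum(0, X @ W1)                  # layer 1: X.W1 passed through ReLU
    y_hat = 1.0 / (1.0 + np.exp(-(h @ W2)))    # layer 2: sigmoid output
    return y_hat

X = rng.normal(size=(5, 4))                    # 5 example inputs
y = rng.integers(0, 2, size=(5, 1))            # 5 example labels (0 or 1)

loss = np.mean((forward(X) - y) ** 2)          # mean squared error loss
print(loss)
```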

All these computations take place during the forward pass. During the backward pass, we use the loss to work out which weights need to change and by how much. This is done with the help of the backpropagation algorithm. Let us understand this algorithm in detail.

Backward Pass

Backpropagation algorithm: this is the algorithm used to train neural networks. A network is initialised either with a random set of weights or with weights chosen according to a specific scheme; He initialisation is one example. The backpropagation algorithm then adjusts the weight matrix so that the error function is minimised. Minimising the error function results in higher accuracy and, therefore, better generalisation.

Why use backpropagation and not trial and error to find the global minimum of the error function?

Backpropagation is a highly efficient algorithm that searches for the optimal weights using the gradient descent technique. Its approach is accurate and works efficiently even in environments with limited computational resources.
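As a hedged sketch of what backpropagation and gradient descent look like in the simplest case, the snippet below trains a single sigmoid neuron on made-up data, deriving the gradient of a (halved) mean squared error loss by the chain rule. Every value in it is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))           # 5 examples with 3 features each
y = rng.integers(0, 2, size=(5, 1))   # binary targets
w = rng.normal(size=(3, 1))           # weights to be learned
lr = 0.1                              # learning rate

for step in range(100):
    z = X @ w
    y_hat = 1.0 / (1.0 + np.exp(-z))                 # forward pass (sigmoid neuron)
    # Backward pass: gradient of the loss 0.5 * mean((y_hat - y)^2) w.r.t. w,
    # obtained with the chain rule through the sigmoid.
    grad = X.T @ ((y_hat - y) * y_hat * (1 - y_hat)) / len(X)
    w -= lr * grad                                   # gradient descent update

print(w.ravel())                                     # the adjusted weights
```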

Also Read: Things to Know Before Deep Diving into Neural Networks

Types of neural networks

  1. Convolutional Neural Networks
  2. Recurrent Neural Networks
  3. Long Short Term Memory (LSTM)
  4. Radial Basis functions

Also read: Types of Neural Networks- Comprehensive

1. Convolutional Neural Networks: also known as CNNs, these are massively popular in the field of computer vision. These networks are good at understanding spatial information and can analyse patterns. They changed the way we approach deep learning. These are deep networks aimed at classifying hundreds to thousands of classes, often with more than 98 per cent accuracy. The reason these networks work so well is their robust feature extractors: early layers extract low-level features such as edges and corners, and layers towards the end of the network identify high-level features such as shapes, sizes, texture, and colour.
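As a minimal sketch of what a small CNN might look like in code (assuming PyTorch is available; the layer sizes and the 10-class output are illustrative choices, not taken from this article):

```python
import torch
import torch.nn as nn

# A tiny convolutional classifier for 3-channel 32x32 images (illustrative sizes).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level feature extractor
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),   # higher-level features
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                     # scores for 10 classes
)

x = torch.randn(1, 3, 32, 32)                      # one fake image
print(model(x).shape)                              # torch.Size([1, 10])
```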

Also Read: Convolutional Neural Network Model Architectures & Applications

2. Recurrent Neural Networks: RNNs have a feedback loop in them, which adds an element of memory to the network. Unlike CNNs, RNNs can remember what they saw a few steps earlier. This means they can be used in applications where context matters, such as natural language applications, and it makes them suitable for tasks like unsegmented, connected handwriting recognition or speech recognition. Unfortunately, these networks suffer from a problem called the vanishing gradient. As the number of layers (or time steps) increases, training effectively stalls: the backpropagation algorithm can no longer propagate the gradient information all the way back to the beginning of the network, so the weight updates shrink towards zero. When the weight matrix stops being updated, the results stop changing, and performance stagnates.
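A minimal sketch of a recurrent layer processing a short sequence, again assuming PyTorch; every dimension below is an illustrative choice:

```python
import torch
import torch.nn as nn

# An RNN that reads a sequence of 8-dimensional vectors and keeps a 16-dim hidden state.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(1, 5, 8)           # a batch of 1 sequence with 5 time steps
output, hidden = rnn(x)            # the hidden state carries memory across the steps
print(output.shape, hidden.shape)  # torch.Size([1, 5, 16]) torch.Size([1, 1, 16])
```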

3. Long Short Term Memory: LSTMs are the standard solution to the vanishing gradient problem in RNNs. They are widely used in time series analysis, text prediction, natural language understanding, and similar applications. The mathematics behind LSTMs is somewhat advanced and beyond the scope of this article, so I recommend that the reader go through other material on the internet.

4. Radial Basis Functions: these networks work on the principle of projecting lower-dimensional data into a higher-dimensional space, thereby increasing the likelihood of finding a separating decision boundary. An RBF network is a single-hidden-layer neural network that uses kernel functions to project data into higher-dimensional spaces.
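A minimal sketch of the kernel idea behind RBF networks: each input is re-described by its Gaussian similarity to a set of centres, which lifts it into a higher-dimensional feature space. The centres and width below are illustrative placeholders.

```python
import numpy as np

def rbf_features(X, centres, gamma=1.0):
    """Map each row of X to its Gaussian similarity to every centre."""
    # Squared Euclidean distance from every point to every centre.
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

X = np.array([[0.0, 0.0], [1.0, 1.0]])                     # 2 points in 2-D
centres = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # 3 illustrative centres

print(rbf_features(X, centres).shape)                      # (2, 3): now 3-dimensional features
```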

In this article, we have covered an extensive array of topics. We hope you have got the most value out of going through them. We wish you the best of luck in your endeavours in machine learning and artificial intelligence.
