Understanding the basics of Neural Networks (for beginners)

Let’s understand the magic behind neural networks: Hidden Layers, Activation Functions, Feed Forward and Back Propagation!

Indraneel Dutta Baruah
Geek Culture


Anyone working in the field of AI-driven data analytics should know about neural networks, as they are the building blocks of the exciting field of deep learning. Although the idea of artificial neural networks (ANN) has been around since the 1940s (when Warren McCulloch and Walter Pitts created the first computational model for neural networks), the recent surge in the availability of data and computational power has put neural networks and their variants back in the spotlight.

There are a number of success stories associated with neural networks, and the high levels of accuracy observed with these models are exciting news for aspiring data scientists. But I have also seen many of them get trapped in the race to use the latest and most complicated deep learning models without knowing how they work. The problem with this approach is that they cannot understand or explain the model’s output, which handicaps their ability to tweak the model effectively when the default version fails.

This motivated me to write a series of blogs on how neural networks work, what their key parameters are and how to train them effectively. I will be adding detailed Python code (using both the Keras and PyTorch libraries) to help readers implement these approaches in a separate blog series.

We will start by understanding how a neural network works. The best way to begin is to draw parallels with a biological neural network in an animal brain. That is why each individual unit in a neural network is called a neuron and the links between them are called connections (see image 1 below). These models “learn” to perform tasks (like regression or classification) by considering the relationship between input variables and the target variable, without any predefined rules. There are three key concepts involved in the workings of a neural network:

  1. Layers (Input, hidden and output)
  2. Feed Forward
  3. Backpropagation

Layers (Input, hidden and output)

Image 1: Neural Network Structure

There are three types of layers in neural networks:

  1. Input layer: It takes in the input data for the neural network. It doesn’t apply any computations on the input values and simply passes on the values to the next layer. In the network in image 1, we have 4 input values: x1, x2, x3 and x4.
  2. Hidden layers: These are intermediate layers between input and output layer where all the magic happens! There are a few technical jargons we need to know at this step:
    a. Weights: Each connection between neurons is associated with a weight, which represents the strength of that connection. A weight close to zero means changing the corresponding input will barely change the neuron’s output, while a negative weight means increasing the input will decrease the output.
    b. Activation function: Each neuron in the hidden and output layers takes the output from the previous layer as input and applies a function to the weighted sum of its inputs. These functions are called activation functions. We need to be very careful while selecting activation functions (more details on how to do so can be found in this blog); a short sketch of two common choices follows this list.
    c. Deep Network: If a neural network has at least two hidden layers, it is called a deep neural network.
    d. Dense Layers: Each neuron in these layers receives input from all neurons of the previous layer.
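
To make the idea of activation functions concrete, here is a minimal sketch of two common choices, sigmoid and ReLU, written in plain NumPy (the example inputs are illustrative assumptions):

    import numpy as np

    def sigmoid(z):
        # Squashes any real-valued input into the (0, 1) range
        return 1.0 / (1.0 + np.exp(-z))

    def relu(z):
        # Keeps positive inputs as-is and zeroes out negative ones
        return np.maximum(0.0, z)

    print(sigmoid(0.0))  # 0.5
    print(relu(-2.0))    # 0.0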

Let’s try to understand the hidden layers better, as most of the computations happen there. In the network in image 1, we have 2 hidden layers with 6 neurons each. Focusing on the first hidden layer, we can observe that each of its neurons gets its inputs from all 4 neurons in the input layer. How do we check this? Just count the connections from each neuron in the input layer to each neuron in the first hidden layer.
We can then conclude that there will be 4 weights associated with each neuron in the first hidden layer (5 if a bias term is present). Assuming that the weights for the first neuron in the first hidden layer are w1, w2, w3 and w4, the computation performed by the neuron will look like:

output = F((w1*x1) + (w2*x2) + (w3*x3) + (w4*x4))

Similar to the first neuron in the first hidden layer, the other neurons in the layer have their own weights and activation functions. The output of the first hidden layer becomes the input for the neurons in the second hidden layer. We can observe that all the hidden layers are dense, so there will be 6 weights associated with each neuron in the second hidden layer. Finally, the output of the second hidden layer becomes the input for the output layer. (A minimal sketch of a single neuron’s computation appears right after this list.)

  3. Output layer: It generates the final output using the output of the final hidden layer as its input. In the network in image 1, the final hidden layer produces 6 outputs from its 6 neurons. Thus, there will be 6 weights associated with each neuron in the output layer. It is important to remember that we have to select the appropriate number of neurons and the appropriate activation function in the output layer based on the task (regression vs classification). In our case, we are doing multi-class classification with three classes, so we can use 3 neurons and probably a softmax activation function.
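
Here is a minimal sketch of the single-neuron computation above, using sigmoid as the activation function F; the input values, weights and bias are illustrative assumptions:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.0, 2.0, 0.1])  # the four inputs x1..x4
    w = np.array([0.2, 0.4, -0.3, 0.8])  # the four weights w1..w4
    b = 0.1                              # optional bias term

    # output = F((w1*x1) + (w2*x2) + (w3*x3) + (w4*x4) + b)
    output = sigmoid(np.dot(w, x) + b)
    print(output)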

All this math can be represented concisely using matrix notation if one is comfortable with matrix operations; the sketch below shows the full forward pass in that form. Choosing the right architecture for the neural network can make all the difference in its performance, and we discuss that in detail in this blog.
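
As a minimal sketch (not a trained model), the whole 4 → 6 → 6 → 3 network from image 1 can be written as a few matrix multiplications; the random weight initialisation and ReLU hidden activations are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def relu(z):
        return np.maximum(0.0, z)

    def softmax(z):
        e = np.exp(z - np.max(z))  # subtract the max for numerical stability
        return e / e.sum()

    # Randomly initialised weights and biases for a 4 -> 6 -> 6 -> 3 network
    W1, b1 = rng.normal(size=(6, 4)), np.zeros(6)
    W2, b2 = rng.normal(size=(6, 6)), np.zeros(6)
    W3, b3 = rng.normal(size=(3, 6)), np.zeros(3)

    x = np.array([0.5, -1.0, 2.0, 0.1])  # the four input values x1..x4
    h1 = relu(W1 @ x + b1)               # first hidden layer (6 neurons)
    h2 = relu(W2 @ h1 + b2)              # second hidden layer (6 neurons)
    y_hat = softmax(W3 @ h2 + b3)        # output layer (3 class probabilities)
    print(y_hat, y_hat.sum())            # probabilities summing to 1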

Feed Forward

Image 2: Feed Forward Neural Network for Classification (Courtesy: Alteryx.com)

Forward propagation is the process of computing the neural network’s output for a given input. It essentially refers to the process described in the previous section, and the output is called the predicted value. Given its inputs from the input layer, each neuron in the first hidden layer computes the transformation z = W*x + b (W being the matrix of weights, x the input vector and b the bias vector) and then applies an activation function to the transformation. As shown in image 2, this process repeats for the subsequent layers, and the output layer finally generates a prediction.

This predicted value can then be compared with the actual target variable to see how good or bad the prediction was. The difference between the actual and predicted values is called the error, and we can define a loss function to calculate this error (a minimal sketch of one common choice follows). Based on the task at hand, there are various loss functions, and we will be discussing the optimal loss function for different scenarios in the next blog. The neural network then tries to find the optimal weights by minimising the loss function using backpropagation (which we discuss next).
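
For the three-class classification task above, one natural loss function is categorical cross-entropy. Here is a minimal sketch, where the one-hot encoded targets and predicted probabilities are illustrative assumptions:

    import numpy as np

    def cross_entropy(y_true, y_pred, eps=1e-12):
        # Average negative log-probability assigned to the true classes
        y_pred = np.clip(y_pred, eps, 1.0 - eps)
        return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

    # One-hot encoded actual classes vs predicted probabilities from feed forward
    y_true = np.array([[1, 0, 0], [0, 0, 1]])
    y_pred = np.array([[0.7, 0.2, 0.1], [0.2, 0.3, 0.5]])
    print(cross_entropy(y_true, y_pred))  # lower is better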

Backpropagation

Image 3: Finding the minimum of loss function

As mentioned previously, feed forward gives us an output value, which is the predicted value. We calculate how good or bad the prediction was by using a loss function that measures the difference between the predicted and actual values. Then we try to identify the values of the weights, across all the neurons, that minimise this loss function, using an optimiser like gradient descent (see image 3). This process is called backpropagation. There are multiple optimisers available to perform backpropagation, and we will discuss the pros and cons of the popular ones in one of my future blogs. The model builder can decide how many training iterations to run (called epochs), and at each iteration the backpropagation process keeps updating the weights based on the optimiser selected.

Thus, the backpropagation algorithm allows information to flow backward from the output layer through the network in order to compute the optimal changes in weights. It has the following steps (a minimal sketch of the full loop follows the list):

  1. Calculate the predicted value using the feed forward mechanism
  2. Calculate the error using the loss function
  3. Based on the optimiser selected, identify the direction and magnitude by which the weights should be updated to minimise the loss function (usually this step involves calculating derivatives)
  4. Multiply the optimiser’s output by a fraction (called the learning rate), as we don’t want rapid changes to the weights, which might prevent us from finding the minimum of the loss function
  5. Adjust the weights by adding the output from step 4 to the original weights
  6. Repeat steps 1–5 for the number of epochs set by the model developer
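
Here is a minimal sketch of this loop using plain gradient descent on a single linear neuron with a mean squared error loss; the toy data, learning rate and epoch count are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: the target is a known linear function of 4 inputs
    X = rng.normal(size=(100, 4))
    true_w = np.array([0.2, 0.4, -0.3, 0.8])
    y = X @ true_w

    w = np.zeros(4)      # start from arbitrary weights
    learning_rate = 0.1  # the fraction from step 4
    epochs = 50

    for epoch in range(epochs):          # step 6: repeat for each epoch
        y_pred = X @ w                   # step 1: feed forward
        error = y_pred - y
        loss = np.mean(error ** 2)       # step 2: mean squared error loss
        grad = 2 * X.T @ error / len(X)  # step 3: derivative of the loss w.r.t. w
        w -= learning_rate * grad        # steps 4-5: scaled weight update

    print(w)  # should be close to true_w after training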

Conclusion

The aim of this blog was to help readers who are starting to explore deep learning understand how neural networks work. We learnt the following:

  1. What are the different kinds of layers in a neural network and how do they work?
  2. What are activation functions, weights, dense layers and deep networks?
  3. What is feed forward and how does it work?
  4. What are the steps involved in backpropagation? Why do we need it?

The next step is to understand what options a developer has while selecting key components like activation and loss functions and how to pick the right one. Please follow this blog to learn more on this in the next parts!

Do you have any questions or suggestions about this blog? Please feel free to drop in a note.

Thank you for reading!

If you, like me, are passionate about AI, Data Science, or Economics, please feel free to add/follow me on LinkedIn, Github and Medium.
