# Introduction to Neural Networks — Part 1

Neural Networks have become a huge hit in the recent Machine Learning craze because, in many cases, they perform significantly better than traditional Machine Learning algorithms. The art and science of **Deep Learning** is built on the foundation of Neural Networks and how they work. Hence, demystifying Neural Networks is the first step in demystifying Deep Learning. Let’s dive in!

# What is a Neural Network?

How do we define a Neural Network? It is essentially a naive model of how our brains might work. It’s not a very accurate representation, but it tries to replicate some of the methods our brain uses to learn from its mistakes. Let’s look at how our brains work from a simplified perspective and then compare that with a Neural Network.

The brain is essentially a bunch of neurons connected to each other in a huge interconnected network. There are a lot of neurons and even more connections. These neurons pass a small amount of electrical charge to each other as a way to transmit information. Another important feature of these neural connections is that the connection between two neurons can vary between **strong** and **weak**. A strong connection allows more charge to flow between them and a weak one allows less. A neuron pathway which frequently transmits charge will eventually become a **strong pathway**.

Now suppose the brain takes input from an external source; for example, we touch a hot pan. The nerves in our hand transmit information to certain neurons in our brain. There is a pathway from these neurons to the neurons which control our hand. In this case our brain has **learnt** that the best option is to move our hand away from the pan ASAP. Hence this particular **pathway** between the neurons taking input from the hand and the neurons controlling the hand will be **strong**. This is a highly simplified explanation and doesn’t fully portray what’s going on, but it’s a good enough example to explain the basic concept.

Now let’s understand how a Neural Network is represented. A neural network consists of many **nodes** (neurons) arranged in **layers**. Each layer can have any number of nodes and a neural network can have any number of layers. Let’s have a closer look at a couple of layers.

There are many interconnections between the two layers. Each node in the first layer is connected to each and every node in the second layer. Each of these connections carries a value called a **weight**, and we will now go through how exactly they work.

Here we take the example of what’s going on with a single node in the network. The formula for calculating the value of the node is:

$$Y = f\left(\sum_{i=1}^{3} W_i X_i + B\right)$$

Y is the final value of the output neuron.

W represents the weights between the nodes in the previous layer and the output neuron. Here there are three of them (i = 1, 2, 3).

X represents the values of the nodes of the previous layer.

B represents the bias, which is an additional value present for each neuron. Bias is essentially a weight without an input term. It gives the neuron an extra bit of adjustability which is not dependent on the previous layer.

f() is called an activation function, and it is something we, as the neural network designers, choose. We will go through its importance later.
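The computation for a single node, as defined by the terms above, can be sketched in a few lines of Python. This is only a minimal illustration: the function names `neuron_output` and `relu` and all the numbers are made up for the example, and ReLU simply stands in for whatever activation function f() the designer picks.

```python
def relu(z):
    """Placeholder activation f(): returns z for z >= 0, otherwise 0."""
    return max(0.0, z)


def neuron_output(weights, inputs, bias, activation):
    """Compute Y = f(W1*X1 + W2*X2 + ... + B) for a single node."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum + bias)


# Illustrative values only: three inputs, three weights, one bias.
X = [1.0, 2.0, 3.0]
W = [0.5, -1.0, 1.0]
B = 0.5

Y = neuron_output(W, X, B, relu)
print(Y)  # 2.0, since 0.5*1 - 1*2 + 1*3 + 0.5 = 2.0 and relu(2.0) = 2.0
```

Passing the activation in as an argument keeps the weighted-sum logic separate from the choice of f().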

Now let’s look at a complete neural network.

This is a small neural network of four layers. The input layer is where we feed our external stimulus, that is, the input features of the training data. The output layer is where we get the target variable, the value we are trying to predict. When we feed the inputs into the first layer, the values of the nodes are calculated layer by layer using the formula above until we get the final value at the output neuron. That is how we get an **output** from a neural network.
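The layer-by-layer calculation can be sketched as a loop that feeds each layer's values into the next. The network shape (2 inputs, then 3 nodes, then 2 nodes, then 1 output), the weights, and the helper names below are all hypothetical, chosen only so the arithmetic is easy to follow by hand. For simplicity the same activation is applied at every layer, including the output.

```python
def relu(z):
    """Example activation: returns z for z >= 0, otherwise 0."""
    return max(0.0, z)


def layer_forward(inputs, weights, biases, activation):
    """Compute every node of one layer from the previous layer's values.

    weights[j] holds the weights feeding node j; biases[j] is its bias.
    """
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers, activation):
    """Feed the inputs through the layers one after another."""
    values = inputs
    for weights, biases in layers:
        values = layer_forward(values, weights, biases, activation)
    return values


# Hypothetical four-layer network: 2 inputs -> 3 nodes -> 2 nodes -> 1 output.
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -1.0]),
    ([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.5]),
]

output = forward([2.0, 1.0], layers, relu)
print(output)  # [0.5]
```

Tracing it by hand: the first computed layer gives [2, 1, 2], the next gives [3, 3], and the output node gives 3 - 3 + 0.5 = 0.5.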

# Why do we need an Activation Function?

Even though our neural network has a very complex configuration of weights, it will not be able to model most real-world problems without the activation function. The reason for this lies in the concept of **non-linearity**.

Let’s revise what linearity and non-linearity mean.

Consider the equation:

$$Y = W_1 X_1 + W_2 X_2$$

This represents a **linear relationship** between Y and X1, X2. Regardless of what values W1 and W2 have, a change in the value of X1 or X2 will result in a proportional, **linear** change in Y. If we look at real-world data, we realize this is often not good enough, because data frequently has **non-linear** relationships between the input and output variables.
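To make "linear change" concrete, here is a small numeric check for Y = W1*X1 + W2*X2. The weights W1 = 2 and W2 = -3 and the function name `linear` are assumptions for illustration only. Increasing X1 by 1 changes Y by exactly W1, no matter where we start.

```python
def linear(x1, x2, w1=2.0, w2=-3.0):
    """Y = W1*X1 + W2*X2 with made-up weights."""
    return w1 * x1 + w2 * x2


# Hold X2 fixed and bump X1 by 1 from three very different starting points.
steps = [linear(x1 + 1.0, 5.0) - linear(x1, 5.0) for x1 in [0.0, 10.0, 100.0]]
print(steps)  # [2.0, 2.0, 2.0]
```

A non-linear relationship would not have this property: the same step in X1 could change Y by different amounts depending on the starting point.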

The above diagram represents a typical dataset which shows a non-linear relationship between X and Y. If we try to fit a linear relationship to the data, we end up with the **red line**, which is not a very accurate representation of the data. However, if our relationship can be **non-linear**, we are able to get the green line, which fits much better.

Now let’s compare the neural network equation **with and without the activation function.**

Without the activation function, the value of a node is $Y = \sum_i W_i X_i + B$, so there is only a **linear relationship** between the input and the output. Stacking more such layers does not help, because a linear function of a linear function is still linear. With the activation function, the equation becomes $Y = f\left(\sum_i W_i X_i + B\right)$, and the relationship between input and output can be non-linear, provided the activation function is **itself non-linear**. Hence all we have to do is use a non-linear function as the activation function for each neuron, and our neural network becomes **capable** of fitting non-linear data.
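The difference can be seen in a tiny hand-built network. The sketch below assumes a hypothetical network with one input, two hidden nodes, and one output, with weights picked purely for illustration. It runs the same network twice: once with no activation (the identity function) and once with ReLU, a common non-linear activation.

```python
def identity(z):
    """No activation: the node just passes its weighted sum through."""
    return z


def relu(z):
    """A non-linear activation: returns z for z >= 0, otherwise 0."""
    return max(0.0, z)


def tiny_network(x, activation):
    """One input -> two hidden nodes -> one output (weights are made up)."""
    h1 = activation(1.0 * x + 0.0)
    h2 = activation(-1.0 * x + 0.0)
    return 1.0 * h1 + 0.5 * h2 + 0.0


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
without_activation = [tiny_network(x, identity) for x in xs]
with_relu = [tiny_network(x, relu) for x in xs]

print(without_activation)  # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(with_relu)           # [1.0, 0.5, 0.0, 1.0, 2.0]
```

Without an activation the two layers collapse into the single straight line Y = 0.5 * X. With ReLU the same weights produce a bent, V-like shape that no single straight line can reproduce.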

Let’s look at a couple of popular activation functions:

**ReLU:** ReLU stands for Rectified Linear Unit. It is the identity function (y = x) when x ≥ 0 and outputs 0 when x < 0. This is a very widely used activation function because it’s non-linear and very simple.

**Sigmoid:** Sigmoid is a function bounded between 0 and 1. It approaches 0 for values which are very negative and 1 for values which are very positive. Hence this function *squishes* values which are very high or very low into the range between 0 and 1. This is sometimes useful in neural networks to ensure values aren’t extremely high or low. This function is usually used at the last layer when the target is binary (0 or 1), since its output can be read as a probability.
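Both activation functions are short enough to write out directly. The sketch below uses only Python’s standard `math` module. The sigmoid is taken to be the standard logistic function 1 / (1 + e^(-x)), which the text above describes but does not spell out, and it is split into two branches as one way to avoid overflow for very negative inputs.

```python
import math


def relu(x):
    """Identity for x >= 0, and 0 for x < 0."""
    return x if x >= 0 else 0.0


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe here: x < 0, so e is between 0 and 1
    return e / (1.0 + e)


print([relu(x) for x in [-3.0, -0.5, 0.0, 0.5, 3.0]])  # [0.0, 0.0, 0.0, 0.5, 3.0]
print(sigmoid(0.0))  # 0.5

# Very negative inputs are squished towards 0, very positive ones towards 1.
print(sigmoid(-10.0) < 0.001, sigmoid(10.0) > 0.999)  # True True
```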

This concludes this part of the tutorial. The next part will explain in detail how exactly we can use our data to train our neural network. Thank you for reading!
