Day 9 of 100DaysofML

Charan Soneji
Published in 100DaysofMLcode · Jun 25, 2020

The fundamental unit of a neural network, present in each layer, is called the node or perceptron. Let us discuss the activation function and how inputs and outputs are calculated in different cases.

What does a node do?
In simplified terms, the node or perceptron takes in an input from the previous layer and gives an output by passing the input through a function. Let us take the following diagram for instance:

In the diagram, the perceptron acts as a processing unit: it takes in inputs and passes their weighted combination through an activation function, which produces the output. We will understand the meaning of these terms shortly.

Each perceptron is given a weight, which is initialized when the neural net and its layers are created; these weights keep modifying their values based on how close the output is to the target value, as we have seen in the previous blogs (concept of weights and biases). These can be called the properties of the node, which we shall recap as:
1. Each node has its own weights, which are initialized at the start and keep changing in each iteration.
2. Each node has its own bias, which, like the weights, is updated during training.
3. Each node has its own activation function based on the layer in which it exists.
The diagram given below summarizes the properties of a node:

To make it simple, let us compare the node's equation with a linear equation (or regression). The formula for linear regression is given as:
y = mx + c
Now, let us contrast this with the function of each node, which is given as:
Output = activation function(Input × Weight + Bias)
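For a node with several inputs, the same equation can be sketched in a few lines of Python (the input, weight, and bias values here are just illustrative):

```python
import numpy as np

def node_output(inputs, weights, bias, activation):
    """Compute a single node's output: activation(inputs . weights + bias)."""
    return activation(np.dot(inputs, weights) + bias)

# Illustrative example with a ReLU activation
relu = lambda z: max(z, 0.0)
result = node_output(np.array([0.5, -1.0]), np.array([0.8, 0.2]), 0.1, relu)
print(result)  # 0.5*0.8 + (-1.0)*0.2 + 0.1 = 0.3
```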

What exactly is an activation function?
An activation function is basically a function that helps us map our input to our output. The formal definition on Wikipedia states:

“In artificial neural networks, the activation function of a node defines the output of that node given an input or set of inputs.”

There are several different types of activation functions and each of them have a different purpose. We shall briefly go over a few of the activation functions and their use.

  1. ReLU: This is the abbreviation for Rectified Linear Unit. In simple terms, it can be explained as:
    If x is the input and x > 0, then output = x, or
    if x is the input and x ≤ 0, then output = 0.
    In other words, positive values pass through unchanged and negative values are clipped to zero. For instance, if my perceptron gives a value such as 0.54, ReLU passes it on as 0.54; on the other hand, if it gives an output of -0.43, it is taken as 0.
    ReLU is mostly used in the hidden layers, since it is cheap to compute and keeps gradients flowing for positive inputs.
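A minimal ReLU sketch, applied element-wise to a few sample values:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: keep positive values, zero out the rest."""
    return np.maximum(x, 0)

values = np.array([0.54, -0.43, 2.0, 0.0])
print(relu(values))  # 0.54 and 2.0 pass through; -0.43 becomes 0
```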
  2. Sigmoid: The sigmoid squashes any real-valued input into the range (0, 1), so its output can be read as a probability. I shall try and explain this with the help of an example. Suppose you can choose to grow only one of two fruits (say, Apple or Orange), and a node outputs the probability x of choosing the first. We can then threshold the sigmoid output as follows:
    If the input probability is x and x > 50%, then output = 1, or
    if the input probability is x and x ≤ 50%, then output = 0.
    Let us visualize with a diagram:

Here, we can see that after the sigmoid outputs are thresholded at 0.5, they give us discrete values, which helps with binary classification.
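The sigmoid and the 0.5 threshold can be sketched in Python as follows:

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Binary decision: 1 if the sigmoid output exceeds the threshold."""
    return 1 if sigmoid(z) > threshold else 0

print(sigmoid(0.0))    # → 0.5
print(classify(2.0))   # → 1  (sigmoid(2.0) ≈ 0.88)
print(classify(-2.0))  # → 0  (sigmoid(-2.0) ≈ 0.12)
```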

  3. Softmax: When you’re creating a neural network for classification, you’re likely solving either a binary or a multiclass classification problem. In the latter case, it’s very likely that the activation function of your final layer is the so-called Softmax, which turns the layer’s raw scores into a probability distribution over your target classes. That’s just a gist about softmax.
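A short, numerically stable softmax sketch (the class scores below are made up for illustration):

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into a probability distribution over classes."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])  # e.g. scores for Apple, Orange, Mango
probs = softmax(scores)
print(probs)  # three probabilities that sum to 1, largest for the highest score
```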

Why do we use activation functions?
To keep things simple, we use an activation function to add a non-linearity to the network; without it, any stack of layers would collapse into a single linear transformation.

I’m going to talk about some of the points about usage of activation functions and the number of nodes.
1. If you are solving a regression problem, where only a single output node is used, there is NO activation function at the output layer.
2. If you are solving a binary classification problem, we require only a single output node, and a sigmoid activation function is used at the output layer.
3. In case of multiclass classification, we use multiple nodes and the softmax activation function.
4. In case of a classification problem, the number of nodes in the output layer is equal to the number of classes in the output or target.
5. In case of a regression model, the number of nodes in the output layer is 1.
6. The number of hidden layers and the nodes in each hidden layer are usually found by trial and error, but the number of nodes in a hidden layer can be estimated with the following rule of thumb:
n = (2/3) × (Nodes in Input layer) + (Nodes in Output layer)
7. When every node in a layer is connected to every node in the previous layer, the layer is called fully connected (a Dense layer).
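The rules above can be sketched as a small helper; the function names here are my own, not from any library:

```python
def output_layer_config(problem, n_classes=None):
    """Pick output nodes and activation following points 1-5 (hypothetical helper)."""
    if problem == "regression":
        return {"nodes": 1, "activation": None}               # points 1 and 5
    if problem == "binary":
        return {"nodes": 1, "activation": "sigmoid"}          # point 2
    if problem == "multiclass":
        return {"nodes": n_classes, "activation": "softmax"}  # points 3 and 4
    raise ValueError(f"unknown problem type: {problem}")

def hidden_nodes_estimate(n_in, n_out):
    """Rule of thumb from point 6: (2/3) * input nodes + output nodes."""
    return round((2 / 3) * n_in + n_out)

print(output_layer_config("binary"))  # → {'nodes': 1, 'activation': 'sigmoid'}
print(hidden_nodes_estimate(9, 3))    # → 9
```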

Just wanted to cover the essentials. That’s it for today. Keep Learning.

Cheers.
