Introduction to Neural Networks and Activation Functions

Chandra Reddy
6 min read · Jun 19, 2020


DS With Reddy😎

In this blog we will cover:

  1. Neural Network: a simple neural network and how it works
  2. Activation Functions: an introduction and three widely used activation functions:

2.a — Sigmoid function

2.b — ReLU (Rectified Linear Unit)

2.c — Leaky ReLU

So, let's begin.

NEURAL NETWORK:

When the human brain was analysed, it was found to contain billions of neurons connected to each other in a mesh-like structure. So, what is a neuron? A neuron can be called the building block of the brain, and since deep learning tries to replicate the brain, in simple terms a neuron is also the building block of any deep learning algorithm.

So, here's a picture of a neuron from the human brain.

[Image: structure of a biological neuron (source: Google Images)]

In this structure, the dendrites work as the inputs, the nucleus as the processing area, and the axon as the output. As I said, artificial neural networks are inspired by this structure, as shown in the image below.

[Image: an artificial neuron, inspired by the biological one (source: Google Images)]

The basic structure of a neuron consists of:

  1. One or many inputs, i.e. x
  2. One processing area
  3. One or more outputs, i.e. y

The depiction of a single neuron is as mentioned below:

This is one neuron with the following elements:

  1. Inputs: x1 and x2, with their respective weights w1 and w2. We also have a bias value, i.e. b
  2. Output: a
  3. Processing: each input is multiplied by its weight, the results are added together, and the bias term b is added on top

Here we have two more terms, weights and the bias term b. Why do we need them? In a neural network, some inputs are more important than others for producing a precise and accurate output, so we use weights to specify how much impact each input should have. The weights are first initialized randomly and then updated during training to minimize the error. The bias term, on the other hand, controls how much the output is shifted (activated) irrespective of the inputs.
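To make this concrete, here is a minimal Python/NumPy sketch of a single neuron's processing step; the inputs, weights and bias below are made-up values (in a real network the weights and bias would be learned during training):

```python
import numpy as np

# Made-up values, just to illustrate the processing step.
x = np.array([0.5, 0.8])    # inputs x1 and x2
w = np.array([0.4, -0.2])   # weights w1 and w2 (initialized randomly, then learned)
b = 0.1                     # bias term

# Processing: multiply each input by its weight, sum the results, add the bias.
z = np.dot(w, x) + b
print(z)  # ~0.14 -> this value is then passed through an activation function to get the output a
```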

Hope you understand the basic structure of a neuron. Now the question is: how does a single neuron work?

Before answering that we should familiarize ourselves with activation functions.

What is an activation function? Why do we use one?

The working is similar no matter how complex a structure you have, so for the sake of simplicity I am taking a neuron with one input and one output.

Here we have an input x1 with weight w1 and a bias term b, and when we plug in all the values we get a = 0.584.

Here f(z) is the activation function, in this case the sigmoid function. We will talk about it later, but first let's see what an activation function means.

ACTIVATION FUNCTIONS:

An activation function decides whether a neuron should be activated or not. In other words, it is used to decide whether the signal produced by the neuron is relevant or not.

NOTE: A neural network without an activation function is just a Linear Regression model.

The activation function applies a non-linear transformation to the input, enabling the neural network to learn and perform complex tasks like image processing and language translation.
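A quick sketch of why the NOTE above holds: without an activation function between them, two linear layers collapse into a single linear transformation, so stacking layers adds no expressive power (the matrices below are random, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" defined by random weight matrices and bias vectors.
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)

x = rng.normal(size=2)  # an arbitrary input

# Forward pass through both layers with NO activation in between...
h = W1 @ x + b1
y_two_layers = W2 @ h + b2

# ...is exactly the same as one combined linear transformation.
W_combined = W2 @ W1
b_combined = W2 @ b1 + b2
y_one_layer = W_combined @ x + b_combined

print(np.allclose(y_two_layers, y_one_layer))  # True
```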

We have already seen one activation function, the sigmoid function. There are several other types of activation functions as well, listed below:

  1. Binary Step Function
  2. Linear Function
  3. Sigmoid Function
  4. Hyperbolic Tangent Function
  5. ReLU (Rectified Linear Unit)
  6. Leaky ReLU
  7. Softmax Function

Rather than studying all of these functions, we are going to discuss some highly used ones: the Sigmoid Function, ReLU and Leaky ReLU. We will cover each of them briefly along with their advantages and disadvantages.

SIGMOID FUNCTION:

This function gives an output in the range 0 to 1. So, if we have a very large positive number, the output will be close to 1, and if we have a very large negative number, the output will be close to 0.

Mathematically, the sigmoid function is defined as:

f(z) = 1 / (1 + e^(−z))

Here z = ∑ᵢ xᵢ·wᵢ + b = x₁·w₁ + x₂·w₂ + … + xₙ·wₙ + b

where xᵢ are the inputs, wᵢ are the weights and b is the bias term.
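Here is a minimal Python sketch of the sigmoid function showing the behaviour described above; the sample inputs are arbitrary:

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large positive inputs map close to 1, large negative inputs close to 0.
print(sigmoid(10))   # ~0.99995
print(sigmoid(-10))  # ~0.00005
print(sigmoid(0))    # 0.5
```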

The graphical representation of the sigmoid function is as follows.

[Image: graph of the sigmoid function (source: Google Images)]

From the graph above you can see that the sigmoid activation function keeps the output between 0 and 1. You can also see that when the input (on the X-axis) is between roughly -4 and 4, the value of y changes rapidly, which means a small change in X in that region produces a major change in the output (i.e. Y).

With a linear activation function the output may go from −inf to +inf, but here it is confined to the range 0 to 1.

It sounds like a good activation function, but it has its own limitation. Outside the range of roughly -4 to 4 the sigmoid curve is almost flat, so its gradient (derivative) there is approximately zero, and even at its steepest point the gradient is only 0.25. During backpropagation these small gradients get multiplied together layer after layer, so the overall gradient becomes very small; this problem is known as the vanishing gradient. It is the main reason we often opt for other activation functions, such as ReLU, over the sigmoid function.
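A rough sketch of this effect in Python: the sigmoid's derivative peaks at 0.25 and is tiny for large inputs, so a product of many such derivatives (as backpropagation forms, layer by layer; weight terms are ignored here for simplicity) shrinks toward zero:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)  # derivative of the sigmoid, at most 0.25

print(sigmoid_grad(0.0))  # 0.25, the maximum value
print(sigmoid_grad(6.0))  # ~0.0025, already tiny

# Multiplying many such small derivatives together, as backprop does across layers,
# makes the overall gradient vanish quickly.
print(0.25 ** 10)  # ~9.5e-07
```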

So, how does ReLU resolve the problems that arise with the sigmoid function? To answer that, we are going to take a closer look at the ReLU activation function.

ReLU (Rectified Linear Unit) Activation Function:

In simple words, what this function does is replace negative values with zero.

[Image: graph of the ReLU function (source: Google Images)]

From this graph we can see that any negative value is turned into zero. Both this function and its derivative are monotonic, and its range is [0, infinity).

The mathematical expression for this function is:

y = max(0, x)
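A minimal Python sketch of ReLU, with arbitrary sample inputs:

```python
import numpy as np

def relu(x):
    """ReLU: keep positive values, replace negative values with zero."""
    return np.maximum(0, x)

x = np.array([-3.0, -0.5, 0.0, 2.0, 7.0])  # arbitrary sample inputs
print(relu(x))  # [0. 0. 0. 2. 7.]
```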

But there is a small problem with this: since it converts all negative values to zero, a neuron whose weighted sum is always negative will always output zero. Such neurons are called dead neurons, and you can end up with a lot of them.

This means you can end up with a neural network that never learns if many of its neurons are not activated at the start, and you may have lots of dead ReLUs without even knowing it. Even so, ReLU remains very useful in practice, for example in CNNs (Convolutional Neural Networks), where it is typically applied before the max pooling layers.
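A small sketch of why dead neurons stop learning: ReLU's derivative is zero for negative inputs, so no gradient flows back through such a neuron and its weights never get updated (the negative value below is hypothetical):

```python
import numpy as np

def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 for negative (or zero) inputs.
    return (z > 0).astype(float)

z = np.array([-2.3])  # a neuron stuck with a negative weighted sum (hypothetical)
print(relu_grad(z))   # [0.] -> no gradient flows back, so its weights never update
```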

To overcome this, we have Leaky ReLU, which takes care of this problem and makes the neural network more responsive.

LEAKY ReLU Activation Function:

Here, instead of converting negative values to zero, we give them a small negative value, as shown below.

[Image: graph of the Leaky ReLU function (source: Google Images)]

Here the output is 0.01x when x is less than zero, and x itself otherwise. In this way we overcome the problem created by ReLU: every neuron gets a chance to learn from the start.
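A minimal Python sketch of Leaky ReLU with the 0.01 slope mentioned above; the sample inputs are arbitrary:

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: x for positive values, slope * x for negative values."""
    return np.where(x > 0, x, slope * x)

x = np.array([-3.0, -0.5, 0.0, 2.0])  # arbitrary sample inputs
print(leaky_relu(x))  # approximately [-0.03 -0.005 0. 2.]
```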

I think the main disadvantage of Leaky ReLU is that you have another parameter to tune, the slope.

NOTE:

We have discussed three activation functions here: sigmoid, ReLU and Leaky ReLU. The question that arises is which activation function is best to use. The answer varies from case to case: in some situations sigmoid works well, in others ReLU, and so on. In the end, which activation function gives good results depends on the problem at hand.

