Usability of Activation Functions in Deep Learning
An explanation of three common activation functions in deep learning, by usability.
Why Is the Activation Function So Important?
We can imagine the activation function as the thing that fires our brain (in this case, the neuron) to think. Maybe that illustration makes you more confused :P
Anyway.. without an activation function, the calculations in each layer don’t mean much. Why? Because each layer’s calculation is linear, so the output is just a linear function of the input, and stacking layers gives nothing more than a single linear layer could. The activation function is what makes the network non-linear.
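To make that concrete, here is a minimal sketch (using NumPy, with made-up weight matrices) showing that two stacked linear layers with no activation in between collapse into a single linear layer:

import numpy as np

# Made-up weights for two "layers" with no activation between them.
W1 = np.array([[1.0, 2.0], [3.0, 4.0]])
W2 = np.array([[0.5, -1.0], [2.0, 1.5]])
x = np.array([1.0, -2.0])

# Passing x through layer 1 and then layer 2...
two_layers = W2 @ (W1 @ x)

# ...is exactly the same as one linear layer with weights W2 @ W1.
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))  # True: no extra expressive power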
Sigmoid
The sigmoid function is used together with binary_crossentropy as the loss function, and we use it on the final (output) layer.
As we can see in the image above, sigmoid produces a value between 0 and 1. The more negative the input (x) is, the closer the output (y) gets to 0; the more positive it is, the closer the output gets to 1. This behavior makes sigmoid a better fit for models with 2 labels.
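As a quick illustration, here is a small sketch of the sigmoid formula, 1 / (1 + e^(-x)), evaluated at a few sample points (the inputs are arbitrary values chosen for illustration):

import numpy as np

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(x, round(float(sigmoid(x)), 3))
# -5.0 -> 0.007 (large negative input gives an output near 0)
#  0.0 -> 0.5
#  5.0 -> 0.993 (large positive input gives an output near 1)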
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              metrics=['accuracy'])
It is better to use sigmoid when your model has exactly 2 labels, so it can produce 1 output that is read as 0 or 1.
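For a fuller picture, here is a minimal self-contained sketch of such a binary classifier (the input shape of 8 features is a made-up value for illustration):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

# Hypothetical binary classifier: 8 input features, 1 sigmoid output.
model = Sequential()
model.add(Dense(1, input_shape=(8,)))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              metrics=['accuracy'])

model.summary()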
Softmax
Softmax is usually used for models with more than 2 labels, and it produces as many outputs as there are labels. This activation function is for categorical targets, which is why it is common to pair softmax with categorical_crossentropy as the loss function. Like sigmoid, we use it on the final (output) layer.
model.add(Dense(4))
model.add(Activation('softmax'))

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
We can call this a probability activation function: say we have the values [2.0, 1.0, 0.1], then softmax converts them into probability values of roughly [0.7, 0.2, 0.1]. If we sum the probabilities, the result is always 1.
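Here is a small sketch (using NumPy) of that conversion for the example values above:

import numpy as np

def softmax(logits):
    # Subtracting the max first is a common trick for numerical stability.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)        # approximately [0.659, 0.242, 0.099]
print(probs.sum())  # 1.0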
ReLU
ReLU is a popular activation function in deep learning. We usually use ReLU on the input and hidden layers.
The output of the ReLU activation function is never negative: when the input (x) is negative, the output (y) is 0, and when it is positive, ReLU leaves the value unchanged.
So when the result is 0, the neuron doesn’t fire, which means it contributes nothing to the next layer.
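In code, ReLU is simply max(0, x); here is a quick sketch with arbitrary sample values:

import numpy as np

def relu(x):
    # Keeps positive values unchanged and clamps negative values to 0.
    return np.maximum(0, x)

print(relu(np.array([-3.0, -0.5, 0.0, 0.5, 3.0])))
# [0.  0.  0.  0.5 3. ]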
Another reason ReLU is commonly used in hidden layers is that its calculation is very fast: it is just max(0, x), with no exponentials to compute.
model.add(Conv2D(32, (5,5)))   # convolutional hidden layer with 32 filters
model.add(Activation('relu'))  # ReLU applied to the layer's output