Dictionary of CNN
A quick overview of all the terms we need to know to build a Convolutional Neural Network
What is CNN?
A Convolutional Neural Network (CNN) is a type of advanced artificial neural network. A CNN consists of an input layer and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers, and normalization layers.
What is Convolutional Layer?
Convolutional layers have a moving filter, also called the weight matrix. The filter slides over the input image (the convolution operation) to produce a feature map.
At each position, the weight matrix and the corresponding patch of the input image are multiplied element-wise (a dot product) and summed to produce one value of the feature map.
Ex: (18 x 1 + 54 x 0 + 51 x 1) + (55 x 0 + 121 x 1 + 75 x 0) + (35 x 1 + 24 x 0 + 204 x 1)
Convolving another filter (the one with the green outline) over the same image gives a different feature map.
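The sliding-window computation above can be sketched in plain NumPy. Only the nine numbers from the worked example are taken from the article; the rest of the sample image and the 0/1 filter are illustrative assumptions:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; at each position, multiply
    element-wise and sum to get one value of the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

# Top-left 3x3 block matches the article's example; the 4th row/column
# is made up so the filter has room to slide.
image = np.array([[18,  54,  51, 239],
                  [55, 121,  75,  78],
                  [35,  24, 204, 113],
                  [ 3, 154, 104, 235]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
fm = conv2d(image, kernel)
print(fm)  # fm[0, 0] reproduces the sum in the example: 429
```

The first output value is exactly the expression from the example: (18×1 + 54×0 + 51×1) + (55×0 + 121×1 + 75×0) + (35×1 + 24×0 + 204×1) = 429.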
What is an Activation-Function?
These are functions that help us decide whether to activate a node or not. They introduce non-linearity into the network.
In the convolutional layer, the computation was:
output = input x weight + bias
This has the form "y = mx + c", i.e. a straight line. But to model more complex patterns we need the mapping to be curved (non-linear), and that's where the activation function comes into the picture.
Some famous activation functions are:
Rectified Linear Unit (ReLU) is mostly preferred because it provides sparsity and reduces the likelihood of the vanishing gradient problem.
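Two common activation functions can be sketched in a few lines of NumPy (the sample inputs are just for illustration):

```python
import numpy as np

def relu(x):
    # Zeroes out negatives; positive inputs pass through unchanged.
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))     # [0. 0. 0. 1. 3.] -- negatives become 0, hence sparsity
print(sigmoid(x))
```

Note how ReLU maps every negative input to exactly zero; that is the sparsity mentioned above.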
What is Vanishing gradient Problem?
Deep neural networks often have many hidden layers. We train them using back-propagation, i.e. moving backward through the network and calculating the gradients of the loss with respect to the weights.
These gradients tend to get smaller and smaller as we keep moving backward through the network. As a result, the weights in the earlier layers change very little, so those layers can't be trained well; and since the earlier layers are very important, this is a big problem. ReLU is used to overcome this issue.
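A tiny numerical sketch shows why: with a sigmoid activation, the chain rule multiplies together one derivative per layer, and each sigmoid derivative is at most 0.25, so the product collapses toward zero (the 20-layer depth here is just an illustrative assumption):

```python
import numpy as np

def sigmoid_grad(x):
    # Derivative of the sigmoid; its maximum value is 0.25 (at x = 0).
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)

# Chain rule across 20 layers: multiply one sigmoid derivative per layer.
grad = 1.0
for _ in range(20):
    grad *= sigmoid_grad(0.0)  # 0.25 each time, the best case

print(grad)  # 0.25**20 -- vanishingly small

# ReLU's derivative is 1 for positive inputs, so the same product
# would stay at 1 instead of shrinking.
```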
More on Vanishing Gradient Problem
What are Strides?
Stride decides how the weight matrix moves over the input, i.e. whether it jumps one step or two at a time.
In the example, we can see the weight matrix move by two places.
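The stride simply scales how far the window jumps between positions, which also shrinks the output. A rough NumPy sketch (the 5x5 input and all-ones filter are illustrative assumptions):

```python
import numpy as np

def conv2d_strided(image, kernel, stride):
    """Convolution where the window jumps `stride` places between positions."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(25).reshape(5, 5)
kernel = np.ones((3, 3))
print(conv2d_strided(image, kernel, stride=1).shape)  # (3, 3)
print(conv2d_strided(image, kernel, stride=2).shape)  # (2, 2)
```

With stride 1 the 3x3 filter fits in 3 positions per row; jumping two places at a time leaves only 2.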
What is Padding?
The edge pixels (numbers) in the input matrix are used less often than the inner ones, so they contribute less to the output. To balance this, we use padding. Padding is of two types: "SAME" and "VALID". In SAME padding we add extra zeros around the edges of the input matrix, whereas in VALID padding we don't add any extra rows or columns.
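The effect of each choice on the output size can be sketched with `np.pad` (the 4x4 input and the 3x3 filter size are illustrative assumptions):

```python
import numpy as np

image = np.arange(16).reshape(4, 4)

# "SAME" padding: a border of zeros so that a 3x3 filter with stride 1
# produces an output the same size as the input (4x4).
same = np.pad(image, pad_width=1, mode='constant', constant_values=0)
print(same.shape)  # (6, 6); a 3x3 filter over it yields 6-3+1 = 4 per side

# "VALID" padding: no extra rows or columns; the same 3x3 filter shrinks
# the output to (4 - 3 + 1) x (4 - 3 + 1) = 2x2.
```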
What is Max Pooling layer?
We move a window across the 2D input, and the maximum value within that window is the output.
It is also called the downsampling layer because it reduces the size of the representation, and with it the number of parameters and computation in the rest of the model.
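A minimal NumPy sketch of 2x2 max pooling with stride 2 (the sample input values are made up for illustration):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Slide a window over x and keep only the maximum in each window."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool(x))  # [[6. 8.]
                    #  [3. 4.]]
```

The 4x4 input is downsampled to 2x2; each output is the largest value in its window.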
What is dropout layer?
Dropout refers to dropping some neurons (not considering them in either the forward or the backward pass) during the training phase. The neurons to drop are chosen at random.
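A sketch of the idea in NumPy, using the common "inverted dropout" variant where the surviving activations are rescaled (the drop rate of 0.5 and the vector of ones are illustrative assumptions):

```python
import numpy as np

def dropout(activations, rate, training=True):
    """Zero each neuron with probability `rate` during training;
    scale survivors by 1/(1 - rate) so the expected value is unchanged."""
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) >= rate
    return activations * mask / (1 - rate)

np.random.seed(0)
a = np.ones(10)
dropped = dropout(a, rate=0.5)
print(dropped)  # roughly half the entries are zeroed; the rest become 2.0
```

At test time (`training=False`) nothing is dropped, so inference stays deterministic.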
What is Fully Connected Layer?
In a fully connected layer, every neuron is connected to all the neurons of the previous layer. After feature extraction we need to classify the data into various classes, and this can be done using a fully connected neural network.
The feature maps have to be flattened into a vector and connected to the output layer; that is what the fully connected stage does. It is also called the dense layer.
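The flatten-then-connect step can be sketched as a single matrix multiplication (the map count, map size, and class count here are illustrative assumptions):

```python
import numpy as np

np.random.seed(0)

# Suppose pooling left us with 4 feature maps of 2x2 each.
feature_maps = np.random.rand(4, 2, 2)

# Flatten them into one long vector.
flat = feature_maps.reshape(-1)        # shape (16,)

# Fully connected (dense) layer: every output connects to every input,
# so the weight matrix has one row per class and one column per input.
num_classes = 3
W = np.random.rand(num_classes, flat.size)
b = np.zeros(num_classes)
scores = W @ flat + b                  # shape (3,) -- one score per class
print(scores.shape)
```

These raw class scores are what the softmax layer (next) turns into probabilities.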
What is Softmax Layer?
The softmax layer returns the probability of each class, and the class with the highest probability is taken as the target class.
This function calculates the probability of each target class over all possible target classes.
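A minimal sketch of the softmax function in NumPy (the example scores are made up for illustration):

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability, then exponentiate
    # and normalize so the outputs sum to 1.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)            # roughly [0.659 0.242 0.099]
print(probs.argmax())   # 0 -- the class with the highest probability
```

The outputs always sum to 1, which is what lets us read them as probabilities.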
What is the Learning Rate?
This is a parameter that decides how far the weights move against the gradient at each update step. Intuitively, it is how quickly a network abandons old beliefs for new ones. If it is too small, training will be slow; if it is too large, training may diverge.
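All three regimes can be seen on a toy problem, minimizing f(w) = w² with gradient descent (the specific learning rates and step count are illustrative assumptions):

```python
# Gradient-descent update: the learning rate scales the step size.
def gradient_step(w, grad, lr):
    return w - lr * grad  # step against the gradient

# f(w) = w**2 has gradient 2*w and its minimum at w = 0.
for lr in (0.01, 0.1, 1.1):
    w = 1.0
    for _ in range(20):
        w = gradient_step(w, 2 * w, lr)
    print(lr, w)  # 0.01: barely moved; 0.1: near 0; 1.1: blown up
```

With lr = 0.01 the weight shrinks by a factor of 0.98 per step (slow); with lr = 0.1 it converges quickly; with lr = 1.1 each step overshoots and |w| grows, i.e. training diverges.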
How to choose the learning rate?
What is Hyper-Parameter?
Hyperparameters are variables that determine the network's structure and how it is trained. They are usually set before the training process begins.
Ex: Filter Sizes, Number of Filters
Thank You!
Next tutorial will be on How to build the layers in TensorFlow.