Dictionary of CNN
A quick overview of all the terms we need to know to build a Convolutional Neural Network
What is CNN?
A Convolutional Neural Network (CNN) is a type of advanced artificial neural network. A CNN consists of an input layer and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers, and normalization layers.
What is Convolutional Layer?
Convolutional layers have a moving filter, also called the weight matrix. The filter slides over the input image (the convolution operation) to produce a feature map.
At each position, the weight matrix and the corresponding patch of the input image are multiplied element-wise (a dot product) and summed to produce one value of the feature map.
Ex: (18 x 1 + 54 x 0 + 51 x 1) + (55 x 0 + 121 x 1 + 75 x 0) + (35 x 1 + 24 x 0 + 204 x 1)
Convolving another filter (the one with the green outline) over the same image gives a different feature map.
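The sliding-window computation above can be sketched in plain NumPy. Only the nine numbers from the worked example are taken from the article; the rest of the sample image and the 0/1 filter are illustrative assumptions:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; at each position, multiply
    element-wise and sum to get one value of the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

# Top-left 3x3 block matches the article's example; the 4th row/column
# is made up so the filter has room to slide.
image = np.array([[18,  54,  51, 239],
                  [55, 121,  75,  78],
                  [35,  24, 204, 113],
                  [ 3, 154, 104, 235]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
fm = conv2d(image, kernel)
print(fm)  # fm[0, 0] reproduces the sum in the example: 429
```

The first output value is exactly the expression from the example: (18×1 + 54×0 + 51×1) + (55×0 + 121×1 + 75×0) + (35×1 + 24×0 + 204×1) = 429.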
What is an Activation-Function?
These are functions that help us decide whether to activate a node or not. They introduce non-linearity into the network.
In the convolutional layer, the computation was:
output = input x weight + bias
This has the form "y = mx + c", i.e. a straight line. But to model more complex patterns we need the mapping to be curved (non-linear), and that's where the activation function comes into the picture.
Some famous activation functions are:
Rectified Linear Unit (ReLU) is mostly preferred because it provides sparsity and reduces the likelihood of the vanishing gradient problem.
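Two common activation functions can be sketched in a few lines of NumPy (the sample inputs are just for illustration):

```python
import numpy as np

def relu(x):
    # Zeroes out negatives; positive inputs pass through unchanged.
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))     # [0. 0. 0. 1. 3.] -- negatives become 0, hence sparsity
print(sigmoid(x))
```

Note how ReLU maps every negative input to exactly zero; that is the sparsity mentioned above.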
What is Vanishing gradient Problem?
Deep neural networks often have many hidden layers. We train them using back-propagation, i.e. moving backward through the network and calculating the gradients of the loss with respect to the weights.
These gradients tend to get smaller and smaller as we keep moving backward through the network. As a result, the weights in the earlier layers change very little, so those layers can't be trained well; and since the earlier layers are very important, this is a big problem. ReLU is used to overcome this issue.
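A tiny numerical sketch shows why: with a sigmoid activation, the chain rule multiplies together one derivative per layer, and each sigmoid derivative is at most 0.25, so the product collapses toward zero (the 20-layer depth here is just an illustrative assumption):

```python
import numpy as np

def sigmoid_grad(x):
    # Derivative of the sigmoid; its maximum value is 0.25 (at x = 0).
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)

# Chain rule across 20 layers: multiply one sigmoid derivative per layer.
grad = 1.0
for _ in range(20):
    grad *= sigmoid_grad(0.0)  # 0.25 each time, the best case

print(grad)  # 0.25**20 -- vanishingly small

# ReLU's derivative is 1 for positive inputs, so the same product
# would stay at 1 instead of shrinking.
```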
More on Vanishing Gradient Problem
What are Strides?
Stride decides how the weight matrix moves over the input, i.e. whether it jumps one step or two at a time.
In the example, we can see the weight matrix move by two places.
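The stride simply scales how far the window jumps between positions, which also shrinks the output. A rough NumPy sketch (the 5x5 input and all-ones filter are illustrative assumptions):

```python
import numpy as np

def conv2d_strided(image, kernel, stride):
    """Convolution where the window jumps `stride` places between positions."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(25).reshape(5, 5)
kernel = np.ones((3, 3))
print(conv2d_strided(image, kernel, stride=1).shape)  # (3, 3)
print(conv2d_strided(image, kernel, stride=2).shape)  # (2, 2)
```

With stride 1 the 3x3 filter fits in 3 positions per row; jumping two places at a time leaves only 2.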
What is Padding?
The edge pixels (numbers) in the input matrix are used less often than the inner ones, so they contribute less to the output. To balance this, we use padding. Padding is of two types: "SAME" and "VALID". In SAME padding we add extra zeros around the edges of the input matrix, whereas in VALID padding we don't add any extra rows or columns.
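The effect of each choice on the output size can be sketched with `np.pad` (the 4x4 input and the 3x3 filter size are illustrative assumptions):

```python
import numpy as np

image = np.arange(16).reshape(4, 4)

# "SAME" padding: a border of zeros so that a 3x3 filter with stride 1
# produces an output the same size as the input (4x4).
same = np.pad(image, pad_width=1, mode='constant', constant_values=0)
print(same.shape)  # (6, 6); a 3x3 filter over it yields 6-3+1 = 4 per side

# "VALID" padding: no extra rows or columns; the same 3x3 filter shrinks
# the output to (4 - 3 + 1) x (4 - 3 + 1) = 2x2.
```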
What is Max Pooling layer?
We move a window across the 2D input, and the maximum value within that window is the output.
It is also called the downsampling layer because it reduces the size of the representation, and with it the number of parameters and computation in the rest of the model.
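A minimal NumPy sketch of 2x2 max pooling with stride 2 (the sample input values are made up for illustration):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Slide a window over x and keep only the maximum in each window."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool(x))  # [[6. 8.]
                    #  [3. 4.]]
```

The 4x4 input is downsampled to 2x2; each output is the largest value in its window.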
What is dropout layer?
Dropout refers to dropping some neurons (not considering them in either the forward or the backward pass) during the training phase. The neurons to drop are chosen at random.
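A sketch of the idea in NumPy, using the common "inverted dropout" variant where the surviving activations are rescaled (the drop rate of 0.5 and the vector of ones are illustrative assumptions):

```python
import numpy as np

def dropout(activations, rate, training=True):
    """Zero each neuron with probability `rate` during training;
    scale survivors by 1/(1 - rate) so the expected value is unchanged."""
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) >= rate
    return activations * mask / (1 - rate)

np.random.seed(0)
a = np.ones(10)
dropped = dropout(a, rate=0.5)
print(dropped)  # roughly half the entries are zeroed; the rest become 2.0
```

At test time (`training=False`) nothing is dropped, so inference stays deterministic.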
What is Fully Connected Layer?
In a fully connected layer, every neuron is connected to all the neurons of the previous layer. After feature extraction we need to classify the data into various classes, and this can be done using a fully connected neural network.
The feature maps have to be flattened into a vector and connected to the output layer; that is what the fully connected stage does. It is also called the dense layer.
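The flatten-then-connect step can be sketched as a single matrix multiplication (the map count, map size, and class count here are illustrative assumptions):

```python
import numpy as np

np.random.seed(0)

# Suppose pooling left us with 4 feature maps of 2x2 each.
feature_maps = np.random.rand(4, 2, 2)

# Flatten them into one long vector.
flat = feature_maps.reshape(-1)        # shape (16,)

# Fully connected (dense) layer: every output connects to every input,
# so the weight matrix has one row per class and one column per input.
num_classes = 3
W = np.random.rand(num_classes, flat.size)
b = np.zeros(num_classes)
scores = W @ flat + b                  # shape (3,) -- one score per class
print(scores.shape)
```

These raw class scores are what the softmax layer (next) turns into probabilities.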
What is Softmax Layer?
The softmax layer returns the probability of each class, and the class with the highest probability is taken as the target class.
This function calculates the probability of each target class over all possible target classes.
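A minimal sketch of the softmax function in NumPy (the example scores are made up for illustration):

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability, then exponentiate
    # and normalize so the outputs sum to 1.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)            # roughly [0.659 0.242 0.099]
print(probs.argmax())   # 0 -- the class with the highest probability
```

The outputs always sum to 1, which is what lets us read them as probabilities.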
What is the Learning Rate?
This is a parameter that decides how far the weights move against the gradient at each update step. Intuitively, it is how quickly a network abandons old beliefs for new ones. If it is too small, training will be slow; if it is too large, training may diverge.
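All three regimes can be seen on a toy problem, minimizing f(w) = w² with gradient descent (the specific learning rates and step count are illustrative assumptions):

```python
# Gradient-descent update: the learning rate scales the step size.
def gradient_step(w, grad, lr):
    return w - lr * grad  # step against the gradient

# f(w) = w**2 has gradient 2*w and its minimum at w = 0.
for lr in (0.01, 0.1, 1.1):
    w = 1.0
    for _ in range(20):
        w = gradient_step(w, 2 * w, lr)
    print(lr, w)  # 0.01: barely moved; 0.1: near 0; 1.1: blown up
```

With lr = 0.01 the weight shrinks by a factor of 0.98 per step (slow); with lr = 0.1 it converges quickly; with lr = 1.1 each step overshoots and |w| grows, i.e. training diverges.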
How to choose the learning rate?
What is Hyper-Parameter?
Hyperparameters are variables that determine the network's structure and how it is trained. They are usually set before the training process begins.
Ex: Filter Sizes, Number of Filters
Thank You!
Next tutorial will be on How to build the layers in TensorFlow.