Layers of a Convolutional Neural Network

Meghna Asthana PhD MSc DIC · Published in Analytics Vidhya · Mar 2, 2020 · 4 min read

If we want machines to think, we need to teach them to see.

- Fei-Fei Li, AI Researcher at Stanford University

In the previous chapter of this series, we presented a brief introduction to the Convolutional Neural Network (CNN), which forms the basic building block of most, if not all, Computer Vision algorithms. In this chapter, we will introduce the vital layers which constitute an everyday CNN.

Outline of different layers of a CNN [4]

Convolutional Layer

The most crucial function of a convolutional layer is to transform the input data using a group of neurons locally connected to the previous layer. It computes a dot product between each local region of the input volume and the weights of the neurons connected to that region; stacked together, these products form the output volume of the layer. This is achieved by an operation known as convolution.

Convolutional Layer [4]

Convolution

Convolution is a mathematical operation which specifies how two sets of information are combined. It is also known as the feature detector of a CNN: it applies a convolution kernel to the input and returns a feature map as output. This is achieved by sliding the kernel across the input data and multiplying the kernel element-wise with the segment of data within its bounds, summing the result to create a single entry in the feature map (a minimal sketch of this operation follows the list below). Finally, the activation maps of all filters are stacked along the depth dimension to construct the 3D output volume [1]. As in any other neural network model, parameter optimisation is performed using gradient descent. The major components of the convolutional layer are as follows:

  1. Filters: These are learnable CNN parameters which come to produce the strongest activation for spatially local input patterns, i.e. a filter activates strongly only when its pattern occurs in the input. With increasing depth of the CNN, the filters are able to identify nonlinear combinations of features.
  2. Activation Maps: These are computed by sliding each filter across the spatial dimensions of the input volume during a forward pass through the CNN. Each spatial position yields a single numerical value, indicating how strongly the filter responds there.
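
To make the sliding-kernel operation concrete, here is a minimal single-channel convolution sketch in NumPy (stride 1, no padding); the function name conv2d and the toy arrays are illustrative, not from the article.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` across `image` (stride 1, no padding) and
    record one dot product per position in the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]      # segment under the kernel
            feature_map[i, j] = np.sum(region * kernel)
    return feature_map

image = np.arange(25, dtype=float).reshape(5, 5)    # toy 5x5 input
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                    # toy 2x2 filter
print(conv2d(image, kernel).shape)                  # (4, 4) feature map
```

In a real convolutional layer this loop runs once per filter, and the resulting feature maps are stacked along the depth dimension to form the 3D output volume described above.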

Hyperparameters

These dictate the spatial arrangement and size of the output volume of a convolutional layer. The following are some of the most important hyperparameters (a worked example of the output-size calculation follows the list):

  1. Filter size: Filters are generally spatially small, with three dimensions: width, height and depth (the colour channels, in the case of the first layer).
  2. Output depth: This controls the number of neurons in the convolutional layer which are connected to the same region in the input volume.
  3. Stride: This defines how far the filter slides per application. The spatial size of the output volume is inversely related to the stride value.
  4. Zero-padding: This pads the border of the input with zeros and thereby controls the spatial size of the output volume; it is particularly useful when the output volume should preserve the spatial size of the input volume.
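
Together, these hyperparameters determine the spatial output size through the standard formula (W - F + 2P) / S + 1, where W is the input size, F the filter size, P the zero-padding and S the stride. A small sketch of this calculation (the helper name is illustrative):

```python
def conv_output_size(input_size, filter_size, padding, stride):
    """Spatial output size of a convolutional layer:
    (W - F + 2P) / S + 1. Must divide evenly for a valid setting."""
    size = (input_size - filter_size + 2 * padding) / stride + 1
    assert size.is_integer(), "hyperparameters do not tile the input"
    return int(size)

# e.g. a 32-wide input with 5x5 filters, padding 2, stride 1 keeps its size
print(conv_output_size(32, 5, 2, 1))  # 32
```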

Pooling Layer

This layer helps to progressively reduce the spatial size of the data representation, which cuts the number of parameters and computations and thus helps prevent overfitting on the training data. Pooling layers are generally inserted between successive convolutional layers and use the max operation to resize the input data spatially. They have no learnable parameters and generally use no padding.

Pooling Layer [4]
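
Below is a minimal NumPy sketch of 2x2 max pooling with stride 2 (the function name and toy array are illustrative): it simply takes the maximum of each window, halving each spatial dimension with no learnable weights involved.

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Downsample by taking the maximum of each size x size window."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            pooled[i, j] = window.max()
    return pooled

fmap = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 feature map
print(max_pool(fmap))                            # 2x2: [[ 5.  7.] [13. 15.]]
```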

Fully Connected Layer

This layer acts as the output layer of the network and produces an output volume of dimension [1 x 1 x N], where N is the number of output classes to be evaluated. Fully connected layers have the same parameters and hyperparameters as a standard neural network layer.
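As a sketch of what this layer does, the following NumPy snippet flattens a pooled feature volume and maps it to N class scores with a single weight matrix; the shapes and the class count N = 10 are illustrative assumptions, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def fully_connected(volume, weights, bias):
    """Flatten the input volume and map it to N class scores."""
    x = volume.reshape(-1)          # e.g. 7x7x64 -> 3136 values
    return weights @ x + bias       # shape (N,), i.e. [1 x 1 x N]

volume = rng.standard_normal((7, 7, 64))         # toy pooled feature volume
N = 10                                            # assumed number of classes
weights = rng.standard_normal((N, volume.size))   # one row per class
bias = np.zeros(N)
print(fully_connected(volume, weights, bias).shape)  # (10,)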

In this article, we discussed the different layers of a Convolutional Neural Network: the Convolutional layer, the Pooling layer and the Fully Connected layer, stating the importance and utility of each. This concludes our section on CNNs. Further chapters of this series will focus on different varieties of Artificial Neural Networks, such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks, which have their unique applications in Natural Language Processing and Translation domains.

[1] Tan, Y.H. and Chan, C.S., 2019. Phrase-based image caption generator with hierarchical LSTM network. Neurocomputing, 333, pp.86–100.

[2] Patterson, J., 2017. Deep Learning. 1st ed. O'Reilly Media, Inc.

[3] Boden, M., 2002. A guide to recurrent neural networks and backpropagation. The Dallas Project.

[4] Brilliant.org. (2020). Convolutional Neural Network | Brilliant Math & Science Wiki. [online] Available at: https://brilliant.org/wiki/convolutional-neural-network [Accessed 29 Feb. 2020].
