DLOA (Part-12)-Convolutional Neural Networks (CNNs)

May 8, 2023

Hey readers, I hope you are all doing well and staying safe. In the previous blog, we briefly implemented an artificial neural network on the MNIST dataset; if you haven't read it yet, I recommend going through it first. In this blog, we'll discuss Convolutional Neural Networks (CNNs).

What is a CNN?

Convolutional Neural Networks (CNNs) are a type of deep learning model used for image classification, object detection, and other tasks involving image analysis. CNNs are based on convolution, a mathematical operation that slides a small filter (kernel) across the input image and computes a weighted sum at each position, producing a feature map that highlights where a particular pattern occurs.
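To make the convolution idea concrete, here is a minimal sketch in plain Python. It assumes a grayscale image stored as a list of lists, no padding, and a stride of 1. The function name conv2d and the sample edge filter are illustrative choices, not part of any library. Like most deep learning frameworks, the sketch slides the kernel without flipping it (strictly speaking, this is cross-correlation).

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# A 4x4 image: dark left half, bright right half (a vertical edge in the middle).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A simple 2x2 filter that responds to left-to-right increases in brightness.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the middle column, which is exactly where the edge sits in the input.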

CNNs have several layers, including convolutional layers, pooling layers, and fully connected layers. The convolutional layers perform the convolution operation and extract features from the input image. The pooling layers reduce the dimensionality of the feature maps and make the computation more efficient. The fully connected layers perform the final classification or regression task.

Why should we use CNNs?

Convolutional Neural Networks (CNNs) are particularly well-suited for handling image data, and they have several advantages over other types of neural networks:

Parameter Sharing: One of the key features of CNNs is parameter sharing, which means that the same set of filter weights is applied at every position of the input image. This greatly reduces the number of parameters and lets a feature detector learned in one part of the image respond to the same pattern elsewhere (translation equivariance). It does not by itself make the network invariant to rotation or scaling; that usually has to be learned from varied or augmented training data.
Spatial Invariance: CNNs are also designed to tolerate small shifts in where a feature appears in the image. This is helped by pooling layers, which downsample the feature maps and reduce their spatial resolution, so that a feature produces a similar response even if it moves by a few pixels.
Hierarchical Features: CNNs are able to learn hierarchical features from the input image. Early layers in the network learn simple features such as edges and textures, while later layers learn more complex features such as shapes and objects.
High Performance: CNNs have achieved state-of-the-art performance on a wide range of image recognition tasks, including object detection, segmentation, and classification.
Transfer Learning: CNNs trained on large datasets can be used as a starting point for other tasks, allowing for faster and more efficient training. This is known as transfer learning, and it has been shown to be particularly effective for smaller datasets.
Robustness: CNNs are often more robust to variations in the input data than fully connected networks. For example, when trained on sufficiently varied data, they can cope with moderate changes in lighting, partial occlusions, and small distortions.
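The effect of parameter sharing is easiest to see by counting weights. The sketch below compares a fully connected layer with a convolutional layer on the same input. The helper names and the layer sizes (a 28x28 grayscale image, 32 units versus 32 filters of size 3x3) are assumptions chosen purely for illustration.

```python
def dense_params(in_h, in_w, in_c, units):
    """Weights + biases for a fully connected layer applied to the flattened image."""
    return in_h * in_w * in_c * units + units

def conv_params(kernel_size, in_c, filters):
    """Weights + biases for a conv layer; note it does not depend on image size."""
    return kernel_size * kernel_size * in_c * filters + filters

# Hypothetical layer sizes, for illustration only.
print(dense_params(28, 28, 1, 32))  # 25120
print(conv_params(3, 1, 32))        # 320
```

Because each 3x3 filter is reused at every position, the convolutional layer needs far fewer parameters, and that count stays the same even if the image gets larger.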

In summary, CNNs are an effective and powerful tool for handling image data. They are able to learn hierarchical features, are robust to variations in the input data, and can be used for a wide range of image recognition tasks. They also have the advantage of transfer learning, allowing them to be used as a starting point for other tasks.

Features of Convolutional Neural Networks (CNNs)

Here are some of the key features of Convolutional Neural Networks (CNNs):

Convolutional Layers: CNNs are designed to effectively handle image data, and convolutional layers are a key component of this. Convolutional layers apply a set of filters (also called kernels) to the input image, allowing the network to learn important features such as edges, textures, and shapes.
Pooling Layers: Pooling layers are used to downsample the feature maps produced by the convolutional layers. This reduces the spatial resolution of the feature maps and makes the computation more efficient. Common pooling techniques include max pooling and average pooling.
Activation Functions: Activation functions are applied to the output of each convolutional layer to introduce non-linearity into the model. Common activation functions include ReLU, sigmoid, and tanh.
Strides: Strides are used in convolutional layers to determine the amount by which the kernel moves over the input image. Larger stride values lead to downsampling and smaller output feature maps.
Padding: Padding is used to preserve the spatial resolution of the feature maps. By adding zeros around the edges of the input image, the output feature maps can keep the same height and width as the input (when the stride is 1 and enough padding is added for the kernel size).
Fully Connected Layers: Fully connected layers are used to perform the final classification or regression task. These layers take the output of the convolutional and pooling layers and transform it into the desired output format (e.g. class probabilities).
Dropout: Dropout is a regularization technique that is commonly used in CNNs to prevent overfitting. During training, dropout temporarily sets a random subset of units to zero on each pass, forcing the remaining units to learn more robust representations. At inference time, all units are kept.
Transfer Learning: CNNs can be trained on large datasets and then used as a starting point for other tasks. This is known as transfer learning, and it can greatly improve the performance of a model on smaller datasets.
Data Augmentation: Data augmentation is another technique used to improve the performance of CNNs on smaller datasets. It involves applying transformations such as rotations, flips, and shifts to the training data, creating new examples that are still representative of the original data.
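The way kernel size, stride, and padding interact can be summarized with the standard output-size formula, floor((n + 2p - k) / s) + 1, applied to each spatial dimension. Here is a small sketch of it. The function name conv_output_size and the 28-pixel input are illustrative assumptions.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output width (or height) for input size n, kernel size k, given stride and padding."""
    return (n + 2 * padding - k) // stride + 1

print(conv_output_size(28, 3))                       # 26: no padding shrinks the map
print(conv_output_size(28, 3, padding=1))            # 28: padding preserves the size
print(conv_output_size(28, 3, stride=2, padding=1))  # 14: stride 2 halves the size
```

The same formula applies to pooling layers, using the pooling window as the kernel size.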

In summary, CNNs are characterized by their use of convolutional layers, pooling layers, activation functions, and fully connected layers. They also often make use of techniques such as dropout, transfer learning, and data augmentation to improve their performance.
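Of these techniques, dropout is simple enough to sketch in a few lines. The version below is "inverted" dropout, a common variant in which the kept activations are scaled up during training so that nothing needs to change at inference time. The function name, its parameters, and the sample activations are assumptions for illustration, not taken from a specific library.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        # At inference time every unit is kept unchanged.
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [1.0, 2.0, 3.0, 4.0]
rng = random.Random(0)  # fixed seed so repeated runs drop the same units
print(dropout(acts, rate=0.5, rng=rng))         # some entries zeroed, the rest doubled
print(dropout(acts, rate=0.5, training=False))  # [1.0, 2.0, 3.0, 4.0]
```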

CNN Architecture

The architecture of a Convolutional Neural Network (CNN) typically consists of several layers:

Input Layer: The input layer takes the raw input image and passes it to the first convolutional layer.
Convolutional Layers: Convolutional layers apply a set of filters (also called kernels) to the input image, producing feature maps that highlight important features such as edges, textures, and shapes. Each filter is responsible for detecting a specific feature in the image.
Activation Function: After each convolutional layer, an activation function (such as ReLU or sigmoid) is applied element-wise to the feature maps, introducing non-linearity to the model.
Pooling Layers: Pooling layers reduce the spatial size of the feature maps by taking the maximum or average value in each local region. This makes the computation more efficient and can help reduce overfitting.
Fully Connected Layers: After several convolutional and pooling layers, the output is flattened and passed through one or more fully connected layers, which perform the final classification or regression task.
Output Layer: The output layer produces the final prediction for the task. For example, in image classification, the output layer would have one neuron for each class, and the predicted class is the one whose neuron has the highest output value.
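The layer sequence above can be sketched end to end in plain Python. This is a toy forward pass with one filter and hand-picked weights rather than a trained model. All function names, the 5x5 image, and the weight values are assumptions chosen so the numbers are easy to follow.

```python
def conv2d(image, kernel):
    """Convolution with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(image[i + di][j + dj] * kernel[di][dj]
             for di in range(kh) for dj in range(kw))
         for j in range(len(image[0]) - kw + 1)]
        for i in range(len(image) - kh + 1)
    ]

def relu(fmap):
    """Element-wise activation: negative values become 0."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Keep the largest value in each non-overlapping 2x2 region."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

def flatten(fmap):
    return [v for row in fmap for v in row]

def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum per output neuron."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]

def forward(image, kernel, weights, biases):
    features = max_pool2x2(relu(conv2d(image, kernel)))
    scores = dense(flatten(features), weights, biases)
    return scores.index(max(scores)), scores

# 5x5 input with a vertical edge; one edge-detecting filter; two output classes.
image = [[0, 0, 0, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]
weights = [[0.0, 0.0, 0.0, 0.0],   # class 0 ("no edge") ignores the features
           [0.5, 0.5, 0.5, 0.5]]   # class 1 ("edge") sums the pooled responses
biases = [1.0, 0.0]

predicted, scores = forward(image, kernel, weights, biases)
print(predicted, scores)  # 1 [1.0, 2.0]
```

In a real CNN, the kernel and the dense weights are learned by backpropagation, and the raw scores are usually passed through a softmax to obtain class probabilities.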

The architecture of a CNN can vary depending on the specific task and dataset. For example, some CNNs use skip connections, which let information bypass one or more layers and improve gradient flow during training. Residual blocks, which add a block's input to its output through such a skip connection, make it practical to train much deeper architectures. Additionally, some CNNs use techniques such as batch normalization or dropout to stabilize training and reduce overfitting.
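A skip connection in a residual block boils down to computing y = ReLU(F(x) + x). The sketch below uses two small linear layers as a stand-in for F. In an actual residual network these would be convolutional layers, so treat the helper names and shapes as illustrative assumptions.

```python
def relu(vector):
    return [max(0.0, v) for v in vector]

def linear(vector, weights):
    """Matrix-vector product; a stand-in for a convolutional layer."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is linear -> ReLU -> linear."""
    fx = linear(relu(linear(x, w1)), w2)
    # The skip connection adds the block's input back onto its output.
    return relu([f + xi for f, xi in zip(fx, x)])

x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block(x, zeros, zeros))  # [1.0, 0.0]
```

Even when the layers inside the block contribute nothing (all-zero weights here), the input still reaches the output through the skip path. This is one intuition for why very deep residual networks remain trainable.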

In summary, the architecture of a CNN typically consists of convolutional layers, activation functions, pooling layers, fully connected layers, and an output layer. The specific architecture can vary depending on the task and dataset and may include techniques such as skip connections or residual blocks to improve performance.

Different Types of CNNs

There are several different types of Convolutional Neural Networks (CNNs), each with its own specific architecture and use case. Here are some of the most common types:

LeNet-5: LeNet-5 was one of the first CNN architectures, developed in the 1990s by Yann LeCun and colleagues. It was designed for handwritten digit recognition and consists of two convolutional layers, each followed by a subsampling (pooling) layer, and then fully connected layers of 120 and 84 units and a 10-way output layer.
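As a quick check on how the feature maps shrink through LeNet-5's convolution and pooling stages, the sketch below traces their sizes. It assumes the commonly quoted configuration: a 32x32 grayscale input, 5x5 kernels without padding, 2x2 subsampling, and 6 then 16 filters. The helper names are illustrative.

```python
def conv_out(n, k):
    """Output width of an unpadded, stride-1 convolution."""
    return n - k + 1

def pool_out(n, size=2):
    """Output width of non-overlapping pooling with a size x size window."""
    return n // size

# (layer name, layer kind, number of output channels)
stages = [("C1", "conv", 6), ("S2", "pool", 6), ("C3", "conv", 16), ("S4", "pool", 16)]

size = 32
trace = [("input", size, 1)]
for name, kind, channels in stages:
    size = conv_out(size, 5) if kind == "conv" else pool_out(size)
    trace.append((name, size, channels))

for name, s, channels in trace:
    print(f"{name}: {s}x{s}x{channels}")

# Number of values handed to the layers that follow the last pooling stage.
flattened = trace[-1][1] ** 2 * trace[-1][2]
print(flattened)  # 400
```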

AlexNet: AlexNet is a CNN architecture developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012. It was the first CNN to win the ImageNet Large Scale Visual Recognition Challenge, and it consists of five convolutional layers, followed by three fully connected layers.

VGGNet: VGGNet is a CNN architecture developed by the Visual Geometry Group at the University of Oxford in 2014. It is characterized by its use of very small convolutional filters (3x3) and a deep architecture, with up to 19 weight layers.
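One way to see why small 3x3 filters are attractive: a stack of them covers the same receptive field as one larger filter while using fewer weights. The sketch below compares the two, assuming stride 1, the same number of input and output channels in every layer, and ignoring biases. The function names and the channel count of 64 are illustrative.

```python
def conv_weights(kernel_size, channels):
    """Weights in one conv layer with `channels` input and output channels."""
    return kernel_size * kernel_size * channels * channels

def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

channels = 64
# Two 3x3 layers see a 5x5 region, like a single 5x5 layer.
print(stacked_receptive_field(2), 2 * conv_weights(3, channels), conv_weights(5, channels))
# 5 73728 102400
# Three 3x3 layers see a 7x7 region, like a single 7x7 layer.
print(stacked_receptive_field(3), 3 * conv_weights(3, channels), conv_weights(7, channels))
# 7 110592 200704
```

The stacked version also applies a non-linearity after each 3x3 layer, which a single large filter does not.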

GoogLeNet (Inception): GoogLeNet, also known as Inception, is a CNN architecture developed by Google in 2014. It consists of a deep architecture with multiple inception modules, which allow for the combination of filters at different scales.

ResNet: ResNet is a CNN architecture developed by Microsoft in 2015. It is characterized by its use of residual connections, which allow for the training of very deep networks (up to hundreds of layers).

MobileNet: MobileNet is a CNN architecture developed by Google in 2017. It is designed for mobile and embedded devices and uses depthwise separable convolutions to reduce the number of parameters and computation required.
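The saving from depthwise separable convolutions can be checked with a quick weight count. A standard convolution uses k x k x C_in x C_out weights, while the separable version splits this into a depthwise step (one k x k filter per input channel) and a pointwise 1x1 step. The function names and the channel sizes below are assumptions for illustration, and biases are ignored.

```python
def standard_conv_weights(k, c_in, c_out):
    """Every output channel has its own k x k filter over every input channel."""
    return k * k * c_in * c_out

def separable_conv_weights(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes the channels
    return depthwise + pointwise

print(standard_conv_weights(3, 32, 64))   # 18432
print(separable_conv_weights(3, 32, 64))  # 2336
```

For this layer the separable form needs roughly eight times fewer weights, which is the kind of reduction that makes such models practical on mobile hardware.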

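DenseNet: DenseNet is a CNN architecture published in 2017 by researchers from Cornell University, Tsinghua University, and Facebook AI Research. It is characterized by its use of dense connections: within a dense block, each layer receives the feature maps of all preceding layers as input and passes its own feature maps on to all subsequent layers.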
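Because each layer in a dense block receives the concatenated feature maps of all earlier layers, the number of input channels grows steadily through the block. The sketch below tracks that growth. The function name is illustrative, and the values (64 initial channels, a growth rate of 32, 6 layers) are assumed for the example.

```python
def dense_block_channels(initial_channels, growth_rate, num_layers):
    """Return each layer's input channel count and the block's output channel count."""
    inputs = []
    channels = initial_channels
    for _ in range(num_layers):
        inputs.append(channels)   # this layer sees everything produced so far
        channels += growth_rate   # its own output is concatenated onto the stack
    return inputs, channels

inputs, output_channels = dense_block_channels(64, 32, 6)
print(inputs)           # [64, 96, 128, 160, 192, 224]
print(output_channels)  # 256
```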

These are just a few examples of the many different types of CNNs that have been developed over the years. Each architecture has its own strengths and weaknesses and is suited to different types of tasks and datasets.

That’s it for now. I hope you liked this blog and learned about CNNs, their architecture, their features, and their different types.

In the next blog, I will walk through some examples of the different types of CNNs discussed above.

Till then, stay tuned for the next blog…
