Convolutional Neural Network (CNN)

Feb 7, 2024

Introduction

In Deep Learning, Neural Networks are used for Image Recognition and Computer Vision, Natural Language Processing (NLP), Speech Recognition and Synthesis, Recommendation Systems, Generative Modeling, and more. Neural Networks are loosely inspired by the way neurons in the human brain connect and pass signals to one another.

Within Deep Learning, the CNN (Convolutional Neural Network) is used mainly for Computer Vision tasks such as Image Recognition, Image Classification, Object Detection, and Pattern Recognition.

Why CNN?

We can also perform Computer Vision tasks such as Image Classification and Image Recognition using an ANN (Artificial Neural Network). The main reason to prefer a CNN is that it is more efficient and scalable. Suppose we have a 100x100 grayscale image. Flattened, it gives 10,000 input values, and because an ANN is fully connected, every neuron in the first hidden layer needs 10,000 weights of its own; a hidden layer of just 1,000 neurons already requires about 10 million weights. CNNs, on the other hand, are designed specifically for image data and rely on weight sharing (the same small kernel is reused at every position in the image) and local connectivity (each output value depends only on a small patch of the input). This lets CNNs learn spatial hierarchies of features while greatly reducing the number of parameters. As a result, CNNs usually perform better on image tasks and need fewer computational resources than a comparable fully connected network, which makes them more suitable for real-world applications in computer vision.
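To make the parameter comparison concrete, here is a rough sketch that counts trainable parameters for one fully connected layer and one convolutional layer on a 100x100 grayscale image. The layer sizes (1,000 hidden units, 32 filters of size 3x3) and the helper names are assumptions chosen for illustration.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_size: int, in_channels: int, n_filters: int) -> int:
    """Weights plus one bias per filter for a 2D convolutional layer."""
    return kernel_size * kernel_size * in_channels * n_filters + n_filters


# Hypothetical setup: a 100x100 grayscale image (1 channel).
image_pixels = 100 * 100

# Fully connected layer with 1,000 hidden units (an assumed size).
print(dense_params(image_pixels, 1000))  # 10001000

# Convolutional layer with 32 filters of size 3x3 (also an assumed size).
print(conv_params(3, 1, 32))  # 320
```

The convolutional layer's parameter count does not depend on the image size at all, which is the practical effect of weight sharing.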

How Does a CNN Work?

Convolution Operation

In the convolution operation, a kernel (filter) slides over the input image to extract features. At each position, the kernel values are multiplied element-wise with the image patch underneath and summed to produce one value of the output feature map. The kernel is commonly 3x3 or 5x5 in size. Another important parameter is the stride, which is the number of rows or columns the kernel moves between successive positions. The stride is often set to 1 by default, but it can be increased to reduce the output size.
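The sliding-window idea can be sketched in plain Python as follows. This is a minimal illustration without padding, not an optimized implementation; the function name `conv2d`, the sample image, and the vertical-edge kernel are assumptions made for the example. Like most deep learning libraries, it does not flip the kernel (strictly speaking, this is cross-correlation).

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for top in range(0, n_rows - k_rows + 1, stride):
        row = []
        for left in range(0, n_cols - k_cols + 1, stride):
            # Multiply the kernel with the patch under it and sum the result.
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 image: bright on the left, dark on the right.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

# Hypothetical 3x3 kernel that responds to vertical edges.
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

print(conv2d(image, kernel))            # [[3, 3], [3, 3]]
print(conv2d(image, kernel, stride=2))  # [[3]]
```

With a stride of 1 the 4x4 input gives a 2x2 feature map; with a stride of 2 the kernel fits at only one position, so the output shrinks to 1x1.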

Padding

In a CNN, padding is the process of adding extra pixels around the border of the input data. Padding is used to control the size of the output feature maps and to preserve spatial information, so that details at the edges and corners of the input image are not lost during the convolution operations.

There are several padding options, such as Zero, Valid, Same, Causal, Constant, Reflection, and Replication. Zero padding fills the border with zeros, Valid means no padding is added, and Same adds just enough padding for the output to have the same size as the input (at a stride of 1). The most commonly used are Zero and Same.
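As a small sketch of zero padding, the helper below surrounds an image with a border of zeros, and a second helper computes how much padding per side keeps the output the same size as the input. Both function names are hypothetical, and `same_padding` assumes a stride of 1 and an odd kernel size.

```python
def zero_pad(image, p):
    """Return a copy of `image` with `p` rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]           # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)   # left and right borders
    padded.extend([0] * width for _ in range(p))       # bottom border
    return padded


def same_padding(kernel_size):
    """Padding per side that preserves the input size at stride 1.

    Assumes an odd kernel size such as 3 or 5.
    """
    return (kernel_size - 1) // 2


image = [
    [1, 2],
    [3, 4],
]

for row in zero_pad(image, 1):
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 0]
# [0, 3, 4, 0]
# [0, 0, 0, 0]

print(same_padding(3))  # 1
print(same_padding(5))  # 2
```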

Pooling

The pooling layer reduces the spatial dimensions of the feature maps while keeping the most important information.

Max pooling and average pooling are the two most commonly used pooling operations. Max pooling takes the maximum value from each local region, and average pooling takes the average value from each local region.
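Both pooling operations can be sketched with one small function. The window size of 2 and stride of 2 are assumed defaults (a common choice, though not stated above), and the function name and sample feature map are illustrative.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling over `size` x `size` windows."""
    n_rows, n_cols = len(feature_map), len(feature_map[0])
    output = []
    for top in range(0, n_rows - size + 1, stride):
        row = []
        for left in range(0, n_cols - size + 1, stride):
            # Collect the values in the current local region.
            window = [feature_map[top + i][left + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))
            elif mode == "average":
                row.append(sum(window) / len(window))
            else:
                raise ValueError("mode must be 'max' or 'average'")
        output.append(row)
    return output


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

print(pool2d(feature_map, mode="max"))      # [[6, 8], [3, 4]]
print(pool2d(feature_map, mode="average"))  # [[3.75, 5.25], [2.0, 2.0]]
```

In both cases the 4x4 feature map shrinks to 2x2, which is the dimensionality reduction the pooling layer is meant to provide.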

Flattening

Flattening converts the pooled feature maps into a one-dimensional vector so that they can be fed as input to the fully connected neural network.
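A minimal sketch of flattening, assuming the pooled output is stored as a list of 2D feature maps (one per filter); the values are made up for illustration.

```python
def flatten(feature_maps):
    """Turn a list of 2D feature maps into a single 1D list of values."""
    return [value for fmap in feature_maps for row in fmap for value in row]


# Hypothetical pooled output from two filters, each 2x2.
pooled = [
    [[6, 8], [3, 4]],
    [[1, 0], [2, 5]],
]

print(flatten(pooled))  # [6, 8, 3, 4, 1, 0, 2, 5]
```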

Fully Connected Neural Network

The output of the flattening layer is provided as input to the Fully Connected Neural Network, which then performs classification, regression, or whatever other task the network is designed for.
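For a classification task, the fully connected part could look roughly like the sketch below: one dense layer produces a score per class, and softmax turns the scores into probabilities. The weights, biases, and two-class setup are invented for illustration; in a real network they would be learned during training, and softmax is only one possible output activation.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum of all inputs plus a bias per unit."""
    return [sum(w * x for w, x in zip(unit_weights, inputs)) + b
            for unit_weights, b in zip(weights, biases)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical flattened vector and made-up parameters for two classes.
flat = [6, 8, 3, 4]
weights = [
    [0.1, 0.0, -0.1, 0.2],   # weights for class 0
    [0.0, 0.1, 0.1, -0.1],   # weights for class 1
]
biases = [0.0, 0.5]

scores = dense(flat, weights, biases)   # roughly [1.1, 1.2]
probs = softmax(scores)
predicted_class = probs.index(max(probs))
print(predicted_class)  # 1
```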

How to calculate the size of the resultant image after the convolution operation?

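It can be calculated using the following equation, applied separately to the height and the width:

Resultant Feature Map Size = floor((n + 2p - f) / s) + 1

Where,

n = Input size (height or width)

p = Padding added on each side

f = Filter size

s = Stride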
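The formula can be checked with a short helper. This sketch uses integer (floor) division, the usual convention when the stride does not divide evenly; the function name and the example sizes are assumptions for illustration.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output size along one dimension: floor((n + 2p - f) / s) + 1."""
    if n + 2 * p < f:
        raise ValueError("filter is larger than the padded input")
    return (n + 2 * p - f) // s + 1


# 100x100 input, 3x3 filter, no padding, stride 1.
print(conv_output_size(100, 3))  # 98

# Same input with padding of 1: the size is preserved.
print(conv_output_size(100, 3, p=1))  # 100

# 5x5 filter, padding of 2, stride 2: the size is halved.
print(conv_output_size(100, 5, p=2, s=2))  # 50
```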

Challenges and Limitations of CNN

Some common challenges and limitations of CNNs are overfitting, vanishing gradients, the need for large datasets, and high demand for computational resources.

Conclusion

Convolutional Neural Networks (CNNs) have emerged as indispensable tools in the realm of computer vision, revolutionizing tasks ranging from image recognition to object detection. Their widespread adoption and integration into various pretrained models underscore their versatility and efficacy. With CNNs serving as the backbone of numerous state-of-the-art architectures, their significance in advancing technology and driving innovation cannot be overstated.

References

Recommended Course

https://www.coursera.org/learn/convolutional-neural-networks
