CNNs: The Secret Sauce to AI’s Success (Part II)

Neha Purohit
𝐀𝐈 𝐦𝐨𝐧𝐤𝐬.𝐢𝐨
8 min read · Oct 26, 2023

CNN, which stands for Convolutional Neural Network, is a deep learning model commonly used for tasks like image recognition. It’s composed of layers that help the network learn and extract features from the input data.

The typical structure of a CNN consists of three basic layer types:

  1. Convolutional layers: These layers generate a feature map by sliding a filter over the input image, recognizing patterns in it.
  2. Pooling layers: These layers downsample the feature map to introduce translation invariance, which helps reduce overfitting in the CNN model.
  3. Fully connected (dense) layer: This layer contains one unit per output class and applies an output activation function such as “softmax” or “sigmoid”.
Source: Datahackers.rs

Convolutional layers serve as the foundation of Convolutional Neural Networks (CNNs), playing a crucial role in feature extraction. Their primary purpose is to identify patterns and features within the input data, often images. These layers operate by employing small filters or kernels that slide over the input data, performing element-wise multiplications and summations at each position to generate a feature map. Through this process, convolutional layers are adept at learning and recognizing low-level primitive features such as edges, corners, and textures. As the network delves deeper, it becomes increasingly skilled at extracting higher-level features, including shapes and objects, making it a pivotal component in the realm of image recognition and computer vision.
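This sliding multiply-and-sum can be sketched in a few lines of NumPy. The `conv2d` helper and the toy pixel values below are my own illustration, not from the article:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding,
    computing an element-wise multiply-and-sum at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark on the left, bright on the right (a vertical edge)
image = np.array([[0, 0, 10, 10, 10],
                  [0, 0, 10, 10, 10],
                  [0, 0, 10, 10, 10],
                  [0, 0, 10, 10, 10]], dtype=float)

# A simple vertical-edge kernel: responds where left and right differ
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

feature_map = conv2d(image, kernel)
print(feature_map)  # strong responses at the edge, zero in the flat region
```

The feature map is nonzero exactly where the window straddles the dark-to-bright boundary, which is the sense in which the kernel "detects" that edge.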

Source: Datahackers.rs

How?

In the initial layers of a CNN, small filters or kernels slide over the input data, such as an image. These filters have specific patterns they are designed to detect. For example, some may be designed to detect horizontal edges, while others may detect vertical or diagonal edges.

To understand filters and kernels in more detail:

Edge detection filters are essential tools in image processing. The Sobel Filter, for instance, highlights edges by emphasizing intensity changes in both horizontal and vertical directions. Similar in purpose, the Prewitt Filter employs a 3x3 kernel to emphasize edges through convolution. The Canny Edge Detector, on the other hand, is a comprehensive, multi-stage algorithm that incorporates Gaussian smoothing and gradient calculations to detect edges accurately and is commonly used in computer vision applications. These filters play a critical role in identifying the boundaries and edges of objects within images, a fundamental step in many image analysis and computer vision tasks.
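As a small sketch, the standard Sobel kernels can be applied to a single 3×3 patch to see how each responds to an edge (the patch values here are made up for illustration):

```python
import numpy as np

# Sobel kernels: Gx responds to horizontal intensity change
# (vertical edges), Gy to vertical change (horizontal edges).
Gx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
Gy = Gx.T

# A 3x3 patch containing a vertical edge: dark | dark | bright
patch = np.array([[0, 0, 255],
                  [0, 0, 255],
                  [0, 0, 255]], dtype=float)

gx = np.sum(patch * Gx)       # strong response across the edge
gy = np.sum(patch * Gy)       # zero: no vertical change
magnitude = np.hypot(gx, gy)  # overall edge strength at this pixel
print(gx, gy, magnitude)
```

The gradient magnitude combines both directions, which is why Sobel responses are usually reported as `sqrt(gx² + gy²)`.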

Blurring and smoothing kernels serve to enhance image quality and reduce noise. The Gaussian Kernel, for instance, achieves this by implementing a weighted average on neighboring pixels, resulting in a gentle blurring effect. Similarly, the Box Blur Kernel employs a straightforward averaging technique to create a blur in images. On the other hand, the Median Filter takes a different approach by substituting each pixel’s value with the median of nearby pixel values, effectively reducing noise and enhancing image clarity. These techniques are valuable for refining images, reducing noise, and achieving smoother visual results.
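The three approaches can be contrasted on a single 3×3 neighborhood containing one noisy "salt" pixel (a hedged sketch with illustrative values):

```python
import numpy as np

# A 3x3 neighborhood with one noisy bright pixel in the center
neighborhood = np.array([[10, 10, 10],
                         [10, 255, 10],
                         [10, 10, 10]], dtype=float)

# Box blur: plain average over the neighborhood
box_kernel = np.ones((3, 3)) / 9.0
box_result = np.sum(neighborhood * box_kernel)

# A common 3x3 Gaussian approximation (weights sum to 16)
gauss_kernel = np.array([[1, 2, 1],
                         [2, 4, 2],
                         [1, 2, 1]], dtype=float) / 16.0
gauss_result = np.sum(neighborhood * gauss_kernel)

# Median filter: the outlier vanishes entirely
median_result = np.median(neighborhood)
print(box_result, gauss_result, median_result)
```

The averaging filters only dilute the noisy value, while the median filter discards it completely, which is why median filtering is preferred for salt-and-pepper noise.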

Source: Datahackers.rs

Sharpening kernels play a crucial role in emphasizing image details and edges. The Laplacian Kernel, for example, enhances edges and intricate features by highlighting swift changes in intensity within the image. On the other hand, Unsharp Masking is a method that involves applying a subtractive kernel to create a sharpening effect, effectively making edges and fine details more pronounced. These techniques are instrumental in improving image clarity and accentuating important elements within a picture.
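A common Laplacian-based sharpening kernel makes the effect concrete: flat regions pass through unchanged while local details are exaggerated (the pixel values below are illustrative):

```python
import numpy as np

# Sharpening kernel: identity plus a negated Laplacian, so the
# center is boosted relative to its four neighbors.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)

flat = np.full((3, 3), 50.0)                  # uniform region
edge = np.array([[50, 50, 50],
                 [50, 80, 50],
                 [50, 50, 50]], dtype=float)  # a bright detail

flat_response = np.sum(flat * sharpen)  # unchanged: 50
edge_response = np.sum(edge * sharpen)  # exaggerated well past 80
print(flat_response, edge_response)
```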

Texture detection kernels are valuable tools for examining the textures within images. For instance, the Gabor Filter is employed to study image textures by capturing the spatial frequency traits inherent in them. Similarly, the Local Binary Pattern (LBP) Kernel is utilized in computer vision and pattern recognition for the analysis of textures in images.

These kernels play an essential role in identifying and characterizing diverse textures, a fundamental aspect of image analysis and pattern recognition tasks.
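The LBP idea is simple enough to sketch directly: threshold each neighbor against the center pixel and read the results as a binary code. The helper below is my own minimal version using one common bit ordering (conventions vary between libraries):

```python
import numpy as np

def lbp_code(patch):
    """Local Binary Pattern of a 3x3 patch: compare each of the 8
    neighbors to the center pixel; each comparison yields one bit."""
    center = patch[1, 1]
    # Neighbors read clockwise starting from the top-left corner
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2],
                 patch[1, 2], patch[2, 2], patch[2, 1],
                 patch[2, 0], patch[1, 0]]
    bits = [1 if n >= center else 0 for n in neighbors]
    return sum(b << i for i, b in enumerate(bits))

# A patch that is bright on the left, dark on the right
patch = np.array([[90, 40, 10],
                  [95, 50, 20],
                  [99, 60, 30]])
print(lbp_code(patch))  # one integer code per pixel, 0..255
```

A histogram of these codes over an image region is the usual LBP texture descriptor.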

Color filters provide versatile options in image processing. In the context of Convolutional Neural Networks (CNNs), grayscale refers to single-channel images or feature maps in which each pixel is represented by a single intensity value, typically ranging from 0 (black) to 255 (white) in an 8-bit grayscale image.

Grayscale images are devoid of color information, and CNNs work with this single channel of intensity values.

In CNNs, grayscale images are often preferred for certain applications, especially when color information is not relevant or is redundant. Grayscale images reduce the computational load as they involve fewer channels and parameters compared to color (RGB) images, making them more computationally efficient.

For instance, in tasks like facial recognition, where color may not significantly impact the results, using grayscale images can simplify the neural network architecture and reduce memory and processing requirements.

Grayscale CNNs can still effectively detect features like edges and textures, making them a practical choice for various image processing and computer vision tasks, particularly when color information is not essential.

The Red, Green, Blue (RGB) Filters are employed to separate or adjust color channels within images, enabling fine-tuning of color components. Similarly, the Hue, Saturation, and Value (HSV) Filters are valuable for modifying various color attributes in images, offering control over factors like color tone and intensity. Furthermore, custom feature detection kernels can be tailor-made for the specific purpose of recognizing unique features. For instance, they can be designed to identify specific logos or symbols within images, offering a flexible and customized approach to feature recognition.
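Channel separation and per-channel adjustment are just indexing operations on the last axis of an image array. A minimal sketch (the image values and the 20% boost are arbitrary choices for illustration):

```python
import numpy as np

# A 2x2 RGB image with shape (H, W, 3)
rgb = np.array([[[200, 40, 40], [40, 200, 40]],
                [[40, 40, 200], [100, 100, 100]]], dtype=float)

# Separate the three color channels
r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

# Adjust one channel: boost red by 20%, clipped to the 0..255 range
adjusted = rgb.copy()
adjusted[..., 0] = np.clip(adjusted[..., 0] * 1.2, 0, 255)
print(r.shape, adjusted[0, 0])
```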

After convolving the image with the kernels described above, the process yields feature maps that highlight where these patterns occur. Pooling layers are then used to reduce noise while retaining the important information. As the CNN progresses through its layers, it learns more complex features built upon the foundation of low-level features. This hierarchical learning allows it to recognize intricate patterns.
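The pooling step can be sketched as 2×2 max pooling with stride 2, which keeps only the strongest response in each non-overlapping window (the helper and the feature-map values are my own illustration):

```python
import numpy as np

def max_pool2x2(feature_map):
    """2x2 max pooling with stride 2: keep the strongest response
    in each non-overlapping 2x2 window, halving both dimensions."""
    h, w = feature_map.shape
    trimmed = feature_map[:h // 2 * 2, :w // 2 * 2]  # drop odd edges
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]], dtype=float)
pooled = max_pool2x2(fm)
print(pooled)  # one maximum per 2x2 window; shape (2, 2)
```

Because only the window maximum survives, a small shift of the input often leaves the pooled output unchanged, which is the translation invariance mentioned earlier.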

CNNs undergo training to adjust filter weights and connections, minimizing recognition errors and enhancing feature recognition abilities.

Source: Saul Dobilas

This training typically occurs on labeled datasets, helping the network refine its pattern recognition skills.

In conclusion, pooling layers and fully connected layers are integral components of Convolutional Neural Networks (CNNs). Pooling layers efficiently reduce spatial dimensions, enhancing computational efficiency while retaining essential information and robustness. Fully connected layers, as the final layers of a CNN, play a pivotal role in transforming the extracted features into predictions or classifications, making them especially valuable for tasks like image classification. Together, these layers contribute to the success of CNNs in various applications, particularly in computer vision and deep learning.

Let me introduce one more concept here: padding.

Without padding, several significant consequences occur. The spatial dimensions of the feature maps shrink with every convolution, potentially leading to a loss of spatial information. “Edge effects” arise because pixels near the borders are covered by fewer filter positions, so features there are processed less thoroughly. Deeper networks face challenges as feature maps shrink until they become too small to convolve further. Output size also varies with depth, complicating network design. Padding is often used to mitigate these issues and preserve spatial information, making it essential in many computer vision tasks. Whether to use padding depends on the specific task requirements and network architecture.

Source: https://indiantechwarrior.com/convolution-layers-in-convolutional-neural-network/

Padding provides control over spatial dimensions, preserves information at the edges, stabilizes training, and allows for the design of more effective and adaptable networks, making it a valuable tool for various computer vision tasks and deep learning applications.

Two types of padding are commonly used: “valid” (no padding) and “same” (zero padding); these are the terms used in the Keras library. With “valid” padding, no extra pixels are added around the input data, so the resulting feature maps are smaller than the input. It is called “valid” because the convolution only considers positions where the filter fully overlaps the input data. With “same” padding, enough zeros are added around the input so that the feature maps keep the same spatial dimensions as the input.
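The size difference between the two modes follows from the standard output-size formula; the helper name below is my own, but the formula itself is the usual one:

```python
def conv_output_size(n, k, padding, stride=1):
    """Spatial output size of a convolution:
    floor((n + 2*padding - k) / stride) + 1."""
    return (n + 2 * padding - k) // stride + 1

n, k = 28, 3  # e.g. a 28x28 input and a 3x3 kernel

valid = conv_output_size(n, k, padding=0)            # "valid": shrinks
same = conv_output_size(n, k, padding=(k - 1) // 2)  # "same": preserved
print(valid, same)  # 26 28
```

Stacking many "valid" 3×3 convolutions loses two pixels per layer, which is exactly the progressive shrinkage that "same" padding avoids.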

Padding serves to preserve spatial dimensions, prevent excessive feature map reduction, and mitigate edge effects. The choice of padding type depends on the specific task and the network’s architecture in the CNN.

Several mathematical concepts play a crucial role in the design and training of CNNs. These include understanding convolution and filter operations, applying matrix algebra to feature maps and weights, using non-linear activation functions, and performing operations like pooling and gradient descent. Mathematical intuition is also required for concepts like loss functions, weight sharing, hierarchical feature learning, probability and statistics, and the management of spatial dimensions. Choices regarding padding, stride, fully connected layers, regularization, kernel initialization, and optimization algorithms all involve mathematical considerations.

Stay tuned for more on this topic next week.

My other blogs:

CNNs: The Secret Sauce to AI’s Success (Part I)

Mastering MNIST with ANN: Secret to hand-written digit recognition

Loss Function| The Secret Ingredient to Building High-Performance AI Models

Optimizers: The Secret Sauce of Deep Learning

Activation Functions: The Hidden Heroes of Neural Networks

The Future of Neural Networks May Lie in the Co-existence of Neurons and Perceptrons

Unleashing The Power Of GPT-4: A Quantum Leap In AI Language Models

If you enjoy reading stories like these and want to support my writing, please consider following and liking. I’ll cover most deep learning topics in this series.
