Convolutional neural networks (CNNs) are one of the main classes of neural networks. CNNs are used for image recognition and classification to detect objects, recognize faces, and so on. They are made up of neurons with learnable weights and biases. Each neuron receives numerous inputs, takes a weighted sum over them, passes that sum through an activation function, and responds with an output.
CNNs are primarily used to classify images, cluster them by similarity, and perform object recognition. Many algorithms built on CNNs can identify faces, street signs, animals, etc.
How do CNNs work?
CNNs operate on volumes: they work with multi-channel images. Unlike the flat images humans perceive, which have only width and height, a CNN's input also has depth. Digital color images use red-green-blue (RGB) encoding, and mixing those three channels produces the color spectrum humans perceive.
A convolutional network ingests such an image as three separate strata of color stacked one on top of the other. A normal color image is seen as a rectangular box whose width and height are measured by the number of pixels along those dimensions. The three layers of color (RGB) along the depth dimension are what CNNs refer to as channels.
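This stacked-channel representation can be sketched in a few lines (a minimal NumPy illustration; the tiny 4 × 4 image is made up for demonstration):

```python
import numpy as np

# A hypothetical 4x4 RGB image: a rectangular box of pixels
# with three color strata (channels) stacked along the depth axis.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :, 0] = 255  # fill only the red channel

print(image.shape)  # (4, 4, 3): height, width, and 3 channels deep
```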
The first layer in a CNN is the CONVOLUTIONAL LAYER, which is the core building block and does most of the computational heavy lifting. Data or images are convolved using filters or kernels. Filters are small units that we apply across the data through a sliding window. The depth of the filter is the same as the depth of the input: for a color image with an RGB depth of 3, a filter of depth 3 would be applied to it. This process involves taking the element-wise product of the filter and the image region it covers, then summing those values at every sliding step. The output of convolving a 3D filter with a color image is a 2D matrix.
Now, the best way to explain a convolutional layer is to imagine a flashlight that is shining over the top left of the image. In order to understand how this works, imagine as if a flashlight shines its light and covers a 5 x 5 area. And now, let’s imagine this flashlight sliding across all the areas of the input image. This flashlight is called a filter(or sometimes referred to as a neuron or a kernel) and the region that it is shining over is called the receptive field. This filter is also an array of numbers (the numbers are called weights or parameters).
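The sliding-window operation described above can be sketched as follows (a minimal NumPy version; the function name and the 8 × 8 random input are illustrative assumptions, not part of the original article):

```python
import numpy as np

def convolve(volume, kernel):
    """Slide a 3D filter over a multi-channel image. At each position the
    element-wise product of the filter and the receptive field is summed
    to a single number, so the full sweep produces a 2D feature map."""
    kh, kw, _ = kernel.shape
    out_h = volume.shape[0] - kh + 1
    out_w = volume.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The region the "flashlight" currently shines on: the receptive field.
            receptive_field = volume[i:i + kh, j:j + kw, :]
            out[i, j] = np.sum(receptive_field * kernel)
    return out

rgb = np.random.rand(8, 8, 3)     # a hypothetical 8x8 RGB input
filt = np.random.rand(5, 5, 3)    # filter depth matches the input depth (3)
print(convolve(rgb, filt).shape)  # (4, 4): a 2D matrix, as described above
```

The filter's weights are exactly the "array of numbers" mentioned above; training adjusts them.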
Second is the ACTIVATION LAYER, which applies the ReLU (Rectified Linear Unit). In this step we apply the rectifier function to increase non-linearity in the CNN, since images are made up of different objects whose relationships are not linear.
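The rectifier itself is a one-liner (a minimal NumPy sketch; the example feature map values are invented):

```python
import numpy as np

def relu(x):
    """The rectifier: keep positive activations, clip negatives to zero.
    This is what introduces non-linearity into the network."""
    return np.maximum(0, x)

feature_map = np.array([[-2.0, 1.5],
                        [0.0, -0.5]])
print(relu(feature_map))  # negatives become 0; positives pass through unchanged
```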
Third is the POOLING LAYER, which downsamples the features. It is applied independently to every depth slice of the 3D volume. Typically there are two hyperparameters within this layer:
- The spatial extent: the value n such that each n × n region of the feature map is mapped to a single value
- Stride: how many features the sliding window skips along the width and height
A common POOLING LAYER uses a 2 × 2 max filter with a stride of 2, which makes it a non-overlapping filter. A max filter returns the maximum value among the features in the region. As an example of max pooling, take a 26 × 26 × 32 volume: using a max pool layer with 2 × 2 filters and a stride of 2, the volume is reduced to a 13 × 13 × 32 feature map.
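The 2 × 2, stride-2 max filter can be sketched like this (a minimal NumPy version; the 4 × 4 input values are invented for illustration):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Non-overlapping max pooling over one depth slice: each size x size
    region is replaced by its maximum value. Applied to all 32 slices of a
    26x26x32 volume, this would yield a 13x13x32 volume."""
    h, w = feature_map.shape
    out = np.zeros((h // stride, w // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = region.max()
    return out

fm = np.array([[1., 3., 2., 1.],
               [4., 2., 5., 6.],
               [7., 8., 9., 0.],
               [3., 1., 2., 4.]])
print(max_pool(fm))  # each 2x2 block collapses to its max: 4, 6, 8, 9
```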
Lastly is the FULLY CONNECTED LAYER, which involves flattening. The entire pooled feature map matrix is transformed into a single column, which is then fed to the neural network for processing. With the fully connected layers, we combine these features together to create a model. Finally, an activation function such as softmax or sigmoid classifies the output.
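Flattening and the final softmax classification can be sketched as follows (a minimal NumPy illustration; the 13 × 13 × 32 shape matches the pooling example above, while the random 10-class weight matrix is a made-up stand-in for a learned fully connected layer):

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

pooled = np.random.rand(13, 13, 32)  # hypothetical pooled feature maps
flat = pooled.flatten()              # flattening: one long column of features
print(flat.shape)                    # (5408,)

# Hypothetical fully connected layer mapping 5408 features to 10 class scores.
weights = np.random.rand(10, flat.size) * 0.01
probs = softmax(weights @ flat)
print(probs.sum())                   # sums to 1 (up to floating point)
```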
- Each input image passes through a series of convolutional layers with filters
- To perceive images the same way humans do, CNNs work with digital colour images that have red-green-blue (RGB) encoding
- The Convolutional Layer, Activation Layer, Pooling Layer, and Fully Connected Layer are all interconnected so that CNNs can process and perceive data in order to classify images
Thank you for reading my article, if you enjoyed the read, please clap and comment any feedback you have below. If you want to reach out to me, you can connect with me through LinkedIn.