Convolutional Neural Networks

ANISH VARGHESE (RA2011026010020)
Published in DataX Journal
Jul 10, 2022
Figure: pictorial representation of the working of a convolutional neural network

Now, let's begin simply with the question: what exactly is a convolutional neural network, or CNN for short?

A CNN is a type of artificial neural network that is most often associated with image classification and processing. Convolutional neural networks are composed of multiple layers of artificial neurons (perceptrons). At its core, a CNN is a deep learning algorithm that can assign weights/importance to each pixel of an image in order to process and differentiate the objects in the image. CNNs are also known as shift-invariant or space-invariant artificial neural networks.

HISTORY.

Figure: types of CNNs and their evolution throughout history

Convolutional neural networks, also called ConvNets, were first introduced in the 1980s by Yann LeCun, a computer science researcher. LeCun built on the work of Kunihiko Fukushima, a Japanese scientist who had earlier introduced the “neocognitron”, the starting point for the CNN framework. The neocognitron was itself inspired by the work of Hubel and Wiesel, who discovered how neurons in the cat's visual cortex respond to stimuli. The early builds of CNN were called LeNet, after LeCun, and could recognize only handwritten digits. Convolutional networks were also presented at the Neural Information Processing Workshop in 1987: by replacing learned multiplication with convolution in time, they could automatically analyze time-varying signals, and they were demonstrated for speech recognition. Although CNNs had a lot of potential, they did not immediately take off as the next big thing, because they demanded large amounts of data and compute to work efficiently on large, high-resolution images. Although CNNs were invented in the 1980s, their breakthrough in the 2000s required fast implementations on graphics processing units (GPUs).

Working

Convolutional neural networks are built from two main kinds of layers: the convolutional layer, which does the bulk of the work, and the pooling layer, which plays a supporting role. Together, these layers perform various operations over the input to identify and classify it.

CONVOLUTIONAL LAYER:

Before learning what this layer does, we have to familiarise ourselves with the term “kernel”. The kernel is the matrix with which the convolution operation is performed. Although spatially (in height and width) a kernel is smaller than the actual image, its depth extends across all channels. The kernel slides across the dimensions of the image, and its dot product with each region it covers (the receptive region) produces a two-dimensional representation of the image known as an activation map, which gives the response of the kernel at each spatial position of the image. The step size with which the kernel slides is called the stride. The objective of the convolution operation is to extract features such as edges from the input image; stacking layers lets the network capture progressively higher-level features, so ConvNets need not be limited to only one convolutional layer.
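The sliding-kernel operation above can be sketched in a few lines of pure Python. This is a minimal single-channel, stride-1 sketch, not a production implementation; the function name and the vertical-edge-detecting kernel are illustrative choices, not from the article.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` and return the 2D activation map."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(0, ih - kh + 1, stride):
        row = []
        for c in range(0, iw - kw + 1, stride):
            # Dot product of the kernel with the receptive region at (r, c).
            s = sum(kernel[i][j] * image[r + i][c + j]
                    for i in range(kh) for j in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 4x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1]]
# A hypothetical vertical-edge kernel: responds where the right side
# of the receptive region is brighter than the left.
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]
print(conv2d(image, edge_kernel))  # → [[3, 3, 0], [3, 3, 0]]
```

The activation map is strong (3) wherever the kernel straddles the edge and zero over the flat region, which is exactly the "response at each spatial position" described above.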

The amount of border added around the input before convolution is called padding, and it comes in two common forms. The first is valid padding (no padding at all), where the convolved feature map ends up smaller in dimensions than the input. The second is same padding, where enough zeros are added around the border that the convolved feature map keeps the same dimensions as the input.
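The effect of padding on output size follows the standard formula out = (n - k + 2p) / s + 1 for input size n, kernel size k, padding p, and stride s. A small sketch (the helper name is my own):

```python
def conv_output_size(n, k, p=0, s=1):
    """Spatial output size for input n, kernel k, padding p, stride s."""
    return (n - k + 2 * p) // s + 1

# Valid padding (p = 0): the output shrinks.
print(conv_output_size(28, 3))       # → 26
# Same padding (p = (k - 1) // 2 for odd k): the output size is preserved.
print(conv_output_size(28, 3, p=1))  # → 28
```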

POOLING LAYER:

The pooling layer helps reduce the complexity of the convolved feature by shrinking its spatial size, which drastically improves the efficiency of processing the data (dimensionality reduction). Hence it reduces the computational power required.

Pooling does more than just reduce spatial size: it also extracts dominant features that are invariant to small rotations and shifts in position. There are two common types of pooling layer: max pooling and average pooling. Max pooling returns the maximum value from the portion of the feature map covered by the kernel, whereas average pooling returns the average of all the values. Max pooling generally performs better than average pooling, as it performs de-noising along with dimensionality reduction, whereas average pooling simply reduces the dimensions as a means of suppressing noise.
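Both pooling variants can be illustrated with one small pure-Python sketch; the 2x2 window with stride 2 below is the common default, and the function name is illustrative.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """2D pooling over a feature map: max pooling, or average for any other mode."""
    out = []
    for r in range(0, len(fmap) - size + 1, stride):
        row = []
        for c in range(0, len(fmap[0]) - size + 1, stride):
            # Collect the values inside the pooling window at (r, c).
            window = [fmap[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max"
                       else sum(window) / len(window))
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [5, 6, 1, 2],
        [0, 2, 9, 4],
        [3, 1, 4, 8]]
print(pool2d(fmap, mode="max"))  # → [[6, 2], [3, 9]]
print(pool2d(fmap, mode="avg"))  # → [[3.75, 1.25], [1.5, 6.25]]
```

Either way the 4x4 map shrinks to 2x2, but max pooling keeps only the strongest response in each window, which is the de-noising effect mentioned above.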

FULLY CONNECTED LAYER:

Figure: non-linearity (ReLU)

Adding a fully connected layer is a (typically) low-cost way of learning non-linear combinations of the high-level features represented by the convolutional layers' output. In that space, the fully connected layer learns a possibly non-linear function.
Now that we have converted the image into a form suited to our multi-level perceptron, we flatten it into a column vector. The flattened output is fed to a feed-forward neural network, with backpropagation applied in every round of training. Over a number of epochs, the model learns to distinguish between dominating and certain low-level features in images and to classify them using the softmax classification approach.
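The two steps at the head of this stage, flattening and softmax classification, can be sketched in pure Python. This is a minimal illustration under my own naming, not the article's implementation; the scores fed to softmax would come from the fully connected layer.

```python
import math

def flatten(fmap):
    """Flatten a 2D feature map into a 1D column vector."""
    return [v for row in fmap for v in row]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(flatten([[1, 2], [3, 4]]))  # → [1, 2, 3, 4]
# Hypothetical scores for three classes; softmax picks out class 0.
probs = softmax([2.0, 1.0, 0.1])
print(probs.index(max(probs)))    # → 0
```

The predicted class is simply the index with the highest softmax probability, which is how the final classification step described above is usually realized.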
