An Overview of Convolutional Neural Networks

Ashley <3
Published in The Startup
Nov 23, 2020

Convolutional Neural Networks, otherwise known as CNNs, are deep learning models most commonly used for computer vision applications (such as image classification) and, in some cases, for natural language processing tasks (such as text classification).

A specialty of CNNs is that they can efficiently recognize patterns that occur in an input image, including lines, shapes, gradients, eyes, or even faces. Because CNNs are so good at recognizing these details, they are primarily used in computer vision. Unlike many earlier computer vision models, which relied on hand-engineered features, CNNs learn features directly from raw pixel values, so they need relatively little preprocessing (meaning alteration of the data to suit the model).

What makes a CNN different from other neural networks?

A regular neural network has an input layer, hidden layers, and an output layer. The input layer receives the raw input values, the hidden layers perform calculations based on those inputs, and the output layer delivers the result of the calculations. In a regular (fully connected) neural network, every neuron is connected to every neuron in the previous layer, and each of those connections has its own weight. This means the network makes no assumptions about the structure of its input… which seems great, but not for tasks involving images and language, where the position of each pixel or word relative to its neighbors carries meaning, and where connecting every input to every neuron requires a huge number of weights.
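To make the idea of every connection having its own weight concrete, here is a minimal pure-Python sketch of one fully connected (dense) layer. The 28x28 grayscale image, the 100-neuron layer, and the random weights are hypothetical values chosen for illustration; in practice libraries compute this as a single matrix multiplication.

```python
import random

def dense_forward(inputs, weights, biases):
    """Fully connected layer: every neuron sees every input,
    and every (input, neuron) connection has its own weight."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = bias + sum(w * x for w, x in zip(neuron_weights, inputs))
        outputs.append(total)
    return outputs

# Hypothetical example: a 28x28 grayscale image flattened to 784 values,
# fed into a hidden layer of 100 neurons.
n_inputs, n_neurons = 28 * 28, 100
random.seed(0)
image = [random.random() for _ in range(n_inputs)]
weights = [[random.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_neurons)]
biases = [0.0] * n_neurons

hidden = dense_forward(image, weights, biases)

# One weight per connection plus one bias per neuron.
n_params = sum(len(ws) for ws in weights) + len(biases)
print(len(hidden))  # 100
print(n_params)     # 78500
```

Even this small layer needs 78,500 parameters, and none of them encode which pixels are neighbors.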

This is where CNNs come in to do the job.

CNNs work differently because the model treats data as spatial. Rather than each neuron being connected to every neuron in the previous layer, a neuron in a convolutional layer is connected only to a small, local region of the previous layer, and all the neurons in the same feature map share the same weights.
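A rough way to see why local connections and shared weights matter is to count parameters. The sketch below assumes a hypothetical 28x28 grayscale image and a single 3x3 filter with stride 1 and no padding, and compares it with a fully connected layer that produces the same number of outputs. The helper names are illustrative, not from any library.

```python
def conv_param_count(kernel_size, in_channels, n_filters):
    # Each filter has kernel_size x kernel_size x in_channels weights plus one bias,
    # and those same weights are reused at every position in the image.
    return kernel_size * kernel_size * in_channels * n_filters + n_filters

def dense_param_count(n_inputs, n_outputs):
    # A fully connected layer has one weight per (input, output) pair plus one bias per output.
    return n_inputs * n_outputs + n_outputs

# Hypothetical setup: 28x28 grayscale image, one 3x3 filter, stride 1, no padding.
height = width = 28
k = 3
out_h, out_w = height - k + 1, width - k + 1   # size of the resulting feature map

shared = conv_param_count(k, in_channels=1, n_filters=1)
unshared = dense_param_count(height * width, out_h * out_w)

print(out_h, out_w)  # 26 26
print(shared)        # 10
print(unshared)      # 530660
```

The shared filter needs 10 parameters no matter how large the image is, while a fully connected layer with the same 26x26 output needs over half a million.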

Why is it important that data is spatial?

Let’s say an image of a dog is fed into the neural network, and the network is analyzing the dog’s eye. If the network did not treat the data as spatial, it would have no sense of where the eye is located and would think an eye is all over the image, like this:

[Figure: A result of non-spatial data.]

The Structure

A CNN is a feed-forward neural network, and the number of layers varies widely by architecture, from a handful to well over a hundred. Like many other neural networks, CNNs typically use ReLU activations and one or more fully connected layers. The ReLU activation introduces non-linearity as the data passes through each layer; without it, a stack of layers would collapse into a single linear transformation, no matter how deep the network is. The fully connected layer(s) at the end then perform classification on your data.
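The claim that layers without ReLU collapse into a single linear step can be checked with a toy example. This sketch uses two hypothetical one-neuron layers with made-up weights: without ReLU, the pair always equals one linear function; with ReLU in between, it does not.

```python
def relu(x):
    # ReLU keeps positive values and replaces negatives with zero.
    return max(0.0, x)

# Hypothetical weights for two one-neuron layers.
w1, b1 = 2.0, -1.0
w2, b2 = -3.0, 0.5

def two_linear_layers(x):
    # Layer 2 applied directly to layer 1's output, with no activation.
    return w2 * (w1 * x + b1) + b2

def one_linear_layer(x):
    # The same function rewritten as a single layer: (w2*w1)*x + (w2*b1 + b2).
    return (w2 * w1) * x + (w2 * b1 + b2)

def two_layers_with_relu(x):
    # Same weights, but with ReLU between the layers.
    return w2 * relu(w1 * x + b1) + b2

for x in [-2.0, -1.0, 0.5, 1.0, 3.0]:
    print(x, two_linear_layers(x), one_linear_layer(x), two_layers_with_relu(x))
```

With ReLU, the output is flat for every input where the first layer is negative and changes with the input elsewhere, a bend that no single linear layer can reproduce.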

The Process of a CNN

Most noticeably, what sets a CNN apart from other neural networks are two special kinds of layers: the convolutional layer and the pooling layer. Convolutional layers are the major building blocks of CNNs and the most important components of the network.

So what is a convolutional layer?

The convolutional layer is the first layer to extract features from the input image. It works by placing a small filter (also called a kernel) over a patch of pixels. Because the filter looks at only a small square of input data at a time, the layer learns image features while preserving the spatial relationships between neighboring pixels. Mathematically, convolution is an operation that takes two inputs, the image matrix and the filter, and produces a feature map.

A feature map is the result of one filter applied to the previous layer. The filter slides across the previous layer, typically moving one pixel at a time (a stride of 1). At each position, the filter’s weights are multiplied element-wise by the values underneath it and summed, producing one neuron activation, and these activations are collected in the feature map.
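The sliding-filter process can be sketched in plain Python. This is a minimal, unoptimized version assuming a single-channel image, no padding, and a hand-picked vertical-edge filter (in a real CNN the filter weights are learned during training). Like most deep learning libraries, it computes cross-correlation, meaning the filter is not flipped before sliding.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the filter element-wise with the patch under it, then sum.
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)  # one neuron activation
        feature_map.append(row)
    return feature_map

# Hypothetical 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-picked vertical-edge filter.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

for row in convolve2d(image, kernel):
    print(row)  # each row is [0, 3, 3]
```

The feature map is large only where the filter overlaps the boundary between the dark and bright columns, which is how a filter "recognizes" a vertical edge.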

And what is a pooling layer?

A pooling layer reduces the spatial size (the width and height) of a feature map. This makes processing faster, because the smaller feature maps mean fewer computations and fewer parameters in the layers that follow. The output of a pooling layer is a pooled feature map.

There are two common ways to create a pooled feature map: max pooling and average pooling.

The difference between the two:

  • Max pooling keeps only the maximum value in each pooling window, throwing away the weaker, less important activations.
  • Average pooling takes the average of all the values in each pooling window, so the less important values still contribute to the pooled feature map.
[Figure: Max Pooling and Average Pooling]
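Here is a minimal sketch of both pooling methods on a small, made-up 4x4 feature map, assuming the common setup of a 2x2 window with stride 2 (other window sizes and strides are possible).

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample a 2D feature map with max or average pooling (no padding)."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                feature_map[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            ]
            if mode == "max":
                row.append(max(window))                 # keep only the strongest value
            elif mode == "average":
                row.append(sum(window) / len(window))   # every value contributes
            else:
                raise ValueError(f"unknown pooling mode: {mode!r}")
        pooled.append(row)
    return pooled

# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
print(pool2d(feature_map, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(feature_map, mode="average"))  # [[3.75, 1.75], [2.5, 6.0]]
```

Both methods shrink the 4x4 map to 2x2; max pooling keeps only the peaks, while average pooling blends every value in the window.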

The result of this process is feature abstraction: based on the weights it has learned, the network builds up its own representation of the image data, from simple patterns in the early layers to more complex ones in later layers.

Finally, CNNs have a fully connected layer, which performs classification. Before the data reaches it, the pooled feature maps must be flattened into a single one-dimensional vector, because a fully connected layer expects a flat list of input values rather than a 2D or 3D grid.

[Figure: The pooled feature maps are flattened before the fully connected layer.]
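The flattening step and the final classification can be sketched as follows. The two pooled feature maps, the two-class setup, and the weights are all hypothetical, and the softmax at the end is a common choice for turning the fully connected layer’s scores into class probabilities rather than something prescribed above.

```python
import math

def flatten(feature_maps):
    """Turn a list of 2D feature maps into one 1D vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]

def dense(inputs, weights, biases):
    # Fully connected layer: one score per class.
    return [b + sum(w * v for w, v in zip(ws, inputs)) for ws, b in zip(weights, biases)]

def softmax(scores):
    # Subtract the max for numerical stability; outputs sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two hypothetical 2x2 pooled feature maps.
pooled = [
    [[6, 4], [7, 9]],
    [[1, 0], [2, 3]],
]
x = flatten(pooled)  # [6, 4, 7, 9, 1, 0, 2, 3]

# Made-up weights for a 2-class classifier (e.g. "dog" vs. "not dog").
weights = [
    [0.1] * 8,
    [-0.1] * 8,
]
biases = [0.0, 0.0]

probs = softmax(dense(x, weights, biases))
print(x)
print(probs)
```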

CNN Training Methods

Unsupervised methods

If you’re working with unlabeled data, you can use an autoencoder, which is an unsupervised neural network. An autoencoder takes input data, encodes (compresses) it into a latent-space representation, and then decodes that representation to reconstruct the input.

input → encoder → compressed (latent) representation → decoder → reconstruction, a slightly distorted version of the original input
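To illustrate the encode, compress, decode idea, here is a deliberately tiny autoencoder in pure Python. It is a simplified stand-in, not a convolutional autoencoder: the encoder is a single linear neuron that compresses 3 values into 1 latent number, the decoder expands it back, and plain gradient descent minimizes reconstruction error. No labels are used; the input itself is the training target. The toy data, learning rate, and step count are assumptions chosen for illustration.

```python
# A toy linear autoencoder: 3 inputs -> 1 latent number -> 3 outputs.
# Everything here (data, sizes, learning rate, step count) is a hypothetical example.

def encode(x, e):
    # Encoder: compress a 3-value input into a single latent number.
    return sum(ei * xi for ei, xi in zip(e, x))

def decode(z, d):
    # Decoder: expand the latent number back into 3 values.
    return [di * z for di in d]

def reconstruction_loss(data, e, d):
    # Mean (over samples) of the squared reconstruction error.
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, e), d)
        total += sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
    return total / len(data)

def train_step(data, e, d, lr):
    # One step of plain gradient descent on the reconstruction loss.
    n = len(data)
    grad_e = [0.0] * len(e)
    grad_d = [0.0] * len(d)
    for x in data:
        z = encode(x, e)
        residual = [di * z - xi for di, xi in zip(d, x)]
        # d(loss)/d(d_i) = 2 * residual_i * z
        for i in range(len(d)):
            grad_d[i] += 2 * residual[i] * z / n
        # d(loss)/d(e_j) = 2 * (sum_i residual_i * d_i) * x_j
        back = sum(r * di for r, di in zip(residual, d))
        for j in range(len(e)):
            grad_e[j] += 2 * back * x[j] / n
    new_e = [ei - lr * g for ei, g in zip(e, grad_e)]
    new_d = [di - lr * g for di, g in zip(d, grad_d)]
    return new_e, new_d

# Unlabeled data: points that lie close to a single direction in 3D.
data = [
    [1.0, 2.0, 2.1],
    [2.0, 4.1, 3.9],
    [-1.0, -2.0, -2.0],
    [0.5, 1.0, 0.9],
    [-1.5, -3.1, -2.9],
]

e = [0.1, 0.1, 0.1]  # encoder weights
d = [0.1, 0.1, 0.1]  # decoder weights
initial_loss = reconstruction_loss(data, e, d)
for _ in range(500):
    e, d = train_step(data, e, d, lr=0.01)
final_loss = reconstruction_loss(data, e, d)

print(f"loss before training: {initial_loss:.3f}")
print(f"loss after training:  {final_loss:.3f}")
print("input:         ", data[0])
print("reconstruction:", [round(v, 2) for v in decode(encode(data[0], e), d)])
```

Because the latent space holds only one number, the reconstruction can only approximate each input, which is why the decoded output is a slightly distorted version of the original.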

Using GANs

A CNN can also be trained using Generative Adversarial Networks, otherwise known as GANs. With the GAN method, you train two networks. The first network, the generator, produces artificial data samples that closely resemble the data in the training set. The second network, called the discriminator (or discriminative network), tries to distinguish between the artificial samples and the real data. As the two compete, the generator learns to produce more convincing samples and the discriminator gets better at spotting fakes.
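This competition is usually written as a minimax game, following the original GAN formulation by Goodfellow et al. (2014). Here $G$ is the generator, $D$ is the discriminator (which outputs the probability that its input is real), $x$ is a sample from the real training data, and $z$ is random noise fed to the generator:

```latex
\min_{G} \max_{D} \;
\mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximize this value by scoring real samples close to 1 and generated samples close to 0, while the generator tries to minimize it by making $D(G(z))$ close to 1.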

Contact me for any inquiries 🚀

Hi, I’m Ashley, a 16-year-old coding nerd and A.I. enthusiast!

I hope you enjoyed reading my article, and if you did, feel free to check out some of my other pieces on Medium :)

If you have any questions, would like to learn more about me, or want resources for anything A.I. or programming related, you can contact me by:

💫Email: ashleycinquires@gmail.com
