CNN’s Building Blocks

Güldeniz Bektaş
7 min read · Jun 1, 2021


Photo by Greg Rakozy on Unsplash

In our era, we are trying to train machines to do what we do on a daily basis. Not perfectly yet, but we are developing machines that help us and make us better and faster at important tasks. Computer vision is one of the most important tasks we are working on. Seeing is a gift for the human race: we can see, interpret, and analyze everything around us. When we look at a picture of a dog eagerly chasing a ball, we can describe it just as I wrote. We can perceive the emotions in the picture, what is happening at that exact moment, and what might happen just after the shot. Computer vision is being developed for this purpose, because human vision can sometimes be inadequate or delayed, and sometimes instant responses are required, such as tumor diagnosis or intervention at the time of a crime.

Computer vision is the science that enables computers to analyze and interpret videos and images the way the human brain does. Convolutional Neural Networks (CNN or ConvNet), a class of deep neural networks, are widely used in computer vision algorithms.

What is Convolutional Neural Network?

A CNN is a neural network: an algorithm used to recognize patterns in data. More specifically, it is a type of DNN (deep neural network) designed for working with two- or more-dimensional image data. A CNN takes its name from its ‘convolutional’ layers, which perform an operation called ‘convolution’.

Architecture of a traditional CNN

🥁 Now, let’s break down every layer of a CNN in detail.

1. Convolution Layer


A convolutional layer involves the multiplication of a set of weights with the input, just like a traditional neural network, but for CNNs the input is two-dimensional. The multiplication is performed between an array of input data and a two-dimensional array of weights, called a filter or a kernel.

This filter or kernel is always smaller than the input, and it moves all over the image matrix. It multiplies its values by the original pixel values, and all of these multiplications are summed up into a single number. The filter then moves to the right and down in steps of n pixels (this can vary). The resulting matrix is (usually) smaller than the input matrix.
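The slide-multiply-sum operation described above can be sketched in plain NumPy (this is an illustrative toy implementation, not code from the article; real frameworks use heavily optimized versions):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding); at each position,
    multiply element-wise and sum everything into one number."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1   # output height
    ow = (iw - kw) // stride + 1   # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                   # toy 2x2 filter
print(conv2d(image, kernel).shape)  # (3, 3) — smaller than the input
```

Note how a 4x4 input and a 2x2 filter with stride 1 give a 3x3 feature map, exactly the shrinking effect described above.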


The output array produced by this operation between the input and the filter is called a ‘feature (or activation) map’.


Its hyperparameters include the filter size F and the stride (step size) S.

Color images consist of three channels (red-green-blue, or RGB), so the convolution is performed over all three channels: the filter has the same number of channels as its input. Each filter produces one feature map, so the number of channels in the output equals the number of filters in the layer.

After every convolution layer, there is a non-linear layer where a non-linear activation function such as ReLU is applied to each value in the feature map to add non-linearity to the data. The filter values are the weights of the layer, updated by backpropagation. A bias value (b) is added to the output of the convolution before the activation is applied.


By increasing the non-linearity, the network becomes complex enough to find new patterns in the images.

Understanding Hyperparameters

  1. Padding — lets us control the size difference between the input and output matrices after the convolution. Symmetrically adding zeros (zero padding) around the edges of the input matrix is the most widely used padding method because of its performance, simplicity, and computational efficiency (it was used in AlexNet).
  • Output size after padding — (W − F + 2P)/S + 1, for input volume size W, filter size F, stride S, and amount of zero padding used on the border P. For example, with a 7x7 input, a 3x3 filter, stride 1, and padding 0: (7 − 3 + 2×0)/1 + 1 = 5, so the output shape would be 5x5.
  • Setting the zero padding to P = (F − 1)/2 when the stride is 1 ensures that the input volume and output volume will have the same size.

2. Kernel Size — refers to the dimensions of the filter sliding over the input. Small filters extract more fine-grained information from the input matrix and cause a smaller reduction in layer dimensions, which often performs better.

3. Stride — indicates how many pixels the kernel is shifted at a time. With a stride of 1, the filter moves to the right by one pixel for every operation. The smaller the stride, the more data is extracted, but the larger the output.
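The output-size formula above is easy to turn into a small helper (a sketch; the parameter names simply follow the W, F, S, P symbols used in the formula):

```python
def conv_output_size(W, F, S=1, P=0):
    """Spatial size of the feature map: (W - F + 2P) / S + 1."""
    return (W - F + 2 * P) // S + 1

# 7x7 input, 3x3 filter, stride 1, no padding -> 5x5 output
print(conv_output_size(7, 3, S=1, P=0))  # 5

# "Same" padding for stride 1: P = (F - 1) / 2 keeps the size at 7
print(conv_output_size(7, 3, S=1, P=1))  # 7
```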

🧩 Activation Functions

Activation functions are one of the crucial parts of deep learning designs. The activation function in the hidden layers controls how well the model learns the training dataset, while the choice of activation function in the output layer defines the type of predictions the model can make.

An activation function in a neural network defines how the weighted sum of the input is transformed into an output from a node or nodes in a layer of the network.

📌 Different activation functions may be used in different parts of the model. Hidden layers usually share the same activation function, while the output layer usually uses a different one that depends on the type of prediction the model makes.

Four activation functions that you probably will see the most are:

  1. Rectified Linear Activation (ReLU)
  2. Logistic (Sigmoid)
  3. Hyperbolic Tangent (Tanh)
  4. Softmax
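All four can be written in a few lines of NumPy (a minimal sketch for intuition; frameworks ship their own tested versions):

```python
import numpy as np

def relu(x):
    # Rectified Linear Unit: zero out negatives, pass positives through
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes any real number into (-1, 1)
    return np.tanh(x)

def softmax(x):
    # Turns a score vector into probabilities that sum to 1;
    # subtracting the max is a standard numerical-stability trick
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))            # [0. 0. 2.]
print(softmax(x).sum())   # 1.0
```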

2. Pooling Layer

This layer reduces the spatial size of the representation to cut down the number of parameters and the amount of computation in the network, and it also helps control overfitting.

📌 This layer slides a window over the feature maps from the previous layer and summarizes the presence of features in each patch; unlike convolution, it has no learned weights.

📌 The size of the pooling filter must be smaller than the size of the feature map.

There are two main types of pooling layers. They are max pooling, and average pooling.

Max Pooling — Calculate the maximum value for each patch of the feature map.


Average Pooling — Calculate the average value for each patch on the feature map.

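Both types can share one small sketch (plain NumPy, for illustration only): the only difference between max and average pooling is the reduction applied to each patch.

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    """Summarize each size x size patch of the feature map
    with its maximum ("max") or its mean ("avg")."""
    h, w = fmap.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow))
    reduce_fn = np.max if mode == "max" else np.mean
    for i in range(oh):
        for j in range(ow):
            patch = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = reduce_fn(patch)
    return out

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [9., 8., 3., 2.],
                 [7., 6., 1., 0.]])
print(pool2d(fmap, mode="max"))   # [[4. 8.] [9. 3.]]
print(pool2d(fmap, mode="avg"))   # [[2.5 6.5] [7.5 1.5]]
```

Note the 4x4 feature map shrinks to 2x2: with a 2x2 window and stride 2, each output value summarizes a non-overlapping patch.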

3. Fully-Connected Layer

Neurons in a fully-connected layer have full connections to all activations in the previous layer, as in regular neural networks. The input to the fully-connected layer is the output of the final pooling or convolutional layer, which is flattened and then fed in.

But, wait! What is flattened? Let me explain.

This layer converts a three-dimensional tensor into a one-dimensional vector to fit the input of a fully-connected layer. For example, a 6x6x3 tensor would be converted into a vector of size 108 (6 × 6 × 3) by the flatten layer.

After passing through the fully-connected layers, the final layer uses the softmax activation function (instead of ReLU) to produce the probabilities of the input belonging to each class (classification).
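Putting the last two steps together, here is a hedged NumPy sketch of flatten + fully-connected + softmax (the 6x6x3 feature maps, 10 classes, and random weights are all made-up illustration values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend output of the final pooling/conv layer: 6x6 spatial, 3 channels
feature_maps = rng.normal(size=(6, 6, 3))

# Flatten: 6 * 6 * 3 = 108-dimensional vector
flat = feature_maps.reshape(-1)
print(flat.shape)  # (108,)

# Fully-connected layer mapping 108 features to 10 class scores
W = rng.normal(size=(10, 108)) * 0.01  # weights (learned in practice)
b = np.zeros(10)                       # bias
logits = W @ flat + b

# Softmax turns the scores into class probabilities
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.sum())  # 1.0
```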


In theory it kinda looks easy, but the code part can be hard. So the key is: code, code, code. I think that was Daniel Bourke’s favorite line.

Well, it’s done for now 💫 You can read my other articles on Medium!
