Convolutional Neural Networks (CNNs) Tutorial with Python
An in-depth tutorial on convolutional neural networks (CNNs) with Python
Last updated, January 8, 2021
Author(s): Saniya Parveez, Roberto Iriondo
Table of Contents
- Network Architecture
- Convolutional Layers
- Pooling Layers/Subsampling layers
- Fully Connected Layer
- Non-Linear Layers
- Python Implementation of Convolutional Neural Networks (CNNs)
- Hyperparameters for CNNs
- Regularization Methods in CNNs
Yann LeCun and Yoshua Bengio introduced convolutional neural networks, also known as convolutional networks or CNNs, in 1995. A CNN is a particular kind of multi-layer neural network designed to process data with an apparent, grid-like topology. It is based on a mathematical operation called convolution. Traditional neural network layers use matrix multiplication; in contrast, CNNs use convolution in place of general matrix multiplication in at least one layer. A convolution is a specialized kind of linear operation.
Convolutional neural networks (CNNs) are among the most popular deep learning architectures. Their applications include image and video recognition, image analysis, recommendation systems, natural language processing, computing interfaces, financial time-series, and several others.
Biological findings inspire the development of the neural network with the following standard capabilities:
Input → Weights → Logic function → Output
Essential facts about CNNs:
- CNNs are neurobiologically inspired by the discovery of locally sensitive and orientation-selective nerve cells in the visual cortex.
- They are a multi-layer neural network.
- They implicitly extract relevant features.
- They are a feed-forward network that can extract topological features from images.
- They recognize visual patterns directly from pixel images with minimal preprocessing.
- They are powerful because they can recognize patterns with extreme variability, such as handwriting.
- CNNs are trained with a version of the backpropagation algorithm.
- Neuronal cells in the visual cortex, each of which watches for particular features, form the biological basis behind CNNs.
Why are CNNs Required?
CNNs offer several advantages for image recognition and other applications:
- Detection using a CNN is robust to distortions such as changes in shape due to the camera lens, different lighting conditions, different poses, partial occlusions, horizontal and vertical shifts, and others.
- They require less memory for processing and execution.
- They are straightforward to train. Because a CNN shares its weights across the image, the number of parameters drops dramatically, and training time drops with it.
Types of Convolutional Neural Networks (CNNs)
These are some of the different types of CNNs:
- 1D CNN → In this case, the kernel moves in one direction. The input and output data of a 1D CNN are two-dimensional. 1D CNNs are mostly used on time-series.
- 2D CNN → Under a 2D CNN, the kernel moves in two directions. The input and output data of a 2D CNN are three-dimensional. We usually use this on image data problems.
- 3D CNN → Here, the kernel moves in three directions. The input and output data of a 3D CNN are four-dimensional. Engineers use 3D CNNs on 3D images like DICOM images of MRIs, CT scans, and other complex applications.
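To make the dimensionality of each type concrete, the sketch below builds one sample per CNN type with NumPy. The sizes (100 time steps, 32x32 pixels, 16 slices) are arbitrary values chosen for illustration, and the channels-last layout is an assumption that matches the Keras default.

```python
import numpy as np

# One sample per CNN type, channels-last. All sizes are illustrative assumptions.
sample_1d = np.zeros((100, 1))         # (steps, channels), e.g. a univariate time series
sample_2d = np.zeros((32, 32, 3))      # (height, width, channels), e.g. an RGB image
sample_3d = np.zeros((16, 32, 32, 1))  # (depth, height, width, channels), e.g. a CT volume

for name, sample in [("1D", sample_1d), ("2D", sample_2d), ("3D", sample_3d)]:
    print(f"{name} CNN sample: {sample.ndim} dimensions, shape {sample.shape}")
```

A batch of samples adds one more leading dimension to each of these shapes.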
A CNN architecture is built from a stack of different layers that convert the input volume into an output volume through a differentiable function. A few types of layers are commonly used.
Below is the stack of different layers in CNNs:
- Convolutional layers
- Pooling layer
- Fully connected layer
In summary, the example of complete layers of CNNs:
The complete architecture of CNNs:
Image processing is the practice of performing operations on an image to enhance it or to extract critical information from it. Common ways to perform image processing include:
- Histogram processing.
- Transformation function.
A convolution is a mathematical calculation on two functions named f and g that gives a third function (f * g). This third function reveals how the shape of one is modified by the other. Its mathematical equation is as follows:
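For reference, the standard textbook definition of the convolution of f and g is given below, followed by the discrete two-dimensional form used for images. The symbols I (image), K (mask), and the indices i, j, m, n are notation introduced here for illustration.

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau

(I * K)(i, j) = \sum_{m} \sum_{n} I(i - m,\; j - n)\, K(m, n)
```

The minus signs in the arguments are what the "flip the mask" step implements in practice.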
It is essential to understand the concept of a mask or filter before the concept of convolution.
Mask or Filter
A mask is a small two-dimensional matrix whose values are called weights. It is also known as a filter. Its dimensions should be odd numbers (for example, 3x3 or 5x5); otherwise, it is difficult to find the center of the mask.
Below is a code example that creates a mask from an array:
import numpy as np
import numpy.ma as ma
original_array = np.array([1, 2, 3, -1, 5])
original_array
Create a mask of the original array:
# A 1 in the mask marks that element as masked (here, the value -1)
masked = ma.masked_array(original_array, mask=[0, 0, 0, 1, 0])
masked
Why are Convolutions Important in CNNs?
The convolution operation in CNNs is crucial because it can manipulate images in the following cases:
- Edge detection
- Noise reduction
How is a Convolution Performed?
These are the steps to perform a convolution:
- Flip the mask horizontally and vertically only once.
- Slide the mask onto the image.
- Multiply the corresponding elements, then add the products.
- Repeat the slide, multiply, and add steps until all values of the image have been calculated.
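The steps above can be sketched in NumPy as follows. This is a deliberately naive loop for illustration, not an optimized implementation. The function name `convolve2d`, the zero padding at the borders, and the sample values are assumptions, and the mask is assumed to have odd dimensions.

```python
import numpy as np

def convolve2d(image, mask):
    """Naive 2D convolution with zero padding; the output keeps the image's shape."""
    # Step 1: flip the mask horizontally and vertically (once).
    flipped = np.flip(mask)
    mh, mw = flipped.shape
    ph, pw = mh // 2, mw // 2
    # Zero-pad so the mask's center can sit on every image element.
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    # Steps 2-4: slide the mask, multiply matching elements, and add them up.
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + mh, j:j + mw]
            out[i, j] = np.sum(window * flipped)
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
identity_mask = np.array([[0, 0, 0],
                          [0, 1, 0],
                          [0, 0, 0]])
print(convolve2d(image, identity_mask))  # an identity mask returns the image unchanged
```

Without the flip, the same loop computes cross-correlation rather than convolution; the two coincide for symmetric masks.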
Following the steps above:
Flip → Horizontally
Flip → Vertically
Let’s take the dimension of an image like below:
Now, to calculate the convolution follow the steps below:
- Place the center of the mask on each element of the image.
- Multiply the corresponding elements and add them.
- Finally, write the result to the image element on which the mask's center is placed.
From figure 14:
- The green box is the mask, and the green values in the box are the values of the mask.
- The blue box and its values belong to the image.
Now, calculate the first pixel of the image ↓
px1 = (5 * 2) + (4 * 4) + (2 * 8) + (1 * 10)
px1 = 10 + 16 + 16 + 10
px1 = 52
The result for the first pixel of the image is 52. Based on this result, the next steps are:
- Place the value 52 in the original image at the first index.
- Repeat this step for each pixel of the image.
A CNN is a neural network with some convolutional layers and some other layers. A convolutional layer has several filters that perform the convolution operation. Convolutional layers are applied to two-dimensional inputs and are well known for their strong performance on image classification. They are based on the discrete convolution of a small kernel k with a two-dimensional input, and this input can be the output of another convolutional layer. The convolutional layer is the core building block of a CNN.
Convolution shares the same parameters across all spatial locations; however, traditional matrix multiplication does not share any parameters.
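A quick parameter count illustrates the effect of this sharing. The sketch below compares a 3x3 convolution producing 32 feature maps from a 32x32 RGB input against a fully connected layer producing an output of the same size. The helper names are hypothetical, and both counts include one bias per output channel or unit.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter; independent of the image size."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(n_inputs, n_units):
    """One weight per input-unit pair plus one bias per unit."""
    return (n_inputs + 1) * n_units

# 32x32 RGB input mapped to 32 feature maps of the same spatial size.
conv = conv2d_params(3, 3, 3, 32)
dense = dense_params(32 * 32 * 3, 32 * 32 * 32)
print(conv)   # 896
print(dense)  # 100696064
```

The convolutional layer reuses the same 896 parameters at every spatial location, while the dense layer needs a separate weight for every input-output pair.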
Building a convolution layer in Keras:
from keras.models import Sequential
from keras.layers import Conv2D
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), padding='same', activation='relu'))
Explanation of the code above:
- The output will have 32 feature maps.
- The kernel size is 3x3.
- The input shape is 32x32 with three channels.
- padding='same': the input is zero-padded so that the output has the same height and width as the input.
- activation='relu': sets ReLU as the activation function applied to the layer's output.
Next, build a convolutional layer with different parameter values as below:
model.add(Conv2D(32, (3, 3), activation='relu', padding='valid'))
So, from the above code of the convolutional layer:
- Kernel = 3x3
- padding='valid': no padding is added, so the output is smaller than the input. With a 3x3 kernel and a stride of 1, each spatial dimension shrinks by 2.
Pooling Layers/Sub Sampling Layer
Fundamentally, the pooling layer is used to reduce the dimensionality of the feature maps. It keeps the strongest responses to features such as edges, corners, eyes, and noses that the preceding convolutional filters detect. Its function is to reduce the spatial size of the representation, and with it the number of parameters and computations in the network. There are two common ways to achieve pooling:
- Max Pooling: It returns the maximum output within a rectangular neighborhood.
- Average Pooling: It returns the average output of a rectangular neighborhood.
Max pooling and average pooling are the most widely used pooling methods. Reducing the spatial size yields fewer pixels, and therefore fewer features or parameters for further computations.
Hence, pooling layers serve two significant purposes:
- Continuous reduction of the feature map's spatial size as the network moves from one convolution layer to the next, thus reducing the number of parameters.
- Progressively identifying essential features while discarding the rest (this is more true of max pooling than of average pooling).
The above picture shows a MaxPool with a 2X2 filter with stride 2.
Below is a depiction of max pooling and average pooling:
Implement a max pooling layer in Keras as below:
model.add(MaxPooling2D(pool_size=(2, 2)))
Here, kernel size = 2 x 2.
Subsampling pixels will not change the object, so pooling can subsample the pixels to make the image smaller.
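As a minimal sketch of what the two pooling modes compute, the NumPy function below applies a 2x2 window with stride 2 to a small feature map. The function name `pool2d` and the sample values are assumptions for illustration; this is not the Keras implementation.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling to a 2D feature map."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    reduce = np.max if mode == "max" else np.mean
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Take the window that starts at (i * stride, j * stride).
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = reduce(window)
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 8, 3, 4]])
max_pooled = pool2d(fm, mode="max")      # values: 6, 4, 8, 9
avg_pooled = pool2d(fm, mode="average")  # values: 3.75, 2.25, 4.5, 4.0
print(max_pooled)
print(avg_pooled)
```

Both outputs are 2x2, a quarter of the original 4x4 map, which is the reduction in spatial size described above.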
Stride is a parameter of the network's filters that controls how far a filter moves across the image or video at each step. It works in conjunction with padding. For example, if the stride is set to 1, we move the filter one pixel (or unit) at a time. Similarly, if the stride is set to 2, we move it two pixels at a time.
Essentially, the stride is the number of pixels a convolutional filter moves, like a sliding window, after computing the weighted sum of the pixels it just covered. That weighted sum becomes one pixel in the feature map of the next layer. The next weighted sum is computed from a new collection of pixels, and it forms the next pixel in the same feature map.
Below, please find an animated presentation of a stride:
The stride of 1:
The Stride of 2:
The animation of stride in figure 22 simply explains that:
Stride in a convolutional neural network determines how many steps are skipped while scanning features horizontally and vertically on the image.
In CNNs, the output of one layer becomes the input of the next, so there are two choices: decrease the data size or keep it the same. Both padding and stride impact the data size. Padding matters when striding because, without padding, each layer hands a smaller output to the next layer.
When a stride is used, the filter starts in the top-left corner and calculates the value of the first output node. It then moves by the stride, for example two units, and continues until the filter would extend outside the image, leaving a gap. Padding is used to fill the void created by striding.
Let’s take an input layer of 5X5 with kernel 3X3 as below:
Apply Stride of 1:
Apply Stride of 2:
Suppose we apply a stride of 3 while still looking at the 5x5 input — what would happen?
Consequently, padding is required here. Padding is added around the entire input: up to the kernel width minus one on the left and right, and up to the kernel height minus one above and beneath, so that the kernel can look at the extreme edges, as shown in figure 27:
Hence, from the above pictorial representation:
Having no padding means that the data size decreases for the next layer, while sufficient padding keeps the size intact. Furthermore, a larger stride limits the overlap between two subsequent dot products in the convolution operation, which means that every output value in the activation is more independent of its neighboring values.
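The effect of stride and padding on the output size follows the standard formula floor((n + 2p - k) / s) + 1 along each axis. The sketch below applies it to the 5x5 input and 3x3 kernel used in this section. The helper name `conv_output_size` is hypothetical, and the padding values are chosen for illustration.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one axis: floor((n + 2 * padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# 5x5 input, 3x3 kernel, no padding.
sizes = {stride: conv_output_size(5, 3, stride) for stride in (1, 2, 3)}
print(sizes)  # {1: 3, 2: 2, 3: 1}

# One pixel of zero padding on each side keeps the 5x5 size at stride 1.
print(conv_output_size(5, 3, stride=1, padding=1))  # 5
```

With a stride of 3 and no padding, the kernel fits only once, so the last two rows and columns of the 5x5 input are never covered. This is the situation that padding addresses.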
Fully Connected Layer
This layer computes a weighted sum of all its inputs. It takes the output of the last pooling layer and determines the final prediction. Fully connected, as the name states, means that every neuron in one layer is connected to every neuron in the next layer. The layer performs classification based on the features extracted by the previous layers.
CNNs can be broken down into two categories:
- Feature extraction
- Classification
The fully connected layer’s main responsibility is to do classification. It is used with a softmax or sigmoid activation unit for the result.
The activation function applied to the last layer is very different from the others. The activation used for multiclass problems is the softmax function, which normalizes the output of the fully connected layer into probabilities between 0 and 1 that sum to 1.
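A minimal NumPy sketch of softmax is shown below. The input scores are made-up values, and subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # avoids overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])  # hypothetical outputs of the last dense layer
probs = softmax(scores)
print(probs)        # the largest score gets the largest probability
print(probs.sum())  # sums to 1, up to floating-point rounding
```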
Typically, softmax is used only for the output layer of neural networks that need to classify inputs into multiple categories. Neural networks in general, and CNNs in particular, rely on a non-linear "trigger" function to signal definite identification of possible features on each hidden layer.
To efficiently implement this non-linear layer, CNNs use the below functions:
- ReLUs (Rectified Linear Units)
- Continuous Trigger function
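As a brief illustration of the first item, ReLU can be written in one line of NumPy. The input values are arbitrary.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: max(0, x), applied element-wise."""
    return np.maximum(0, x)

activations = relu(np.array([-2.0, -0.5, 0.0, 1.5]))
print(activations)  # negative inputs become 0, positive inputs pass through
```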
Keras code as below with the non-linear function "ReLU":
model.add(Dense(512, activation='relu'))
Here, there are 512 hidden units.
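Keras code as below with the non-linear function "Softmax", using 10 output units to match the 10 classes in the implementation that follows:
model.add(Dense(10, activation='softmax'))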
Python Implementation of Convolutional Neural Networks (CNNs)
Keras implementation of the CNN layers:
Import all required libraries
import numpy as np
import pandas as pd
from keras import utils
from keras.optimizers import SGD
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
Load the CIFAR-10 data:
(X, y), (X_test, y_test) = cifar10.load_data()
Display test dataset
Normalize the data:
X, X_test = X.astype('float32')/255.0, X_test.astype('float32')/255.0
Convert to categorical:
y, y_test = utils.to_categorical(y, 10), utils.to_categorical(y_test, 10)
Initialize the model:
model = Sequential()
Add a convolutional layer with the parameters below:
- Feature maps = 32
- Kernel size = 3x3
- Input shape = 32x32
- Channels = 3
- Padding = same → It means the output has the same height and width as the input.
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), padding='same', activation='relu'))
Add the dropout rate:
Add another convolutional layer with padding = valid.
padding = valid → No padding is added, so the output is slightly smaller than the input (30x30 here instead of 32x32).
model.add(Conv2D(32, (3, 3), activation='relu', padding='valid'))
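Add a max pooling layer with a 2x2 pool size:
model.add(MaxPooling2D(pool_size=(2, 2)))
Flatten the data:
In CNNs, it is important to flatten the data before feeding it into the dense or output layer.
model.add(Flatten())
Add a dense layer:
model.add(Dense(512, activation='relu'))
Here, the number of hidden units is 512.
Add the output dense layer, with one softmax unit for each of the 10 classes:
model.add(Dense(10, activation='softmax'))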
Compile the model:
model.compile(loss='categorical_crossentropy', optimizer=SGD(momentum=0.5, decay=0.0004), metrics=['accuracy'])
Fit the algorithm with 25 epochs:
model.fit(X, y, validation_data=(X_test, y_test), epochs=25, batch_size=512)
# model.evaluate returns [loss, accuracy]; index 1 is the accuracy
print("Accuracy: %.2f%%" % (model.evaluate(X_test, y_test)[1] * 100))
Hyperparameters for CNNs
Hyperparameters are very important for controlling the learning process. They are set before training and govern aspects of the network structure, such as the number of hidden units. The following should be kept in mind when optimizing:
Max Pooling Shape
In max pooling, the maximum value is selected within a matrix. The size of the matrix could be 2x2 or 3x3. Typical values are 2x2. Huge input volumes may warrant 4x4 pooling in the lower layers. So, choosing larger shapes will dramatically reduce the signal's dimension and may result in excess information loss.
It is crucial to find the right level of granularity in a given dataset without overfitting.
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
Number of Filters
The number of filters should be selected carefully, because the number of feature maps directly controls the model's capacity. The right choice depends on the number of available examples and the complexity of the task.
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
Regularization Methods in CNNs
Regularization is a method of including extra information to solve an ill-posed problem or to prevent overfitting. CNNs also use regularization to handle these problems. Below are different types of regularization techniques used by CNNs:
Different categories of empirical regularization:
- Stochastic pooling
Code implementation of dropout in the layer:
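As a minimal sketch of what a dropout layer does during training, the NumPy function below implements inverted dropout: it zeroes a random fraction of the activations and rescales the survivors so the expected value is unchanged. The 0.2 rate, the seed, and the function name `dropout` are assumptions for illustration, not values taken from the model in this article.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True where a unit is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
x = np.ones(1000)
y = dropout(x, rate=0.2, rng=rng)
print((y == 0).mean())  # fraction of dropped units, close to 0.2
print(y.mean())         # close to 1.0 because of the rescaling
```

At inference time, dropout is switched off and the activations pass through unchanged.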
Different categories of explicit regularization:
- Early stopping
- Weight decay
- Number of parameters
- Max norm constraints
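Two of the items above, weight decay and max norm constraints, can be sketched in a few lines of NumPy. The learning rate, decay coefficient, norm limit, and function names are illustrative assumptions. The weight decay shown is the plain L2 form added to an SGD step.

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad, lr=0.1, weight_decay=0.01):
    """One SGD step on loss + (weight_decay / 2) * ||w||^2."""
    return w - lr * (grad + weight_decay * w)

def apply_max_norm(w, max_norm=2.0):
    """Rescale w so that its L2 norm does not exceed max_norm."""
    norm = np.linalg.norm(w)
    if norm > max_norm:
        w = w * (max_norm / norm)
    return w

w = np.array([3.0, 4.0])                                   # L2 norm of 5
decayed = sgd_step_with_weight_decay(w, grad=np.zeros(2))  # shrinks toward zero
clipped = apply_max_norm(decayed)                          # norm capped at 2.0
print(decayed)
print(np.linalg.norm(clipped))
```

With a zero gradient, the weight decay term alone shrinks each weight by a factor of 1 - lr * weight_decay per step.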
Overfitting is a common problem in machine learning and deep learning. There are several ways to avoid it, and early stopping is one of them: training is halted once a monitored metric, such as the validation loss, stops improving.
Code snippet implementation:
from keras.callbacks import EarlyStopping
earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, verbose=1, restore_best_weights=True)
Explanation of the above code:
- monitor: The quantity to monitor, i.e., val_loss.
- min_delta: The minimum change in the monitored value that counts as an improvement. For example, if min_delta = 1, an absolute change of less than 1 is treated as no improvement.
- patience: The number of epochs with no improvement after which training is stopped.
- restore_best_weights: If set to True, the model keeps the weights from the epoch with the best monitored value once training stops.
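The interaction between min_delta and patience can be sketched in plain Python. This mirrors the idea behind the callback rather than Keras's exact implementation. The function name `early_stopping_epoch` and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, min_delta=0.0, patience=3):
    """Return (stop_epoch, best_epoch), 0-based; stop_epoch is None if never triggered."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if best_loss - loss > min_delta:
            # The loss improved by more than min_delta: record it and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
print(early_stopping_epoch(losses))  # (5, 2)
```

Training stops at epoch 5 after three epochs without improvement, and epoch 2 holds the best weights. The later drop to 0.50 is never reached, which is the trade-off a small patience value makes.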
Convolutional neural networks are a special kind of multi-layer neural network, mainly designed to extract features. They recognize visual patterns directly from pixel images with minimal preprocessing.
CNNs use two operations, convolution and pooling, to reduce an image to its essential features, and they use those features to understand and classify the image appropriately.
Another benefit of CNNs is that they are easier to train and have fewer parameters than fully connected networks with the same number of hidden units.
Convolutional neural networks (CNNs) are used in various fields, such as healthcare (to diagnose diseases like pneumonia, diabetes, and breast cancer), self-driving cars, surveillance monitoring, and others.
DISCLAIMER: The views expressed in this article are those of the author(s) and do not represent the views of Carnegie Mellon University nor other companies (directly or indirectly) associated with the author(s). These writings do not intend to be final products, yet rather a reflection of current thinking, along with being a catalyst for discussion and improvement.
All images are from the author(s) unless stated otherwise.
Published via Towards AI
 Convolutional Networks for Images, Speech, and Time-Series, Yann Lecun, Yoshua Bengio, https://www.researchgate.net/profile/Yann_Lecun/publication/2453996_Convolutional_Networks_for_Images_Speech_and_Time-Series/links/0deec519dfa2325502000000.pdf
 Classification of Body Constitution Based on TCM Philosophy and Deep Learning, Yung-Hui Li, Muhammad Saqlain Aslam *, Kai-Lin Yang, Chung-An Kao, and Shin-You Teng, Symmetry, https://doi.org/10.3390/sym12050803
 Convolutional Neural Network, Wikipedia, https://en.wikipedia.org/wiki/Convolutional_neural_network
 Main Types of Neural Networks and Its Applications — Tutorial, Pratik Shukla, Roberto Iriondo, https://towardsai.net/p/machine-learning/main-types-of-neural-networks-and-its-applications-tutorial-734480d7ec8e
 Breaking it down: A Q&A on machine learning, Google, https://www.google.com/about/main/machine-learning-qa/
 2D CNN in TensorFlow 2.0 on CIFAR-10 — Object Recognition in Images, KGP Talkie, https://kgptalkie.com/2d-cnn-in-tensorflow-2-0-on-cifar-10-object-recognition-in-images/
 Business Applications of Convolutional Neural Networks, The App Solutions, https://theappsolutions.com/blog/development/convolutional-neural-networks/
 Concept of Convolution, TutorialsPoint, https://www.tutorialspoint.com/dip/concept_of_convolution.htm
 Convolutional Neural Network, Wikipedia, https://en.wikipedia.org/wiki/Convolutional_neural_network
 Keras Convolutional Neural Network with Python, Sagar Jaiswal, Github, https://github.com/sagar448/Keras-Convolutional-Neural-Network-Python
 Nepali Handwritten Character Recognition using CNN, AI DEV Nepal, https://www.aidevnepal.co/nepali-handwritten-character-recognition-using-cnn/
 Keras Callbacks Explained in Three Minutes, Andre Duong, KDnuggets, https://www.kdnuggets.com/2019/08/keras-callbacks-explained-three-minutes.html
 QingZeng Song, Lei Zhao, XingKe Luo, XueChen Dou, “Using Deep Learning for Classification of Lung Nodules on Computed Tomography Images”, Journal of Healthcare Engineering, vol. 2017, Article ID 8314740, 7 pages, 2017. https://doi.org/10.1155/2017/8314740