Convolutional Neural Network Backbone

Colours, Convolution, Activation Functions and Normalization

Maria L Rodriguez
Artificialis
Aug 16, 2021 · 12 min read


Our daily lives are made comfortable by technological advances that have smoothly integrated into our routines. Their use has become so streamlined that there is a tendency to take them for granted. Learners should appreciate the vast amount of thought and effort that has gone into developing this technology.

The convolutional neural network (CNN) is a beautiful work of art and science. We discussed the convolution process in the previous blog; we will now describe how convolution is used as part of a neural network. For this, we will follow some sections of the Fast.ai Fastbook, Chapter 13.

Outline:

A. Set-up

B. Coloured images

C. Modelling

C.1. Data Loading

C.2. Construction of the CNN

C.2.a. Conv2d

C.2.b. Relu Activation

C.2.c. BatchNorm

C.3 Refinements

C.3.a. Fit_one_cycle

C.3.b. Batch size

C.4. Final model

Open your Notebook and learn with me how the backbone of Deep Learning is formed!

A. Set-up

!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
from fastbook import *
#!pip install fastai -U
import fastai
from fastai.vision.all import *

If you need detailed instructions on setting-up, please refer to Step 1 a-b here.

B. Coloured images

B.1. Gathering Data

We will download coloured images from the web. It is important to remind yourself of your objectives for the study/modelling; otherwise, it is very easy to get distracted and lose time and resources. For this study, we want to use images of 4 planets to illustrate the components of a coloured image and to build a simple CNN model.

!pip install jmd_imagescraper
from jmd_imagescraper.core import *
from pathlib import Path

root = Path().cwd()/'planets'
search = duckduckgo_search
search(root, 'earth', 'planet earth', max_results = 750, img_layout = ImgLayout.All)
search(root, 'jupiter', 'planet jupiter', max_results = 750, img_layout = ImgLayout.All)
search(root, 'saturn', 'planet saturn', max_results = 750, img_layout = ImgLayout.All)
search(root, 'neptune', 'planet neptune', max_results = 750, img_layout = ImgLayout.All)

If you need an intro/ review on downloading images, please refer to Steps 1–4 here.

from jmd_imagescraper.imagecleaner import *
display_image_cleaner(root)

An iterative approach helps identify nuisances in the search. An initial small search (approx. max_results = 25) showed that 'jupiter' and 'neptune' also yielded artworks of Greco-Roman gods, watches, and cars. Revising the search strings helped minimize the need for cleaning.

Once you have a large enough cleaned collection (approx. >100 images for every class), it is best to store it. This will also help identify the 'path'.

# collect the files
zip_name = 'planets.zip'
!rm -f {zip_name}
!zip -q -r {zip_name} {root}

# download the files to your local computer
from google.colab import files
files.download(zip_name)

# download the files to Google Drive
from google.colab import drive
import shutil

destination_folder = 'planets'
drive.mount('/content/drive/')
folder = Path('/content/drive/My Drive')/destination_folder
folder.mkdir(parents=True, exist_ok = True)
shutil.copyfile(zip_name, str(folder/zip_name))

# open the collected files
!unzip \*zip && rm *.zip

B.2. Looking inside the images.

B.2.a. Look inside the folder and subfolders.

path = Path('/content/planets/')
(path/'jupiter').ls()

B.2.b. Identify one image to dissect.

Using one of the files identified in the .ls() code above, visualize one item.

from PIL import Image, ImageOps
jupiter = Image.open('/content/planets/jupiter/020_4ed4840b.jpg')
show_image(jupiter);

Convert the image to tensor. (This is done for illustration purposes. During modelling, the conversion is facilitated by the DataLoaders. Refer to Step B.1 here if you need an intro/ review on tensors.)

jupiter_t = tensor(jupiter)
jupiter_t.shape

This image tensor has a shape of [298, 300, 3]. A single, seemingly 2-dimensional image is actually 3-dimensional (height x width x depth). The first two tensor dimensions refer to the pixel rows and columns (i.e. height and width). The 3rd dimension, depth, is usually made up of 3 channels, and refers to the red, green and blue (RGB) filters.

* Image courtesy of Wikimedia Commons

Each of these filters is a tensor with values from 0 to 255. The value 0 signifies an absence of the colour, and 255 signifies the brightest value for a particular filter. The combination of the red, green and blue values in one pixel renders the colour. For example, red 100, green 100 and blue 0 will give a dark yellow pixel; red 255, green 255 and blue 255 will give white.
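To make this concrete, here is a minimal sketch (using the hypothetical pixel values above; not part of the original notebook) that builds two small colour swatches and displays them side by side:

from PIL import Image
import numpy as np

# np.full broadcasts one RGB triple across every pixel of a 60x60 swatch
dark_yellow = np.full((60, 60, 3), (100, 100, 0), dtype=np.uint8)   # R=100, G=100, B=0
white = np.full((60, 60, 3), (255, 255, 255), dtype=np.uint8)       # R=G=B=255
Image.fromarray(np.hstack([dark_yellow, white]))   # displays in a notebook cell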

B.2.c. Colours and tensors of an area

Disclosure: I have not found documentation that specifically assigns a channel index to each colour filter. In this blog it is not essential to label which filter is which, but for ease of illustration we will use the conventional RGB order and refer to channel [0] as red, [1] as green, and [2] as blue.

Let us focus on the bright bluish spot on the north pole of the Jupiter image above.

# see disclosure above
red = pd.DataFrame(jupiter_t[:,:,0])
red = red.iloc[25:35, 110:120]
green = pd.DataFrame(jupiter_t[:,:,1])
green = green.iloc[25:35, 110:120]
blue = pd.DataFrame(jupiter_t[:,:,2])
blue = blue.iloc[25:35, 110:120]

colour = [red, green, blue]
for c in colour:
    display(c.style.background_gradient('gist_heat'))   # display() so all three tables render

Note: ‘gist_heat’ is used only to help visualize different tensor values. It does not reflect actual colour.

* 3 depth channels: Red, Green, Blue

Looking at the tensor values and their distribution across the three channels, we can see that blue is predominant, and that the high contributing values of the other two channels render the spot whitish blue.
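As a quick numeric check of that reading (a small addition of mine, using the same patch coordinates), we can average each channel over the 10x10 area; if the spot is genuinely bluish, the third (blue) mean should be the highest:

patch = jupiter_t[25:35, 110:120].float()   # the same 10x10 area, all 3 channels
patch.mean(dim=(0, 1))                      # one mean per channel; expect blue highest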

C. Modelling

C.1. Data Loading

# path = Path('/content/planets')  # already defined in Step B.2.a
dblock = DataBlock(
    blocks = (ImageBlock(), CategoryBlock()),
    get_items = get_image_files,
    splitter = RandomSplitter(seed=42, valid_pct=0.2),
    get_y = parent_label,
    item_tfms = Resize(460),
    batch_tfms = aug_transforms(size=200, max_rotate=20, max_zoom=1.2))

dblock.summary(path)  # size: train 2254, valid 563
dls = dblock.dataloaders(path)

If you need a refresher on DataBlock, see Steps 6 a-f in Starting the Dive into Deep Learning.

dls.train.show_batch()

C.2. Construction of the CNN

C.2.a. Conv2d

C.2.a.i. Focusing on one Convolution step

nn.Conv2d(in_channels = 3, out_channels = 8,
          kernel_size = 5,
          stride = 2,
          padding = 5//2)
  • nn.Conv2d is the PyTorch module that performs the convolution.
  • in_channels is the number of channels the convolution receives as input. In this case, in_channels = 3 because of the 3 filters (RGB).
  • out_channels is the number of channels produced by the convolution. This is set by the user based on intuition and experience. For this case, we will use 8.
  • kernel_size is one side of the square kernel: a 5x5 kernel has kernel_size = 5.
  • stride is the number of columns (or rows) by which the kernel moves at each step.
  • padding is the number of extra columns and rows added around the edges of the input so that information along the edges is not lost. The padding pixels are typically set to 0. (A quick shape check of this single step follows the list.)
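To see these parameters in action, we can run this single convolution over a dummy batch (random values, purely for a shape check; this snippet is illustrative and not from the Fastbook):

import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels = 3, out_channels = 8, kernel_size = 5, stride = 2, padding = 5//2)
x = torch.randn(1, 3, 200, 200)   # dummy batch: 1 image, 3 channels, 200x200 pixels
conv1(x).shape                    # torch.Size([1, 8, 100, 100]): stride 2 halved the size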

C.2.a.ii. Focusing on the Following Convolution step

# 1st conv
nn.Conv2d(in_channels = 3, out_channels = 8,
          kernel_size = 5,
          stride = 2,
          padding = 5//2),   # output size: 100x100
# 2nd conv
nn.Conv2d(in_channels = 8, out_channels = 16,
          kernel_size = 5,
          stride = 2,
          padding = 5//2),   # output size: 50x50
  • The input and output each have a size (input/output size) and a number of channels (in_channels/out_channels).

Size

  • The tensor shape gives an idea of the size. Our input shape was 200 x 200 x 3; the 200 x 200 was determined when we resized the images in the DataBlock (see Step C.1.).
  • It is useful to keep track of the input and output sizes: it is one way of knowing how deep the neural network is going. The output size can be computed as follows:
# step-by-step:
# 1. input size = image length or height
# 2. input size + padding on left edge + padding on right edge
# 3. input size + (2 * padding) - kernel size
# 4. (input size + (2 * padding) - kernel size) // stride
# 5. ( (input size + (2 * padding) - kernel size) // stride) + 1
# one-liner:
# output size = ( (input_size + (2 * padding) - kernel_size) // stride) + 1
# compute for first output size in our case:
# output size = ( (200 + (2 * (5//2)) - 5) // 2 ) + 1
# output size = 100
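To avoid doing this arithmetic by hand for every layer, a small helper function (my own convenience, not part of the Fastbook code) can compute it:

def conv_output_size(input_size, kernel_size, stride, padding):
    # floor division, matching PyTorch's shape rule for Conv2d
    return (input_size + 2 * padding - kernel_size) // stride + 1

conv_output_size(200, kernel_size=5, stride=2, padding=5//2)   # 100 (1st conv)
conv_output_size(100, kernel_size=5, stride=2, padding=5//2)   # 50  (2nd conv)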

Number

  • The input/output number (quantity) refers to the in_channels and out_channels.
  • Each layer's in_channels (e.g. 8 in the 2nd conv) must match the previous layer's out_channels (e.g. 8 in the 1st conv). This is what connects the layers of the network.

C.2.a.iii. Making a Series of Convolution steps

# input: 200x200x3
cnn = sequential(
(nn.Conv2d(in_channels = 3, out_channels = 8,
kernel_size = 5,
stride=2,
padding = 5//2)),# output size: 100x100
(nn.Conv2d(in_channels = 8, out_channels = 16,
kernel_size = 5,
stride=2,
padding =5//2)), # output size: 50x50

(nn.Conv2d(in_channels = 16, out_channels = 32,
kernel_size = 5,
stride=2,
padding=5//2)), # output size: 25x25
(nn.Conv2d(in_channels = 32, out_channels = 64,
kernel_size = 5,
stride=2,
padding=5//2)), # output size: 13x13
(nn.Conv2d(in_channels = 64, out_channels = 128,
kernel_size = 5,
stride=2,
padding=5//2)), # output size: 7x7
(nn.Conv2d(in_channels = 128, out_channels = 256,
kernel_size = 3,
stride=2,
padding=3//2)), # output size: 4x4
(nn.Conv2d(in_channels = 256, out_channels = 512,
kernel_size = 3,
stride=2,
padding=3//2)), # output size: 2x2
(nn.Conv2d(in_channels = 512, out_channels = 4, # dls.c
kernel_size = 3,
stride=2,
padding =3//2)), # output size: 1x1

(Flatten()) )
  • The sequential container organizes the layers of the neural network: it connects the output of one layer to the input of the next. (fastai's sequential is a thin wrapper around PyTorch's nn.Sequential.)
  • The out_channels values are determined by the user's intuition and by adopting previous successful work. In general, they are even numbers that double from one layer to the next. The last out_channels should correspond to the number of classes in your dataset; in this case, the 4 different planets (hence the dls.c comment).
  • The combination of values for kernel_size, stride and padding is determined both by the user's intuition and by advanced approaches such as neural architecture search. The pattern set out above is a modification of the CNN architecture in the Fastbook.
  • Notice that using stride = 2 effectively halves the spatial size of the input at every layer.
  • Flatten is essentially PyTorch's squeeze as a layer: it removes the unit dimensions, turning the final 4x1x1 output into a flat vector of 4 activations. (A shape check of the full stack follows this list.)
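Before training, it is worth confirming that the stack really collapses a 200x200x3 input down to 4 activations. A quick shape check on a dummy batch (my own sanity check, using the cnn defined above):

x = torch.randn(1, 3, 200, 200)   # dummy batch
cnn(x).shape                      # torch.Size([1, 4]): Flatten removed the 1x1 spatial dims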

C.2.a.iv. Applying Learner on CNN

learn = Learner(dls, cnn,
loss_func = F.cross_entropy,
metrics = accuracy,
cbs = ActivationStats(with_hist=True))
learn.fit(n_epoch = 1, lr = 0.003)

The cnn, comprising 8 convolutional layers and processing 2,817 observations across 4 classes, performed worse than random guessing (about 0.25 accuracy for 4 balanced classes) and will benefit from adjunct processes.

learn.lr_find()

The lr of 0.003 is good, and will be maintained.

C.2.b. Relu Activation

One of the ways that a neural network learns is through the use of activation functions. Activation functions regulate the flow of information by allowing only useful information to propagate.

The output from the previous layer is aggregated and screened by the activation function. Information that passes the screen is forwarded as the input to the next layer.

* Image courtesy of https://cs231n.github.io/neural-networks-1/

One commonly used activation function is the Rectified Linear Unit (ReLU). It replaces negative values with zero; values of 0 and above are relayed to the next layer unchanged.
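For example, applied to a handful of made-up activation values, ReLU zeroes the negatives and leaves the rest untouched:

acts = tensor([-3.0, -0.5, 0.0, 1.2, 4.0])   # hypothetical activations
F.relu(acts)   # tensor([0.0000, 0.0000, 0.0000, 1.2000, 4.0000])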

cnn = sequential(
(nn.Conv2d(in_channels = 3, out_channels = 8,
kernel_size = 5, stride=2, padding = 5//2)),# 100
(nn.ReLU()),
(nn.Conv2d(in_channels = 8, out_channels = 16,
kernel_size = 5, stride=2, padding =5//2)), # 50
(nn.ReLU()),

(nn.Conv2d(in_channels = 16, out_channels = 32,
kernel_size = 5, stride=2, padding=5//2)), # 25
(nn.ReLU()),
(nn.Conv2d(in_channels = 32, out_channels = 64,
kernel_size = 5, stride=2, padding=5//2)), # 13
(nn.ReLU()),
(nn.Conv2d(in_channels = 64, out_channels = 128,
kernel_size = 5, stride=2, padding=5//2)), # 7
(nn.ReLU()),
(nn.Conv2d(in_channels = 128, out_channels = 256,
kernel_size = 3, stride=2, padding=3//2)), # 4
(nn.ReLU()),
(nn.Conv2d(in_channels = 256, out_channels = 512,
kernel_size = 3, stride=2, padding=3//2)), # 2
(nn.ReLU()),
(nn.Conv2d(in_channels = 512, out_channels = 4,
kernel_size = 3, stride=2, padding =3//2)), # 1

(Flatten()) )
learn = Learner(dls, cnn,
                loss_func = F.cross_entropy,
                metrics = accuracy)   # re-create the Learner so it wraps the new cnn
learn.fit(n_epoch = 1, lr = 0.003)

Adding a ReLU activation to each layer (except the last conv) significantly increased the accuracy from 0.201 to 0.446. With this single adjustment, the model is already better than random guessing.

C.2.c. BatchNorm

Information flows through a neural network across highly connected units, so the distribution of inputs arriving at each layer shifts constantly during training. Batch Normalization provides standardization by centering the batch input values to a mean of 0 and a standard deviation of 1, followed by a learnable scale and shift.
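A minimal sketch with random data (not from the notebook) shows the effect: feed BatchNorm2d a batch whose values are deliberately off-centre, and the output comes back with a mean near 0 and a standard deviation near 1. The learnable scale and shift start at 1 and 0, so they do not alter anything yet:

bn = nn.BatchNorm2d(8)                      # one normalizer per channel
x = torch.randn(16, 8, 100, 100) * 5 + 3    # batch with mean ~3, std ~5
y = bn(x)                                   # normalized per channel (training mode)
y.mean().item(), y.std().item()             # approximately 0.0 and 1.0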

cnn = sequential(
(nn.Conv2d(in_channels = 3, out_channels = 8,
kernel_size = 5, stride=2, padding = 5//2)),# 100
(nn.ReLU()),
(nn.BatchNorm2d(8)),
(nn.Conv2d(in_channels = 8, out_channels = 16,
kernel_size = 5, stride=2, padding =5//2)), # 50
(nn.ReLU()),
(nn.BatchNorm2d(16)),

(nn.Conv2d(in_channels = 16, out_channels = 32,
kernel_size = 5, stride=2, padding=5//2)), # 25
(nn.ReLU()),
(nn.BatchNorm2d(32)),
(nn.Conv2d(in_channels = 32, out_channels = 64,
kernel_size = 5, stride=2, padding=5//2)), # 13
(nn.ReLU()),
(nn.BatchNorm2d(64)),
(nn.Conv2d(in_channels = 64, out_channels = 128,
kernel_size = 5, stride=2, padding=5//2)), # 7
(nn.ReLU()),
(nn.BatchNorm2d(128)),
(nn.Conv2d(in_channels = 128, out_channels = 256,
kernel_size = 3, stride=2, padding=3//2)), # 4
(nn.ReLU()),
(nn.BatchNorm2d(256)),
(nn.Conv2d(in_channels = 256, out_channels = 512,
kernel_size = 3, stride=2, padding=3//2)), # 2
(nn.ReLU()),
(nn.BatchNorm2d(512)),
(nn.Conv2d(in_channels = 512, out_channels = 4,
kernel_size = 3,
stride=2,
padding =3//2)), # 1

(Flatten()) )

We will consider a set of conv-relu-batchnorm as a single layer. The BatchNorm2d parameter must match the conv2d out_channels of the same layer. (A sketch of how this repeated pattern could be factored into a helper follows.)
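As the layer list is getting repetitive, the conv-relu-batchnorm pattern could be factored into a helper, similar in spirit to the conv function in the Fastbook. This refactor is my own sketch; it produces the same architecture as the block above:

def conv_block(ni, nf, ks=5):
    return nn.Sequential(
        nn.Conv2d(ni, nf, kernel_size=ks, stride=2, padding=ks//2),
        nn.ReLU(),
        nn.BatchNorm2d(nf))   # BatchNorm width matches the conv's out_channels

cnn = sequential(
    conv_block(3, 8), conv_block(8, 16), conv_block(16, 32),
    conv_block(32, 64), conv_block(64, 128),
    conv_block(128, 256, ks=3), conv_block(256, 512, ks=3),
    nn.Conv2d(512, 4, kernel_size=3, stride=2, padding=3//2),   # last conv: no ReLU/BN
    Flatten())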

learn = Learner(dls, cnn,
                loss_func = F.cross_entropy,
                metrics = accuracy)   # again, re-create the Learner for the new cnn
learn.fit(n_epoch = 1, lr = 0.003)

Batch normalization further improved the accuracy from 0.446 to 0.576.

C.3. Refinements

C.3.a. Fit_one_cycle

learn.fit_one_cycle(n_epoch = 1, lr_max = 0.003) 

Using the fit_one_cycle improved the accuracy from 0.576 to 0.607.

While the fit() function is restricted to a single fixed learning rate, fit_one_cycle() lets the learner also take advantage of momentum. The aim of learning is to reach the point of minimal loss, where the probability of making a correct prediction is high and the probability of making a wrong one is low.

learn.recorder.plot_sched()
* x-axis: batch number (2,254 training observations at 64 per batch, roughly 35 batches per epoch)

The one_cycle method lets the model use a range of learning rates. It starts with a small learning rate and a high momentum. During the first part of the cycle, the learning rate increases toward lr_max while the momentum decreases. In the second part, the relationship reverses: the learning rate is annealed back down so the model can settle into the area of minimal loss, while the momentum rises again. This inverse learning rate-momentum relationship enables the model to traverse the loss landscape quickly early on and still converge precisely, reaching minimal loss (in essence, high accuracy) in a relatively fast manner.

C.3.b. Increase Batch size

dls_128 = dblock.dataloaders(path, bs = 128)

learn = Learner(dls_128, cnn,
loss_func = F.cross_entropy,
metrics = accuracy)
learn.fit_one_cycle(n_epoch = 1, lr_max = 0.003)

Increasing the batch size from the default of 64 to 128 slightly increased the accuracy of the model from 0.607 to 0.632.

C.4. Final model

We will combine the features of

  • CNN sequential (layers: conv-relu-batchnorm)
  • fit_one_cycle
  • higher batch size

to come up with a model that could give a reasonable accuracy.

dls_128 = dblock.dataloaders(path, bs = 128)

learn = Learner(dls_128, cnn,
loss_func = F.cross_entropy,
metrics = accuracy)
learn.fit_one_cycle(n_epoch = 5, lr_max = 0.003)

The combination of the above features, run for 5 epochs, increased the model's accuracy from 0.632 to 0.698. In its present form, the model correctly identifies roughly seven out of ten planet images.

Summary:

We were able to create a modest convolutional neural network backbone that can reasonably predict classes in a newly generated dataset comprising 2,817 coloured images of 4 planets.

I hope you enjoyed making your own basic CNN! :)

Maria Rodriguez

LinkedIn: https://www.linkedin.com/in/rodriguez-maria/

Github: https://github.com/yrodriguezmd?tab=repositories

Twitter: https://twitter.com/Maria_Rod_Data
