Deep Learning — Convolutional Neural Networks (CNN)

Part III

Convolutional Neural Networks (ConvNets) are used in computer vision and natural language processing. The key difference between a Multilayer Perceptron (MLP) and a ConvNet is that an MLP has densely connected layers that learn global patterns, whereas a ConvNet learns local patterns by using filters.

The success of ConvNets in an image classification contest in 2011 drew broader attention to the field of deep learning.

ConvNets are designed to work with grid-structured inputs that have strong spatial dependencies in local regions of the grid. The simplest example is a black-and-white (grayscale) image, which is a 2-dimensional grid of pixel intensities.

Two interesting characteristics of ConvNets are:

  1. The patterns a ConvNet learns are translation invariant: a pattern learned in one part of an image can be recognized anywhere else in the image.
  2. A ConvNet can learn spatial hierarchies of patterns.
Source: https://peltarion.com/article/classifying-images-of-clothes-with-the-peltarion-platform
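
The translation property from the list above can be illustrated with a small NumPy sketch. It is not taken from the original model: the helper correlate2d_valid, the 3x3 "plus" filter and the 8x8 images are hypothetical names and values chosen only for illustration. Strictly speaking, the output of a convolution shifts together with its input; taking the maximum over the response map, as pooling does, is what removes the dependence on position.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 3x3 filter that responds to a small "plus" shape.
plus = np.array([[0., 1., 0.],
                 [1., 1., 1.],
                 [0., 1., 0.]])

# The same pattern placed at two different positions in an 8x8 image.
image_a = np.zeros((8, 8))
image_a[1:4, 1:4] = plus
image_b = np.zeros((8, 8))
image_b[4:7, 3:6] = plus

response_a = correlate2d_valid(image_a, plus)
response_b = correlate2d_valid(image_b, plus)

# Location of the strongest response in each map.
peak_a = np.unravel_index(np.argmax(response_a), response_a.shape)
peak_b = np.unravel_index(np.argmax(response_b), response_b.shape)

print("max response:", response_a.max(), "at", peak_a)
print("max response:", response_b.max(), "at", peak_b)
```

The peak value is identical for both images and only its position differs, which is why a filter learned in one corner of an image remains useful everywhere else.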

In a ConvNet, the states in each layer are arranged according to a spatial grid structure. These spatial relationships are inherited from one layer to the next. It is important to maintain them among the grid cells because the convolution operation and the transformation to the next layer depend critically on these relationships.

A typical ConvNet has three types of layers: convolution, pooling and ReLU (as shown in the image above).

In our example we use the CIFAR-10 dataset. CIFAR-10 was developed by the Canadian Institute for Advanced Research and consists of 60,000 color images of 32x32 pixels, divided into 10 classes. The dataset is split into training and test data: 50,000 images are training data and 10,000 images are test data.

Examples from the CIFAR-10 dataset

So, the spatial dimensions of the input data are 32x32 with a depth of 3 (the RGB color channels), which needs to be supported by the first layer of our ConvNet.

In a ConvNet the parameters are organized into sets of 3-dimensional structural units called filters, also known as kernels. Filters are usually square in their spatial dimensions; typical sizes are 3x3, 5x5 and 7x7, with larger sizes used on larger images. The depth of a filter matches the depth of its input, and the output produced by sliding one filter over the input is called a feature map.
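
To make the shapes concrete, here is a minimal NumPy sketch of what a bank of 32 filters of size 3x3 does to one 32x32x3 input when no padding is used. The random values and the variable names (image, filters, patches, feature_maps) are assumptions made for illustration; a real layer would also add a bias and an activation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

# One input image: height, width, depth (RGB channels).
image = rng.random((32, 32, 3))
# A filter bank: filter height, filter width, input depth, number of filters.
filters = rng.random((3, 3, 3, 32))

# Every 3x3 patch of the image.
# Shape: (30, 30, 3, 3, 3) = (row, column, channel, patch row, patch column).
patches = sliding_window_view(image, (3, 3), axis=(0, 1))

# Each filter is multiplied element-wise with each patch across all channels
# and summed, which yields one feature map per filter.
feature_maps = np.einsum('ijchw,hwcn->ijn', patches, filters)

print(patches.shape)
print(feature_maps.shape)
```

Each of the 32 filters spans the full depth of the input and produces its own 30x30 feature map, so the output depth equals the number of filters.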

The fundamental difference between a densely connected layer (our Part I example) and a convolutional layer is that the former learns global patterns in its input feature space, whereas the latter learns local patterns whose extent is determined by the size of the filters.
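
One practical consequence of this difference is the number of trainable parameters. The following back-of-the-envelope sketch compares a dense layer and a convolutional layer that both produce 32 outputs (units or filters) from a 32x32x3 input. The helper names dense_params and conv2d_params are hypothetical; the formulas are the standard weight-plus-bias counts.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a densely connected layer."""
    return n_inputs * n_units + n_units

def conv2d_params(kernel_h, kernel_w, in_depth, n_filters):
    """Weights plus biases of a 2D convolutional layer."""
    return kernel_h * kernel_w * in_depth * n_filters + n_filters

# A dense layer sees the whole flattened image at once (global pattern).
dense_count = dense_params(32 * 32 * 3, 32)

# A convolutional layer reuses the same 3x3x3 filters at every position (local pattern).
conv_count = conv2d_params(3, 3, 3, 32)

print("Dense(32) on flattened 32x32x3 input:", dense_count)
print("Conv2D(32, (3, 3)) on 32x32x3 input: ", conv_count)
```

Because the filter weights are shared across all positions of the image, the convolutional layer needs roughly a hundred times fewer parameters in this setup.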

Padding

A convolution operation reduces the size of a layer relative to the previous layer. For example, if the input is a 32x32 image, a convolution with a 5x5 filter produces a 28x28 output, and a convolution with a 3x3 filter produces a 30x30 output. If the output must have the same spatial dimensions as the input, we need to use padding. Padding adds an appropriate number of rows and columns on each side of the input feature map so that a convolution window can be centered on every input tile. In Keras this is achieved by setting the padding argument of the Conv2D layer to “same”, which means that the output should have the same width and height as the input. Take a look at the example code:

# Convolutional Layer 1
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3),
                 padding='same', activation='relu',
                 kernel_constraint=maxnorm(3)))
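
The effect of padding on the output size follows from the standard formula output = (input + 2 * padding - filter) / stride + 1. The sketch below applies it to the 32x32 input used in this article; the function names conv_output_size and same_padding are hypothetical helpers, not part of Keras.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Spatial size of a convolution output along one dimension."""
    return (input_size + 2 * padding - filter_size) // stride + 1

def same_padding(filter_size):
    """Padding per side that preserves the size for an odd filter and stride 1."""
    return (filter_size - 1) // 2

# Without padding the output shrinks.
print(conv_output_size(32, 5))
print(conv_output_size(32, 3))

# With one row/column of padding per side, a 3x3 filter preserves the size.
print(conv_output_size(32, 3, padding=same_padding(3)))
```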

The Max-Pooling Operation

In the example below you will notice that the width and height of the feature map are halved after every MaxPooling2D layer, whose role is to aggressively downsample the feature map. This operation is conceptually similar to convolution, except that instead of transforming local patches via a learned linear transformation, it transforms them via a hardcoded max tensor operation. Max pooling is usually done with 2x2 windows and a stride of 2, whereas convolution is usually done with 3x3 windows and a stride of 1.

The rationale for downsampling is to reduce the number of feature map coefficients to process. Another way to achieve downsampling is to use strides in the preceding convolution layer.

model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3),
                 padding='same', activation='relu',
                 kernel_constraint=maxnorm(3)))
model.add(MaxPooling2D(pool_size=(2, 2)))

The most reasonable subsampling strategy is to produce dense maps of features, and then look at the maximal activation of the features over small patches.
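
What a 2x2 max-pooling window with stride 2 computes can be sketched in a few lines of NumPy. This is an illustration on a single-channel 4x4 feature map with made-up values; max_pool_2x2 is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D feature map (odd edges are trimmed)."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Group the map into non-overlapping 2x2 blocks and keep the maximum of each.
    blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]])

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]
```

The 4x4 map becomes a 2x2 map: width and height are halved, and each output value is the strongest activation in its patch.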

Techniques to Achieve Higher Accuracy

To improve the accuracy of your model, consider the following strategies:

  1. Data augmentation. It generates more training data from the existing dataset by augmenting the samples via a number of random transformations that yield believable-looking images. The goal is that during training your model never sees the exact same picture twice.
  2. Feature extraction with a pre-trained network. A highly effective and common approach to deep learning on small datasets is to use a pre-trained network. If the dataset on which the network was originally trained is large enough and general enough, then the spatial hierarchy of features learned by the pre-trained network can effectively act as a generic model, and it can prove useful for many different computer vision problems. Feature extraction is a technique that uses the representations learned by a previous network to extract interesting features from new samples. These features are then passed through a new classifier, which is trained from scratch.
  3. Fine-tuning of a pre-trained network. This is another very popular method of model reuse. Fine-tuning consists of unfreezing a few of the top layers of the convolutional base and jointly training both the newly added layers and these unfrozen top layers. It is called fine-tuning because it slightly adjusts the more abstract representations of the model being reused.
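
As an illustration of the first strategy, the sketch below applies two simple random transformations, a horizontal flip and a small shift with zero padding, to one image-shaped array. It is a minimal NumPy sketch under assumed settings: the function name augment, the shift range max_shift=4 and the random stand-in image are hypothetical, and a real pipeline would typically use the augmentation utilities of the deep learning library instead.

```python
import numpy as np

def augment(image, rng, max_shift=4):
    """Random horizontal flip followed by a random shift with zero padding."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]          # flip left-right
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w, _ = image.shape
    # Pad with zeros so that the shifted crop always stays inside the array.
    padded = np.pad(image, ((max_shift, max_shift),
                            (max_shift, max_shift),
                            (0, 0)))
    top = max_shift + dy
    left = max_shift + dx
    return padded[top:top + h, left:left + w, :]

rng = np.random.default_rng(7)

# Stand-in for one normalized 32x32 RGB training image.
image = rng.random((32, 32, 3)).astype('float32')

# Eight different views of the same picture.
batch = np.stack([augment(image, rng) for _ in range(8)])
print(batch.shape)
```

Every call returns an image of the original size, so the augmented samples can be fed to the network without changing the input layer.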

CNN Model and Code Example

As mentioned previously, in this example I’m using the CIFAR-10 dataset. After some experimentation with this dataset, I created the model shown in the picture below:

With this model I achieved an accuracy of 83.41%, which leaves considerable room for improvement by applying the techniques described in this article.

The full code of the convolutional neural network (CNN) model follows:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.utils import to_categorical
# MaxNorm is the class behind the lowercase maxnorm name used in this article
from keras.constraints import MaxNorm as maxnorm
from keras.optimizers import SGD
from keras.optimizers.schedules import InverseTimeDecay
from keras.datasets import cifar10
import matplotlib.pyplot as plt

# network attributes
dropout1 = 0.2
dropout2 = 0.5
epochs = 100
learning_rate = 0.01
batch_size = 30
decay = learning_rate/epochs
# time-based decay: lr = learning_rate / (1 + decay * iteration)
lr_schedule = InverseTimeDecay(learning_rate, decay_steps=1,
                               decay_rate=decay)
sgd = SGD(learning_rate=lr_schedule, momentum=0.9,
          nesterov=False)

# initialize random number generator
seed = 7
np.random.seed(seed)

###############################################
# Load and prepare dataset
###############################################

# load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# normalize the input from 0-255 to 0.00 - 1.00
X_train = (x_train.astype('float32')) / 255
X_test = (x_test.astype('float32')) / 255

# label encoding - one hot encode
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

number_classes = y_test.shape[1]

###############################################
# Create our CNN model
###############################################
def create_model():
    model = Sequential()
    # Convolutional Layer 1
    model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3),
                     padding='same', activation='relu',
                     kernel_constraint=maxnorm(3)))
    # model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(dropout1))
    # Convolutional Layer 2
    model.add(Conv2D(32, (3, 3), padding='same',
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Convolutional Layer 3
    model.add(Conv2D(64, (3, 3), padding='same',
                     activation='relu'))
    model.add(Dropout(dropout1))
    # Convolutional Layer 4
    model.add(Conv2D(64, (3, 3), padding='same',
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Convolutional Layer 5
    model.add(Conv2D(128, (3, 3), padding='same',
                     activation='relu'))
    model.add(Dropout(dropout1))
    # Convolutional Layer 6
    model.add(Conv2D(128, (3, 3), padding='same',
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Flatten the feature maps; dropout is added as a regularizer
    model.add(Flatten())
    model.add(Dropout(dropout1))
    # multilayer perceptron part of the network
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(dropout1))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(dropout1))
    model.add(Dense(number_classes))
    model.add(Activation('softmax'))
    # compile the model
    model.compile(loss='categorical_crossentropy',
                  optimizer=sgd, metrics=['accuracy'])
    return model

# build a model
model = create_model()

# print model summary
model.summary()

# train the model
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=epochs, batch_size=batch_size)

# Final evaluation of the model
score = model.evaluate(X_test, y_test, verbose=0)
print("\n--------------------------------------------")
print("\nCNN Accuracy: %.2f%%" % (score[1]*100))
print("\nCNN Error: %.2f%%" % (100-score[1]*100))

# summarize history for accuracy
# (recent Keras versions store the metric under 'accuracy' rather than 'acc')
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# save the model architecture as a JSON file
model_json = model.to_json()
with open("CNN2_model.json", "w") as json_file:
    json_file.write(model_json)

# save the weights to HDF5
# (recent Keras versions require the .weights.h5 suffix)
model.save_weights("CNN2_model.weights.h5")
print("\nCNN model saved to the disk.")

The accuracy and loss curves as a function of the number of epochs are shown below.

accuracy/epochs diagram
loss/epochs diagram

Summary

This is a continuation of my AI exploration, and I hope you enjoyed reading it. In my next article I will talk about data transformation and data preparation.
