DEEP LEARNING: IMAGE CLASSIFICATION WITH CONVOLUTIONAL NEURAL NETWORKS (CNNs)

Ariel Jumba
7 min read · Oct 14, 2022

Background

Traditional machine learning methods have continued to evolve into more modern techniques, such as deep learning, to cater to contemporary business and personal needs. Deep learning techniques are greatly influenced by advancements in software and hardware technology and by immense data growth.

The global big data market is forecasted to grow to 103 billion U.S. dollars by 2027 with a CAGR of 13.2%, according to Statista reports. Almost 60% of the world’s big data has been collected in the past ten years.

What is Deep Learning?

Deep learning is simply the ability to extract patterns from data using neural networks. It is an advanced form of machine learning, which is itself defined as the ability of computers to learn without being explicitly programmed.

What are Neural Networks?

Neural networks are perhaps one of the most complex machine learning algorithms. They are loosely modeled on the human brain, which consists of elements that process various external signals into final outputs. The signals feed into a processing unit, which attaches a weight to each signal, applies a threshold, and combines the most important inputs into one final output.

Neurons are the main building blocks of neural networks, which may have a single layer or multiple layers. In a single layer, the inputs are assigned weights, summed, and fed into an activation function to produce outputs. The activation function may be linear, as in linear regression. In reality, however, data is unstructured, complex, and non-linear, so non-linear activation functions are generally used.
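The weighted-sum-plus-activation step described above can be sketched in a few lines of NumPy. This is only an illustration: the function names (neuron, linear, relu) and the input, weight, and bias values are made up for the example and do not come from the exercise later in this article.

```python
import numpy as np

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    return activation(np.dot(inputs, weights) + bias)

def linear(z):
    # A linear activation returns the weighted sum unchanged
    return z

def relu(z):
    # A non-linear activation: negative sums are replaced with zero
    return np.maximum(z, 0.0)

x = np.array([1.0, 2.0, 3.0])    # illustrative inputs
w = np.array([0.5, -1.0, 0.25])  # illustrative weights
b = -0.5                         # illustrative bias

linear_out = neuron(x, w, b, linear)  # 0.5 - 2.0 + 0.75 - 0.5 = -1.25
relu_out = neuron(x, w, b, relu)      # the negative sum is clipped to 0.0
```

The only difference between the two calls is the activation function, which is what lets a network move from purely linear behavior to non-linear behavior.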
Neural networks are used to model various social, political, or economic phenomena, for natural language processing, and for the automation of devices, among other applications.
Neural networks come in many forms, including:

  • Perceptrons
  • Feed Forward Neural Networks
  • Convolutional Neural Networks
  • Radial Basis Function Neural Networks
  • Recurrent Neural Networks
  • Long Short-Term Memory (LSTM) networks
  • Sequence to Sequence models
  • Modular Neural Networks

Convolutional Neural Networks

Convolutional neural networks (CNNs) are usually helpful in image classification, object recognition, and computer vision tasks. They use linear algebra concepts such as matrix multiplication to identify image patterns and therefore require very high computational power to perform.
CNNs can be used for either classification or regression tasks:

  • Regression: where the output variable takes a continuous value.
  • Classification: where the output variable takes a class label.

How do Convolutional Neural Networks Work?

Feature Detection

A convolutional neural network extracts local spatial features from an image and combines these local features into higher-order features. The higher-order features then help to separate different images linearly. High-level feature detection is critical in CNNs. If enough of the features associated with a class are detected in an image, you can confidently classify the image as belonging to that class.
Image variations tend to be a problem when manually extracting features from images. These variations could be:

  • Viewpoint variations
  • Scale variations
  • Intra-class variations
  • Background clutter
  • Occlusions

Instead of connecting every pixel in the input image to every neuron, a CNN connects each neuron in a hidden layer to only a patch of the input. This preserves the spatial structure of the image, and a sliding window defines the connections.

Extraction using the spatial structure method

Below are the steps followed in feature extraction:

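  • The filter patch (a weighted filter, or kernel) is applied to a patch of the input image.
  • The filter weights and the pixels under the filter are multiplied element-wise.
  • The products are added to give one value of the feature map. Sliding the filter across the image fills in the rest of the map, which is used in the subsequent stage.
  • A non-linear function, e.g. ReLU, is applied to the feature map.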
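The four steps above can be sketched in plain NumPy. This is a minimal illustration, assuming a stride of 1 and no padding. The function names (convolve2d, relu) and the image and kernel values are made up for the example. As in most deep learning libraries, the kernel is applied without flipping.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]           # step 1: patch under the filter
            feature_map[i, j] = np.sum(patch * kernel)  # steps 2-3: multiply element-wise, then add
    return feature_map

def relu(x):
    return np.maximum(x, 0)  # step 4: non-linearity

# Illustrative 4x4 "image" and 2x2 kernel (values are arbitrary)
image = np.array([[1, 0, 2, 1],
                  [0, 1, 3, 0],
                  [2, 1, 0, 1],
                  [1, 0, 1, 2]], dtype=float)
kernel = np.array([[1, 0],
                   [0, -1]], dtype=float)

feature_map = convolve2d(image, kernel)  # 3x3 map: [[0, -3, 2], [-1, 1, 2], [2, 0, -2]]
activated = relu(feature_map)            # negatives become 0: [[0, 0, 2], [0, 1, 2], [2, 0, 0]]
```

A 2x2 kernel sliding over a 4x4 image with stride 1 fits in 3 positions along each axis, which is why the feature map is 3x3.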

Key to note:

The output matrix is also known as the feature map.

The number of features to detect is defined at each layer, and the filters’ weights determine which features are detected, so changing the weights can significantly change the features seen.

The class scores used for classification are computed in a final dense layer.

The output of a convolutional layer is usually a volume of feature maps, one for each feature detected.

Parts of a Convolutional Neural Network

A CNN consists of the following parts:

  • Convolution: This is what we have described above and involves applying filters (kernels) to generate feature maps.
  • Non-Linearity: After every convolution operation (layer), we apply a non-linear activation function. For images, we primarily use ReLU, a pixel-by-pixel operation that replaces all negative values with zero.
  • Pooling: This is the downsampling operation applied to each feature map. Pooling helps in reducing dimensionality. An example is max pooling, where each window of the feature map is replaced by its maximum value, e.g., a 2x2 max pool with a stride of 2 keeps only the largest value in each non-overlapping 2x2 block, halving the height and width of the feature map.
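The non-linearity and pooling steps can be sketched with NumPy as follows. The function name max_pool and the values in the feature map are made up for illustration. The window size and stride are both 2, matching the max pooling example described above.

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Replace each size x size window with its maximum value."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            pooled[i, j] = window.max()
    return pooled

# Illustrative 4x4 feature map (values are arbitrary)
feature_map = np.array([[ 1, -3,  2,  4],
                        [ 5,  6, -7,  8],
                        [-3,  2,  1,  0],
                        [ 1,  2, -3,  4]], dtype=float)

rectified = np.maximum(feature_map, 0)  # ReLU: negative values become 0
pooled = max_pool(rectified)            # 2x2 result: [[6, 8], [2, 4]]
```

Each value in the 2x2 result is the largest value in one non-overlapping 2x2 block of the rectified map, so the 4x4 map shrinks to a quarter of its original size.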

Sample Exercise: Identifying Images with Their Clothing Type

1. Importing and Installing Dependencies

We shall be using TensorFlow Keras for this exercise.

import math

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfd

# Use the notebook-friendly progress bar wherever tqdm is used
import tqdm
import tqdm.auto
tqdm.tqdm = tqdm.auto.tqdm

Each image in the dataset has 28 * 28 = 784 pixels, which will be our input. The model works with 784 values because each image is flattened into a single one-dimensional array before it reaches the dense layers.

2. Assigning class names

Each image is mapped to a single label. Since the class names are not included with the dataset, we shall store them as shown below to use them later when plotting the images:

class_names = ['T-shirt/top','Trouser','Pullover','Dress','Coat','Sandal','Shirt','Sneaker','Bag','Ankle Boot']

3. Splitting Training and Test Data

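# Assumption: the loading step is not shown in the original. The class names and
# split sizes match Fashion MNIST, so it is loaded here via tensorflow_datasets.
dataset, metadata = tfd.load('fashion_mnist', as_supervised=True, with_info=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

num_train_examples = metadata.splits['train'].num_examples
num_test_examples = metadata.splits['test'].num_examples
print('The number of training examples is : {}'.format(num_train_examples))
print('The number of test examples is : {}'.format(num_test_examples))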
  • The number of training examples is : 60,000
  • The number of test examples is : 10,000

4. Pre-processing Data

The value of each pixel in the data is an integer in the range [0,255]. These values must be normalized to the range [0,1] for the model to work correctly. We shall create a normalization function and then apply it to each image in the dataset as shown below:

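def normalize(images, labels):
    # Convert integer pixels to floats and scale them from [0, 255] to [0, 1]
    images = tf.cast(images, tf.float32)
    images /= 255
    return images, labels

# Apply the normalization to every example in both datasets
train_dataset = train_dataset.map(normalize)
test_dataset = test_dataset.map(normalize)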

5. Plotting Pre-processed data

Let’s take a look at a single image and remove the color dimension by reshaping it and then plot it as shown below:

for image, label in test_dataset.take(1):
    break
image = image.numpy().reshape((28, 28))

plt.figure()
plt.imshow(image)
plt.colorbar()
plt.grid(False)
plt.show()

Let’s now plot the first 25 images in a 5 * 5 grid:

plt.figure(figsize=(10, 10))
i = 0
for (image, label) in test_dataset.take(25):
    image = image.numpy().reshape((28, 28))
    plt.subplot(5, 5, i + 1)
    plt.imshow(image)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.xlabel(class_names[label])
    i += 1
plt.show()

6. Building the Model

This requires first configuring the layers as shown below and then building the model.

model = tf.keras.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
        tf.keras.layers.Dense(units=128, activation=tf.nn.relu),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ]
)
  • The first part is known as image flattening (unstacking) and transforms each image from a 2-dimensional array of 28*28 pixels to a 1-dimensional representation of 784 pixels.
  • The second part is the hidden layer, a dense (fully connected) layer of 128 neurons with ReLU activation.
  • The third part is the output layer, a 10-node SoftMax layer. Each node represents a class of clothing.
  • Each output value lies in the range [0,1] and represents the probability that the image belongs to that class. The 10 values sum to 1.
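The model above contains only Flatten and Dense layers, so it does not use the convolution and pooling steps described earlier. A minimal sketch of a convolutional variant is shown below. The name cnn_model, the number of filters (32 and 64), the 3x3 kernels, and the 'same' padding are illustrative assumptions, not settings from this exercise.

```python
import tensorflow as tf

# Hypothetical convolutional variant of the model; layer sizes are illustrative
cnn_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    # Convolution + ReLU: 32 filters of 3x3, output stays 28x28 with 'same' padding
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    # 2x2 max pooling with stride 2 halves the feature maps to 14x14
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),  # 14x14 -> 7x7
    # Dense classifier on top of the extracted features
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
```

A model like this could be compiled and trained with the same steps that follow, since its input shape and 10-node SoftMax output match the dense model.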

7. Model Compilation

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
  • Optimizer: Adjusts the inner parameters of the model to minimize loss.
  • Loss: Measures how far the model’s actual output is from the desired output.
  • Metrics: Monitors the training and testing steps. Here we track accuracy, the fraction of images that are correctly classified.
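To make the loss more concrete, here is a small NumPy sketch of what a SoftMax output and a sparse categorical cross-entropy loss compute for a single example. The function names and the raw scores are made up for illustration, and this is not how Keras implements the loss internally. "Sparse" means the true label is given as an integer class index rather than a one-hot vector.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

def sparse_categorical_crossentropy(true_label, probabilities):
    """Negative log of the probability assigned to the true class index."""
    return -np.log(probabilities[true_label])

scores = np.array([2.0, 1.0, 0.1])  # hypothetical raw scores for 3 classes
probs = softmax(scores)

# The loss is small when the true class received a high probability
loss_if_class_0 = sparse_categorical_crossentropy(0, probs)
# and large when the true class received a low probability
loss_if_class_2 = sparse_categorical_crossentropy(2, probs)
```

The optimizer adjusts the model's weights so that, averaged over the training examples, this loss becomes as small as possible.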

8. Model Training

We first define the iteration parameters:

  • Repeat forever: This is done by calling dataset.repeat(). The epochs parameter limits how long training actually runs.
  • dataset.shuffle(6000): This randomizes the order of the examples, using a buffer of 6,000 examples, so that the model doesn’t learn anything from the order of the data.
  • dataset.batch(32): Instructs model.fit to use batches of 32 images and labels when updating model variables.
  • epochs=5: Limits training to 5 full passes over the training dataset, i.e., 5 * 60,000 = 300,000 examples in this case.
  • math.ceil: Rounds up to the nearest integer, so that a final partial batch still counts as a step.
BATCH_SIZE = 32
train_dataset = train_dataset.repeat().shuffle(6000).batch(BATCH_SIZE)
test_dataset = test_dataset.batch(BATCH_SIZE)
model.fit(train_dataset,epochs=5,steps_per_epoch=math.ceil(num_train_examples/BATCH_SIZE))
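The step counts passed to the model follow from simple arithmetic, which can be checked in isolation. The sketch below hard-codes the split sizes reported earlier (60,000 training and 10,000 test examples). The names train_steps, test_steps, and examples_seen are introduced here for illustration.

```python
import math

num_train_examples = 60000
num_test_examples = 10000
BATCH_SIZE = 32
EPOCHS = 5

# 60000 / 32 divides evenly, so every training step uses a full batch
train_steps = math.ceil(num_train_examples / BATCH_SIZE)  # 1875

# 10000 / 32 = 312.5, so math.ceil rounds up to cover the final partial batch
test_steps = math.ceil(num_test_examples / BATCH_SIZE)    # 313

# Total number of examples the model sees during training
examples_seen = EPOCHS * train_steps * BATCH_SIZE         # 300000
```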

9. Evaluating Model Accuracy

We will test accuracy using the test dataset. Accuracy on the test dataset is typically lower than accuracy on the training dataset. This is normal since the model was trained only on the training dataset and has not seen the images in the test dataset.

test_loss, test_accuracy = model.evaluate(test_dataset, steps=math.ceil(num_test_examples / 32))
print('Accuracy on test dataset is: {}'.format(test_accuracy))

10. Model Predictions

# Take one batch of 32 test images and predict a class for each of them
for test_images, test_labels in test_dataset.take(1):
    test_images = test_images.numpy()
    test_labels = test_labels.numpy()
    predictions = model.predict(test_images)

print(predictions.shape)
print(predictions[0])
print(np.argmax(predictions[0]))

1/1 [==============================] - 0s 79ms/step
(32, 10)

A prediction is an array of 10 numbers that gives the model's confidence that the image corresponds to each of the 10 different articles of clothing.

[1.43503175e-05 1.55980752e-06 9.30114836e-03 5.68765972e-06
9.70131755e-01 2.37947373e-09 2.05402542e-02 2.44101379e-08
5.25351743e-06 3.40834760e-09]

From the results above, the model is confident (a probability of about 0.97) that this image has class label 4, which is a coat. The true label agrees:

print(test_labels[0])
print(class_names[test_labels[0]])
4
Coat
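As a cross-check, the printed prediction can be decoded by hand. The sketch below copies the class_names list and the 10 probabilities shown above. The variable names predicted_label, predicted_name, and confidence are introduced here for illustration.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot']

# The prediction printed above for the first test image
prediction = np.array([1.43503175e-05, 1.55980752e-06, 9.30114836e-03, 5.68765972e-06,
                       9.70131755e-01, 2.37947373e-09, 2.05402542e-02, 2.44101379e-08,
                       5.25351743e-06, 3.40834760e-09])

predicted_label = int(np.argmax(prediction))  # index of the largest probability: 4
predicted_name = class_names[predicted_label] # 'Coat'
confidence = prediction[predicted_label]      # about 0.97
```

Index 6 ('Shirt') has the second-largest probability at about 0.02, so the model considers a shirt the next most likely class for this image.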

Conclusion

Through this article, we have gained knowledge on:

  • Defining deep learning.
  • Defining neural networks.
  • The basic features, functions, and composition of convolutional neural networks.
  • Simple image classification with a neural network in TensorFlow.

Key to note is that computers are not as intuitive as humans and need more data to learn. Deep learning networks work best with vast amounts of data: generally, the more data, the better the model performance.

References

  1. Statista, Big data market size revenue forecast worldwide from 2011 to 2027. Retrieved Oct 2022, from https://www.statista.com/statistics/254266/global-big-data-market-forecast/
