Image Recognition Using ML (CNN) for Beginners

Akhil Haridasan · Published in The Startup · Oct 7, 2020 · 9 min read

Image Recognition, Image Processing, and Computer Vision are some of the hottest topics in the tech industry these days. Various inventions have been developed using these technologies; Face Recognition, Gesture Recognition, driverless cars, etc. are some of the coolest creations of computer vision and image recognition. And the core, or the foundation, of all these creations is “Image Recognition”.

In this article, we will try to understand how a Convolutional Neural Network (a type of Deep Learning algorithm) can be used for image classification. This article is designed for beginners and anyone interested in learning Image Recognition and Machine Learning. (Again, one of the easiest walkthroughs out there.)

Agenda:

  1. Fashion MNIST (FMNIST) Clothing Classification
  2. Creating a better Dataset
  3. How to Define/Use a Model
  4. Evaluate the Model
  5. Make Predictions

Pre-requisite:

  1. Python 3.6 (preferred 3.6.10)
  2. Tensorflow 2.1.0 and Keras 2.3.1 (as we are going to work with Deep Learning models and Keras)
  3. Google Colab/PyCharm/Jupyter Notebook (I prefer Colab because there is free GPU support🤣)

1. FMNIST Clothing Classification

Why FMNIST dataset?

Firstly, Fashion MNIST is one of the most widely used image datasets and it is a useful starting point for beginners to develop and learn image classification using convolutional neural networks. Second, this dataset already comes with a well-defined training and testing split that can be used without any hassle.

This dataset consists of 70,000 small 28x28 pixel grayscale images (60,000 for training and 10,000 for testing) of 10 different types of clothing, including shoes, t-shirts, dresses, and bags, with labels assigned to them as follows:

  • 0: T-shirt/top
  • 1: Trouser
  • 2: Pullover
  • 3: Dress
  • 4: Coat
  • 5: Sandal
  • 6: Shirt
  • 7: Sneaker
  • 8: Bag
  • 9: Ankle boot

Let us load this FMNIST dataset and see what it looks like.

from keras.datasets import fashion_mnist
from keras.utils import to_categorical
from matplotlib import pyplot

# load the dataset
(trainX, trainy), (testX, testy) = fashion_mnist.load_data()

# summarize the loaded dataset
print('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))
print('Test: X=%s, y=%s' % (testX.shape, testy.shape))

# plot the first 9 images in the training dataset
for i in range(9):
    pyplot.subplot(330 + 1 + i)  # define subplot
    pyplot.imshow(trainX[i], cmap=pyplot.get_cmap('gray'))  # plot raw pixel data
pyplot.show()

We can see that there are 60,000 examples in the training dataset and 10,000 in the test dataset.

2. Creating a better Dataset

This step is divided into two:

a. Loading of the dataset

Here, we know that our images are pre-labeled (i.e. every image in our dataset is assigned a digit that ranges from 0–9, so if it is a T-shirt/top it is labeled 0, and so on). All our images are of size 28x28 and they are all grayscale images.

But to confirm, or to be precise, we will reshape all the images in our dataset so that every 28x28 pixel image has an explicit single (grayscale) channel, i.e. a shape of 28x28x1. This is the input format our convolutional layers expect, and it makes sure every image follows the same pixel and channel convention.

#load dataset
(trainX, trainY), (testX, testY) = fashion_mnist.load_data()

#reshape dataset to have a single channel
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28, 1))

Now, because we know that our images are assigned a particular integer label, we will use a technique called “one-hot encoding” to convert these integers into binary vectors (for example, label 3 becomes a vector of ten values with a 1 at index 3 and 0s everywhere else).

#one hot encode target values
trainY = to_categorical(trainY)
testY = to_categorical(testY)

We will now create a single function to perform all these three steps together.

def load_dataset():
    #load dataset
    (trainX, trainY), (testX, testY) = fashion_mnist.load_data()
    #reshape dataset to have a single channel
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    #one hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY

b. Preparation of the dataset

Every image is represented using pixel values that range from 0 to 255, where 0 means black and 255 means white.

Now, we need to rescale these 0–255 pixel values to the range 0–1 for better results. So, basically, we are rescaling our images to the range [0, 1]. We will do that by converting the pixel data to float values and then dividing these values by 255 (which is our maximum pixel value).

def prepare_pixels(train, test):
    #convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    #normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    #return normalized images
    return train_norm, test_norm

3. Creating/Using a Model

We will now create our baseline model.

We will create a basic model for our dataset which can work no matter how we change the current dataset (like adding new photos, changing the color of photos, etc). This model will be our base model, and then it can be improved based on the accuracy and other parameters.

ML Model — Convolutional Neural Network

A neural network is a type of ML algorithm developed to recognize underlying relationships in a set of data through a process that loosely mimics the way the human brain operates. A Convolutional Neural Network (CNN) is a neural network designed for grid-like data such as images.

CNN is one of the main approaches used for image recognition, image classification, object detection, facial recognition, etc.

Why is CNN preferred for image datasets?

In a CNN, every image is read in parts rather than as a whole. For instance, let's say we have a 300x300 pixel image; the CNN will scan it with small windows (say, 4x4 patches) and deal with these small patches one by one. Features are then extracted from those smaller image patches.
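
To make the “read in parts” idea more concrete, here is a minimal sketch of a single convolutional layer sliding 3x3 filters over a dummy 300x300 grayscale image. The 300x300 size and the 32 filters here are only illustrative values for this sketch, not part of the model we build below.

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D

# a dummy batch of one 300x300 grayscale "image" with pixel values in [0, 1]
dummy = np.random.rand(1, 300, 300, 1).astype('float32')

# one convolutional layer: 32 filters, each a 3x3 window that slides over the image
layer = Sequential([Conv2D(32, (3, 3), activation='relu', input_shape=(300, 300, 1))])

# every 3x3 patch of the input contributes one value to each of the 32 feature maps
features = layer.predict(dummy)
print(features.shape)  # (1, 298, 298, 32): 32 feature maps, slightly smaller than the input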

Any CNN model will have two main aspects:

  1. Feature extraction — Performed using convolutional and pooling layers
  2. Classifier — that will make a prediction.

How will our CNN model be?

  1. We will start with a single convolutional layer with a small filter size (3,3) and a modest number of filters (32) followed by a max-pooling layer.
  2. We know that here we have to categorize the data into 10 different classes, right? So this is called a multi-class classification problem. Let me ask you a question, based on the images that we have seen: what do you think would be the number of nodes in the output layer? 10!! (come on, that was obvious).
  3. We will require an activation function (AF). An AF is responsible for transforming the summed weighted input into an output.
  4. We will also add Dense layers between the feature extractor and the output layer to interpret the features. Let us add 100 nodes and see how it goes.
  5. We will use the ReLU activation function and the He weight initialization scheme (a best practice). ReLU is a kind of AF that passes the input straight through to the output if it is positive, and outputs zero otherwise (for example, ReLU(2.3) = 2.3 and ReLU(-1.5) = 0).
  6. We will then use a stochastic gradient descent (SGD) optimizer to optimize our learning algorithm. Our SGD will have a learning rate of 0.01 and a momentum of 0.9. (Try changing learning rates to see the differences in accuracy values)
  7. Finally, we will compile the model with a categorical cross-entropy loss function along with our SGD (considered suitable for multi-class classification), and we will monitor our classification accuracy.
#define our CNN model
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD

def cnn_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    #compile model
    opt = SGD(lr=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
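
If you want to sanity-check the architecture before training, you can build the model once and print a summary. This is just an optional inspection step, not part of the evaluation pipeline below.

model = cnn_model()
model.summary()  # lists every layer with its output shape and parameter count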

4. Evaluate the Model

Once we are ready with our model, the next step is to evaluate our model for accuracy.

Evaluation Method — K-fold cross-validation

We will evaluate our model using K-fold cross-validation. Here, try to choose your k value so that it is not too large; that way we avoid long running times while still evaluating our model repeatedly.

The training dataset is shuffled before the split, and a fixed random seed is used, so that every model we evaluate gets the same train and test splits in each fold.

We will train the model for 10 epochs with a default batch size of 32 examples. In every fold, the held-out split is used to evaluate the model after each epoch. This will help us create a learning curve to identify the performance of the model.

from sklearn.model_selection import KFold

def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = list(), list()
    #prepare cross validation
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    #enumerate splits
    for train_ix, test_ix in kfold.split(dataX):
        model = cnn_model()
        trainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]
        history = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)
        _, acc = model.evaluate(testX, testY, verbose=0)
        print('> %.3f' % (acc * 100.0))
        scores.append(acc)
        histories.append(history)
    model.save('final_model.h5')  #save the last trained model for future use
    return scores, histories

Result of Evaluation

We will present two aspects of the results: first, the accuracy diagnosis, and second, the loss on the training and testing datasets.

def accuracy_summary(histories):
    for i in range(len(histories)):
        pyplot.subplot(211)
        pyplot.title('Classification Accuracy')
        pyplot.plot(histories[i].history['accuracy'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_accuracy'], color='orange', label='test')
    pyplot.show()

def loss_summary(histories):
    for i in range(len(histories)):
        pyplot.subplot(211)
        pyplot.title('Loss')
        pyplot.plot(histories[i].history['loss'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_loss'], color='orange', label='test')
    pyplot.show()

Now, a final function to call all the above-defined functions.

def final():
    trainX, trainY, testX, testY = load_dataset()
    trainX, testX = prepare_pixels(trainX, testX)
    scores, histories = evaluate_model(trainX, trainY)
    accuracy_summary(histories)
    loss_summary(histories)

final()

The output of the evaluation shows the accuracy value for each fold of the cross-validation process, followed by the accuracy and loss curves. The results may vary across runs because of the stochastic nature of the algorithm.

Blue lines in the graphs indicate model performance on the train dataset and orange lines indicate performance on the test dataset. Additionally, we can see that the model achieves a good fit, with the train and test learning curves converging.

5. Make Predictions

An important thing to keep in mind is that when making predictions, we need to pass the model a grayscale image, because we trained it on grayscale images. A workaround is to add a small function that converts an RGB image into a grayscale image. For now, I will use one of the images from the test dataset and predict the class of that image.
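
If you do want that workaround, here is a minimal sketch of such a conversion function. The function name rgb_to_grayscale and the use of the Pillow library are my own choices for illustration; they are not part of the pipeline above.

from PIL import Image

def rgb_to_grayscale(in_path, out_path):
    # open the RGB image, convert it to a single grayscale channel, and resize to 28x28
    img = Image.open(in_path).convert('L')  # 'L' mode = 8-bit grayscale
    img = img.resize((28, 28))              # match the input size the model expects
    img.save(out_path)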

# make a prediction for a new image
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.models import load_model

def load_image(filename):
    #load the image as 28x28 grayscale
    img = load_img(filename, grayscale=True, target_size=(28, 28))
    #convert to array
    img = img_to_array(img)
    #reshape into a single sample with 1 channel
    img = img.reshape(1, 28, 28, 1)
    #prepare pixel data
    img = img.astype('float32')
    img = img / 255.0
    return img

Use the saved model to predict which class the image falls into. I have created if-else conditions to make the exact category clearer for you guys to understand.

img1 = mpimg.imread('/content/sample_data/sample_image.png')
imgplot = plt.imshow(img1)
plt.show()

img = load_image("/content/sample_data/sample_image.png")
model = load_model('/content/final_model.h5')

# predict the class
result = model.predict_classes(img)
if result[0] == 0:
    print("Top")
elif result[0] == 1:
    print("Trouser")
elif result[0] == 2:
    print("Pullover")
elif result[0] == 3:
    print("Dress")
elif result[0] == 4:
    print("Coat")
elif result[0] == 5:
    print("Sandal")
elif result[0] == 6:
    print("Shirt")
elif result[0] == 7:
    print("Sneaker")
elif result[0] == 8:
    print("Bag")
elif result[0] == 9:
    print("Ankle Boot")
else:
    print("Not in the list")

In my run, the image passed to the model was that of a pullover, and it did predict the image as a “pullover”. You can also try a different image and check for yourself.
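
As a side note, the long if-else chain above can be replaced by a simple list lookup. This is just an equivalent, slightly shorter alternative, not something the original code requires.

# class names indexed by label, matching the FMNIST label list above
class_names = ['Top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot']
label = result[0]
print(class_names[label] if 0 <= label <= 9 else "Not in the list")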

Some additional FYI

How this model can be further improved

  1. By padding the convolutions — this lets pixels at the edges of the image contribute more to the output (see the sketch after this list)
  2. By increasing the number of filters — this helps extract more features from the input images
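
For example, here is one possible variation of cnn_model() along those lines. The second convolutional block, the 'same' padding, and the 64 filters are illustrative choices rather than tuned values, and the imports are the same as in the model definition above.

def improved_cnn_model():
    model = Sequential()
    # 'same' padding keeps the 28x28 size, so edge pixels also contribute to the output
    model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                     kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    # a second block with more filters (64) to extract more features
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu',
                     kernel_initializer='he_uniform'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    opt = SGD(lr=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model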

If you are facing any issue pertaining to Deep Learning or ML models, you can contact me via LinkedIn or Facebook, or else comment here itself; feedback is always a good way to improve.

I will be posting something interesting again with easy steps soon. Till then Enjoy coding !! 👍
