Image Recognition & Transfer Learning — Great performance with limited data — Keras/TensorFlow

Oct 13, 2019

Today we will talk about image recognition using machine learning in Python, and how to use transfer learning to achieve great performance when your data is limited. See my GitHub for all the code in this article and the dataset.

Task: Given an image containing a dog, predict the breed of the dog out of 120 possible breeds. The dataset is shared by Stanford University. Link to the dataset is here

The dataset contains about 150 images for each of the 120 breeds, totaling around 20,000 images. It comes with multiple downloads, including class labels and bounding boxes (the position of the dog in the image). For simplicity, we will concentrate on training our model with this data and then, given an image of a dog that the network was not trained on, predicting its breed correctly.

If you have looked at other classification problems online, I am sure you have read some posts on the Dog vs. Cat classification problem. If I am not mistaken on the numbers, given 10,000 images of each, the task is to correctly classify an image as a dog or a cat.

In this task, however, there are 120 classes an image can belong to, and since dog breeds do not look that different from each other, correctly classifying 120 breeds is a tougher job than deciding whether something is a cat or a dog. Having only about 150 samples per class is also not much (compare that to the cat vs. dog example, with 10K images per class). So what can we do to train a model that performs well?

Transfer learning can be useful in this scenario. The idea is to find a model that already works well on a similar problem and transfer what it learned (the network structure and weights, to be more technical) to a new model, adjusting the new model a little to fit our particular input and output. For that we follow the steps below:

a) Find a pre-trained model that worked on a problem similar to ours, was trained on a lot more data, and has an established performance

b) Start with that neural network, remove the last few layers, and reorganize them according to your classification problem

c) Train the new network on your limited dataset

d) Measure performance, go back to step b if necessary

For image recognition, as of this writing, the best pre-trained models are the ones built for ImageNet. There is a lot more to read about it online, so I won’t get into the details here, but in short: there was a competition to correctly classify more than a million images into a thousand classes (the competition used 1,000; the full ImageNet dataset spans roughly 22,000 categories) of everyday objects such as animals, vehicles, buildings, and landscapes. Those models were trained for weeks on GPUs in the cloud. So being able to reuse some of that knowledge is incredibly helpful.

Luckily, the Keras library for Python makes it easy to integrate these models into our own. In this example, we will use InceptionV3, which was developed by Google and was among the top performers in the ImageNet competition in 2015.

First we import Keras and initialize some variables:

from keras.applications import InceptionV3
from keras.applications.inception_v3 import preprocess_input

num_of_classes = 120

Next, we load the Inception model:

target_size = (299, 299)
batch_size = 32
base_model = InceptionV3(weights="imagenet", include_top=False)

Notice include_top=False. We do this because our final layer has to have 120 neurons (the number of classes we want to predict) as opposed to the 1,000 in the actual Inception model. This is part of step b described above.

Then we add our final layers. First comes an average pooling layer, which reduces the dimensionality of its input. With ordinary (2,2) average pooling, a 48x48 input becomes a 24x24 output in which each number is the average of a 2x2 block of the input, halving each dimension. The GlobalAveragePooling2D layer used below goes further and averages each feature map down to a single number.

Next we add a fully connected dense layer with as many neurons as the classes we want to predict. Finally, we initialize our new model with the same input as InceptionV3 and our new output layer.
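To make the pooling idea concrete, here is a minimal NumPy sketch (not the Keras implementation, and the helper name average_pool_2x2 is made up for illustration). It averages non-overlapping 2x2 blocks of a small array, then shows what global average pooling does to a stack of feature maps.

```python
import numpy as np

def average_pool_2x2(feature_map):
    """Average each non-overlapping 2x2 block of a 2D array with even sides."""
    h, w = feature_map.shape
    # Split rows and columns into pairs, then average within each 2x2 block
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = average_pool_2x2(feature_map)          # 4x4 input -> 2x2 output

# Global average pooling: one number per channel of a (height, width, channels) tensor
stack = np.stack([feature_map, feature_map + 1], axis=-1)   # shape (4, 4, 2)
per_channel = stack.mean(axis=(0, 1))                        # shape (2,)

print(pooled)
print(per_channel)
```

Each dimension is halved by the 2x2 pooling, while the global version collapses every feature map to a single value, which is why it pairs well with a dense output layer.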

from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

x = base_model.output
x = GlobalAveragePooling2D()(x)
output = Dense(num_of_classes, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=output)

Note that we use the softmax activation function here because we are doing multi-class prediction. If we were working on the Dog vs. Cat problem, we would only have two choices, and in that case a single sigmoid output might suit better. So while developing your model, it is essential to understand the structure of your data so you can come up with something that correctly represents it.

The next part is training our network. An important consideration here is that the layers we imported from Inception should not be trained during our training cycle. If we trained them with this limited amount of data, we would wreck the existing model’s performance and, at best, overfit our data. The initial layers of the network are capable of distinguishing low-level features in images such as edges, corners, and colors; later layers distinguish objects built from those low-level features, such as cars and animals. Therefore, we keep the imported layers frozen and train only the layers we added at the end.
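As a rough illustration of the difference, here is a small NumPy sketch of the two activation functions. The input scores are made-up numbers, not outputs of our model.

```python
import numpy as np

def softmax(logits):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def sigmoid(logit):
    """Squash a single raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-logit))

# Hypothetical raw scores for three classes (multi-class case)
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

# Hypothetical raw score for a two-class problem (e.g. probability of "dog")
p_dog = sigmoid(0.0)

print(probs, probs.sum())
print(p_dog)
```

Softmax spreads one unit of probability across all classes, so it fits the 120-breed case; a single sigmoid output only needs to say how likely one of two classes is.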

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='nearest')

Let’s go over this. We are creating the training data generator here and passing some additional arguments such as rotation, flip, etc. If you do not have enough data, your model is more likely to overfit when it sees the same training set over and over. Every epoch, the entire image set is fed into the network. With these options enabled, each image is distorted a little within the ranges you gave (a horizontal flip, a rotation of up to 40 degrees, and so on), so the network may never see the exact same image twice.
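To give a feel for what one of these distortions does, here is a minimal NumPy sketch of a random horizontal flip only. It is an illustration, not how ImageDataGenerator is implemented, and the function name and the tiny fake image are made up.

```python
import numpy as np

def random_horizontal_flip(image, rng):
    """Flip a (height, width, channels) image left-to-right about half of the time."""
    if rng.random() < 0.5:
        return image[:, ::-1, :]
    return image

rng = np.random.default_rng(42)

# A tiny fake "image": 2 pixels high, 3 pixels wide, 3 color channels
image = np.arange(2 * 3 * 3).reshape(2, 3, 3)

# Asking for the "same" image several times may yield different variants
variants = [random_horizontal_flip(image, rng) for _ in range(4)]
for variant in variants:
    print(np.array_equal(variant, image))
```

The real generator combines several such random transforms (rotation, shift, shear, zoom) on every image in every epoch.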

train_generator = train_datagen.flow_from_directory('./train/',
       target_size=target_size,
       color_mode='rgb',
       batch_size=batch_size,
       class_mode='categorical',
       shuffle=True,
       seed=42)

In this part, we instruct Keras to read our training images from the given folder. Batch size is how many images are processed together in each step, and it depends on the amount of memory you have. With more memory, you can use a larger batch size and get through each epoch in fewer steps. I also noticed that if the memory used by your model is more than 10% of total system memory, a warning is shown on the console. It doesn’t harm your calculations, just FYI.

Class mode is categorical since we are trying to decide among 120 classes.

Next we create a validation data generator. The validation dataset is what we use to test our model’s performance on images it is not trained on. During these predictions, the system is not allowed to learn or update its parameters. We only get its predictions so we can calculate some metrics.

val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

val_generator = val_datagen.flow_from_directory('./validation/',
                                                target_size=target_size,
                                                color_mode="rgb",
                                                batch_size=batch_size,
                                                class_mode="categorical")

Notice we are not distorting the validation images. There is no need to, since the system is not learning at that point.

Next we compile our model and start training:

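from keras.callbacks import TensorBoard

# Freeze the imported Inception layers so that only the newly added layers are trained
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])

# TensorBoard callback; './logs' is an assumed log directory
tensorboard = TensorBoard(log_dir='./logs')

step_size_train = train_generator.n // train_generator.batch_size
step_size_valid = val_generator.n // val_generator.batch_size

# fit_generator is deprecated in recent TensorFlow/Keras releases, where model.fit accepts generators directly
model.fit_generator(generator=train_generator,
                    steps_per_epoch=step_size_train,
                    validation_data=val_generator,
                    validation_steps=step_size_valid,
                    callbacks=[tensorboard],
                    epochs=50)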

We use the Adam optimizer with the categorical_crossentropy loss function since we are doing categorical classification. Step size is how many steps (batches) each epoch needs to go over the entire dataset: the total number of images divided by the batch size. I am also using TensorBoard as a callback, which gives us a nice user interface to review our model’s metrics.

We define 50 epochs for this experiment. Generally 30 to 100 would suffice, and your system will converge to a point after which no more learning, or at least no significant learning, happens. As a matter of fact, with a whole lot more training, your system’s performance might deteriorate: the network overfits more and more, learning irrelevant features to fit the training data more perfectly, and then suffers when predicting never-before-seen images in the validation set. So you need to find the right spot. The epoch at which the loss on your validation set starts going up is the number you are looking for.
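The step size arithmetic can be sketched in a few lines. The image count below is a hypothetical number chosen for illustration, not the actual size of the training split.

```python
def steps_per_epoch(num_images, batch_size):
    """Number of full batches needed to go over the dataset once."""
    return num_images // batch_size

num_train_images = 16500   # hypothetical training set size
batch_size = 32

steps = steps_per_epoch(num_train_images, batch_size)
leftover = num_train_images % batch_size   # images that do not fill a final full batch

print(steps, leftover)
```

Because of the integer division, a few leftover images may not be covered in a given epoch when the dataset size is not a multiple of the batch size.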
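Finding that spot can be sketched as picking the epoch with the lowest validation loss. The loss values below are invented for illustration and are not the results from this experiment.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch number with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1

# Hypothetical validation loss per epoch: it falls, bottoms out, then rises again
val_losses = [1.20, 0.80, 0.55, 0.47, 0.45, 0.46, 0.50, 0.58]

stop_at = best_epoch(val_losses)
print(stop_at)
```

In practice you may not need to do this by hand: Keras provides an EarlyStopping callback that can monitor val_loss and stop training once it stops improving.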

Accuracy and Performance

When we look at the metrics recorded by TensorFlow, we reach about 88% accuracy on both the training dataset and the validation dataset, which is not bad.

If you look at the picture below, you can see that validation loss started increasing after around 30 epochs. That’s where the model starts overfitting and learning irrelevant features from the training dataset.

Prediction

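We load the image whose path was passed in as a parameter (args["image"]) to predict its class.

import numpy as np
from keras.preprocessing import image

orig_img = image.load_img(args["image"], target_size=target_size)

# Convert the PIL image to a float array and add the batch dimension
img = image.img_to_array(orig_img)
img = np.expand_dims(img, axis=0)
img = preprocess_input(img)

preds = model.predict(img)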

Images are loaded as 3-dimensional arrays: x, y, and a third dimension for the color channels. Keras expects 4 dimensions, the first one being the batch dimension, since models process images in batches. So we add that extra dimension with np.expand_dims, and then the preprocess function from the imported Keras model scales the pixel values the way Inception expects.

Once the data is in the form the model expects, we get the predictions and output the 3 most probable classes. I also show the image on screen with those classes as the title.
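The shape change is easy to see with a stand-in array. This sketch uses a fake all-white image instead of a real photo, and the scaling line is a hand-written equivalent of what the Inception preprocess_input does (mapping pixel values from 0..255 to -1..1), not a call into Keras.

```python
import numpy as np

# Stand-in for a loaded 299x299 RGB image with every pixel at 255 (white)
image_array = np.full((299, 299, 3), 255.0, dtype="float32")
print(image_array.shape)        # (299, 299, 3)

# Add the batch dimension in front: a batch containing one image
batch = np.expand_dims(image_array, axis=0)
print(batch.shape)              # (1, 299, 299, 3)

# Hand-written equivalent of the Inception-style scaling to the range [-1, 1]
scaled = batch / 127.5 - 1.0
print(scaled.min(), scaled.max())
```

The same 4-dimensional layout is used during training, where the first dimension is the batch size instead of 1.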

import matplotlib.pyplot as plt

results = []
maxRows = 3
# Map each class index back to its class (folder) name
index_to_class = {index: name for name, index in train_generator.class_indices.items()}

for pred in preds:
    # Indices of the maxRows highest probabilities, highest first
    top_indices = pred.argsort()[-maxRows:][::-1]
    for i in top_indices:
        result = "{}: {:.2f}%".format(index_to_class[i], pred[i] * 100)
        results.append(result)

for res in results:
    print(res)

plt.imshow(orig_img)
plt.suptitle("\n".join(results), fontsize=9)
plt.show()

I used my wife’s favorite dog, Dina, to test the model. I chose these 2 pictures of her because she was wearing a dress in them, and I was wondering if that would be a problem for classification.

As you can see, in both cases the model predicted her breed correctly as York. The probabilities are a little different for each image (65% & 86%), but considering the dress acts as an obstruction, that’s not bad at all. I mean, compared to human performance, I would say she looks more like a Christmas gift than a puppy, but I hope wifey won’t read this article 😆

The entire code for this article can be found on my GitHub. Let me know your thoughts. In the next article, we will create a neural network from scratch and see what kind of performance we can achieve with that.

Written by judopro