How you can build an image classifier in one day — Part 3: optimizing the accuracy

Samuel MERCIER · Decathlon Digital · Dec 11, 2018

This post is the third part of a series describing the development of an image classification API using transfer learning. The code for this series can be found on Decathlon Canada Github page.

Source: Asmaa Hussien

In part 3 of this series on image classification, you will learn about:

  • how to improve classification accuracy by optimizing the hyperparameters;
  • what data augmentation, fine-tuning and class weighting are.

In part 2 of this series, we trained a classifier to identify 30 different pieces of hockey equipment. The classifier achieved a classification accuracy of 70% on the training set and 75% on the validation set. In this article, we will learn a few tricks to improve accuracy, namely hyperparameter optimization, data augmentation, fine-tuning and class weighting.

Hyperparameters

As we briefly mentioned in part 2 of this series, hyperparameters describe the structure of the neural network we put on top of Inception-V3, as well as how we learn from the images in the training set. The hyperparameters need to be chosen before we train the classifier — and while the default values of the image-classification library can give appropriate results, it is always better to search for hyperparameters that work particularly well for our specific problem.

The image-classification library supports optimization of the following hyperparameters:

  • epochs: number of times we go through the images in the training set to improve our neural network;
  • learning_rate: parameter describing how much we adjust the neural network at each step to properly classify the images in the training set. A high learning rate can provide a higher accuracy in fewer epochs, but comes at the risk of greater overfitting;
  • nb_layers: number of layers in the neural network that we build on top of the Inception-V3 model;
  • hidden_size: number of neurons per layer in the neural network that we build on top of the Inception-V3 model;
  • activation: mathematical function used to translate the input of a neuron into its output;
  • dropout: fraction of neurons randomly excluded at each training step. For some applications, excluding neurons reduces overfitting.

Obviously, it is inefficient to search for good hyperparameters by trial and error, given the massive number of possible combinations. Luckily, as we will see later, we can use libraries such as scikit-optimize to find good hyperparameters in a limited number of iterations.
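To give an idea of how such a search works, here is a minimal sketch of a Bayesian hyperparameter search with scikit-optimize. The search space bounds and the train_and_evaluate helper are assumptions for illustration, not the library's actual code:

from skopt import gp_minimize
from skopt.space import Real, Integer, Categorical

# Search space mirroring the hyperparameters listed above (bounds are assumptions)
space = [
    Integer(5, 20, name='epochs'),
    Real(1e-5, 1e-2, prior='log-uniform', name='learning_rate'),
    Integer(1, 3, name='nb_layers'),
    Integer(128, 2048, name='hidden_size'),
    Categorical(['relu', 'tanh'], name='activation'),
    Real(0.0, 0.5, name='dropout'),
]

def objective(params):
    epochs, learning_rate, nb_layers, hidden_size, activation, dropout = params
    # train_and_evaluate is a hypothetical helper that trains the classifier
    # with the given hyperparameters and returns the validation accuracy
    val_accuracy = train_and_evaluate(epochs=epochs,
                                      learning_rate=learning_rate,
                                      nb_layers=nb_layers,
                                      hidden_size=hidden_size,
                                      activation=activation,
                                      dropout=dropout)
    # gp_minimize minimizes the objective, so return the negative accuracy
    return -val_accuracy

# Bayesian optimization over 30 hyperparameter combinations
result = gp_minimize(objective, space, n_calls=30, random_state=0)
print('Best hyperparameters:', result.x)

At each iteration, gp_minimize uses the results of the previous trials to pick a promising next combination, which is why it needs far fewer iterations than an exhaustive grid search.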

Data augmentation

Data augmentation is another trick to improve the accuracy of a classifier. It refers to training the neural network not only on the original images, but also on modified images which could realistically have been taken by a user. Data augmentation is automatically taken into account by the image-classification library we are using in this series.

Details about the implementation of data augmentation can be found in the image_classifier.py file located in the src directory. More specifically, it is found in this little piece of code:

from keras.preprocessing.image import ImageDataGenerator

datagen_train = ImageDataGenerator(rotation_range=180,      # random rotations up to 180 degrees
                                   rescale=1./255,          # scale pixel values to [0, 1]
                                   width_shift_range=0.1,   # horizontal shifts up to 10% of width
                                   height_shift_range=0.1,  # vertical shifts up to 10% of height
                                   shear_range=0.1,         # small shear transformations
                                   zoom_range=[0.9, 1.5],   # zooms between 90% and 150%
                                   horizontal_flip=True,    # random horizontal flips
                                   vertical_flip=True,      # random vertical flips
                                   fill_mode='nearest'      # fill new pixels with nearest existing ones
                                   )

In this piece of code, we use the ImageDataGenerator() class to generate new images using rotation, translation, scaling, shearing and flipping transformations. You can refer to the following tutorial to find great examples of the new images generated by data augmentation.
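As a rough usage sketch, the generator can then feed augmented batches to the model. The 'data/train' directory layout and the batch size here are assumptions for illustration:

# Hypothetical folder with one subdirectory per category
generator_train = datagen_train.flow_from_directory('data/train',
                                                    target_size=(299, 299),  # input size expected by Inception-V3
                                                    batch_size=20,
                                                    class_mode='categorical')  # one-hot labels for the 30 classes

Each epoch then sees a freshly transformed version of every training image, so the network rarely encounters the exact same picture twice.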

Fine-tuning

Fine-tuning refers to slightly modifying the Inception-V3 model itself, in addition to the neural network that we have added on top of it. Remember that the idea behind transfer learning is to use another model (such as Inception-V3) to identify the components in an image, and to build a new neural network on top of it to find the category of the image given the components it contains.

When we do fine-tuning, we train the classifier for a few additional epochs, during which we adjust not only the variables in the neural network on top, but also the variables in the Inception-V3 model itself.

Fine-tuning can provide additional improvement of the classifier, but we have to be careful: with a small dataset, too much fine-tuning can quickly lead to overfitting!
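In Keras, fine-tuning typically looks something like the following sketch. The base_model, model and generator names are assumptions for illustration; the key ideas are unfreezing the pre-trained layers and recompiling with a low learning rate:

from keras.optimizers import Adam

# Unfreeze the pre-trained Inception-V3 layers so their weights are updated too
for layer in base_model.layers:
    layer.trainable = True

# Recompile with a low learning rate, so the pre-trained weights change slowly
model.compile(optimizer=Adam(lr=1e-5),
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])

# A few additional epochs of training, now adjusting the full network
model.fit_generator(generator_train, epochs=5, validation_data=generator_val)

The low learning rate is what keeps fine-tuning from erasing what Inception-V3 already knows; it nudges the pre-trained weights rather than retraining them from scratch.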

Class weighting

Sometimes, we have more images of one category than of others. For instance, in the Hockey Community dataset, we have more images of hockey bags (75) than of goalie jocks (45). Given that the neural network will see more bags during training, it will consider it more important to classify this category accurately, which is not necessarily desirable.

The impact of dataset imbalance can be decreased by giving more weight to images underrepresented in the dataset. With class weighting, the neural network will still see more bags during training, but will adjust its variables to a greater extent when it misclassifies the image of a goalie jock in the training set.
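One common way to compute such weights is scikit-learn's compute_class_weight. This is a sketch reusing the generator_train object from the earlier example; the library itself may compute the weights differently:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# generator_train.classes holds the class index of every training image
classes = generator_train.classes
weights = compute_class_weight('balanced', classes=np.unique(classes), y=classes)
class_weight = dict(enumerate(weights))

# Underrepresented categories (such as goalie jocks) get a weight above 1, so
# misclassifying them contributes more to the loss during training
model.fit_generator(generator_train, epochs=10, class_weight=class_weight)

The 'balanced' mode makes each weight inversely proportional to the class frequency, so a category with half as many images counts twice as much per image.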

Transfer learning model

By default, the image-classification library uses Google's Inception-V3 model to identify the components inside an image. However, other models are also supported and can provide a higher accuracy for some applications. The library currently supports the Inception-V3, Xception, Resnet50 and Inception_Resnet-V2 models.
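In Keras, swapping the base model amounts to loading a different application with its classification head removed. Here is a minimal sketch; the dictionary keys are assumptions, and the library maps its transfer_model argument in its own way:

from keras.applications import InceptionV3, Xception, ResNet50, InceptionResNetV2

# Hypothetical mapping from a model name to the corresponding Keras application
base_models = {
    'Inception': InceptionV3,
    'Xception': Xception,
    'Resnet': ResNet50,
    'Inception_Resnet': InceptionResNetV2,
}

# Load the chosen model pre-trained on ImageNet, without its classification
# head, so that we can build our own neural network on top of it
base_model = base_models['Inception_Resnet'](weights='imagenet', include_top=False)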

Optimizing our hockey gear classifier

Source: Decathlon media

To optimize the hockey gear classifier we started building in part 2, we can run the following command:

python3 main.py --task hyperparameters --number_iterations 30 --transfer_model Inception_Resnet

This command runs the optimization method of the library, which goes through the hyperparameters and finds a proper combination for our application. The library uses scikit-optimize underneath, and treats fine-tuning and class weighting as True/False hyperparameters.

The number_iterations argument describes the number of hyperparameter combinations we let the optimizer try, while the transfer_model argument describes the model we want to use to decompose the image into its components (Inception-V3, Xception, Resnet50 or Inception_Resnet-V2).

After a few iterations, the algorithm found the following combination of hyperparameters:

epochs: 10
hidden_size: 2048
learning_rate: 0.00021150812662254273
dropout: 0.3479038127640791
fine_tuning: True
nb_layers: 1
activation: tanh
include_class_weight: False
Found 2261 images belonging to 30 classes.
Found 237 images belonging to 30 classes.
Epoch 1/10 114/113 [==============================] - 102s 894ms/step - loss: 2.2962 - categorical_accuracy: 0.4000 - val_loss: 1.2754 - val_categorical_accuracy: 0.6203
Epoch 2/10 114/113 [==============================] - 88s 773ms/step - loss: 1.4480 - categorical_accuracy: 0.5939 - val_loss: 1.0444 - val_categorical_accuracy: 0.6878
Epoch 3/10 114/113 [==============================] - 92s 809ms/step - loss: 1.1725 - categorical_accuracy: 0.6698 - val_loss: 1.0598 - val_categorical_accuracy: 0.6456
Epoch 4/10 114/113 [==============================] - 91s 802ms/step - loss: 1.1127 - categorical_accuracy: 0.6843 - val_loss: 1.0115 - val_categorical_accuracy: 0.6540
Epoch 5/10 114/113 [==============================] - 90s 794ms/step - loss: 0.9984 - categorical_accuracy: 0.7058 - val_loss: 0.8826 - val_categorical_accuracy: 0.7131
Epoch 6/10 114/113 [==============================] - 91s 802ms/step - loss: 0.9227 - categorical_accuracy: 0.7386 - val_loss: 0.9555 - val_categorical_accuracy: 0.6962
Epoch 7/10 114/113 [==============================] - 91s 794ms/step - loss: 0.9033 - categorical_accuracy: 0.7303 - val_loss: 0.7328 - val_categorical_accuracy: 0.7637
Epoch 8/10 114/113 [==============================] - 90s 792ms/step - loss: 0.8061 - categorical_accuracy: 0.7470 - val_loss: 0.7465 - val_categorical_accuracy: 0.7932
Epoch 9/10 114/113 [==============================] - 90s 792ms/step - loss: 0.7817 - categorical_accuracy: 0.7601 - val_loss: 0.7078 - val_categorical_accuracy: 0.7975
Epoch 10/10 114/113 [==============================] - 92s 803ms/step - loss: 0.8035 - categorical_accuracy: 0.7658 - val_loss: 0.7956 - val_categorical_accuracy: 0.7384
============
Begin fine-tuning
============
Epoch 1/10 114/113 [==============================] - 181s 2s/step - loss: 0.5319 - categorical_accuracy: 0.8378 - val_loss: 0.4653 - val_categorical_accuracy: 0.8861
Epoch 2/10 114/113 [==============================] - 148s 1s/step - loss: 0.3622 - categorical_accuracy: 0.8803 - val_loss: 0.3911 - val_categorical_accuracy: 0.8945
Epoch 3/10 114/113 [==============================] - 148s 1s/step - loss: 0.2957 - categorical_accuracy: 0.9106 - val_loss: 0.3347 - val_categorical_accuracy: 0.9114
Epoch 4/10 114/113 [==============================] - 148s 1s/step - loss: 0.2559 - categorical_accuracy: 0.9273 - val_loss: 0.4438 - val_categorical_accuracy: 0.8397
Epoch 5/10 114/113 [==============================] - 148s 1s/step - loss: 0.1943 - categorical_accuracy: 0.9422 - val_loss: 0.3312 - val_categorical_accuracy: 0.8987
Epoch 6/10 114/113 [==============================] - 148s 1s/step - loss: 0.1660 - categorical_accuracy: 0.9448 - val_loss: 0.3486 - val_categorical_accuracy: 0.8776
Epoch 7/10 114/113 [==============================] - 148s 1s/step - loss: 0.1490 - categorical_accuracy: 0.9645 - val_loss: 0.3410 - val_categorical_accuracy: 0.8734
Epoch 8/10 114/113 [==============================] - 148s 1s/step - loss: 0.1321 - categorical_accuracy: 0.9694 - val_loss: 0.3100 - val_categorical_accuracy: 0.9156
Epoch 9/10 114/113 [==============================] - 148s 1s/step - loss: 0.1072 - categorical_accuracy: 0.9716 - val_loss: 0.2978 - val_categorical_accuracy: 0.9114
Epoch 10/10 114/113 [==============================] - 148s 1s/step - loss: 0.1161 - categorical_accuracy: 0.9702 - val_loss: 0.3122 - val_categorical_accuracy: 0.9241

With these hyperparameters, we reach an accuracy of 97% on the training set and 92% on the validation set! Given the 70–75% accuracy we got using the default values, this shows how important hyperparameter optimization is for achieving good accuracy. In this case, fine-tuning contributed particularly strongly to the accuracy of our hockey gear classifier.

Now that we have a good classifier (even though there is always room for further improvement!), our next job is to deploy it. In part 4 of this series, we will develop a quick API using Flask to load the image of a piece of hockey equipment and return the category it belongs to.

We are hiring!

Are you interested in transfer learning and the application of AI to improve sport accessibility? Luckily for you, we are hiring! Follow https://developers.decathlon.com/careers to see the different exciting opportunities. Otherwise, see you in part 4!

A special thanks to Gabriel Poulin-Lamarre, from D-Wave Quantum Computing, and René Lancine Doumbouya and Guillaume Simo, from Décathlon Canada, for the comments and review.
