How you can build an image classifier in one day — Part 2: model training

Samuel MERCIER
Decathlon Digital
Dec 4, 2018 · 9 min read

This post is the second part of a series describing the development of an image classification API using transfer learning. The code for this series can be found on the Decathlon Canada GitHub page.

Source: Decathlon media

What you will learn

In part 2 of this series on image classification, you will learn about

  • the distinction between training, validation and test sets;
  • how to train your classifier using the image-classification library;
  • how to interpret the accuracy of your classifier.

In part 1 of this series, we learnt that convolutional neural networks are the main mathematical tool used to identify the category of an image (a task called the image classification problem). We also learnt that, using transfer learning, we can capitalize on existing neural networks to build a powerful image classifier with far fewer images. It is now time to put this knowledge to good use!

We will illustrate the application of transfer learning by developing an image classifier able to distinguish hockey gear — more specifically, 30 different pieces of equipment, including skates, sticks, helmets, pants, gloves, shoulder pads, elbow pads, hockey bags, visors, and so on.

Decathlon Canada Image-classification library

We will build the image classifier using the image-classification library developed at Decathlon Canada. To use the library, simply clone it to the desired location:

git clone https://github.com/decathloncanada/image-classification.git

In addition, make sure you have TensorFlow, dill, Pillow, scikit-optimize, pandas, matplotlib, selenium and Flask properly installed.
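These dependencies can typically be installed with pip, for example:

pip install tensorflow dill Pillow scikit-optimize pandas matplotlib selenium Flask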

Training, validation and test sets

As we learnt in part 1 of this series, building a classifier by transfer learning involves using a model (let’s say Inception-V3) to decompose an image into its basic components, and adding on top of it a neural network identifying the category of the image given the components it contains.
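To make this concrete, here is a minimal tf.keras sketch of that kind of architecture. This is an illustration only, not the actual implementation of the image-classification library; the layer sizes and hyperparameters are assumptions:

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

# Inception-V3 pre-trained on ImageNet, without its original classification head
base = InceptionV3(weights='imagenet', include_top=False, pooling='avg',
                   input_shape=(299, 299, 3))
base.trainable = False  # freeze the pre-trained feature extractor

# Small neural network on top of Inception-V3: this is the part we actually train
model = models.Sequential([
    base,
    layers.Dense(1024, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(30, activation='softmax'),  # one output per category
])

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])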

To identify the appropriate variables in the neural network we build on top of Inception-V3, we need a training set of images — that is, a set in which we know the category of each image. When we train the model, we use these images to find the variables that minimize the number of errors made by the classifier.

But we also want to find good hyperparameters for our problem. Hyperparameters include the architecture of the neural network we put on top of Inception-V3 (the number of layers, the number of neurons per layer and the activation function of the neurons), as well as the way we learn from the training set (the optimization function and the learning rate). To find good hyperparameters, we keep some images in a separate set, called the validation set.

Finally, we also want to make sure that our classifier does not overfit. A classifier that overfits performs well on the images in the training and validation sets, but poorly on images it has never seen before. To check for overfitting, we keep some images in a test set, which the neural network never sees during training.

Hockey Community dataset

Source: Hockey Community

To build our hockey gear classifier, we will use a set of images obtained from Hockey Community, a platform designed to connect hockey leagues, teams, players and gear. Note that you can follow along with any image dataset, as long as it is placed in the /data/image_dataset/ directory of the image-classification library as described below.

The Hockey Community dataset contains images of 30 different pieces of hockey equipment. It has nearly 2500 images, of which we keep 90% in the training set and 10% in the validation set.

To use the image-classification library, you need to place your images in the /data/image_dataset/ directory. In this directory, you should organize your images by set (train, val and, if you have one, test) and, in each set, by category:

data/
  image_dataset/
    train/
      bag/
        bag_1.jpg
        bag_2.jpg
        ...
      visor/
        visor_1.jpg
        visor_2.jpg
        ...
    val/
      bag/
        bag_1.jpg
        bag_2.jpg
        ...
      visor/
        visor_1.jpg
        visor_2.jpg
        ...

The image-classification library will work for any number of categories or images per category, the only constraint being that the same categories need to be found in the training, validation and test sets.
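If your images currently sit in a single folder per category, a short script can create this layout for you. Here is a minimal sketch that copies 90% of each category to the training set and 10% to the validation set (the raw_images source folder is a hypothetical example):

import os
import random
import shutil

def split_dataset(src_dir, dst_dir, val_fraction=0.1, seed=42):
    # Expects src_dir/<category>/*.jpg; writes dst_dir/train/... and dst_dir/val/...
    random.seed(seed)
    for category in os.listdir(src_dir):
        images = os.listdir(os.path.join(src_dir, category))
        random.shuffle(images)
        n_val = max(1, int(len(images) * val_fraction))
        for i, name in enumerate(images):
            subset = 'val' if i < n_val else 'train'
            dst = os.path.join(dst_dir, subset, category)
            os.makedirs(dst, exist_ok=True)
            shutil.copy(os.path.join(src_dir, category, name), dst)

split_dataset('raw_images', 'data/image_dataset', val_fraction=0.1)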

Training your classifier

Now that you have your dataset properly organized in the /data/image_dataset/ directory, you can train your classifier by running the following command:

python main.py --task fit

You should see, in the terminal, how the accuracy evolves during training:

Found 2260 images belonging to 30 classes.
Found 238 images belonging to 30 classes.
Epoch 1/5
112/113 [============================>.] - ETA: 4s - loss: 2.7252 - categorical_accuracy: 0.2728
113/113 [==============================] - 560s 5s/step - loss: 2.7157 - categorical_accuracy: 0.2770 - val_loss: 1.9337 - val_categorical_accuracy: 0.5588
Epoch 2/5
113/113 [==============================] - 537s 5s/step - loss: 1.9116 - categorical_accuracy: 0.5239 - val_loss: 1.3601 - val_categorical_accuracy: 0.6765
Epoch 3/5
113/113 [==============================] - 585s 5s/step - loss: 1.5017 - categorical_accuracy: 0.6226 - val_loss: 1.0993 - val_categorical_accuracy: 0.7017
Epoch 4/5
113/113 [==============================] - 554s 5s/step - loss: 1.2865 - categorical_accuracy: 0.6655 - val_loss: 1.0052 - val_categorical_accuracy: 0.7311
Epoch 5/5
113/113 [==============================] - 522s 5s/step - loss: 1.1243 - categorical_accuracy: 0.7004 - val_loss: 0.8968 - val_categorical_accuracy: 0.7479

Interpreting the results

One epoch means that the neural network has gone through all the images in the training set once, updating its variables in a direction that decreases the number of errors it makes. In the log above, each epoch takes 113 steps, corresponding to the 2,260 training images processed in batches of 20.

As expected, we can see that the accuracy increases during training, reaching 70% for the training set and 75% for the validation set — not too bad for our first attempt, given that we have 30 different pieces of equipment to distinguish, with fewer than 100 images per category.
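If you train with Keras directly (for instance with the sketch shown earlier), the History object returned by model.fit makes it easy to visualize these curves. A minimal example, assuming the variable history holds the result of the fit call:

import matplotlib.pyplot as plt

# history = model.fit(...), as returned by Keras
plt.plot(history.history['categorical_accuracy'], label='training')
plt.plot(history.history['val_categorical_accuracy'], label='validation')
plt.xlabel('epoch')
plt.ylabel('categorical accuracy')
plt.legend()
plt.show()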

To get a better look at the errors made by the classifier, we can use the confusion_matrix() and plot_errors() methods of the library. Here’s the confusion matrix obtained for the images in the validation set:

Confusion matrix:
[[ 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 1 4 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 5 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 ]
[ 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 2 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 2 0 13 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 1 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 2 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 ]
[ 0 2 0 0 1 0 0 0 0 0 0 0 0 0 11 0 0 0 0 1 0 3 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 2 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 24 0 0 0 0 0 0 0 0 0 0 ]
[ 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 27 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 ]
[ 0 0 0 1 3 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 2 0 0 0 0 0 0 0 ]
[ 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 1 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 ]]
(0) bag
(1) blocker
(2) cage
(3) chest
(4) elbow_pads
(5) girdle
(6) gloves
(7) goalie_jock
(8) goalie_skates
(9) goalie_stick
(10) helmet
(11) holder
(12) jersey
(13) knee_pads
(14) leg_pads
(15) mask
(16) net
(17) pants
(18) player_jock
(19) player_skates
(20) player_stick
(21) shin_pads
(22) shoulder_pads
(23) skates_top_down
(24) socks
(25) steel
(26) tape
(27) thigh_boards
(28) trapper
(29) visor

In the confusion matrix, each row represents the true category of an image, and each column the predicted category. The confusion matrix helps us pinpoint where the classifier makes mistakes: for instance, we can see that all 8 images of bags in the validation set are properly classified, but that 3 of the 8 images of shoulder pads were mistakenly classified as elbow pads.
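If you want to reproduce such a confusion matrix outside the library, here is a minimal sketch using scikit-learn (the paths and the trained model variable are assumptions; this is not the implementation behind the library's confusion_matrix() method):

import numpy as np
from sklearn.metrics import confusion_matrix
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stream the validation images in a fixed order so labels line up with predictions
val_gen = ImageDataGenerator(rescale=1./255).flow_from_directory(
    'data/image_dataset/val', target_size=(299, 299),
    batch_size=32, class_mode='categorical', shuffle=False)

# model: a trained Keras classifier, e.g. the sketch shown earlier
y_pred = np.argmax(model.predict(val_gen), axis=1)
print(confusion_matrix(val_gen.classes, y_pred))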

Here is an example of a misclassified image:

In this case, the classifier mistook the goalie blocker for a hockey bag, probably because most hockey bags in the training set also have a rectangular shape and a dark color.

Now that we have a working prototype, our next job is to reduce the number of errors it makes. In part 3, we will go over the most common tricks (hyperparameter optimization, data augmentation, fine-tuning and class weighting) to improve accuracy.

We are hiring!

Are you interested in transfer learning and the application of AI to improve sport accessibility? Luckily for you, we are hiring! Visit https://developers.decathlon.com/careers to see the exciting opportunities. Otherwise, see you in part 3!

A special thanks to Gabriel Poulin-Lamarre, from D-Wave Quantum Computing, and Amrit Kahlon, from Hockey Community, for their review and comments.
