Speed up your Deep Learning projects with pre-trained Neural Networks

In the past few years, many Machine Learning applications related to image recognition have stood out. Used to power self-driving cars, detect anomalies in medical imaging, or even tell dog breeds apart, image classifiers are usually built on an architecture called Convolutional Neural Networks (or CNNs).

Self-driving cars, medical imaging and dog breed classification: they all use CNNs

CNNs have existed since 1980. However, they became popular only in 2012, with an architecture called AlexNet. Since then, faster and more powerful architectures have been proposed every year.

Diagram of the AlexNet CNN

This "new wave" of image recognition applications was only possible due to the computational power available today. It would have been impossible to use a CNN in 1980. For example, the Cray X-MP/1 supercomputer (launched in 1983, costing US$ 10 million) was able to perform 200 million FLOPS (floating-point operations per second), while an Nvidia RTX 2080Ti GPU (launched in September 2018) is able to perform 12.45 trillion FLOPS. That is a speedup of 62,250 times, on hardware that costs 10 thousand times less.
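The speedup figure can be checked with a few lines of Python. This is only a sanity check of the arithmetic: the variable names are illustrative, and the FLOPS values are the ones quoted in the paragraph above.

```python
# FLOPS figures quoted in the text above
cray_xmp_flops = 200e6        # Cray X-MP/1: 200 million FLOPS
rtx_2080ti_flops = 12.45e12   # Nvidia RTX 2080Ti: 12.45 trillion FLOPS

# Ratio between the two machines
speedup = rtx_2080ti_flops / cray_xmp_flops
print(f"Speedup: {speedup:,.0f}x")
```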

The left image shows the Cray X-MP/1 supercomputer. The right image shows an Nvidia RTX 2080Ti.

Even with the computational power available today, training a CNN can be a very slow task. A 2017 CNN architecture called Xception needed 60 Nvidia K80 GPUs to train on the ImageNet dataset at a speed of 28 steps per second. Having to train models like this frequently can become a hard problem in production environments that need quick deploys.

But if someone has already spent time (and electricity) training those sophisticated neural networks, why do I have to train them all over again, from the beginning? Isn't it possible to reuse something?

Yes, it is possible to reuse an already trained network. We can reuse the weights of a network already trained on the ImageNet dataset, and use them to create a healthy food image classifier, for example. It is important to note that ImageNet doesn't have a "healthy food" category, but we can still train our new model more quickly by starting from a pre-trained network.

This happens because the first few layers of a CNN learn to recognize basic image elements, such as edges, corners, round shapes, basic geometric shapes and colors.

Representations from filters in the initial layers of a CNN

When we use pre-trained networks to train new categories, those initial layers stay almost unchanged, because they represent basic elements present in every type of image. Training is much easier in this way.
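To make the idea of a low-level filter concrete, here is a minimal sketch in plain Python. It slides a hand-written 3x3 vertical-edge kernel (a Sobel-like kernel, chosen here for illustration and not taken from any trained network) over a tiny grayscale image. A trained CNN learns kernels of this kind on its own in its first layers, which is why they can be reused across datasets.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Tiny 5x6 grayscale image: dark on the left (0), bright on the right (1)
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Illustrative vertical-edge kernel (Sobel-like), not a learned filter
vertical_edge = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

response = convolve2d(image, vertical_edge)
for row in response:
    print(row)
```

The response is zero over the flat regions and non-zero only around the dark-to-bright transition, which is what "detecting an edge" means here.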

In the next section, this article will show how it works in a practical example.

Creating a pet classifier using pre-trained networks

To show that a neural network can be trained more quickly with pre-trained weights, let's create a model capable of differentiating images of dogs, cats and birds. And, to stay away from bias, we will use the Open Images dataset to train our CNN (pre-trained on ImageNet).

Let's create a classifier that differentiates these 3 classes

The code can be accessed in this GitHub repo: https://github.com/adrianodennanni/pre-trained-nn-benchmark

First, we download all images from the 3 categories to distinct directories, divided into train, validation and test. For each of the categories, we were able to get the following numbers of images:

  • Train:
    53137 files in directory ./train/cat
    89369 files in directory ./train/dog
    105962 files in directory ./train/bird
  • Validation:
    303 files in directory ./validation/cat
    1480 files in directory ./validation/dog
    1052 files in directory ./validation/bird
  • Test:
    907 files in directory ./test/cat
    4491 files in directory ./test/dog
    3222 files in directory ./test/bird

Even though we have few validation images, they are enough to show the differences between the pre-trained and the random-weight networks.
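Counts like the ones above can be gathered with a short script. The sketch below assumes the directory layout shown in the listing (./train, ./validation and ./test, each with cat, dog and bird subfolders). The function name count_images is ours and is not part of the linked repository.

```python
import os

def count_images(root, splits=("train", "validation", "test"),
                 classes=("cat", "dog", "bird")):
    """Return {split: {class: number of files}} for an assumed root/split/class layout."""
    counts = {}
    for split in splits:
        counts[split] = {}
        for cls in classes:
            directory = os.path.join(root, split, cls)
            if os.path.isdir(directory):
                n_files = sum(
                    1 for name in os.listdir(directory)
                    if os.path.isfile(os.path.join(directory, name))
                )
            else:
                n_files = 0  # missing folders are reported as empty
            counts[split][cls] = n_files
    return counts

if __name__ == "__main__":
    # Assumes the script is run from the folder that contains train/validation/test
    for split, per_class in count_images(".").items():
        print(split, per_class)
```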

After obtaining all images, we can start to develop our classifier. It will be based on the Xception architecture, which is very efficient.

We will use Keras to develop the model, since it makes available the weights from a pre-trained Xception CNN. To declare the model, just follow the code:

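import tensorflow as tf
# To use the pre-trained network, weights should be 'imagenet'
# To use the random-weight network, weights should be None
weights = 'imagenet'
# 3 classes in our classifier
n_classes = 3
# Dimensions of the image after being resized
shape = (100, 100, 3)
# Xception backbone without its original ImageNet classification layers.
# pooling='max' is an assumption: some global pooling is needed so that the
# Dense layer below receives a flat feature vector.
trained_model = tf.keras.applications.xception.Xception(
    include_top=False,
    weights=weights,
    input_shape=shape,
    pooling='max')
model = tf.keras.Sequential()
model.add(trained_model)
model.add(tf.keras.layers.Dense(n_classes, activation='softmax'))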

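Very simple, don't you agree? The last added layer is Dense, so the model returns the scores for the 3 classes we wish to classify. The include_top parameter is set to False to drop the original classification layers at the top of the network, which were built for the 1000 ImageNet classes and a fixed 299x299 input. Without them, we can feed our own image size and attach our own output layer.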

After training both models for 30 epochs using the Adam optimizer, the following results were obtained:

Train dataset

Accuracy rate comparison for the pet classifier (train dataset)
Loss value comparison for the pet classifier (train dataset)

Validation Dataset

Accuracy rate comparison for the pet classifier (validation dataset)
Loss value comparison for the pet classifier (validation dataset)

Test dataset

  • Pre-trained:
    Accuracy: 0.9596
    Loss: 0.1184
  • Random weights:
    Accuracy: 0.9366
    Loss: 0.2408

Conclusions about the pet classifier

The pre-trained network stabilized after 8 epochs, while the random-init network took 17 epochs to do the same. It is an excellent example of how pre-trained networks speed up the training of new models. We can also conclude that the pre-trained network reached a better accuracy in the end.

In the train dataset, both methods ended up overfitting the model.

However, we can see that the pre-trained model overfitted faster than the random-init model. To avoid this, you can use techniques such as early stopping.
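The idea behind early stopping is simple: stop training once the validation loss has not improved for a given number of epochs. The function below is a minimal, framework-independent sketch of that rule. The loss values are made up for illustration and are not results from this experiment. In Keras, the same behavior is available through the tf.keras.callbacks.EarlyStopping callback.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops after `patience` consecutive epochs in which the validation
    loss did not improve on the best value by more than `min_delta`. If that
    never happens, the last epoch is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Made-up validation losses: improving at first, then drifting up (overfitting)
val_losses = [0.60, 0.35, 0.22, 0.18, 0.19, 0.21, 0.24, 0.30]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"Training would stop after epoch {stop_epoch}")
```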

In conclusion, the pre-trained network ended up with a boost of 2.3 percentage points in accuracy compared with the CNN with random weights (test dataset).

Creating a CAPTCHA solver with pre-trained CNNs

The last example showed us how we can speed up CNN training by using networks pre-trained on the ImageNet dataset. We also learned that training is faster in a pre-trained network because the initial layers of the CNN need only small changes.

Since the ImageNet dataset is made only of real-world photographs, we can wonder whether pre-trained CNNs will help train a model on a very different kind of image, such as CAPTCHAs.

Real images are made of more "natural" features when compared to CAPTCHAs

Let's test this hypothesis. First, we generate lots of CAPTCHAs to feed our new model. We can use the following quantity of CAPTCHAs:

  • Train dataset: 200,000 examples
  • Validation dataset: 5,000 examples
  • Test dataset: 5,000 examples

Again we use Keras for this test. This time we will use a multi-task classifier to recognize each CAPTCHA character (6 characters, 36 possibilities for each), using a pre-trained Xception CNN as its core.
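In this multi-task setup, each CAPTCHA text becomes six separate labels, one per character position. The sketch below shows one way to do that encoding. It assumes the 36 symbols are the digits 0-9 and the lowercase letters a-z (the article only states that there are 36 possibilities), and the helper names are hypothetical.

```python
import string

# Assumed alphabet: 10 digits + 26 lowercase letters = 36 classes
ALPHABET = string.digits + string.ascii_lowercase
CAPTCHA_LENGTH = 6

def encode_captcha(text):
    """Map a CAPTCHA string to a list of class indices, one per character."""
    if len(text) != CAPTCHA_LENGTH:
        raise ValueError(f"expected {CAPTCHA_LENGTH} characters, got {len(text)}")
    return [ALPHABET.index(ch) for ch in text]

def decode_captcha(indices):
    """Inverse of encode_captcha: class indices back to the CAPTCHA string."""
    return "".join(ALPHABET[i] for i in indices)

def one_hot(index, n_classes=len(ALPHABET)):
    """One-hot vector of length n_classes with a 1 at position `index`."""
    vector = [0] * n_classes
    vector[index] = 1
    return vector

labels = encode_captcha("a3f9zk")          # one class index per branch
targets = [one_hot(i) for i in labels]     # one 36-dimensional target per branch
print(labels)
print(decode_captcha(labels))
```

Each of the six targets would then be fed to the matching output branch of the model.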

import tensorflow as tf
# To use the pre-trained network, weights should be 'imagenet'
# To use the random-weight network, weights should be None
weights = 'imagenet'
# 36 classes for each character
n_classes = 36
# Dimensions of the image after being resized
shape = (160, 80, 3)
# Xception backbone without its original ImageNet classification layers.
# pooling='max' is an assumption: some global pooling is needed so that each
# Dense branch below receives a flat feature vector.
trained_model = tf.keras.applications.xception.Xception(
    include_top=False,
    weights=weights,
    input_shape=shape,
    pooling='max')
# One softmax branch per CAPTCHA character
c1 = tf.keras.layers.Dense(n_classes, activation='softmax')(trained_model.output)
c2 = tf.keras.layers.Dense(n_classes, activation='softmax')(trained_model.output)
c3 = tf.keras.layers.Dense(n_classes, activation='softmax')(trained_model.output)
c4 = tf.keras.layers.Dense(n_classes, activation='softmax')(trained_model.output)
c5 = tf.keras.layers.Dense(n_classes, activation='softmax')(trained_model.output)
c6 = tf.keras.layers.Dense(n_classes, activation='softmax')(trained_model.output)
model = tf.keras.Model(inputs=trained_model.input, outputs=[c1, c2, c3, c4, c5, c6])

This model has 6 branches, one for each CAPTCHA character. Since the model has several outputs, Keras computes one loss per branch and adds them up into the total loss automatically.
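When a Keras model has several outputs and no loss weights are given, the value it minimizes is the sum of the individual branch losses. The sketch below illustrates that sum with a categorical cross-entropy per branch. The probabilities are made up, and only 2 branches with 3 classes each are used to keep the example readable.

```python
import math

def categorical_cross_entropy(true_index, predicted_probs):
    """Cross-entropy for one example whose correct class is `true_index`."""
    return -math.log(predicted_probs[true_index])

# Made-up softmax outputs for 2 branches with 3 classes each
branch_outputs = [
    [0.7, 0.2, 0.1],   # branch 1 prediction
    [0.1, 0.1, 0.8],   # branch 2 prediction
]
true_labels = [0, 2]   # correct class for each branch

# One loss per branch, then a plain sum (all loss weights equal to 1)
branch_losses = [
    categorical_cross_entropy(label, probs)
    for label, probs in zip(true_labels, branch_outputs)
]
total_loss = sum(branch_losses)
print(branch_losses, total_loss)
```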

After training the model for 30 epochs with the Adam optimizer (both pre-trained and random-init models), we got the following results:

Train dataset

Accuracy rate comparison for the CAPTCHA solver (train dataset)
Loss value comparison for the CAPTCHA solver (train dataset)

Validation dataset

Accuracy rate comparison for the CAPTCHA solver (validation dataset)
Loss value comparison for the CAPTCHA solver (validation dataset)

Test dataset

  • Pre-trained:
    Accuracy: 0.9982
    Loss: 0.0479
  • Random weights:
    Accuracy: 0.9496
    Loss: 0.4323

Conclusions about the CAPTCHA solver

The use of a pre-trained neural network not only speeds up training, but also achieves a better result in few epochs.

However, it's worth remembering that the difference in convergence speed in this case is smaller than in the pet classifier example. This is caused by the difference between the two image distributions, as explained earlier.

Final thoughts

Using pre-trained neural networks can help your models achieve better results in less time. It's not necessary to reinvent the wheel every time we want to make an image classifier.