How you can build an image classifier in one day — Part 1: transfer learning

This post is the first part of a series describing the development of an image classification API using transfer learning. The code for this series can be found on the Decathlon Canada GitHub page.

Source: Allan Handan

What you will learn

In part 1 of this series on image classification, you will learn about:

  • the algorithm (convolutional neural networks) used to classify images;
  • how to capitalize on existing models to build a powerful image classifier by a technique called transfer learning.

Automatically identifying the content of an image, a problem called image classification, is a hot topic in the world of AI. It has been made possible by the development of what we call convolutional neural networks: a series of filters that slide over an image to find the specific components (lines, curves, shapes) it contains.

Magic happens when filters are stacked on top of one another — the filters in the bottom layers identify basic components in the image (a line or a curve), while the filters on top identify combinations of these components (two lines or two curves next to each other, for instance). As we increase the number of layers of filters, we can identify increasingly complex patterns in an image, and distinguish similar images (let’s say two models of a car) with surprising accuracy.

Identification of the digit found in an image by a convolutional neural network: the bottom layers find the basic components in the image, while the top layers identify combinations of these components. Source: Terence Broad
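
To make the idea of stacked filters concrete, here is a minimal sketch in Python with NumPy. It is not how a real network is built: the filters are written by hand instead of learned, and the names `slide_filter`, `vertical_edge` and `horizontal_edge`, the toy image and the threshold of 3 are all assumptions chosen for this example. The first layer looks for two kinds of lines, and the second layer fires only where both are found together, that is, at a corner.

```python
import numpy as np

def slide_filter(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    response = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            response[i, j] = np.sum(patch * kernel)
    return response

# Toy 6x6 image: a bright square (1) in the bottom-right of a dark background (0).
image = np.zeros((6, 6))
image[3:, 3:] = 1.0

# Layer 1: two hand-written filters, one per basic component.
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])  # dark-to-bright, left to right
horizontal_edge = vertical_edge.T              # dark-to-bright, top to bottom

vertical_map = slide_filter(image, vertical_edge)
horizontal_map = slide_filter(image, horizontal_edge)

# Layer 2: combine the two maps. Each map peaks at 3 on its own, so the sum
# only exceeds the threshold of 3 where both edges are present: a corner.
corner_map = np.maximum(vertical_map + horizontal_map - 3.0, 0.0)

print(corner_map)
```

On this toy image, the second layer responds at a single position: the window centred on the top-left corner of the bright square. In a real convolutional neural network, the filter values and thresholds are learned from the training images rather than written by hand.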

In other words, building a convolutional neural network is somewhat like doing a reverse jigsaw puzzle: imagine that, instead of beginning with a bunch of pieces and seeing how they fit together, you begin with a bunch of uncut puzzles. The objective of the game is to decompose these puzzles (the training set of images) into a number of smaller pieces (the filters), in such a way that finding one of these pieces in a new puzzle gives you a hint about which category the puzzle belongs to.

Capitalizing on existing models

A major drawback of convolutional neural networks is the number of images you need during model training to achieve good performance. When starting from scratch, you sometimes need tens of thousands of images per category, or even more, before the neural network can learn the proper filters for the problem. Luckily, there is now an efficient way to build an image classifier with a limited number of images: transfer learning!

Transfer learning

Transfer learning is the idea of taking a neural network trained for a slightly different application and tweaking it just enough to also work well for your target application.

For instance, here is Google’s Inception-V3 model:

Composition of Google’s Inception-V3 model

This model is highly complex and has millions of parameters. It is composed of a series of layers (about 300!) stacked on top of one another. Each layer (and each neuron within it) performs a mathematical operation; together, they decompose the image into its components and identify which category the image belongs to, given those components.

However, you may not be interested in the 1000 categories (dog, cat, car, …) that Google developed the Inception-V3 model for. You may want to, let’s say, build a classifier that automatically identifies the specific retail product found in a picture.

Luckily, you can still leverage models like Inception-V3 to perform your task. The bottom layers of the neural network perform very different tasks from the top layers: the bottom layers decompose the image into its specific components, while the top layers identify the categories that the image belongs to, given the components that the bottom layers have found. It turns out that the bottom layers remain nearly as effective regardless of the specific categories we want to classify. Therefore, only the top layers need to be adjusted to build a good image classifier for your application.

In more practical terms, transfer learning means keeping the first layers of a model built for a slightly different application, and only updating the last few layers to create an algorithm tailored to your application.

Building a classifier to identify the type of flower found in an image: following transfer learning principles, you keep the bottom layers of the Inception-V3 model (decomposing the image into its basic components), and you replace the top layers with a new neural network (identifying the type of flower given the components found by the bottom layers). Source: Google codelab
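
The same principle can be sketched in a few lines of Python with NumPy. This is a toy stand-in, not Inception-V3: the "bottom layers" are a fixed random projection rather than a pretrained network, the data are synthetic 8x8 images, and every name (`bottom_layers`, `make_images`, `train_top_layer`) and hyperparameter is an assumption made for illustration. What it does show is the mechanics of transfer learning: the bottom layers are frozen, and only the new top layer is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Bottom layers": a fixed random projection followed by ReLU. It stands in for
# the pretrained part of a real network and is never updated during training.
frozen_weights = rng.normal(size=(64, 16)) / 8.0

def bottom_layers(images):
    """Frozen feature extractor: (n, 64) flattened 8x8 images -> (n, 16) features."""
    return np.maximum(images @ frozen_weights, 0.0)

def make_images(n_per_class):
    """Synthetic data: class 0 is bright on the left half, class 1 on the right half."""
    left = np.zeros((8, 8))
    left[:, :4] = 1.0
    right = np.zeros((8, 8))
    right[:, 4:] = 1.0
    templates = np.stack([left.ravel(), right.ravel()])
    labels = np.repeat([0, 1], n_per_class)
    images = templates[labels] + rng.normal(scale=0.1, size=(labels.size, 64))
    return images, labels

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train_top_layer(features, labels, n_classes=2, lr=0.1, steps=300):
    """Train only the new top layer (softmax regression) with gradient descent."""
    n, d = features.shape
    head_w = np.zeros((d, n_classes))
    head_b = np.zeros(n_classes)
    one_hot = np.eye(n_classes)[labels]
    for _ in range(steps):
        probs = softmax(features @ head_w + head_b)
        error = (probs - one_hot) / n  # gradient of the mean cross-entropy w.r.t. the logits
        head_w -= lr * features.T @ error
        head_b -= lr * error.sum(axis=0)
    return head_w, head_b

weights_before = frozen_weights.copy()
train_images, train_labels = make_images(50)
test_images, test_labels = make_images(50)

# Only the top layer sees gradient updates; the bottom layers are reused as-is.
head_w, head_b = train_top_layer(bottom_layers(train_images), train_labels)

predictions = np.argmax(bottom_layers(test_images) @ head_w + head_b, axis=1)
test_accuracy = float(np.mean(predictions == test_labels))
print(f"test accuracy: {test_accuracy:.2f}")
```

In a real project, the frozen part would be a pretrained model such as Inception-V3, and the new top layer would be trained on your own images.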

To put it simply, transfer learning is just a smart way to reuse parts of another model to jump-start the development of your custom application :)

Content of the series

The objective of this series is to illustrate how you can quickly build a good image classifier using transfer learning. This part introduced the concepts of image classification, convolutional neural networks and transfer learning. In the following parts of the series, you will learn how to build sets of images to train your algorithm, train an image classifier for your application using transfer learning, optimize the accuracy of the classifier, and build your first AI-powered image classification API.

We are hiring!

Are you interested in transfer learning and the application of AI to improve sport accessibility? Luckily for you, we are hiring! Follow https://developers.decathlon.com/careers to see the exciting opportunities available. Otherwise, see you in part 2!

A special thanks to Gabriel Poulin-Lamarre, from D-Wave Quantum Computing, and Caio Bianchi, from Décathlon Canada, for the thorough review of the article.