Building a visual search algorithm in a few steps using transfer learning

Samuel MERCIER
Published in Decathlon Digital · Jan 28, 2019
Source: Hockey Community

In our recent blog series on transfer learning, we investigated the application of transfer learning to the classification of images. More specifically, we learnt that transfer learning means starting from a model built for a slightly different application, and adjusting it to become efficient for our specific problem. We also showed that, using transfer learning and the image-classification library developed by Decathlon Canada, we can build in a few steps a classifier able to distinguish 30 different pieces of hockey equipment with an accuracy above 90%.

But obviously, there are multiple other applications of transfer learning. In this article, we will look at a very interesting one: visual search.

Let’s say you are a retailer or have an online marketplace, with a dataset of images describing your products. Chances are that you have a user who saw the image of a product sold elsewhere, and would like to find similar products in your catalog. Maybe you also have a user who took a picture of his used product, and would like to find the best replacement you can offer. Or perhaps you have a user looking for inspiration, starting from the picture of a product he likes.

These new ways of looking for products can be achieved by building a visual search engine, similar to Google Images (https://images.google.com/). In this post, we will build a powerful visual search algorithm in a few simple steps, and see how it performs when searching for hockey products in Decathlon’s catalog.

How it works

An illustration of transfer learning. Source: Brian Curry

Going back to part 1 of our blog series on image classification, we remember that a model such as Inception_V3 is composed of two parts. The first finds the components (lines, curves, shapes) an image contains, and the second identifies what the image shows given those components.

If we keep only the first part of the model, it outputs a vector defining the components that the image contains. If two images describe a similar product, chances are that these two vectors will be similar to one another.

To calculate how “similar” two vectors are, we can use a metric called cosine similarity. Let’s say that the first part of Inception_V3 outputs a vector X for one image, and vector Y for a second image. The cosine similarity of these two vectors can be calculated as follows:

# import the numpy library
import numpy as np

# calculate the dot product of the vectors - that is, the multiplication
# of each number in X by the corresponding number in Y, followed by
# their summation
dot_product = np.dot(X, Y)

# normalize the result, to achieve a similarity measure independent
# of the scale of the vectors
norm_X = np.linalg.norm(X)
norm_Y = np.linalg.norm(Y)

cosine_similarity = dot_product / (norm_X * norm_Y)

As such, the pipeline for building a visual search algorithm is fairly simple: translate the image into a vector using the first part of Inception_V3, calculate the cosine similarity between this vector and those of the images in your catalog, and return the products with the highest similarity. That’s it!
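To make this concrete, here is a minimal sketch of the pipeline using Keras. Note that this is an illustration, not the library’s actual implementation: the file paths, the catalog_paths dictionary and the use of ImageNet weights are assumptions made for the example.

# a minimal sketch of the visual search pipeline; paths and the
# catalog_paths dictionary below are illustrative assumptions
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image

# keep only the first part of the model: no classification layers,
# average pooling to get a single feature vector per image
model = InceptionV3(weights='imagenet', include_top=False, pooling='avg')

def image_to_vector(img_path):
    # load the image at the 299x299 input size expected by Inception_V3
    img = image.load_img(img_path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

def cosine_similarity(X, Y):
    return np.dot(X, Y) / (np.linalg.norm(X) * np.linalg.norm(Y))

# compare the user's image against every image in the catalog
catalog_paths = {
    'hockeybag1': 'data/dataset/hockey_products/hockeybag1.jpg',
    'hockeybag2': 'data/dataset/hockey_products/hockeybag2.jpg',
}
query = image_to_vector('user_image.jpg')
similarities = {product: cosine_similarity(query, image_to_vector(path))
                for product, path in catalog_paths.items()}

# return the products with the highest similarity, best first
print(sorted(similarities, key=similarities.get, reverse=True))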

An example for hockey products

Source: Hockey Community

In this post, we will build a visual search algorithm for hockey products using the image-similarity library built by Decathlon Canada. To get started, simply clone the library at the desired location:

git clone https://github.com/decathloncanada/image-similarity.git

To build a visual search algorithm for your own catalog, simply place the images of your products in the data/dataset/ directory. These images can have any name but, when you have more than one image of a given product, make sure to name them {PRODUCT_ID}_{IMAGE_ID}.jpg, where {PRODUCT_ID} is the name or id of your product and {IMAGE_ID} is the name or id of the image associated with that product.

For instance, let’s say that you have the following dataset, named “hockey_products”, in the data/dataset/ directory:

data/
  dataset/
    hockey_products/
      hockeybag1.jpg
      hockeybag2.jpg
      hockeybag3_1.jpg
      hockeybag3_2.jpg

The library will assume that your catalog is composed of three different products (hockeybag1, hockeybag2 and hockeybag3), and that hockeybag3 is associated with two images.
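For illustration, here is one way such filenames could be grouped by product in Python; this is a hypothetical sketch, not the library’s actual parsing logic.

# group the images of a dataset by product id, assuming the
# {PRODUCT_ID}_{IMAGE_ID}.jpg naming convention described above
import os
from collections import defaultdict

products = defaultdict(list)
for filename in os.listdir('data/dataset/hockey_products'):
    name, _ = os.path.splitext(filename)
    # everything before the last underscore is taken as the product id
    product_id = name.rsplit('_', 1)[0] if '_' in name else name
    products[product_id].append(filename)

print(dict(products))
# {'hockeybag1': ['hockeybag1.jpg'], 'hockeybag2': ['hockeybag2.jpg'],
#  'hockeybag3': ['hockeybag3_1.jpg', 'hockeybag3_2.jpg']}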

In our case, we placed in the data/dataset/ directory the images of the hockey products sold by Decathlon Canada. This dataset contains a total of 411 images, describing 32 different hockey products.

Our next job is to calculate, for each image in the catalog, the vector describing the components it contains. These vectors can be calculated by running the following call:

python3 main.py --task fit --dataset hockey_products --transfer_model Inception_Resnet

In this call, the --dataset argument indicates the name of the dataset containing the images of your products, and --transfer_model indicates the transfer learning model used to calculate the vectors associated with the images. These vectors are stored in an SQLite database, located in the data/database/ directory.
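Persisting the vectors in SQLite can be done in a few lines. The sketch below is an assumption about how such a database could look; the table and column names are not the library’s actual schema.

# a minimal sketch of persisting feature vectors in SQLite;
# the table and column names here are illustrative assumptions
import sqlite3
import numpy as np

conn = sqlite3.connect('data/database/vectors.db')
conn.execute('CREATE TABLE IF NOT EXISTS vectors (image TEXT PRIMARY KEY, vector BLOB)')

def save_vector(image_name, vector):
    # serialize the numpy vector to raw bytes for storage
    conn.execute('INSERT OR REPLACE INTO vectors VALUES (?, ?)',
                 (image_name, vector.astype(np.float32).tobytes()))
    conn.commit()

def load_vector(image_name):
    # deserialize the raw bytes back into a numpy vector
    row = conn.execute('SELECT vector FROM vectors WHERE image = ?',
                       (image_name,)).fetchone()
    return np.frombuffer(row[0], dtype=np.float32)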

Now that the vectors have been calculated for all the products in your catalog, you can run a visual search given an image taken by the user with the following call:

python3 main.py --task visual_search --img {PATH_TO_THE_IMG} --dataset hockey_products --transfer_model Inception_Resnet

where {PATH_TO_THE_IMG} is the path to the image taken by the user.
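Under the hood, the search step boils down to comparing the vector of the user’s image against every stored vector. Here is a hedged sketch, reusing the image_to_vector helper and the assumed database schema from the sketches above; the user image name is hypothetical.

# rank the catalog images by cosine similarity to the user's image;
# image_to_vector and the database schema come from the sketches above
import sqlite3
import numpy as np

def cosine_similarity(X, Y):
    return np.dot(X, Y) / (np.linalg.norm(X) * np.linalg.norm(Y))

conn = sqlite3.connect('data/database/vectors.db')
query_vector = image_to_vector('user_skates.jpg')  # hypothetical user image

scores = []
for image_name, blob in conn.execute('SELECT image, vector FROM vectors'):
    vector = np.frombuffer(blob, dtype=np.float32)
    scores.append((image_name, cosine_similarity(query_vector, vector)))

# print the eight most similar catalog images, best first
for image_name, score in sorted(scores, key=lambda s: s[1], reverse=True)[:8]:
    print(image_name, round(score, 3))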

For instance, let’s say that a user has taken a picture of his pair of used hockey skates, and is looking for suggestions of similar products in Decathlon’s catalog. Running the visual search call as described above, we get the following result:

Results of the visual search algorithm: the top-left image is the picture of a used product taken by a user, and the following images describe the eight most similar products found in Decathlon’s hockey catalog.

The image of the pair of used skates is in the top-left corner, while the following images show the eight most similar products in the catalog, as found by the algorithm.

As we can see, the algorithm has successfully identified that the hockey skates in the catalog are the products most similar to the image taken by the user. The library also ranked first an all-black model with white laces, very similar to the user’s skates.

As we discussed in the introduction, there are many additional applications of image similarity that we can implement. One of these applications is the development of an item-based recommendation system, which will be the topic of our next blog post!

We are hiring!

Are you interested in transfer learning and the application of AI to improve sport accessibility? Luckily for you, we are hiring! Visit https://developers.decathlon.com/careers to see our exciting opportunities.

A special thanks to Gabriel Poulin-Lamarre, from D-Wave Quantum Computing, and Abdullatif Dalab, from Decathlon Canada, for the comments and review.
