Personalization strategy and recommendation systems at Décathlon Canada

Guillaume SIMO
Feb 21, 2020 · 15 min read
Source: Decathlon media

Following the stories from my great colleague Sam at Décathlon Canada about image classifiers (transfer learning, model training and model optimization), visual search and recommendations using images, today we will talk a bit more about personalization and recommendation systems.

In this post, you will learn what personalization is, how it can boost your business, and how to build your first recommendation system.

The plan will be as follows:

I — General presentation of personalization

II — Where can we do personalization?

III — How can you build a recommendation system?

I — General presentation of personalization

Personalization means providing each user with a digital experience reflecting their specific tastes and preferences. At Decathlon, it means recommending products, sports or sport activities which match the tastes of our users. It can also be used to personalize the results of a search bar, the design of a webpage, the content pushed through communication and marketing tools, or to power a conversational chatbot.

Put another way, a recommendation system is like a librarian. When you enter a library, you face a huge number of books. The question becomes: how do I choose the book I would like today? Given some data about yourself (explaining your tastes and what you would like to read), the librarian is able to suggest one or many books you may like. Here, the recommendation system is the librarian and the items to recommend are the books. Building a recommendation system is like translating this suggestion process into an algorithm, able to manage the content of your digital product at scale and in near real-time.

Recommendation systems are powerful because (1) they make life easier and more enjoyable for your users, by helping them quickly find relevant items, and (2) they can influence user behavior, by suggesting items users would not have thought of looking for.

Capitalize on your data sources using AI to build powerful personalization systems. Source: Décathlon Canada

As a consequence, personalization tends to turn new customers into returning ones, increasing loyalty to your brand. As a rule of thumb, it is generally accepted that personalization can increase the digital turnover of your marketplace by about 10%. 💰

Personalization can increase digital turnover. Source: cliplart.email

II — Where can we do personalization?

A first major use case for recommendation systems is to improve the browsing experience on your Website.

Because Websites contain a huge amount of information, the role of the data scientist is to design algorithms that help users quickly and easily find the content they are interested in (if you want to know more about the Data Scientist role at Decathlon, you can have a look at this awesome video here!).

Recommendation is especially powerful for digital marketplaces such as www.decathlon.ca, which offers more than 10,000 different products.

Here’s a quick overview of the locations on our Website where you can see our recommendation systems in action.

On the homepage, we consider two types of users: identified and unidentified. If you are unidentified (for instance, it is your first visit to our Website), we recommend products based on popularity, stock availability and diversity considerations.

Suggested products on the homepage of decathlon.ca. Source: decathlon.ca

If you are an identified user, we use your browsing history to show products matching your interests.

Personalized product suggestions on decathlon.ca. Source: decathlon.ca

The sports displayed on the homepage and the products below the sport banners are updated daily. Similarly, the choice is made based on popularity or historical data. These banners can be personalized, depending on whether the user is identified or not.

Suggested sports with popular related products. Source: decathlon.ca

We also have the opportunity to influence the results of the search bar. On decathlon.ca, when several products are equally relevant to the query, we rank them using a popularity index. This popularity index is updated daily.

Search results ranked by popularity. Source: decathlon.ca

Product pages offer multiple opportunities for recommendation systems. We currently use a simple (but very effective!) cross-selling algorithm to display products generally sold within the same basket. Personalization of this algorithm is also on the roadmap, to further reflect user tastes.
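To make the cross-selling idea more concrete, here is a minimal sketch that counts how often two products end up in the same basket and suggests the most frequent companions. It only illustrates the general principle: the function names and the baskets are hypothetical, and this is not the algorithm running on decathlon.ca.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # set() removes duplicates so a product is never paired with itself
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def cross_sell(counts, product, k=3):
    """Return the k products most often bought together with `product`."""
    return [item for item, _ in counts[product].most_common(k)]


# Hypothetical baskets, for illustration only
baskets = [
    ["road bike", "helmet", "bottle"],
    ["road bike", "helmet"],
    ["road bike", "pump"],
    ["tent", "sleeping bag"],
]
counts = co_purchase_counts(baskets)
print(cross_sell(counts, "road bike", k=1))  # ['helmet']
```

Raw counts favor products that are popular everywhere, so in practice you may want to normalize them, for instance by the popularity of each product.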

Cross-selling products to the ULTRA 900 CF ROAD BIKE. Source: decathlon.ca

You can also develop similarity metrics between the products (based on the product image or textual description) to offer relevant alternatives to the product the user is looking at.

Similar products to the ULTRA 900 CF ROAD BIKE. Source: decathlon.ca

The add to cart section is the final opportunity to communicate with the user before the transaction is completed. As such, you can use basket completion models to recommend additional products which may be of interest to the user.

Add to cart view. Source: decathlon.ca

We use the same recommendation systems to personalize the newsfeed of our mobile apps (for example, Decathlon mobile app or Decathlon Community mobile app). By pushing the right notification to the right user at the right time, you will make users more engaged with your platform!

Personalized news feed of the Decathlon Mobile App. Source: Decathlon mobile app

Emails still represent a major communication channel with users. Again capitalizing on the same recommendation systems, we send personalized post-purchase emails that recommend relevant products, along with one sport activity where the user could use their newly bought product.

Personalized email with products and activity. Source: Décathlon Canada

But you can’t go straight from a non-personalized platform to a fully personalized one.

At Décathlon Canada, we have defined three levels of personalization:

  • Smart by default: making sure that, by default, all sections of your platform show what is currently the most popular content (products, sports and sport activities). The advantage is that you don’t need a lot of data for this step. It isn’t computationally expensive and it is very easy and fast to deploy in production. The disadvantage is that you are not using the full potential of your data and the power of AI to create value for your company.

In this case, everyone sees the same content.

  • Contextualization: content is decided according to geolocation and manually-defined segments… This is a step up from the most popular approach, since you are leveraging some user information, but you are still not providing individual recommendations.

Each user at the same location or in the same segment sees the same content.

  • One-to-one personalization: each individual user gets the most relevant items according to user-item interactions (purchase and navigational history, participations in sport activities, user profile, …). Here you can use the entire power of your data to deliver a great experience that matches user tastes and what they like on your platform.

Each individual user sees their own unique content! 🚀

Let’s see some particular recommendation systems.

III — How can you build a recommendation system?

We compared the performance of five different models to compute personalized product recommendations. Each starts from the same datasets, and they are compared using precision, recall and coverage as performance metrics. Let’s first describe the data we have used, then summarize these five recommendation systems, and compare them based on these metrics.

  1. Data
  2. Random model
  3. Popular model
  4. Item-based model (off-line)
  5. Matrix factorization model
  6. RNNs with LSTM and Self Attention Layers (on-line)
  7. Performance metrics
  8. Results
  9. Conclusion

One significant source of information for recommendation is browsing history. What we are most interested in are user-item interactions, for instance when a user has clicked on or bought an item. This data can be represented by a table containing a user column, an item column and a timestamp column, which describes when the interaction happened.

It is common in machine learning to split the data into a training set (to train the model), a validation set (to optimize the hyperparameters) and a test set (to evaluate the performance of the model; this set is never used during training or optimization).

To compare the different approaches, we used a sample of browsing history data from our e-commerce website (a few million interactions).

We kept the most recent 10% of the interactions to evaluate the quality of the recommendations given by our models.
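As an illustration, a temporal split of this kind can be sketched in a few lines. We assume here that interactions are stored as (user, item, timestamp) tuples. The function name and the sample data are hypothetical, and the way a validation set is carved out of the training data is not covered.

```python
def temporal_split(interactions, test_fraction=0.10):
    """Keep the most recent interactions as the test set.

    `interactions` is a list of (user, item, timestamp) tuples.
    """
    ordered = sorted(interactions, key=lambda row: row[2])  # oldest first
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


# Hypothetical interactions, one every minute
interactions = [
    (f"user{i % 3}", f"item{i % 4}", 1_580_000_000 + 60 * i) for i in range(20)
]
train, test = temporal_split(interactions)
print(len(train), len(test))  # 18 2
```

Splitting on time rather than at random avoids training on interactions that happened after the ones you are trying to predict.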

User-item interactions. Source: Personalize your app or Website using your catalog of images

The most basic recommendation system is the random one. The random model consists of randomly selecting 5 items from the complete catalog and recommending them to our users.

This is obviously a stupid way of making recommendations, but it is a very interesting baseline to which we can compare more advanced machine learning models and better evaluate the improvement that they generate.
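A random baseline fits in a few lines. The sketch below assumes the catalog is a plain list of item identifiers; the function name and the catalog are hypothetical.

```python
import random


def recommend_random(catalog, k=5, seed=None):
    """Pick k distinct items uniformly at random from the catalog."""
    rng = random.Random(seed)  # pass a seed to make the draw reproducible
    return rng.sample(catalog, k)


# Hypothetical catalog of 100 items
catalog = [f"item{i}" for i in range(100)]
recs = recommend_random(catalog, k=5, seed=0)
print(recs)
```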

Another interesting baseline is the most popular approach. In this approach, we recommend to the users the 5 most popular items of the catalog (items most often clicked on or bought over the last few days).

This baseline is very simple to implement, but often represents a quick way to make your platform efficient and dynamic.
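Here is a minimal sketch of this baseline, again assuming (user, item, timestamp) tuples. The `since` parameter is a hypothetical way of restricting the count to the last few days, and the data is made up.

```python
from collections import Counter


def recommend_popular(interactions, k=5, since=None):
    """Return the k items with the most interactions, optionally since a timestamp."""
    counts = Counter(
        item for _, item, ts in interactions if since is None or ts >= since
    )
    return [item for item, _ in counts.most_common(k)]


# Hypothetical interactions: (user, item, timestamp)
interactions = [
    ("u1", "a", 1), ("u2", "a", 2), ("u3", "b", 3),
    ("u1", "b", 4), ("u2", "b", 5), ("u3", "c", 6),
]
print(recommend_popular(interactions, k=2))           # ['b', 'a']
print(recommend_popular(interactions, k=2, since=4))  # ['b', 'c']
```

Every user receives the same list, which is why this approach has a very low coverage.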

We chose the off-line item-based approach as our first truly personalized recommendation system. By off-line, we mean that each night, we loop over all our users and compute the 5 items we should recommend to them if they visit our platform the following day. When the computations are done, the recommendations are uploaded to a PostgreSQL database and served to the Website through a REST API when a user logs in.

In an item-based recommendation system, we assume that users will mostly be interested in items similar to the ones they have interacted with in the past. The similarity between two items is computed as the cosine similarity between their interaction vectors, where each vector records which users have interacted with the item.

Cosine similarity between A and B. Source: Wikipedia

Let’s take the user-item interaction matrix below, where the rows represent the users and the columns represent the items. A value of 1 means that a user has expressed interest in an item, and a value of 0 means the absence of interaction.

For instance, given the matrix below, we can interpret the first column (which corresponds to the first item) as follows: only the fourth user has interacted in the past with this item.

User-item interaction matrix

The cosine similarity between the first (a=[0 0 0 1]) and the second (b=[0 1 0 0]) columns would be zero because none of the users has interacted with both items.

However, if we look at the similarity between the third (a=[1 0 1 0]) and the fifth (b=[1 0 1 1]) columns, the cosine similarity would be 0.82.

A cosine similarity close to 1 indicates a high level of similarity. In this example, the third item and the fifth item are thus very similar!
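You can check these two values yourself. The cosine similarity is the dot product of the two vectors divided by the product of their norms, so for the third and fifth columns we get 2 / (√2 × √3) ≈ 0.82. The short NumPy sketch below reproduces both computations, using only the four columns quoted above (the function name is ours).

```python
import numpy as np


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm else 0.0


# Columns of the interaction matrix quoted in the text
item_1 = [0, 0, 0, 1]
item_2 = [0, 1, 0, 0]
item_3 = [1, 0, 1, 0]
item_5 = [1, 0, 1, 1]

print(cosine_similarity(item_1, item_2))            # 0.0
print(round(cosine_similarity(item_3, item_5), 2))  # 0.82
```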

Visual representation of an item-based collaborative filtering. Source: Medium

After having computed these pairwise similarities, we look at all the items that a user has recently interacted with. We compute the similarities between these items and the others in the catalog as described above, and we sum the results to find the top 5 items we should recommend to each user.

Aggregation of item similarities. Source: Personalize your app or Website using your catalog of images

In the toy example above, the five items that would be recommended are items 7, 11, 22, 12 and 1, since they have the highest aggregated similarity scores.
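The aggregation step can be sketched as follows. We assume a small binary user-item matrix and sum, for each candidate item, its similarities with the items the user has recently interacted with. The matrix and the function names are hypothetical, and items the user has already seen are excluded, which is a choice of this sketch rather than something stated above.

```python
import numpy as np


def item_similarity_matrix(interactions):
    """Cosine similarity between the columns (items) of a user-item matrix."""
    X = np.asarray(interactions, dtype=float)
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for items nobody touched
    Xn = X / norms
    return Xn.T @ Xn


def recommend_item_based(sim, recent_items, k=5):
    """Sum the similarities to the recent items and return the top-k other items."""
    scores = sim[recent_items].sum(axis=0)
    scores[recent_items] = -np.inf  # do not recommend what was just seen
    ranked = np.argsort(-scores)
    return [int(i) for i in ranked if np.isfinite(scores[i])][:k]


# Hypothetical matrix: 4 users (rows) x 4 items (columns)
X = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
sim = item_similarity_matrix(X)
print(recommend_item_based(sim, recent_items=[0], k=2))  # [1, 2]
```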

Well known since the Netflix prize in 2009 with the Alternating Least Squares algorithm, matrix factorization models are very common in recommendation, and as such are always worth a try.

The goal is to decompose the user-item interaction matrix (the same one we saw in the item-based approach) into a product of two lower-dimensional matrices, describing the properties of the users and of the items (for more details, see [1]).

Visual representation of the matrix-factorization model. Source: Towardsdatascience

As such, the matrix factorization approach can be seen as an extension of the item-based one, where we consider both similarity between the items and between the users.
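To give an intuition of the decomposition, here is a toy NumPy sketch that learns the two matrices with plain gradient descent on the squared reconstruction error. Note the assumptions: this is not the Alternating Least Squares algorithm mentioned above, zeros are treated as true zeros (a simplification for implicit feedback), and the matrix, function name and hyperparameters are hypothetical.

```python
import numpy as np


def factorize(R, n_factors=2, n_epochs=500, lr=0.05, reg=0.01, seed=0):
    """Approximate R (users x items) by U @ V.T with gradient descent."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, n_factors))  # user factors
    V = rng.normal(scale=0.1, size=(n_items, n_factors))  # item factors
    for _ in range(n_epochs):
        error = R - U @ V.T
        # Both steps are computed from the same error before updating
        U_step = error @ V - reg * U
        V_step = error.T @ U - reg * V
        U += lr * U_step
        V += lr * V_step
    return U, V


# Hypothetical binary interaction matrix: 4 users x 4 items
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
], dtype=float)
U, V = factorize(R)
scores = U @ V.T  # predicted affinity of each user for each item
print(np.round(scores, 2))
```

The number of factors (here 2) is the latent space dimension, one of the hyperparameters you will need to tune.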

There are popular Python libraries to quickly build matrix factorization models, such as Scikit-surprise, Implicit or LightFM (these libraries also offer other algorithms and are not limited to matrix factorization).

If you are new to machine learning, or if you want to learn how to use TensorFlow and Keras with recommendation systems, you could start with these blog posts about beer recommendation using Keras.

With Keras, you can build a matrix factorization model in a few steps by taking the dot product of a user embedding layer and an item embedding layer.

Matrix Factorization model with Embeddings in Keras — Medium

The main drawback of matrix factorization is that you have to retrain your model from scratch each time there is a new user or a new user-item interaction that you want to take into account.

During the last few years, deep neural networks have become very popular. Let’s see how to build a deep recommendation system!

Navigation on a Website can be seen as a sequence of items a user has interacted with. This temporal aspect makes recurrent neural networks (widely used in natural language processing and for time series such as stock prices, sales or future turnover) a sound choice to process this kind of data.

With recurrent neural networks (RNNs), we actually do on-line recommendations. By on-line, we mean that there is no need to precompute daily recommendations for all users: recommendations are computed on the fly, considering the latest items a user has interacted with. This is achieved by saving the TensorFlow RNN model in .pb format and deploying it with TensorFlow Serving, as described in this article.

The structure of the RNN we use is as follows:

RNN architecture

The format of the input data is a sequence of discrete item identifiers represented as one-hot encoded vectors, to be fed to the embedding layer.

An embedding is a low dimensional representation of a discrete variable (here the variable is a user or an item). It’s kind of similar to the lower dimensionality matrices that we saw in the matrix factorization approach.

An LSTM (Long Short-Term Memory) is a type of RNN. RNNs are designed to deal with temporal dependencies, but a basic RNN tends to only consider the most recent items in the sequence. An LSTM, through its input gate, output gate and forget gate, has a much greater ability to carry information over time and to make sure that all the historical information relevant to a user is considered in the recommendation. You can find here a nice overview of LSTMs (or for deeper details see [2]).

LSTM layer. Source: bhrnjica.net

Dropout is a regularization technique that reduces overfitting (learning the statistical noise within your training data, so that the model performs poorly on out-of-sample predictions) by randomly dropping out neurons during training (for more details see [3]).
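In practice the deep learning framework takes care of dropout for you, but the mechanism is easy to sketch with NumPy. The version below is the common "inverted" dropout, where the kept activations are rescaled so that their expected value is unchanged. The function name and arguments are ours.

```python
import numpy as np


def dropout(activations, rate=0.5, training=True, rng=None):
    """Zero each activation with probability `rate` and rescale the others."""
    if not training or rate == 0.0:
        return activations  # nothing is dropped at prediction time
    if rng is None:
        rng = np.random.default_rng()
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)


x = np.ones((4, 5))
out = dropout(x, rate=0.5, rng=np.random.default_rng(0))
print(out)  # every entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```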

Dropout. Source: Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Attention is a mapping from a query and a set of key-value pairs to an output. It helps the model identify the regions of the historical user-item sequence that it should focus on to improve the recommendation. Just like we discussed in the comparison between LSTMs and basic RNNs, it represents another way to better carry information over time, if that information is relevant. There are many good articles which go in depth into the topic of attention in the context of deep learning, such as this one (or the original paper [4]).
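The scaled dot-product attention described in [4] can be sketched with NumPy as shown below. We assume a simplified self-attention where the queries, keys and values are all the same sequence of item representations, without the learned projections a real layer would have. The random sequence is only a placeholder.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Weight each value by the similarity between its key and the query."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # one row of weights per query
    return weights @ V, weights


# Placeholder sequence: 4 items, each represented by an 8-dimensional vector
seq = np.random.default_rng(0).normal(size=(4, 8))
context, weights = scaled_dot_product_attention(seq, seq, seq)
print(weights.round(2))  # each row sums to 1
```

Each row of `weights` tells you how much every position of the sequence contributes to the output at that position.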

Attention layer. Source: Attention Is All You Need

The Dense layer is a typical feed-forward neural network, and finally the Softmax is a feed-forward layer which computes, for each item, the probability of being the next one the user will interact with. By sorting the items based on these probabilities, we extract the top 5 recommendations for each user.
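This last step can be sketched as follows, assuming the network outputs one raw score (logit) per item of the catalog. The scores below are made up and the function name is ours.

```python
import numpy as np


def top_k_items(logits, k=5):
    """Turn raw scores into probabilities and return the k most likely items."""
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())  # subtract the max for stability
    probs /= probs.sum()
    top = np.argsort(-probs)[:k]
    return [(int(i), float(probs[i])) for i in top]


# Hypothetical scores for a catalog of 7 items
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, 3.0]
for item, prob in top_k_items(logits, k=5):
    print(item, round(prob, 3))
```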

How can we decide which of these five recommendation systems works best for our data?

As explained in a previous blog post, the quality of a recommendation system is generally assessed from three main metrics: precision, recall and coverage. Precision is the fraction of recommended items (based on our training set) that our user has indeed interacted with (as seen in the test set). Recall is to some extent the opposite: it is the fraction of items that users have interacted with that appear in the recommendations. Finally, coverage represents the percentage of the items in our catalog which have been recommended to at least one user. For all of these metrics, the higher the better for our recommendation system!
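These three metrics can be computed in a few lines. The sketch below pools the counts over all users before dividing. This is one possible averaging convention, chosen here as an assumption, and the function name and data are hypothetical.

```python
def precision_recall_coverage(recommendations, test_items, catalog):
    """Compute precision, recall and coverage over all users.

    recommendations: dict user -> list of recommended items
    test_items: dict user -> set of items the user interacted with in the test set
    catalog: list of all items
    """
    hits = n_recommended = n_relevant = 0
    recommended_items = set()
    for user, recs in recommendations.items():
        truth = test_items.get(user, set())
        hits += len(set(recs) & truth)
        n_recommended += len(recs)
        n_relevant += len(truth)
        recommended_items.update(recs)
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    coverage = len(recommended_items & set(catalog)) / len(catalog)
    return precision, recall, coverage


# Hypothetical example
recommendations = {"u1": ["a", "b"], "u2": ["b", "c"]}
test_items = {"u1": {"a"}, "u2": {"c", "d", "e"}}
catalog = ["a", "b", "c", "d", "e"]
print(precision_recall_coverage(recommendations, test_items, catalog))
# (0.5, 0.5, 0.6)
```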

We have computed these metrics for the five approaches discussed above.

The random model yielded a very good coverage (91%), since it does not discriminate between the items of the catalog, but it of course led to poor precision and recall (0.058% and 0.065%).

The most popular approach already significantly improves precision (1.5% vs 0.058%) and recall (1.8% vs 0.065%), but has a very low coverage (0.072%), given that we recommend the same 5 items for every user.

The item-based model is better than the most popular on precision (3.4% vs 1.5%), recall (4.1% vs 1.8%) and coverage (74% vs 0.074%). It is a simple personalization approach, but it can be pretty powerful!

Matrix factorization led to an additional improvement in precision (3.9% vs 3.4%) and recall (5.3% vs 4.1%), given that it uses more information to perform recommendation (both item similarity and user similarity). However, it is a lot more computationally expensive than the item-based approach, and it needs some hyperparameter tuning (to find the proper latent space dimension) to achieve good results.

Finally, RNNs gave us the highest precision (4% vs 3.9%) and recall (5.7% vs 5.3%), while providing sufficient coverage (57%). These results suggest that the order in which the user interacts with items (which is considered in the RNN approach, but not in the matrix factorization and item-based ones) provides relevant information to improve recommendation. Furthermore, the RNN approach is well suited for on-line recommendations, and the model doesn’t need to be retrained every time there is a new interaction or a new user in the dataset.

Performance metrics of the five recommendation systems tested

Personalization can provide significant benefits to your company. While tackling this challenge, we learned how important it is to do so in a step-by-step manner! First, make sure that you have all your data available in the proper format. Then, don’t forget to build baselines (random and most popular recommendations), to better understand the benefits of your more advanced approaches. Spend a lot of energy on hyperparameter tuning (it makes a big difference!), and of course consider your ability to launch the recommendation system in production when selecting the algorithm.

In our case, RNNs represented a sound choice to perform recommendations. In the future, we plan to further improve our models by adding greater availability and diversity factors, and capture new user information using curiosity considerations.

References

[1] Zhou, Yunhong & Wilkinson, Dennis & Schreiber, Robert & Pan, Rong. (2008). Large-Scale Parallel Collaborative Filtering for the Netflix Prize. 337–348. 10.1007/978-3-540-68880-8_32.

[2] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Comput. 9, 8 (November 1997), 1735–1780. DOI:https://doi.org/10.1162/neco.1997.9.8.1735

[3] Srivastava, Nitish & Hinton, Geoffrey & Krizhevsky, Alex & Sutskever, Ilya & Salakhutdinov, Ruslan. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research. 15. 1929–1958.

[4] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 6000–6010.

We are hiring!

Are you interested in the application of personalization and AI to improve sport accessibility and user experience? Luckily for you, we are hiring! Follow https://developers.decathlon.com/careers to see the different exciting opportunities.

Thanks to Samuel Mercier, Yan Gobeil and the team from Décathlon Canada, for the comments and review.
