Building an RNN Recommendation Engine with TensorFlow

Alfonso CARTA
Feb 17, 2021 · 9 min read

Recommendation engines are powerful tools that make browsing content easier. Moreover, a great recommendation system helps users find things they wouldn’t have thought to look for on their own. For these reasons, recommendation tools can dramatically boost e-commerce turnover. Here we show how we, at Decathlon France, implemented an RNN (recurrent neural network) recommendation system that outperforms our previous model (ALS, alternating least squares) by more than 20% on the offline Precision metric.

If you are looking for a more comprehensive introduction to recommendation systems, we recommend the great article by our Canadian colleagues.

Business Problem

Decathlon's online catalog is quite large: we sell tens of thousands of different items on our online platform. Such a wide choice can be an obstacle for users looking for compelling new content. Of course, customers can use search to access content, but a recommendation engine can display items that users might not have thought to search for on their own, and so greatly improve customer experience and conversion rate.

With the aim of improving on our previous solution (ALS), we developed a new deep-learning-based recommendation engine to personalize the homepage for each identified customer. Hence, every identified user sees different, personalized recommendations.

Data Science Solution

Our previous recommender system is a collaborative filtering model implemented with a library that uses the alternating least squares (ALS) algorithm to learn the latent factors of a user-item association matrix.

Collaborative filtering is commonly used for recommender systems. However, it turns out to be a rigid framework that prevents the use of any other useful metadata. Moreover, it assumes that users and items are only linearly dependent. More recent works (Netflix, YouTube, Spotify) have shown that deep learning frameworks outperform classical factorization methods thanks to richer non-linear transformations of the input data as well as the addition of useful metadata.

Our deep learning approach is inspired by recent text generation techniques that use RNN models capable of generating subsequent words from a given phrase (sequence of words). For our recommendation problem, we replaced sequences of words with sequences of items and applied the same methodology (the figure below depicts the NLP and Recommendation analogy).

NLP and Recommendation analogy: predicting the next most likely word and item, respectively

Input Data

The data used for developing our recommendation engine consist of temporally ordered sequences of bought items and the matching sequences of purchase recency for each identified customer. Here is an example of our input data:

Input data sample


  • : is the customer unique identifier;
  • item_id: strings containing sequences of bought item ids for each customer;
  • nb_days: strings containing, for each bought item, the number of days between its purchase and the last training date (a kind of purchase recency).

NB: item_id and nb_days sequences are temporally ordered.

Feature Transformation

Features item_id and nb_days need to be appropriately transformed before feeding them to the model. Namely, item_id will be tokenized and encoded whereas nb_days will be bucketed.

Tokenization & Encoding of item_id

In order to prepare the input data for the RNN model, each sequence of bought items is tokenized and each item in the sequence is integer-encoded.

This is achieved with a tokenizer class that vectorizes a text corpus by turning each text into a sequence of integers (check out the doc for more details). For example, the above strings of item sequences become the following arrays:

  • ->
  • -> .
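
The encoding step can be illustrated with a minimal pure-Python sketch. It is not the tokenizer class used in the project: the helper names `fit_vocabulary` and `encode`, the whitespace-separated string format, and the sample item ids are all assumptions made for illustration. Index 0 is deliberately left unused so that it can later serve as the padding value.

```python
from collections import Counter

def fit_vocabulary(sequences):
    """Map each item id (as a string token) to an integer >= 1.

    More frequent items get smaller indices; 0 is reserved for padding.
    """
    counts = Counter(token for seq in sequences for token in seq.split())
    # Sort by decreasing frequency, then by token for a deterministic order.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(ordered, start=1)}

def encode(sequence, vocabulary):
    """Turn one whitespace-separated string of item ids into a list of ints."""
    return [vocabulary[tok] for tok in sequence.split() if tok in vocabulary]

# Hypothetical purchase histories (item ids are made up).
histories = ["8354 1209 8354 4410", "1209 7777"]
vocab = fit_vocabulary(histories)
encoded = [encode(h, vocab) for h in histories]
print(encoded)
```

Items unseen when the vocabulary was fitted are silently dropped here; a production pipeline might prefer a dedicated out-of-vocabulary index instead.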

Bucketing of nb_days

The nb_days feature contains variable-length sequences of integer values ranging from 365 to 1. This is because we extracted one year of transaction history, so 365 represents the oldest purchase (one year old, expressed in days) and 1 the most recent one.

We decided to bucket nb_days into 100 bins (this value can be changed if needed), denoted by discrete values ranging from 1 to 100. For example, the above strings of recency sequences become the following arrays:

  • ->
  • -> .

This transformation is easy to implement.
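
One possible way to do the bucketing is sketched below with NumPy. The article does not say how the bin edges were chosen, so the equal-width bins, the constant names and the `bucketize` helper are assumptions for illustration only.

```python
import numpy as np

N_BINS = 100      # number of buckets, as in the article
MAX_DAYS = 365    # one year of history

# Interior edges splitting [1, 365] into N_BINS equal-width intervals (assumption).
edges = np.linspace(1, MAX_DAYS, N_BINS + 1)[1:-1]

def bucketize(nb_days):
    """Map day counts in [1, 365] to bucket labels in [1, 100]."""
    days = np.clip(np.asarray(nb_days), 1, MAX_DAYS)
    # np.digitize yields 0..N_BINS-1 here; adding 1 gives labels 1..N_BINS
    # and keeps 0 free for padding.
    return np.digitize(days, edges) + 1

buckets = bucketize([1, 180, 365])
print(buckets)
```

With these edges the most recent purchases fall in bucket 1 and the oldest in bucket 100. Quantile-based edges would be an equally plausible choice if recency values are unevenly distributed.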

Create Input and Target features tensor

In order to train the model, we need to create a tensor containing the input and target features.

Notably, the input features are the transformed versions of item_id and nb_days without the last element, whereas the target feature is the shifted version of item_id (without the first element).

This is because we want to make the RNN model learn to predict the next item of the target sequence given the previous input features.

For example, for a customer whose transformed item_id sequence is [74, 276, 362, 119] and whose transformed nb_days sequence begins with [96, 96, 89], the feature tensor will be:

'item_id': [74, 276, 362],
'nb_days': [96, 96, 89],
'target': [276, 362, 119]
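
The input/target split described above can be sketched in a few lines of plain Python. The function name `make_input_and_target` is hypothetical, and the last nb_days value (72) is a made-up placeholder, since only the first three values appear in the example.

```python
def make_input_and_target(item_ids, nb_days):
    """Split aligned sequences into model inputs and next-item targets."""
    if len(item_ids) != len(nb_days):
        raise ValueError("item_ids and nb_days must have the same length")
    if len(item_ids) < 2:
        raise ValueError("need at least two purchases to build a target")
    return {
        "item_id": item_ids[:-1],   # all items but the last
        "nb_days": nb_days[:-1],    # recency values aligned with the inputs
        "target": item_ids[1:],     # same items shifted one step ahead
    }

# The last nb_days value is a placeholder; it is dropped by the split anyway.
features = make_input_and_target([74, 276, 362, 119], [96, 96, 89, 72])
print(features)
```

At every position, the target is the item bought right after the corresponding input item, which is what lets the model learn next-item prediction.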


To give all sequences the same length, input and target features are then padded with zeros. For our purpose, we found that fixing the maximal length of each sequence to 20 (a value set in the program configuration) worked fine. This value corresponds to the 80th percentile of sequence lengths, i.e. 80% of sequences are made up of at most 20 items.

For example, with a maximal length of 5, the feature tensor of the above example would become:

'item_id': [0, 0, 74, 276, 362],
'nb_days': [0, 0, 96, 96, 89],
'target': [0, 0, 276, 362, 119]
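
A minimal sketch of this padding in plain Python is given below. The helper name `pad_left` is hypothetical. How over-long sequences are handled is not stated in the article; keeping only the most recent elements is an assumption.

```python
def pad_left(sequence, max_len, pad_value=0):
    """Left-pad with pad_value, or keep only the last max_len elements."""
    trimmed = list(sequence)[-max_len:]
    return [pad_value] * (max_len - len(trimmed)) + trimmed

features = {
    "item_id": [74, 276, 362],
    "nb_days": [96, 96, 89],
    "target": [276, 362, 119],
}
padded = {name: pad_left(values, max_len=5) for name, values in features.items()}
print(padded)
```

Padding on the left keeps the most recent purchases at the end of every sequence, so the last time step always holds the latest information about the customer.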

Feature Data

The final feature data is a dictionary whose keys are item_id, nb_days and target, and whose values are two-dimensional numpy arrays: the first dimension is the number of sequences used for training (at most the number of users) and the second is the maximal length used for padding.

train_dict = {
'item_id': [[0, 0,...,0, 74, 276, 362], ...],
'nb_days': [[0, 0,...,0, 96, 96, 89], ...],
'target': [[0, 0,...,0, 276, 362, 119], ...]
}

Finally, we created a TensorFlow train dataset from this dictionary, to be used for the model training (see code below).

TensorFlow implementation of the train dataset creation


The modeling strategy implemented here consists in predicting, for each user, the next most likely product to be bought, based on the sequence of previously bought items (item_id feature) and the purchase recency (nb_days feature).

The nb_days feature played a pivotal role in drastically improving model performance. In fact, this feature helps the model to better follow the seasonal behavior of purchases and hence recommend products better suited to the time of year.

Finally, we also added a self-attention layer, which has been identified as a performance booster for sequence-to-sequence model architectures such as ours.
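
To give an intuition of what a self-attention layer computes, here is a small NumPy sketch of scaled dot-product self-attention over one sequence of hidden states. The article does not specify which attention variant was used, so the absence of learned projections and the causal mask (each position only looks at itself and earlier positions) are assumptions made for illustration.

```python
import numpy as np

def causal_self_attention(x):
    """Scaled dot-product self-attention over one sequence.

    x has shape (seq_len, dim). Position t may only attend to positions <= t.
    Returns the context vectors and the attention weights.
    """
    seq_len, dim = x.shape
    scores = x @ x.T / np.sqrt(dim)                    # (seq_len, seq_len)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)         # hide future positions
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax per row
    return weights @ x, weights

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(4, 8))  # e.g. 4 time steps of recurrent outputs
context, weights = causal_self_attention(hidden_states)
print(weights.round(2))
```

Each output row is a weighted average of the current and earlier hidden states, which lets the model re-use information from older purchases that the recurrent state may have partly forgotten.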

The model implemented here is hence an RNN model coupled with a self-attention layer having the following architecture:

Recommendation Engine model architecture

The hyper-parameters of the model are:

  • : each item () will be encoded (embedded) in a dense vector with
  • : each bucketed recency value () will be encoded (embedded) in a dense vector with
  • : drop-out value of the LSTM layer
  • : number of units ("neurons") of the LSTM layer
  • : learning rate for training the model

whereas the (not-tunable) parameters are:

  • : the maximal length of each sequence of items, as seen in the data pre-processing section
  • : the number of unique items + 1 (to take into account the zero padding)
  • : number of samples (sequences) that are fed to the model at the same time for training

Here you can find the TensorFlow implementation of the above recommender architecture:

TensorFlow implementation of the RNN recommendation engine

As the loss function, we used a modified loss that skips the loss calculation on the zero-padded elements:

TensorFlow implementation of custom loss function
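
The idea of skipping padded positions can be illustrated outside TensorFlow with a NumPy sketch. It assumes that the underlying loss is a sparse categorical cross-entropy computed on logits, which the text above does not confirm; the function name `masked_sparse_cross_entropy` is hypothetical.

```python
import numpy as np

def masked_sparse_cross_entropy(targets, logits, pad_value=0):
    """Mean cross-entropy over the non-padded positions only.

    targets: int array of shape (batch, seq_len)
    logits:  float array of shape (batch, seq_len, vocab_size)
    """
    targets = np.asarray(targets)
    logits = np.asarray(logits, dtype=float)
    # Log-softmax over the item vocabulary, computed in a stable way.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of the true item at every position.
    nll = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    mask = targets != pad_value
    # Average only over real (non-padded) positions.
    return float((nll * mask).sum() / max(mask.sum(), 1))

targets = np.array([[0, 2, 1]])   # the first position is padding
logits = np.zeros((1, 3, 4))      # uniform scores over 4 item indices
loss = masked_sparse_cross_entropy(targets, logits)
print(loss)
```

Without the mask, the model would be rewarded for predicting the padding index 0, which is never a valid recommendation.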

Hyper-Parameter Optimization

Model hyper-parameters were tuned using the Hyperband optimization algorithm, as implemented in a library that provides distributable hyper-parameter optimization for TensorFlow models in a few lines of code.

Notably, hyper-parameter optimization is carried out using a validation dataset consisting of 10% of the training samples, selected at random. Moreover, we used as metric to rank optimization trials.
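
A random 10% hold-out of this kind could look like the NumPy sketch below. The function name `train_validation_split`, the fixed seed and the toy data are assumptions for illustration; the same row indices are applied to every feature so that inputs and targets stay aligned.

```python
import numpy as np

def train_validation_split(feature_dict, val_fraction=0.1, seed=42):
    """Randomly hold out a fraction of the sequences for validation."""
    n = len(next(iter(feature_dict.values())))
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_val = int(round(n * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    train = {k: np.asarray(v)[train_idx] for k, v in feature_dict.items()}
    val = {k: np.asarray(v)[val_idx] for k, v in feature_dict.items()}
    return train, val

# Toy feature dictionary with 50 sequences of length 2.
data = {
    "item_id": np.arange(100).reshape(50, 2),
    "nb_days": np.ones((50, 2), dtype=int),
    "target": np.arange(100).reshape(50, 2) + 1,
}
train, val = train_validation_split(data)
print(len(train["item_id"]), len(val["item_id"]))
```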

We tried several values for the parameter, and eventually ended up using .


The final model is trained on all available data with the best hyper-parameters obtained at the end of the previous optimization phase.

TensorFlow implementation of the fitting method


Before going live with our new recommender system, we adopted a backtesting strategy in order to evaluate the model's predictive performance. In fact, backtesting assesses the viability of a model by discovering how it would have played out retrospectively using historical data.

The underlying theory is that any model that worked well in the past is likely to work well in the future, and conversely, any model that performed poorly in the past is likely to perform poorly in the future.

To this end, we used one year of historical transactions as training data and we tested the model performance on the subsequent month (see picture below).

Backtesting strategy representation
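
The time-based split behind this backtest can be sketched with the standard library. The tuple layout of a transaction, the function name `backtest_split` and the approximation of "one month" by 30 days are assumptions for illustration.

```python
from datetime import date, timedelta

def backtest_split(transactions, test_start, train_days=365, test_days=30):
    """Split (customer, item, purchase_date) rows into train and test windows."""
    train_start = test_start - timedelta(days=train_days)
    test_end = test_start + timedelta(days=test_days)
    train = [t for t in transactions if train_start <= t[2] < test_start]
    test = [t for t in transactions if test_start <= t[2] < test_end]
    return train, test

# Made-up transactions.
transactions = [
    ("c1", "A", date(2020, 3, 10)),
    ("c1", "B", date(2020, 11, 2)),
    ("c1", "C", date(2021, 1, 15)),   # falls in the test month
    ("c2", "A", date(2019, 6, 1)),    # older than one year: ignored
]
train, test = backtest_split(transactions, test_start=date(2021, 1, 1))
print(len(train), len(test))
```

Splitting on time rather than at random ensures that the model is never evaluated on purchases that happened before some of its training data.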

The table below shows the benchmark performance of several models during the backtesting. We used classical metrics employed to rank recommender models’ results, i.e. Precision, Recall, MRR, MAP and Coverage, all of them calculated on the top five recommended items per user.
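
For reference, these metrics can be computed as in the plain-Python sketch below. Definitions vary slightly across the literature (in particular the normalization of average precision), so the exact formulas, the function names and the toy data are assumptions and may differ from those used for the benchmark.

```python
def precision_at_k(recommended, relevant, k=5):
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k=5):
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def reciprocal_rank(recommended, relevant, k=5):
    # 1 / rank of the first relevant item, 0 if none is in the top k.
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(recommended, relevant, k=5):
    # Mean of precision@rank taken at the ranks of the relevant items.
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / min(len(relevant), k) if relevant else 0.0

def coverage(all_recommendations, catalog_size, k=5):
    # Share of the catalog that appears in at least one top-k list.
    distinct = {item for recs in all_recommendations for item in recs[:k]}
    return len(distinct) / catalog_size

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Toy data: top-5 recommendations and items actually bought in the test month.
recs = {"u1": ["A", "B", "C", "D", "E"], "u2": ["F", "A", "G", "H", "I"]}
bought = {"u1": {"B", "Z"}, "u2": {"A"}}

results = {
    "precision": mean(precision_at_k(recs[u], bought[u]) for u in recs),
    "recall": mean(recall_at_k(recs[u], bought[u]) for u in recs),
    "mrr": mean(reciprocal_rank(recs[u], bought[u]) for u in recs),
    "map": mean(average_precision(recs[u], bought[u]) for u in recs),
    "coverage": coverage(recs.values(), catalog_size=20),
}
print(results)
```

Precision, Recall, MRR and MAP measure how well the recommendations match actual purchases, whereas Coverage measures how much of the catalog the model is able to surface.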

Benchmark of backtesting results (orange-production, green-challenger, gray-baseline)

The orange line represents the performance of the current production model (ALS), the one we aim to improve. We developed several versions of the RNN recommender system (challengers, green lines) with increasing levels of complexity.

The simplest RNN, with item_id as its only feature, was close to the ALS performance, but inferior on almost all metrics.

The greatest improvement came from adding the nb_days feature. This temporal feature allowed the RNN-based recommender system to clearly surpass the ALS model on all the given metrics. Finally, the addition of the attention layer further improved the overall performance of the RNN.

For the sake of comparability, we also calculated some (simple) baselines (grey lines in the table) which clearly show the performance gap with the more sophisticated models (ALS and RNN versions).

Online A/B tests are ongoing and preliminary results confirm the superior performance of the RNN recommendation engine.


We have described our new deep neural network architecture for recommending items on Decathlon websites and showed how to implement it using TensorFlow. The model, inspired by recent text generation techniques, uses a Recurrent Neural Network (RNN) architecture whose inputs are sequences of bought items associated with sequences of purchase recencies (nb_days feature). Since deep learning models require a special representation of input features, we also described how to properly transform those input sequences (item_id and nb_days) before feeding them to the model.

Finally, by means of a backtesting strategy, we demonstrated how the nb_days feature was pivotal in outperforming our historical production model based on a matrix factorization approach (ALS).

This new deep learning architecture opens new perspectives by giving us the possibility of adding new features (related to users and items) which can further improve the model performance and user experience.

How about you? Do you have experience in Deep Learning & Recommender systems? Let us know in the comments below!

Very warm thank you to the AI Personalization team and especially to Claire Deltrel & David Degardin who greatly contributed to this project.
