Building an RNN Recommendation Engine with TensorFlow

Alfonso CARTA
Decathlon Digital
Feb 17, 2021 · 9 min read

Recommendation engines are powerful tools that make browsing content easier. Moreover, a great recommendation system helps users find things they wouldn’t have thought to look for on their own. For these reasons, recommendation tools can dramatically boost e-commerce turnover. Here we show how we, at Decathlon France, implemented an RNN (recurrent neural network) recommendation system that outperforms our previous model (ALS, alternating least squares) by more than 20% on the precision metric (offline).

If you are looking for a more comprehensive introduction to recommendation systems, we suggest the great article by our Canadian colleagues.

Business Problem

Decathlon’s online catalog is quite large: we sell tens of thousands of different items on our online platform. Such a wide choice can be an obstacle for users looking for new, compelling content. Of course, customers can use search to access content. In this regard, a recommendation engine can come in handy, since it can display items that users might not have thought to search for on their own, and so hugely improve customer experience and conversion rate.

With the aim of improving on our previous solution (ALS), we developed a new, deep-learning-based recommendation engine to personalize the homepage of www.decathlon.fr for each identified customer. Hence, every identified user sees different, personalized recommendations.

Data Science Solution

Our previous recommender system is a collaborative filtering model implemented with the spark.ml library. The spark.ml package uses the alternating least squares (ALS) algorithm to learn the latent factors of a user-item association matrix.

Collaborative filtering is commonly used for recommender systems. However, it turns out to be a rigid framework that prevents the use of any other useful metadata. Moreover, it assumes that users and items are only linearly dependent. More recent works (Netflix, YouTube, Spotify) have shown that deep learning frameworks outperform classical factorization methods, thanks to richer, non-linear transformations of the input data as well as the addition of useful metadata.

Our deep learning approach is inspired by recent text generation techniques that use RNN models capable of generating subsequent words from a given phrase (a sequence of words). For our recommendation problem, we replaced sequences of words with sequences of items and applied the same methodology (the figure below depicts the NLP/recommendation analogy).

NLP and recommendation analogy: predicting the next most likely word and item, respectively

Input Data

The data used for developing our recommendation engine consist of temporally ordered sequences of bought items, together with recency sequences (of the purchased items), for each identified customer. Here is an example of our input data:

Input data sample

Where:

  • customer_id: the customer’s unique identifier;
  • item_id: strings containing the sequence of bought item ids for each customer;
  • nb_days: strings containing sequences of the number of days elapsed between each item’s purchase and the last training date (a kind of purchase recency for each bought item).

NB: item_id and nb_days sequences are temporally ordered.

Feature Transformation

The item_id and nb_days features need to be appropriately transformed before being fed to the model. Namely, item_id will be tokenized and integer-encoded, whereas nb_days will be bucketed.

Tokenization & Encoding of item_id

In order to prepare the input data for the RNN model, each sequence of bought items is tokenized and each item in the sequence is integer-encoded.

This is achieved with the Tokenizer class from tensorflow.keras.preprocessing.text, which lets us vectorize a text corpus by turning each text into a sequence of integers (check out the docs for more details). For example, the above strings of item sequences become the following (numpy) arrays:

  • "15431,164271,300287,300358" -> [74, 276, 362, 119]
  • "1733,11687,15623,103080,1733,310789" -> [1, 2, 33, 289, 1, 27].

Bucketing of nb_days

The nb_days feature contains variable-length sequences of integer values ranging from 365 down to 1. This is because we extracted one year of transaction history, so 365 represents the oldest purchase time (one year old, expressed in days) and 1 the most recent one.

We decided to bucket nb_days into 100 bins (this value can be changed if needed), denoted by discrete values ranging from 1 to 100. For example, the above strings of recency sequences become the following (numpy) arrays:

  • '354,354,327,327' -> [96, 96, 89, 89]
  • '116,116,116,8,8,1' -> [32, 32, 32, 2, 2, 1].

This transformation can easily be implemented with tensorflow.feature_column.bucketized_column.
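
As a rough, library-agnostic illustration of the same idea (the exact bin boundaries are an assumption, so the bucket values below may differ slightly from the example above):

import numpy as np

# 99 evenly spaced interior cut points split the range [1, 365] into 100 buckets;
# np.digitize then maps each recency value to its bucket index.
boundaries = np.linspace(1, 365, num=101)[1:-1]  # assumed boundaries
nb_days = np.array([354, 354, 327, 327])
buckets = np.digitize(nb_days, boundaries) + 1   # discrete values in [1, 100]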

Create Input and Target Feature Tensors

In order to train the model, we need to create a tensor containing the input and target features.

Notably, the input features are the transformed versions of item_id and nb_days without the last element, whereas the target feature is the shifted version of item_id (without the first element).

This is because we want to make the RNN model learn to predict the next item of the target sequence given the previous input features.

For example, if the transformed item_id and nb_days features are [74, 276, 362, 119] and [96, 96, 89, 89] respectively, then the feature tensor will be:

{
'item_id': [74, 276, 362],
'nb_days': [96, 96, 89],
'target': [276, 362, 119]
}
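
A minimal sketch of this shift, assuming the sequences are plain Python lists (the helper name is ours, for illustration only):

def make_features(item_seq, nb_days_seq):
    # Input = sequences without their last element; target = items shifted by one.
    return {
        "item_id": item_seq[:-1],
        "nb_days": nb_days_seq[:-1],
        "target": item_seq[1:],
    }

make_features([74, 276, 362, 119], [96, 96, 89, 89])
# {'item_id': [74, 276, 362], 'nb_days': [96, 96, 89], 'target': [276, 362, 119]}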

Padding

To make all sequences the same length, input and target features are then padded via the pad_sequences() function from tensorflow.keras.preprocessing.sequence. For our purposes, we found that fixing the maximal length of each sequence to 20 (max_len, part of the program configuration) worked fine. This value corresponds to the 80th percentile of sequence lengths, i.e. 80% of item_id sequences are made up of at most 20 items.

For example, with max_len = 5 the feature tensor of the above example would become:

{
'item_id': [0, 0, 74, 276, 362],
'nb_days': [0, 0, 96, 96, 89],
'target': [0, 0, 276, 362, 119]
}
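
For instance, pad_sequences pre-pads each sequence with zeros (a sketch using the toy max_len = 5 from the example above):

from tensorflow.keras.preprocessing.sequence import pad_sequences

# padding="pre" (the default) adds zeros at the beginning of each sequence.
padded = pad_sequences([[74, 276, 362]], maxlen=5, padding="pre", value=0)
# array([[  0,   0,  74, 276, 362]], dtype=int32)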

Feature Data

The final feature data is a dictionary whose keys are item_id, nb_days, and target, and whose values are N x max_length numpy arrays, where N denotes the number of sequences used for training (at most the number of users) and max_length is the maximal length used for padding.

train_dict = {
'item_id': [[0, 0,...,0, 74, 276, 362], ...],
'nb_days': [[0, 0,...,0, 96, 96, 89], ...],
'target': [[0, 0,...,0, 276, 362, 119], ...]
}

Eventually, we used the tf.data.Dataset.from_tensor_slices function to create a train dataset to be used for model training (see code below).

TensorFlow implementation of the train dataset creation
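
A minimal sketch of what such a function might look like (the shuffle buffer size and the drop_remainder choice are our assumptions):

import tensorflow as tf

def create_train_dataset(train_dict, batch_size=512, shuffle_buffer=10_000):
    # Split the dictionary into the (inputs, labels) pairs expected by model.fit.
    inputs = {"item_id": train_dict["item_id"], "nb_days": train_dict["nb_days"]}
    dataset = tf.data.Dataset.from_tensor_slices((inputs, train_dict["target"]))
    return (dataset.shuffle(shuffle_buffer)
                   .batch(batch_size, drop_remainder=True)
                   .prefetch(tf.data.AUTOTUNE))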

Model

The modeling strategy implemented here consists in predicting, for each user, the next most likely bought product, based on the sequence of previously bought items (item_id feature) and their purchase recency (nb_days feature).

The nb_days feature played a pivotal role in drastically improving model performance. In fact, this feature helps the model better follow the seasonal behavior of purchases and hence recommend products that are better adapted across time.

Finally, we also added a self-attention layer, which has been identified as a performance booster for sequence-to-sequence model architectures such as ours.

The model implemented here is hence an RNN model coupled with a self-attention layer having the following architecture:

Recommendation Engine model architecture

The hyper-parameters of the model are:

  • emb_item_id_units: each item (item_id) will be encoded (embedded) in a dense vector with shape = (1, emb_item_id_units)
  • emb_nb_days_units: each bucketed recency value (nb_days) will be encoded (embedded) in a dense vector with shape = (1, emb_nb_days_units)
  • recurrent_dropout: drop-out value of the LSTM layer
  • rnn_units: number of units ("neurons") of the LSTM layer
  • learning_rate: learning rate for training the model

whereas the (non-tunable) parameters are:

  • max_length: the maximal length of each sequence of items, as seen in the data pre-processing section
  • item_id_vocab_size: the number of unique items + 1 (to take into account the zero padding)
  • batch_size: number of samples (sequences) that are fed to the model at the same time for training

Here you can find the TensorFlow implementation of the above recommender architecture:

TensorFlow implementation of the RNN recommendation engine
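
The sketch below is a reading of the architecture described above, not the exact production code; the layer choices (e.g. tf.keras.layers.Attention for the self-attention block) and the default hyper-parameter values are our assumptions:

import tensorflow as tf

def build_model(item_id_vocab_size, max_length=20,
                emb_item_id_units=64, emb_nb_days_units=16,
                rnn_units=128, recurrent_dropout=0.1, learning_rate=1e-3):
    item_id = tf.keras.Input(shape=(max_length,), dtype="int32", name="item_id")
    nb_days = tf.keras.Input(shape=(max_length,), dtype="int32", name="nb_days")

    # Embed items and bucketed recencies, then concatenate along the feature axis.
    item_emb = tf.keras.layers.Embedding(item_id_vocab_size, emb_item_id_units)(item_id)
    days_emb = tf.keras.layers.Embedding(101, emb_nb_days_units)(nb_days)  # 100 bins + padding
    x = tf.keras.layers.Concatenate()([item_emb, days_emb])

    # LSTM returns the full output sequence, followed by dot-product self-attention.
    x = tf.keras.layers.LSTM(rnn_units, return_sequences=True,
                             recurrent_dropout=recurrent_dropout)(x)
    x = tf.keras.layers.Attention()([x, x])

    # Per-timestep probability distribution over the item vocabulary.
    outputs = tf.keras.layers.Dense(item_id_vocab_size, activation="softmax")(x)

    model = tf.keras.Model([item_id, nb_days], outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss=masked_sparse_categorical_crossentropy,  # defined in the next section
                  metrics=["sparse_categorical_accuracy"])
    return model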

As loss function, we used a modified version of tf.keras.losses.sparse_categorical_crossentropy, where we skip the loss calculation on the zero-padded elements:

TensorFlow implementation of custom loss function
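
A sketch consistent with that description, assuming label 0 is reserved for padding:

import tensorflow as tf

def masked_sparse_categorical_crossentropy(y_true, y_pred):
    # Per-element cross-entropy, shape (batch, time).
    loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
    # Zero out the padded positions so they do not contribute to the loss.
    mask = tf.cast(tf.not_equal(y_true, 0), loss.dtype)
    return tf.reduce_sum(loss * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)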

Hyper-Parameter Optimization

Model hyper-parameters have been tuned using the Hyperband optimization algorithm implemented in the keras-tuner library, which allows one to run distributable hyper-parameter optimization for TensorFlow models in a few lines of code.

Notably, hyper-parameter optimization is carried out on a validation dataset consisting of 10% of randomly selected samples from the training data. Moreover, we used sparse_categorical_accuracy as the metric to rank optimization trials.

We tried several values for the batch_size parameter, and eventually ended up using batch_size=512.
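
A hedged sketch of how such a search can be wired up with keras-tuner (the search-space bounds and max_epochs are our assumptions; build_model is the sketch from the model section):

import keras_tuner as kt

def model_builder(hp):
    return build_model(
        item_id_vocab_size=item_id_vocab_size,  # assumed to be defined elsewhere
        emb_item_id_units=hp.Choice("emb_item_id_units", [32, 64, 128]),
        emb_nb_days_units=hp.Choice("emb_nb_days_units", [8, 16, 32]),
        rnn_units=hp.Choice("rnn_units", [64, 128, 256]),
        recurrent_dropout=hp.Float("recurrent_dropout", 0.0, 0.5),
        learning_rate=hp.Float("learning_rate", 1e-4, 1e-2, sampling="log"),
    )

tuner = kt.Hyperband(model_builder, objective="val_sparse_categorical_accuracy",
                     max_epochs=20)
tuner.search(train_dataset, validation_data=val_dataset)
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]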

Training

The final model is trained on all available data with the best hyper-parameters obtained at the end of the previous optimization phase.

TensorFlow implementation of the fitting method
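
A sketch of the final fit, assuming best_hps comes from the tuning step above:

# Rebuild the model with the best hyper-parameters and fit on the full dataset.
model = model_builder(best_hps)
model.fit(train_dataset, epochs=10)  # epoch count is an assumption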

Results

Before going live with our new recommender system, we adopted a backtesting strategy to evaluate the model’s predictive performance. Backtesting assesses the viability of a model by discovering how it would have performed retrospectively on historical data.

The underlying theory is that any model that worked well in the past is likely to work well in the future, and conversely, any model that performed poorly in the past is likely to perform poorly in the future.

To this aim, we used one year of historical transactions as training data and tested the model’s performance on the subsequent month (see picture below).

Backtesting strategy representation

The table below shows the benchmark performance of several models during backtesting. We used the classical metrics employed to rank recommender models’ results, i.e. Precision, Recall, MRR, MAP and Coverage, all of them calculated on the top five recommended items per user.

Benchmark of backtesting results (orange-production, green-challenger, gray-baseline)
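
As a reference for reading the table, a minimal sketch of one of these metrics, precision@5 for a single user (the helper name is ours, for illustration only):

def precision_at_k(recommended, bought, k=5):
    # Fraction of the top-k recommended items that the user actually bought.
    return len(set(recommended[:k]) & set(bought)) / k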

The orange line represents the performance of the current production model (ALS), the one we aim to improve upon. We developed several versions of the RNN recommender system (challengers, green lines) with increasing levels of complexity.

The simplest RNN, using only item_id as a feature, came close to the ALS performance but was inferior on almost all metrics.

The greatest improvement came from adding the nb_days feature. This temporal feature allowed the RNN-based recommender system to clearly surpass the ALS model on all the given metrics. Finally, the addition of the attention layer further improved the RNN’s overall performance.

For the sake of comparison, we also computed some (simple) baselines (grey lines in the table), which clearly show the performance gap with the more sophisticated models (ALS and the RNN versions).

Online A/B tests are ongoing, and preliminary results confirm the superior performance of the RNN recommendation engine.

Conclusions

We have described our new deep neural network architecture for recommending items on Decathlon websites and shown how to implement it using TensorFlow. The model, inspired by recent text generation techniques, uses a recurrent neural network (RNN) architecture whose inputs are sequences of bought items associated with sequences of purchase recencies (the nb_days feature). Since deep learning models require a special representation of input features, we also described how to properly transform those input sequences (item_id and nb_days) before feeding them to the model.

Finally, by means of a backtesting strategy, we demonstrated how the nb_days feature was pivotal in outperforming our historical production model, which is based on a matrix factorization approach (ALS).

This new deep learning architecture opens new perspectives by giving us the possibility of adding new features (related to users and items) that can further improve model performance and user experience.

How about you? Do you have experience in Deep Learning & Recommender systems? Let us know in the comments below!

🙏🏼 If you enjoyed reading this article, please consider giving it a few 👏👏👏.

💌 Follow our latest posts on Twitter and LinkedIn and discover our sport tech initiatives like open APIs on our developers website 🚀

👩‍💻? 👨‍💻? Interested in joining the Decathlon Tech Team? Check out our careers portal to see the different exciting opportunities!

A very warm thank you to the AI Personalization team and especially to Claire Deltrel & David Degardin, who greatly contributed to this project.


Alfonso CARTA
Decathlon Digital

Data Science & Machine Learning enthusiast, with a passion for turning raw data into products, actionable insights, and meaningful stories.