Transformers4Rec: A flexible library for Sequential and Session-based recommendation

Published in NVIDIA Merlin · Sep 24, 2021 · 8 min read

By Gabriel Moreira, Ronay Ak and Sara Rabhi

Recommender systems help users to find relevant content, products, media and much more in online services. They also help such services to connect their long-tailed (unpopular) items to the right people, to keep their users engaged and increase conversion.

Traditional recommendation algorithms, e.g. collaborative filtering, usually ignore the temporal dynamics and the sequence of interactions when trying to model user behaviour. But users’ preferences do change over time. Sequential recommendation algorithms are able to capture sequential patterns in users’ browsing, which can help anticipate the users’ next interests and provide better recommendations. For example, users getting started with a new hobby like cooking or cycling might explore products for beginners and move to more advanced products as they progress over time. They might also move to another topic of interest entirely, so that recommending items related to their long-past preferences becomes irrelevant.

A special case of sequential recommendation is the session-based recommendation task, where you only have access to the short sequence of interactions within the current session. This is very common in online services like e-commerce, news and media portals, where the user might be brand new or prefer to browse anonymously (and, for GDPR compliance, no cookies are collected). This task is also relevant for scenarios where users’ interests change a lot over time depending on their context or intent, so leveraging the current session interactions is more promising than older interactions for providing relevant recommendations.

Fig. 1 — Distinct interests for different user sessions

To deal with sequential and session-based recommendation, many sequence learning algorithms previously applied in machine learning and NLP research have been explored for Recommender Systems (RecSys), such as k-Nearest Neighbors, Frequent Pattern Mining, Hidden Markov Models, Recurrent Neural Networks and, more recently, neural architectures using self-attention mechanisms and Transformer architectures.

In this post we introduce Transformers4Rec — a flexible and efficient library for state-of-the-art sequential and session-based recommendation. Transformers4Rec embeds learnings from our research and extensive experimentation in this area (check our paper (open-access here) at ACM RecSys’21) and was also used by NVIDIA teams in our solutions that won two recent competitions on session-based recommendation: the WSDM WebTour Workshop Challenge 2021, organized by Booking.com, and the SIGIR eCommerce Workshop Data Challenge 2021, organized by Coveo.

2. Transformers4Rec

Transformers4Rec is an open-source library developed by the NVIDIA Merlin team for building state-of-the-art sequential and session-based recommendation models with PyTorch and TensorFlow.

Fig. 2 — Next-item prediction with Transformers4Rec

2.1. Leveraging cutting-edge NLP research for RecSys

Transformers4Rec was motivated by the observation that, over the last decade, research on Natural Language Processing (NLP) has inspired RecSys algorithms for sequential and session-based recommendation, as illustrated in Figure 3. Transformers4Rec is designed to work as a bridge between the NLP and RecSys fields through its integration with one of the most popular NLP frameworks: HuggingFace Transformers.

Fig. 3 — A timeline illustrating the influence of NLP research in RecSys, from Transformers4Rec paper

Transformers4Rec makes state-of-the-art Transformer architectures available to RecSys researchers and industry practitioners, so that they can quickly and easily explore the latest NLP developments for sequential and session-based recommendation tasks.

2.2. Transformers4Rec modules

The Transformers4Rec library is modular and composed of building blocks that are compatible with vanilla PyTorch modules and TF Keras layers. You can create custom architectures, e.g. with multiple towers, multiple heads/tasks and losses. The main building blocks can be seen in Figure 4 and are described in the next sections.

Fig. 4 — Transformers4Rec modules

Input module

In general, many features about item metadata and user context are available in online services. When combined by recommendation models, those features improve the model’s ability to find patterns in user behaviour and provide better recommendation accuracy.

Transformers4Rec supports multiple input features, which can be either at user/session level or at interaction level (sequential features). The features are defined by a `Schema` object, which contains statistics about the features such as cardinality, min and max values, and tags based on their characteristics and types (e.g., categorical, continuous, list, item_id). The Schema can be defined using Python code or loaded from a file in protobuf text format (example), as shown in the following snippet. This schema file can either be created manually or generated automatically as an output of an NVTabular preprocessing workflow (more on this later).
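For illustration, loading a schema from a protobuf text file might look like the sketch below (the file path and feature names are illustrative):

```python
from merlin_standard_lib import Schema

# Load the schema (hand-written or produced by an NVTabular workflow)
# from a protobuf text file; the path is illustrative.
schema = Schema().from_proto_text("processed/schema.pbtxt")

# Optionally keep only the features the model should use
# (the column names below are just examples).
schema = schema.select_by_name(["item_id-list", "category-list"])
```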

The `TabularSequenceFeatures` module is responsible for processing and aggregating all features, and outputs a sequence of interaction embeddings to be fed into the Transformer blocks. Based on a Schema and options set by the user, it dynamically creates all the necessary layers (e.g. embedding layers) to encode, normalize, and aggregate categorical and continuous features. It also lets us set the masking training approach (e.g. Causal LM, Masked LM).
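As a sketch, building the input block from the schema with the PyTorch API could look like this (the option values are illustrative):

```python
import transformers4rec.torch as tr

# Create embedding layers for categorical features, project continuous
# features, aggregate everything into one interaction embedding per step,
# and define the masking (training) approach.
inputs = tr.TabularSequenceFeatures.from_schema(
    schema,
    max_sequence_length=20,
    continuous_projection=64,
    aggregation="concat",
    masking="causal",   # use "mlm" for Masked LM
    d_output=100,
)
```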

Masking module

Transformer architectures can be trained in different ways, and each training method requires a specific `masking` scheme. Transformers4Rec currently supports the following training approaches, inspired by NLP: Causal LM (Language Modeling), Masked LM, Permutation LM and Replacement Token Detection. For example, with a Causal LM (or autoregressive) approach the model is trained to predict the next item from the previous ones (items on the left). A different approach is Masked LM (or autoencoding), which randomly masks items in the sequence and uses the remaining items (on the left and right) to predict the masked items. For more details on the training approaches please refer to our documentation.

Body module

The TransformerBlock class provides the core integration bridge with HF Transformers. This block feeds the sequence of interaction embeddings to an HF Transformer architecture with the appropriate masking scheme. It takes as input an HF Transformer config object, which we have extended to include a `build()` method with default arguments, such as XLNetConfig.

We also provide a convenient SequentialBlock, which allows defining the model body as a sequence of layers (similar to torch.nn.Sequential), automatically setting the input shape of each layer from the output shape of the previous one. In the following snippet, we connect the `TabularSequenceFeatures` to an MLP layer that projects the features to the same dimension as the hidden size of the Transformer layer (`d_model`).
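A sketch of that body definition with the PyTorch API (the XLNet hyperparameters are illustrative):

```python
# Build a Transformer config (here XLNet) with the extended `build()` method;
# hyperparameter values are illustrative.
transformer_config = tr.XLNetConfig.build(
    d_model=64, n_head=4, n_layer=2, total_seq_length=20
)

# Model body: input block -> MLP projection to d_model -> Transformer block.
body = tr.SequentialBlock(
    inputs,
    tr.MLPBlock([64]),
    tr.TransformerBlock(transformer_config, masking=inputs.masking),
)
```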

Prediction Heads and Model

We can have different prediction tasks according to the prediction head. NextItemPredictionTask is the class that supports predicting the next item for a user sequence / session. An interesting option is `weight_tying` (also known as tying embeddings), where the weights of the item id embedding table are shared with the output layer, reducing the number of parameters and helping the model learn the item embeddings faster. The library also provides `BinaryClassificationTask` and `RegressionTask` for sequence-level binary classification and regression tasks.

Finally, the Head module links the model body to the prediction tasks, from which we get the final PyTorch Model class.
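Putting these pieces together, the full model definition might look like this sketch:

```python
# Link the body to a next-item prediction head with tied embeddings,
# then wrap everything into a trainable PyTorch model.
head = tr.Head(
    body,
    tr.NextItemPredictionTask(weight_tying=True),
    inputs=inputs,
)
model = tr.Model(head)
```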

Check our documentation for more information and examples on building model architectures.

2.3. Training and Evaluation

The Transformers4Rec PyTorch API provides a Trainer object (inherited from the HF Transformers Trainer) for training models, as in the following example. By default, it uses the optimized NVTabular dataloader, which reads parquet files directly to GPU memory for faster performance. We overloaded the `Trainer.evaluate()` method so that it computes RecSys metrics (e.g., Recall@k, NDCG@k, MAP@k) rather than NLP ones.
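A minimal training sketch with the PyTorch API (the output directory, dataset paths and hyperparameters are illustrative):

```python
from transformers4rec.config.trainer import T4RecTrainingArguments
from transformers4rec.torch import Trainer

training_args = T4RecTrainingArguments(
    output_dir="./tmp",
    max_sequence_length=20,
    data_loader_engine="nvtabular",   # optimized GPU dataloader
    per_device_train_batch_size=256,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    schema=schema,
    compute_metrics=True,
)

# Parquet files output by the NVTabular preprocessing workflow (paths illustrative).
trainer.train_dataset_or_path = "processed/train/"
trainer.train()

# evaluate() reports RecSys metrics such as Recall@k and NDCG@k.
trainer.eval_dataset_or_path = "processed/valid/"
metrics = trainer.evaluate(metric_key_prefix="eval")
```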

3. End-to-end production pipeline

Transformers4Rec is production-ready and has a direct integration with NVIDIA Merlin components to build end-to-end GPU-accelerated pipelines for sequential and session-based recommendation, as illustrated in Figure 5.

Fig. 5 — End-to-end sequential and session-based recommendation with NVIDIA Merlin

NVTabular is a feature engineering and preprocessing library for tabular data that is designed to easily manipulate terabyte-scale datasets and train deep learning (DL) based recommender systems. In particular, NVTabular provides support for preprocessing features for sequential and session-based recommendation. It outputs parquet files, which can be efficiently loaded directly to GPU memory by the optimized dataloaders for PyTorch and TensorFlow. NVTabular also outputs the schema of the preprocessed dataset, compatible with Transformers4Rec, as we saw before.
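For illustration, a minimal NVTabular workflow that groups interactions into sessions might look like the sketch below (column names, aggregations and paths are illustrative):

```python
import nvtabular as nvt

# Encode item ids as contiguous integers and tag the column as the item id.
item_id = ["item_id"] >> nvt.ops.Categorify() >> nvt.ops.TagAsItemID()

# Group interaction rows into one row per session, sorted by time, producing
# the list (sequential) features that Transformers4Rec consumes.
features = (item_id + ["session_id", "timestamp"]) >> nvt.ops.Groupby(
    groupby_cols=["session_id"],
    sort_cols=["timestamp"],
    aggs={"item_id": ["list", "count"], "timestamp": ["first"]},
)

workflow = nvt.Workflow(features)
workflow.fit_transform(nvt.Dataset("interactions.parquet")).to_parquet("processed/")
# The preprocessing output also includes the dataset schema that
# Transformers4Rec can load, as shown earlier.
```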

NVIDIA Triton Inference Server (TIS) simplifies the deployment of AI models at scale in production. TIS is a cloud and edge inferencing solution optimized to deploy machine learning models for both GPUs and CPUs, and it supports a number of different deep learning frameworks such as TensorFlow and PyTorch.

The full sequential and session-based recommendation pipeline, composed of an NVTabular preprocessing workflow and a trained Transformers4Rec model, can be exported to be served in TIS as a single ensemble model. Currently, deployment to TIS is supported for Transformers4Rec models trained with the PyTorch API, and inference with the TensorFlow API will be supported as well. You can watch a recorded demo of this pipeline below.

Summary

In this post, we presented the sequential and session-based recommendation problems and introduced the Transformers4Rec library to tackle those tasks efficiently and effectively. Transformers4Rec leverages state-of-the-art NLP architectures through an integration with HuggingFace Transformers, and makes it easy to experiment with many different Transformer architectures and identify those that perform best for your own domain and dataset.

It is meant to be an easy-to-use and flexible library, making it possible to build powerful models with a few lines of code using the PyTorch and TensorFlow APIs. It is production-ready, open-source, and can be easily deployed thanks to its integration with NVTabular and Triton Inference Server.

We are releasing Transformers4Rec at the ACM RecSys’21 conference, where we will be presenting a paper (open-access here), a demo, and a tutorial about the library (schedule).

Get started right now with the Transformers4Rec documentation and examples! And check out the NVIDIA Merlin end-to-end solution for session-based recommendation pipelines.

Acknowledgements

The Transformers4Rec library would not be possible without the incredible support of the entire Merlin team, who came together to harden the library and make it ready for release. Our thanks to those whose hard work and ingenuity made this happen under a tight timeline, namely (in alphabetical order): Ronay Ak, Alberto Aldea, Karl Byleen-Higley, Ben Frederickson, Jeongmin Lee, Adam Lesnikowski, Gabriel Moreira, Even Oldridge, Julio Perez, Sara Rabhi, Marc Romeyn, Benedikt Schifferer, and Onur Yilmaz.


Gabriel Moreira holds a PhD and is a Senior Applied Researcher at NVIDIA working on LLMs and Recommender Systems, and has been a Google Developer Expert for ML since 2019.