Transformers, multi-GPU, HPO, and more are here for Recommender Models in the 22.11 Release of NVIDIA Merlin

Radek Osmulski
Published Dec 1, 2022 · 3 min read


We recently released a new version of the Merlin Framework! It adds several features that can help you train better models faster, including support for session-based training, and we would like to tell you about them.

Here are the new additions.

Better models with session-based Transformer Models

Merlin Models received a big addition in this release! The Transformer, a recent, game-changing architecture, is now available for training session-based models.

In the session-based RecSys Challenge 2022, organized by Dressipi, the NVIDIA team placed third, with Transformers being an important part of the ensemble. This is certainly an architecture to add to your toolbelt!

If you’d like to familiarize yourself with Transformers in the context of session-based recommendations, here is a great article that will provide you with all the details. With release 22.11, we added support for Transformers to Merlin Models, and you can now train them using TensorFlow!

For more information on how to preprocess your data for session-based training and how to fit your models, please take a look at this notebook.
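To give a rough feel for what session-based preprocessing involves, here is a minimal sketch in plain Python. It is not the Merlin or NVTabular API: the `build_sessions` helper, the event tuples, and the choice of "last item as target" are all assumptions made for illustration. The idea is simply to group interactions by session, order them in time, and treat the final item as the one to predict.

```python
from collections import defaultdict


def build_sessions(events, min_len=2):
    """Turn (session_id, timestamp, item_id) events into next-item examples.

    Hypothetical helper for illustration; not part of Merlin.
    """
    by_session = defaultdict(list)
    for session_id, timestamp, item_id in events:
        by_session[session_id].append((timestamp, item_id))

    examples = []
    for session_id, rows in by_session.items():
        rows.sort()  # chronological order within the session
        items = [item for _, item in rows]
        if len(items) < min_len:
            continue  # a one-item session has no "next item" to predict
        # Everything but the last item is the input; the last item is the target.
        examples.append(
            {"session_id": session_id, "inputs": items[:-1], "target": items[-1]}
        )
    return examples


# Made-up interaction log, deliberately out of order.
events = [
    ("s1", 3, "shoes"),
    ("s1", 1, "hat"),
    ("s2", 5, "coat"),
    ("s1", 2, "scarf"),
    ("s3", 7, "belt"),
    ("s3", 9, "hat"),
]

examples = build_sessions(events)
for example in examples:
    print(example)
```

In a real pipeline this grouping would run on the GPU over far larger logs, but the shape of the resulting examples (an ordered item sequence plus a target) is the same idea.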

Faster training with data-parallel multi-GPU training

Merlin Models now natively supports training on multiple GPUs with Horovod. You can now train your models faster with very few code changes.

The supported paradigm is data-parallel training. The way it works is that your model gets replicated across multiple GPUs. Each GPU contains the full model with all its weights.

However, in training, each GPU receives only a portion of the examples in the batch to train on.

This allows you to train with bigger batch sizes. While the scaling is not perfectly linear due to overhead, you will be able to train on more data in significantly less time.
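The data-parallel idea can be simulated on a single machine with a few lines of plain Python. This is a toy sketch, not Horovod code: the one-weight model, the `gradient` and `data_parallel_gradient` helpers, and the sample batch are assumptions chosen for illustration. Each simulated worker holds the same weight, computes a gradient on its own shard of the batch, and the gradients are then averaged, which is the role an allreduce step plays in real multi-GPU training.

```python
def gradient(w, batch):
    """Gradient of the mean squared error for the toy model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_gradient(w, batch, num_workers):
    """Split the batch into equal shards, compute per-worker gradients, average them."""
    if len(batch) % num_workers != 0:
        raise ValueError("this sketch assumes the batch splits evenly across workers")
    shard_size = len(batch) // num_workers
    shards = [
        batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)
    ]
    # Every worker uses the same weight w (the replicated model).
    local_grads = [gradient(w, shard) for shard in shards]
    # Averaging the local gradients stands in for the allreduce step.
    return sum(local_grads) / num_workers


batch = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]  # made-up (x, y) pairs
w = 0.5

full = gradient(w, batch)
parallel = data_parallel_gradient(w, batch, num_workers=2)
print(full, parallel)
```

With equally sized shards, the averaged gradient matches the gradient computed on the whole batch, so every replica applies the same update and the copies of the model stay in sync.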

But do not take our word for it! This example will walk you through how to train on multiple GPUs using Horovod and Merlin Models. Give it a try to learn how much you will be able to accelerate your workloads!

Training better models with hyperparameter tuning

Deep learning models are notoriously hard to train well. Even the simplest deep learning model exposes many hyperparameters you can tweak, such as the learning rate, embedding size, dropout probability, and more.

On top of that, for how many epochs should we train? What scheduler should we use? In combination with which optimizer?

Answering those questions via manual experimentation is extremely tedious and time-consuming. A much better option is to code up a hyperparameter experiment and have it run in the background as you work on something else.
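To make "coding up a hyperparameter experiment" concrete, here is a minimal sketch using plain random search from the Python standard library. It is a deliberately simple stand-in, not how any particular tuning library works: the `validation_loss` function is a hypothetical placeholder for training a model and measuring its validation loss, and the search ranges are assumptions.

```python
import math
import random


def validation_loss(learning_rate, dropout):
    """Hypothetical stand-in for 'train a model, return its validation loss'."""
    return (math.log10(learning_rate) + 3) ** 2 + (dropout - 0.2) ** 2


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and keep the best trial seen so far."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    best = None
    for _ in range(n_trials):
        params = {
            # Sample the exponent uniformly so learning rates are log-uniform.
            "learning_rate": 10 ** rng.uniform(-5, -1),
            "dropout": rng.uniform(0.0, 0.5),
        }
        loss = validation_loss(**params)
        if best is None or loss < best["loss"]:
            best = {"loss": loss, "params": params}
    return best


best = random_search(n_trials=50)
print(best)
```

Dedicated frameworks replace the random sampling with smarter strategies and add bookkeeping and visualization, but the loop of suggesting parameters, evaluating them, and tracking the best result stays the same.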

Optuna is a great framework to use for hyperparameter optimization. It is succinct and to the point and provides you with just the functionality you need.

To make it easier for you to embark on the hyperparameter optimization journey, we’ve put together this example. In it, we guide you through how to set up a hyperparameter experiment and show you the ropes of working with Optuna.

Figure: One of the plots you will learn to create with Optuna to diagnose model training


The above are just the highlights of the exciting new functionality that is now available via the Merlin Framework. Please find the software on GitHub here.

And if you are looking for a way to get started with the Merlin Framework, we have good news for you! In the most recent release, we have significantly revamped the NVTabular introductory notebooks. If you would like to start accelerating your RecSys workflow, beginning with moving the preprocessing of your data onto the GPU, they are a great place to start.

Alternatively, if you would like to jump straight to training deep learning (or classical ML) RecSys models, Merlin Models has got you covered!

To stay in the loop on future updates, make sure to follow our Medium publication!

Thank you for reading!



Radek Osmulski works on Recommender Systems at NVIDIA and tweets about, writes about, and implements ML / DL ideas.