Training large Deep Learning Recommender Models with Merlin HugeCTR’s Python APIs — HugeCTR Series Part 2

Vinh Nguyen
May 19, 2021


By Minseok Lee, Joey Wang, Vinh Nguyen and Ashish Sardana

In the first part of our Merlin HugeCTR blog post series, we discussed the challenges of training large deep learning recommender systems and how HugeCTR addresses them. Deep learning recommender systems can contain very large embedding tables that exceed host or GPU memory. We designed HugeCTR specifically for recommender systems: it is a framework dedicated to training and deploying large-scale recommender systems on GPUs, and it provides different strategies to distribute a single embedding table over multiple GPUs or nodes. HugeCTR is the main training engine of NVIDIA Merlin, a GPU-accelerated framework designed to be a one-stop shop for recommender system work, from data preparation and feature engineering to multi-GPU training and production-level inference, either on premises or in the cloud.

Training performance and scalability have always been highlighted features of HugeCTR, powering NVIDIA's winning entries in the MLPerf Training v0.7 recommendation task. More recently, we have incorporated feedback from our early adopters and customers to improve ease of use.

This blog post focuses on our ongoing commitment to ease of use and the recent improvements we have made. HugeCTR is a custom-built deep learning framework written in CUDA C++ that is dedicated to recommender systems. Initially, the hyperparameters and neural network architectures were defined in a JSON configuration file and executed via a command-line interface. Recently, we added a Python API to make HugeCTR easier to use. Table 1 summarizes the key differences between the command-line interface and the Python APIs. We recommend using the Python API and will focus on it in the next sections. If you are interested in the command-line interface, you can find some examples here.

Table 1: HugeCTR interface comparison.

Configure and train HugeCTR directly from Python

Since the v2.3 release, HugeCTR has provided an easy-to-use Python interface for defining the model architecture, hyperparameters and data loader, and for running the training loop. This interface brings HugeCTR closer to the Python data science ecosystem and its practices.

There are two ways to use this interface:

1. A high-level Python API similar to Keras

HugeCTR now offers a high-level, Keras-like Python API suite for defining the model, layers and optimizer, and for executing training. An example code snippet is given below.

As can be seen, this API emulates the popular Keras build-compile-fit paradigm.

2. The low-level Python API

The HugeCTR low-level Python API reads the model definition and optimizer configuration from a JSON file, which provides backward compatibility with the command-line workflow. It also lets you execute training manually, iteration by iteration, in a Python loop, which gives fine-grained control over training. In the hands-on section of this blog post, we detail the use of this API for training models on two datasets.

Predicting with a pretrained HugeCTR model

With the release of v3.0, HugeCTR added support for GPU-based inference to generate predictions for many batches. HugeCTR decouples the parameter server, embedding cache and inference session in order to manage resources better and use the GPU more effectively.

  • The parameter server is used to load and manage the embedding tables. For embedding tables that exceed GPU memory, the parameter server stores the embedding table in CPU memory.
  • The embedding cache provides an embedding lookup service for the model. Active embedding entries are stored in GPU memory for quick lookup.
  • The inference session brings these two together along with the model weights and other parameters to perform forward propagation.
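To make this division of labor concrete, here is a toy, pure-Python sketch of the same three roles. It is not HugeCTR code: the class names, the least-recently-used eviction policy and the sum-pooling "model" are all assumptions made for illustration, and the real components run on the GPU with their own policies.

```python
from collections import OrderedDict


class HostParameterStore:
    """Stand-in for the parameter server: the full table lives in host memory."""

    def __init__(self, table):
        self._table = dict(table)
        self.fetches = 0  # number of lookups that had to go to host memory

    def fetch(self, key):
        self.fetches += 1
        return self._table[key]


class LruEmbeddingCache:
    """Stand-in for the embedding cache: keeps at most `capacity` hot rows.

    LRU eviction is an assumption of this sketch, not a statement about
    how HugeCTR decides which entries stay on the GPU.
    """

    def __init__(self, store, capacity):
        self._store = store
        self._capacity = capacity
        self._rows = OrderedDict()

    def lookup(self, key):
        if key in self._rows:
            self._rows.move_to_end(key)  # mark as most recently used
            return self._rows[key]
        row = self._store.fetch(key)  # cache miss: go to the host store
        self._rows[key] = row
        if len(self._rows) > self._capacity:
            self._rows.popitem(last=False)  # evict the least recently used row
        return row


def pooled_embedding(cache, keys):
    """Toy 'inference session' step: sum-pool the looked-up embedding rows."""
    rows = [cache.lookup(k) for k in keys]
    return [sum(column) for column in zip(*rows)]


# A 1000-row table with 2-dimensional embeddings, and a cache of only 2 rows.
store = HostParameterStore({i: [float(i), 2.0 * i] for i in range(1000)})
cache = LruEmbeddingCache(store, capacity=2)

first = pooled_embedding(cache, [1, 2])   # two misses
second = pooled_embedding(cache, [1, 2])  # served entirely from the cache
print(first, second, store.fetches)
```

The point of the sketch is the access pattern: repeated lookups of hot keys never touch the large host-side table, so only cold keys pay the cost of a transfer.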

An example of the function call sequence to initialize HugeCTR inference is presented below. We initialize an InferenceSession with the config_file, embedding_cache and parameter_server.

The HugeCTR Python inference API requires an inference config file in JSON format, which is similar to the training config JSON. However, we need to omit the optimizer and solver clauses and add an inference clause. We also need to change the output layer to the Sigmoid type. The dense_model_file and sparse_model_file parameters within the inference clause should point to the trained HugeCTR model files (_dense_xxxx.model and 0_sparse_xxxx.model). We provide multiple full examples in our GitHub repository: the eCommerce behavior dataset and the Microsoft News dataset.
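The edits described above can be sketched as a small transformation on the training config. This is a minimal illustration under stated assumptions: the top-level "layers" list, the per-layer "type" key, the sample layer names and the "BinaryCrossEntropyLoss" entry are assumed here to make the example self-contained, and a real inference clause may need further fields that this post does not list. The model file names are the placeholders from the text, not real paths.

```python
import copy
import json


def to_inference_config(train_config, dense_model_file, sparse_model_file):
    """Return a copy of `train_config` adapted for inference.

    Assumes (hypothetically) that layers sit in a top-level "layers" list,
    that each layer has a "type" key, and that the last layer is the output.
    """
    config = copy.deepcopy(train_config)

    # Training-only clauses are omitted.
    config.pop("optimizer", None)
    config.pop("solver", None)

    # The inference clause points at the trained model files.
    config["inference"] = {
        "dense_model_file": dense_model_file,
        "sparse_model_file": sparse_model_file,
    }

    # The output layer becomes a Sigmoid; its other fields are left untouched.
    if config.get("layers"):
        config["layers"][-1]["type"] = "Sigmoid"
    return config


# Illustrative training config; every value here is made up.
train_config = {
    "solver": {"batchsize": 2048},
    "optimizer": {"type": "Adam"},
    "layers": [
        {"name": "data", "type": "Data"},
        {"name": "fc1", "type": "InnerProduct"},
        {"name": "loss", "type": "BinaryCrossEntropyLoss"},
    ],
}

inference_config = to_inference_config(
    train_config,
    dense_model_file="_dense_xxxx.model",      # placeholder from the text
    sparse_model_file="0_sparse_xxxx.model",   # placeholder from the text
)
print(json.dumps(inference_config, indent=2))
```

Working on a deep copy keeps the training config intact, so the same file can still be used to resume or repeat training.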

Let’s take a look at some examples

We provide multiple end-to-end examples of HugeCTR’s API in our GitHub repository. These notebooks offer a complete walkthrough of Merlin on a real dataset and application domain, from data download, preprocessing and feature engineering to model training and inference.

1. High-level Python API with Criteo dataset

The Criteo 1TB Click Logs dataset is the largest publicly available dataset for recommender systems. It contains 1.3 TB of uncompressed click logs with around 4 billion examples. In our example, the dataset is preprocessed with either Pandas or NVTabular to normalize the continuous features and categorify the categorical ones. Afterwards, we train a Deep & Cross neural network architecture with HugeCTR’s high-level API. First, we define the solver and optimizer and use them to initialize the HugeCTR model. Then we add layers one by one, much as with the TensorFlow Keras API. Finally, we only need to call the .fit() function.
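For readers new to these two preprocessing steps, the following pure-Python sketch shows what "normalize" and "categorify" mean on a handful of values. It is only an illustration: the notebooks use Pandas or NVTabular for this, and the exact normalization and id-assignment scheme they apply may differ from the zero-mean, unit-variance scaling and first-seen ordering assumed here.

```python
import statistics


def normalize(values):
    """Scale a continuous feature to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / std for v in values]


def categorify(values):
    """Map each distinct category to a contiguous integer id.

    Ids are assigned in order of first appearance (an assumption of this
    sketch). The number of distinct ids is the feature's cardinality,
    which determines the size of its embedding table.
    """
    mapping = {}
    encoded = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
        encoded.append(mapping[v])
    return encoded, mapping


# Made-up sample values for one continuous and one categorical column.
scaled = normalize([1.0, 2.0, 3.0])
ids, vocab = categorify(["a9f1", "03bc", "a9f1", "77de"])
print(scaled)
print(ids, len(vocab))
```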

2. Low-Level Python API with E-Commerce behaviour dataset

In this demo notebook, we use the eCommerce behavior data from multi category store from REES46 Marketing Platform as our dataset. This notebook builds upon the NVIDIA tutorial at the RecSys 2020 conference. We use NVTabular for feature engineering and preprocessing, and train Facebook’s Deep Learning Recommender Model (DLRM) with HugeCTR. We adapt an example JSON configuration file originally written for the Criteo click log dataset. The parameters that need to be edited to match this dataset include:

  • slot_size_array: cardinalities of the categorical variables, which can be obtained from the NVTabular workflow object.
  • dense_dim: number of dense features
  • slot_num: number of categorical variables
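The three values are closely related, which the short sketch below tries to make explicit. The helper name, the column names and the cardinalities are all hypothetical; in the notebook the cardinalities come from the NVTabular workflow object, and where each value goes in the JSON file depends on the config layout of your HugeCTR version.

```python
def dataset_params(cat_columns, cardinalities, dense_columns):
    """Derive the dataset-specific config values.

    `cat_columns` fixes the slot order, so slot_size_array lines up with
    the order in which categorical features appear in the data.
    """
    slot_size_array = [cardinalities[name] for name in cat_columns]
    return {
        "slot_size_array": slot_size_array,
        "slot_num": len(slot_size_array),  # one slot per categorical variable
        "dense_dim": len(dense_columns),   # number of dense features
    }


# Hypothetical columns and made-up cardinalities, for illustration only.
cat_columns = ["brand", "category", "user_id"]
cardinalities = {"user_id": 500000, "brand": 3000, "category": 120}
dense_columns = ["price", "hour_of_day"]

params = dataset_params(cat_columns, cardinalities, dense_columns)
print(params)
```

Passing the column order explicitly avoids a subtle mismatch: if the cardinalities are listed in a different order from the columns in the data, each embedding table ends up sized for the wrong feature.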

The following Python code executes the parameter updates per batch.
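As a rough sketch of what such a per-batch loop looks like, the function below drives any session-like object one iteration at a time. The method names start_data_reading, train and get_current_loss follow our recollection of the low-level session API of that era and should be treated as assumptions to check against your installed HugeCTR version. FakeSession is a hypothetical stand-in so the example runs without a GPU or HugeCTR installed.

```python
def run_training(sess, max_iter, display_every=100):
    """Run `max_iter` parameter updates, recording the loss periodically.

    `sess` is assumed to expose start_data_reading(), train() and
    get_current_loss(); verify these names for your HugeCTR version.
    """
    history = []
    sess.start_data_reading()
    for i in range(max_iter):
        sess.train()  # one batch: forward pass, backward pass, update
        if i % display_every == 0:
            loss = sess.get_current_loss()
            history.append((i, loss))
            print(f"iter: {i}; loss: {loss:.4f}")
    return history


class FakeSession:
    """Hypothetical stand-in for a training session (not part of HugeCTR)."""

    def __init__(self):
        self.steps = 0
        self.reading = False

    def start_data_reading(self):
        self.reading = True

    def train(self):
        self.steps += 1

    def get_current_loss(self):
        return 1.0 / self.steps  # made-up, steadily decreasing loss


fake = FakeSession()
history = run_training(fake, max_iter=250, display_every=100)
```

Because the loop is plain Python, it is easy to add custom logic between iterations, such as early stopping, periodic evaluation or logging to an experiment tracker.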

Similarly, we provide a second example for the Microsoft News dataset.

Try out HugeCTR’s command-line and Python API to train your Recommendation System Pipelines

We are committed to providing a user-friendly, easy-to-use experience that streamlines recommender workflows. Our recent improvements to the HugeCTR interfaces reflect feedback from our early adopters and customers. Examples of how to use this new interface on several public datasets, from small to large, are available in the HugeCTR GitHub repository. We invite you to adapt these examples to your own domain and witness the processing power of Merlin. As always, we are interested in hearing your feedback via GitHub as well as other channels. This was the second blog post, “Training large Deep Learning Recommender Models with HugeCTR’s new APIs”, in our HugeCTR series. The next one will talk about deployment to production. Stay tuned!