A New API for NVTabular and Inference support are coming with Merlin’s 0.4 release

Published in NVIDIA Merlin · Mar 10, 2021

By Benedikt Schifferer and Even Oldridge

Almost a year ago, NVIDIA announced Merlin, an open-source framework for accelerating deep learning recommender systems on GPUs. The vision is an end-to-end deep learning pipeline for preparing data, training models, and deploying them to production, all on GPU. Over the last year there have been tremendous developments in preprocessing and training: NVTabular preprocessing now scales to 128 GPUs and achieves ~10000x speed-ups on tabular data benchmarks, customized dataloaders accelerate TensorFlow and PyTorch training pipelines by 9x, and HugeCTR trains extremely large deep learning models by distributing embedding tables over multiple GPUs.

With our 0.4 release, Merlin adds Triton Inference Server support to deploy trained models to production easily, along with an even easier API for NVTabular and HugeCTR.

Model Inference with NVIDIA Merlin

The last step of a deep learning recommender system pipeline is inference. Deploying a trained model to production is a significant engineering problem and requires coordination across multiple teams to make everything work. Deep learning recommenders are even more complex in that they have to be retrained frequently to capture information about new users and new items. Deployment to production needs to be easier. Thankfully, NVIDIA's Triton Inference Server is more than up to the task of deploying deep learning models quickly and easily at scale. Triton Inference Server supports all major deep learning frameworks, as well as custom backends. It runs models concurrently on GPUs to maximize utilization, and it supports both low-latency real-time inference and batch inference to maximize GPU/CPU utilization. Also available as a Docker container, Triton Inference Server integrates with Kubernetes for orchestration, metrics, and auto-scaling.

To that rich feature set we’ve added support for NVTabular and HugeCTR, allowing you to deploy large and complex recommender workflows and models into production with only a few lines of code.

A less often discussed challenge is how to deploy preprocessing and feature engineering workflows. Making sure that the data is transformed at serving time exactly as it was at training time takes significant engineering effort. With NVTabular's Triton backend we take care of that for you: during training, the workflow collects dataset statistics, and at serving time those same statistics are applied to the production data.
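As a rough sketch of what that looks like (a hedged example based on the release's inference notebooks; column names, file paths, and the model variable are illustrative):

import nvtabular as nvt
from nvtabular import ops
from nvtabular.inference.triton import export_tensorflow_ensemble

# Fit the workflow on the training data so it records the statistics
# (category mappings, normalization moments, ...) reused at serving time.
features = ["userId", "movieId"] >> ops.Categorify()
workflow = nvt.Workflow(features)
workflow.fit(nvt.Dataset("train/", engine="parquet"))

# model = ...  # a trained tf.keras model

# Export the workflow and model together as a single Triton ensemble:
# incoming requests are preprocessed by the NVTabular backend before
# they ever reach the TensorFlow model.
export_tensorflow_ensemble(model, workflow, "recsys_ensemble", "/models", ["rating"])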

Deep learning recommender systems require large embedding tables for users and/or items, which often do not fit on a single GPU. HugeCTR scales deep learning model training by distributing these embedding tables across multiple GPUs and nodes, and its Triton integration lets you deploy those large embedding tables to production as well.

Check out our examples for NVTabular+TensorFlow and HugeCTR.

An even easier API

Making Merlin easier to use is always at the forefront of our minds as we develop it. We're interested in speeding up not just the time it takes to train a model, but also the time it takes you to explore features and develop the architecture. The new release contains new high-level APIs for both NVTabular and HugeCTR that make it easier to define workflows and training pipelines.

NVTabular has added an overloaded >> operator to chain NVTabular operators into a workflow. Operators can be applied to a list of column names or to a ColumnGroup.

Example:
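Below is a minimal sketch of the new API (column names and file paths are illustrative):

import nvtabular as nvt
from nvtabular import ops

# Chain operators onto groups of columns with the overloaded >> operator.
cat_features = ["userId", "movieId"] >> ops.Categorify()
cont_features = ["price"] >> ops.FillMissing() >> ops.LogOp() >> ops.Normalize()

# Combining column groups with + yields the full workflow definition.
workflow = nvt.Workflow(cat_features + cont_features)

# fit() collects statistics on the training set; transform() applies them.
workflow.fit(nvt.Dataset("train/", engine="parquet"))
workflow.transform(nvt.Dataset("valid/", engine="parquet")).to_parquet(output_path="valid_out/")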

An NVTabular workflow is a directed acyclic graph (DAG), and the new release can visualize it.
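For example, continuing the sketch above (and assuming the graphviz package is installed), displaying a column group in a notebook renders its DAG:

# In a Jupyter cell, the graph property of a column group renders the DAG.
(cat_features + cont_features).graph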

Visualization of ETL Workflow

Check out our Getting Started notebooks for more on the new NVTabular API.

HugeCTR is a highly optimized deep learning framework, written in CUDA C++, dedicated to recommender systems. The new release adds a Keras-like Python API for defining a model architecture.
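As a rough sketch of the Keras-like flavor (a hedged example; layer parameters, slot sizes, and file paths are illustrative, and exact argument names can differ between HugeCTR releases):

import hugectr

# Solver, data reader, and optimizer configuration (values illustrative).
solver = hugectr.CreateSolver(batchsize=2048, batchsize_eval=2048,
                              max_eval_batches=100, lr=0.001,
                              vvgpu=[[0]], repeat_dataset=True)
reader = hugectr.DataReaderParams(
    data_reader_type=hugectr.DataReaderType_t.Parquet,
    source=["./train/_file_list.txt"],
    eval_source="./valid/_file_list.txt",
    check_type=hugectr.Check_t.Non,
    slot_size_array=[10000, 10000])  # cardinality per categorical slot
optimizer = hugectr.CreateOptimizer(optimizer_type=hugectr.Optimizer_t.Adam)

# Layers are added sequentially, much like Keras' Sequential API.
model = hugectr.Model(solver, reader, optimizer)
model.add(hugectr.Input(
    label_dim=1, label_name="label", dense_dim=0, dense_name="dense",
    data_reader_sparse_param_array=[
        hugectr.DataReaderSparseParam("data1", 1, True, 2)]))
model.add(hugectr.SparseEmbedding(
    embedding_type=hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
    workspace_size_per_gpu_in_mb=100, embedding_vec_size=16, combiner="sum",
    sparse_embedding_name="sparse_embedding1", bottom_name="data1",
    optimizer=optimizer))
model.add(hugectr.DenseLayer(layer_type=hugectr.Layer_t.Reshape,
                             bottom_names=["sparse_embedding1"],
                             top_names=["reshape1"], leading_dim=32))
model.add(hugectr.DenseLayer(layer_type=hugectr.Layer_t.InnerProduct,
                             bottom_names=["reshape1"], top_names=["fc1"],
                             num_output=64))
model.add(hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU,
                             bottom_names=["fc1"], top_names=["relu1"]))
model.add(hugectr.DenseLayer(layer_type=hugectr.Layer_t.InnerProduct,
                             bottom_names=["relu1"], top_names=["fc2"],
                             num_output=1))
model.add(hugectr.DenseLayer(layer_type=hugectr.Layer_t.BinaryCrossEntropyLoss,
                             bottom_names=["fc2", "label"], top_names=["loss"]))
model.compile()
model.summary()
model.fit(max_iter=1000, display=200, eval_interval=500)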

Check out our examples for the new HugeCTR Python API.

Try out NVIDIA Merlin for your end-to-end recommendation pipeline

In the latest release, we provide examples of end-to-end recommendation pipelines, from ETL to training to inference. NVIDIA Merlin is open source, and we are continuously working to provide hands-on examples, guides, and documentation to help you get started. Just a few of those examples include a Merlin overview, additional information about feature engineering with NVTabular (API documentation), an accelerated training guide, and a large collection of examples for feature engineering and training with TensorFlow, PyTorch, and HugeCTR. We would love to hear your feedback; you can reach us through our GitHub or by leaving a comment here. We look forward to hearing from you! If you're as passionate about recommender systems as we are, please check out this open role on the team. We're growing fast and would love to work with you to help make RecSys fast and easy to use on the GPU.



Benedikt Schifferer is a Deep Learning Engineer at NVIDIA working on recommender systems. Prior to that, he earned an MSc in Data Science from Columbia University.