Faster GPU-based Feature Engineering and Tabular Deep Learning Training with NVTabular on Kaggle.com

Published in NVIDIA Merlin · 3 min read · Jan 26, 2021

By Benedikt Schifferer and Even Oldridge

Some exciting news for anyone wanting to explore RAPIDS or NVTabular’s performance on popular tabular datasets: Kaggle recently added RAPIDS support to its Docker container for GPU environments, and because NVTabular is built on RAPIDS cuDF, Kaggle now supports NVTabular in its notebook environment as well.

For those unfamiliar, NVTabular is an open-source library that provides GPU-accelerated feature engineering and preprocessing, as well as an optimized dataloader for tabular deep learning training. Using Dask, which iterates over partitions of data, the library can handle datasets larger than host or GPU memory. This is ideal for data science competitions, where time and computational resources are limited. In the Kaggle environment, with 13GB of host memory and 16GB of GPU memory, we see an ~11x speed-up for feature engineering and preprocessing of data and a ~2x speed-up in training the deep learning model. NVTabular’s dataloader can be used with TensorFlow, PyTorch or FastAI. A detailed benchmark of the TensorFlow dataloader can be found in our other blog post.
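To give a feel for the partition-based processing, here is a minimal sketch of reading data through NVTabular’s Dataset abstraction; the file path, engine and partition size below are illustrative assumptions, not taken from the notebooks.

```python
# Minimal sketch: reading data as Dask partitions with NVTabular.
# The path and part_size are hypothetical placeholders.
import nvtabular as nvt

# nvt.Dataset splits the parquet files into partitions of roughly part_size,
# so the full dataset never needs to fit into GPU or host memory at once.
dataset = nvt.Dataset("train_parquet/", engine="parquet", part_size="256MB")

# Iterate over the partitions as GPU DataFrames (cuDF).
for gdf in dataset.to_iter():
    print(gdf.shape)
```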

We’ve created two Kaggle notebooks to demo the functionality: a short tutorial of NVTabular with “Faster ETL for Tabular Data”, and a demonstration of the performance improvements of the NVTabular dataloader with “Faster FastAI Tabular Deep Learning”. Note that we use the new, easier NVTabular API, which can be installed from GitHub and will be released in v0.4 in February 2021.

In the example Faster ETL for Tabular Data, we provide a short introduction to NVTabular and show how to GPU-accelerate your ETL for arbitrarily large datasets, preparing a subset of the Criteo click ads prediction dataset to train a deep learning model.
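As a rough illustration of what such a pipeline looks like in the new API, ops are composed into a graph with the >> operator and the workflow is then fit and applied to the dataset. The column names, op choices and paths below are a hedged sketch, not a copy of the notebook.

```python
# Hedged sketch of a Criteo-style preprocessing workflow with the new
# (v0.4-style) NVTabular API; ops and parameters in the notebook may differ.
import nvtabular as nvt
from nvtabular import ops

CONTINUOUS = [f"I{i}" for i in range(1, 14)]    # 13 integer feature columns
CATEGORICAL = [f"C{i}" for i in range(1, 27)]   # 26 categorical feature columns

cont_features = CONTINUOUS >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.Normalize()
cat_features = CATEGORICAL >> ops.Categorify()

workflow = nvt.Workflow(cont_features + cat_features + ["label"])

train_ds = nvt.Dataset("criteo_train/", engine="parquet")
workflow.fit(train_ds)                                  # compute statistics on the GPU
workflow.transform(train_ds).to_parquet("train_out/")   # write the processed data
```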

In a previous blog post, we sped up the original data pipeline by a factor of 3,800x over a pandas run, reducing the running time from 5 days to 1.9 minutes. Our example, Faster ETL for Tabular Data, uses the same feature engineering, preprocessing and training steps, but on an 11GB subset instead of the full >1.3TB dataset. In the Kaggle environment, we achieve an 11x speed-up compared to a CPU-based dask-pandas version, reducing the ETL time from 35min to 3.2min.

In Faster FastAI Tabular Deep Learning, we show how to speed up FastAI tabular deep learning models by ~2x using the NVTabular dataloader. Tabular datasets have a special structure, and we developed a custom, highly optimized NVTabular dataloader that addresses their unique challenges. In the example, we train the same FastAI TabularModel ~2x faster by using the NVTabular dataloader. In addition, the NVTabular dataloader can stream data from disk, enabling it to train models on larger-than-memory datasets.
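For context, here is roughly how the NVTabular PyTorch dataloader can be wired into FastAI with the ~v0.4 API. The column names, paths, batch size and model sizes are illustrative assumptions, and the exact glue between the loader output and the FastAI Learner (e.g. label casting) may need small adjustments, so treat this as a sketch rather than the notebook’s exact code.

```python
# Hedged sketch: wrapping the NVTabular PyTorch dataloader for FastAI.
# Column names, paths, batch size and model sizes are hypothetical.
import torch
import nvtabular as nvt
from nvtabular.loader.torch import TorchAsyncItr, DLDataLoader
from fastai.data.core import DataLoaders
from fastai.learner import Learner
from fastai.tabular.model import TabularModel

CATS, CONTS, LABELS = ["C1", "C2"], ["I1", "I2"], ["label"]

def make_loader(path, batch_size=65536):
    # TorchAsyncItr builds large batches directly on the GPU, so the outer
    # DLDataLoader is only a thin wrapper without extra collation.
    itr = TorchAsyncItr(nvt.Dataset(path), cats=CATS, conts=CONTS,
                        labels=LABELS, batch_size=batch_size)
    return DLDataLoader(itr, batch_size=None, collate_fn=lambda x: x, num_workers=0)

dls = DataLoaders(make_loader("train_out/"), make_loader("valid_out/"))
model = TabularModel(emb_szs=[(1000, 16), (1000, 16)], n_cont=len(CONTS),
                     out_sz=2, layers=[512, 256]).cuda()
learner = Learner(dls, model, loss_func=torch.nn.CrossEntropyLoss())
learner.fit(1, 1e-2)
```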

Copy and try out the notebooks on your favorite tabular dataset and let us know how it goes. Each notebook takes less than 10 minutes to read and execute end-to-end. We think NVTabular is valuable for the development of tabular deep learning models and are excited to see creative uses of it. Check out our GitHub repository for updates, and if you come up with your own feature engineering or preprocessing op that you think would be beneficial, please create a feature request or submit a PR. We’d love to see community contributions to help grow the functionality of the library.

Benedikt Schifferer is a Deep Learning Engineer at NVIDIA working on recommender systems. Prior to that, he graduated with an M.S. in Data Science from Columbia University.