Learn how Meta, Alibaba, ASOS, and Kuaishou Technology are delivering recommendations at scale (at GTC Spring 2023)

Radek Osmulski
4 min read · Mar 8, 2023


Over the last couple of months, the Merlin team has been all about aligning the features of our software with customer needs and developing world-class support for session-based recommendations.

Additionally, we have worked hard to bring you updated tutorials to make jumping into using our framework even easier!

At the upcoming GTC (join us online for free, March 20–23, 2023), several companies using NVIDIA GPUs and NVIDIA software will share how they are meeting business needs by accelerating and expanding their recommender system workflows!

GTC Personalization with NVIDIA Merlin

Before we get started — exciting news from the Merlin Team!

The GTC web portal is now using the Merlin Framework to recommend talks! This is a session-based solution that leverages Large Language Models for title and abstract preprocessing.

Please see the results in the screenshot below. If you would like to learn more about how we are doing recommendations ourselves, please tune in to a talk on this subject by our colleagues.

How are companies using Merlin in their recommender system pipelines?

Fast and Scalable Training of Deep Learning Recommendation Models — Sarunya Pumma is a software engineer on the AI System Co-Design team at Meta. In this talk, she outlines the importance of GPUs in powering Meta’s key applications, the unique computational challenges associated with these personalization and recommendation tasks, and how the kernel library FBGEMM-GPU solves them with various GPU optimization techniques.

Serving Large Recommender Models with 10x Performance Gain — Xiao Liang, a Software Architect at Kuaishou Technology, shares how Kuaishou Technology achieved an average 10x performance gain on their mainstream models using various techniques, including Tensor Cores and GPU-based caching.

Implementing Model Serving at Scale — a team of machine learning engineers from ASOS (Rick Bruins and Neha Patel) will walk us through how to serve multiple models at scale with Triton, A/B test models, serve ensemble models, and monitor progress in production. They will also discuss the MLOps processes and the importance of a cross-functional approach from concept to deployment, and share performance results to illustrate the impact of this approach.

DeepRec: Toward High-Performance Recommendation Deep Learning Framework with GPU Acceleration — Tongxuan Liu, Staff Engineer at Alibaba, and Shijie Liu from NVIDIA will introduce DeepRec, which addresses the challenges of training deep learning models effectively and efficiently. The solutions that will be covered include a graph-based GPU memory allocator, a hybrid distributed training framework, and a multi-stream/CUDA Graph-based GPU runtime.

Other talks and tutorials from the Merlin team

We have prepared even more talks by our partners and by us! Please find a couple of selected talks below:

Building Session-Based Recommendation Models with NVIDIA Merlin — experts from the NVIDIA Merlin team (Ronay Ak, Senior Data Scientist; Sara Rabhi, Senior Research Scientist; and Benedikt Schifferer, Deep Learning Engineer) will walk us through all the steps necessary to train a session-based recommendation model using the Merlin Framework! Topics covered will include the main concepts of session-based recommendations, feature creation, and model training and evaluation!

Merlin Updates — Build and Deploy Recommender Systems at Any Scale — come listen to our colleagues (Wenwen Gao, Product Manager, and Angel Martinez, Deep Learning Engineer) discuss the latest Merlin updates! Learn how we at NVIDIA do recommendations ourselves, from the high-level concepts to the technical implementation. Additionally, they will cover the Grace Hopper Superchip architecture and the benefits it can provide for accelerating your workflows!

Using GNNs in LinkedIn Recommendation Systems — In this talk, Shihai He, a Staff Software Engineer at LinkedIn, will share with us how LinkedIn developed a large graph neural network model to learn users’ professional network information. He will discuss how the model was developed and how it fits into the overall RecSys strategy at LinkedIn!

Optimizing Data Systems for Merlin and Triton — Sam Partee, a Principal ML Engineer at Redis, will walk us through how to use Redis as high-performance data storage for Triton and Merlin inference pipelines. The techniques he will share can have a dramatic impact on the latency of machine learning inference. Among other topics, Sam will discuss intelligent inference response caching with NVIDIA Triton and how to integrate Triton with a data store.

Accelerate AI Innovation with Unmatched Cloud Scale and Performance — a team from Microsoft (Nidhi Chappell, Partner General Product Manager, and Kathleen Mitford, CVP, Azure Marketing) will tell us about Azure’s AI platform, its latest updates, and customer experiences deploying AI RecSys models at scale!


NVIDIA GTC Spring 2023 is right around the corner! You can register online for free to reserve your spot here.

The conference will feature a lineup of great speakers including Demis Hassabis (DeepMind), Ilya Sutskever (OpenAI), Anima Anandkumar (NVIDIA), and of course Jensen Huang, the CEO of NVIDIA, who will deliver the keynote on the broader state of GPU acceleration and the technical breakthroughs happening now across multiple industries.

See you at GTC!

And for further details on how we personalize GTC, please check out our blog post about the email use-case!



Radek Osmulski

I ❤️ ML / DL ideas — I tweet about them / write about them / implement them. Recommender Systems at NVIDIA