PyTorch sessions at NVIDIA GTC March 20, 2023

Shashank Prasanna
Published in PyTorch · 5 min read · Mar 17, 2023

Talks by PyTorch and Meta researchers and contributors at NVIDIA GTC 2023

NVIDIA's GPU Technology Conference (GTC) is one of those unique conferences that sits at the intersection of innovations in AI research and AI hardware & infrastructure. PyTorch's core strengths have always been ease of use and strong GPU acceleration, and, as you'd expect, there are several exciting PyTorch and performance-related talks at NVIDIA GTC 2023.

Below, I’ve compiled a list of PyTorch sessions at GTC that you won't want to miss. I’ve included additional resources for each talk, including project web pages, videos, GitHub repos, and more for your continued learning. And of course GTC virtual is free, so make sure you register and add the following PyTorch sessions to your calendar.

Compile and Train with 43% Speedup using PyTorch 2.0 [S52422]

When: Tuesday, Mar 21 12:00 PM — 12:50 PM PDT

In this talk, Soumith Chintala, co-creator of PyTorch, and Jason Ansel, PyTorch compiler research scientist, will discuss the latest innovations in PyTorch 2.0. Flexibility and ease of use have always been top priorities for PyTorch, and PyTorch 2.0 keeps both while delivering better performance and remaining fully backward compatible. Soumith and Jason will share details about the new under-the-hood compiler technologies — TorchDynamo, AOTAutograd, and TorchInductor — and how they accelerate 93% of 163 popular benchmark models. You can learn more on the PyTorch 2.0 getting started page, and catch the Ask the Expert livestreams on the PyTorch YouTube channel for a deeper dive into PyTorch 2.0 features.
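If you want a feel for the workflow ahead of the talk, here is a minimal sketch of the PyTorch 2.0 opt-in. The tiny model and shapes are placeholders; torch.compile is the documented 2.0 entry point, with TorchDynamo, AOTAutograd, and TorchInductor doing the work under the hood:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# One-line opt-in: TorchDynamo captures the graph, TorchInductor generates kernels.
compiled_model = torch.compile(model)

x = torch.randn(64, 128)
loss = compiled_model(x).sum()
loss.backward()   # AOTAutograd traces the backward pass as well
opt.step()
```

The rest of the training loop stays exactly as it was — that backward compatibility is the point of the 2.0 design.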

Training and Productionizing LLMs with PyTorch on AWS (Presented by Amazon Web Services) [S52388]

In this talk, Ankur Srivastava from AWS and Hamid Shojanazeri from the PyTorch team at Meta will discuss large language models (LLMs). AWS offers many services for distributed training, along with a range of compute, storage, and networking options to match your performance and cost needs. Deploying LLMs in production is challenging, since generation essentially involves many iterations of the forward pass to produce a single output. They will also discuss advancements such as accelerated PyTorch Transformers that help run such models more cost-effectively in production. If you’re an existing AWS customer or want to learn how to scale LLMs on AWS, this is the talk for you.
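As a taste of what accelerated PyTorch Transformers looks like in code, here is a minimal sketch using the scaled_dot_product_attention API added in PyTorch 2.0; the shapes and dtype are illustrative only:

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Shapes are illustrative: (batch, num_heads, seq_len, head_dim)
q = torch.randn(2, 8, 1024, 64, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Dispatches to a fused kernel (e.g. FlashAttention) when the device, dtype,
# and shapes allow it; otherwise falls back to the regular math implementation.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Modules such as nn.TransformerEncoderLayer pick up the same fused kernels automatically in PyTorch 2.0, which is where much of the inference cost saving comes from.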

Here are some additional resources on scaling and hosting PyTorch on AWS:

Exploring Next-Generation Methods for Optimizing PyTorch Models for Inference with Torch-TensorRT [S51714]

When: Thursday, Mar 23 9:00 AM — 9:50 AM PDT

NVIDIA TensorRT is NVIDIA’s high-performance inference compiler and runtime that runs on all NVIDIA platforms, from the data center to edge devices. In this talk, Naren (NVIDIA) and Wei Wei (Meta) will discuss PyTorch model deployment using the latest compiler technologies in PyTorch 2.0 combined with NVIDIA TensorRT. Previously, Torch-TensorRT relied on TorchScript models to optimize for target GPU hardware. With TorchDynamo, deployment gets easier: tracing is more accurate than with TorchScript, and any modification of the source model happens entirely in Python.

If you’re deploying models to NVIDIA targets, the presenters will demonstrate how you can start experimenting with FX, Dynamo, and TensorRT today to get a preview of the direction Torch-TensorRT is headed.
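For a rough idea of what that workflow looks like today, here is a hedged sketch using the torch_tensorrt.compile entry point. The torchvision model is just a placeholder, an NVIDIA GPU is assumed, and exact arguments may differ across Torch-TensorRT releases:

```python
import torch
import torch_tensorrt
import torchvision.models as models

model = models.resnet50().eval().cuda()

# Compile the model for TensorRT, allowing FP16 kernels where profitable.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.half},
)

x = torch.randn(1, 3, 224, 224, device="cuda")
with torch.no_grad():
    out = trt_model(x)
```

The talk covers how the FX and Dynamo frontends change this picture relative to the older TorchScript path.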

Automated Pipeline Parallelism for PyTorch with Compiler Techniques [S51254]

When: Thursday, Mar 23 8:00 AM — 8:25 AM PDT

One of the challenges of model-parallel training is underutilization of GPUs caused by naively splitting sequential models across devices. Pipeline parallelism addresses this by further splitting input mini-batches into micro-batches and pipelining their execution across multiple GPUs. In this talk, Ke Wen (Meta), Vaibhav Singh, and Isaack Karanja (Google Cloud Platform) will discuss PiPPy, a library that provides automated pipeline parallelism for PyTorch models. PiPPy consists of a compiler stack that can automatically split a model into stages without intrusive code changes, plus a distributed runtime that places the split stages on multiple devices and hosts and orchestrates micro-batch execution in a pipelined, concurrent fashion. They will demonstrate the use of PiPPy for Hugging Face models on the cloud.
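To make the micro-batching idea concrete, here is a toy, hand-written sketch of what a pipeline runtime like PiPPy automates; the two-stage model is a placeholder, and the loop below runs the stages sequentially rather than concurrently:

```python
import torch
import torch.nn as nn

# Two "pipeline stages" of a sequential model; in a real setup each stage
# would live on a different GPU (or host), and PiPPy would do the split for you.
stage0 = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
stage1 = nn.Sequential(nn.Linear(512, 10))

minibatch = torch.randn(32, 512)
microbatches = torch.chunk(minibatch, chunks=4)  # 4 micro-batches of 8

# Each micro-batch flows through both stages. A real pipeline runtime overlaps
# stage0 on micro-batch i+1 with stage1 on micro-batch i to keep every GPU busy.
outputs = [stage1(stage0(mb)) for mb in microbatches]
result = torch.cat(outputs)
```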

Learn more about pipeline parallelism in PyTorch:

Speech-To-Speech Translation System for a Real-World Unwritten Language [S51780]

When: Tuesday, Mar 21 1:00 PM — 1:25 PM PDT

A lot of machine translation work focuses on written languages. Did you know there are over 3,500 languages in the world that are primarily oral and do not have a standard or widely used writing system? How do you train machine translation models when you don’t have troves of written text for training data? In this talk, Peng-Jen Chen will share his research on speech-to-speech translation for such spoken-only languages. Using English-Taiwanese Hokkien as a case study, Peng-Jen and team show that they can successfully build a speech-to-speech translation system for the language pair using only the speech modality. Compared to text, speech has more variability to model and is more fine-grained, and therefore requires more computation. Attend this talk if you’re interested in machine translation and in how to accelerate training with PyTorch on GPUs to build high-quality speech-to-speech translation systems for a non-written language.

Additional resources:

Fast and Scalable Training of Deep Learning Recommendation Models [S51234]

When: Wednesday, Mar 22 2:00 PM — 2:25 PM PDT

Deep learning recommendation systems deliver a curated set of product and service recommendations to end users. But when the number of items in a catalog grows large, it poses unique computational challenges: sparse data accesses, large model sizes, and high computational complexity. In this talk, Sarunya Pumma, software engineer at Meta, will discuss the computational challenges behind these workloads and the core technologies that enable recommendation systems at scale, such as PyTorch FBGEMM. Attend this talk to learn about the GPU optimization approaches and techniques required for performance in both inference and training.
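To see why sparse data access is the bottleneck, here is a toy sketch of the embedding lookups that dominate these models; nn.EmbeddingBag stands in for the fused, batched GPU kernels that FBGEMM provides, so this illustrates the access pattern rather than the optimized implementation:

```python
import torch
import torch.nn as nn

# A huge embedding table touched at only a handful of indices per example —
# the characteristic sparse access pattern of DLRM-style recommendation models.
num_items, dim = 1_000_000, 64
table = nn.EmbeddingBag(num_items, dim, mode="sum", sparse=True)

# Ragged batch of item-id lists: flattened ids plus per-example offsets.
ids = torch.tensor([12, 47, 901234, 7, 7, 55321])
offsets = torch.tensor([0, 2, 5])   # example boundaries: [0:2), [2:5), [5:6)
pooled = table(ids, offsets)        # shape: (3, 64), one pooled vector per example
```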

Additional resources:

More PyTorch talks by the community!

  • Accelerate ML Performance and Simplify Development with OpenXLA [S51689]
  • FP8 Mixed-Precision Training with Hugging Face Accelerate [S51370]

Join us!

Hope you enjoy these PyTorch talks at GTC. Remember to register for free and bookmark these talks. The PyTorch 2.0 stable release is now generally available; read more about it in the release blog post: https://pytorch.org/blog/pytorch-2.0-release/

Head over to the getting started page to install it and share your feedback with us on GitHub issues or https://discuss.pytorch.org.


Talking Engineer. Runner. Coffee Connoisseur. ML @ Modular. Formerly ML @Meta, AWS, NVIDIA, MATLAB, posts are my own opinions. website: shashankprasanna.com