Two Ways to Profile PyTorch Models on a Remote Server

Elena Neroslavskaya
Published in PyTorch · Jul 16, 2021

Remote PyTorch profiling

With the recent release of PyTorch Profiler, troubleshooting deep learning model performance becomes much easier and more accessible to developers and data scientists. It brings CPU- and GPU-level information together in a single view and makes it easy to correlate PyTorch operators with GPU kernel invocations and utilization. Data scientists commonly run their training on powerful GPU-enabled machines in the cloud. The profiler can be enabled during these runs to generate traces, and it is helpful to be able to visualize the results in a local environment.

Another use case for remote profiling is when users are working on Windows machines. At the time of this writing, PyTorch Profiler does not support collecting and correlating GPU information on Windows; our engineers are working to bring this functionality on par with Linux machines. In the meantime, data scientists can run and collect traces on Linux machines while working with the visualizations locally.

In this article I’ll show you two ways to run your profiling session on Linux, collect all the traces, and visualize the results from your regular Windows machine.

Use VS Code TensorBoard Integration and Remote-SSH extension

One option is to run model training on a powerful GPU-enabled Linux machine and connect to it using the VS Code Remote-SSH extension. It lets you use any remote machine with an SSH server as your development environment. This can greatly simplify development and help with troubleshooting and profiling PyTorch models.

Remote profiling using VS Code Remote SSH extension

Once your Linux machine is up and running and you are connected to it via SSH, install the Python extension on the remote machine. It not only provides the best experience when working with Python files and Jupyter notebooks, it also comes integrated with TensorBoard and the PyTorch Profiler plugin. Launch TensorBoard and point it to the directory that contains the profiler traces.

VS Code Remote Profiler Setup

To set up profiler traces for the session, you can use the profiler context manager API, as shown in the following code snippet:

import torch

def output_fn(p):
    p.export_chrome_trace("./trace/resnet50_4/worker0.pt.trace.json")

# add the context manager around the training loop
with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU,
                torch.profiler.ProfilerActivity.CUDA],
    on_trace_ready=output_fn,
    record_shapes=True,
    with_stack=True
) as p:
    # training loop goes here
    ...

More details on adding the profiler to PyTorch code can be found in the Profiler Tutorial.
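As an alternative to writing your own on_trace_ready callback, torch.profiler also provides tensorboard_trace_handler, which writes traces in the directory layout the TensorBoard Profiler plugin expects. A minimal sketch, with ./trace/resnet50_4 as an example log directory:

import torch

# write traces directly in the layout the TensorBoard Profiler plugin reads
with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU,
                torch.profiler.ProfilerActivity.CUDA],
    on_trace_ready=torch.profiler.tensorboard_trace_handler("./trace/resnet50_4"),
    record_shapes=True,
    with_stack=True
) as p:
    # training loop goes here
    ...

You can then point TensorBoard at ./trace/resnet50_4 from the VS Code integrated terminal.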

Stream Profiler Logs to Cloud Storage (Azure Blob, S3) and Use TensorBoard Locally

Another possible setup does not use VS Code and does not require opening and connecting to the virtual machine’s SSH port (which might be problematic in some environments). The approach is to run the model on a powerful GPU-enabled Linux machine (or multiple machines, as the profiler now supports distributed training) and direct the profiling session traces to remote storage. Azure Blob, Amazon S3, and Google Cloud Storage are supported. In this release, PyTorch Profiler supports authenticated access to Azure Blob and S3 buckets, and anonymous access for GCP storage, which will be updated in a future release.

Remote profiling with distributed log collection

Mount Storage to Training VM

Major cloud providers offer a way to mount cloud storage such as S3 or Azure Blob on Linux machines as a local drive, enabling typical file operations. For the PyTorch profiler to write traces to a local path, we need to mount the storage and point export_chrome_trace to the mounted drive. In our example we will use Azure Blob, but you can find similar approaches for S3 and GCP described in the profiler GitHub repo.

Create an Azure Storage account and follow the steps described in the Microsoft docs on How to mount Blob storage as a file system with blobfuse. The blobfuse driver mounts an Azure Blob container on the Linux VM as a local drive. You can also use the bash script example from here.
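Once the container is mounted, point export_chrome_trace at the mounted path so the traces land in the Blob container. A minimal sketch, assuming a hypothetical mount point of /mnt/blobfuse:

def output_fn(p):
    # /mnt/blobfuse is the (hypothetical) blobfuse mount point from the step above;
    # anything written here is persisted to the Azure Blob container
    p.export_chrome_trace("/mnt/blobfuse/resnet50_4/worker0.pt.trace.json")

Pass output_fn as on_trace_ready in the profiler context manager, exactly as in the first snippet.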

Set Up TensorBoard

Once the profiling session is done, set up TensorBoard in your local Windows environment (more details can be found in the GitHub repo). Here are the steps to install TensorBoard and the PyTorch Profiler plugin to enable visualization of traces from Azure Blob:

# install tensorboard >= 2.4.1
pip install tensorboard

# install the PyTorch Profiler plugin with the relevant storage option
# the command below installs the plugin plus the azure-storage-blob package to read from Blob
pip install torch-tb-profiler[blob]

# for GCP or S3, use the following options
pip install torch-tb-profiler[gs]
pip install torch-tb-profiler[s3]

Then run TensorBoard, pointing it to the storage location that holds the logs. In the example below we use publicly available demo logs (for private Blob access, set the AZURE_STORAGE_CONNECTION_STRING environment variable):

tensorboard --logdir=https://torchtbprofiler.blob.core.windows.net/torchtbprofiler/demo/memory_demo --bind_all

Use the --bind_all option if the browser is not running on the same machine where you start TensorBoard, for example if your Python/TensorBoard environment is running in WSL (Windows Subsystem for Linux).
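If your traces live in a private Blob container rather than the public demo container above, the plugin picks up the connection string from the environment. A minimal sketch of setting it from Python before launching TensorBoard (the connection-string value and container URL are hypothetical placeholders):

import os
import subprocess

# hypothetical connection string for a private storage account
os.environ["AZURE_STORAGE_CONNECTION_STRING"] = (
    "DefaultEndpointsProtocol=https;AccountName=myaccount;"
    "AccountKey=<key>;EndpointSuffix=core.windows.net"
)

# launch TensorBoard against the container that holds the traces;
# the child process inherits the environment variable set above
subprocess.run([
    "tensorboard",
    "--logdir=https://myaccount.blob.core.windows.net/mycontainer/trace",
    "--bind_all",
])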

Access TensorBoard in the browser, typically at localhost:6006, or <machinename>:6006 if remote.

Happy Profiling.

Note: prior to PyTorch 1.9, users trying to run GPU profiling on Windows received “AssertionError: Requested Kineto profiling but Kineto is not available, make sure PyTorch is built with USE_KINETO=1”. In PyTorch 1.9, the profiler falls back to CPU profiling seamlessly.
