Configuring Native Azure ML Logging with PyTorch Lightning

Aaron (Ari) Bornstein
Published in Microsoft Azure
Oct 20, 2020 · 5 min read
Combining Azure and Lightning leads to more powerful logging

TL;DR: This post demonstrates how to connect PyTorch Lightning logging to Azure ML natively with MLflow.

Full end-to-end implementations can be found on the official Azure Machine Learning GitHub repo.

If you are new to Azure, you can get started with a free subscription using the link below.

Azure ML and PyTorch Lightning

In my last post on the subject, I outlined the benefits of both PyTorch Lightning and Azure ML for simplifying the training of deep learning models. If you haven’t yet, check it out!

Once you’ve trained your first distributed PyTorch Lightning model with Azure ML, it is time to add logging.

Why do we care about logging?

Logs are critical for troubleshooting and tracking the performance of machine learning models. Since we often train on remote clusters, logs provide a simple mechanism for understanding what is going on at each phase of model development.

As opposed to simple print statements, logs are time stamped, can be filtered by severity, and are used by Azure ML to visualize critical metrics during training, validation, and testing. Logging metrics with Azure ML is also a prerequisite for using the Azure ML HyperDrive service to help find optimal model configurations.

Logging is a perfect demonstration of how PyTorch Lightning and Azure ML combine to simplify model training: just by using Lightning we can save ourselves dozens of lines of PyTorch code and gain readability in the process.

Logging with PyTorch Lightning

In vanilla PyTorch, keeping track of and maintaining logging code can get complicated very quickly.

ML frameworks and services such as Azure ML, TensorBoard, TestTube, Neptune.ai, and Comet ML each have their own logging APIs. This means that ML engineers often need to maintain multiple log statements at each phase of training, validation, and testing.
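For illustration only, here is a hedged sketch of what a vanilla PyTorch training loop can end up looking like when it has to feed two backends (TensorBoard and MLflow) by hand; the model, optimizer, loss function, data loader, and metric names here are assumptions, not code from this post:

import mlflow
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("tb_logs")  # TensorBoard writer

# model, optimizer, loss_fn, and train_loader are assumed to be defined elsewhere
for step, (x, y) in enumerate(train_loader):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    # The same metric has to be reported once per backend:
    writer.add_scalar("train_loss", loss.item(), step)
    mlflow.log_metric("train_loss", loss.item(), step=step)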

PyTorch Lightning simplifies this process by providing a unified logging interface that comes with out-of-the-box support for the most popular machine learning logging APIs.

Multiple loggers can even be chained together, which greatly simplifies your code.

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger, TestTubeLogger

logger1 = TensorBoardLogger('tb_logs', name='my_model')
logger2 = TestTubeLogger('tb_logs', name='my_model')
trainer = Trainer(logger=[logger1, logger2])

Once loggers are provided to a PyTorch Lightning Trainer, they can be accessed in any lightning_module_function_or_hook outside of __init__.

from pytorch_lightning import LightningModule

class MyModule(LightningModule):
    def some_lightning_module_function_or_hook(self):
        some_img = fake_image()
        # Option 1: access the experiment list on the chained logger
        self.logger.experiment[0].add_image('generated_images', some_img, 0)
        # Option 2: index into the logger collection directly
        self.logger[0].experiment.add_image('generated_images', some_img, 0)

Azure ML Logging with PyTorch Lightning and MLflow

Since Azure ML has native integration with MLflow, we can take advantage of PyTorch Lightning’s MLFlowLogger module to get native metric visualizations across multiple experiment runs and utilize HyperDrive with very minor changes to our training code.

Below I’ll outline the code needed to take advantage of Azure ML logging with PyTorch Lightning.

Step #1 Environment

Add the PyTorch Lightning, Azure ML, and MLflow packages to the run environment.

pip:
  - azureml-defaults
  - mlflow
  - azureml-mlflow
  - pytorch-lightning
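If you prefer to define the environment in Python rather than in a conda dependencies file, the same packages can be declared with the Azure ML SDK. This is a minimal sketch; the environment name is an illustrative assumption:

from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

# "pl-mlflow-env" is a hypothetical name; use whatever fits your workspace
env = Environment(name="pl-mlflow-env")
env.python.conda_dependencies = CondaDependencies.create(
    pip_packages=[
        "azureml-defaults",
        "mlflow",
        "azureml-mlflow",
        "pytorch-lightning",
    ]
)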

Step #2 Get the Azure ML Run Context and MLflow Tracking URI

from azureml.core.run import Run

run = Run.get_context()
mlflow_url = run.experiment.workspace.get_mlflow_tracking_uri()

Step #3 Initialize the PyTorch Lightning MLFlowLogger and Link the Run ID

from pytorch_lightning.loggers import MLFlowLogger

mlf_logger = MLFlowLogger(experiment_name=run.experiment.name, tracking_uri=mlflow_url)
# Point the logger at the existing Azure ML run so metrics land in the same run record
mlf_logger._run_id = run.id

Step #4 Add Logging Statements to the PyTorch Lightning training_step, validation_step, and test_step Hooks

def training_step(self, batch, batch_idx):
    # Calculate train loss here
    self.log("train_loss", loss)
    # return train loss

def validation_step(self, batch, batch_idx):
    # Calculate validation loss here
    self.log("val_loss", loss)
    # return validation loss

def test_step(self, batch, batch_idx):
    # Calculate test loss here
    self.log("test_loss", loss)
    # return test loss

Step #5 Add the MLFlowLogger to the PyTorch Lightning Trainer

trainer = pl.Trainer.from_argparse_args(args)
trainer.logger = mlf_logger  # enjoy default logging implemented by pl!
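Putting the steps together, the relevant part of a training script might look like the following minimal sketch; MyModule and args stand in for your own LightningModule and argument parsing and are assumptions rather than code from the snippets above:

import pytorch_lightning as pl
from azureml.core.run import Run
from pytorch_lightning.loggers import MLFlowLogger

# Step #2: grab the Azure ML run context and MLflow tracking URI
run = Run.get_context()
mlflow_url = run.experiment.workspace.get_mlflow_tracking_uri()

# Step #3: create the logger and link it to the current Azure ML run
mlf_logger = MLFlowLogger(experiment_name=run.experiment.name, tracking_uri=mlflow_url)
mlf_logger._run_id = run.id

# Step #5: attach the logger to the Trainer and train
model = MyModule()  # hypothetical LightningModule with the hooks from Step #4
trainer = pl.Trainer.from_argparse_args(args)  # args parsed elsewhere in the script
trainer.logger = mlf_logger
trainer.fit(model)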

And there you have it!

Now when you submit your PyTorch Lightning training script you will get real-time visualizations and metrics that HyperDrive can use as inputs at train, validation, and test time, with a fraction of the normally required code.
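For reference, submitting the script as an Azure ML experiment can look like the sketch below; the script name, compute target, and experiment name are illustrative assumptions, and env is the Environment from Step #1:

from azureml.core import Experiment, ScriptRunConfig, Workspace

ws = Workspace.from_config()  # assumes a config.json for your workspace is available

src = ScriptRunConfig(
    source_directory=".",          # folder containing the training script
    script="train.py",             # hypothetical script name
    compute_target="gpu-cluster",  # hypothetical compute target
    environment=env,               # the Environment defined in Step #1
)

run = Experiment(ws, "pl-mlflow-demo").submit(src)  # hypothetical experiment name
run.wait_for_completion(show_output=True)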

You shouldn’t run into any issues, but if you do, let me know in the comments.

Next Steps

In the next post, I will show you how to configure multi-node distributed training with PyTorch and Azure ML using low-priority compute instances to reduce training cost by an order of magnitude.

Acknowledgements

I want to give a major shout-out to Minna Xiao and Alex Deng from the Azure ML team for their support and commitment to working towards a better developer experience with open-source frameworks such as PyTorch Lightning on Azure.

About the Author

Aaron (Ari) Bornstein is an AI researcher with a passion for history, engaging with new technologies, and computational medicine. As an Open Source Engineer on Microsoft’s Cloud Developer Advocacy team, he collaborates with the Israeli hi-tech community to solve real-world problems with game-changing technologies that are then documented, open sourced, and shared with the rest of the world.
