Configuring Native Azure ML Logging with PyTorch Lightning

Aaron (Ari) Bornstein
Published in Microsoft Azure
Oct 20, 2020 · 5 min read
Combining Azure and Lightning leads to more powerful logging

TL;DR: This post demonstrates how to connect PyTorch Lightning logging to Azure ML natively with MLflow.

Full end-to-end implementations can be found on the official Azure Machine Learning GitHub repo.

If you are new to Azure, you can get started with a free subscription using the link below.

Azure ML and PyTorch Lightning

In my last post on the subject, I outlined the benefits of both PyTorch Lightning and Azure ML for simplifying the training of deep learning models. If you haven’t yet, check it out!

Once you’ve trained your first distributed PyTorch Lightning model with Azure ML, it is time to add logging.

Why do we care about logging?

Logs are critical for troubleshooting and tracking the performance of machine learning models. Since we often train on remote clusters, logs provide a simple mechanism for understanding what is going on at each phase of model development.

As opposed to simple print statements, logs are time stamped, can be filtered by severity, and are used by Azure ML to visualize critical metrics during training, validation, and testing. Logging metrics with Azure ML is also a prerequisite for using the Azure ML HyperDrive service to help find optimal model configurations.

Logging is a perfect demonstration of how PyTorch Lightning and Azure ML combine to simplify model training: just by using Lightning we can save ourselves dozens of lines of PyTorch code and gain readability in the process.

Logging with PyTorch Lightning

In vanilla PyTorch, keeping track of and maintaining logging code can get complicated very quickly.

ML frameworks and services such as Azure ML, TensorBoard, TestTube, Neptune.ai, and Comet ML each have their own logging APIs. This means that ML engineers often need to maintain multiple log statements at each phase of training, validation, and testing.
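For illustration only, here is a hedged sketch of what a vanilla PyTorch training loop can end up looking like when it has to feed two backends (TensorBoard and MLflow) by hand; the model, optimizer, loss function, data loader, and metric names here are assumptions, not code from this post:

import mlflow
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("tb_logs")  # TensorBoard writer

# model, optimizer, loss_fn, and train_loader are assumed to be defined elsewhere
for step, (x, y) in enumerate(train_loader):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    # The same metric has to be reported once per backend:
    writer.add_scalar("train_loss", loss.item(), step)
    mlflow.log_metric("train_loss", loss.item(), step=step)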

PyTorch Lightning simplifies this process by providing a unified logging interface that comes with out-of-the-box support for the most popular machine learning logging APIs.

Multiple loggers can even be chained together, which greatly simplifies your code.

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger, TestTubeLogger

logger1 = TensorBoardLogger('tb_logs', name='my_model')
logger2 = TestTubeLogger('tb_logs', name='my_model')
trainer = Trainer(logger=[logger1, logger2])

Once loggers are provided to a PyTorch Lightning Trainer, they can be accessed in any lightning_module_function_or_hook outside of __init__.

from pytorch_lightning import LightningModule

class MyModule(LightningModule):
    def some_lightning_module_function_or_hook(self):
        some_img = fake_image()
        # Option 1: access the experiment list on the chained logger
        self.logger.experiment[0].add_image('generated_images', some_img, 0)
        # Option 2: index into the logger collection directly
        self.logger[0].experiment.add_image('generated_images', some_img, 0)

Azure ML Logging with PyTorch Lightning and MLflow

Since Azure ML has native integration with MLflow, we can take advantage of PyTorch Lightning’s MLFlowLogger module to get native metric visualizations across multiple experiment runs and utilize HyperDrive with very minor changes to our training code.

Below I’ll outline the code needed to take advantage of Azure ML logging with PyTorch Lightning.

Step #1 Environment

Add the PyTorch Lightning, Azure ML, and MLflow packages to the run environment.

pip:
  - azureml-defaults
  - mlflow
  - azureml-mlflow
  - pytorch-lightning
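If you prefer to define the environment in Python rather than in a conda dependencies file, the same packages can be declared with the Azure ML SDK. This is a minimal sketch; the environment name is an illustrative assumption:

from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

# "pl-mlflow-env" is a hypothetical name; use whatever fits your workspace
env = Environment(name="pl-mlflow-env")
env.python.conda_dependencies = CondaDependencies.create(
    pip_packages=[
        "azureml-defaults",
        "mlflow",
        "azureml-mlflow",
        "pytorch-lightning",
    ]
)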

Step #2 Get the Azure ML Run Context and MLflow Tracking URI

from azureml.core.run import Run

run = Run.get_context()
mlflow_url = run.experiment.workspace.get_mlflow_tracking_uri()

Step #3 Initialize the PyTorch Lightning MLFlowLogger and Link the Run ID

from pytorch_lightning.loggers import MLFlowLogger

mlf_logger = MLFlowLogger(experiment_name=run.experiment.name, tracking_uri=mlflow_url)
# Point the logger at the existing Azure ML run so metrics land in the same run record
mlf_logger._run_id = run.id

Step #4 Add Logging Statements to the PyTorch Lightning training_step, validation_step, and test_step Hooks

def training_step(self, batch, batch_idx):
    # Calculate train loss here
    self.log("train_loss", loss)
    # return train loss

def validation_step(self, batch, batch_idx):
    # Calculate validation loss here
    self.log("val_loss", loss)
    # return validation loss

def test_step(self, batch, batch_idx):
    # Calculate test loss here
    self.log("test_loss", loss)
    # return test loss

Step #5 Add the MLFlowLogger to the PyTorch Lightning Trainer

trainer = pl.Trainer.from_argparse_args(args)
trainer.logger = mlf_logger  # enjoy default logging implemented by pl!
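Putting the steps together, the relevant part of a training script might look like the following minimal sketch; MyModule and args stand in for your own LightningModule and argument parsing and are assumptions rather than code from the snippets above:

import pytorch_lightning as pl
from azureml.core.run import Run
from pytorch_lightning.loggers import MLFlowLogger

# Step #2: grab the Azure ML run context and MLflow tracking URI
run = Run.get_context()
mlflow_url = run.experiment.workspace.get_mlflow_tracking_uri()

# Step #3: create the logger and link it to the current Azure ML run
mlf_logger = MLFlowLogger(experiment_name=run.experiment.name, tracking_uri=mlflow_url)
mlf_logger._run_id = run.id

# Step #5: attach the logger to the Trainer and train
model = MyModule()  # hypothetical LightningModule with the hooks from Step #4
trainer = pl.Trainer.from_argparse_args(args)  # args parsed elsewhere in the script
trainer.logger = mlf_logger
trainer.fit(model)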

And there you have it!

Now when you submit your PyTorch Lightning training script you will get real-time visualizations and metrics that HyperDrive can use as inputs at train, validation, and test time, with a fraction of the normally required code.
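For reference, submitting the script as an Azure ML experiment can look like the sketch below; the script name, compute target, and experiment name are illustrative assumptions, and env is the Environment from Step #1:

from azureml.core import Experiment, ScriptRunConfig, Workspace

ws = Workspace.from_config()  # assumes a config.json for your workspace is available

src = ScriptRunConfig(
    source_directory=".",          # folder containing the training script
    script="train.py",             # hypothetical script name
    compute_target="gpu-cluster",  # hypothetical compute target
    environment=env,               # the Environment defined in Step #1
)

run = Experiment(ws, "pl-mlflow-demo").submit(src)  # hypothetical experiment name
run.wait_for_completion(show_output=True)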

You shouldn’t run into any issues, but if you do, let me know in the comments.

Next Steps

In the next post, I will show you how to configure multi-node distributed training with PyTorch and Azure ML using low-priority compute instances to reduce training cost by an order of magnitude.

Acknowledgements

I want to give a major shout-out to Minna Xiao and Alex Deng from the Azure ML team for their support and commitment to working towards a better developer experience with open-source frameworks such as PyTorch Lightning on Azure.

About the Author

Aaron (Ari) Bornstein is an AI researcher with a passion for history, engaging with new technologies, and computational medicine. As an Open Source Engineer on Microsoft’s Cloud Developer Advocacy team, he collaborates with the Israeli hi-tech community to solve real-world problems with game-changing technologies that are then documented, open sourced, and shared with the rest of the world.
