Custom Logging in Azure Data Factory and Azure Synapse Analytics

Patrick Pichler
Creative Data
Nov 9, 2022

How to implement a custom logger for Azure Data Factory and Azure Synapse Analytics by querying pipeline run information via REST API

Introduction

Proper, easily accessible logging around your data integration pipelines is the very first step towards data accuracy. Azure Data Factory and Azure Synapse Analytics offer built-in integration with Azure Log Analytics, which is powerful if you want all your Azure logging in one place. However, it requires a certain amount of setup time and expertise, which can be quite an overhead for simple pipeline logging. Furthermore, you might want to add information that isn't available out-of-the-box in Log Analytics. This article shows you an alternative way to implement a logging solution on your own by retrieving pipeline runs via REST API and querying them afterwards using the Serverless SQL pool.

Pipeline Template

Overview

Both Azure Data Factory and Azure Synapse Analytics provide a REST API endpoint called queryPipelineRuns, which allows you to query pipeline runs in your factory or workspace based on various input filter conditions. This is essentially the same information you have available in the Monitor section under Pipeline runs:

Image by Author

This pipeline template helps you retrieve this information as part of your pipeline and store it in an attached Azure storage account for querying afterwards. It is deliberately stored in JSON format so as not to alter the original response message coming from the REST API:

Image by Author
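For orientation, a queryPipelineRuns response document has roughly the following shape (abridged; the real payload carries more fields per run, the pipeline and trigger names here are made up, and a continuationToken only appears when further result pages exist):

{
    "value": [
        {
            "runId": "<guid>",
            "pipelineName": "PL_Load_Sales",
            "runStart": "2022-11-09T06:00:01.0000000Z",
            "runEnd": "2022-11-09T06:04:37.0000000Z",
            "durationInMs": 276000,
            "status": "Succeeded",
            "invokedBy": {
                "name": "DailyTrigger",
                "invokedByType": "ScheduleTrigger"
            }
        }
    ],
    "continuationToken": "<token>"
}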

The logging pipeline itself basically consists of a single Copy activity that always captures the latest information by using the wrapper pipeline's start timestamp as a filter criterion, making sure that all information from start to end of the actual pipeline is retrieved. This is why it's a good idea to build a wrapper pipeline around it in which you pass the system variable @pipeline().TriggerTime on to the template's filter parameter lastUpdateAfter, as you can see in the screenshot below and in the sketch that follows it. Any other filtering can still be done afterwards.

Image by Author
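As a sketch, the relevant fragment of such an Execute Pipeline activity could look like the following JSON; the activity and pipeline names are hypothetical, only the parameter mapping matters:

{
    "name": "Execute logging pipeline",
    "type": "ExecutePipeline",
    "typeProperties": {
        "pipeline": {
            "referenceName": "PL_QueryPipelineRuns",
            "type": "PipelineReference"
        },
        "waitOnCompletion": true,
        "parameters": {
            "lastUpdateAfter": {
                "value": "@pipeline().TriggerTime",
                "type": "Expression"
            }
        }
    }
}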

Config

The template comes with two linked services and two associated datasets: firstly, Azure Data Lake Storage Gen2 with a JSON-format dataset, and secondly, the REST API endpoint.

The Azure Data Lake Storage Gen2 account can essentially be accessed in any way you like, though the simplest way would probably be to use the (System-Assigned) Managed Identity of either Azure Data Factory or Azure Synapse Analytics; it just needs to have the Storage Blob Data Contributor role assigned to access the linked Azure storage. This would also allow you to reuse the existing dataset properties provided in the template for FileSystem, Directory and FileName:

Image by Author

The configuration for the REST endpoint varies slightly between Azure Data Factory and Azure Synapse Analytics; choose the appropriate config below when importing the template.

Azure Data Factory:

  • Base URL: https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/queryPipelineRuns?api-version=2018-06-01
  • Authentication type: System Assigned Managed Identity
  • AAD resource: https://management.azure.com

Azure Synapse Analytics:

  • Base URL: {endpoint}/queryPipelineRuns?api-version=2020-12-01
  • Authentication type: System Assigned Managed Identity
  • AAD resource: https://dev.azuresynapse.net

Image by Author
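Put together, the Data Factory variant of the REST linked service would look roughly like the following JSON definition; the linked service name is hypothetical and the URL segments are placeholders:

{
    "name": "LS_REST_QueryPipelineRuns",
    "properties": {
        "type": "RestService",
        "typeProperties": {
            "url": "https://management.azure.com/subscriptions/<subscriptionId>/resourceGroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<factoryName>/queryPipelineRuns?api-version=2018-06-01",
            "enableServerCertificateValidation": true,
            "authenticationType": "ManagedServiceIdentity",
            "aadResourceId": "https://management.azure.com"
        }
    }
}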

Additionally, you can adjust the Request body of the POST request to your needs. You will find more information in the official documentation of the corresponding REST API for either Azure Data Factory or Azure Synapse Analytics.

Image by Author
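For instance, a body restricting the result to failed runs within a given time window could look like this; in the template, the two timestamps would be supplied via the filter parameters rather than hard-coded:

{
    "lastUpdatedAfter": "2022-11-08T00:00:00.0000000Z",
    "lastUpdatedBefore": "2022-11-09T00:00:00.0000000Z",
    "filters": [
        {
            "operand": "Status",
            "operator": "Equals",
            "values": [ "Failed" ]
        }
    ]
}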

Serverless SQL pool — Query runs

Once you have stored the JSON files in your Azure storage, the best and most efficient way to query them is with the Azure Synapse Analytics Serverless SQL pool. You can reuse the following code, which gives you a view similar to the Monitor section within Azure; you just need to adjust the path in the BULK section of OPENROWSET. Feel free to make any other adjustments.

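A minimal sketch of such a query, assuming each stored file contains exactly one queryPipelineRuns response document; the storage path is a placeholder, and the parsed columns can be extended to any other response fields you need:

-- Monitor-like view over the stored queryPipelineRuns responses.
-- The storage account, filesystem and directory are placeholders; point
-- the BULK path to wherever the Copy activity writes its JSON files.
SELECT
    runs.pipelineName,
    runs.runId,
    runs.runStart,
    runs.runEnd,
    runs.durationInMs,
    runs.status
FROM OPENROWSET(
        BULK 'https://<storageaccount>.dfs.core.windows.net/<filesystem>/<directory>/*.json',
        FORMAT = 'CSV',
        FIELDTERMINATOR = '0x0b',
        FIELDQUOTE = '0x0b',
        ROWTERMINATOR = '0x0b'  -- read each whole file as a single row
    ) WITH (doc NVARCHAR(MAX)) AS files
CROSS APPLY OPENJSON(files.doc, '$.value')  -- flatten the runs array
    WITH (
        pipelineName NVARCHAR(200)    '$.pipelineName',
        runId        UNIQUEIDENTIFIER '$.runId',
        runStart     DATETIME2        '$.runStart',
        runEnd       DATETIME2        '$.runEnd',
        durationInMs INT              '$.durationInMs',
        status       NVARCHAR(50)     '$.status'
    ) AS runs
ORDER BY runs.runStart DESC;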

Conclusion

This template can be reused as it is for simple pipeline logging, or extended to any level of complexity. The final view can then easily be integrated into a Power BI report, making it always accessible without having to dig deeper into the Azure portal.
