Custom Logging in Azure Data Factory and Azure Synapse Analytics
How to implement a custom logger for Azure Data Factory and Azure Synapse Analytics by querying pipeline run information via REST API
Introduction
Proper, easily accessible logging around your data integration pipelines is the first step towards data accuracy. With Azure Data Factory and Azure Synapse Analytics, you have built-in functionality to integrate with Azure Log Analytics, which is powerful if you want all your Azure logging in one place. However, it requires some setup time and expertise, which can be quite an overhead for simple pipeline logging. Furthermore, you might want to add information that is not available out of the box in Log Analytics. This article shows an alternative: implement a logging solution yourself by retrieving pipeline runs via the REST API and querying them afterwards using a Serverless SQL pool.
Pipeline Template
Overview
Both Azure Data Factory and Azure Synapse Analytics provide a REST API endpoint called queryPipelineRuns that lets you query pipeline runs from your factory or workspace based on various filter conditions. This is essentially the same information you see in the Monitor section under Pipeline runs:
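You can also call queryPipelineRuns directly outside of a pipeline. The following is a minimal Python sketch for the Azure Data Factory variant; the helper name is made up for illustration, and the commented-out request assumes the azure-identity and requests packages:

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical helper: build the request body for queryPipelineRuns.
# The lastUpdatedAfter/lastUpdatedBefore window is required by the API.
def build_query_body(last_updated_after, last_updated_before):
    return {
        "lastUpdatedAfter": last_updated_after.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        "lastUpdatedBefore": last_updated_before.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
    }

# Query everything updated within the last hour.
now = datetime.now(timezone.utc)
body = build_query_body(now - timedelta(hours=1), now)
print(json.dumps(body))

# Posting the query (sketch, not executed here) requires an AAD token for
# https://management.azure.com, e.g. via azure-identity's DefaultAzureCredential:
#
#   import requests
#   from azure.identity import DefaultAzureCredential
#   token = DefaultAzureCredential().get_token(
#       "https://management.azure.com/.default").token
#   url = ("https://management.azure.com/subscriptions/{subscriptionId}"
#          "/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory"
#          "/factories/{factoryName}/queryPipelineRuns?api-version=2018-06-01")
#   runs = requests.post(url, json=body,
#                        headers={"Authorization": f"Bearer {token}"}).json()["value"]
```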
This pipeline template helps you retrieve this information as part of your pipeline and store it in an attached Azure storage account for later querying. The data is deliberately stored in JSON format so as not to alter the original response message coming from the REST API:
The logging pipeline itself contains a simple Copy activity that always captures the latest information. It uses the wrapper pipeline's start timestamp as a filter criterion, ensuring that everything from start to end of the actual pipeline is retrieved. This is why it's a good idea to build a wrapper pipeline around it, passing the system variable @pipeline().TriggerTime to the template's lastUpdateAfter filter parameter, as you can see in the screenshot below. Any further filtering can still be done afterwards.
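In pipeline JSON, such a wrapper boils down to an Execute Pipeline activity that forwards the trigger time; a sketch, where the activity and pipeline names are placeholders for your own:

```json
{
  "name": "Execute Logging Pipeline",
  "type": "ExecutePipeline",
  "typeProperties": {
    "pipeline": {
      "referenceName": "LoggingPipelineTemplate",
      "type": "PipelineReference"
    },
    "parameters": {
      "lastUpdateAfter": "@pipeline().TriggerTime"
    }
  }
}
```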
Config
The template comes with two linked services and two associated datasets: first, the Azure Data Lake Storage Gen2 with JSON format, and second, the REST API endpoint.
The Azure Data Lake Storage Gen2 can essentially be accessed in any way you like. The simplest way, though, is probably to use the system-assigned managed identity of either Azure Data Factory or Azure Synapse Analytics; it just needs the Storage Blob Data Contributor role assigned on the linked Azure storage. This also allows you to reuse the existing dataset properties provided in the template for FileSystem, Directory and FileName:
The configuration of the REST endpoint differs slightly between Azure Data Factory and Azure Synapse Analytics; choose the matching config below when importing the template.
Azure Data Factory:
- Base URL:
https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/queryPipelineRuns?api-version=2018-06-01
- Authentication type:
System Assigned Managed Identity
- AAD resource:
https://management.azure.com
Azure Synapse Analytics:
- Base URL:
{endpoint}/queryPipelineRuns?api-version=2020-12-01
- Authentication type:
System Assigned Managed Identity
- AAD resource:
https://dev.azuresynapse.net
You can also adjust the request body of the POST request to your needs. You will find more information in the official documentation of the corresponding REST API for either Azure Data Factory or Azure Synapse Analytics.
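For example, the body accepts optional filters and ordering in addition to the required time window; a sketch with illustrative timestamps, restricting the result to failed runs:

```json
{
  "lastUpdatedAfter": "2024-01-01T00:00:00.000Z",
  "lastUpdatedBefore": "2024-01-02T00:00:00.000Z",
  "filters": [
    { "operand": "Status", "operator": "Equals", "values": ["Failed"] }
  ],
  "orderBy": [
    { "orderBy": "RunStart", "order": "DESC" }
  ]
}
```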
Serverless SQL pool — Query runs
Once you have stored the JSON files in your Azure storage, the most efficient way to query them is the Azure Synapse Analytics Serverless SQL pool. You can reuse the following code, which gives you a view similar to the Monitor section within Azure; you just need to adjust the path in the BULK section of OPENROWSET. Feel free to make any other adjustments.
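A sketch of such a query: it reads each JSON file as a single document and unnests the response's value array with OPENJSON. The storage account, filesystem and directory in the BULK path are placeholders you must replace with your own:

```sql
-- Read each JSON file as one row (the 0x0b terminators are a common
-- serverless SQL trick for whole-document reads), then flatten $.value.
SELECT
    r.runId,
    r.pipelineName,
    r.status,
    r.runStart,
    r.runEnd,
    r.durationInMs
FROM OPENROWSET(
        BULK 'https://<storageaccount>.dfs.core.windows.net/<filesystem>/<directory>/*.json',
        FORMAT = 'CSV',
        FIELDTERMINATOR = '0x0b',
        FIELDQUOTE = '0x0b',
        ROWTERMINATOR = '0x0b'
    ) WITH (jsonDoc NVARCHAR(MAX)) AS f
CROSS APPLY OPENJSON(f.jsonDoc, '$.value')
    WITH (
        runId         NVARCHAR(100) '$.runId',
        pipelineName  NVARCHAR(200) '$.pipelineName',
        status        NVARCHAR(50)  '$.status',
        runStart      DATETIME2     '$.runStart',
        runEnd        DATETIME2     '$.runEnd',
        durationInMs  INT           '$.durationInMs'
    ) AS r
ORDER BY r.runStart DESC;
```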
Conclusion
This template can be reused as-is for simple pipeline logging, or extended to any complexity you need. The final view can then easily be integrated into a Power BI report, making it accessible at any time without digging into the Azure portal.