Monitoring Azure Data Factory Performance in Power BI

Maanadh Naik · Published in Version 1 · Apr 29, 2024

Using Log Analytics to monitor Azure Data Factory performance in Power BI!

Photo by Markus Spiske on Unsplash

Introduction

Azure Data Factory (ADF) is a cloud-based data integration service that allows you to create data-driven workflows in Azure by designing and automating data movement and data transformation at scale. ADF does not store any data itself; instead, it lets you design workflows that move data between supported data stores and then process that data using compute services in other regions or in an on-premises environment. It also allows you to monitor and manage workflows using both programmatic and UI mechanisms.

Azure Log Analytics is a service that monitors your cloud and on-premises resources and applications. It allows you to collect and analyse the data those resources generate, and to search, analyse, and visualize that data to identify trends, troubleshoot issues, and monitor your systems.

You can also set up alerts to notify you when specific events or issues occur, so you can take a proactive approach to resolve them.

Problem Statement

We often need to analyse or monitor the performance of ADF pipelines, and this information can be retrieved through Log Analytics. It is easy to fix problems when there is a pipeline failure or an error. The challenge arises when we are trying to optimize a solution that is currently working fine without any failures, but where we need visibility into how long each part of it takes.

The solution aims to use Log Analytics to drill down and isolate which part of the pipeline is affecting performance, and whether it is an issue across the board or just a few areas causing the bottleneck. We can also use this as a monitoring tool to check the performance of the pipeline in Azure Data Factory (ADF).

To solve this issue, I designed a custom solution in ADF and wanted to monitor the performance of pipelines in Azure Data Factory over a 7-day rolling period and understand the bottlenecks. Below are the steps to do it.

The Solution

This can be done using Log Analytics, found in the Azure Portal, which is a tool for running queries against data in the Azure Monitor Logs store. Here is how you access it.

1) Go to your Azure Data Factory resource; you can find Log Analytics under Logs.

This should open the Log Analytics query screen.

The IDE is very similar to a SQL Server IDE, and you query it using a language called Kusto Query Language (KQL). This is a rich language which is easy to read and use, and is very similar to SQL.

A KQL query is a read-only request to process data and return results. Kusto queries are made of one or more query statements. The syntax is as follows:

TableName
| where Condition1
| where Condition2
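
For example, a minimal query of this shape against the ADFPipelineRun table (the resource-specific table that ADF pipeline-run logs land in when diagnostic settings are configured that way; treat the exact table and column names here as assumptions to verify against your workspace) might be:

ADFPipelineRun
| where Status == "Failed"
| take 10

Each piped clause filters the rows returned by the previous one, much like chaining WHERE conditions in SQL.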

In our solution, we have the following stages:

1) Extract from source database.

2) Transform and cleanse.

3) Copy Cleansed Data to Staging in Azure SQL Server.

4) Create Dimension and Fact tables in Azure SQL.

I have only extracted data for the first stage (extracting from source), and the query is as below.

ADFActivityRun
| where PipelineName == "1ExtractCsvToRaw"

You can also decide how far back you need to go using the time range selector. I have chosen to go back 7 days, to monitor the data on a 7-day rolling basis.
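
If you would rather pin the lookback window in the query itself instead of the time range picker, you can filter on the TimeGenerated column that every Log Analytics table carries. A minimal sketch:

ADFActivityRun
| where TimeGenerated > ago(7d)
| where PipelineName == "1ExtractCsvToRaw"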

Once you have the necessary filters in the KQL query, we move to the next step, which is exporting this data.

You have multiple options to export:

1) CSV and Excel might be ideal if you want to import the data into Python notebooks and do some machine learning on the data.

2) For data visualization, we have options to import it as a Power BI dataset (.pbix) file or as an M query.

The M query option is highly recommended, since the reports/visualizations can then be refreshed within Power BI and stay up to date. When you export it as an M query, Log Analytics returns a script as a text file.

Now, to load the data into Power BI, we click Transform data to open the Power Query editor and then open the “Advanced Editor”, as shown below. We copy the contents of the M query script into this and save it. This should pull the data we see in Log Analytics into Power BI.

Before we start visualizing the data, note that some columns contain data in JSON format. We can extract data from them through a built-in feature in Power BI: right-click on the column -> Transform -> JSON, and the JSON values can then be expanded into columns.
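
Alternatively, the JSON can be flattened in KQL before the data ever reaches Power BI. Below is a minimal sketch that assumes the copy activity's metrics sit in the Output column; the exact field names (rowsRead, rowsCopied) depend on the activity type, so verify them against your own logs:

ADFActivityRun
| where TimeGenerated > ago(7d)
// Parse the raw JSON string into a dynamic value
| extend OutputJson = parse_json(Output)
// Pull individual fields out as typed columns (field names are assumptions)
| extend RowsRead = toint(OutputJson.rowsRead), RowsCopied = toint(OutputJson.rowsCopied)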

Here is a sample matrix visualization I have created to display the extraction time for the various tables in Azure Data Factory on a 7-day rolling basis. The time is in seconds.

The source queries were of the form: SELECT Column_names.. FROM Table_name WHERE Company_name = ABC. The two variables that varied, and were of interest, were the table name and the company. I created an additional column called Company Name, so that for a given table we can see which company is slowing things down in terms of extraction speed, and whether there is a trend.
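
A similar derived column could also be produced in KQL rather than Power BI. Here is a sketch that assumes the source query is surfaced in the Input JSON under source.sqlReaderQuery; that path varies by connector, so check it against your own data:

ADFActivityRun
| extend InputJson = parse_json(Input)
| extend SourceQuery = tostring(InputJson.source.sqlReaderQuery)
// Capture the value that follows "Company_name =" in the source query
| extend CompanyName = extract(@"Company_name\s*=\s*'?(\w+)'?", 1, SourceQuery)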

Above is a matrix visual in Power BI, where the values are duration times in seconds. It shows the duration of each stage of the pipeline, and it can be drilled down based on the hierarchy too. This is very useful in detecting anomalies/outlier behaviour and gives a very granular view of every aspect of the pipeline.
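
The per-stage durations behind a matrix like this can also be aggregated directly in KQL. A minimal sketch using the Start and End columns of ADFActivityRun (assuming both are populated, i.e. the activity run has completed):

ADFActivityRun
| where TimeGenerated > ago(7d)
// Duration of each activity run in seconds
| extend DurationSeconds = datetime_diff('second', End, Start)
| summarize AvgSeconds = avg(DurationSeconds) by PipelineName, ActivityName, bin(TimeGenerated, 1d)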

The next steps can involve creating different visuals based on the other columns extracted from the Log Analytics data.

Learn more about getting the most from your data with Version 1.

About the Author

Maanadh Naik is a Business Intelligence Support Consultant here at Version 1.
