Querying Azure Logs using Power BI Dataflow & Custom Function

Rémy Bosman
Published in Wortell · 5 min read · Apr 1, 2020

I was recently asked to create a Power BI dataset with the Azure Sign-In logs for the last 90 days. My first thought: that's probably simple! Azure Logs are stored for at least 90 days, there is an API to retrieve the data, and Power BI can simply be used to create a query that calls the API.

Soon I realized that the API has some blocking limitations. The amount of Sign-In logs collected per day was incredibly large, which is not surprising for an organization with roughly 37,000 employees worldwide.

Azure Log Analytics REST API Limitations
  • Queries cannot return more than 500,000 rows.
  • Queries cannot return more than 64,000,000 bytes (~61 MiB of total data).
  • Queries cannot run longer than 10 minutes by default.

Trying an alternative approach—Archive Sign-In logs to Storage Account

I enabled Azure Log Retention, which saves all Sign-In events to a blob storage container for a given retention period (maximum 365 days). This can be enabled under Diagnostic Settings in Azure Active Directory. A short-term downside of this approach is that sign-in logs from before the moment you enable this setting are not available.

The second problem I ran into was an error message in Power BI after importing the BLOB.

DataFormat.Error: We found extra characters at the end of JSON input.

One of the solutions suggested on the internet is transforming the JSON to text and then importing the data, roughly as sketched below. This approach works, but the transformation takes forever for this number of JSON files. Also, the size of the data loaded into Power BI exceeded 10 GB within a few weeks of collected Sign-In logs.
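For reference, a minimal sketch of that text-based workaround in Power Query M could look like the query below. The storage account and container names are placeholders, and the exact shape of the archived blobs may differ in your tenant; the idea is simply to parse each line of the blob as its own JSON document instead of treating the whole file as one.

// Sketch only: parse the archived sign-in log blob line by line.
// "<storageaccount>" and the container name are placeholders for your own environment.
let
    Source = AzureStorage.Blobs("https://<storageaccount>.blob.core.windows.net/insights-logs-signinlogs"),
    FirstBlob = Source{0}[Content],
    Lines = Lines.FromBinary(FirstBlob),
    Parsed = List.Transform(Lines, each Json.Document(_)),
    AsTable = Table.FromRecords(Parsed)
in
    AsTable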

Solution — Combining Dataflows and Power Query Custom Functions.

A few weeks ago I had been playing with Power Query Custom Functions in Power BI. One of the things I learned was how to use variables in a Custom Function. The idea is to create a Dataflow that queries the Azure Log Analytics REST API per day (90 separate calls) instead of with a single query for 90 days at once.
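As a minimal illustration of what "variables in a Custom Function" means (the names below are purely illustrative), a custom function is just an M expression that takes parameters and returns a value. The real function used later in this post sends a query to the Log Analytics API, but the parameter mechanics are the same:

// Illustrative only: a tiny custom function that builds a KQL time filter from two day offsets.
let
    BuildTimeFilter = (startDay as text, endDay as text) as text =>
        "where TimeGenerated > ago(" & endDay & "d) and TimeGenerated < ago(" & startDay & "d)",
    Example = BuildTimeFilter("0", "1")   // filter for the most recent day
in
    Example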

In the steps below I’ll explain the solution step-by-step.

Create a query in Azure Log Analytics Workspace and export to Power BI (M query)

Create your base query in the Azure Log Analytics workspace. You can read more about creating these queries in this Docs article.
In my base query I select the Sign-In Logs and a few specific properties from them. By default, many properties are returned which you'll probably never use in your report, but which are loaded in every query your dataflow performs. Therefore, select only the properties you need in this query.

The TimeGenerated statement is used to query just one day of logs; keep it this way, since we'll turn it into a variable in the upcoming steps.

Click Export > Export to Power BI (M Query). The file downloaded to your PC will look roughly like the sketch below.
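This is a simplified sketch of the exported M query. The workspace ID is a placeholder, the KQL string is the query from the previous step, and your actual export will also contain a type-mapping block that converts the API's column types, so the exact content may differ.

// Simplified sketch of the exported M query; adjust the workspace ID, KQL and columns to your own export.
let
    Source = Json.Document(Web.Contents(
        "https://api.loganalytics.io/v1/workspaces/<workspace-id>/query",
        [Query = [#"query" = "SigninLogs | where TimeGenerated > ago(1d) | project TimeGenerated, UserPrincipalName, Identity, AppDisplayName, Location, ClientAppUsed, Status",
          #"x-ms-app" = "OmsAnalyticsPBI"],
         Timeout = #duration(0, 0, 4, 0)])),
    DataTable = Source[tables]{0},
    Columns = Table.FromRecords(DataTable[columns]),
    Rows = Table.FromRows(DataTable[rows], Columns[name])
in
    Rows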

Save this Query for the next step, it will be used in the custom function together with some variables.

Create a Dataflow and Custom Function in your Power BI Workspace.

Next stop is Power BI Dataflows. We need a few things here to support the Custom Function that queries the API per day.

  1. Create a Blank query and paste the exported query from the previous step into your Query Editor.
  2. Click the Advanced Editor and change the query so that it matches the shape sketched below this list.
    The variables are the "start" and "end" day offsets; please note that the TimeGenerated filter starts with the "end" variable.
  3. Adding these parameters automatically turns the query into a Custom Function. If you want to know more about Custom Functions, I recommend this article from RADACAD: https://radacad.com/custom-functions-made-easy-in-power-bi-desktop
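A sketch of what the resulting Custom Function can look like is shown below. The function name used later (GetLogDays), the parameter names and the workspace ID placeholder are my own choices; the KQL text and column list should match the query you exported earlier.

// Sketch of the Custom Function; adjust the workspace ID, KQL and columns to your own export.
(startDay as text, endDay as text) =>
let
    Source = Json.Document(Web.Contents(
        "https://api.loganalytics.io/v1/workspaces/<workspace-id>/query",
        [Query = [#"query" = "SigninLogs | where TimeGenerated > ago(" & endDay & "d) and TimeGenerated < ago(" & startDay & "d) | project TimeGenerated, UserPrincipalName, Identity, AppDisplayName, Location, ClientAppUsed, Status",
          #"x-ms-app" = "OmsAnalyticsPBI"],
         Timeout = #duration(0, 0, 4, 0)])),
    DataTable = Source[tables]{0},
    Columns = Table.FromRecords(DataTable[columns]),
    Rows = Table.FromRows(DataTable[rows], Columns[name])
in
    Rows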

Creating the final Query

The final query contains several steps before calling the custom function. The first steps create a simple table with a "Start" and an "End" column. The Custom Function will be invoked for each row in this table.

  • You can create this table by using the first few lines of the total query shown below. I've used 90 days, but feel free to use any number. Keep in mind that performing larger queries might cause the dataflow to exceed the two-hour runtime limit.

Dataflows have a runtime limit of two hours. Only a Premium capacity increases this limit.

let
    Source = List.Numbers(0, 90),
    #"Converted to table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Added index" = Table.AddIndexColumn(#"Converted to table", "Index", 1, 1),
    #"Renamed columns" = Table.RenameColumns(#"Added index", {{"Column1", "StartColumn"}, {"Index", "EndColumn"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed columns", {{"StartColumn", Int64.Type}, {"EndColumn", Int64.Type}}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Changed Type", {{"StartColumn", type text}, {"EndColumn", type text}}),
  • The next step in the query is invoking your custom function. It will run for each row in the table you've created earlier.
  • After invocation you will see a new column called "GetLogDays" (or whatever name you gave your custom function). This column contains the logs for each day you've queried. Expanding this column will get all the logs into your table.
  • Finally, remove the start and end columns from your table; at this point you won't need them anymore.
#"Invoked Custom Function" = Table.AddColumn(#"Changed Type1", "GetLogDays", each GetLogDays([StartColumn], [EndColumn])),#"Expanded GetLogDays 1" = Table.ExpandTableColumn(#"Invoked Custom Function", "GetLogDays", {"TimeGenerated", "UserPrincipalName", "Identity", "AppDisplayName", "Location", "ClientAppUsed", "Status"}, {"TimeGenerated", "UserPrincipalName", "Identity", "AppDisplayName", "Location", "ClientAppUsed", "Status"}),#"Removed columns" = Table.RemoveColumns(#"Expanded GetLogDays 1", {"StartColumn", "EndColumn"}),#"Changed column type" = Table.TransformColumnTypes(#"Removed columns", {{"TimeGenerated", type datetime}})in#"Changed column type"

Save your dataflow | Refresh | Use it!

Once your dataflow is saved, make sure you set a scheduled refresh in the dataflow's settings in your Power BI workspace.

Your dataflow is now ready to be used. You can use it by simply opening Power BI Desktop and selecting Get Data > Power BI dataflows. Your data will be loaded into a table and is ready for visualization!

Other solutions

As always, there are multiple ways of reaching the same result, and this case is no exception.

I realize that a possibly more robust solution could be built with other tools, e.g. Azure Data Factory, Azure Data Lake Storage Gen2 or maybe even Azure Synapse Analytics.

Depending on your own skillset, tools and knowledge, a preferred approach will emerge. In my case: Dataflows & Custom Functions.
