Hacking Power BI lineage: which reports depend on this table?

Marco Santoni
flowe.ita
4 min read · Jul 18, 2022


A diagram illustrating the concept of data lineage in a business intelligence platform with Power BI
When we change the schema or the contents of a table, which reports are impacted?

Most Data Analytics or BI teams publishing reports or dashboards in Power BI within their company start with a situation like this…

A brand new Power BI workspace with just one report

… and end up in a workspace like this:

A Power BI workspace full of reports

You may end up with many Power BI reports, each making use of different tables or data sources. Then, one day, a developer comes and says:

“I might change the type of SubscriptionType column in Users table from Int to String. Which reports are impacted by such change?”

When the number of reports in a Power BI workspace grows large, answering such a simple question becomes complex. How many reports depend on the Users table? Do we need to open each report and check the Transform Data section? That is extremely time-consuming and prone to manual mistakes. Can we automate this task?

Limits of Power BI Lineage

Power BI has a section dedicated to Data Lineage. This section summarizes:

  • which reports depend on which datasets
  • which datasets depend on which data sources (e.g. SQL databases, CSV files, etc.)
An example of native lineage in Power BI

Unfortunately, this section cannot give us the answer we are looking for. We can see, for example, which SQL database a report depends on. However, we cannot tell whether a certain report depends on a specific SQL table inside that database.

Exploiting the Power BI Admin API

This is the moment when we need to do some hacking and start exploiting the Power BI APIs via PowerShell (you can, of course, use any client of your choice to call the API). Please note that you need an admin role on Power BI to make these API calls. First, install the MicrosoftPowerBIMgmt module (see docs).

Install-Module -Name MicrosoftPowerBIMgmt

The first step is to authenticate our PowerShell session via

Login-PowerBIServiceAccount

We now want to call the Power BI API endpoint (see docs) that gets the metadata of a workspace. The route of the API is admin/workspaces/getInfo, and we will set the following values for its optional query parameters:

admin/workspaces/getInfo?lineage=True&datasourceDetails=True&datasetSchema=True&datasetExpressions=True

The body of the HTTP call needs a workspaces field holding an array of strings, where each element is the GUID of a workspace we want the metadata for. For example:

"workspaces": ["c56a4180-65aa-42ec-a945-5fd21dec0538"]

So, this PowerShell script gets a token to authenticate to the API and makes a first call to start the scan:

# these are the Power BI API endpoints we're going to call
$startScanUrl = "https://api.powerbi.com/v1.0/myorg/admin/workspaces/getInfo?lineage=True&datasourceDetails=True&datasetSchema=True&datasetExpressions=True"
$statusUrl = "https://api.powerbi.com/v1.0/myorg/admin/workspaces/scanStatus/"
$getScanInfoUrl = "https://api.powerbi.com/v1.0/myorg/admin/workspaces/scanResult/"
# getting the token for authentication
Login-PowerBIServiceAccount
$headers = Get-PowerBIAccessToken
$body = '{"workspaces": ["c56a4180-65aa-42ec-a945-5fd21dec0538"]}'
$headers.Add('Content-Type', 'application/json')
# http call to trigger the scan
$startScan = Invoke-RestMethod -Method Post -Uri $startScanUrl -Body $body -Headers $headers

Now the scan of the workspace metadata has started, and the response carries an id for the scan request. We want to wait for the scan to complete, so we use the /admin/workspaces/scanStatus/ route to poll its status.

### Wait for Scan to Succeed
# the GET calls below don't need the JSON content type
$headers.Remove('Content-Type')
$statusUrl = $statusUrl + $startScan.id
# poll the scan status every 3 seconds until it succeeds
do {
    $statusScan = Invoke-RestMethod -Method Get -Uri $statusUrl -Headers $headers
    Start-Sleep -Seconds 3
}
while ($statusScan.status -ne "Succeeded")

Once the scan is completed, we can actually collect the results by calling the API route /admin/workspaces/scanResult/. We’ll save the response in a results.json file (note the -Depth 10: ConvertTo-Json would otherwise truncate the nested objects at its default depth of 2).

# once the scan is done, we can store the results
$scanResultUrl = $getScanInfoUrl + $startScan.id
$infoResults = Invoke-RestMethod -Method Get -Uri $scanResultUrl -Headers $headers
$infoResults.workspaces | ConvertTo-Json -Depth 10 | Out-File -FilePath .\results.json

What does the response look like? Let’s look at our results.json file:

[
  {
    "id": "c56a4180-65aa-42ec-a945-5fd21dec0538",
    "name": "OurPBIWorkspace",
    "description": "Awesome BI reports",
    "type": "Workspace",
    "state": "Active",
    "isOnDedicatedCapacity": false,
    "reports": [...],
    "dashboards": [...],
    "datasets": [...],
    "dataflows": [...],
    "datamarts": [...]
  }
]

The response is an array of WorkspaceInfo objects describing the workspace and the metadata collected during the scan. The datasets field is an array of WorkspaceInfoDatasets objects, listing each dataset in the Power BI workspace with metadata like name, creation date, etc. Why is it important for data lineage?

The WorkspaceInfoDatasets object has a tables field containing an array of the tables that make up the dataset. For each table, the source.expression field is what we are interested in: it holds the full Power Query M expression used to load and transform the data. Example:

"source": [
{
"expression": "let\n Source = myserver.database.windows.net\", \"mydatabase\", [Query=\"SELECT COUNT(DISTINCT UserId) CountOfUsers)FROM Users\"])\nin\n Source"
}
]
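
Once the file is parsed with ConvertFrom-Json, those expressions are easy to reach from PowerShell thanks to member enumeration; a quick sketch, assuming the results.json saved earlier:

# walk the scan result: workspaces -> datasets -> tables -> source expressions
$workspaces = Get-Content .\results.json -Raw | ConvertFrom-Json
$workspaces.datasets.tables.source.expression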

Why is this relevant at all? Do you remember the original question from our developer?

“I might change the type of SubscriptionType column in Users table from Int to String. Which reports are impacted by such change?”

With a simple CTRL+F search in this JSON file, we can quickly find the Power BI datasets where the Users table is used. It can save us plenty of time compared with manually digging through the .pbix files of every report.
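
We can even automate the search itself instead of relying on CTRL+F; a minimal sketch, again assuming the results.json from above (the "Users" pattern is deliberately naive and may need tuning to your naming conventions):

# list every dataset whose M expressions mention the Users table
$workspaces = Get-Content .\results.json -Raw | ConvertFrom-Json
foreach ($workspace in $workspaces) {
    foreach ($dataset in $workspace.datasets) {
        # -match on an array returns the matching expressions
        if ($dataset.tables.source.expression -match "Users") {
            # this dataset (and every report built on it) is impacted
            "$($workspace.name) / $($dataset.name)"
        }
    }
}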

Do you want to hear more about Flowe? Look at all the open positions!
