Exploring partitions, incremental refresh, detect data changes, and custom polling expressions in Power BI semantic models: a brief overview and evaluation of their value

Published in

Microsoft Power BI

5 min readJun 7, 2024

If you’re working with a small-volume Power BI semantic model, feel free to skip to the next article. However, if your model is large or if you’re seeking optimizations for data refresh efficiency, investing a few minutes to learn some techniques can potentially save you significant time and cost in the long run.

Partitions

Let’s talk about why we care about partitioning the data loaded into big fact tables into smaller chunks in Power BI. First of all, Power BI has this awesome feature that allows it to run multiple queries in parallel (If I recall correctly, by default, Power BI Premium can refresh a maximum of six partitions in parallel). This parallelization can significantly speed up the data model refresh process or, in some cases, it can be the only way to get the refresh done at all.

Here’s a real-life example I encountered: I was working with a model that pulled data from Databricks metastore tables. However, I ran into an error message: “Token expired while fetching results: TEAuthTokenExpired.” This error occurred because the token used for authentication has a lifetime of one hour, and if a command takes longer than that, it fails.

To resolve this issue, I decided to split the fact table into monthly partitions. This partitioning allowed me to successfully pull the data because each individual query took less time to execute. By reducing the query time, I ensured that the refresh completed within the valid token lifetime.

Incremental refresh

After I fixed the problem with the semantic model refresh, I started thinking about whether it’s actually necessary to go through millions of rows of data when only a few months’ worth of data change overnight. If that’s really the case, using incremental refresh can be a huge help. It lets us update only the relevant part of the fact data, while keeping the rest of the data (which is usually bigger) archived. This way, we can save time and resources by only focusing on the necessary updates.

However, there are a couple of situations where incremental refresh may not be enough:

Predicting time periods: Typically, we have to make an educated guess about the number of time periods that should be refreshed. Since there’s no certainty, we often end up extending the refresh time beyond what we actually need. Even though incremental refresh helps by not pulling the entire history, we still end up usually fetching more data than necessary.
Full history refresh: Sometimes, we have to refresh the entire history. This could be due to making changes to past data or “extending” the data by, for instances, new categories. In such cases, scheduled incremental refresh won’t be very helpful. Instead, we would need to manually trigger a full table refresh for all partitions using third-party or external tools like (my beloved 💗) Tabular Editor 3 or SSMS. This means involving human interaction and potentially causing delays in the entire historical data refresh process, unless there is a reliable flow of information that allows us to precisely determine when the full history refresh should be performed.

In these cases, incremental refresh alone may not be sufficient, and we may need to consider some additional steps.

Detect data changes

This is where the “Detect data changes” property of incremental refresh comes to the rescue. This feature enables us to define specific checks that help Power BI determine whether any changes have occurred within a given time partition since the last refresh. By utilizing the “Detect data changes” property, Power BI can intelligently identify and refresh only the data that has undergone changes, resulting in significant time and resource savings. This capability enables you to extend the incremental refresh periods to cover the entire historical data, eliminating the need for ad hoc manual whole history refreshes. As a result, you can optimize your data refresh process and ensure that only the necessary data is updated, enhancing efficiency and reducing unnecessary workload.

By selecting a specific date/datetime column in the data source, you can identify and refresh only those partitions where the data has changed. It’s important to note that this column should be separate from the one used for partitioning the data with the RangeStart and RangeEnd parameters.

During each incremental refresh, Power BI evaluates the maximum value of this last update column for each time partition’ in the incremental range. If the maximum value hasn’t changed since the last refresh, there’s no need to refresh that particular period.

The default polling expression for the detect data changes looks like below.

While the default polling expression for detecting data changes in Power BI is convenient, it does have some drawbacks. Firstly, to utilize this feature, you need to load the update date column to the fact table solely for the purpose of detecting changes. This can significantly increase the size of the fact table since the column may not serve any other practical use. Secondly, if a record is deleted within a specific partition, that partition won’t be refreshed because the maximum update date column value for the partition remains unchanged.

Custom polling expression

Indeed, there is an alternative solution to address the limitations of the default polling expression, and it is known — you would never guess 😅— as “custom polling expressions”. These expressions provide a way to further optimize the data refresh process in Power BI.

Custom polling expressions allow you to define your own query logic to determine whether data changes have occurred. This gives you more flexibility and control over the refresh process. By crafting a custom query that suits your specific data source and requirements, you can improve the efficiency and accuracy of detecting data changes.

I believe the only limitations for custom polling expressions in Power BI are that (1) they need to be based on the same data source as the table you are refreshing incrementally, (2) the result of the expression has to be a single value for each partition and (3) you cannot set them up directly in Power BI — for that you’d need some external tools (I would as always recommend Tabular Editor 3).

In my real-life example, the custom polling expression was constructed based on the summation of the sales amount.

However, when it comes to defining the polling expression, there are virtually no limitations. Sky is the limit. Or stakeholders needs 😁 There is a great article written by Chris Webb (“Custom Queries For “Detect Data Changes” In Power BI Incremental Refresh”) covering an example of the custom polling expressions.

Don’t forget to subscribe to
👉 Power BI Publication
👉 Power BI Newsletter
and join our Power BI community
👉 Power BI Masterclass

Exploring partitions, incremental refresh, detect data changes, and custom polling expressions in Power BI semantic models: a brief overview and evaluation of their value

Partitions

Incremental refresh

Detect data changes

Custom polling expression

Written by Mateusz Mossakowski