Snowflake + Power BI: Evaluating Query Parallelization

Back in March, Power BI unveiled a public preview of query parallelization improvements for DirectQuery mode that significantly boost performance in certain scenarios. This blog post evaluates one such scenario, using time intelligence measures, and how it benefits from this enhancement. Spoiler: we saw just shy of a 5x performance gain while putting query parallelization to the test against Snowflake.

Intro
Last year, the Snowflake team worked hand-in-hand with the Power BI team to review Best Practices for Using Power BI in DirectQuery Mode with Snowflake. Snowflake also took Power BI’s Horizontal Fusion feature for a spin using sample data available in every Snowflake account. In this post, we take a closer look at Power BI’s latest major DirectQuery improvement, Query Parallelization (MaxParallelismPerQuery). This is an exciting new feature for our joint customers. At Snowflake, we’ve already received positive feedback from a number of customers, including one that stated:

We saw a 6–8x speed increase during testing…highly recommend for DirectQuery datasets.

Additional Context
Members of the Power BI team at Microsoft have written about this enhancement in detail, and I encourage you to check out their content. Chris Webb’s blog post (linked below) does a fantastic job illustrating how common Power BI external tools, like Tabular Editor and DAX Studio, can be used to both easily enable query parallelization, as well as test it. The same steps outlined by Chris in his blog were followed for this post.

Power BI Affiliated Blogs:
Query parallelization helps to boost Power BI dataset performance in DirectQuery mode (Official Power BI Blog)

DirectQuery Parallelisation In Power BI — Some Examples (Chris Webb — Power BI CAT)

NOTE: This feature is in public preview, thus why we recommend following Chris’ blog for enabling this on your DirectQuery datasets today. The Power BI team has stated the following about expected behavior once generally available.

At General Availability (GA), we plan to change the default value for the Model.MaxParallelismPerQuery parameter to 0, for Automatic, to let Power BI choose the right degree of SE query parallelization based on the CPU configuration of the Premium SKUs, so you don’t have to. At that time, you no longer need to set this parameter manually to benefit from SE query parallelization.

The Problem
Historically, certain Power BI operations have performed poorly in DirectQuery mode. For instance, common scenarios like leveraging Power BI’s time intelligence measures (e.g., YTD vs PYTD) or having a visual that contains measures from multiple fact tables (e.g., Budgets vs Actuals) would result in a single Power BI visual generating many SQL statements under the hood. Amplifying this problem, the Power BI engine would send these queries sequentially, leading to longer wait times for report users.

The Solution
Altering Power BI’s query parallelization setting won’t reduce the number of SQL queries generated, but it will reduce the “chattiness” to Snowflake by sending the queries concurrently as opposed to sequentially. Eliminating these roundtrips, in tandem with Snowflake’s multi-clustered warehouses, gives your Power BI queries the horsepower needed to avoid resource contention and queuing.

Tools Used

  • Power BI Desktop
  • Power BI Premium Workspace (Premium per User used in this test)
  • Tabular Editor
  • DAX Studio
  • Snowflake Account

The Test

  • Two Power BI DirectQuery datasets were created using Snowflake’s Sample dataset, Tasty Bytes
  • The datasets were published to a Premium per User (PPU) workspace
  • The Power BI datasets are identical, except one has the query parallelization setting enabled, and the other does not (MaxParallelismPerQuery default value of 1). The PPU Workspace Connection was used by Tabular Editor to update the setting
  • The same Power BI report page was used for each test, with an emphasis on the table visual that contains the time intelligence measures
  • The table visual contains the following fields and measures: Order Amt, YTD Order Amt, PYTD Order Amt, and YoY Growth % sliced by Year and Month

The four measures were created with the DAX below:

Order Amt = SUM('Order Details'[LINE_TOTAL])
Order Amt YTD = TOTALYTD([Order Amt], 'Date'[DATE])
Order Amt PYTD = TOTALYTD([Order Amt], SAMEPERIODLASTYEAR('Date'[DATE]))
YoY Order Amt Growth = DIVIDE([Order Amt YTD] - [Order Amt PYTD], 'Order Details'[Order Amt PYTD])
  • The report page contains two slicers which are filtering the results to Years 2019, 2020, 2021, and 2022 and Brand = Nani’s Kitchen
  • These queries were pre-executed, so both tests will hit Snowflake’s results cache, thus minimizing Snowflake’s impact on the results

NOTE: The primary fact table in the dataset contains 650M+ records, but thanks to Snowflake’s results cache our queries evaluate in milliseconds without even using Virtual Warehouse compute. In other words, this test didn’t incur any compute costs!

  • Using Power BI Desktop’s Performance Analyzer, we are able to capture the DAX Query generated by Power BI
  • Lastly, with the DAX Query in hand, we can run server timing traces against our Power BI datasets using DAX Studio, clearing the cache every time. The same PPU Workspace Connection was used to connect to our published datasets from DAX Studio
The DAX Query for a visual can be retrieved using the Performance Analyzer.
The DAX Studio query cache was flushed with every execution

Without Query Parallelization

  • Total duration = 4.8 seconds
  • In the first screenshot below, notice how the Power BI Engine is sending the queries sequentially
  • In Snowflake’s Query History we can see the start times for each query are in fact staggered
Using the Power BI Premium workspace connection, we can connect to our published dataset from DAX Studio and run a server timing trace to analyze how the tabular engine is generating and issuing SQL queries.
Using Snowflake’s Query History, we can see the SQL queries generated by the Power BI dataset’s tabular engine.

With Query Parallelization

  • Total duration < 1 second
  • In the first screenshot below, notice how the Power BI Engine is now sending the queries all at once
  • Snowflake’s Query History confirms all the queries were started at approximately the same time
This time we’re connecting to the other Power BI dataset where the MaxParallelismPerQuery property was updated to a value of six.
The SQL statements generated by Power BI all started simultaneously within Snowflake.

Conclusion
That’s all there really is to it! There was nothing we changed within Snowflake, nor within our Power BI datasets other than increasing the MaxParallelismPerQuery setting. Simply by increasing this setting, the runtime for the query behind our time intelligence table visual dropped from 4.8 seconds down to 1 second. That’s almost a 5x improvement without touching any code! Follow Chris Webb’s post to Enable query parallelization today and unlock better performance with your DirectQuery datasets.

--

--

Josh Crittenden
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Sales Engineer @ Snowflake | Former DW & Power BI Architect | Husband | Father | Coffee Addict | Viewpoints are my own