Snowflake and Power BI: Best Practices and Recent Improvements

Approaches to maximize performance

last updated July 29, 2024 — come back often for the latest

Introduction

The Snowflake Partner Network unlocks the potential of the Data Cloud with a broad array of tools and partners. Our certified partnerships and integrations enable customers to leverage Snowflake’s flexibility, performance, and ease of use to deliver more meaningful data insights.

Snowflake continues to invest in its partnership with Microsoft Azure, including ongoing improvements to our integration with Microsoft Power BI. A Leader in the Gartner BI Magic Quadrant for over a decade, Power BI remains one of the most popular business intelligence tools that Snowflake customers use to visualize, analyze, and report on their most important business data. I'm excited to summarize some of our recent improvements and known best practices below. Before getting into how best to use these two platforms together, let's take a quick look at each one individually.

Snowflake Optimizations (regardless of BI tool)

A comprehensive list of Snowflake performance and optimization approaches is beyond the scope of this article. (A simple Google search on this topic will return plenty of results!) As a starting point, Snowflake's documentation includes a dedicated set of topics that help guide efforts to improve Snowflake performance.

Snowflake offers an abundance of resources to help customers better manage and optimize their various workloads. For customers looking to improve Power BI performance on top of Snowflake, I always recommend individually profiling several of the queries being sent to Snowflake from a single Power BI report or dashboard. If these queries struggle in Snowflake on their own, they aren't going to run any faster in Power BI (or any other BI tool!). Work with your Snowflake account team, engage Snowflake Professional Services, and/or review the content above, and rule out anything that might be happening on the Snowflake side of the equation.
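
For example, a quick way to get started is to look up the report's queries in Snowflake's query history. The sketch below is a minimal, hedged example: it assumes the report connects with a dedicated service user (POWERBI_SVC is a placeholder name), so adjust the filter to match however you identify Power BI traffic in your account.

```sql
-- Minimal sketch: profile recent queries sent to Snowflake by a Power BI report.
-- POWERBI_SVC is a placeholder for a dedicated Power BI service user.
SELECT
    query_id,
    LEFT(query_text, 200)        AS query_text_preview,
    warehouse_name,
    total_elapsed_time / 1000    AS elapsed_seconds,
    queued_overload_time / 1000  AS queued_seconds,
    bytes_scanned,
    partitions_scanned,
    partitions_total
FROM snowflake.account_usage.query_history
WHERE user_name = 'POWERBI_SVC'
  AND start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 20;
```

Queries that scan nearly all partitions of a large table, or that spend significant time queued, point to Snowflake-side tuning (data modeling, clustering, warehouse sizing) rather than anything you can fix in Power BI.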

Optimization guide for Power BI (regardless of source database)

Similarly, a comprehensive list of Power BI performance and optimization steps is beyond the scope of this article. Microsoft also publishes plenty of resources dedicated to this topic.

Before continuing, it's worth checking which version of Power BI you're using within your organization (e.g., Pro vs. Premium), as well as your total capacity and Azure SKUs. Some of the items mentioned below are only available in Premium versions of Power BI.

Snowflake & Power BI Best Practices

OK, now that you've achieved basic familiarity with optimizing Snowflake and optimizing Power BI (separately), we hope you'll find the resources below helpful as you look to maximize performance when using these two platforms together.

  • Connecting to Snowflake from Power BI: Snowflake recommends using the Native Snowflake connector from Power BI Desktop and Power BI Service (vs. configuring the generic ODBC driver to connect to Snowflake). Snowflake and Microsoft continue to work closely to optimize this connector, including these recent improvements:
    – July 2023: usage of SQLBindCol for faster imports
    – October 2023: better implementation of "LIMIT 1" queries (when using custom SQL)
    – December 2023: enabled cancellation of a Snowflake query when the operation is cancelled in the Power BI user interface
    – March 2024: added a Snowflake icon (thanks, Microsoft! 😊), as well as native support for Snowflake in Power BI Report Builder (Preview)
    – April 2024: added native support for Snowflake Dynamic Tables, which materialize the results of a query you specify; these can be a great alternative to using Custom SQL in your database connection (more on Custom SQL below)
    – Looking forward, we have much more in store and will continue adding to this list as new updates are announced! All of these updates are pushed automatically into the Power BI Service, but don't forget to upgrade your Power BI Desktop and Gateway installations as well.
  • Power BI Storage Modes: Give careful consideration to selecting a storage mode: for example, deciding between Import Mode (copying all data into Power BI), DirectQuery Mode (querying Snowflake directly every time a report is run), or Dual Mode (tables with this setting can act as either cached or not cached, depending on the query). Each approach has its own benefits as well as certain limitations. Discuss this internally as a team when deciding which mode(s) meet your business requirements. For teams working with extremely large data sets that must analyze the most current data available, Snowflake recommends DirectQuery mode. Similarly, many Snowflake customers have found Composite Models to be a nice blend, allowing you to import dimensions into memory (for fast filters and slicing) while querying large fact tables using DirectQuery.
    – Where appropriate, enable large semantic models and select the large semantic model storage format.
    – DirectQuery models may need special consideration. We recommend spending adequate time to fully consider all of the guidance provided by Microsoft, and troubleshooting as needed. (When asking for assistance, both Microsoft and Snowflake may ask how much time you have spent using Performance Analyzer to examine individual report elements. Log Analytics can also be incredibly helpful.)

– When configuring scheduled data refreshes for semantic models using Import Mode, many customers have found it helpful to use dedicated Snowflake virtual warehouses to avoid resource contention with other workloads, users, and use cases. Larger warehouses can be extremely beneficial for large data extracts out of Snowflake, thanks to faster micro-partition scanning from remote cloud storage. These warehouses should be set to auto-suspend after the Import Mode queries complete, to avoid unnecessary credit consumption. (A minimal sketch follows.)
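
As a rough illustration of this recommendation, a dedicated refresh warehouse might look like the sketch below. The warehouse name and size are placeholders; pick a size appropriate to your extract volumes.

```sql
-- Minimal sketch: a dedicated warehouse for Power BI Import Mode refreshes.
-- POWERBI_REFRESH_WH and the size are placeholders; tune for your own workloads.
CREATE WAREHOUSE IF NOT EXISTS powerbi_refresh_wh
  WAREHOUSE_SIZE      = 'LARGE'   -- larger sizes can speed up large extracts
  AUTO_SUSPEND        = 60        -- suspend 60 seconds after refresh queries finish
  AUTO_RESUME         = TRUE      -- wake automatically when the next refresh starts
  INITIALLY_SUSPENDED = TRUE
  COMMENT             = 'Dedicated to Power BI semantic model refreshes';
```

Point the scheduled refresh connection (or the service user's default warehouse) at this warehouse so refresh traffic never competes with interactive workloads.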

  • Data Model Design: Although this was already mentioned above, it's worth repeating here: how you model your data is one of the most important factors determining how well Power BI works, and, unlike many other BI tools, Power BI is optimized for working with dimensional models and star schemas rather than flat, wide tables. We have seen too many customers skip this important step of building a quality data model within Snowflake. Snowflake also recommends avoiding Power BI features such as the Power Query Editor or calculated columns as workarounds for deficiencies in the way your data is modeled: all of your data modeling work should be done in Snowflake before you do any work in Power BI.
    – Don't forget to pay close attention to SQL data types within your source tables, especially numeric data types (excessive precision isn't always needed for analytical reporting and dashboards). Some customers have noticed improved Power BI performance when using INTEGER data types, and have benefited from setting the Snowflake ODBC_TREAT_DECIMAL_AS_INT parameter to TRUE for specific users and sessions (it is also available at the account level, which deserves careful consideration; see the parameter sketch after this list).
  • Networking and Connectivity: To reduce the effects of network latency and bandwidth, it is important to consider the location of your Power BI tenant or Premium capacity relative to the location of your Snowflake account. Additionally, if you are using Snowflake on Azure, try to store your Snowflake data in the same Azure region as your Power BI tenant's default region.
  • When to use Power BI Data Gateways: These may be required when your data sources are private (for example, running Snowflake on AWS or Azure Private Link). Gateway software runs on a Windows VM inside the customer's network and can be clustered to provide high availability. These VMs have to be managed with the rest of your infrastructure for updates and patches. We have seen several cases where customers have not fully considered the documentation and under-configure their Power BI gateways, which leads to congestion and slow performance. Streaming data to optimize performance is another gateway setting worth considering, and customer network firewalls and proxies can also add overhead. Don't forget to keep your Power BI gateway software current.
  • When to use Power BI VNet Data Gateways: Managing on-premises gateway VMs for OS and software updates, clustering, and so on can be a challenge; the newer VNet data gateways address that issue because they are a managed service injected into a dedicated subnet. These can also be clustered to increase availability and throughput.
  • Concurrency and Parallelism: Several query parallelization improvements have been introduced for DirectQuery mode that significantly boost performance in certain scenarios. You have already invested in building your business intelligence, analytics, and data warehouse strategy on Snowflake; don't let the wrong Power BI settings slow you down here! The fundamental idea is to maximize performance by running as many of the queries generated for each DAX or MDX query in parallel as possible. This parallelization reduces the impact of data source delays and network latency on overall query performance. Increasing the number of connections can make a noticeable difference when a report page has many visuals or when some visuals generate slow queries. Query parallelization won't reduce the number of SQL queries Power BI generates, but issuing them in parallel rather than sequentially reduces the total time spent waiting on round trips.
    Query parallelization is now GA, and the default degree of parallelism is dictated by the size of your capacity SKU. The default can be overridden by following the guidance in the resources linked below.
  • Max rowset counts: If you are using Power BI Premium, it may be necessary to change some workload settings on your capacity that are relevant to datasets. In particular, the "Max intermediate rowset count" setting controls the maximum number of rows that can be returned by a SQL query in DirectQuery mode. You may encounter this limit when you have modeled many-to-many relationships in your dataset or are writing complex DAX expressions; rather than increasing the value of this setting (which may just postpone the error, or cause other performance problems), you should try to remodel your data or rewrite your DAX expression to work around the problem.
  • Single Sign-On (SSO): Enabling SSO on the connection between Power BI and Snowflake means that the identity of the user running the report is also used to send the SQL queries generated by Power BI back to Snowflake. Enabling SSO means that Power BI reports respect any security rules you have defined in Snowflake. See the great companion article on this topic, Exploring Snowflake Data Governance with Power BI.
  • Aggregations: Power BI's aggregations feature can be used to add tables containing pre-aggregated data to your dataset, which the Power BI engine can then use to improve query performance. Aggregations can be stored in Import mode or DirectQuery mode. Adding hidden aggregation tables to your Power BI semantic models lets Power BI "direct traffic" and route queries to the appropriate table as needed. DirectQuery-mode aggregation tables can solve many of the same problems as Snowflake materialized views, so you may want to test each to see which gives you the most performance benefit. Power BI automatic aggregations can also be used with Snowflake, although not when SSO is enabled.
  • Limit the number of visuals on the page: Each visual on a report page generates its own DAX query (yes, even slicers). When using DirectQuery mode, these DAX queries are also translated to SQL and executed against the underlying Snowflake database. For these reasons, it's imperative to be thoughtful with the design of each report page. Keep the dashboard or visualization simple by limiting the number of data points, visuals, and queries on a page to a minimum. This also helps query performance, since there are fewer queries to run when refreshing the visualization, and the reports will be easier to read and faster to render. Remember the popular "Visual Information Seeking Mantra": "Overview first, zoom and filter, then details-on-demand."
  • Use Query Reduction options to limit the number of queries generated. This is especially helpful when using slicers in your visualization, where you only want the filters applied when the “Apply” button is used.
  • Disable “Include Relationship Columns”. On occasion, getting additional table metadata (in this case, columns that might have relationships to other tables) may cause additional, significant delays. We recommend clearing this box within Power Query Advanced Options.
  • Use the Assume referential integrity property on relationships. By default, a relationship in Power BI generates a left outer join; enabling the "Assume referential integrity" property forces an inner join, which can make queries faster. (This property is available only in DirectQuery mode.) If you are confident in the data quality of your Snowflake tables, enabling this feature allows Power BI to generate more efficient SQL queries to retrieve data.
  • Bi-Directional Filters on Relationships: Use these with discretion! More bi-directional filters mean more SQL queries, and they can increase the complexity of the model. Often, developers new to Power BI modeling will use bi-directional filters without fully understanding the impact. Most of the time, relationships between tables should be one-to-many, especially between dimensions and facts.
  • Dataset Size: Only include the tables, rows, and columns necessary to support your business needs. More data is not always the answer :). Any columns that are not absolutely necessary should be removed, as they take up memory, increase complexity, and potentially increase data volume, which decreases performance.
  • Metadata Calls: Scope the user's default role to see only what is absolutely necessary. Why? Upon initial data load, metadata calls can take an excessive amount of time, and that time is directly related to the number of objects in the account. When creating a connection from Power BI to Snowflake, you should always connect using an identity that can only access the tables or views needed for your dataset. This will usually involve specifying a Snowflake role and database in the advanced options section of the connection dialog (see the role-scoping sketch after this list).
  • Custom SQL vs. Native Query: Although the Snowflake connector gives you the option of using your own SQL query as the source for a table in your dataset, Snowflake recommends that you do not do this. It can lead to performance problems while developing in Power BI Desktop, and it can also make your solution harder to maintain since it hard-codes SQL inside your dataset definition. (How many different SQL queries are being used within your organization to answer the same question? I would venture to guess more than you think :).) Going back to the point about modeling your data properly before you start developing in Power BI, you should always try to use properly modeled tables or views as the source for your Power BI dataset rather than ad-hoc SQL (a sketch appears after this list). Note: the October 2023 release of Power BI contains an important update that improves Custom SQL performance on Snowflake. All installations of Power BI Desktop and/or Gateway should be upgraded to take advantage of this improvement (it is automatically included in the Power BI Service).
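
As promised above, here is a minimal sketch of the numeric-type suggestion from the Data Model Design bullet. POWERBI_SVC is a placeholder user name; validate the effect on your own models before applying it broadly, and be especially cautious at the account level.

```sql
-- Minimal sketch: have the ODBC-based connector treat scale-0 DECIMAL columns as integers.
-- POWERBI_SVC is a placeholder; test carefully before rolling out.
ALTER USER powerbi_svc SET ODBC_TREAT_DECIMAL_AS_INT = TRUE;

-- The same parameter can be set for the current session only...
ALTER SESSION SET ODBC_TREAT_DECIMAL_AS_INT = TRUE;

-- ...or account-wide, which deserves careful consideration because it affects every ODBC client.
-- ALTER ACCOUNT SET ODBC_TREAT_DECIMAL_AS_INT = TRUE;
```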
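
For the Metadata Calls recommendation, the sketch below shows one way to scope a reporting role so the connector only enumerates the objects a dataset actually needs. All object names are placeholders.

```sql
-- Minimal sketch: a narrowly scoped role for Power BI connections.
-- Role, user, warehouse, database, and schema names are placeholders.
CREATE ROLE IF NOT EXISTS powerbi_reader;

GRANT USAGE ON WAREHOUSE powerbi_refresh_wh TO ROLE powerbi_reader;
GRANT USAGE ON DATABASE  analytics          TO ROLE powerbi_reader;
GRANT USAGE ON SCHEMA    analytics.marts    TO ROLE powerbi_reader;

-- Grant SELECT only on the views the semantic model actually uses.
GRANT SELECT ON VIEW analytics.marts.dim_customer TO ROLE powerbi_reader;
GRANT SELECT ON VIEW analytics.marts.fact_sales   TO ROLE powerbi_reader;

-- Make it the default role for the Power BI service user so metadata calls see only these objects.
GRANT ROLE powerbi_reader TO USER powerbi_svc;
ALTER USER powerbi_svc SET DEFAULT_ROLE = powerbi_reader;
```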
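
And for the Custom SQL point (as well as the Aggregations and Dynamic Tables notes above), the query logic can live in Snowflake as a view or a Dynamic Table that Power BI simply reads, instead of SQL pasted into the connection dialog. The example below is a sketch with placeholder names, columns, and an arbitrary TARGET_LAG.

```sql
-- Minimal sketch: keep query logic in Snowflake instead of Custom SQL in Power BI.
-- All names, columns, and the TARGET_LAG value are placeholders.

-- Option 1: a plain view over properly modeled tables.
CREATE OR REPLACE VIEW analytics.marts.fact_sales AS
SELECT order_date, customer_key, product_key, quantity, net_amount
FROM analytics.raw.sales;

-- Option 2: a Dynamic Table that materializes a pre-aggregated result,
-- which could also back a hidden aggregation table in the semantic model.
CREATE OR REPLACE DYNAMIC TABLE analytics.marts.daily_sales_agg
  TARGET_LAG = '1 hour'
  WAREHOUSE  = powerbi_refresh_wh
AS
SELECT
    order_date,
    product_key,
    SUM(net_amount) AS total_amount,
    SUM(quantity)   AS total_quantity
FROM analytics.raw.sales
GROUP BY order_date, product_key;
```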

Additional Resources

Snowflake BI Analytics Bootcamp: For a demo showcasing several of the above-mentioned best practices applied live within Microsoft Power BI, check out the “BI Analytics for Sales” video.

Best Practices For Using Power BI In DirectQuery Mode With Snowflake

Snowflake + Power BI: Evaluating Query Parallelization

Power BI — Snowflake Performance Enhancements with Horizontal Fusion (more for informational purposes, as Horizontal Fusion is now GA)

Exploring Snowflake Data Governance with Power BI

New Limits For The “Maximum Connections Per Data Source” Property In Power BI DirectQuery Mode

DirectQuery Parallelisation In Power BI — Some Examples

Multiple Connections To The Same Data Source In The Power BI Service With Shareable Cloud Connections

The Snowflake and Microsoft Partnership

Conclusion

Thousands of joint customers are successfully using Microsoft Power BI while simultaneously enjoying all of the benefits of the Snowflake Data Cloud, including:

  • A Single, Fully Managed Solution where Snowflake handles maintenance, administration, and a host of other automated services so you don't have to: a single, global platform powering your essential workloads. One platform optimizes cost and performance and minimizes TCO by saving time, manual effort, and headcount, meaning you get faster time to insights and to production.
  • Near Unlimited Resources with Snowflake’s multi-cluster, shared data architecture, enabling customers to elastically scale up and down, automatically or on the fly, within seconds. Never worry about resource contention again, and achieve optimal performance for any workload.
  • Governed and Secure Collaboration with security and governance features that were baked into the platform from day one, enabling you to both share and access governed data, tools, applications, other technologies, and data services, while preserving privacy.

Get the performance, flexibility, and scalability you need to load, integrate, analyze, and share your data — securely. As a fully managed service, Snowflake is easy to use, yet powerful enough to run your essential workloads with near-unlimited concurrency.

We hope the information above proves helpful on your journey with Power BI and Snowflake. We highly recommend saving this article for future reference and sharing within your organization. We’ll keep it updated as additional improvements are introduced.

What have we missed? What's on your wishlist? Be sure to let us know in the comments!
