Power BI and Snowflake
Build Scalable BI Solutions Using Power BI and Snowflake
Revolutionise your data capabilities and unlock a new era of data-driven success with Power BI, Snowflake and Version 1 — contact us today.
Introduction
If you are an existing Power BI / Azure Customer you will be pleased to know that Power BI offers native integration with Snowflake’s Cloud Data Platform via the Snowflake Connector. This enables organisations to leverage the Snowflake Cloud Data Platform using Power BI to build scalable BI Solutions.
Both Power BI and Snowflake are both considered market leaders across their respective categories, offering powerful capabilities. It is for that reason it is no surprise more and more organisations are using these tools together!
This post discusses the use of Power BI with Snowflake and some methods of maximising the potential of both tools when used together. Highlighting the following:
- Native Connectivity of Power BI and Snowflake
- Performance Considerations (Reduce the volume of queries being sent to Snowflake)
- Upstream Governance controls
Implementing a Scalable BI Solution with Power BI and Snowflake
With more organisations adopting Power BI for their data visualisation and analysis layer with existing/new Data warehousing solutions, it’s important to carefully consider how to implement the solution to leverage the power of both Snowflake and Power BI.
One of the most important aspects of any BI solution is governance, utilising Snowflake’s governance features such as Column and Row Level security data protection can be implemented upstream. The benefit of implementing governance within Snowflake helps to ensure data is secured downstream in Power BI (Governance controls propagated downstream). Note: When implementing Governance controls in Snowflake it is important that SSO is enabled.
As is common in most BI solutions the majority of Data Transformation should be conducted at the data layer in order to create a loosely coupled architecture (i.e. data layer and visualisation layer are not tightly coupled). Keeping in line with this approach, utilising Snowflake’s data governance controls, governance can also be implemented at the Data Layer (upstream).
Using some of the recommendations outlined below both Snowflake and Power BI can be utilised to build an effective BI Solution.
Snowflake Connector
First and foremost like with most use cases/data sources with Power BI it is recommended to use the native connector. This is the case with Snowflake, utilising this connector will provide the best and most optimised way of querying the Snowflake data source. The Snowflake connector also facilitates the use of Single Sign-On (SSO), this option is discussed later in this article.
Power BI Storage Modes
When planning any Power BI implementation it’s important to consider which storage mode to utilise for Power BI’s Data Model. Each of the storage modes available have Pros and Cons so it is important to understand the needs of your organisation. There are three storage modes available: DirectQuery, Import and Composite.
DirectQuery does not store data in Power BI, Data is not imported. Only metadata is stored in order for the data structure to be defined. Power BI queries the data source directly every time the report is executed.
When working with very large data sets in Snowflake in scenarios where it is crucial to report on the most recent data available, DirectQuery should be adopted.
A Composite model provides the ability to utilise different types of storage modes. DirectQuery and Import connections can be used in one model, in any combination. With a Composite model, some tables can be in import mode and other DirectQuery. This allows you to select the appropriate storage mode depending on the nature of the table.
When utilising Snowflake as a data source within Power BI, a Composite model can provide a nice performance blend and ultimately reduce the number of queries sent to Snowflake when used appropriately.
When using a composite model the standard approach is to utilise:
- Dimension Tables in Import Mode
- Fact Tables in DirectQuery Mode
Data Modelling
Ensure data is modelled as a Star Schema in order to establish proper relationships and a clean data model. A well-structured model design should include tables that are either dimension-type tables or fact-type tables, enabling the dimensions to support filtering/grouping and the fact table to support summarisation. Avoid the use of Flat or denormalized wide (lots of columns) tables, these tables commonly store all the data in one table creating duplication.
When loading data into the data model within Power BI it’s important to only load/query the data necessary for the report/analysis (this helps control the dataset size). Consider the width of tables, Data types of columns and cardinality.
Another aspect of building a data model within Power BI is the use of aggregations. Aggregations can improve query performance when utilising DirectQuery with large volumes of data. Manually configuring aggregations enables data to be cached at the aggregated level in-memory.
Bidirectional Relationships
Consider the use of Bidirectional Relationships carefully, bidirectional relationships apply filters in both directions. The use of bidirectional relationships can negatively impact on model query performance, thus minimising the use of bi-directional relationships is recommended. In the context of Snowflake utilising Bidirectional relationships may result in an increased number of SQL queries being generated and executed against the Snowflake Database.
Number of Visuals
Controlling the number of visuals on a report page is a common method in order to ensure the efficient performance of a report. Using too many visuals on a single report slows report performance. In the context of Snowflake when utilising the Snowflake Power BI connector in DirectQuery Mode the more visuals on the page will result in more queries being sent to the Snowflake Database. Each Visual will produce its own DAX query, resulting in that DAX Queries being translated into SQL and executed against the Snowflake Database.
Query Parallelization
Query Parallelization in Power BI (DirectQuery Mode) can help improve query response times when utilising DAX or MDX expressions. Query Parallelization aims to maximise the performance of a query by parallelizing as many SE queries per DAX or MDX query as possible.
Query Reduction & Referential Integrity
To maximise the performance of Snowflake it’s beneficial to adopt any methods of reducing the volume of SQL queries and the efficiency of those SQL queries. Two methods within Power BI that facilitate this are:
- Query Reduction: When using DirectQuery Mode, Query Reduction can be enabled. A variety of Query Reduction options are available, enabling these options ultimately reduces the number of queries being sent to Snowflake.
- Assume Referential Integrity: Setting Assume Referential Integrity enables queries on the data source to use INNER JOIN statements rather than OUTER JOIN, this ultimately improves the performance and efficiency of the query. This setting is only available when using DirectQuery mode.
Snowflake Data Location
Ensure the location/region of your Snowflake account or location within Azure is as close as possible to your Power BI Tenant/Capacity. This will help to reduce the effects of network performance (latency and bandwidth).
Clustering Keys
A clustering key is a subset of columns in a table, setting a clustering key can help improve response times and query performance. It is important to ensure a clustering key for very large tables is set. Find out more about clustering keys here.
Warehouse — Dedicated for Power BI Reporting / Queries
Utilise a Snowflake warehouse that is dedicated to Power BI queries and ensure it is sized appropriately.
Materialised Views
A materialized view is a pre-computed data set, designed to improve query performance for workloads composed of common and repeated query patterns. Consider the use of Materialised Views to help improve query performance, this is ideal when leveraging DirectQuery mode with Snowflake. Find out more about materialised views here.
Utilising Snowflake SSO
Snowflake has a capability for Microsoft Entra ID and an option for SSO (single sign-on). When a report/data model that utilises the Snowflake connector is published to the Power BI Service, the author has the option to enable the use of SSO. If SSO is not enabled then the report will refresh via the credentials set by the report/data model author as opposed to the report users utilising their own credentials. When deploying a Power BI solution across your organisation, SSO is the best way to ensure a seamless user experience and facilitate data governance. This is an example of how Snowflake’s Governance aspect can be leveraged in Power BI.
Recommendation Summary
Power BI
- Snowflake Connector
- Storage Modes
- Data Modelling
- Number of Visuals
- Query Parallelization
- Query Reduction & Referential Integrity
Power BI offers a number of features that facilitate the implementation of an efficient and scalable BI Solution, when planning an implementation it’s important to consider all of the optimisation tools available. Further Information can be found here: Optimization guide for Power BI.
Snowflake
- Snowflake Data Location
- Clustering Keys
- Warehouse — Dedicated for Power BI Reporting / Queries
- Materialised Views
Additionally, the following resources should be considered when implementing Snowflake as part of a BI Solution:
Summary
Snowflake and Power BI are both very powerful tools offering a number of great features. As highlighted above it is easy to establish a connection between the two tools in order to provide effective data visualisation in Power BI. Combining Snowflakes performance and scalability with the visualisation capability of Power BI could provide your organisation with an effective way of sharing data. Consider both tools as part of your BI Solution.
Revolutionise your data capabilities and unlock a new era of data-driven success with Power BI, Snowflake and Version 1 — contact us today.
About the Author:
Marco is a Data Visualisation Consultant at Version 1, providing data solutions and visualisations expertise to organisations. With a Power BI focus, Marco helps to ensure customers get the best value from their data.