Power BI on DBSQL Design Patterns

Databricks SQL SME
DBSQL SME Engineering
Jul 18, 2024
DBSQL + Power BI Lakehouse

Authors

Andrey Mirskiy, Senior Specialist Solutions Architect @ Databricks
Yatish Anand, Solutions Architect @ Databricks

Introduction

In today’s data-driven world, harnessing the full potential of your data is critical for making informed business decisions. Power BI and Databricks SQL are two powerful tools that, when integrated, can transform raw data into actionable insights. Power BI, with its robust visualization capabilities, and Databricks SQL, known for its scalable and efficient querying of big data, together create a formidable data analysis and reporting ecosystem.

Integrating these tools effectively can streamline your data workflows, enhance your data analytics capabilities, and provide deeper insights into your business operations. However, to unlock their full potential, it’s essential to follow best practices that ensure seamless integration, optimal performance, and maximum value extraction.

To help our customers, we have developed the Databricks SQL / Power BI QuickStart Samples repository, which demonstrates best practices for implementing efficient, high-performance Power BI semantic models on Databricks SQL.

Repository Overview

Prerequisites

To use our quickstart samples, you will need the following prerequisites.

  1. Power BI Desktop to open the pbit files.
  2. Access to a Databricks SQL Warehouse. We recommend Databricks SQL Serverless because it provides the best experience for BI workloads. Please refer to this blog to understand the performance benefits of DBSQL Serverless over other DBSQL SKUs.
  3. A Power BI Premium workspace to access published reports and semantic models.
  4. For some features, Tabular Editor or SSMS to connect to the Power BI XMLA endpoint.

Design Principles

We created the repository with the following design principles in mind:

  • Open — the code is open and can be easily reused. Every quickstart sample is provided as a Power BI template file (pbit) and, where needed, SQL scripts. Power BI template files do not include any credentials or data, so they enable easy reproducibility while preserving data privacy.
  • Simple — all samples include step-by-step instructions and are based on the tpch schema in the samples catalog, which exists in every Databricks workspace, so there is no need to create test data. In addition, every quickstart sample focuses on a single best practice, which avoids unnecessary complexity.

  • Multi-cloud — Databricks is available on all three major hyperscalers: Azure, AWS, and GCP. Through native connectors, Databricks provides first-class integration with Power BI. Thus, all quickstart samples are equally valid and relevant whether you implement Power BI on Azure Databricks, Databricks on AWS, or Databricks on GCP.

Samples Overview

Connection Parameters

Connection parameters in Power BI are essential for configuring and managing dynamic connections to data sources, allowing for greater flexibility and efficiency in report development. These parameters enable report developers to define variables that can be used to control various aspects of data source connections, such as server names, catalog names, or query filters. By utilizing connection parameters, organizations can simplify the process of switching between different environments (e.g., development, testing, production) without manually updating each semantic model. This feature ensures consistency and accuracy across reports while reducing the risk of errors.

In this quickstart sample we showcase how to streamline the process of connecting to different Databricks SQL Warehouses by using parameters in Power BI. The guide also explains the different parameters needed (e.g. hostname) to connect to Databricks SQL.

Image 1 — Parameters in Power BI Desktop
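As an illustration, a table's M source query can reference parameters instead of hard-coded connection values. This is a minimal sketch assuming two text parameters, ServerHostname and HttpPath, created via Manage Parameters (the names are ours, not prescribed by the sample):

```m
let
    // ServerHostname and HttpPath are Power BI text parameters,
    // e.g. the workspace hostname and the warehouse HTTP path
    Source = Databricks.Catalogs(ServerHostname, HttpPath, null),
    // Navigate to samples.tpch.orders
    samples = Source{[Name = "samples", Kind = "Database"]}[Data],
    tpch    = samples{[Name = "tpch", Kind = "Schema"]}[Data],
    orders  = tpch{[Name = "orders", Kind = "Table"]}[Data]
in
    orders
```

Switching between development, testing, and production then only requires changing the parameter values, not editing each query.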

DirectQuery-Dual-Import

Storage modes are an important aspect of Power BI. While the majority of Power BI developers start with Import mode, DirectQuery mode can unlock the potential of the Databricks Lakehouse by leveraging the power of Databricks SQL to query virtually unlimited data volumes. DirectQuery does, however, come with a latency penalty, which some developers try to mitigate by switching some of the tables to Import mode.

In this quickstart sample we compare the behavior of Power BI when using DirectQuery, Import, and Dual modes for dimension tables. We showcase the implications of such a design decision, and discuss the benefits of using Dual storage mode which combines the best of Import and DirectQuery storage modes for dimension tables.

Image 2 — Comparison of DirectQuery, Import, and Dual storage modes
Recommendations for storage mode based on table type

Logical Partitioning

Logical partitioning in Power BI is a powerful technique used to enhance the performance and manageability of large datasets in Import storage mode. By dividing data into smaller, more manageable segments, known as partitions, Power BI can efficiently process and query data, resulting in faster semantic model refresh times and improved overall performance. Logical partitioning allows for incremental data refreshes, meaning only new or changed data needs to be updated, and each partition can be processed separately, thereby significantly reducing refresh times and resource consumption. This method is particularly beneficial for large-scale datasets commonly found in enterprise environments, where timely data analysis is critical.

In this quickstart sample we showcase the benefits of logical partitioning by creating a sample report on the orders table. The first screenshot shows the default non-partitioned orders table, where the entire dataset of 7.5M rows is refreshed as a single partition. The second screenshot shows the orders table partitioned on the o_orderpriority column; each partition processes its own rows in parallel, thereby reducing the refresh time of the dataset.

Image 3 — Non Partitioned Table
Image 4 — Partitioned Table
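To make the idea concrete, each logical partition is simply the same M source query with a different filter predicate, typically created through the XMLA endpoint with Tabular Editor or SSMS. A sketch of one of five partitions over the standard TPC-H priority values (the connection parameter names are our assumptions):

```m
// Partition for o_orderpriority = "1-URGENT"; the sibling partitions
// use "2-HIGH", "3-MEDIUM", "4-NOT SPECIFIED", and "5-LOW"
let
    Source  = Databricks.Catalogs(ServerHostname, HttpPath, null),
    samples = Source{[Name = "samples", Kind = "Database"]}[Data],
    tpch    = samples{[Name = "tpch", Kind = "Schema"]}[Data],
    orders  = tpch{[Name = "orders", Kind = "Table"]}[Data],
    // Table.SelectRows folds to a WHERE clause, so each partition
    // pulls only its own slice of rows from Databricks SQL
    Urgent  = Table.SelectRows(orders, each [o_orderpriority] = "1-URGENT")
in
    Urgent
```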

Query Parallelization

When using the Databricks Data Intelligence Platform, one of the greatest benefits of using Power BI is that we can build reports which process hundreds of gigabytes or even terabytes of data. That is possible when using DirectQuery mode, which uses Databricks SQL as the backend and does not require loading all data into the in-memory cache. Therefore, in order to achieve optimal performance, it’s important to understand how Power BI generates and executes SQL queries when using DirectQuery mode.

In this quickstart sample we explore the impact of the Power BI SKU and how fine-tuning query parallelization can help improve overall report performance.

Image 5 — Example of query parallelization on P4/A7 SKU

User-defined Aggregations

User-defined aggregations in Power BI stand out as a game-changer for handling large datasets and complex queries over DirectQuery semantic models. By pre-computing and storing summarized data, such as totals, averages, and counts, in memory, Power BI can deliver faster query responses and improved report performance. This technique enables users to interact seamlessly with high-level summaries while maintaining the flexibility to drill down into detailed data. Whether you’re working with massive datasets or seeking to enhance your report’s responsiveness, user-defined aggregations provide a powerful solution.

In this quickstart sample we compare the behavior of Power BI when using DirectQuery table vs using User-defined Aggregation table and showcase the performance improvement we achieve while using User-defined Aggregations.

Image 6 — Configuring User-defined Aggregation table
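For illustration, the aggregation table can be an Import-mode table whose M query pushes the GROUP BY down to Databricks SQL, so only the summary rows are loaded into memory. A sketch using a native query (the connection parameter names and the chosen grouping columns are our assumptions):

```m
let
    Source = Databricks.Catalogs(ServerHostname, HttpPath, null),
    // Pre-aggregate on the warehouse side; only summary rows are imported
    OrdersAgg = Value.NativeQuery(
        Source,
        "SELECT o_orderdate, o_orderpriority,
                COUNT(*)          AS order_count,
                SUM(o_totalprice) AS total_price
         FROM samples.tpch.orders
         GROUP BY o_orderdate, o_orderpriority")
in
    OrdersAgg
```

This summary table would then be mapped to the detail DirectQuery orders table via the Manage aggregations dialog, so matching queries are answered from the in-memory summary.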

Dynamic M Query Parameters

With dynamic M query parameters, model authors can let report viewers use filter or slicer values as M query parameters. This gives model authors more control over how filter selections are incorporated into DirectQuery source queries, ensuring they are applied at the right point to achieve the intended results with optimal performance. Dynamic M query parameters can be especially useful for query performance optimization when querying huge tables.

In this quickstart sample we delve into this feature and discuss how to configure dynamic M query parameters step by step using a simple, easy-to-digest example.

Image 7 — Dynamic M Query Parameters
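A sketch of the pattern: an M parameter (here named OrderPriorityParam, a name we chose) is referenced in the source query and bound to a slicer field in the Model view, so the viewer’s selection is injected into the DirectQuery source query before it reaches the warehouse:

```m
let
    Source  = Databricks.Catalogs(ServerHostname, HttpPath, null),
    samples = Source{[Name = "samples", Kind = "Database"]}[Data],
    tpch    = samples{[Name = "tpch", Kind = "Schema"]}[Data],
    orders  = tpch{[Name = "orders", Kind = "Table"]}[Data],
    // The parameter value is substituted before the query is sent,
    // so the filter runs in Databricks SQL rather than in Power BI
    Filtered = Table.SelectRows(orders, each [o_orderpriority] = OrderPriorityParam)
in
    Filtered
```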

Conclusion

In conclusion, integrating Power BI with Databricks SQL can significantly enhance your data analytics capabilities by transforming raw data into actionable insights. By following best practices and utilizing the Databricks SQL / Power BI QuickStart Samples repository, you can optimize operational efficiency and performance, and maximize the value of your investment in the Databricks Data Intelligence Platform and Power BI. The repository provides step-by-step instructions and Power BI template files to help you implement efficient and high-performing Power BI semantic models on Databricks SQL.

With this repository of reusable samples and best practices, we aim to help our customers implement and optimize their Power BI solutions on Databricks SQL, both in terms of performance and cost. Following best practices is the foundation for long-term success. Whether you are a seasoned Power BI practitioner or just starting your journey with Power BI on Databricks SQL, we highly encourage you to review the Databricks SQL / Power BI QuickStart Samples repository and use the samples in your day-to-day work.

Resources

Databricks SQL / Power BI QuickStart Samples

Adopting Power BI semantic models on Databricks SQL
