Adopting Power BI semantic models on Databricks SQL

DBSQL SME Engineering · Jun 28, 2024

Authors

Andrey Mirskiy, Senior Specialist Solutions Architect @ Databricks
Mohammad Shahedi, Specialist Solutions Architect @ Databricks
Yatish Anand, Solutions Architect @ Databricks

Introduction

Since Databricks introduced the Lakehouse architecture in 2020, it has been widely adopted by numerous companies. The architecture was developed to combine the benefits of both data warehouses and data lakes. According to a recent survey, 74% of global CIOs currently have a lakehouse in their organization, and nearly all the remaining CIOs plan to implement one within the next three years.

As per The Forrester Wave™: Data Lakehouses, Q2 2024, Databricks is the clear leader in this category.

The Forrester Wave™: Data Lakehouses, Q2 2024

Databricks SQL serves as a crucial component in the lakehouse architecture, offering a best-in-class serverless data warehouse solution on an open table format. Since its introduction two years ago, Databricks SQL has attracted thousands of customers who have transitioned from traditional or cloud data warehouses, especially in the Business Intelligence category. You can read more about this in the Databricks 2023 State of Data + AI Report.

Databricks 2023 State of Data + AI Report

On the other hand, according to The Forrester Wave™: Augmented Business Intelligence Platforms, Q2 2023, Microsoft Power BI is the leader in the Business Intelligence market. We have seen exponential growth in the number of customers modernizing their Power BI estate to use Databricks SQL, which provides world-class performance, scalability to support hundreds or even thousands of concurrent users, and unified governance.

This transition from traditional data warehouses to Databricks SQL requires customers to repoint both upstream (data ingestion and transformation) and downstream (BI) applications to Databricks SQL Warehouses.

In this blog, we discuss our field experience helping customers migrate their Power BI estate (semantic models) to Databricks SQL.

What you need to know before you start

Changing a data source for Power BI semantic models is more than just updating a connection string. While the process may seem straightforward, there are several crucial considerations to keep in mind before embarking on this journey. This blog post assumes that you have already migrated tables and views from a legacy data warehouse or platform and have them available in the Databricks Data Intelligence Platform, while keeping the same schema, object, and column names, as well as data types.

It’s important to note that this post does not cover schema or data migration processes. Additionally, data model redesign or optimizations are beyond the scope of this discussion. However, we highly recommend exploring the wealth of resources available on using Power BI with Databricks, particularly this webinar covering best practices for leveraging Power BI on Databricks.

Common Considerations

Before you start making any modifications to your Power BI semantic models, we recommend reviewing the following considerations while planning for the migration.

  • Backup! This should be a no-brainer. Always create a backup so you can recover from mistakes made during manual changes. For PBIX files you can simply keep backup copies. When working with published semantic models, you can leverage Power BI Premium capabilities to create backups in Azure storage. You can find more details here.
  • Parameters enable simpler and more flexible switching between environments. If you don’t use parameters yet, it’s a good idea to introduce them while changing data sources in semantic models. You can find more guidance and examples here and here.
  • Data type mapping. While Databricks SQL supports an extensive range of data types, certain data types from your previous warehouse or platform may not align with those in Databricks SQL or may be represented differently in Power BI. For example, integer data types in Snowflake are represented as decimal numbers in Power BI. To ensure a smooth transition, update or refresh the data types in Power BI semantic models to align with the data types of the tables in the Databricks Lakehouse. For a comprehensive list of supported data types, refer to the Databricks Data Intelligence Platform documentation.
  • Object names. Though object names in Databricks Unity Catalog are case-insensitive, we recommend preserving object names, including schema, table, and column names, to minimize the risk of broken functionality in Power BI semantic models. Our tests also showed that column name casing matters in Power BI: if a column’s casing changed during the migration to the Databricks Lakehouse, the Power BI semantic model may fail because it expects a different column name from the data source. Power BI Desktop can refresh metadata and helps identify broken formulas and visuals. However, when using Tabular Editor or SSMS to update published semantic models, such problems can be very hard to troubleshoot and fix.
  • Table relationships. While Databricks Unity Catalog supports Primary and Foreign Key constraints, there is a risk of losing table relationships in Power BI semantic models when refreshing metadata in Power BI Desktop. If the PK/FK constraints on Delta tables do not match the table relationships in the Power BI semantic model, Power BI Desktop may update or delete relationships, which can lead to different results when you refresh the model. To avoid this problem, we recommend preserving relationships by properly setting the following properties in Power BI Desktop. You can find more details here.
Relationships setting in Power BI Desktop
  • Stored credentials. Please note that when changing a data source in a semantic model, Power BI requires credentials for the new source; previously saved credentials cannot be reused, so you may need to update credentials. For more information on authentication options in Power BI with Databricks SQL, refer to this blog post and this webinar. It is also important to ensure your legacy table ACLs are replicated to Unity Catalog in Databricks.
  • Incremental refresh tables are regular tables. When using Power BI Desktop there are no additional considerations. However, when updating the data source for published semantic models using Tabular Editor or SSMS, you need to update both the table/partition definition and the refresh policy (see the sketch after this list).
  • User-defined aggregations are treated as regular tables, so no specific changes are required.
  • Automatic aggregations must be re-enabled and re-trained to reflect the new data source and the latest data.
  • Tooling. The Power BI ecosystem offers multiple tools for updating Power Query M formulas. While Power BI Desktop is the go-to tool for most data analysts, Power BI engineers often use Tabular Editor or SQL Server Management Studio to work with enterprise semantic models, leveraging advanced features such as partitioning. Regardless of the tool used, the fundamental principles of updating Power BI semantic models remain the same. Where it makes sense, we will use Power BI Desktop to demonstrate the process. Last but not least, we recommend using the latest version of Power BI Desktop.
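
To illustrate the incremental refresh point above, here is a minimal sketch of what an incremental refresh table’s source expression typically looks like once repointed to Databricks. RangeStart and RangeEnd are Power BI’s reserved datetime parameters for incremental refresh; the datetime column event_ts is hypothetical, as are the placeholder connection values.

let
    Source = Databricks.Catalogs("server_hostname", "http_path", [Catalog=null, Database=null, EnableAutomaticProxyDiscovery=null]),
    databricks_catalog = Source{[Name="catalog_name",Kind="Database"]}[Data],
    databricks_schema = databricks_catalog{[Name="schema_name",Kind="Schema"]}[Data],
    databricks_table = databricks_schema{[Name="table_name",Kind="Table"]}[Data],
    // RangeStart and RangeEnd are the reserved parameters that drive incremental refresh;
    // event_ts is a hypothetical datetime column used to window each refresh
    filtered_rows = Table.SelectRows(databricks_table, each [event_ts] >= RangeStart and [event_ts] < RangeEnd)
in
    filtered_rows

Only the navigation steps change when repointing the data source; the RangeStart/RangeEnd filter stays as-is, while the refresh policy is separate model metadata that must be updated in Tabular Editor or SSMS.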

Migrating Power BI semantic models in 4 steps

Step 1: Update Power Query M formulas in Power BI Desktop

The first step is to update Power Query M formulas in your Power BI Desktop file. Follow these steps:

  1. Open your Power BI Desktop file containing the semantic model you want to update.
  2. Go to the “Transform data” section in the “Home” ribbon.
  3. In the “Queries” pane, select a query and go to “Advanced Editor” in the “Home” ribbon.
  4. In the “Advanced Editor” dialog, update the connection to the data source and point to your Databricks environment. We discuss this step in more depth later in this post.
  5. Repeat steps 3–4 for other queries.
  6. Click “Close & Apply” to save the changes, refresh metadata, and reload the data.

Power BI Desktop will now fetch the data from your Databricks environment and update the semantic model accordingly.

Step 2: Validate and Publish to Power BI Service

Before publishing your updated semantic model to the Power BI Service, it’s essential to validate that everything is working as expected. Follow these steps:

  1. In Power BI Desktop, review your reports and visualizations to ensure that they are displaying the correct data from Databricks.
  2. If everything looks good, proceed to publish your Power BI Desktop file to the Power BI Service.

Step 3: Update Data Source Credentials in Power BI Service

After publishing your updated semantic model to the Power BI Service, you’ll need to update the data source credentials to ensure that the Power BI Service can connect to your Databricks environment. Follow these steps:

  1. In the Power BI Service, navigate to the workspace containing your published semantic model.
  2. Select the ellipsis (…) next to your semantic model and choose “Settings”.
  3. In the “Settings” pane, click on “Data Source Credentials”.
  4. Update the credentials to match your Databricks environment.
  5. Click “Apply” to save the changes.

Step 4: Refresh the Semantic Model in Power BI Service

Finally, you’ll need to refresh the semantic model in the Power BI Service to ensure that the updated data source connection and credentials are applied. Follow these steps:

  1. In the Power BI Service, navigate to the workspace containing your published semantic model.
  2. Select the ellipsis (…) next to your semantic model and choose “Refresh Now.”

The Power BI Service will now fetch the data from your Databricks environment, and your reports and visualizations should reflect the updated data source.

Updating Power Query M formulas

In the previous section we provided high-level guidance on how to migrate your existing Power BI semantic model and replace existing data sources with Databricks SQL. In this section we will discuss specifics of the Power Query M formulas and typical migration scenarios.

Before changing the data source, let’s take a look at what the Power Query M formula for a connection between Power BI and a table in Databricks SQL looks like.

let
    Source = Databricks.Catalogs("server_hostname", "http_path", [Catalog=null, Database=null, EnableAutomaticProxyDiscovery=null]),
    databricks_catalog = Source{[Name="catalog_name",Kind="Database"]}[Data],
    databricks_schema = databricks_catalog{[Name="schema_name",Kind="Schema"]}[Data],
    databricks_table = databricks_schema{[Name="table_name",Kind="Table"]}[Data]
in
    databricks_table

The let expression follows this structure:

  1. Source = Databricks.Catalogs("server_hostname", "http_path", [Catalog=null, Database=null, EnableAutomaticProxyDiscovery=null])
  • The Source variable defines the connection to the Databricks SQL warehouse. You can find the server_hostname and the http_path on your warehouse connection details page by going to: Workspace → SQL Warehouses → choose your SQL Warehouse → Connection details.

SQL Warehouse connection details

  2. databricks_catalog = Source{[Name="catalog_name",Kind="Database"]}[Data]
  • Defines the catalog in Unity Catalog. Kind="Database" specifies navigation to the catalog.

  3. databricks_schema = databricks_catalog{[Name="schema_name",Kind="Schema"]}[Data]
  • Defines the schema within the catalog selected in the previous line. Kind="Schema" specifies navigation to the schema.

  4. databricks_table = databricks_schema{[Name="table_name",Kind="Table"]}[Data]
  • Defines the table within the schema selected in the previous line. Kind="Table" specifies navigation to the table.

Finally, the in statement specifies the output of the Power Query script.

As mentioned before, we strongly recommend using parameters to easily switch between different environments. Here you can replace the string literals “server_hostname”, “http_path”, “catalog_name”, and “schema_name” with parameters, as sketched below. To learn more about parameters, you can refer to this blog.
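
A parameter in Power BI is itself a small M query: a literal value tagged with parameter metadata. Below is a minimal sketch of the three parameters used in the examples that follow. Each expression lives in its own query (Power BI Desktop generates this M for you via Home → Manage Parameters in the Power Query Editor), and the values shown are placeholders, not real endpoints.

// server_hostname_param — its own query in the model
"adb-1234567890123456.7.azuredatabricks.net" meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true]

// http_path_param — its own query
"/sql/1.0/warehouses/1234567890abcdef" meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true]

// catalog_name_param — its own query
"main" meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true]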

Now that you have a better understanding of Power Query M formulas and how they look with Databricks SQL connections, let’s see how we modify this script in Power BI Desktop.

In order to view and modify the Power Query M formulas in Power BI Desktop, we need to use the Advanced Editor. To open the Advanced Editor, open your report in Power BI Desktop, go to the Home tab in the ribbon, click on Transform Data, and select Advanced Editor.

Advanced Editor in Power Query

Here we are going through some common examples of changing the data source for semantic models. In these examples, Company ABC is heavily using Power BI for reporting and dashboarding. The developer has already created the parameters server_hostname_param, http_path_param, and catalog_name_param, using values retrieved from the connection details page of their Databricks SQL Warehouse.

1. Hive Metastore → Unity Catalog

In the first example we will cover a scenario in which the report is already connected to a Databricks SQL Warehouse and the data is located in the Hive metastore. Now Company ABC would like to centralize access control, implement auditing, lineage, and data discovery with the help of Unity Catalog. They migrated all their data into Unity Catalog and want to change the data source in their Power BI reports accordingly. It’s important to note that object names and data types remain the same.

Their Power BI developer downloads the published report as a PBIX file or takes a local PBIX file and opens it in Power BI Desktop. Once they open the Advanced Editor, they see this Power Query M formula:

let
    Source = Databricks.Catalogs("adb-******.*.azuredatabricks.net", "/sql/1.0/warehouses/*******", [Catalog=null, Database=null, EnableAutomaticProxyDiscovery=null]),
    databricks_catalog = Source{[Name="hive_metastore",Kind="Database"]}[Data],
    databricks_schema = databricks_catalog{[Name="schema_name",Kind="Schema"]}[Data],
    databricks_table = databricks_schema{[Name="table_name",Kind="Table"]}[Data]
in
    databricks_table

Now that they know how to change the Power Query M formula and they have created the parameters, they modify the query as follows:

let
    Source = Databricks.Catalogs(server_hostname_param, http_path_param, [Catalog=null, Database=null, EnableAutomaticProxyDiscovery=null]),
    databricks_catalog = Source{[Name=catalog_name_param,Kind="Database"]}[Data],
    databricks_schema = databricks_catalog{[Name="schema_name",Kind="Schema"]}[Data],
    databricks_table = databricks_schema{[Name="table_name",Kind="Table"]}[Data]
in
    databricks_table

Please note that server_hostname_param, http_path_param, and catalog_name_param are parameters here, not string literals.

Once they modify the Power Query M formula, they click Done to save the changes. If there is no syntax error, they can apply the new script by clicking File → Close & Apply:

Close & Apply in Power Query

Once the changes are applied to all the tables in the report, they can publish the report, which now connects to the new catalog.

2. Snowflake → Databricks SQL

In our second example, we will look at the scenario where Power BI reports are connected to a Snowflake cloud data warehouse. Company ABC migrated all their data to the Databricks Data Intelligence Platform in order to benefit from open data standards, higher performance, and lower costs. The databases, schemas, and tables have been migrated to Unity Catalog. As users heavily use Power BI for analytics, they want to change the data source in their Power BI reports accordingly. It’s important to note that object names and data types remain the same.

Their Power BI developer downloads the published report as a PBIX file or takes a local PBIX file and opens it in Power BI Desktop. Once they open the Advanced Editor, they see this Power Query M formula:

let
    Source = Snowflake.Catalogs("******.snowflakecomputing.com", "warehouse_name"),
    SF_Database = Source{[Name="database_name",Kind="Database"]}[Data],
    SF_Schema = SF_Database{[Name="schema_name",Kind="Schema"]}[Data],
    SF_Table = SF_Schema{[Name="table_name",Kind="Table"]}[Data]
in
    SF_Table

Since they have migrated all the data and kept the same schema and table names, they only need to modify the Power Query M formula by replacing the Snowflake connector with the Databricks connector and using their own connection parameters, as shown below.

let
    Source = Databricks.Catalogs(server_hostname_param, http_path_param),
    SF_Database = Source{[Name=catalog_name_param,Kind="Database"]}[Data],
    SF_Schema = SF_Database{[Name="schema_name",Kind="Schema"]}[Data],
    SF_Table = SF_Schema{[Name="table_name",Kind="Table"]}[Data]
in
    SF_Table

Please note that server_hostname_param, http_path_param, and catalog_name_param are parameters here, not string literals. The “schema_name” and “table_name” values in the Power Query M formula remain untouched and are equal to those in the original version. There is no need to change the M query variable names, so the changes are minimal.

Once these changes are applied, they can publish the report.

Warning: After changing the data source, make sure to refresh the data. This will also refresh metadata, particularly column data types which may differ between Databricks and the legacy data platform.
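
If a column still comes through with an unexpected type after the refresh (for example, whole numbers surfacing as decimals, as noted in the considerations above), the type can also be coerced explicitly in the query. A minimal sketch, assuming a hypothetical order_id column that should be a 64-bit integer:

let
    Source = Databricks.Catalogs(server_hostname_param, http_path_param),
    SF_Database = Source{[Name=catalog_name_param,Kind="Database"]}[Data],
    SF_Schema = SF_Database{[Name="schema_name",Kind="Schema"]}[Data],
    SF_Table = SF_Schema{[Name="table_name",Kind="Table"]}[Data],
    // coerce the hypothetical order_id column to a whole number
    typed_table = Table.TransformColumnTypes(SF_Table, {{"order_id", Int64.Type}})
in
    typed_table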

3. Synapse → Databricks SQL

In our last example, we will examine the scenario where Company ABC has migrated their Synapse cloud data warehouse to the Databricks Data Intelligence Platform in order to benefit from better performance, higher concurrency, and unification of ETL, BI, ML, and AI use cases on the same platform. The data is already available in Unity Catalog. As Power BI is the tool of choice of Company ABC’s users, they want to repoint their Power BI reports to the new data source, Databricks SQL. As the migration was like-for-like, the object names and data types were preserved.

Once they open a downloaded or local PBIX file in Power BI Desktop and open the Advanced Editor, the Power Query M formula appears as follows:

let
    Source = Sql.Database("******.sql.azuresynapse.net", "database_name"),
    synapse_table = Source{[Schema="schema_name",Item="table_name"]}[Data]
in
    synapse_table

Following the same approach as in the previous examples, the new Power Query M formula for the Databricks connection looks as follows.

let
    Source = Databricks.Catalogs(server_hostname_param, http_path_param, [Catalog=null, Database=null, EnableAutomaticProxyDiscovery=null]),
    databricks_catalog = Source{[Name=catalog_name_param,Kind="Database"]}[Data],
    databricks_schema = databricks_catalog{[Name="schema_name",Kind="Schema"]}[Data],
    synapse_table = databricks_schema{[Name="table_name",Kind="Table"]}[Data]
in
    synapse_table

Please note that server_hostname_param, http_path_param, and catalog_name_param are parameters here, not string literals. The “schema_name” and “table_name” values in the Power Query M formula remain untouched and are equal to those in the original version.

Once these changes are applied, they can publish the report.

Conclusion

In conclusion, migrating Power BI semantic models to Databricks SQL is a nuanced process that requires careful planning and execution. Here are the key takeaways:

  1. Preparation is crucial: Before starting the migration, ensure you have backups, understand data type mappings, and preserve object names and relationships.
  2. Use parameters: Implement parameters in your Power Query M formulas to simplify switching between environments and enhance flexibility.
  3. Follow a structured approach: The migration process involves four main steps:
    - Updating Power Query M formulas and refreshing the model in Power BI Desktop
    - Validating and publishing to the Power BI Service
    - Updating data source credentials
    - Refreshing the semantic model in the Power BI Service
  4. Understand Power Query M formulas: Familiarize yourself with the structure of Databricks SQL connections in Power Query M formulas to make necessary modifications.
  5. Consider common scenarios: We’ve provided examples for migrating from Hive Metastore to Unity Catalog, Snowflake to Databricks SQL, and Synapse to Databricks SQL. These can serve as templates for your specific migration needs.
  6. Be aware of potential issues: Pay attention to data type differences, column name casing, and table relationships during the migration process.
  7. Leverage appropriate tools: While Power BI Desktop is suitable for most tasks, consider using Tabular Editor or SQL Server Management Studio for more advanced scenarios.

By following these guidelines and best practices, you can successfully migrate your Power BI semantic models to Databricks SQL, taking advantage of its performance, scalability, and unified governance features. Remember to thoroughly test your migrated models to ensure data accuracy and report functionality before finalizing the transition.

Resources

Parameterizing your Databricks SQL Connections in Power BI
Power BI on Databricks SQL QuickStart Samples
Power BI on Databricks best practices
Migrate Azure Analysis Services to Power BI Premium using Azure Databricks — part 1, part 2
