Direct Lake in Microsoft Fabric: Revolutionizing Data Storage and Analysis in Power BI🚀:

Gulce Ekin
5 min readJun 1, 2023

--

Microsoft’s announcement of Microsoft Fabric made a significant impact on the market. The article highlights the most important feature of Microsoft Fabric: Direct Lake

Direct Lake mode is a groundbreaking new dataset capability for analyzing very large data volumes in Power BI. Direct Lake is based on loading parquet-formatted files directly from a data lake without having to query a Lakehouse endpoint, and without having to import or duplicate data into a Power BI dataset.

The following diagram shows how classic import and DirectQuery modes compare with the new Direct Lake mode.

In Direct Lake, when a user interacts with a visual and generates DAX queries, those queries are sent directly to the Delta tables. The required columns are loaded into memory and returned to the visual. This eliminates the need for duplicating the data or translating DAX queries to native SQL queries, which improves performance compared to Direct Query mode.

Direct Lake offers several benefits:

  1. Improved query performance: Since the queries are executed directly on the Delta tables, without the need for translation or native SQL queries, the performance is similar to the import mode.
  2. Automatic detection of changes: Any changes made to the Delta tables are automatically detected, and the Direct Lake dataset is refreshed to provide the latest data.
  3. Compatibility with other compute engines: Power BI is just one compute in the Fabric ecosystem. Other compute engines like Spark notebooks, data warehouses, and Kusto can also access the same Delta tables, providing flexibility in choosing the compute for analytics solutions.
  4. Reduced data movement: With Direct Lake, the data does not need to be copied or refreshed as frequently as in the import mode. This saves Power BI capacity resources and reduces the number of refreshes required.

It’s important to note that Direct Lake is not replacing Direct Query or import modes. Depending on the use case, Delta tables can still be used in Direct Query or import mode using the Lakehouse SQL endpoint.

Creating a Direct Lake dataset involves creating Delta tables from the source data. This can be done using different methods such as Dataflow Gen2, notebooks, or pipelines, depending on your experience and skill set. Delta tables are a requirement for Direct Lake datasets.

Prerequisites

Direct Lake is supported on Power BI Premium P and Microsoft Fabric F SKUs only. It’s not supported on Power BI Pro, Premium Per User, or Power BI Embedded A/EM SKUs.

Direct Lake in a Typical Data Warehouse Architecture:🥱

Let’s say you spend 60 minutes incrementally loading your enterprise data warehouse nightly and another 30 minutes refreshing the Power BI dataset.

Here’s what the architecture could look like in Fabric:🤩

Here are the key points: ✨

  1. Direct Lake mode allows Power BI to directly access and utilize Delta Parquet files stored in OneLake, eliminating the need for dataset refreshes and reducing latency in displaying fresh data.
  2. Direct Lake offers improved performance, with columns being loaded on-demand into memory, resulting in fast report rendering similar to the Import model.
  3. Power Query transformations or SQL views need to be materialized as tables for Direct Lake to work effectively. Close coordination between the data integration team and the BI team is encouraged.
  4. Fabric provides flexibility in organizing work, allowing for the separation of duties within large teams. This could involve separate workspaces for data integration and BI, with OneLake shortcuts to tables from the data integration workspace.

Key Considerations and Limitations of Direct Lake in Power BI🔑

1. Direct Lake Datasets and Creation Methods:

· Direct Lake datasets cannot be created in Power BI Desktop or through the XMLA endpoint using Tabular Editor. However, once the dataset is created using the web modeling experience in Power BI service, it can be connected to in Live mode in Power BI Desktop for report creation.

2. Usage of DAX Functions:

· DAX functions can be used in Direct Lake mode without limitations. However, if the query falls back to Direct Query mode, DAX functions not supported in Direct Query will throw an error.

3. Limitations of Direct Lake in Public Preview:

· Some limitations of Direct Lake in its current Public Preview state include the inability to create row-level security and the limitation that RLS automatically switches the table to Direct Query mode. These limitations are expected to be addressed in the future.

4. Building Direct Lake Datasets:

· Direct Lake datasets can be built on existing Delta tables or by creating shortcuts to Delta tables stored in ADLSg2 or S3. The Delta tables do not necessarily need to be in OneLake.

5. Vorder Algorithm and Performance:

· Vorder is an algorithm used to shuffle and sort the data in the parquet files for higher compression and querying speed. Vorder can be automatically applied by Fabric compute engines when creating Delta tables, and the Spark configuration can be specified to enable vorder. It improves Direct Lake performance but is not a strict requirement.

6. Performance Impact with Shortcutted Delta Tables:

· The performance of Direct Lake datasets built on top of shortcutted Delta tables may or may not be impacted, depending on various factors such as geography, network, and the compute engine used to access the data. It’s recommended to perform testing and monitoring to assess the performance impact when using shortcutted Delta tables.

7. Migration Process from Import Mode to Direct Lake:

· There isn’t a direct migration path available within Power BI at the moment to migrate an existing dataset to Direct Lake. To migrate, you would need to manually create the Delta tables from the source data and then rebuild the dataset using the Direct Lake approach. This involves creating the necessary tables, relationships, measures, and visuals in the Direct Lake dataset.

Keep in mind that the migration process can vary depending on the complexity of your existing import mode dataset and the specific requirements of your scenario. It’s recommended to thoroughly plan and test the migration process before making any changes to production datasets.

Additionally, it’s important to stay up-to-date with the latest documentation and announcements from Microsoft regarding Direct Lake and its capabilities. As a feature in Public Preview, there may be updates, improvements, and additional guidance provided by Microsoft over time.

I hope this information helps you understand the process of migrating to Direct Lake in Power BI.

--

--