Read Microsoft Fabric Warehouse from Fabric Spark
Microsoft Fabric is an end-to-end data and analytics platform that gives users an integrated, easy-to-use set of tools for all their data and analytics needs. The Fabric platform provides a few choices of data store, including Warehouse and Lakehouse; the full list of differences between the two is documented in the following guide — Fabric decision guide — choose a data store — Microsoft Fabric | Microsoft Learn. The biggest point to highlight when choosing between Lakehouse and Warehouse is that if one picks Warehouse, writes can only be done using T-SQL. The beauty of the Fabric platform is that Fabric Warehouse does not use a proprietary format: it stores data in the open Delta-Parquet format in Fabric OneLake, which can be read using the Fabric Spark compute engine even though writes are restricted to T-SQL.
What you will learn
In this blog post I will demonstrate how data written to a Warehouse can be read using Fabric Spark, something that may not be obvious to teams getting started with this broad platform. I will also share my thoughts on when this capability is a good fit. The topic itself is not very advanced and covers the basic steps for reading a Warehouse table with Fabric Spark, but a basic understanding of Microsoft Fabric is expected.
Use Case
Teams with deep data warehousing and SQL expertise might prefer the Warehouse over the Lakehouse store in Microsoft Fabric. It's common to have a variety of downstream consumers for a data warehouse:
- Data analysts who primarily use data from the warehouse for reporting purposes
- Data scientists who prefer using Python with Fabric Spark notebooks for machine learning
- Downstream data engineering teams who use Spark to build out their own data products on top of the base tables provided by the warehouse
Data in the Warehouse can be easily queried using T-SQL in the Fabric portal or with client tools like SQL Server Management Studio. If a consumer team belongs to one of the above categories and prefers Fabric Spark, that is feasible as well, but it is not obvious how to achieve it, so the solution section below shares the instructions.
Solution — Read Warehouse using Fabric Spark
The Microsoft Fabric Shortcut feature provides the solution. The main purpose of Fabric Shortcuts is to virtualize existing data from sources such as Azure ADLS Gen2 and other cloud data sources like AWS S3, but they can also be leveraged to read Fabric Warehouse data from Fabric Spark. To read Fabric Warehouse data using Fabric Spark, create a Lakehouse and then create a shortcut in the Lakehouse to the desired Warehouse tables. This method works whether the Warehouse and Lakehouse belong to the same workspace or to different workspaces.
The screenshots below show the steps to create a shortcut from a Lakehouse to a Warehouse and then read the Warehouse tables using a Fabric Spark notebook (the instructions are essentially the same even if the Warehouse and Lakehouse are in different workspaces). Shortcut creation is also documented in the public docs here. The ease of use in the Fabric ecosystem is unparalleled: there is no need to look up connection strings, install special drivers, or perform any complex networking setup. It is important to note that these shortcut tables can be joined with other Lakehouse tables for reads, but they cannot be written to using the Spark compute engine, as mentioned above.