Microsoft Fabric — Data Sharing between Data Engineering, Data Analyst, and Data Science Teams

Inderjit Rana · Microsoft Azure · Apr 28, 2024

Microsoft Fabric is an end-to-end analytics platform with capabilities for Data Engineers, Data Analysts, and Data Scientists. A single Fabric Workspace can serve all of those personas seamlessly for self-service analytics, but more often than not organizations have separate teams for each persona, and those teams need to operate in their own Workspaces. In this blog post I will go into a design where the Data Engineering Team is responsible for collecting data from transactional stores and making it available for consumption to downstream Data Analyst and Data Science Teams (there could be multiple of each). In addition to the high-level design, I will also delve into how to set up access control for the different teams based on commonly observed requirements.

This is not an introductory article; I am assuming basic knowledge of Microsoft Fabric concepts like Lakehouse, Warehouse, Shortcuts, Lakehouse SQL Endpoint, etc. There are many ways to architect solutions, and with Fabric being relatively new, patterns are still being discovered. My hope is that even if you cannot adopt the design as-is, the knowledge you gain here will help you extend the idea to fit your needs.

Workspace Design

The following diagram shows a Fabric Workspace design where each team has its own Workspace. The Data Engineering Team ingests data from upstream transactional stores and curates/transforms it, then makes it available for downstream consumption in the Data Analyst and Data Science Workspaces using the Shortcut feature, which does not require any duplication of data.

Fabric Workspace Design — Data Sharing between Data Engineering, Data Analyst and Data Science Teams

Workspaces are powered by compute capacity referred to as an F-SKU (or P-SKU, which is a legacy billing model). A single compute instance can power multiple workspaces, but if desired each workspace can be assigned a different compute instance. The diagram above shows separate compute assigned to each workspace, but that is not necessary.

Compute instances in Fabric are little more than billing/accounting mechanisms, since you do not provision or start/stop virtual machines. Even so, there is a benefit to assigning different compute to each workspace: the querying/reporting environment of one consumer team is completely isolated from other teams (whether another consumer team or the Data Engineering Team). Another good reason is chargeback: each team gets a clear bill for its own compute environment.

In traditional data warehouses, including Synapse SQL Dedicated Pools, complex workload management techniques had to be implemented to allocate compute quotas to the various teams loading or consuming data. In Microsoft Fabric, data is stored in an open format and is not tied to a particular compute, so it is easy to use different computes with the same data (also referred to as separation of compute and storage).

Data Engineering Workspace

The Data Engineering Team will ingest data from upstream transactional stores, then transform it and build tables for downstream consumption in this Workspace. Spark is usually the preferred tool for Data Engineering, so a Lakehouse is a good choice for the tables in this Workspace.
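As an illustration, a curation step in this Workspace might look like the following Fabric notebook cell. This is only a sketch: the file path, column name, and cleanup logic are hypothetical, and it relies on the `spark` session that the Fabric runtime pre-provisions in notebooks.

```python
# Hypothetical curation step in a Fabric notebook attached to the
# Data Engineering Lakehouse; `spark` is pre-provisioned by Fabric.
raw = (spark.read.format("csv")
       .option("header", "true")
       .load("Files/raw/cms_provider_drug_costs.csv"))  # illustrative path

# Illustrative cleanup: drop duplicates and rows missing a key column.
curated = raw.dropDuplicates().filter("total_claims IS NOT NULL")

# Save as a managed Delta table in the Lakehouse, ready to be
# shortcut into the downstream consumer workspaces.
curated.write.mode("overwrite").saveAsTable("cms_provider_drug_costs")
```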

  • Data Engineering Team members are granted the Contributor role on this Workspace (in our example, the DataEngineer Microsoft Entra group).
  • Data Analyst and Data Science Team members will not need any Workspace-level roles on the Data Engineering Workspace.

The screenshots below show the Data Engineering Workspace containing the Lakehouse with tables built by the Data Engineering Team. A user with a Contributor-level role on both the Data Engineering Workspace and the downstream consumption workspaces is needed for the subsequent steps of sharing data between workspaces, so as an example the DataSharingSetupUsers Microsoft Entra group is shown with the Contributor role on the Data Engineering Workspace.

Data Engineering Workspace
Lakehouse in Data Engineering Workspace

Data Analyst Workspace

Common requirements for Data Analyst Team:

  • Need read access to tables built by Data Engineering Team
  • SQL is the preferred language for querying data
  • Need to build reports using Power BI
  • Some more technical/power users might want to create their own tables based on queries on Data Engineering Team provided tables
  • Some more technical/power users might want to load their own datasets into a Warehouse in the Workspace and join with Data Engineering Team provided tables

Setup Details

  • Data Analyst Team members will be Contributors on the Data Analyst Workspace only.
  • Another user, with a Contributor or higher role on both the Data Analyst and Data Engineering Workspaces, needs to create a Lakehouse in the Data Analyst Workspace and create shortcuts in this Lakehouse to the specific Data Engineering Workspace Lakehouse tables.

In our example, Henry Kramer is set up as Contributor through DataSharingSetupUsers Entra group membership and will execute the above actions.

Each Lakehouse automatically comes with a Lakehouse SQL Endpoint, so Data Analyst Team members can query the shortcut-enabled tables whose data resides in the Data Engineering Workspace Lakehouse. The important thing to keep in mind is that the Lakehouse SQL Endpoint uses a delegated access model: the Data Analyst Lakehouse owner (the user who created the Lakehouse, Henry Kramer in our example) is the identity used to read data from the Data Engineering Workspace, so Data Analyst Team members do not need any permissions on the Data Engineering Workspace.

The screenshot below shows the Data Analyst Workspace: DataSharingSetupUsers is a Contributor, and Henry Kramer created DataAnalystLakehouse and is therefore its owner (this is necessary for the delegated access model described above).

Data Analyst Workspace

The next screenshot shows shortcut creation with all Data Engineering Workspace Lakehouse tables selected except cms_provider_drug_costs; these will be made available for consumption in the Data Analyst Workspace Lakehouse.

Data Analyst Workspace — Shortcuts to Data Engineering Workspace Lakehouse Tables

The next screenshot shows that a user who is Contributor on Data Analyst Workspace through DataAnalyst Entra Group membership can query Lakehouse SQL Endpoint tables which are shortcuts to Data Engineering Workspace Lakehouse tables.

Data Analyst Workspace — Query using Lakehouse SQL Endpoint
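Such a query could be as simple as the following T-SQL, run against the Lakehouse SQL Endpoint (the database and table names here are illustrative, not taken from the screenshots):

```sql
-- Runs against the Data Analyst Lakehouse SQL Endpoint; the table is a
-- shortcut, so the data physically lives in the Data Engineering Lakehouse.
SELECT TOP 100 *
FROM DataAnalystLakehouse.dbo.cms_provider_drug_costs_summary;
```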

Power User Requirement

Now, let’s touch on the additional requirements of Power Users, heavy SQL users who might want to go beyond querying/report building on the shortcut-enabled tables from the Data Engineering Workspace. Common requirements for this category of users:

  • Ability to create new tables based on data in Lakehouse SQL Endpoint (which is Data Engineering Workspace tables enabled using shortcuts)
  • Need to load their own datasets to join with data from Lakehouse SQL Endpoint

The Lakehouse SQL Endpoint can be used to read data from tables, but users cannot create tables (write data) through it, so the solution here is to create a Warehouse in the Data Analyst Workspace. The Data Analyst Team will have full permissions to create tables and write data in this Warehouse. Any user with a Contributor or higher role on the Data Analyst Workspace can create a Warehouse, and the screenshot below shows a Warehouse to be used by such Power Users.

Data Analyst Workspace — Warehouse

Fabric allows cross-database queries between the Lakehouse SQL Endpoint and the Warehouse, so it is possible to run a query against the Lakehouse SQL Endpoint and write the results to the Warehouse: for example, a query that aggregates data from the Lakehouse SQL Endpoint and writes the final dataset out to a Warehouse table. Once the Warehouse is open in the Fabric Portal, use the + Warehouses button on the top left to add the Lakehouse SQL Endpoint. The screenshot below shows a query which reads data from the Lakehouse SQL Endpoint and writes the results to a table in the Warehouse.

Query Lakehouse SQL Endpoint and write results to Warehouse
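A query of that shape might look like the following CREATE TABLE AS SELECT, run in the Warehouse query editor after adding the SQL Endpoint via + Warehouses. All table and column names here are illustrative, not the ones from the screenshots:

```sql
-- Cross-database query: aggregate data from the Lakehouse SQL Endpoint
-- (DataAnalystLakehouse) and persist the result as a new Warehouse table.
CREATE TABLE dbo.drug_costs_by_state
AS
SELECT provider_state, SUM(total_cost) AS total_cost
FROM DataAnalystLakehouse.dbo.cms_provider_drug_costs_summary
GROUP BY provider_state;
```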

In a very similar manner, Data Analyst Team members can also load their own datasets into the Warehouse and join those tables with tables from the Lakehouse SQL Endpoint to create new Data Products.
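For example, a dataset a Power User has loaded into the Warehouse could be joined with a shortcut table exposed through the Lakehouse SQL Endpoint along these lines (again, all names are hypothetical):

```sql
-- dbo.provider_regions is a hypothetical table loaded by a Power User
-- into the Warehouse; the other table comes from the Lakehouse SQL Endpoint.
SELECT r.region, SUM(c.total_cost) AS total_cost
FROM DataAnalystLakehouse.dbo.cms_provider_drug_costs_summary AS c
JOIN dbo.provider_regions AS r
    ON c.provider_state = r.state
GROUP BY r.region;
```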

Data Mesh Aspects — Data Product & Self-Service Analytics

The capabilities described above are aspects of a Data Mesh architecture, where one team publishes a Data Product and another team consumes it to build its own Data Product. Here, the Data Engineering Team built and shared Data Product A, and the Data Analyst Team, in its own workspace, consumed Data Product A to build Data Product B.

Data Analyst Team — No Lakehouse/Spark Access on Shortcut Tables

For completeness, it is important to call out that Data Analyst Team members can read data using the Lakehouse SQL Endpoint, but they cannot read the same data through the Lakehouse itself or through the Spark engine. This is because the Lakehouse does not use the delegated access model. If the Data Analyst Team needs access to the Lakehouse or Spark, please follow the instructions from the Data Science Workspace section.

No Access to Shortcut tables in Lakehouse from Spark

Data Science Workspace

Common Requirements for Data Science Team:

  • Need read access to tables built by Data Engineering Team
  • Python and Spark for data exploration and building out ML Models

Setup Details

  • Data Science Team members will be Contributors on the Data Science Workspace.
  • A user with Member or higher permissions on the Data Engineering Workspace needs to use the OneLake RBAC access model to grant the Data Science Team (in this example, the DataScientists Entra group) permissions on specific Data Engineering Lakehouse tables.
  • Data Science Team members, as Contributors on the Data Science Workspace, can then create a Lakehouse and create shortcuts in it to the Data Engineering Workspace Lakehouse tables on which they were granted permissions in the previous step.

The subsequent sections go into details on OneLake RBAC Access as well as creation of Data Science Workspace Lakehouse/Warehouse.

Grant Data Engineering Lakehouse Table Access to Data Science Team using OneLake RBAC Access Model

The real need for Data Scientists is to access the shortcut tables using Spark, which does not use the delegated access model and instead passes the calling user’s identity through to the Data Engineering Workspace. For this, the OneLake RBAC method needs to be used — Data Access Control Model in OneLake (Public Preview) — Microsoft Fabric | Microsoft Learn

  • Step 1 and Step 4 — Need to be executed by a user with Member or higher role on Data Engineering Workspace.
  • Step 2 and Step 3 — Need to be executed by a user with Contributor or higher role on Data Engineering Workspace

You can read more about Steps 2 to 4 in the public documentation — Get started with OneLake data access roles (preview) — Microsoft Fabric | Microsoft Learn

Step 1: Share the Data Engineering Lakehouse (the target of the shortcut tables) with the Data Science Team (DataScientists Entra group) without selecting any checkbox on the Share screen. This is referred to as Item-Level Sharing; the Data Science Team does not need any Workspace-level roles on the Data Engineering Workspace.

Data Engineering Workspace — Share Lakehouse with Data Science Team

Step 2: Open the Lakehouse in Data Engineering Workspace and Click Manage OneLake Data Access button on the toolbar to open OneLake Data Access Role management panel on the right side.

Data Engineering Workspace — Lakehouse — Manage OneLake Data Access

Step 3: Create a new OneLake data access role with a suitable name like DataScientists, choose the Selected folders option, and check only the tables for which shortcuts were created and which need to be made available to the Data Science Team. Lastly, click Save and close the right panel.

OneLake Data Access — New Role for Data Scientists

Step 4: Click the Manage OneLake data access button on the toolbar again to open the OneLake data access panel on the right side. This time, click the DataScientists role created in the previous step and assign the Data Science Team (in our example, the DataScientists Entra group) to the role using the Add people or groups option.

Manage OneLake Data Access — Select DataScientists Role
Manage OneLake Data Access — Assign Role
Manage OneLake Data Access — Add DataScientists Entra Group to DataScientists Role

Once the role assignment is saved, Data Science Team members will be able to access the shortcut tables in the Data Science Workspace Lakehouse using the Spark engine.

Data Science Workspace — Spark to read shortcut enabled Data Engineering Workspace Lakehouse Tables
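In a Fabric notebook attached to the Data Science Workspace Lakehouse, reading such a shortcut table looks no different from reading a native table. The table name below is illustrative, and the snippet relies on the `spark` session pre-provisioned by the Fabric runtime:

```python
# Read a shortcut table from the Data Science Workspace Lakehouse.
# Because Spark passes the calling user's identity through to the
# Data Engineering Workspace, this only succeeds for members of the
# DataScientists OneLake data access role.
df = spark.read.table("cms_provider_drug_costs_summary")  # illustrative name
df.printSchema()
df.limit(10).show()
```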

Create Lakehouse and Shortcut in Data Science Workspace to Data Engineering Workspace Tables

Any user with Contributor or higher permissions on Data Science Workspace can create a Lakehouse in the Data Science Workspace and then create shortcuts in that Lakehouse to the Data Engineering Workspace Tables.

The screenshot below shows the Data Science Workspace with a Lakehouse; the Data Science Team is set up as Contributor on the Workspace.

Data Science Workspace

The next screenshot shows the shortcut creation screen; note that only the specific tables for which the Data Science Team was granted permissions using the OneLake RBAC access model are listed.

Shortcut Creation — Data Science Workspace Lakehouse to Data Engineering Workspace Lakehouse Tables

It is important to note that the Lakehouse does not use the delegated access model and instead passes the calling user’s identity, hence the OneLake RBAC access model is used. With the above setup, Data Scientists will be able to use Spark to interact with the Lakehouse, which is their primary need, but not the Lakehouse SQL Endpoint. As explained in the Data Analyst Workspace section, the Lakehouse SQL Endpoint uses a delegated access model in which the Lakehouse owner’s identity is used to access data from the Data Engineering Workspace. If Lakehouse SQL Endpoint access is needed in the Data Science Workspace, use the instructions from the Data Analyst Workspace section: the Data Science Workspace Lakehouse needs to be created by a user with a Contributor role on both the Data Science and Data Engineering Workspaces.

Disclaimer

The Microsoft Fabric platform is evolving fast and I will try my best to keep this article up to date, but this is the solution as of April 2024. Lastly, I did not cover Row-Level and Column-Level Security for shared data, which as of now can only be achieved on the Warehouse and Lakehouse SQL Endpoint, not on the Lakehouse.
