Fabric Swatches — The Silky Lakehouse

Achraf
Aug 20, 2024

Image generated by DALL-E

Prelude

This article is the first in a series titled Fabric Swatches (of course I had to get a pun in the name). The series dives into each item, experience, and component of Microsoft Fabric, delivered in small, manageable chunks to jump-start (or refresh) your knowledge of Fabric.

OneLake

What happens when a Data Lake and OneDrive have a baby?

Storage in Microsoft Fabric is handled by a unified and tenant-wide layer called OneLake.
OneLake is the logical storage layer that holds the data for all storage items, such as lakehouses, warehouses, eventhouses, and semantic models, as well as newer items like mirrored databases.

What is a Lakehouse?

Imagine having the best of both worlds — a place where the vast storage capacity of a data lake meets the structure and performance of a data warehouse. That’s the Lakehouse in Microsoft Fabric.
It’s a unified storage layer designed to store and manage all your data — structured or unstructured — making it ready for deep analytics and machine learning.
Data is stored as files and as tables. Tables use the Delta format (cf. Delta Lake), which brings benefits such as ACID transactions and time travel.

Building your Lakehouse

Getting started with a Lakehouse in Fabric is a breeze.

From the Data Engineering experience, creating a new Lakehouse is just a couple of clicks away.

Lakehouse creation walkthrough

Once set up, you have complete control over who can access and manage your data.

Inside the Lakehouse, you can organize your data into folders and subfolders, which keeps it tidy and easy to find.

Lakehouse security

Security for the lakehouse starts with basic permission assignment, either inherited from the workspace permissions or granted by sharing the Lakehouse.

Granting access via lakehouse sharing

For more fine-grained control, Fabric offers OneLake Data Access, a role-based access control (RBAC) feature that lets you share data securely with the right people while keeping everything organized.
You configure it via the Manage OneLake Data Access button, by creating the roles you need and assigning them to users (read more here).

Role assignment

Note: OneLake Data Access is not yet available for lakehouses with lakehouse schemas enabled.

Lakehouse schema

As of writing, the lakehouse schema feature is in preview.

Lakehouse schema is a feature that brings the schema structure of a warehouse to the lakehouse, allowing the definition of custom schemas to group tables together.

Some benefits of this feature:

  • Better lakehouse management
  • Improved security implementation when coupled with Data access roles and SQL OLS (more on this in the warehouse article)
  • The ability to link one or multiple tables using schema shortcuts; these tables can live in another lakehouse, or even a warehouse.

Since the feature is new, certain limitations exist as of writing; read more about them here.

Shortcuts

Why duplicate when you can link?
Fabric’s Shortcut feature lets you connect to data stored elsewhere, whether in another Fabric workspace or in an external source, without copying it. You avoid the additional storage costs of a duplicate while keeping access to all the data you need.
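To make the idea concrete, here is a minimal sketch of what creating a OneLake-to-OneLake shortcut through the Fabric REST API could look like. It only builds the HTTP request and does not send it. The endpoint and payload shape follow my understanding of the Create Shortcut API; the function name, the IDs and the token are placeholders of my own, so check the official API reference before relying on it.

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_shortcut_request(token, workspace_id, lakehouse_id, name,
                           source_workspace_id, source_item_id, source_path,
                           parent_path="Tables"):
    """Build (but do not send) a request that creates a OneLake shortcut.

    All IDs and the bearer token are placeholders supplied by the caller.
    """
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    payload = {
        "path": parent_path,   # where the shortcut appears in this lakehouse
        "name": name,          # name of the shortcut itself
        "target": {
            "oneLake": {       # the data stays in the source item, nothing is copied
                "workspaceId": source_workspace_id,
                "itemId": source_item_id,
                "path": source_path,
            }
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it, assuming the token is a valid Microsoft Entra ID access token for the Fabric API.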

Available shortcut sources as of writing

Shortcut Security

Security for OneLake shortcuts works differently depending on the type of shortcut used, but a good baseline rule is:

  • The first level of security is access to the lakehouse itself
  • The second level is OneLake’s Data Access management mentioned above
  • The third level is the native permission model of the source in question, which depends on the authentication method used: basically, you can see and do what the authenticated principal can.

More detail here.

Files and Tables

A Microsoft Fabric Lakehouse is built on files and tables, giving you the flexibility to store and manage data in the format that works best for you. In most cases, your data can land as is and then be “pampered” inside Fabric.
Microsoft Fabric supports a wide range of file types such as Parquet, CSV, JSON, Avro and more, allowing you to work with whatever data you have and organize it in a directory-like structure of your choosing.
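That directory-like structure is also how files are addressed from code. The small helper below sketches how a OneLake path to a lakehouse file is typically composed, assuming the abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/<path> convention. The workspace, lakehouse and folder names are made up for illustration, and names containing spaces or special characters may require using the item GUIDs instead.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_path(workspace, lakehouse, relative_path):
    """Compose an ABFS-style OneLake URI for a file or folder in a lakehouse.

    relative_path is expected to start with "Files/" or "Tables/".
    """
    relative_path = relative_path.lstrip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{relative_path}"


# Hypothetical names, for illustration only
print(onelake_path("Sales", "Bronze", "Files/raw/2024/orders.csv"))
```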

Creating tables is easy too: files can be loaded into tables using the UI, notebooks, Data Factory, or the REST API.
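As an illustration of the REST API route, the sketch below builds (without sending) a request for the lakehouse Load Table operation, which loads a CSV file from the Files area into a Delta table. The endpoint and body fields reflect my reading of the API; the function name, IDs, file path and table name are hypothetical, and the operation is long-running, so a real client would also poll for completion.

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_load_table_request(token, workspace_id, lakehouse_id, table_name,
                             relative_path, mode="Overwrite"):
    """Build a request that loads a CSV file into a lakehouse Delta table."""
    url = (f"{FABRIC_API}/workspaces/{workspace_id}"
           f"/lakehouses/{lakehouse_id}/tables/{table_name}/load")
    payload = {
        "relativePath": relative_path,  # e.g. "Files/raw/orders.csv"
        "pathType": "File",             # a single file rather than a folder
        "mode": mode,                   # "Overwrite" or "Append"
        "formatOptions": {
            "format": "Csv",
            "header": True,
            "delimiter": ",",
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```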

As mentioned above, tables in the lakehouse are Delta tables, and can be either managed or external (“unmanaged”).
Whether you need managed tables, which are fully controlled by Fabric for easy maintenance, or external tables, for which you manage the data storage location, Fabric has you covered.

When it comes to performance, features like V-Order, a write-time optimization that Fabric applies to the Parquet files behind Delta tables, help keep your queries fast and efficient by optimizing how data is stored. This makes querying data directly in the lakehouse a more viable option than it is with traditional data lake approaches. (More detail on this in the Compute deep-dive.)

SQL Analytics Endpoint

The SQL Analytics Endpoint is where your Lakehouse truly shines. This feature allows you to query your data using T-SQL, directly within the Lakehouse. It’s perfect for those who want to integrate their data with BI tools like Power BI, enabling you to run complex queries and generate insightful reports, all without moving your data.

With the SQL Analytics Endpoint, you get the best of both worlds: the familiar power of SQL and the scalability of a modern data platform. Because this layer is read-only over your Delta tables, queries cannot accidentally modify or corrupt the underlying data. It works equally well whether you’re dealing with small datasets or massive data volumes.

Connecting to the endpoint is almost as easy as connecting to a good old SQL Server. The catch: you have to authenticate with a service principal or a Microsoft Entra ID user account, since SQL logins are not yet supported.
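While waiting for the repo examples, here is a rough sketch of how a service principal connection string could be assembled for the Microsoft ODBC driver. It assumes ODBC Driver 18 for SQL Server and its ActiveDirectoryServicePrincipal authentication mode; the server name, lakehouse name and credentials are placeholders, and the real server value is the SQL connection string shown in the endpoint’s settings.

```python
def build_connection_string(server, database, client_id, client_secret):
    """Assemble an ODBC connection string using service principal auth.

    Values are inserted as is; secrets containing ';' or '}' would need
    ODBC escaping, which this sketch does not handle.
    """
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",
        "Server": server,          # SQL analytics endpoint host name
        "Database": database,      # usually the lakehouse name
        "Authentication": "ActiveDirectoryServicePrincipal",
        "UID": client_id,          # application (client) ID
        "PWD": client_secret,      # client secret
        "Encrypt": "yes",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Placeholder values, for illustration only
conn_str = build_connection_string(
    "your-endpoint.example.com", "MyLakehouse", "app-id", "app-secret"
)
# With the third-party pyodbc package installed, the string would be used as:
#   import pyodbc
#   connection = pyodbc.connect(conn_str)
```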

Keep an eye on the GitHub repo below for connectivity examples.

Automation

Most actions mentioned in this article are possible via the REST API.
I’ll be regularly updating the following GitHub repo with code samples for the Microsoft Fabric REST API:
aci-labs/AzureLabs (github.com)
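To give a feel for what such automation looks like, here is a minimal sketch (not taken from the repo) that builds the request to create a lakehouse in a workspace. The endpoint and body fields follow my understanding of the Fabric Create Lakehouse API; the function name, workspace ID, display name and token are placeholders, and acquiring the Microsoft Entra ID token is out of scope here.

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_create_lakehouse_request(token, workspace_id, display_name,
                                   description=""):
    """Build (but do not send) a request that creates a new lakehouse."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/lakehouses"
    payload = {"displayName": display_name}
    if description:
        payload["description"] = description
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it would look like this, given a valid access token:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```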

Stay tuned for the next article in the series, where I take a stroll around the Warehouse.

Achraf

Data Architect, I talk about modern data platforms and architectures, data governance, modern data practices