Putting Microsoft Fabric to the test

Jacob Rønnow Jensen
3 min read · Dec 28, 2023


Having worked most of my professional career with data warehouses on the Microsoft platform, and with Fabric for about a year, I agree with Satya Nadella's statement from this year's Build conference that Fabric is "the biggest launch of a data product from Microsoft since the launch of SQL Server."

Through experiments and testing, and having had the opportunity to talk to key resources at Microsoft during the private and public preview periods, I am convinced that Fabric has the potential to be the most comprehensive and cohesive data platform in the market. With OneLake at the center, and with the capabilities to work with copilots and different kinds of compute on the data in OneLake in a secure and governed manner, I truly believe that Fabric also has the potential to revolutionize the way we work with data.

Obviously, Fabric can support traditional data platform tasks and serve as the foundation for BI, self-service and machine learning via Power BI and Azure ML/Data Science. However, with shortcuts, the broad variety of data-wrangling tools and built-in security and governance, Fabric also has the potential to accelerate collaboration between the data engineering and data consumer communities and support more elusive data concepts like Data Fabric and Data Mesh.

At AP Pension, we are in the process of building a new data platform, and with GA "just around the corner" and most of the missing pieces on the roadmap for the next couple of quarters, now seems like a good time for us to put Fabric to the real test:

Is it possible to build a modern, metadata-driven, data-agnostic ingestion engine in Fabric that handles all our known ingestion types (snapshot, delta, streaming, …) and the encryption of PII data at load time, together with a best-practice framework with templates for data modelling that deals with slowly changing dimensions, asynchronous updates and bi-temporal timelines in data?
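To make the ingestion part of that question concrete, here is a minimal sketch of what a metadata-driven load loop could look like in PySpark. The meta.ingestion_control table, its columns and the key handling are hypothetical placeholders, not a finished design:

```python
# Minimal sketch of a metadata-driven ingestion loop (PySpark).
# The meta.ingestion_control table, its columns and the encryption key
# handling are illustrative placeholders, not a finished design.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

def encrypt_pii(df: DataFrame, pii_columns) -> DataFrame:
    # Spark ships aes_encrypt from version 3.3; in a real setup the key
    # would come from a secret scope / Key Vault, never a literal in code.
    key = "0000111122223333"  # placeholder key
    for col in pii_columns:
        df = df.withColumn(col, F.expr(f"base64(aes_encrypt(cast({col} AS STRING), '{key}'))"))
    return df

# One row per source: source_path | target_table | load_type | pii_columns
for row in spark.table("meta.ingestion_control").collect():
    df = spark.read.format("parquet").load(row.source_path)
    df = encrypt_pii(df, row.pii_columns)
    if row.load_type == "snapshot":
        df.write.format("delta").mode("overwrite").saveAsTable(row.target_table)
    elif row.load_type == "delta":
        df.write.format("delta").mode("append").saveAsTable(row.target_table)
    # streaming sources would use spark.readStream in a separate job
```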

Recognizing that some key components are missing in Fabric, we will do a phased exploration of the tasks above.

Below I have illustrated this in the context of the Medallion Architecture, which has become an increasingly popular framework for visualizing a layered approach to organizing the data warehouse/platform that one can also find in the works of Kimball, Inmon and Linstedt (and thank you, Piethein Strengholt, for taking time out of your busy schedule to discuss the Fabric roadmap and letting me base my illustrations on some of your work, which is truly inspiring).

In phase 1, we will use Databricks to show how a lakehouse data load framework could work, with the data stored on OneLake (see the sketch below). This approach allows for a lean transition to Fabric once things like APIs, DevOps integration and OneSecurity are in place.
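A minimal sketch of that setup: writing a Delta table from Databricks straight to OneLake through its ADLS Gen2-compatible endpoint. The workspace, lakehouse and landing-zone names are placeholders, and the cluster must be configured with credentials that are allowed to reach OneLake:

```python
# Sketch: Databricks writing Delta directly to OneLake via the ADLS Gen2-
# compatible endpoint. Workspace/lakehouse names and the landing path are
# placeholders; authentication setup is omitted.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

onelake_path = (
    "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/"
    "MyLakehouse.Lakehouse/Tables/bronze_customers"
)

(spark.read.format("parquet")
      .load("/mnt/landing/customers/")  # placeholder landing zone
      .write.format("delta")
      .mode("overwrite")
      .save(onelake_path))
```

Because both Databricks and Fabric speak Delta/Parquet, tables written this way become visible in the Fabric lakehouse without any copying, which is what keeps the later transition lean.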

We will be going through our known ingestion patterns, as well as a use case that involves bi-temporal modelling in the silver layer.
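For readers unfamiliar with bi-temporal modelling, here is an illustrative sketch of the idea: every row carries two timelines, business validity and system recording, so late-arriving corrections never overwrite what we knew at the time. The table and column names below are hypothetical examples, not our actual model:

```python
# Illustrative bi-temporal silver table: a business-time interval
# (valid_from/valid_to) plus a system-time interval (recorded_from/recorded_to).
# Table and column names are hypothetical examples.
from pyspark.sql import SparkSession, types as T

spark = SparkSession.builder.getOrCreate()

bitemporal_schema = T.StructType([
    T.StructField("policy_id",     T.StringType()),
    T.StructField("premium",       T.DecimalType(12, 2)),
    T.StructField("valid_from",    T.DateType()),       # when the fact holds in the business
    T.StructField("valid_to",      T.DateType()),
    T.StructField("recorded_from", T.TimestampType()),  # when the row was stored
    T.StructField("recorded_to",   T.TimestampType()),
])

# "As-of" query: what did we believe on 2023-06-01 about facts valid on 2023-01-01?
as_of = spark.sql("""
    SELECT policy_id, premium
    FROM silver.policy_premiums
    WHERE DATE'2023-01-01'      >= valid_from    AND DATE'2023-01-01'      < valid_to
      AND TIMESTAMP'2023-06-01' >= recorded_from AND TIMESTAMP'2023-06-01' < recorded_to
""")
```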

We will take a code-first approach to Databricks to ensure portability to Data Engineering in Fabric. However, if required functionality is not on the Fabric roadmap, we will consider native Databricks solutions as an intermediate alternative.

At the same time, we will be setting up a Fabric prototype with as many of the available capabilities as possible. This gives us the opportunity to run some of the same workloads in Databricks and in Fabric's Data Engineering and Data Warehouse experiences, and to do some initial cost and performance analysis.

In phase 2, we will be exploring OneSecurity, Real-Time Analytics, Copilots, Data Science, Data Activator and the possibilities for automation via APIs in Fabric.

Needless to say, I am excited to embark on this journey of putting Fabric to the test in a real-life data platform scenario.

More posts to follow.
