Does Fabric offer anything more than Synapse Workspace?

What’s Microsoft trying to achieve with its latest Data Platform offering?

Carl Follows
Version 1

--

Since it was announced a few months ago, many articles have been written about how amazing Fabric is, but as an Architect I find much of the content to be marketing gloss or repurposed from the underlying services. I need to understand Microsoft’s strategic motivation before I can confidently adopt the technology and adapt my patterns.

I guess I’m just naturally sceptical about these announcements: are they a simple rebranding of existing services, a repackaging for ease of use, or do they contain some genuine technical advancements?

As a regular user of Synapse Analytics Workspace for Data Engineering, I feel that most of the services provided in Fabric (Data Factory, Apache Spark Notebooks, and Power BI) are already packaged together in a single interface that worked well, so why change it? This is my journey to find out.

OneLake

The most obvious technical change is the introduction of OneLake, which standardises the storage format and centralises governance by holding all organisational data in a common repository.

The adoption of the open-source parquet delta table format feels like a real enabler. Separation of storage and compute is already an expected feature of many cloud platforms for cost optimisation purposes. Standardising the storage format increases extensibility, allowing different compute engines to all operate against a single source of truth. Unstructured and semi-structured data can also be stored in OneLake ready for interpretation; there is a separate area for these non-tabular files.
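To see why the delta format lets multiple engines share one source of truth, it helps to know that a delta table is just a folder of parquet files plus a `_delta_log` folder of JSON commit files; any engine that replays the log sees the same table state. Below is a minimal, stdlib-only sketch of that replay, with invented file names and a toy two-commit log for illustration (real commits carry many more fields):

```python
import json
import tempfile
from pathlib import Path

def live_files(delta_log: Path) -> set[str]:
    """Replay the delta transaction log: an 'add' action registers a
    parquet file as part of the table, a 'remove' action retires it."""
    files: set[str] = set()
    for commit in sorted(delta_log.glob("*.json")):
        for line in commit.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files

# Build a toy table: two commits in the _delta_log folder.
table = Path(tempfile.mkdtemp())
log = table / "_delta_log"
log.mkdir()
(log / "00000000000000000000.json").write_text(
    '{"add": {"path": "part-0000.parquet"}}\n'
    '{"add": {"path": "part-0001.parquet"}}\n'
)
# Second commit compacts the table: one file replaces the original two.
(log / "00000000000000000001.json").write_text(
    '{"remove": {"path": "part-0000.parquet"}}\n'
    '{"remove": {"path": "part-0001.parquet"}}\n'
    '{"add": {"path": "part-0002.parquet"}}\n'
)
print(live_files(log))  # {'part-0002.parquet'}
```

Because the log is plain files in open formats, Spark, the SQL endpoint, and Power BI can all derive the same answer without a proprietary catalogue sitting in between.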

Organisations currently have multiple storage accounts for ingestion, analysis, and data lake repositories; often these are domain-specific, managed separately, and out of sight of data governance teams. By bringing all of these into a single repository, Fabric offers the ability to consolidate security and therefore governance.

Components of Microsoft Fabric

DirectLake

One of the key decisions in any data platform that relies on Power BI for end-user visualisation and analytics is whether to:

  • Import data into the model
    Allowing for excellent performance, but limiting data scale and risking stale data due to the refresh schedule.
  • DirectQuery the data at the source
    Removing the latency and scale limitations, but introducing a risk of poorly performing visualisations due to query translation and physical fetching at render time.

Because OneLake uses the standard storage format of parquet delta tables, which Power BI can read natively, another approach opens up. The data can be read directly from OneLake, resulting in a new DirectLake mode that can deliver scale, performance, and low latency.

Shortcuts

Organisations can’t be expected to consolidate all data overnight into a new repository, and even once Fabric is fully implemented within an organisation there will continue to be scenarios where data is stored external to OneLake.

For this Fabric provides shortcuts, where these external data stores can be seamlessly presented as part of OneLake with credentials centrally managed. This functionality currently supports ADLS Gen2 storage accounts and S3 buckets, but it feels like others will be implemented soon.
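Conceptually, a shortcut is a path mapping: anything read under the shortcut's OneLake path is redirected to the external store, using a credential held centrally rather than by each consumer. The sketch below illustrates only that idea; it is not the Fabric API, and the workspace names, paths, and credential identifiers are invented:

```python
from typing import NamedTuple, Union

class Shortcut(NamedTuple):
    target_uri: str      # where the data really lives
    credential_id: str   # reference into a centrally managed secret store

# Hypothetical shortcuts registered in a workspace's lakehouse.
SHORTCUTS = {
    "Sales.Lakehouse/Files/legacy": Shortcut(
        "abfss://raw@legacylake.dfs.core.windows.net/sales",
        "kv-secret-legacylake"),
    "Sales.Lakehouse/Files/partner": Shortcut(
        "s3://partner-bucket/exports",
        "kv-secret-partner-s3"),
}

def resolve(onelake_path: str) -> Union[Shortcut, str]:
    """Paths under a shortcut are redirected to the external store;
    everything else is served from OneLake's own storage."""
    for prefix, shortcut in SHORTCUTS.items():
        if onelake_path.startswith(prefix):
            suffix = onelake_path[len(prefix):]
            return Shortcut(shortcut.target_uri + suffix,
                            shortcut.credential_id)
    return "native OneLake storage"

print(resolve("Sales.Lakehouse/Files/partner/2023-08.csv").target_uri)
# s3://partner-bucket/exports/2023-08.csv
```

The point of the design is that consumers only ever see OneLake paths; where the bytes actually sit, and which credential unlocks them, stays a governance concern.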

Dataflow Gen2

When the SQL BI stack first moved into the cloud, many people assumed Data Factory was the evolution of SSIS (SQL Server Integration Services). However, they quickly realised that Data Factory only provided the Control Flow, or orchestration, element of SSIS, with no low-code option for transformation. It was several years before Data Factory v2 was released with the low-code Dataflow capability; by this point, the target audience for low-code data engineering had already embraced the Power Query engine in Power BI for data ingestion. Whilst the Dataflow capability introduced into Data Factory was low-code, the interface didn’t feel as user-friendly or as rich as Power Query, and data engineers were by now experienced in writing transformations in Spark Notebooks.

The Dataflow Gen2 in Fabric is this Power Query capability, a real low-code option for data engineering. Power BI users will already be familiar with this style of dataflow, but previously it was part of a separate ecosystem from the Spark Notebooks style of engineering. Bringing the Dataflow Gen2 with the OneLake storage into Fabric alongside the Synapse components will most definitely be transformative for data engineering.

Ease of Use

Synapse Workspace is foremost an engineer’s tool, provisioned through the Azure Portal and used by specialist teams. Whilst powerful, this paradigm limits its uptake by the larger number of ad-hoc data analysts and power users who want to rapidly prototype without necessarily engaging their IT departments: the Power BI community.

With Fabric, Microsoft wants to engage this wider user base and is leveraging the success of the Power BI service. It’s not a bad analogy to think of Fabric as Synapse Workspace brought into the paradigm of Power BI; indeed, the Microsoft 365 admin center role “Power BI administrator” has been rebranded to “Fabric administrator”.

Workspaces

As well as making the interface look and feel more like that of the Power BI service, Fabric reinforces the Power BI concept of a workspace as a governance entity for collaboration on a specific workload. So instead of multiple separate data platforms, we can have a single platform divided into business areas. Fabric introduces a management level of Domain above workspaces to allow business areas to own and manage their multiple workloads.

Like sites in SharePoint, the workspace in Fabric is the primary mechanism for controlling access for colleagues; as in OneDrive, items can also be shared with people outside of the workspace.

Unified Governance across Microsoft OneLake and Workspaces

Cost of Compute

The cost of running Synapse Workspace is difficult to predict beyond “it depends”… on the volume of data moving around, the frequency of refreshes, the complexity of transformations, and more. This means cost control is more retrospective, through budgets and alerts, with an element of try-then-evaluate.

The Power BI compute cost concept of Capacity is taking over from the cost-per-operation approach of Synapse. This capacity is a pool of compute resources of predefined power, and therefore known cost.

This will simplify the art of predicting costs whilst allowing centralised cost control before code deployment. The downside of this may be friction between workload owners who share a capacity, especially when there is a change to the frequency or volume of data refreshes. Whilst the compute can be paused when not in use, if it is responsible for both ingestion and visualisation it will be interesting to see how smooth the utilisation is, and therefore how to choose the optimal SKU.
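The prediction problem reduces to simple arithmetic once a capacity is chosen: you pay for the capacity's size over time, not per operation. A rough sketch, where the SKU sizes follow Fabric's F-series naming (the number is the capacity units) but the hourly rate per capacity unit is an illustrative placeholder, not a published price:

```python
# F-series SKUs: the suffix is the number of capacity units (CU).
SKUS = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]
RATE_PER_CU_HOUR = 0.20  # assumed rate purely for illustration

def smallest_sku(peak_cu: float) -> int:
    """Pick the smallest F SKU covering peak concurrent compute demand
    (ignoring bursting and smoothing, which in practice let short
    spikes exceed the baseline)."""
    for cu in SKUS:
        if cu >= peak_cu:
            return cu
    raise ValueError("demand exceeds the largest SKU")

def monthly_cost(sku_cu: int, hours_active: float = 730) -> float:
    """Capacity-style cost: size x rate x hours, regardless of how many
    operations actually ran."""
    return sku_cu * RATE_PER_CU_HOUR * hours_active

# Ingestion plus report refreshes peaking at 10 CU of concurrent demand.
sku = smallest_sku(peak_cu=10)
print(f"F{sku}: ~${monthly_cost(sku):,.0f}/month")  # F16: ~$2,336/month
```

The friction mentioned above falls out of this model: if one workload owner doubles their refresh frequency, `peak_cu` rises for everyone sharing the capacity, and the whole group may be pushed onto the next SKU.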

Governance

We are still seeing regular, significant data breaches that are demanding organisations take far more proactive measures for securing their data assets. The first step in securing data is to know what data exists, then we can determine why it’s held and who requires what access. Data Cataloguing tools like Purview help governance teams understand the scope of their organisational data estate to support classification and regulatory compliance.

It appears Fabric will attempt to simplify this further by making Purview a first-class citizen. With data unified in a single OneLake and the transformation engineering developed in the same platform, I expect an improvement in both the lineage capabilities and the reach of Purview, but it still seems early days on this front.

What else

Fabric is a huge suite of capabilities whose surface I’ve only scratched in this initial assessment. These are some that certainly warrant more time.

Release Cadence

One of the strengths of Power BI is the monthly cadence of updates and Fabric looks to be following this model: August 2023 updates

Copilot

Generative AI is becoming ubiquitous and Fabric seems to be no exception. Having Microsoft’s copilot capability baked into the development experience from inception offers huge potential, which I’m keen to explore when I get the chance.

Synapse Link

Organisations that have implemented Microsoft Dynamics 365 will likely want to take advantage of the Synapse Link to seamlessly make their D365 data available for consumption as a fully managed Lakehouse.

Data Activator

A standard trope in data analytics maturity is the movement from hindsight toward prescriptive analytics. Whilst still in preview, the Data Activator in Fabric looks like it will provide the ability for analysts to prescribe actions directly from insights to help deliver on this aspiration.


So what have I learnt…

After some initial scepticism, I now better understand what Microsoft is trying to achieve with Fabric. I had just started looking at it from the wrong angle: it is not an evolution of Synapse Workspace but rather an evolution of the Power BI Service. It is less about technical advances, although they are there, and more about who can use the technology.

Microsoft is democratising data engineering and data science, bringing the specialist capabilities of Synapse to the citizen engineers of Power BI.

The next question is, are organisations and their employees ready for it?

And Finally…

Is this the end of Synapse? Microsoft says “No”; Fabric is an umbrella wrapping around services that will continue to be available individually.

About the Author:
Carl Follows is a Data Analytics Solution Architect at Version 1.
