Migrating from Azure Synapse to Fabric

Robert Caspari
7 min read · Jul 12, 2023


Migrate (not just) from Azure Synapse to Fabric Synapse: Key Considerations.

Introducing Microsoft Fabric — Picture from Microsoft

Your move and migration from Azure Synapse Analytics to Microsoft Fabric should be evaluated carefully to maintain business continuity while delivering an all-in data solution.

Introduction

Microsoft recently unveiled its new flagship all-in data solution that aims to enable more users to utilize its data services without needing to manage any underlying infrastructure in Azure directly. Three primary questions come to mind: Are there any drawbacks? Should I migrate my existing data warehouse? And if yes, what is the optimal timeframe for a move?

To answer any of the questions above, some considerations need to be made when contemplating such changes. This article aims to answer them as well and as generally as possible.

Contents

  • What is MS Fabric?
  • What are the differences between existing services?
  • Are there any drawbacks?
  • Should I migrate my existing data warehouse?
  • Conclusion

What is MS Fabric?

I’m fairly certain that you have heard of Fabric in recent weeks. In summary, Fabric unifies Data Factory, Synapse and Power BI in one interface, where all development is saved in workspaces. The second major aspect of Fabric is that it bets big on the lakehouse platform, storing all data in the Delta format. Data and reports are stored in “OneLake”, which is basically OneDrive for data.

Microsoft Fabric — Source

Together this means that from a technology perspective, Data Engineers and Data Analysts move closer together.

What are the differences between existing services?

Depending on what your current Azure Synapse Analytics implementation comprises, differences can be plentiful or few. Broadly speaking, using MS Fabric might require you to rethink your architecture, skillset and reporting landscape.

While this article primarily tries to explain the differences between Azure Synapse Analytics and Fabric, rather than Azure Synapse Data Warehouse or other more architecturally traditional approaches (data warehouse vs lakehouse), some of the following considerations apply to all analytical systems.

Architecture

Here I assume that you are more or less familiar with a lakehouse architecture. Since MS Fabric is built on a lakehouse architecture, you don’t have to decide which architectural style to implement. However, lakehouses come in variations. The most common data processing pattern is the medallion approach, where data moves from raw (the bronze layer), through cleaned (silver), to aggregated and ready for analysis (the gold layer). As an MS Engineer summarizes it in this Reddit post [3]: “If you designed your architecture as “Lake First” the transition will be virtually automatic”. But he also states that the same cannot be said for more traditional approaches. From this we can assume that migrating pipelines and the like from Azure Synapse should be straightforward, with migration paths already announced but not yet published.
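To make the medallion idea a bit more tangible, here is a minimal sketch of the three layers. It deliberately uses plain Python lists instead of Spark DataFrames and Delta tables so that it runs anywhere; the sample records, the field names and the functions `to_silver` and `to_gold` are purely illustrative assumptions, not anything Fabric prescribes.

```python
from collections import defaultdict

# Hypothetical raw records as they might land in the bronze layer.
bronze = [
    {"order_id": "1", "country": " de ", "amount": "10.50"},
    {"order_id": "2", "country": "DE", "amount": "4.50"},
    {"order_id": "2", "country": "DE", "amount": "4.50"},  # duplicate row
    {"order_id": "3", "country": "us", "amount": None},    # missing amount
    {"order_id": "4", "country": "US", "amount": "20.00"},
]


def to_silver(rows):
    """Clean bronze rows: skip incomplete and duplicate rows, normalise types."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        cleaned.append({
            "order_id": int(row["order_id"]),
            "country": row["country"].strip().upper(),
            "amount": float(row["amount"]),
        })
    return cleaned


def to_gold(rows):
    """Aggregate silver rows into an analysis-ready total per country."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

In a real lakehouse each layer would be persisted as its own set of Delta tables, so that every step can be re-run and inspected independently.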

When it comes to new developments within Fabric, I am still skeptical that there is a sound concept for CI/CD covering both data and logic components, such as data source definitions and ETL processes and steps. With Azure Synapse you have to invest some thought into designing a CI/CD architecture that fulfils your individual needs, but it is absolutely possible to have more than one environment with a shared repository and isolated physical data for ‘dev’ and ‘prod’. My concerns can also be explained by comparing Git commits of Synapse Spark notebooks with those of Databricks notebooks: in Synapse you can’t save changes locally and commit them later, and when committing your changes you cannot add comments. From my experience, Azure Synapse’s Git implementation is more a way to save your work than to version it for lineage. While describing your commits certainly isn’t the main benefit of using Git, it illustrates Synapse’s relationship to Git fairly well in my opinion.

One highlight in the current preview of Fabric, however, is the ability to version Power BI reports, even if the versioned source files aren’t very readable inside a repository and outside of Power BI. Still, being able to version reports is a big step forward and much appreciated.

Governance and Compliance

Currently, only workspace-level access is implemented. For Power BI this extends to individual reports and, when using DAX within a report, also to row-level (RLS) and column-level (CLS) security. However, neither RLS nor CLS is implemented yet for the query engine itself (!). This means that a user might gain access to restricted data using a Spark notebook. OneSecurity is said to address this need across all query engines in Fabric in the future.

Skillset

Generally, MS Fabric together with OneLake bets on the open-source Delta format, which builds on the columnar Parquet file format. To interact most efficiently and flexibly with this format, one ought to use Apache Spark, which offers interfaces for SQL, Python, Scala and R. In my personal experience, the SQL functions can be great, but their full benefit is best achieved in combination with a flexible and easy-to-use programming language such as Python. Using Python with SQL makes it easy to script and template SQL statements, e.g. to quickly implement a metadata-driven ETL approach (see my other article here). Those skills might not be part of your data warehouse team’s existing skillset, which can make migration more tedious since it might require additional training. But if your team is used to Azure Synapse Analytics with Spark or Dataflows, migrating should be straightforward.
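As a rough illustration of what templating SQL from metadata can look like, the sketch below generates one Delta-style `MERGE INTO` statement per table from a small metadata list. The table names, the metadata layout and the helper `build_merge` are hypothetical and chosen only for this example; it only builds the statements and does not run them.

```python
from string import Template

# Hypothetical metadata: one entry per table that should be loaded.
TABLES = [
    {
        "source": "bronze.orders",
        "target": "silver.orders",
        "key": "order_id",
        "columns": ["order_id", "country", "amount"],
    },
    {
        "source": "bronze.customers",
        "target": "silver.customers",
        "key": "customer_id",
        "columns": ["customer_id", "name"],
    },
]

# One upsert template that is reused for every table in the metadata.
MERGE_TEMPLATE = Template(
    "MERGE INTO $target AS t\n"
    "USING $source AS s\n"
    "ON t.$key = s.$key\n"
    "WHEN MATCHED THEN UPDATE SET $assignments\n"
    "WHEN NOT MATCHED THEN INSERT ($column_list) VALUES ($value_list)"
)


def build_merge(meta):
    """Render the upsert statement for a single metadata entry."""
    non_key = [c for c in meta["columns"] if c != meta["key"]]
    return MERGE_TEMPLATE.substitute(
        target=meta["target"],
        source=meta["source"],
        key=meta["key"],
        assignments=", ".join(f"t.{c} = s.{c}" for c in non_key),
        column_list=", ".join(meta["columns"]),
        value_list=", ".join(f"s.{c}" for c in meta["columns"]),
    )


statements = [build_merge(meta) for meta in TABLES]
for statement in statements:
    print(statement, end="\n\n")
```

In a Spark notebook, each generated statement could then be passed to `spark.sql(...)`. Because identifiers are inserted as plain text, the metadata should come from a trusted source and never from user input.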

Reports

From a technological perspective, reporting doesn’t change at all, as long as Power BI is your primary tool. Integrating other reporting tools with Fabric might differ from just using a SQL endpoint from Synapse and will require additional adjustments. If, however, you are using Power BI or are thinking of migrating to it, this is a clear advantage, with some individual tests already showing increased performance when querying a lakehouse-style architecture. Another big plus is that Power BI offers more features than ever before, such as integrated Large Language Models (LLMs, think ChatGPT) to query data more naturally or help out with those complex DAX calculations.

Are there any drawbacks?

While Fabric with its announced features and improvements sounds pretty good so far, there are drawbacks. Currently, you would need to rewrite or manually import your existing logic, such as linked services and pipelines, which means considerable migration effort. In the future, MS plans to automate this by offering mounting capabilities that can mount your existing Synapse landscape and move your development artifacts (pipelines etc.) automatically to Fabric.

Also, as with any new product (or evolution of one), bugs, missing features and the like are to be expected. With the public preview, the early phases of general availability (GA) and customer feedback, this is going to be less and less of an issue. For now, though, the drawbacks come down to the current status of Fabric: some features are simply missing, as pointed out by Nikola [1]. They include Mapping Data Flows, OPENROWSET(), Synapse Link and, from my experience, a few connectors.

Connectors like Salesforce are currently not yet implemented — own picture taken from Fabric.
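One simple way to turn this list into a first migration check is to compare it with an inventory of what your current Synapse workspace actually uses. The sketch below does this with a set intersection; the inventory and the function `migration_blockers` are made up for illustration, and the list of missing features only reflects the preview status described above, so it would need to be updated as Fabric evolves.

```python
# Features reported as missing in the Fabric preview at the time of writing.
MISSING_IN_FABRIC_PREVIEW = {
    "Mapping Data Flows",
    "OPENROWSET",
    "Synapse Link",
    "Salesforce connector",
}

# Hypothetical inventory of features an existing Synapse workspace relies on.
synapse_inventory = {
    "Pipelines",
    "Spark notebooks",
    "Mapping Data Flows",
    "Synapse Link",
}


def migration_blockers(inventory, missing):
    """Return the features in use that are not yet available, sorted by name."""
    return sorted(set(inventory) & set(missing))


blockers = migration_blockers(synapse_inventory, MISSING_IN_FABRIC_PREVIEW)
print(blockers)
```

An empty result would not mean that a migration is effortless, only that none of the known gaps affects the features listed in the inventory.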

Another drawback is a bit more ambiguous. Using Fabric instead of Azure Synapse means that, while you no longer have to manage your underlying infrastructure, you also no longer can. If you want that control in order to further customize individual solutions, Fabric might not be for you.

To summarize, the biggest drawback right now is, in my opinion, that Fabric isn’t finished and doesn’t yet offer the full feature set that is expected of an all-in data solution.

Should I migrate my existing data warehouse?

In short: If you have logic in Synapse and use Power BI, it’s a clear yes. The performance gains in particular are promising (e.g. the newly developed V-Order optimization). But you might want to hold on for a bit and wait until Fabric becomes a bit more polished. If you don’t have logic in Synapse and/or are not planning to use Power BI as your primary tool, your evaluation becomes a bit more complex and you should assess competitors such as Databricks or open-source data stacks.

In any case, don’t get swept away by the hype and consider the individual business case carefully. After considering current and future business needs, think about the technical needs that follow from those requirements. Evaluating those technical needs isn’t easy right now, since a roadmap for Fabric still isn’t published. Therefore the question might not be whether to migrate, but when to migrate. I fully expect more information on migration paths for moving completely to Fabric as it gets closer to GA. Some examples of implementations by customers can already be found here [2].

Conclusion

In my opinion, it comes down to when to migrate. The answer to this important question depends mostly on the complexity of your data project, but also a bit on Microsoft’s plans for Fabric. First of all, it’s important to recognize that there is no fixed date for the planned general availability of Fabric yet. The date should be announced together with a roadmap for Fabric, which will be found under aka.ms/FabricRoadmap later on (the link currently leads to the Fabric Blog). In the meantime, I recommend using the current public preview to evaluate the specific effort and make time estimates for tasks. This cannot be generalized, since all requirements and existing structures such as pipelines and downstream analytics are individual and can also be complex. I hate to say it, but the answer to “What is the optimal timeframe for a migration?” is: it depends.

References

[1] Killing me softly — Has Microsoft Fabric just “overwritten” Synapse Analytics? | by Nikola Ilic | Jun, 2023 | Medium

[2] Data Lake Implementation using Microsoft Fabric | by Samarendra Panda | Jun, 2023 | Medium

[3] Ask me Anything (AMA) about Microsoft Fabric! : r/MicrosoftFabric (reddit.com)
