Microsoft Fabric: an end-to-end Analytics Solution

Luiz
Blue Orange Digital
9 min read · May 9, 2024

Microsoft Fabric is an all-inclusive approach that provides friction-free access to leverage any capability within a single commerce model, driving higher productivity and ROI. Fabric delivers an end-to-end analytics solution with unique advantages in unified user experiences, open and cost-efficient architecture, AI-driven insights, seamless productivity integrations, and capacities that dynamically flex across all workloads.

Fabric tightly integrates data and analytics into the Microsoft 365 apps that business users use every day. Through native connections in apps like Excel, Teams, PowerApps, and SharePoint, users can discover, access, and visually interact with insights from OneLake directly within their productivity workflows. Rather than provisioning separate computing resources across multiple systems, Fabric allows the purchase of a single unified computing capacity pool that powers all workloads. This significantly reduces costs by eliminating idle waste — any available capacity can be dynamically utilized across data integration, warehousing, machine learning, or any other workload.

Fabric provides a unified experience and architecture with all the capabilities needed to extract insights from data across the whole analytics process. It empowers every team — data engineers, data warehousing professionals, data scientists, analysts, and business users — with role-specific experiences in one integrated SaaS product. It includes seven core workloads: Data Factory for data integration, Synapse Data Engineering for data preparation, Synapse Data Science for machine learning, Synapse Data Warehousing for analytics databases, Synapse Real-Time Analytics for streaming data, Power BI for data visualization, and Data Activator for real-time data monitoring (coming soon). Users can get started quickly and realize value rapidly with Fabric’s SaaS delivery.

Fabric centers around OneLake, a built-in multi-cloud data lake that serves as a single unified storage system across all workloads, eliminating isolated data silos. OneLake provides an intuitive data hub with automatic indexing, discovery, sharing, governance, and compliance. An essential capability is its “Shortcuts,” which virtualize data across clouds like Azure, AWS, and Google Cloud without moving it, enabling seamless access and analysis across environments. Fabric fully embraces open data formats like Delta and Parquet as the native, standard format across all workloads to minimize data duplication. Fabric will provide a universal, centralized security model managed in OneLake and enforced uniformly across all analytics engines. This unified open architecture reduces complexity, avoids vendor lock-in, and minimizes costs.

Fabric integrates with Azure OpenAI Service to infuse generative AI across all layers, enabling capabilities like conversational data pipelines, model building, code generation, and visualization — all through natural language interaction. The upcoming “Copilot in Fabric” will allow the creation of custom AI assistants combining Azure OpenAI models with an organization’s data. Microsoft does not use customer data to train the base language models, inheriting existing security, compliance, and privacy policies.

Data Ingestion and Integration

Data Factory in Fabric is a central hub for bringing data into the platform. It orchestrates data ingestion through various tools, such as Dataflows, Pipelines, and Spark Notebooks. Dataflows offer a visual interface for building reusable data transformations using Power Query syntax. Pipelines provide more control over the flow of data and allow for scheduling and complex orchestration of data movement and transformation tasks. Finally, notebooks integrate code-based operations into the pipeline, enabling developers to leverage custom scripts for specific data manipulation needs. This combination of tools within Data Factory empowers users to ingest data from various sources, clean and transform it, and prepare it for further analysis within the Fabric ecosystem.

In this sense, Dataflow is often the tool of choice for small datasets and simple transformations. On the other end of the spectrum, Spark Notebooks can be used to ingest batch or streams of large data loads, even with parallel processing capabilities. As a middle and flexible option, there are Pipelines which can be used to orchestrate Dataflows and Notebooks.

Pipelines offer multiple out-of-the-box connectors for ingesting data from diverse sources:

As expected, there are connectors for multiple cloud sources, such as databases, blob storage, and streaming services, from Microsoft Azure as well as other platforms such as AWS, GCP, and Snowflake. It is also possible to integrate with Microsoft’s collaboration services in Office 365, ingesting data from OneDrive, SharePoint, and Power Apps tables. This last option can be attractive for companies without well-structured operational data: keeping transactional data structured as operational tables in Power Apps can be helpful before consolidating it with other data in analytical tables on Microsoft Fabric.

On the Power Apps side, data tables can be viewed in a simple tabular format, like the Clouds table shown below. For this and any other operational tables living in the collaboration-tool domain to be available in the integrated cloud data environment, the data lake service must be enabled via Azure Synapse Link, which creates a “shortcut” to this data directly in the cloud environment.

From there, it can be consumed directly by a reporting tool like Power BI, or transformed first if needed.

A practical example illustrated here follows the architecture described:

  • As part of the Dataverse domain, ingestion starts from the operational tables in the Power Apps environment. These are replicated to the Azure environment through the Synapse Link “shortcut” feature and exposed in the Lakehouse under the OneLake domain.
  • From the Lakehouse, we consolidate and transform the data using either Pipelines or Dataflows.
  • The processed and transformed data is made available in an analytical Lakehouse. This Lakehouse can be organized into different layers (bronze, silver, and gold), but for simplicity the example here is limited to a single space.
  • The analytical Lakehouse is an ideal data source for analytical reporting or dashboarding with different BI tools, including Power BI, which is depicted in the diagram.
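Stripped of the Fabric-specific services, the flow above is a small ELT pattern: land the operational rows, derive analytical columns, and publish the result for reporting. The following Python sketch illustrates that flow under entirely illustrative assumptions — the table contents, column names, and dates are made up for the example and are not Fabric APIs:

```python
from datetime import date

# Illustrative operational rows, standing in for the Dataverse "shortcut" table.
operational_clouds = [
    {"vendor": "Azure", "service": "Synapse", "createdon": date(2019, 11, 4)},
    {"vendor": "AWS", "service": "Redshift", "createdon": date(2013, 2, 15)},
]

def land(rows):
    """Landing step: copy operational rows as-is into the lakehouse zone."""
    return [dict(row) for row in rows]

def transform(rows, today):
    """Analytical step: derive a 'daysreleased' column (days since release)."""
    return [{**row, "daysreleased": (today - row["createdon"]).days} for row in rows]

landed = land(operational_clouds)
analytical = transform(landed, today=date(2024, 5, 9))
for row in analytical:
    print(row["service"], row["daysreleased"])
```

In Fabric the landing step would be handled by Synapse Link and the transform by a Dataflow, Pipeline, or notebook; the sketch only shows the shape of the data movement.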

Data Transformation

As mentioned previously, Fabric can ingest and process data from many more sources in addition to the Power Apps source shown in our example. For all of these, most data consolidation and transformation can be performed with two options: Dataflows or Pipelines. Continuing the example described above, we have a simple table of cloud vendors’ products with a release-date column. As a simple transformation, we will calculate how long ago each service was released using Dataflow Gen2.

Dataflow Gen2

Similar to Power Query, Dataflow Gen2 offers a visual interface for building data pipelines. One of its key advantages is that it’s entirely cloud-based, eliminating the need for software downloads and offering scalability for larger datasets.

In our example, from the multiple possible sources that Dataflow supports, we connect to the Lakehouse, more specifically to the table “shortcut” coming from the Power Apps domain. Several transformations could be performed on that data; in our case, we want to add a custom column with the following formula.

Table.TransformColumnTypes(
    Table.AddColumn(#"Navigation 3", "daysreleased", each
        Duration.Days(DateTime.LocalNow() - [createdon])),
    {{"daysreleased", Int64.Type}})

Once the new column is calculated, it is possible to set the destination table for the data, which will be an analytical table in the same Lakehouse.
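For readers less familiar with Power Query M, the same derivation can be sketched in plain Python. The column names `createdon` and `daysreleased` mirror the M formula above; the sample row and helper function are purely illustrative:

```python
from datetime import datetime

def add_days_released(rows, now):
    """Mimic the Dataflow step: whole days between `now` and each row's
    'createdon', stored as an integer 'daysreleased' column."""
    return [{**row, "daysreleased": (now - row["createdon"]).days} for row in rows]

# Duration.Days yields whole days; for positive durations, timedelta.days
# behaves the same way, discarding the fractional remainder.
sample = [{"service": "Fabric", "createdon": datetime(2023, 5, 23, 12, 0)}]
result = add_days_released(sample, now=datetime(2023, 5, 24, 11, 0))
print(result[0]["daysreleased"])  # 0 — less than a full day has elapsed
```

The final `Table.TransformColumnTypes` call in the M formula corresponds to the integer result of `.days` here: the elapsed time is stored as a whole number of days, not a timestamp.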

Fabric Pipelines

Pipelines work as orchestrators and can trigger different transformation mechanisms, such as Dataflow. Here we will highlight two methods for ingesting data into Fabric, native Pipeline Activities and Spark Notebooks.

Pipeline Activities

Data Factory pipeline activities are the building blocks of data workflows. Each activity represents a specific action performed on your data, such as copying data from one storage location to another, transforming data through cleansing or formatting, or triggering another pipeline entirely. The available activities can be seen as follows:

In the example presented here, something as simple as a SQL script activity is enough.

Spark Notebooks

Since notebooks are a powerful option for large-scale parallel data processing, we will also illustrate achieving the same outcome with a simple dataset and notebooks. Using the operational tables’ shortcut on the Dataverse Lakehouse, the data can be consolidated and transformed into analytical tables, as shown below:

CREATE OR REPLACE TABLE lk.ana_clouds AS
SELECT c.*, datediff(current_date(), c.createdon) AS daysreleased
FROM dataverse_cds2.cr7ed_cloudvendors c

With the notebook in place, it is possible to trigger it from a Pipeline:

Data Warehouse vs Lakehouse

Because the Lakehouse architecture supports unstructured, semi-structured, and structured data, along with a flexible way to organize the processing flow across different stages and layers, Lakehouses are ideal for ELT processes.

For environments driven solely by structured data, whether they use ETL or ELT processes, a traditional Data Warehouse service can be an ideal choice. It offers robust solutions tailored specifically to structured-data scenarios. Warehouses serve as central repositories for organizing, storing, and analyzing data from various sources, and are primarily used by data warehouse developers and SQL engineers, for whom proficiency in SQL is essential. With features like efficient indexing, time-series analysis, and geospatial capabilities, data warehouses provide a solid foundation for business intelligence and reporting. If your focus is structured data and streamlined querying, a data warehouse is a reliable choice.

Both Data Warehouses and Lakehouses provide default semantic layers. A semantic layer serves as a critical bridge between raw data and meaningful insights: it acts as an abstraction layer, enabling users to interact with data in a more intuitive, business-friendly manner. By defining relationships, hierarchies, and calculations, the semantic layer simplifies complex data models and provides a unified view across various data sources, making it easier for analysts, data scientists, and business users to explore, query, and visualize data. Essentially, it translates technical data structures into a language that aligns with users’ needs, fostering better decision-making and analysis within the organization.
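The idea of a semantic layer can be made concrete with a small sketch: a mapping from business-friendly names to physical columns and named calculations, which a query helper resolves on the user's behalf. This is a toy illustration of the concept — the table, names, and measures are invented for the example and do not reflect Fabric's actual semantic model:

```python
# Toy semantic layer: business-friendly names and measures over a raw table.
raw_sales = [
    {"cust_id": 1, "amt_usd": 120.0, "qty": 2},
    {"cust_id": 1, "amt_usd": 80.0, "qty": 1},
    {"cust_id": 2, "amt_usd": 200.0, "qty": 5},
]

semantic_model = {
    "dimensions": {"Customer": "cust_id"},          # business name -> column
    "measures": {                                   # business name -> calculation
        "Total Revenue": lambda rows: sum(r["amt_usd"] for r in rows),
        "Units Sold": lambda rows: sum(r["qty"] for r in rows),
    },
}

def query(rows, model, group_by, measure):
    """Resolve business names to columns/calculations, then aggregate."""
    key = model["dimensions"][group_by]
    calc = model["measures"][measure]
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return {k: calc(v) for k, v in groups.items()}

print(query(raw_sales, semantic_model, "Customer", "Total Revenue"))
```

The user asks for "Total Revenue by Customer" and never sees `amt_usd` or `cust_id`; that name-resolution step is the essence of what a semantic layer does at scale.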

BI Reporting and Dashboarding

While Azure supports a variety of business intelligence tools, Power BI is the natural reporting tool in the Azure and Microsoft world. A powerful business intelligence and data visualization tool, Power BI empowers users to transform raw data into actionable insights. Its intuitive interface allows for easy data exploration, interactive dashboards, and dynamic reports. By connecting to various data sources, including dataflows and data warehouses, Power BI enables organizations to make informed decisions based on real-time analytics. Whether you’re a business analyst, data scientist, or executive, Power BI provides a unified platform for data storytelling, collaboration, and data-driven decision-making.

Conclusion

Microsoft Fabric is a unified end-to-end analytics solution that provides a comprehensive set of tools and capabilities for data ingestion, integration, transformation, and analysis. It offers a single unified experience and architecture across a broad range of workloads, making it easy for organizations of all sizes to get started quickly and realize value rapidly. With its built-in multi-cloud data lake, support for open data formats, and integration with Azure OpenAI Service, Fabric is a powerful and versatile platform that can help organizations of all sizes gain a competitive advantage.
