Breaking the News Now: Microsoft Fabric — the one-stop analytics solution that businesses have been waiting for

Jinnatul Raihan Mumu
14 min read · May 29, 2023


As a business intelligence enthusiast, I understand the immense value that data holds for organizations. The ability to extract meaningful insights from data is crucial for driving innovation, optimizing operations, and making informed decisions. That’s why the news of Microsoft Fabric is sending shockwaves through the analytics community.

So What is Microsoft Fabric?

Microsoft Fabric is not just another analytics solution; it’s a comprehensive platform that promises to change the future of data analytics. With its powerful suite of services, Microsoft Fabric covers every aspect of the analytics process, from data movement to data science, real-time analytics, and business intelligence. It’s truly the one-stop solution that businesses have been waiting for.

At the heart of Microsoft Fabric lies OneLake, a revolutionary concept that seamlessly integrates workloads and data. Just as Microsoft 365 applications are harmoniously wired with OneDrive, OneLake establishes an intuitive data hub where information is organized, indexed, and readily accessible. This unified environment ensures that data remains versatile, interoperable, and future-proof, empowering organizations to harness its full potential and derive meaningful insights.

But Microsoft Fabric doesn’t stop there. It introduces Copilot, an AI-driven assistant that helps users navigate complex cognitive tasks. Copilot is not just a standalone technology; it’s a comprehensive stack of integrated technologies that translates complex equations and concepts into natural language expressions. It suggests code, guides users in leveraging machine learning models, visually explores data patterns, and generates comprehensive reports. With Copilot, Microsoft bridges the gap between data and insights, paving the way for a transformative journey where data becomes a valuable asset, conversations yield powerful insights, and innovation knows no bounds.

Microsoft Fabric’s components, including Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehouse, and Real Time Analytics, work in perfect harmony to deliver a cohesive analytics experience. With Microsoft Fabric, businesses can finally leverage a single product that offers a cohesive experience and architecture. It’s a game-changer that allows developers to extract valuable insights from data and present them to business users effortlessly. By delivering this experience as a software-as-a-service (SaaS), Microsoft automates integration and optimization, enabling users to quickly sign up and derive significant business value in just minutes.

Image: Microsoft Fabric (Collected from Google)

Microsoft Fabric has arrived, and it’s poised to revolutionize the way businesses approach analytics. With its comprehensive suite of services, unified experience, and commitment to bridging the gap between data and insights, Microsoft Fabric is set to change the future of analytics. It’s time to embrace this transformative solution and unlock the true potential of data. The future starts now with Microsoft Fabric.

To get familiar with the various components that make up Microsoft Fabric, the sections below walk through each of them in detail.

Data Factory

Data Factory provides users with a contemporary data integration experience that allows them to ingest, prepare, and modify data from various data sources, including databases, data warehouses, real-time data, and more. It also provides more than 150 connectors to cloud and on-premises data sources, drag-and-drop experiences for data transformation, and the ability to orchestrate data pipelines.

By incorporating Data Factory into Microsoft Fabric, users are also being introduced to rapid data copying capabilities to both dataflows and data pipelines. Through Fast Copy, users can swiftly transfer data between their preferred data storage systems. Importantly, Fast Copy facilitates the transfer of data to Microsoft Fabric’s Lakehouse and Data Warehouse, enabling seamless analytics.

Image: Azure Data Factory (Collected from Google)

There are two primary high-level features Data Factory implements: Dataflows and Data Pipelines.

  1. Dataflows: Dataflows provide a user-friendly, low-code interface for acquiring data from multiple sources. With a wide range of data transformations available, users can modify their data as needed. The transformed data can be loaded into different destinations, including Azure SQL databases. Dataflows can be run manually or scheduled for regular refreshes, and they can be incorporated into data pipeline orchestration processes. Built on the well-established Power Query experience used across Microsoft products and services such as Excel, Power BI, and Power Platform, Dataflows let users perform operations like joins, aggregations, data cleansing, and custom transformations, all within an intuitive low-code interface.
  2. Data pipelines: Data pipelines are powerful tools for managing workflows in cloud-based platforms. They enable users to create complex workflows that facilitate data updates, move large datasets, and define control flow pipelines. With data pipelines, users can build ETL and data factory workflows capable of handling diverse tasks on a large scale. These pipelines offer control flow functionalities, allowing users to establish workflow logic with loops and conditional statements. Users can create comprehensive ETL data pipelines by combining configuration-driven copy activities, low-code dataflow refresh, and code-first activities like Spark Notebooks, SQL scripts, and stored procedures.
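The control-flow ideas above (loops, conditional branches, and copy/transform/load activities chained into a pipeline) can be sketched in plain Python. This is a conceptual stand-in, not the actual Data Factory API; all function and source names here are made up for illustration:

```python
# Sketch of pipeline-style control flow: loop over sources, branch on
# whether rows arrived, then run a "copy activity" per source.

def extract(source):
    # Stand-in for a configuration-driven copy activity.
    sample_data = {
        "orders": [{"id": 1, "amount": 120}, {"id": 2, "amount": 80}],
        "returns": [],
    }
    return sample_data.get(source, [])

def transform(rows):
    # Stand-in for a low-code dataflow: filter rows and add a derived column.
    return [{**r, "amount_usd": r["amount"]} for r in rows if r["amount"] > 100]

def load(rows, destination):
    destination.extend(rows)

warehouse = []
for source in ["orders", "returns"]:   # control-flow loop over sources
    rows = extract(source)
    if rows:                           # conditional branch: skip empty sources
        load(transform(rows), warehouse)

print(warehouse)  # only the high-value order survives the filter
```

In a real pipeline, each of these steps would be a pipeline activity (a copy activity, a dataflow refresh, or a code-first notebook), with the same loop-and-branch logic expressed in the pipeline designer.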

Synapse Data Engineering

Synapse Data Engineering offers an exceptional Spark platform that enhances the writing experience for data engineers. It empowers data engineers to carry out extensive data transformations and facilitates the democratization of data through the utilization of Lakehouse.

Image: Azure Synapse Analytics Framework (Collected from Microsoft’s website)

In the realm of data engineering, Microsoft Fabric allows users to conceive, construct, and sustain infrastructures and systems that enable their organizations to gather, store, process, and analyze substantial amounts of data. Microsoft Fabric also presents a range of data engineering capabilities to ensure that the data is easily accessible, well-organized, and of superior quality. Within the data engineering homepage, users can undertake the following actions:

  1. Create and manage their data using a lakehouse.
  2. Devise pipelines to transfer data into their lakehouse.
  3. Employ Spark Job definitions to submit batch or streaming jobs to a Spark cluster.
  4. Utilize notebooks to write code for data ingestion, preparation, and transformation.

Synapse Data Science

Microsoft Fabric provides a comprehensive suite of Data Science capabilities, empowering users to seamlessly execute end-to-end data science workflows to enrich data and derive valuable business insights. From data exploration, preparation, and cleansing to experimentation, modeling, and serving predictive insights to BI reports, users can perform a wide range of activities throughout the entire data science process.

Within Microsoft Fabric, users can access a dedicated Data Science Home page, which serves as a central hub for discovering and accessing various relevant resources. This includes the ability to create machine learning experiments, models, and notebooks, as well as import existing notebooks directly from the Data Science Home page. The overall data science process within Microsoft Fabric encompasses the following key steps:

  1. Problem formulation and ideation: Data science practitioners collaborate seamlessly with business users and analysts on the same platform, facilitating data sharing and collaboration across different roles. This integration streamlines hand-offs during the problem formulation phase, enabling smooth collaboration between stakeholders.
  2. Data pre-processing: Microsoft Fabric users can interact with data stored in OneLake using the Lakehouse feature, which seamlessly integrates with notebooks for browsing and interacting with data. Data can be easily read from OneLake into a Pandas data frame, enabling seamless data exploration. Microsoft Fabric offers powerful tools for data ingestion and data orchestration pipelines, making it straightforward to access and transform data into a format suitable for machine learning.
  3. ML modeling: With the support of tools like PySpark/Python, SparklyR/R, and notebooks, Microsoft Fabric facilitates machine learning model training. Users can leverage a variety of popular machine-learning libraries for their model training needs. The platform also integrates with MLFlow, providing a built-in experience for logging experiments and models.
  4. Operationalization: Notebooks enable batch scoring of machine-learning models using open-source libraries for prediction. Additionally, Microsoft Fabric offers a scalable universal Spark Predict function, which supports MLFlow packaged models in the platform’s model registry.
  5. Reporting: In Microsoft Fabric, predicted values can be easily written to OneLake and seamlessly consumed within Power BI reports using the Power BI Direct Lake mode. This enables data science practitioners to effortlessly share their results with stakeholders and simplifies the operationalization process. Notebooks containing batch scoring can be scheduled to run using the platform’s scheduling capabilities, and Power BI automatically incorporates the latest predictions without the need for manual data loading or refreshing, thanks to the Direct Lake mode in Microsoft Fabric.
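The train, batch-score, publish loop described above can be sketched with no external dependencies. In a real Fabric workflow the model would be logged with MLflow and scored with the Spark Predict function; here a hand-rolled least-squares line fit plays the model's role, purely to show the shape of the workflow:

```python
# Conceptual stand-in for: train a model, batch-score new data, then write
# predictions out for reporting.

def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b, standard library only.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# "Experiment": train on historical data (here, points on y = 2x).
a, b = fit_line([1, 2, 3, 4], [2, 4, 6, 8])

# "Batch scoring": apply the trained model to a new batch of inputs.
batch = [5, 6]
predictions = [round(a * x + b, 6) for x in batch]

# "Reporting": in Fabric these values would be written to OneLake so that a
# Direct Lake Power BI report picks them up automatically.
print(predictions)  # → [10.0, 12.0]
```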

Through the robust capabilities and seamless integration provided by Microsoft Fabric, users can effectively navigate the entire data science workflow, derive valuable insights, and facilitate collaboration between different roles and stakeholders.

Synapse Data Warehouse

Microsoft Fabric introduces a cutting-edge data warehouse that revolves around a lake-centric approach. This enterprise-grade distributed processing engine delivers exceptional performance at scale, all while eliminating the complexities associated with configuration and management. By providing a seamless Software-as-a-Service (SaaS) experience tightly integrated with Power BI, Microsoft Fabric’s Warehouse simplifies an organization’s analytics infrastructure, bridging the gap between data lakes and warehouses. This convergence aims to streamline an organization’s investment in its analytics ecosystem.

The Warehouse in Microsoft Fabric caters to users of all skill levels, ranging from citizen developers to professional developers, database administrators (DBAs), and data engineers. The platform offers a comprehensive set of experiences within the Microsoft Fabric workspace, enabling users to expedite their time-to-insights. The other capabilities include:

  1. Virtual warehouses with cross-database querying: Microsoft Fabric allows users to create virtual warehouses that gather data from different sources using shortcuts. These virtual warehouses can include data from OneLake, Azure Data Lake Storage, or other cloud storage providers. With cross-database querying capabilities, Microsoft Fabric enables users to extract valuable insights from various data sources without duplicating the data. By joining different data sources together, users can quickly and easily gain comprehensive insights that previously required significant data integration and engineering efforts.
  2. Autonomous workload management: Warehouses in Microsoft Fabric utilize an exceptional distributed query processing engine, offering customers workloads with natural isolation boundaries. The autonomous allocation and relinquishment of resources ensure optimal performance, automatic scaling, and concurrency without the need for manual adjustments. True isolation is achieved by separating workloads with distinct characteristics, preventing interference between ETL jobs and ad hoc analytics or reporting workloads.
  3. Open format for seamless engine interoperability: The Warehouse stores data in the Parquet file format and publishes it as Delta Lake logs, enabling ACID transactions and facilitating interoperability across different engines and workloads within Microsoft Fabric, eliminating the need to duplicate data for data professionals with different skill sets. Data engineers who are proficient in Python can effortlessly leverage the same data used by data warehouse professionals accustomed to working with SQL. Additionally, BI professionals can readily access the same data to create visually appealing and insightful visualizations in Power BI, all with exceptional performance and without data duplication.
  4. Separation of storage and compute: The Warehouse in Microsoft Fabric decouples compute and storage, enabling customers to rapidly scale their resources to meet business demands. Multiple compute engines can seamlessly read from supported storage sources, ensuring robust security and full ACID transactional guarantees.
  5. Effortless ingestion, loading, and transformation at scale: Data can also be ingested into the Warehouse through various methods such as Pipelines, Dataflows, cross-database querying, or the COPY INTO command. Once ingested, the data can be analyzed by different business groups, leveraging the functionality of cross-database querying. The Warehouse Editor provides an easy-to-use web experience for querying, while a fully integrated BI experience expedites time to insights through graphical data modeling.
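Cross-database querying, joining tables that live in different databases without copying data between them, can be illustrated at small scale with the standard library's sqlite3 module, which supports attaching a second database to a connection. This is only an analogue of the warehouse capability above; the file and table names are made up:

```python
import os
import sqlite3
import tempfile

# Two separate "databases": a sales store and a reference store.
tmp = tempfile.mkdtemp()
sales_db = os.path.join(tmp, "sales.db")
ref_db = os.path.join(tmp, "reference.db")

with sqlite3.connect(sales_db) as con:
    con.execute("CREATE TABLE orders (product_id INTEGER, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (?, ?)",
                    [(1, 120.0), (2, 80.0), (1, 50.0)])

with sqlite3.connect(ref_db) as con:
    con.execute("CREATE TABLE products (id INTEGER, name TEXT)")
    con.executemany("INSERT INTO products VALUES (?, ?)",
                    [(1, "widget"), (2, "gadget")])

# Cross-database query: attach the reference store and join across both,
# without duplicating data into either database.
con = sqlite3.connect(sales_db)
con.execute("ATTACH DATABASE ? AS ref", (ref_db,))
rows = con.execute("""
    SELECT p.name, SUM(o.amount)
    FROM orders AS o
    JOIN ref.products AS p ON p.id = o.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
con.close()

print(rows)  # → [('gadget', 80.0), ('widget', 170.0)]
```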

Synapse Real Time Analytics

Synapse Real Time Analytics is a comprehensive managed platform for big data analytics, specifically designed for streaming and time-series data. It leverages a high-performance query language and engine to efficiently search structured, semi-structured, and unstructured data.

How is it unique?

  1. It offers a cutting-edge suite of features that set it apart from traditional analytics platforms. Users can effortlessly capture, transform, and route real-time events to various destinations, including custom applications. The platform boasts a remarkable capability to ingest and load data from any source, irrespective of the data format, enabling a seamless data integration experience.
  2. One of the standout features is the ability to perform analytical queries directly on raw data, eliminating the need for complex data modeling or scripting for data transformation. This streamlined approach empowers analysts to extract valuable insights in real time, without getting caught up in laborious preprocessing tasks.
  3. The platform also leverages streaming by default for data import, facilitating high-performance, low-latency, real-time data analysis. By automatically partitioning imported data based on time and hash, and applying indexing by default, it ensures optimized data organization and retrieval efficiency.
  4. Another highlight is its versatility in working with various data structures, including structured, semi-structured, and free text formats. This flexibility empowers analysts to effortlessly navigate and extract meaningful information from diverse data sources, catering to a wide range of analytical needs.
  5. The platform’s exceptional query capabilities deserve special mention. It allows users to query raw data directly, without the need for preprocessing, leveraging a rich set of available operators. This translates into exceptional performance and minimal response time, enabling swift and efficient analysis.
  6. In addition, it offers seamless scalability, effortlessly handling unlimited volumes of data, ranging from gigabytes to petabytes. With unlimited concurrency for queries and users, the platform ensures smooth operation even under heavy workloads, accommodating the growing data demands of the organization.
  7. With its unique and robust feature set, it empowers organizations to unlock the true potential of their data in real time, enabling data-driven decision-making, optimizing operational efficiency, and driving innovation across various domains.

Working in Real Time Analytics:

  1. Users can utilize the event stream functionality to capture, transform, and route real-time events using a user-friendly, no-code approach.
  2. Users can store and manage data in a KQL database, which enables data accessibility in OneLake and integration with other Fabric experiences.
  3. Users can also leverage the KQL query set to execute queries, view and customize query results, and save queries for future use. It also offers the users options to export and share queries with others, and the ability to generate Power BI reports.
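The time-based partitioning and aggregation described above can be sketched in plain Python as a conceptual stand-in for a KQL-style summarize over an event stream (the event payloads are invented for illustration):

```python
from collections import defaultdict
from datetime import datetime

# Incoming time-series events, as they might arrive on an event stream.
events = [
    {"ts": "2023-05-29T10:00:05", "device": "fridge-1", "temp_c": 4.1},
    {"ts": "2023-05-29T10:00:40", "device": "fridge-1", "temp_c": 4.3},
    {"ts": "2023-05-29T10:01:10", "device": "fridge-1", "temp_c": 8.9},
]

# Partition by time: bucket events into one-minute windows per device,
# analogous to binning a timestamp in a KQL summarize.
buckets = defaultdict(list)
for e in events:
    minute = datetime.fromisoformat(e["ts"]).replace(second=0)
    buckets[(minute.isoformat(), e["device"])].append(e["temp_c"])

# Aggregate: maximum temperature per (window, device).
summary = {k: max(v) for k, v in buckets.items()}
print(summary)
```

A real KQL query would express the same idea declaratively, e.g. binning the timestamp to one minute and taking the max per device, and would run against the ingested stream rather than an in-memory list.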

Power BI

In the realm of data analysis and visualization, Power BI stands as an integrated suite of software services, applications, and connectors meticulously crafted to harmonize disparate data sources into cohesive, visually captivating, and interactive insights. Whether the data originates from an Excel spreadsheet or a blend of cloud-based and on-premises hybrid data repositories, Power BI seamlessly establishes connections with these sources, enabling effortless exploration, visualization, and sharing of vital information with their desired audience.

At its core, Power BI encompasses three fundamental components, each synergistically contributing to the overall functionality:

  1. Power BI Desktop: A robust Windows desktop application meticulously engineered for data modeling, report creation, and visual design.
  2. Power BI Service: An advanced software-as-a-service (SaaS) platform hosted online, enabling users to publish, collaborate, and share reports and dashboards effortlessly.
  3. Power BI Mobile Apps: Intuitively designed applications tailored for Windows, iOS, and Android devices, empowering users to access and interact with business insights on the go.

By seamlessly integrating these three elements, Power BI empowers individuals to craft, distribute, and consume actionable business intelligence in a manner that aligns with their unique roles and preferences. To learn more about Power BI, view this link: https://medium.com/nerd-for-tech/introduction-to-microsoft-power-bi-bd5426558979

Data Activator

Data Activator, a powerful feature within Microsoft Fabric, is a game-changer when it comes to turning data into action. Designed to work seamlessly with various data types, from static data in warehouses to real-time streaming data in Azure Event Hubs, Data Activator opens up a world of possibilities for users to drive meaningful outcomes.

Implementing Data Activator is flexible and adaptable, catering to diverse business needs. Here are just a few examples of how it can be leveraged:

  1. Sales and Marketing: Stay ahead of customer payment issues by setting up alerts for sales managers when a customer falls behind on payments. This proactive approach enables timely intervention and smoother financial operations.
  2. Inventory Management: Ensure optimal inventory management by monitoring product levels in real-time. Data Activator can notify operations managers if inventory levels for a specific product are insufficient, allowing them to take prompt action and avoid stockouts.
  3. Operations: Maintain data quality standards by automatically monitoring data quality metrics. If metrics fall below the defined targets, Data Activator can trigger remedial processes, ensuring data integrity and reliability.
  4. IoT: Enhance operational efficiency in IoT environments by leveraging Data Activator to automatically generate engineering support tickets. For instance, if the temperature of a refrigerator exceeds a safe threshold, an engineering support ticket can be created instantly, enabling swift resolution.

Image: Data Activator Framework (Collected from Microsoft’s website)

So, how does Data Activator work? Let’s delve into its three-step process:

  1. Connect to data: Data Activator seamlessly integrates with a wide range of data sources within Microsoft Fabric, including Power BI datasets and Eventstreams. Once connected, it continuously monitors the data, keeping a vigilant eye out for actionable patterns that can drive meaningful insights.
  2. Detect actionable conditions: With Data Activator, users have a centralized platform to define and customize actionable patterns in their data. These patterns can range from simple threshold triggers, such as exceeding a predefined value, to complex trends over time. This flexibility allows users to tailor the system to their unique business requirements.
  3. Trigger actions: When Data Activator identifies an actionable pattern, it doesn’t stop at just detecting it. It goes a step further and triggers actions based on the defined criteria. These actions can take various forms, from sending email notifications or alerts via Teams to the relevant personnel within the organization, to initiating automated processes using Power Automate flows or other line-of-business applications. The result is a seamless and efficient workflow that transforms data insights into tangible outcomes.
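The connect, detect, trigger loop above can be sketched with the refrigerator example from earlier. This is a stdlib-only stand-in: the "action" just records a support ticket in a list, where a real trigger might post to Teams or start a Power Automate flow, and the threshold value is invented:

```python
# Sketch of Data Activator's connect -> detect -> act loop.

SAFE_MAX_C = 8.0   # illustrative safe-temperature threshold
tickets = []

def create_support_ticket(device, temp):
    # Stand-in action; a real system would call a ticketing or Teams API.
    tickets.append(f"{device}: temperature {temp}°C exceeds {SAFE_MAX_C}°C")

def on_event(event):
    # Detect the actionable condition and trigger the action.
    if event["temp_c"] > SAFE_MAX_C:
        create_support_ticket(event["device"], event["temp_c"])

# "Connect to data": consume the event stream and evaluate each event.
stream = [
    {"device": "fridge-1", "temp_c": 4.2},
    {"device": "fridge-2", "temp_c": 9.5},   # actionable condition
]
for event in stream:
    on_event(event)

print(tickets)  # one ticket, for the over-temperature fridge
```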

That said, Data Activator empowers organizations to bridge the gap between data analysis and actionable decision-making. By harnessing its capabilities, users can unlock the full potential of their data, driving operational efficiency, improving customer experiences, and accelerating business growth.

Limitations

It is worth knowing that data warehousing in Microsoft Fabric is currently in a preview stage, offering a range of SaaS features and functionality designed for users of all skill levels. The focus of this preview is to provide a simplified experience by using an open data format and a single copy of data. It’s also important to note that the preview release does not yet prioritize performance, concurrency, or scale. As development progresses toward the General Availability of data warehousing in Microsoft Fabric, additional functionality will be introduced to enhance its performance and scalability.

Therefore, during the current preview stage, there are several limitations to be aware of:

  1. The T-SQL functionality is limited, and certain T-SQL commands have the potential to cause warehouse corruption.
  2. Warehouse recovery capabilities are not available during the preview period.
  3. Data warehousing is not supported for multiple geographies at this time. If you’re using Synapse Data Warehouse and Lakehouse items, they should not be moved to a different region during the preview stage.

It’s important to keep these limitations in mind while using data warehousing in Microsoft Fabric during the preview period.

Final Thoughts

In my opinion, Microsoft Fabric is a game-changer in the realm of analytics solutions, offering a cohesive experience and architecture that enables organizations to leverage the full potential of their data. With its comprehensive suite of capabilities, including Data Factory for data integration, Synapse Data Engineering for data transformations, Synapse Data Science for end-to-end data science workflows, and Real-Time Analytics for streaming and time-series data analysis, Microsoft Fabric covers every aspect of the analytics process. By seamlessly integrating these components and providing a unified analytics experience, Microsoft Fabric could empower businesses to make data-driven decisions, optimize operational efficiency, and drive innovation.

While the current preview version may have some limitations, such as limited T-SQL functionality and the absence of warehouse recovery capabilities, it is important to remember that these issues are being addressed by Microsoft. As the platform evolves, it will be interesting to see how it further enhances its capabilities and addresses user feedback to meet the diverse needs of organizations working with data at scale. To conclude, with Microsoft Fabric, the possibilities for harnessing the power of data are truly limitless.


Jinnatul Raihan Mumu

Data analyst who loves to learn about new business intelligence tools every day.