Power BI Dashboard in Microsoft Fabric: A Cookbook With End-to-End Development

Tara Pourhabibi
intelia

--

Microsoft Fabric is an awesome tool for crunching data and uncovering those valuable insights, no matter the size of your organization. It has everything you need: a sleek platform, seamless analytics, an easy-to-use interface, and the ability to handle various types of data. If you’re looking to level up your data analytics game, Fabric is definitely worth exploring. In this article, I’m going to walk you through how I’ve been putting Fabric to work in the real world, with a neat little example to help illustrate its power.

1. What is Microsoft Fabric?

If you are not familiar with Microsoft Fabric, it is the ultimate analytics hub from Microsoft, bringing together all sorts of data tasks such as warehousing, integration, real-time analytics, and business intelligence, all in one place (check out Figure 1). It’s like a big umbrella covering three main Microsoft data analytics tools: Azure Data Factory, Azure Synapse, and Power BI. Inside Fabric, there’s a core called OneLake, where all the data is stored. Before OneLake, companies had all these separate data lakes for different teams, which was a headache to manage. But OneLake swooped in to save the day, breaking down those data barriers and making teamwork easier by streamlining how organizations handle their data.

Figure 1. Architecture of Fabric

Fabric offers a range of innovative features for both data geeks and regular business folks:

  • Azure Data Factory facilitates the orchestration of data pipelines through intuitive interfaces.
  • Azure Synapse is a powerhouse that tackles various tasks: from data engineering with Spark to managing your data warehouse workload, ensuring top-notch performance and scalability using its SQL engine and Delta Lake format. It also offers real-time analytics capabilities for handling large volumes of data efficiently. Synapse even allows you to delve into data science, assisting in building AI models and seamlessly putting them into action with Azure Machine Learning and Fabric.
  • Power BI, integrated with Fabric, simplifies data access through Direct Lake mode. This means you can make informed decisions by running queries and generating reports directly from the data lake without unnecessary data duplication. It’s ideal for analyzing large datasets that undergo frequent updates.

2. Power BI Dashboard in Fabric: An End-to-End Development

The next sections dive into how I used these three features of Microsoft Fabric to put together a Power BI dashboard.

2.1. Step 1: Create a Lakehouse

Prior to utilizing Direct Lake, you need to set up a Lakehouse (or Warehouse) containing one or more Delta tables within a workspace hosted on a supported Power BI or Microsoft Fabric capacity. The Lakehouse is essential because it serves as the storage repository for your Parquet-formatted files within OneLake. Additionally, the Lakehouse acts as the access point for launching the Web modeling feature to create a Direct Lake model. A Lakehouse can be created through the Data Engineering interface, as depicted below (Figure 3).

Figure 3. Create Lakehouse Via Data Engineering Interface
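If you prefer to automate workspace setup, a Lakehouse can also be provisioned programmatically. The snippet below is only a sketch against the Fabric REST API; the workspace ID, the Lakehouse name, and the way you obtain the bearer token are assumptions you would adapt to your own environment.

import requests

workspace_id = "<your-workspace-id>"                      #hypothetical placeholder
token = "<bearer-token-for-api.fabric.microsoft.com>"     #e.g. obtained via azure.identity

#create a Lakehouse in the given workspace via the Fabric REST API
response = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/lakehouses",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={"displayName": "SalesLakehouse"},               #hypothetical Lakehouse name
)
response.raise_for_status()
print(response.json())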

As part of provisioning a Lakehouse, a SQL endpoint for SQL querying and a default model for reporting are created and updated with any tables added to the Lakehouse. While Direct Lake mode doesn’t query the SQL endpoint when loading data directly from OneLake, it’s required when a Direct Lake model must seamlessly fall back to Direct Query mode, such as when the data source uses specific features like advanced security or views that can’t be read through Direct Lake. Direct Lake mode also queries the SQL endpoint for schema- and security-related information.
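To give a feel for the SQL endpoint, it can be queried over TDS like any other SQL Server endpoint. The sketch below is an assumption-laden example using pyodbc with Azure AD interactive authentication; the server name, database name, and table name are placeholders, and the real connection string should be copied from the Lakehouse's SQL analytics endpoint in the Fabric portal.

import pyodbc

#placeholder connection details; copy the actual SQL connection string from the Fabric portal
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-endpoint.datawarehouse.fabric.microsoft.com;"   #hypothetical
    "Database=YourLakehouse;"                                    #hypothetical
    "Authentication=ActiveDirectoryInteractive;"
)

cursor = conn.cursor()
#hypothetical table name; the SQL endpoint exposes the Lakehouse Delta tables read-only
cursor.execute("SELECT TOP 10 * FROM dbo.orders")
for row in cursor.fetchall():
    print(row)
conn.close()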

2.2. Step 2: Curate Data into Lakehouse

Data can be loaded into the Lakehouse through different approaches, such as a copy activity or a notebook in a data pipeline, or a dataflow. Alternatively, data files can be manually uploaded into the Lakehouse file storage. In the Explorer pane on the left, click the three dots next to Files. On the popup menu, click Upload and then Upload files. When the pane appears on the right, click the folder icon in the input box. Once you've selected the file, click the Upload button (Figure 4).

Figure 4. Manual Data Upload to Lakehouse File Storage
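Instead of uploading by hand, files can also be copied into the Lakehouse Files area from a notebook. The snippet below is a small sketch using mssparkutils in a notebook attached to the Lakehouse; the source ABFSS path and the folder and file names are placeholders.

from notebookutils import mssparkutils

#hypothetical source path in ADLS Gen2; replace with your own storage location
src = "abfss://raw@yourstorageaccount.dfs.core.windows.net/exports/orders.json"

#destination folder inside the Files area of the attached Lakehouse
mssparkutils.fs.mkdirs("Files/raw")
mssparkutils.fs.cp(src, "Files/raw/orders.json")

#list the uploaded files to confirm the copy
for f in mssparkutils.fs.ls("Files/raw"):
    print(f.name, f.size)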

Once the files are loaded, the following code is used to create Delta tables and load the data into Lakehouse tables. If the files are in JSON format, they may need to be flattened before any further processing.

This creates the Bronze layer of the Medallion Architecture, as the table structures in this layer correspond to the source system table structures "as-is," along with any additional metadata columns that capture the load date/time.

#enable the V-Order special sauce and Delta optimized write
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.binSize", "1073741824")
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

#import required libraries
import datetime
import json
from pyspark.sql.types import StructType
from pyspark.sql.functions import explode, lit

#functions to flatten a nested Spark DataFrame
def flatten(schema, prefix=None):
    #return the fully qualified names of all leaf fields in the schema
    fields = []
    for field in schema.fields:
        name = prefix + '.' + field.name if prefix else field.name
        dtype = field.dataType
        if isinstance(dtype, StructType):
            fields += flatten(dtype, prefix=name)
        else:
            fields.append(name)
    return fields


def explodeDF(df):
    #explode every array column into one row per element
    for (name, dtype) in df.dtypes:
        if "array" in dtype:
            df = df.withColumn(name, explode(name))
    return df


def df_is_flat(df):
    #a DataFrame is flat when it contains no array or struct columns
    for (_, dtype) in df.dtypes:
        if ("array" in dtype) or ("struct" in dtype):
            return False
    return True


def flatJson(jdf):
    #repeatedly flatten structs and explode arrays until the DataFrame is flat
    keepGoing = True
    while keepGoing:
        fields = flatten(jdf.schema)
        new_fields = [item.replace(".", "_") for item in fields]
        jdf = jdf.select(fields).toDF(*new_fields)
        jdf = explodeDF(jdf)
        if df_is_flat(jdf):
            keepGoing = False
    return jdf


def loadDataFromSource(source_folder, table_name):
    """
    Function to save each source file to a Delta table
    Args:
        source_folder (str): source folder name
        table_name (str): table name
    """
    #df is a Spark DataFrame containing the JSON data from the file;
    #it may need to be flattened before it can be written as a table
    df = spark.read.json(f"Files/{source_folder}/{table_name}", multiLine=True)

    #flatten the Spark DataFrame
    flat_df = flatJson(df)

    #add audit columns
    flat_df = flat_df.withColumn("AuditLoadDateTime", lit(str(datetime.datetime.now()))) \
                     .withColumn("AuditSourceFileName", lit(source_folder + "/" + table_name))

    #create/overwrite the Lakehouse table (mergeSchema allows schema evolution)
    flat_df.write.mode("overwrite").format("delta").option("mergeSchema", "true").save("Tables/" + table_name)


#load the JSON list of tables (source_table_list is a notebook parameter)
table_list = json.loads(source_table_list)

#collect the table names into a list
table_result_list = []
for json_dict in table_list:
    table_result_list.append(json_dict['name'])

#load the data into the Lakehouse tables
for i in table_result_list:
    loadDataFromSource(source_folder, i)

source_table_list and source_folder are two parameters that are fed into the notebook from the data pipeline.
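For illustration, the notebook's parameters cell might look like the following; the folder and table names here are hypothetical placeholders that the data pipeline overrides at run time.

#notebook parameters cell (hypothetical example values;
#in practice they are overridden by the data pipeline at run time)
source_folder = "raw"
source_table_list = '[{"name": "customers"}, {"name": "orders"}]'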

Figure 5 shows the data pipeline for loading the data files from the Lakehouse file storage into Lakehouse tables. The "Set variable" activity sets the root folder of the Lakehouse file storage, and the "Get Metadata" activity finds all the existing files in the root folder and feeds them as input to the notebook to be loaded.

Figure 5. Load to Bronze Pipeline

The next step is to curate the data into Silver (cleansed and conformed data) and Gold (curated, business-level) Lakehouse tables (Figure 6 and Figure 7). Microsoft Fabric also offers the ability to use Spark notebooks, enabling data cleansing and business-level logic through either Spark SQL or PySpark code (Figure 6); a minimal example of such a curation step is sketched after Figure 7.

Figure 6. Data Curation End-to-End Pipeline
Figure 7. Curated Data in Lakehouse Tables
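As a minimal sketch of this kind of curation step, the PySpark snippet below reads a hypothetical Bronze table, applies basic cleansing, and writes a Silver table; the table and column names are assumptions for illustration only.

from pyspark.sql.functions import col, trim, to_date

#read a Bronze table (the table name "orders" is a hypothetical example)
bronze_df = spark.read.format("delta").load("Tables/orders")

#basic cleansing: trim text columns, cast dates, and drop duplicate keys
silver_df = (bronze_df
    .withColumn("customer_name", trim(col("customer_name")))
    .withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
    .dropDuplicates(["order_id"]))

#write the cleansed data to the Silver Lakehouse table
silver_df.write.mode("overwrite").format("delta").save("Tables/silver_orders")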

2.3. Step 3: Build a Semantic Data Model in SQL Analytics Endpoint

Once the Gold Lakehouse data is curated, a semantic data model can be created by setting the relationships between the tables (Figure 8); a quick key check that is worth running before defining relationships is sketched after Figure 8.

Figure 8. Lakehouse Semantic Data Model
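Before defining relationships, it can be worth confirming that the keys actually line up. The sketch below uses hypothetical Gold table and column names to find fact rows whose customer key has no match in the customer dimension.

#hypothetical Gold tables; replace with your own table and key names
orders = spark.read.format("delta").load("Tables/gold_orders")
customers = spark.read.format("delta").load("Tables/gold_customers")

#rows in the fact table whose customer key has no match in the dimension
orphans = orders.join(customers, on="customer_id", how="left_anti")
print(f"Orphaned order rows: {orphans.count()}")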

2.4. Step 4: Build Power BI Dashboard Using Direct Lake Mode

It is now time to create a Direct Lake connection to your Lakehouse semantic data model and start reporting. In Power BI Desktop, click on OneLake data hub, then Power BI semantic models, and select the semantic model created in the previous step (Figure 9).

Figure 9. Connecting to Lakehouse Semantic Data Model

The semantic model will be loaded into Power BI, and you can start building your dashboard (Figure 10).

Figure 10. Direct Link to Lakehouse

3. Limitations

There are a few limitations when working with Microsoft Fabric. Some of these limitations are listed below:

  • Development limitations:

    - Lack of Key Vault integration, specifically outside of notebooks.
    - Some data types are not supported.
    - Tables with renamed columns aren't supported in the SQL analytics endpoint.
    - Delta tables created outside of the "/Tables" folder aren't available in the SQL analytics endpoint.
    - The only possible constraints are "not null" and "check"; primary or unique keys are not supported yet.

  • Data Factory limitations:

    - Pipelines don't support Continuous Integration and Continuous Delivery (CI/CD).

--