Microsoft Fabric Notebooks: Essential PySpark Functions and Commands

Amit Chandak · Published in Microsoft Power BI
20 min read · Jun 28, 2024

What are Microsoft Fabric Notebooks?

The Microsoft Fabric notebook is a tool for developing Apache Spark jobs and machine learning experiments. It serves as a web-based interactive environment where data scientists and data engineers can write code, utilizing rich visualizations and Markdown text. Data engineers use notebooks for tasks such as data ingestion, preparation, and transformation. Data scientists leverage them to build machine learning solutions, including creating experiments, models, tracking, and deployment.

With a Fabric notebook, you can:

  • Start with zero setup effort.
  • Benefit from fast Spark session startup in Microsoft Fabric, enabling rapid execution of data processing and machine learning tasks.
  • Easily explore and process data through an intuitive low-code experience.
  • Maintain data security with built-in enterprise security features.
  • Use various languages like PySpark, Spark (Scala), SparkR, and Spark SQL.
  • Analyze data across raw formats (CSV, text, JSON, etc.) and processed file formats (Parquet, Delta Lake, etc.) using powerful Spark capabilities.
  • Enhance productivity with advanced authoring capabilities and built-in data visualization.
  • Take advantage of tight integration with the Lakehouse, which lets you use relative paths and work seamlessly as if in a local environment.

As per the Microsoft Fabric documentation:

“A Microsoft Fabric notebook is a primary code item for developing Apache Spark jobs and machine learning experiments. It’s a web-based interactive surface used by data scientists and data engineers to write code benefiting from rich visualizations and Markdown text.”

You can also find a video on the same topic.

How to create a notebook?

The best way to open a notebook is from the Lakehouse. However, you can also open it from your workspace and choose the Lakehouse:

  1. Go to the left pane and select “Workspace.”
  2. Choose your workspace.
  3. In the top left, click “New,” then “More.”
  4. Under “Data Engineering,” select “Notebooks.”

From Lakehouse:

  1. Go to the left pane and select “Workspace.”
  2. Choose your workspace.
  3. Select “Lakehouse Explorer.”
  4. In the top menu, click “Open Notebook.”
  5. Open a new or existing notebook.

In this blog, we will focus on some essential functions and commands needed for data analysis. These include understanding your data, joining DataFrames, grouping, and analyzing DataFrame data.

We will use a file from my GitHub account: Sales Data for Fabric.xlsx. This file is specifically designed with no spaces in the column headers, allowing you to save it directly to Lakehouses without renaming.

In Microsoft Fabric, you don't need to explicitly start a Spark session. One is started for you, and the spark object is available directly, so you won't see code like the following in our examples:

from pyspark.sql import SparkSession

# Start Spark session
spark = SparkSession.builder \
    .appName("Microsoft Fabric Data Analysis") \
    .getOrCreate()

In my case, I am using Lake02 under 01-GA-Fabric. I created a notebook using the “Create Notebook” option. Once the notebook opened, I clicked on its name at the top left and renamed it to “Important PySpark Cmds.”

In the first cell, I added code to read the Excel file from GitHub and load the data into pandas DataFrames:

import pandas as pd
excel_file_path = "https://github.com/amitchandakpbi/powerbi/raw/main/Sales%20Data%20for%20Fabric.xlsx"

# Use pandas to read the Excel file
df_sales = pd.read_excel(excel_file_path, sheet_name="Sales")
df_customer = pd.read_excel(excel_file_path, sheet_name="Customer")
df_geo = pd.read_excel(excel_file_path, sheet_name="Geography")
df_item = pd.read_excel(excel_file_path, sheet_name="Item")

This code reads data from an Excel file hosted online and loads it into separate DataFrames using the pandas library:

  1. Importing pandas: The first line of code starts by importing the pandas library.
  2. Defining the Excel file path: The excel_file_path variable stores the URL of the Excel file. This file contains various sheets with sales-related data.
  3. Reading the ‘Sales’ sheet: Using pd.read_excel(), the code reads the 'Sales' sheet from the Excel file and stores it in the df_sales DataFrame.
  4. Reading the ‘Customer’ sheet: Similarly, it reads the ‘Customer’ sheet and stores it in the df_customer DataFrame.
  5. Reading the ‘Geography’ sheet: The ‘Geography’ sheet is read and stored in the df_geo DataFrame.
  6. Reading the ‘Item’ sheet: Finally, the ‘Item’ sheet is read and stored in the df_item DataFrame.

After I ran the code, since this was the first run, it started the Apache Spark session and loaded the data into DataFrames.

I checked the data by printing the head of df_sales:

print(df_sales.head())

Most of the operations I wanted to perform are on Spark DataFrames, so I converted all the data into Spark DataFrames.

sales = spark.createDataFrame(df_sales)
customer = spark.createDataFrame(df_customer)
geography = spark.createDataFrame(df_geo)
item = spark.createDataFrame(df_item)

The code converts the pandas DataFrames into PySpark DataFrames:

  1. Converting ‘Sales’ DataFrame: The df_sales pandas DataFrame is converted into a PySpark DataFrame named sales using the spark.createDataFrame() function.
  2. Converting ‘Customer’ DataFrame: Similarly, the df_customer pandas DataFrame is converted into a PySpark DataFrame named customer.
  3. Converting ‘Geography’ DataFrame: The df_geo pandas DataFrame is converted into a PySpark DataFrame named geography.
  4. Converting ‘Item’ DataFrame: Finally, the df_item pandas DataFrame is converted into a PySpark DataFrame named item.

We will now use the display function.

In PySpark, the display function is used to visualize the results of DataFrame operations. It is especially useful in environments such as Databricks notebooks or Microsoft Fabric, where it helps in quickly rendering data in a tabular format, generating charts, and providing an interactive way to explore data. Here's a summary of its functionality:

  1. Tabular Display: Renders the DataFrame in a table format, making it easy to inspect the data visually.
  2. Charts: Allows for quick generation of visualizations such as bar charts, line charts, scatter plots, etc.
  3. Interactive Exploration: Provides tools to filter, sort, and explore the data interactively.
  4. Summary Statistics: Displays summary statistics of the DataFrame, such as mean, median, and standard deviation, for quick insights into the data.

display(sales)

Tabular Display

Click on the Chart icon at the top left. You have various options to customize and explore different types of charts.

Change to bar chart and apply

In Microsoft Fabric and Databricks, the display function offers various chart customization options to enhance the visualization of your data. These options allow you to tailor the appearance and behavior of your charts to better suit your analysis needs. Here are some common chart customization options available:

  1. Chart Type: Choose from various chart types such as bar, line, pie, scatter, area, and more.
  • Key: Select the column to be used for the category axis. Customize the axis title, scale, and sorting order.
  3. Values: Select one or more columns for the value axis. Customize the axis title, scale, aggregation method (sum, average, count, etc.), and sorting order.
  4. Series Grouping: Group data by specific columns to create stacked or grouped charts. This is useful for comparing different categories or groups within your data.
  5. Aggregation: Choose how to aggregate data points (e.g., sum, average, count) to better represent the data in the chart.

We can display and analyze other tables as well.

You can also use the show function to look at the sample data. In PySpark, the show function is used to display the contents of a DataFrame in a tabular format. It is a simple and convenient way to inspect a few rows of data and verify the structure and content of your DataFrame. The show function prints the specified number of rows from the DataFrame to the console, along with the column names.
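
As a minimal example (the row count of 5 is just illustrative, and truncate=False keeps wide column values from being cut off):

# Show the first 5 rows of the sales DataFrame without truncating column values
sales.show(5, truncate=False)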

# Show summary statistics for the DataFrames
sales.describe().show()
geography.describe().show()
customer.describe().show()
item.describe().show()

In the above code, we have also used the describe() function.

In PySpark, the describe function is used to compute basic statistical summaries of numerical columns in a DataFrame. The describe function provides useful metrics such as count, mean, standard deviation, minimum, and maximum values for each numerical column. This is helpful for gaining a quick understanding of the distribution and summary statistics of your data.

Output Columns

  • count: The number of non-null entries for each column.
  • mean: The average value of each column.
  • stddev: The standard deviation of each column.
  • min: The minimum value in each column.
  • max: The maximum value in each column.

We can also use the display function to show the data from the describe() function.

display(sales.describe())

Now, let’s use the summary() function. In PySpark, the summary function computes a comprehensive set of summary statistics for the columns in a DataFrame. It provides more detailed statistics than describe, including percentiles, and can be customized to return only the statistics you ask for. By default, it computes count, mean, stddev, min, max, 25%, 50%, and 75%; you can also pass specific statistics, such as particular percentiles, as arguments.

Output Columns

  • count: The number of non-null entries for each column.
  • mean: The average value of each column.
  • stddev: The standard deviation of each column.
  • min: The minimum value in each column.
  • 25%: The 25th percentile value in each column.
  • 50%: The median (50th percentile) value in each column.
  • 75%: The 75th percentile value in each column.
  • max: The maximum value in each column.

display(geography.summary())
display(customer.summary())
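
If you only need certain statistics, summary() also accepts them as arguments. A small sketch (these statistic names are standard options, not specific to this dataset):

# Compute only selected statistics for the sales DataFrame
display(sales.summary("count", "min", "25%", "75%", "max"))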

Now, we want to learn about joins and how to join two DataFrames. But before that, let’s create two smaller DataFrames using the filter() and select() functions.

In PySpark, the select function is used to select a subset of columns from a DataFrame. It allows you to specify one or more columns to be included in the resulting DataFrame, which is useful when you want to focus on specific columns or create a new DataFrame with only the columns you need for further analysis. The select function is a fundamental tool in PySpark for shaping and manipulating DataFrames.

In PySpark, the filter function is used to filter rows in a DataFrame based on a given condition or set of conditions. It allows you to subset the DataFrame by specifying a condition that each row must satisfy. The filter function is equivalent to the where function in PySpark.

  • Condition-Based Filtering: The filter function allows you to subset the DataFrame based on conditions specified using column expressions or SQL-like syntax.
  • Logical Operators: You can combine multiple conditions using logical operators such as & (and), | (or), and ~ (not).
  • String Operations: You can filter based on string operations like startswith, endswith, and contains.

The filter function is an essential tool in PySpark for data manipulation and cleaning, allowing you to focus on specific subsets of your data based on various conditions.

df_customer_1_to_60 = customer.select("CustomerId").filter(customer.CustomerId.between(1, 60))
df_customer_40_to_100 = customer.select("CustomerId").filter(customer.CustomerId.between(40, 100))
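
As a quick aside, the logical operators and string operations mentioned above can be combined in a single filter. A sketch that assumes the Customer sheet's City column and an illustrative prefix value:

# Keep customers with CustomerId above 20 whose City starts with "New" (illustrative condition)
sample_filter = customer.filter(
    (customer.CustomerId > 20) & (customer.City.startswith("New"))
)
sample_filter.show(5)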

Display the filtered data.

In PySpark, the join function is used to combine two DataFrames based on a common column or condition. It is similar to SQL joins and allows you to perform operations such as inner join, left join, right join, and outer join to merge datasets. The join function is crucial for combining data from different sources or tables based on shared keys.

Parameters

  • other: The other DataFrame to join with.
  • on: A string or a list of column names to join on, or a join expression. If omitted, there is no join condition, which effectively results in a Cartesian (cross) join.
  • how: The type of join to perform. Options include:
  • "inner": Default. Returns rows with matching keys in both DataFrames.
  • "outer": Returns all rows from both DataFrames, with nulls where there is no match.
  • "left": Returns all rows from the left DataFrame, with nulls in the right DataFrame where there is no match.
  • "right": Returns all rows from the right DataFrame, with nulls in the left DataFrame where there is no match.
  • "Anti": An anti join returns only the rows from the left DataFrame that do not have a matching row in the right DataFrame.
  • "left_anti": A left_anti join returns only the rows from the left DataFrame that do not have a matching row in the right DataFrame.

The join function is a powerful tool in PySpark for combining datasets based on shared keys or conditions, enabling comprehensive data analysis and transformation.

The below code performs a left join on two PySpark DataFrames and then displays the resulting DataFrame.

Performing a Left Join:

  • df_customer_1_to_60.join(df_customer_40_to_100, "CustomerId", "left"):
  • This line joins two DataFrames: df_customer_1_to_60 and df_customer_40_to_100.
  • The join is performed on the column CustomerId.
  • The type of join used is a left join, meaning all records from the left DataFrame (df_customer_1_to_60) are included, and matched records from the right DataFrame (df_customer_40_to_100) are included where they match the CustomerId. If there is no match, the result will have null values for columns from the right DataFrame.

Storing the Result:

  • The result of the join operation is stored in the variable left_join_df.

Printing a Message:

  • print("Left Join DataFrame:") prints a message indicating that the following output will be the result of the left join.

Displaying the Result:

  • left_join_df.show() displays the contents of the left_join_df DataFrame. The show() function prints the DataFrame to the console, showing a default number of rows (usually 20) and truncating columns that are too wide.

Left Join: In this join, non-matching data from the left DataFrame will also be included. Therefore, we will get customerId from 1 to 39, along with matching IDs from 40 to 60.

left_join_df = df_customer_1_to_60.join(df_customer_40_to_100, "CustomerId", "left")
print("Left Join DataFrame:")
left_join_df.show()

We can show more rows by passing a number to the show() function. The call left_join_df.show(100) displays up to the first 100 rows of the left_join_df DataFrame.

left_join_df.show(100)

Right Join: In this join, non-matching data from the right DataFrame will also be included. Therefore, we will get customerId from 61 to 100, along with matching IDs from 40 to 60.

# Right Join
right_join_df = df_customer_1_to_60.join(df_customer_40_to_100, "CustomerId", "right")
print("Right Join DataFrame:")
right_join_df.show(100)

Full Join: In this join, non-matching data from both the left and right DataFrames will be included. Therefore, we will get customerId from 1 to 39 from the left, 61 to 100 from the right, along with matching IDs from 40 to 60. This means all 100 IDs will be included.

# Full Join
full_join_df = df_customer_1_to_60.join(df_customer_40_to_100, "CustomerId", "outer")
print("Full Join DataFrame:")
full_join_df.show(200)

The below code performs a left anti join on two PySpark DataFrames and then displays the resulting DataFrame. Here’s a detailed explanation:

Performing a Left Anti Join:

  • df_customer_1_to_60.join(df_customer_40_to_100, "CustomerId", "left_anti"):
  • This line joins two DataFrames: df_customer_1_to_60 and df_customer_40_to_100.
  • The join is performed on the column CustomerId.
  • The type of join used is a left anti join, meaning it returns only the rows from the left DataFrame (df_customer_1_to_60) that do not have a match in the right DataFrame (df_customer_40_to_100) based on the CustomerId.

Anti Join or Left Anti Join: The data from the left DataFrame that is not present in the right DataFrame will be returned. In this case, we will get customerId from 1 to 39.

# Left Anti Join
left_anti_join_df = df_customer_1_to_60.join(df_customer_40_to_100, "CustomerId", "left_anti")
print("Left Anti Join DataFrame:")
left_anti_join_df.show(100)

Anti Join or Left Anti Join: Since there is no right anti join, we swapped the tables. The data from the left DataFrame that is not present in the right DataFrame will be returned. In this case, we will get customerId from 61 to 100.

# There is no right anti join, so we swap the DataFrames
right_anti_join_df = df_customer_40_to_100.join(df_customer_1_to_60, "CustomerId", "anti")
print("Right Anti Join DataFrame:")
right_anti_join_df.show(100)

Let’s take better examples. For that, we will create new DataFrames from Sales and Customer. From Sales, we will only consider IDs from 59 to 62, and from Customer, we will take IDs from 50 to 60. With only two overlapping IDs, this will allow us to have blank data from a table where there is no matching data. This means that when we do a left join, we will see blank data from the right-side table. We have not removed any columns, which will help in a better understanding of the joins.

The below code filters rows in the sales and customer DataFrames based on the CustomerId column.

Filtering the ‘sales’ DataFrame:

  • sales.filter(sales.CustomerId.between(59, 62)):
  • This line filters the sales DataFrame to include only the rows where the CustomerId is between 59 and 62 (inclusive).
  • The between(59, 62) method is used to specify the range of CustomerId values to include.
  • The filtered rows are stored in the new DataFrame sales_filter.

Filtering the ‘customer’ DataFrame:

  • customer.filter(customer.CustomerId.between(50, 60)):
  • Similarly, this line filters the customer DataFrame to include only the rows where the CustomerId is between 50 and 60 (inclusive).
  • The between(50, 60) method is used to specify the range of CustomerId values to include.
  • The filtered rows are stored in the new DataFrame customer_filter.

sales_filter = sales.filter(sales.CustomerId.between(59, 62))
customer_filter = customer.filter(customer.CustomerId.between(50, 60))

Now let’s try all the joins again.

Left Join: Non-matching data from Sales along with matched data is included, and the Customer table shows null for the non-matching rows from Sales. All rows from Sales are included, and matching rows from Customer are displayed.

left_join_df = sales_filter.join(customer_filter, "CustomerId", "left")
print("Left Join DataFrame:")
left_join_df.show(1000)

Right Join: Non-matching data from Customer along with matched data is included, and the Sales table shows null for the non-matching rows from Customer. All rows from Customer are included, and matching rows from Sales are displayed.

right_join_df = sales_filter.join(customer_filter, "CustomerId", "right")
print("Right Join DataFrame:")
right_join_df.show(1000)

Full Join: Non-matching data from both Sales and Customer along with matched data is included. The Sales table shows null for the non-matching rows from Customer and vice versa.

full_join_df = sales_filter.join(customer_filter, "CustomerId", "full")
print("Full Join DataFrame:")
full_join_df.show(1000)

Anti Join: Only data that is present in Sales and not in Customer is shown, based on the join of Customer ID.

anti_join_df = sales_filter.join(customer_filter, "CustomerId", "anti")
print("Anti Join DataFrame:")
anti_join_df.show(1000)

I want to create a combined DataFrame. Before that, I want to drop the ‘City’ and ‘State’ columns from the Customer DataFrame.

The below code drops the ‘State’ and ‘City’ columns from the customer DataFrame and then displays the updated DataFrame.

Dropping Columns:

  • customer.drop('State', 'City'):
  • This line removes the columns ‘State’ and ‘City’ from the customer DataFrame.
  • The drop method is used to specify the columns to be removed.

Updating the DataFrame:

  • The updated DataFrame, without the ‘State’ and ‘City’ columns, is stored back in the customer variable.

customer = customer.drop('State', 'City')
display(customer)

Now let’s combine all four DataFrames into one. However, I will create additional intermediate DataFrames for reference.

The below code performs a series of join operations to combine multiple DataFrames into a single DataFrame named sales_all. Here's a step-by-step explanation:

Joining sales and customer DataFrames:

  • sales.join(customer, "CustomerId"):
  • This line joins the sales and customer DataFrames on the CustomerId column.
  • The resulting DataFrame, which includes data from both sales and customer, is stored in the sales_customer variable.

Joining the Result with geography DataFrame:

  • sales_customer.join(geography, "CityId"):
  • This line joins the sales_customer DataFrame with the geography DataFrame on the CityId column.
  • The resulting DataFrame, which now includes data from sales, customer, and geography, is stored in the sales_customer_geo variable.

Performing the Final Join with item DataFrame:

  • sales_customer_geo.join(item, sales.ItemID == item.ItemId, "inner"):
  • This line performs an inner join between the sales_customer_geo DataFrame and the item DataFrame.
  • The join is done on the condition that sales.ItemID matches item.ItemId.
  • The resulting DataFrame, which includes data from sales, customer, geography, and item, is stored in the sales_all variable.

sales_customer = sales.join(customer, "CustomerId")
sales_customer_geo = sales_customer.join(geography, "CityId")
# Perform the join operation
sales_all = sales_customer_geo.join(item, sales.ItemID == item.ItemId, "inner")

# Show the result of the join
print("Inner Join DataFrame:")
sales_all.show()
#display(sales_all)

I would like to add new calculated columns to the DataFrames.

The below code adds calculated columns to the sales_all DataFrame using PySpark's SQL functions. Here's a detailed explanation:

Importing Required Functions:

  • from pyspark.sql.functions import col, expr:
  • The col function is imported to reference DataFrame columns in expressions.
  • The expr function allows for more complex expressions, though it is not used in this particular snippet.

Adding Calculated Columns:

Calculating Gross:

  • sales_all.withColumn("Gross", col("Qty") * col("Price")):
  • This line calculates the gross revenue by multiplying the Qty (quantity) column with the Price column.
  • The result is stored in a new column named Gross.
  • The DataFrame with the new column is reassigned to sales_all.

Calculating COGS (Cost of Goods Sold):

  • sales_all.withColumn("COGS", col("Qty") * col("Cost")):
  • This line calculates the cost of goods sold by multiplying the Qty (quantity) column with the Cost column.
  • The result is stored in a new column named COGS.
  • The DataFrame with the new column is reassigned to sales_all.

Calculating Discount:

  • sales_all.withColumn("Discount", col("Qty") * col("Price") * col("DiscountPercent")):
  • This line calculates the total discount amount by multiplying the Qty (quantity) column, the Price column, and the DiscountPercent column.
  • The result is stored in a new column named Discount.
  • The DataFrame with the new column is reassigned to sales_all.

from pyspark.sql.functions import col, expr

# Adding calculated columns
sales_all = sales_all.withColumn("Gross", col("Qty") * col("Price"))
sales_all = sales_all.withColumn("COGS", col("Qty") * col("Cost"))
sales_all = sales_all.withColumn("Discount", col("Qty") * col("Price") * col("DiscountPercent"))

# Display the DataFrame with new columns
sales_all.show()
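
Although expr is not used in the snippet above, the same style of calculation can be written as a SQL expression. A small sketch that derives a hypothetical NetSales column from the columns just created:

# Equivalent calculation written as a SQL-style expression with expr
sales_all.withColumn("NetSales", expr("Gross - Discount")) \
    .select("Gross", "Discount", "NetSales") \
    .show(5)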

If you need a list of columns, you can use the columns attribute.

The print(sales_all.columns) command prints the list of column names in the sales_all DataFrame. This is useful for verifying the structure of the DataFrame and ensuring that the new calculated columns have been added correctly.

print(sales_all.columns)

The time has come to group the data and analyze it.

The below code groups the sales_all DataFrame by City and State columns and calculates aggregate sums for Gross, COGS, and Discount. Here's a detailed explanation:

Grouping and Aggregating Data:

  • sales_all.groupBy("City", "State"):
  • This line groups the sales_all DataFrame by the City and State columns.
  • The groupBy method is used to specify the columns by which the data should be grouped.

Calculating Aggregate Sums:

  • .agg(expr("sum(Gross) as TotalGross"), expr("sum(COGS) as TotalCOGS"), expr("sum(Discount) as TotalDiscount")):
  • The agg method is used to perform aggregate calculations on the grouped data.
  • expr("sum(Gross) as TotalGross") calculates the sum of the Gross column for each group and renames the result to TotalGross.
  • Similarly, expr("sum(COGS) as TotalCOGS") calculates the sum of the COGS column and renames it to TotalCOGS.
  • expr("sum(Discount) as TotalDiscount") calculates the sum of the Discount column and renames it to TotalDiscount.

# Grouping by City and State
grouped_city_state = sales_all.groupBy("City", "State").agg(
    expr("sum(Gross) as TotalGross"),
    expr("sum(COGS) as TotalCOGS"),
    expr("sum(Discount) as TotalDiscount")
)
grouped_city_state.show()

Try more group by combinations.

# Grouping by Brand and Category
grouped_brand_category = sales_all.groupBy("Brand", "Category").agg(
    expr("sum(Gross) as TotalGross"),
    expr("sum(COGS) as TotalCOGS"),
    expr("avg(Qty) as AvgQty"),
    expr("sum(Discount) as TotalDiscount")
)
grouped_brand_category.show()

# Grouping by State and Category
grouped_state_category = sales_all.groupBy("State", "Category").agg(
    expr("sum(Gross) as TotalGross"),
    expr("sum(COGS) as TotalCOGS"),
    expr("sum(Discount) as TotalDiscount")
)
grouped_state_category.show()

Let’s try groupby() and display() to analyze the data.

# Aggregate Gross by State and Category
map_df = sales_all.groupBy("State", "Category").agg(
    expr("sum(Gross) as TotalGross")
)

# Display the aggregated data and explore it with the built-in charts
display(map_df)

Bar Chart: Key: State; Values: TotalGross; Aggregation: Sum; Series Group: Category

Column Chart: Key: State; Values: TotalGross; Aggregation: Sum; Series Group: Category

Word Cloud: Word Column- State; Frequency Column- TotalGross

Now, let’s save the tables using the saveAsTable and save functions.

The below snippet saves the sales DataFrame as a Delta table. Here’s a detailed explanation:

Specifying the Format:

  • sales.write.format("delta"):
  • This line specifies that the sales DataFrame should be written using the Delta format. Delta Lake is a storage layer that brings reliability to data lakes.

Saving as a Table:

  • .saveAsTable("sales_delta"):
  • This method saves the sales DataFrame as a managed table named sales_delta in the Delta format.

sales.write.format("delta").saveAsTable("sales_delta")

The below code saves the geography DataFrame as a Delta table in a specified location, overwriting any existing data. Here’s a detailed explanation:

Specifying the Format:

  • geography.write.format("delta"):
  • This line specifies that the geography DataFrame should be written using the Delta format. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing.

Setting the Mode:

  • .mode("overwrite"):
  • This specifies that if there is any existing data at the target location, it should be overwritten. This is useful for updating the table with new data.

Saving the Data:

  • .save("Tables/geography"):
  • This saves the geography DataFrame to the specified location Tables/geography. This can be a path in a file system (like HDFS, S3, or a local file system).

geography.write.format("delta").mode("overwrite").save("Tables/geography")

You can now find the tables in the Lakehouse.
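
To verify the saved data, you can read it back. A minimal sketch using the table name and path from the snippets above:

# Read the managed Delta table back by name
spark.table("sales_delta").show(5)

# Read the Delta files back from the saved path
spark.read.format("delta").load("Tables/geography").show(5)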

In conclusion, Microsoft Fabric Notebooks provide a robust and versatile environment for data engineers and data scientists to conduct data analysis and machine learning tasks. With features like easy setup, integration with Lakehouse, support for multiple programming languages, and powerful data visualization capabilities, Fabric Notebooks streamline the workflow from data ingestion to analysis. By leveraging these essential functions and commands, you can enhance your productivity, ensure data security, and achieve more efficient data processing and analysis.

I hope you liked all the functions and commands. Let me know what else you would like me to cover in the next video on YouTube and blog on Medium.

Also, refer to:

Complete Power BI in one Video in 11 hours

Mastering Power BI: 230+ Videos

Expertise Power BI: 150+ Videos

Power BI 50 Interview Questions and 10 Advanced Use Cases

My Medium blogs can be found here if you are interested

Click here to access all my blogs and videos in a jiffy via an exclusive glossary using Power BI. Please like, share, and comment on these blogs. I would appreciate your suggestions for improvements, challenges, and topics so that I can explore them in more depth.

In addition, I have over 750 videos on my YouTube channel related to Power BI, Tableau, and Incorta. With the help of these videos, you will learn hundreds of DAX and Power Query functions, along with hundreds of Power BI use cases that will help you become an expert in Power BI. Make sure you subscribe, like, and share them with your friends.

Master Power BI
Expertise in Power BI
Power BI For Tableau User
DAX for SQL Users
Learn SQL

Don’t forget to subscribe to

👉 Power BI Publication

👉 Power BI Newsletter

and join our Power BI community

👉 Power BI Masterclass

Amit Chandak
Microsoft Power BI

Amit has 20+ years of experience in Business Intelligence, Analytics, and Data Science. He is the Chief Analytics Officer at Kanerika and a Power BI Community Super User.