Microsoft Fabric Notebooks: Essential PySpark Functions and Commands
What are Microsoft Fabric Notebooks?
The Microsoft Fabric notebook is a tool for developing Apache Spark jobs and machine learning experiments. It serves as a web-based interactive environment where data scientists and data engineers can write code, utilizing rich visualizations and Markdown text. Data engineers use notebooks for tasks such as data ingestion, preparation, and transformation. Data scientists leverage them to build machine learning solutions, including creating experiments, models, tracking, and deployment.
With a Fabric notebook, you can:
- Start with zero setup effort.
- The Spark engine in Microsoft Fabric starts up quickly, enabling rapid execution of data processing and machine learning tasks.
- Easily explore and process data through an intuitive low-code experience.
- Maintain data security with built-in enterprise security features.
- Use various languages like PySpark, Spark (Scala), Spark R, and Spark SQL.
- Analyze data across raw formats (CSV, text, JSON, etc.) and processed file formats (parquet, Delta Lake, etc.) using powerful Spark capabilities.
- Enhance productivity with advanced authoring capabilities and built-in data visualization.
- Microsoft Fabric notebooks have excellent integration with the Lakehouse, allowing you to use relative paths and work seamlessly as if in a local environment.
As per the Microsoft Fabric documentation:
“A Microsoft Fabric notebook is a primary code item for developing Apache Spark jobs and machine learning experiments. It’s a web-based interactive surface used by data scientists and data engineers to write code benefiting from rich visualizations and Markdown text.”
You can also find a video on the same topic.
How to create a notebook?
The best way to open a notebook is from the Lakehouse. However, you can also open it from your workspace and choose the Lakehouse:
- Go to the left pane and select “Workspace.”
- Choose your workspace.
- In the top left, click “New,” then “More.”
- Under “Data Engineering,” select “Notebooks.”
From Lakehouse:
- Go to the left pane and select “Workspace.”
- Choose your workspace.
- Select “Lakehouse Explorer.”
- In the top menu, click “Open Notebook.”
- Open a new or existing notebook.
In this blog, we will focus on some essential functions and commands needed for data analysis. These include understanding your data, joining DataFrames, grouping, and analyzing DataFrame data.
We will use a file from my GitHub account: Sales Data for Fabric.xlsx. This file is specifically designed with no spaces in the column headers, allowing you to save it directly to Lakehouses without renaming.
In Microsoft Fabric, you don’t need to explicitly start a Spark session. A session is started for you, and you can use it directly through the spark variable. That is why you will not see code like the following in our example:
from pyspark.sql import SparkSession
# Start Spark session
spark = SparkSession.builder \
.appName("Microsoft Fabric Data Analysis") \
.getOrCreate()
In my case, I am using Lake02 under 01-GA-Fabric. I created a notebook using the “Create Notebook” option. Once the notebook opened, I clicked on its name at the top left and renamed it to “Important PySpark Cmds.”
In the first cell, I added code to read the Excel file from GitHub and load the data into pandas DataFrames:
import pandas as pd
excel_file_path = "https://github.com/amitchandakpbi/powerbi/raw/main/Sales%20Data%20for%20Fabric.xlsx"
# Use pandas to read the Excel file
df_sales = pd.read_excel(excel_file_path, sheet_name="Sales")
df_customer = pd.read_excel(excel_file_path, sheet_name="Customer")
df_geo = pd.read_excel(excel_file_path, sheet_name="Geography")
df_item = pd.read_excel(excel_file_path, sheet_name="Item")
This code reads data from an Excel file hosted online and loads it into separate DataFrames using the pandas library:
- Importing pandas: The first line imports the pandas library.
- Defining the Excel file path: The excel_file_path variable stores the URL of the Excel file. This file contains various sheets with sales-related data.
- Reading the ‘Sales’ sheet: Using pd.read_excel(), the code reads the ‘Sales’ sheet from the Excel file and stores it in the df_sales DataFrame.
- Reading the ‘Customer’ sheet: Similarly, it reads the ‘Customer’ sheet and stores it in the df_customer DataFrame.
- Reading the ‘Geography’ sheet: The ‘Geography’ sheet is read and stored in the df_geo DataFrame.
- Reading the ‘Item’ sheet: Finally, the ‘Item’ sheet is read and stored in the df_item DataFrame.
After I ran the code, since this was the first run, it started the Apache Spark session and loaded the data into DataFrames.
I checked the data by printing the head of df_sales:
print(df_sales.head())
Most of the operations I wanted to perform are on Spark DataFrames, so I converted all the data into Spark DataFrames.
sales = spark.createDataFrame(df_sales)
customer = spark.createDataFrame(df_customer)
geography = spark.createDataFrame(df_geo)
item = spark.createDataFrame(df_item)
The code converts the pandas DataFrames into PySpark DataFrames:
- Converting the ‘Sales’ DataFrame: The df_sales pandas DataFrame is converted into a PySpark DataFrame named sales using the spark.createDataFrame() function.
- Converting the ‘Customer’ DataFrame: Similarly, the df_customer pandas DataFrame is converted into a PySpark DataFrame named customer.
- Converting the ‘Geography’ DataFrame: The df_geo pandas DataFrame is converted into a PySpark DataFrame named geography.
- Converting the ‘Item’ DataFrame: Finally, the df_item pandas DataFrame is converted into a PySpark DataFrame named item.
We will now use the display function.
The display function is used to visualize the results of DataFrame operations. It is available in environments such as Microsoft Fabric and Databricks notebooks, where it helps in quickly rendering data in a tabular format, generating charts, and providing an interactive way to explore data. Here's a summary of its functionality:
- Tabular Display: Renders the DataFrame in a table format, making it easy to inspect the data visually.
- Charts: Allows for quick generation of visualizations such as bar charts, line charts, scatter plots, etc.
- Interactive Exploration: Provides tools to filter, sort, and explore the data interactively.
- Summary Statistics: Displays summary statistics of the DataFrame, such as mean, median, and standard deviation, for quick insights into the data.
display(sales)
Tabular Display
Click on the Chart icon at the top left. You have various options to customize and explore different types of charts.
Change the chart type to a bar chart and apply it.
In Microsoft Fabric and Databricks, the display function offers various chart customization options to enhance the visualization of your data. These options allow you to tailor the appearance and behavior of your charts to better suit your analysis needs. Here are some common chart customization options:
- Chart Type: Choose from various chart types such as bar, line, pie, scatter, area, and more.
- Key: Select the column to be used for the category axis. Customize the axis title, scale, and sorting order.
- Values: Select one or more columns for the value axis. Customize the axis title, scale, aggregation method (sum, average, count, etc.), and sorting order.
- Series Grouping: Group data by specific columns to create stacked or grouped charts. This is useful for comparing different categories or groups within your data.
- Aggregation: Choose how to aggregate data points (e.g., sum, average, count) to better represent the data in the chart.
We can display and analyze other tables as well.
You can also use the show function to look at sample data. In PySpark, the show function is used to display the contents of a DataFrame in a tabular format. It is a simple and convenient way to inspect a few rows of data and verify the structure and content of your DataFrame. The show function prints the specified number of rows from the DataFrame to the console, along with the column names.
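For example, you can control how many rows are printed and whether long values are truncated (both parameters are optional; the defaults are 20 rows with truncation enabled):
# Show the first 5 rows without truncating long values
sales.show(5, truncate=False)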
# Show the DataFrames
sales.describe().show()
geography.describe().show()
customer.describe().show()
item.describe().show()
In the above code, we have also used the describe() function. In PySpark, the describe function is used to compute basic statistical summaries of numerical columns in a DataFrame. It provides useful metrics such as count, mean, standard deviation, minimum, and maximum values for each numerical column. This is helpful for gaining a quick understanding of the distribution and summary statistics of your data.
Output Columns
- count: The number of non-null entries for each column.
- mean: The average value of each column.
- stddev: The standard deviation of each column.
- min: The minimum value in each column.
- max: The maximum value in each column.
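describe can also be restricted to specific columns. A small sketch, assuming the Sales sheet contains Qty and Price columns (adjust the names to your data):
# Summary statistics for selected columns only
sales.describe("Qty", "Price").show()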
We can also use the display function to show the output of the describe() function.
display(sales.describe())
Now, let’s use the summary() function. In PySpark, the summary function computes a more comprehensive set of summary statistics for the columns in a DataFrame than describe does, including percentiles. By default, it computes count, mean, stddev, min, max, 25%, 50%, and 75%. You can also pass a subset of these statistics (or other approximate percentiles) as arguments; for measures such as skewness and kurtosis, use the corresponding functions from pyspark.sql.functions.
Output Columns
- count: The number of non-null entries for each column.
- mean: The average value of each column.
- stddev: The standard deviation of each column.
- min: The minimum value in each column.
- 25%: The 25th percentile value in each column.
- 50%: The median (50th percentile) value in each column.
- 75%: The 75th percentile value in each column.
- max: The maximum value in each column.
display(geography.summary())
display(customer.summary())
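The statistics can also be chosen explicitly, and measures such as skewness and kurtosis can be computed with the aggregate functions. A small sketch, assuming the Sales sheet has a Qty column:
from pyspark.sql.functions import skewness, kurtosis
# summary() with a chosen subset of statistics
sales.summary("count", "min", "25%", "75%", "max").show()
# Skewness and kurtosis via aggregate functions
sales.select(skewness("Qty"), kurtosis("Qty")).show()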
Now, we want to learn about joins and how to join two DataFrames. But before that, let’s create two smaller DataFrames using the filter() and select() functions.
In PySpark, the select function is used to select a subset of columns from a DataFrame. It allows you to specify one or more columns to be included in the resulting DataFrame. This is very useful when you want to focus on specific columns or create a new DataFrame with only the columns you need for further analysis, making select a fundamental tool for shaping and manipulating DataFrames.
In PySpark, the filter function is used to filter rows in a DataFrame based on a given condition or set of conditions. It allows you to subset the DataFrame by specifying a condition that each row must satisfy. The filter function is equivalent to the where function in PySpark.
- Condition-Based Filtering: The filter function allows you to subset the DataFrame based on conditions specified using column expressions or SQL-like syntax.
- Logical Operators: You can combine multiple conditions using logical operators such as & (and), | (or), and ~ (not).
- String Operations: You can filter based on string operations like startswith, endswith, and contains.
The filter function is an essential tool in PySpark for data manipulation and cleaning, allowing you to focus on specific subsets of your data based on various conditions; a small sketch of the logical and string operators appears just after the next code cell.
df_customer_1_to_60 = customer.select("CustomerId").filter(customer.CustomerId.between(1, 60))
df_customer_40_to_100 = customer.select("CustomerId").filter(customer.CustomerId.between(40, 100))
Display the filtered data.
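Beyond between, here is a minimal sketch combining conditions with logical operators and string functions; the City values used are assumptions about the data, so adjust them to whatever your Customer sheet actually contains:
from pyspark.sql.functions import col
# Customers with an Id above 10 whose city name starts with "S"
customer.filter((col("CustomerId") > 10) & col("City").startswith("S")).show()
# Customers whose city name does not contain "London"
customer.filter(~col("City").contains("London")).show()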
In PySpark, the join function is used to combine two DataFrames based on a common column or condition. It is similar to SQL joins and allows you to perform operations such as inner join, left join, right join, and outer join to merge datasets. The join function is crucial for combining data from different sources or tables based on shared keys.
Parameters
- other: The other DataFrame to join with.
- on: A string or a list of column names to join on, or a join expression. If not specified, the join is performed on all common columns.
- how: The type of join to perform. Options include:
  - "inner": Default. Returns rows with matching keys in both DataFrames.
  - "outer": Returns all rows from both DataFrames, with nulls where there is no match.
  - "left": Returns all rows from the left DataFrame, with nulls in the right DataFrame's columns where there is no match.
  - "right": Returns all rows from the right DataFrame, with nulls in the left DataFrame's columns where there is no match.
  - "anti" / "left_anti": Returns only the rows from the left DataFrame that do not have a matching row in the right DataFrame.
The join function is a powerful tool in PySpark for combining datasets based on shared keys or conditions, enabling comprehensive data analysis and transformation.
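The on and how arguments can also be passed by keyword, and on accepts a list of column names when joining on multiple columns. For example, the left join discussed below could be written as:
# Keyword-argument form of the left join
result = df_customer_1_to_60.join(df_customer_40_to_100, on="CustomerId", how="left")
result.show()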
The below code performs a left join on two PySpark DataFrames and then displays the resulting DataFrame.
Performing a Left Join: df_customer_1_to_60.join(df_customer_40_to_100, "CustomerId", "left")
- This joins two DataFrames: df_customer_1_to_60 and df_customer_40_to_100.
- The join is performed on the column CustomerId.
- The type of join used is a left join, meaning all records from the left DataFrame (df_customer_1_to_60) are included, and matching records from the right DataFrame (df_customer_40_to_100) are included where they match on CustomerId. If there is no match, the result will have null values for the columns from the right DataFrame.
Storing the Result:
- The result of the join operation is stored in the variable left_join_df.
Printing a Message:
- print("Left Join DataFrame:") prints a message indicating that the following output is the result of the left join.
Displaying the Result:
- left_join_df.show() displays the contents of the left_join_df DataFrame. The show() function prints the DataFrame to the console, showing a default number of rows (usually 20) and truncating columns that are too wide.
Left Join: In this join, non-matching data from the left DataFrame will also be included. Therefore, we will get customerId from 1 to 39, along with matching IDs from 40 to 60.
left_join_df = df_customer_1_to_60.join(df_customer_40_to_100, "CustomerId", "left")
print("Left Join DataFrame:")
left_join_df.show()
We can show more rows by specifying a number in the show() function. For example, left_join_df.show(100) displays the first 100 rows of the left_join_df DataFrame.
left_join_df.show(100)
Right Join: In this join, non-matching data from the right DataFrame will also be included. Therefore, we will get customerId from 61 to 100, along with matching IDs from 40 to 60.
# Right Join
right_join_df = df_customer_1_to_60.join(df_customer_40_to_100, "CustomerId", "right")
print("Right Join DataFrame:")
right_join_df.show(100)
Full Join: In this join, non-matching data from both the left and right DataFrames will be included. Therefore, we will get customerId from 1 to 39 from the left, 61 to 100 from the right, along with matching IDs from 40 to 60. This means all 100 IDs will be included.
# Full Join
full_join_df = df_customer_1_to_60.join(df_customer_40_to_100, "CustomerId", "outer")
print("Full Join DataFrame:")
full_join_df.show(200)
The below code performs a left anti join on two PySpark DataFrames and then displays the resulting DataFrame. Here’s a detailed explanation:
Performing a Left Anti Join: df_customer_1_to_60.join(df_customer_40_to_100, "CustomerId", "left_anti")
- This joins two DataFrames: df_customer_1_to_60 and df_customer_40_to_100.
- The join is performed on the column CustomerId.
- The type of join used is a left anti join, meaning it returns only the rows from the left DataFrame (df_customer_1_to_60) that do not have a match in the right DataFrame (df_customer_40_to_100) based on CustomerId.
Anti Join or Left Anti Join: The data from the left DataFrame that is not present in the right DataFrame will be returned. In this case, we will get customerId from 1 to 39.
# Left Anti Join
left_anti_join_df = df_customer_1_to_60.join(df_customer_40_to_100, "CustomerId", "left_anti")
print("Left Anti Join DataFrame:")
left_anti_join_df.show(100)
Anti Join or Left Anti Join: Since there is no right anti join, we swapped the tables. The data from the left DataFrame that is not present in the right DataFrame will be returned. In this case, we will get customerId from 61 to 100.
# No Right Anti Join
right_anti_join_df = df_customer_40_to_100.join(df_customer_1_to_60, "CustomerId", "anti")
print("Right Anti Join DataFrame:")
right_anti_join_df.show(100)
Let’s take a better example. For that, we will create new DataFrames from Sales and Customer. From Sales, we will only consider IDs from 59 to 62, and from Customer, we will take IDs from 50 to 60. With only two overlapping IDs (59 and 60), this allows us to see blank (null) data from a table where there is no matching row. For instance, when we do a left join, we will see nulls in the right-side table’s columns. We have not removed any columns, which will help in a better understanding of the joins.
The below code filters rows in the sales and customer DataFrames based on the CustomerId column.
Filtering the ‘sales’ DataFrame: sales.filter(sales.CustomerId.between(59, 62))
- This filters the sales DataFrame to include only the rows where CustomerId is between 59 and 62 (inclusive).
- The between(59, 62) method is used to specify the range of CustomerId values to include.
- The filtered rows are stored in the new DataFrame sales_filter.
Filtering the ‘customer’ DataFrame: customer.filter(customer.CustomerId.between(50, 60))
- Similarly, this filters the customer DataFrame to include only the rows where CustomerId is between 50 and 60 (inclusive).
- The between(50, 60) method is used to specify the range of CustomerId values to include.
- The filtered rows are stored in the new DataFrame customer_filter.
sales_filter = sales.filter(sales.CustomerId.between(59, 62))
customer_filter = customer.filter(customer.CustomerId.between(50, 60))
Now let’s try all the joins again.
Left Join: Non-matching data from Sales along with matched data is included, and the Customer table shows null for the non-matching rows from Sales. All rows from Sales are included, and matching rows from Customer are displayed.
left_join_df = sales_filter.join(customer_filter, "CustomerId", "left")
print("Left Join DataFrame:")
left_join_df.show(1000)
Right Join: Non-matching data from Customer along with matched data is included, and the Sales table shows null for the non-matching rows from Customer. All rows from Customer are included, and matching rows from Sales are displayed.
right_join_df = sales_filter.join(customer_filter, "CustomerId", "right")
print("Right Join DataFrame:")
right_join_df.show(1000)
Full Join: Non-matching data from both Sales and Customer along with matched data is included. The Sales table shows null for the non-matching rows from Customer and vice versa.
full_join_df = sales_filter.join(customer_filter, "CustomerId", "full")
print("Full Join DataFrame:")
full_join_df.show(1000)
Anti Join: Only data that is present in Sales and not in Customer is shown, based on the join of Customer ID.
anti_join_df = sales_filter.join(customer_filter, "CustomerId", "anti")
print("Anti Join DataFrame:")
anti_join_df.show(1000)
I want to create a combined DataFrame. Before that, I want to drop the ‘City’ and ‘State’ columns from the Customer DataFrame.
The below code drops the ‘State’ and ‘City’ columns from the customer DataFrame and then displays the updated DataFrame.
Dropping Columns: customer.drop('State', 'City')
- This removes the ‘State’ and ‘City’ columns from the customer DataFrame.
- The drop method is used to specify the columns to be removed.
Updating the DataFrame:
- The updated DataFrame, without the ‘State’ and ‘City’ columns, is stored back in the customer variable.
customer = customer.drop('State', 'City')
display(customer)
Now let’s combine all four DataFrames into one. However, I will create additional intermediate DataFrames for reference.
The below code performs a series of join operations to combine multiple DataFrames into a single DataFrame named sales_all. Here's a step-by-step explanation:
Joining sales and customer: sales.join(customer, "CustomerId")
- This joins the sales and customer DataFrames on the CustomerId column.
- The resulting DataFrame, which includes data from both sales and customer, is stored in the sales_customer variable.
Joining the Result with geography: sales_customer.join(geography, "CityId")
- This joins the sales_customer DataFrame with the geography DataFrame on the CityId column.
- The resulting DataFrame, which now includes data from sales, customer, and geography, is stored in the sales_customer_geo variable.
Performing the Final Join with item: sales_customer_geo.join(item, sales.ItemID == item.ItemId, "inner")
- This performs an inner join between the sales_customer_geo DataFrame and the item DataFrame.
- The join is done on the condition that sales.ItemID matches item.ItemId.
- The resulting DataFrame, which includes data from sales, customer, geography, and item, is stored in the sales_all variable.
sales_customer = sales.join(customer, "CustomerId")
sales_customer_geo = sales_customer.join(geography, "CityId")
# Perform the join operation
sales_all = sales_customer_geo.join(item, sales.ItemID == item.ItemId, "inner")
# Show the result of the join
print("Inner Join DataFrame:")
sales_all.show()
#display(sales_all)
I would like to add new calculated columns to the DataFrames.
The below code adds calculated columns to the sales_all DataFrame using PySpark's SQL functions. Here's a detailed explanation:
Importing Required Functions: from pyspark.sql.functions import col, expr
- The col function is imported to reference DataFrame columns in expressions.
- The expr function allows for more complex SQL-style expressions, though it is not used in this particular snippet.
Adding Calculated Columns:
Calculating Gross: sales_all.withColumn("Gross", col("Qty") * col("Price"))
- This calculates the gross revenue by multiplying the Qty (quantity) column by the Price column.
- The result is stored in a new column named Gross, and the DataFrame with the new column is reassigned to sales_all.
Calculating COGS (Cost of Goods Sold): sales_all.withColumn("COGS", col("Qty") * col("Cost"))
- This calculates the cost of goods sold by multiplying the Qty column by the Cost column.
- The result is stored in a new column named COGS, and the DataFrame is reassigned to sales_all.
Calculating Discount: sales_all.withColumn("Discount", col("Qty") * col("Price") * col("DiscountPercent"))
- This calculates the total discount amount by multiplying the Qty, Price, and DiscountPercent columns.
- The result is stored in a new column named Discount, and the DataFrame is reassigned to sales_all.
from pyspark.sql.functions import col, expr
# Adding calculated columns
sales_all = sales_all.withColumn("Gross", col("Qty") * col("Price"))
sales_all = sales_all.withColumn("COGS", col("Qty") * col("Cost"))
sales_all = sales_all.withColumn("Discount", col("Qty") * col("Price") * col("DiscountPercent"))
# Display the DataFrame with new columns
sales_all.show()
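For reference, since expr was imported above, the Gross calculation could equivalently be written with a SQL-style expression string:
# Equivalent way to define Gross using expr
sales_all_expr = sales_all.withColumn("Gross", expr("Qty * Price"))
sales_all_expr.select("Qty", "Price", "Gross").show(5)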
If you need a list of columns, you can use the columns attribute. The print(sales_all.columns) command prints the list of column names in the sales_all DataFrame. This is useful for verifying the structure of the DataFrame and ensuring that the new calculated columns have been added correctly.
print(sales_all.columns)
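If you also want the data types alongside the names, printSchema and dtypes come in handy:
# Column names with their data types
sales_all.printSchema()
print(sales_all.dtypes)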
The time has come to group the data and analyze it.
The below code groups the sales_all DataFrame by the City and State columns and calculates aggregate sums for Gross, COGS, and Discount. Here's a detailed explanation:
Grouping and Aggregating Data: sales_all.groupBy("City", "State")
- This groups the sales_all DataFrame by the City and State columns.
- The groupBy method is used to specify the columns by which the data should be grouped.
Calculating Aggregate Sums: .agg(expr("sum(Gross) as TotalGross"), expr("sum(COGS) as TotalCOGS"), expr("sum(Discount) as TotalDiscount"))
- The agg method performs aggregate calculations on the grouped data.
- expr("sum(Gross) as TotalGross") calculates the sum of the Gross column for each group and renames the result to TotalGross.
- Similarly, expr("sum(COGS) as TotalCOGS") calculates the sum of the COGS column and renames it to TotalCOGS.
- expr("sum(Discount) as TotalDiscount") calculates the sum of the Discount column and renames it to TotalDiscount.
# Grouping by City and State
grouped_city_state = sales_all.groupBy("City", "State").agg(
expr("sum(Gross) as TotalGross"),
expr("sum(COGS) as TotalCOGS"),
expr("sum(Discount) as TotalDiscount")
)
grouped_city_state.show()
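Equivalently, the same aggregation can be written with the column functions from pyspark.sql.functions instead of SQL expression strings:
from pyspark.sql.functions import sum as spark_sum
# Same grouping, expressed with column functions and alias()
grouped_city_state = sales_all.groupBy("City", "State").agg(
    spark_sum("Gross").alias("TotalGross"),
    spark_sum("COGS").alias("TotalCOGS"),
    spark_sum("Discount").alias("TotalDiscount")
)
grouped_city_state.show()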
Try more group by combinations.
# Grouping by Brand and Category
grouped_brand_category = sales_all.groupBy("Brand", "Category").agg(
expr("sum(Gross) as TotalGross"),
expr("sum(COGS) as TotalCOGS"),
expr("avg(Qty) as TotalQty"),
expr("sum(Discount) as TotalDiscount")
)
grouped_brand_category.show()
# Grouping by State and Category
grouped_state_category = sales_all.groupBy("State", "Category").agg(
expr("sum(Gross) as TotalGross"),
expr("sum(COGS) as TotalCOGS"),
expr("sum(Discount) as TotalDiscount")
)
grouped_state_category.show()
Let’s try groupBy() and display() together to analyze the data.
# Aggregate data by State and Category
map_df = sales_all.groupBy("State", "Category").agg(
expr("sum(Gross) as TotalGross")
)
# Display as a map
display(map_df)
Bar Chart: Key: State; Values: TotalGross; Aggregation: Sum; Series Group: Category
Column Chart: Key: State; Values: TotalGross; Aggregation: Sum; Series Group: Category
Word Cloud: Word Column- State; Frequency Column- TotalGross
Now, let’s save the tables using the saveAsTable and save functions.
The below snippet saves the sales DataFrame as a Delta table. Here’s a detailed explanation:
Specifying the Format: sales.write.format("delta")
- This specifies that the sales DataFrame should be written using the Delta format. Delta Lake is a storage layer that brings reliability to data lakes.
Saving as a Table: .saveAsTable("sales_delta")
- This saves the sales DataFrame as a managed table named sales_delta in the Delta format.
sales.write.format("delta").saveAsTable("sales_delta")
The below code saves the geography DataFrame as a Delta table in a specified location, overwriting any existing data. Here’s a detailed explanation:
Specifying the Format: geography.write.format("delta")
- This specifies that the geography DataFrame should be written using the Delta format. Delta Lake provides ACID transactions, scalable metadata handling, and unified streaming and batch data processing.
Setting the Mode: .mode("overwrite")
- This specifies that if there is any existing data at the target location, it should be overwritten. This is useful for updating the table with new data.
Saving the Data: .save("Tables/geography")
- This saves the geography DataFrame to the specified location Tables/geography. This can be a path in a file system (such as HDFS, S3, or a local file system).
geography.write.format("delta").mode("overwrite").save("Tables/geography")
You can now find the tables in the Lakehouse.
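To quickly verify the saved data, you can read it back; the table name and path below are the ones used when saving:
# Read the managed table back
display(spark.read.table("sales_delta"))
# Read the Delta files saved under Tables/geography
display(spark.read.format("delta").load("Tables/geography"))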
In conclusion, Microsoft Fabric Notebooks provide a robust and versatile environment for data engineers and data scientists to conduct data analysis and machine learning tasks. With features like easy setup, integration with Lakehouse, support for multiple programming languages, and powerful data visualization capabilities, Fabric Notebooks streamline the workflow from data ingestion to analysis. By leveraging these essential functions and commands, you can enhance your productivity, ensure data security, and achieve more efficient data processing and analysis.
I hope you liked all the functions and commands. Let me know what else you would like me to cover in the next video on YouTube and blog on Medium.
Also, refer to:
Complete Power BI in one Video in 11 hours
Mastering Power BI: 230+ Videos
Expertise Power BI: 150+ Videos
Power BI 50 Interview Questions and 10 Advanced Use Cases
My Medium blogs can be found here if you are interested
Click here to access all my blogs and videos in a jiffy via an exclusive glossary using Power BI. Please like, share, and comment on these blogs. I would appreciate your suggestions for improvement, challenges, and suggestions for topics so that I can explore these in more depth.
In addition, I have over 750 videos on my YouTube channel that are related to Power BI, Tableau, and Incorta. With the help of these videos, you will learn hundreds of DAX and Power Query functions, in addition to hundreds of Power BI, use cases that will assist you in becoming an expert in Power BI. Make sure you subscribe, like, and share it with your friends.
Master Power BI
Expertise in Power BI
Power BI For Tableau User
DAX for SQL Users
Learn SQL
Don’t forget to subscribe to the YouTube channel and join our Power BI community.