Data Modeling with Power Pivot — Getting Started
MS Excel is undoubtedly the most popular spreadsheet program around, and thanks to that popularity, it has long been a go-to tool for data analysts. Modern Excel (especially Microsoft 365) is full of useful features that make data analysis and visualization super easy.
Since 2013, Excel has come with powerful data transformation and modeling tools, known as Power Query and Power Pivot. Both are feature-rich and relatively easy to learn.
This article shows how to get started with these two features. We can begin analyzing data in under two minutes!
What is Power Pivot?
Most of us are familiar with pivot tables in Excel. They are a handy tool for summarizing and analyzing data quickly.
Article: Top 5 features of Pivot table
A regular pivot table can summarize data from a single table only. In most cases, however, our data is stored across multiple related tables. So, before summarizing and analyzing it with a pivot table, we first have to combine those tables into one (typically using VLOOKUP).
Power Pivot simplifies this step. We can connect multiple tables through relationships and then use them in a regular pivot table.
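To make the VLOOKUP step concrete, here is a minimal Python sketch (plain Python, outside Excel) of an exact-match lookup that copies the customer segment into every order row before summarizing. The function name `vlookup`, the Segment column, and the sample rows are hypothetical; a missing key returns None where Excel would show #N/A.

```python
def vlookup(key, table, key_col, return_col):
    """Exact-match lookup: return return_col from the first row whose key_col equals key."""
    for row in table:
        if row[key_col] == key:
            return row[return_col]
    return None  # Excel would display #N/A here


# Hypothetical sample rows
customers = [
    {"CustID": "C1", "Segment": "Consumer"},
    {"CustID": "C2", "Segment": "Corporate"},
]
orders = [
    {"OrderID": "O1", "CustID": "C1"},
    {"OrderID": "O2", "CustID": "C2"},
    {"OrderID": "O3", "CustID": "C9"},  # no matching customer
]

# "Combine first": copy the looked-up Segment into every order row
flat_orders = [
    {**o, "Segment": vlookup(o["CustID"], customers, "CustID", "Segment")}
    for o in orders
]
for row in flat_orders:
    print(row)
```

Every extra column we want in the pivot table needs another lookup like this one; that repetitive flattening is the work a relationship in Power Pivot takes over.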
In the following example, we can quickly see the total orders (from the Orders table) that were returned (from the Returns table), broken down by manager (from the Users table) and by customer segment (from the Customer table).
And the best part: zero formulas.
The entire process is divided into three stages:
· Connecting with data
· Transforming the data
· Modeling the data
But first things first:
For this exercise, we are using the Sample Superstore Sales dataset.
We have four tables in the data model:
· Orders: Contains a list of all the orders
· Customer: Contains customer details
· Returns: Contains a list of all the orders returned
· Users: Contains the manager for each region
Step 1: Connecting with data
Let us open a new workbook.
Data > From File > From Workbook
Navigate to the file location > Select the file > Import
Since we need to bring all four sheets in the model, we check the option: Select Multiple items
After selecting the tables, we have two options: Load and Transform Data.
· Load: Use this option when our data does not require any transformations. It offers two choices:
o Load: Loads all the selected data tables into the workbook as Excel tables.
o Load To: Opens the Import Data dialog, where we can choose Only Create Connection to connect to the data source without adding any data tables to the current Excel file.
In the Import Data dialog, also check the option Add this data to the Data Model. It loads the tables into the Power Pivot data model.
From a data modeling perspective, it is advisable to select Load To instead of Load unless we need the data in the form of a worksheet table.
· Transform Data: Use this option when our data requires transformation. It takes us to the ETL tool, Power Query.
It is good practice to load the data into Power Query (Transform Data) first and make sure all the tables and columns are appropriately structured.
For our activity, the data tables require certain transformations; hence, we select Transform Data.
Step 2: Transforming Data
When we click Transform Data, the Power Query Editor opens in a separate window:
1. Queries: Shows the list of all loaded data tables
2. Data Preview: Shows the top 1000 rows of data from the source file
3. Query Settings: Shows the name of the query and all the transformations applied to each table (under Applied Steps)
Applied Steps work much like a macro recording: they list every action performed on the data table, and when we hit Refresh, Power Query reruns all the steps in order and presents the final output.
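The macro-recording analogy can be modeled with a small Python sketch. This is not how Power Query is implemented; it only illustrates the idea that steps are recorded once and replayed, in order, against freshly read source data on every refresh. The step names, the `refresh` and `source` functions, and the sample rows are hypothetical.

```python
# A toy model of Applied Steps: each step is a named function, recorded once
# and replayed in order on every refresh.

def changed_type(rows):
    """Convert CustID to text."""
    return [{**r, "CustID": str(r["CustID"])} for r in rows]


def removed_blank_rows(rows):
    """Drop rows whose OrderID is empty."""
    return [r for r in rows if r["OrderID"]]


APPLIED_STEPS = [
    ("Changed Type", changed_type),
    ("Removed Blank Rows", removed_blank_rows),
]


def source():
    """Stand-in for re-reading the source file (hypothetical rows)."""
    return [{"OrderID": "O1", "CustID": 101}, {"OrderID": "", "CustID": 102}]


def refresh(load_source):
    """Re-read the source, then replay every recorded step in order."""
    rows = load_source()
    for _name, step in APPLIED_STEPS:
        rows = step(rows)
    return rows


print([name for name, _ in APPLIED_STEPS])
print(refresh(source))
```

Because the steps are stored rather than their results, changing the source data and refreshing is enough to get an updated output.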
Assigning the Right Data Type
One of the critical elements of analytics is assigning the right data type to each column.
Power Query is usually smart enough to detect and assign an appropriate data type. The symbol located before each column name denotes its data type.
Each symbol stands for a data type; for example, ABC denotes Text, 123 denotes Whole Number, and 1.2 denotes Decimal Number.
To change a column's data type, click the symbol and select the appropriate type.
In the current data model, I have made the following changes:
· Changed CustID to Text in the Customer and Orders tables
· Changed OrderID to Text in the Orders and Returns tables
· Changed Order Date and Ship Date to Date (from Date/Time) in the Orders table
· Promoted the first row in the Users table to headers
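The same four changes can be sketched in plain Python to show what each one does to the values. This is an illustration under assumptions, not Power Query code: the helper names and sample rows (including the manager names) are hypothetical. It also shows one reason to store ID columns as text consistently: in this sketch, the number 17 and the text "17" do not compare equal, so key columns are safest when they share one type in every table.

```python
from datetime import date, datetime


def promote_headers(raw_rows: list[list[str]]) -> list[dict[str, str]]:
    """Treat the first row as column names (the 'Use First Row as Headers' idea)."""
    header, *body = raw_rows
    return [dict(zip(header, row)) for row in body]


def to_text(value: object) -> str:
    """IDs are labels, not quantities, so store them as text."""
    return str(value)


def to_date(value: datetime) -> date:
    """Drop the time part: Date/Time -> Date."""
    return value.date()


# Hypothetical raw Users sheet: the header arrived as the first data row
users_raw = [["Region", "Manager"], ["East", "Alice"], ["West", "Bob"]]
users = promote_headers(users_raw)

# Hypothetical Orders row as it might arrive from the source
orders = [{"OrderID": 3, "CustID": 17,
           "Order Date": datetime(2012, 1, 5, 0, 0),
           "Ship Date": datetime(2012, 1, 7, 0, 0)}]
orders = [
    {**o,
     "OrderID": to_text(o["OrderID"]),
     "CustID": to_text(o["CustID"]),
     "Order Date": to_date(o["Order Date"]),
     "Ship Date": to_date(o["Ship Date"])}
    for o in orders
]

print(users)
print(orders)
print(17 == "17")  # False: a number never equals its text form here
```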
You may check out the articles on Power Query transformations for more details.
Once we are satisfied with the transformations, it is time to load the data into the Power Pivot data model:
Home > click the down arrow on Close & Load > Close & Load To
· Only Create Connection: Connects to the source file without importing the data tables into the worksheet
· Check the option Add this data to the Data Model to load the data tables into Power Pivot
Note: We cannot work in the Excel workbook while the Power Query Editor window is open.
How to enable the Power Pivot add-in?
Skip this step if you have already enabled the Power Pivot add-in.
File > Options > Add-ins > Manage > COM Add-ins > Go
Select Microsoft Power Pivot for Excel > OK
Step 3: Data Modeling
It is time to open the Power Pivot window.
Power Pivot > Manage
Alternatively, we can also access it from the Data tab (Data Tools > Manage Data Model).
Either way, the Power Pivot window opens.
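Establishing relationships between the tables
Switch to Diagram View (Home > Diagram View).
Drag a key column from one table and drop it onto the matching column of the related table to create each relationship.
To review or edit the relationships, go to Design > Manage Relationships.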
Please note that Power Pivot does not support many-to-many relationships. It supports only one-to-one and many-to-one (one-to-many) relationships.
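Whether a relationship is many-to-one depends on the data itself: the column on the "one" side must not contain duplicate values. Here is a minimal Python sketch of that check, which could be run against key columns before creating relationships. The function names and sample keys are hypothetical, and Power Pivot performs its own validation when a relationship is created.

```python
from collections import Counter


def is_unique(keys: list[str]) -> bool:
    """True if no key value appears more than once."""
    return all(count == 1 for count in Counter(keys).values())


def cardinality(from_keys: list[str], to_keys: list[str]) -> str:
    """Classify a relationship by checking which side has unique key values."""
    from_unique, to_unique = is_unique(from_keys), is_unique(to_keys)
    if from_unique and to_unique:
        return "one-to-one"
    if to_unique:
        return "many-to-one"
    if from_unique:
        return "one-to-many"
    return "many-to-many"  # neither side unique: not supported by Power Pivot


# Orders.CustID repeats; Customer.CustID is unique
print(cardinality(["C1", "C1", "C2"], ["C1", "C2"]))  # many-to-one
# Duplicates on both sides
print(cardinality(["East", "East"], ["East", "East"]))  # many-to-many
```

If the check reports many-to-many, the usual fix is to build or clean a table whose key column is unique and relate both tables to it.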
That’s it! Data modeling is complete.
Now, click Home > PivotTable to create a pivot table based on the data model.
Summarizing & Analyzing Data
Now, all we need to do is drag and drop fields from multiple tables to summarize the data.
We have data from four tables, and Power Pivot enabled us to connect and analyze it in a few simple steps.
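Conceptually, the pivot table follows the relationships for us: it keeps the orders that appear in Returns, looks up each order's customer segment and its region's manager, and counts the result. A minimal Python sketch of that idea, assuming (hypothetically) that Orders links to Users through a Region column and using made-up sample rows; it counts order rows and is not how Power Pivot evaluates queries internally.

```python
from collections import Counter

# Hypothetical lookup ("one") sides, keyed by their relationship columns
customers = {"C1": "Consumer", "C2": "Corporate"}  # CustID -> Segment
users = {"East": "Alice", "West": "Bob"}           # Region -> Manager
returned = {"O1", "O3"}                             # OrderIDs listed in Returns

# Hypothetical fact ("many") side
orders = [
    {"OrderID": "O1", "CustID": "C1", "Region": "East"},
    {"OrderID": "O2", "CustID": "C2", "Region": "West"},
    {"OrderID": "O3", "CustID": "C2", "Region": "West"},
    {"OrderID": "O4", "CustID": "C1", "Region": "East"},
]

# Keep only returned orders, follow each relationship, then count by (Manager, Segment)
summary = Counter(
    (users[o["Region"]], customers[o["CustID"]])
    for o in orders
    if o["OrderID"] in returned
)
for (manager, segment), count in sorted(summary.items()):
    print(manager, segment, count)
```

In Power Pivot, the same result comes from dragging Manager and Segment into the pivot table and filtering on Returns, with no lookup code or formulas.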
What is next?
Power Pivot has extensive data modeling capabilities, and combined with Power Query, it packs a serious punch. The next article builds on this one: we will learn to write DAX measures (DAX is the formula language of Power Pivot).