Data Analytics & Visualizations — Replicating the Tableau Example In Excel — Part 2
Transforming The Dataset With Power Query
In Part 1, the Extraction of 1.6 million records of data from the web into the Excel Data Model was explained. Yes, 1.6 million records.
In Part 2, we are going to focus our efforts on TRANSFORMING the data — using Excel’s Power Query tool — into information that we can analyze and use to develop the visualizations we want.
Transforming The Data
1 — Remove Extraneous Fields from the Dataset
- Select just the columns I want to keep.
- Click on the “Remove Columns >> Remove Other Columns”.
- What’s left is a table of data containing just the fields I want to keep.
2 — Eliminate Records (Rows) That Contain “Bad” Data
Per the original article:
“In Tableau, filters can eliminate unnecessary, incomplete, or erroneous records, or restrict the data used in the analysis to records meeting certain specified criteria. In this example, two filters are added: one using the OUTNCCS field and the other using the RULEDATE field.”
Same applies to Power Query in Excel. So let’s perform the same steps as demonstrated in the article.
OUTNCCS Field
“ In this example, there are 3,142 records that are considered out of scope because the entity either is not a 501(c)(3) or is a foreign entity, or for some other reason.”
Here is the filter in Power Query, unchecking records shown as “OUT”:
RULEDATE Field
“… and the earliest ruling letter date is zero, which suggests that the ruling date is missing or incorrect for these records.”
Here is the filter in Power Query, unchecking any records with a RULEDATE of 0:
Verify Final Record Count
“ Filtering out values of zero changes the minimum ruling date to 190001 (i.e., January 1900), and the remaining number of records in the data set is now 1,566,248.”
Here is the view of final filtered record count in Power Query:
3 — Data Joining
Per the original article:
“ When analyzing data, it is frequently necessary or desirable to join two or more data sets. This is difficult to efficiently or easily do when using spreadsheets, but in Tableau it is relatively easy. For example, the sample data set includes a field labeled “NTEE1,” which is a code applied to each record that classifies the charitable entity according to one of the 26 groups representing charitable industries. It does not, however, include a field that provides the name of the industry associated with the code. Producing that information requires joining the original data file to another table that contains the code and names. The authors created an Excel file for this purpose, available with the online version of this article.”
Joining data sets is easy to do using the Data Model in Excel and establishing relationships.
Here is what I did:
Create Data Table with NTEE Code and NTEE Description
The authors in the original article created a separate Excel file for this purpose. This is not necessary.
Fortunately, the NCCS provides a link to a full list of NTEE Codes, which I could query, modify and import a data table into my data model.
Connect the Data Tables (Merge Queries Approach)
To connect the two data tables, I am going to use the Power Query “Merge Queries” approach to accomplish what a VLOOKUP function would do in Excel. (1)
(1) This data connection could also be accomplished by creating a relationship between the two table in the Data model — I will demonstrate that process in a future post.
- Convert Queries to “Connection Only” — this removes them from the Data Model (we will load to the Data Model once we have created the Merged Query)
- Now, we will Merge the queries to add the NTEE Description to the import table (replicates a VLOOKUP in Excel). Open the first query in Edit mode.
- Select the Expand icon and select just the NTEEDescription field to add to table
- Then, we want to load the revised Import Table (fNCCSImport) to the data model. Right-click the Query, select “Load To” and check the “Add this data to the Data Model”.
- The data table is now ready for initial analysis!
4 — Data Pivoting (Analysis ready…)
Per the original article:
“ When using a spreadsheet to analyze data, users will often develop pivot tables to summarize and report information derived from the more detailed underlying data. While useful, Excel’s pivot tables are limited in several ways that Tableau is not.”
The authors are correct — we will use pivot tables to develop analysis — but they are incorrect in the limitations they list, which you will see below.
Let’s Create the First Analysis
Open Power Pivot >> Manage Data Model.
Select the Pivot Table Option — notice the numerous analysis options available!
In this case, I am going to create a basic pivot table — using the “fNCCSImport” data table previously added to the Data Model.
Here is a basic Pivot Table showing the NTEE Description and the Count of each type of organization.
At this point, our data has been shaped consistent with the original article — but I’m not done yet.
In the next post — Part 3 — I am going to continue to revise our data set to provide additional analysis flexibility…then we will dive into the analysis of the data!
About Don
“It’s time for DIFFERENT”
Don is passionate about helping professionals and organizations keep up, develop and adapt to the changing business world that we operate in.