Data Analytics & Visualizations — Replicating the Tableau Example In Excel — Part 1

Extracting The Dataset Into The Excel Data Model

Don Tomoff
Let’s Excel
4 min readAug 9, 2018

--

In my previous post, I discussed where the article authors post regarding Excel was incorrect as to Excel’s limitations.

So, I’m going to do a series of posts that will take us through completing the same (or similar) steps — using Excel — to demonstrate the current capabilities of Excel to perform analytics projects many do not consider possible.

In this first step, we will source the data into Excel and start some basic data clean up. Let’s go.

If you would like to download the Excel file created from the steps below, you can access it here in a ZIP format. Since it is almost 1.6 million records, with numerous fields, it’s a large file!

Download Excel File in ZIP format here

Understanding The Data Source

The data source that the authors provided the following:

“The data set used comes from the National Center for Charitable Statistics(NCCS) website (http://nccs.urban.org/), which maintains the IRS Business Master Files (BMF) using data derived from the IRS Forms 1023 and 1024 of the over 1.5 million U.S. not-for-profit organizations that file IRS Form 990. The authors chose this data set to demonstrate Tableau’s ability to handle big data and the types of data relevant to accountants. It is worth noting that Excel could only load 1,048,576 of 1,584,587 records, leaving 536,011 records unloaded. The data fields used are shown in Exhibit 1; the data itself can be downloaded from the NCCS (https://urbn.is/2IEu5EA).”

http://nccs.urban.org/

When the data is queried, it contains many fields of information. Step 1 is to understand the data structure and NCCS provides a Data Dictionary to enable that — take a look at the most recent version here.

Querying and Importing The Dataset

The data source is provided above.

Using this link, I opened Excel (I use Office 365, Version 2016) and copied the link into Get & Transform (under “Data” tab).

  • Paste URL from website source into the dialog box. Click OK and the query begins.
Access web source via Data >> Get & Transform >> From Web
  • A preliminary view of the data appears. Notice the URL source and then click on “Transform Data” at the bottom.
Preliminary dataset view — click on Transform Data
  • Power Query opens in Edit mode. At this point, I only want to do 2 things — change the name of the query AND notice the “Applied Steps” that have started to be created (at this point, Power Query has done these by itself).
Data view in the Power Query Editor — Load to Data Model
  • Then, we load the data into via the Data Model into Excel. Since there are so many records (1,584,587), the data cannot be loaded directly into Excel. However, contrary to the reference article above, Excel can easily handle more records than rows in a spreadsheet.

By the way, the Data Model handles the data much more effectively and efficiently versus dropping it into an Excel table.

Loading the NCCS Dataset into the Excel Data Model

If you aren’t sure about all the Power Query load options, this 5 minute clip from #ExcelIsFun channel on YouTube is an excellent overview:

Data Load Options Review by ExcelIsFun (YouTube)

Next, open Queries & Connections and you can see that 1,584,587 records have been loaded into the Data Model.

Query showing all records loaded into the Data Model

Let’s open the Data Model and see what this looks like —

Open Data Model — Power Pivot Add-In must be installed to do this
Data Model — Indicates all Records Loaded

Next Steps

The next phase of this process is to shape or “transform” the dataset into what we need for our analysis.

So, the “Extract, Transform, Load, Report” sequence of analytics moves to the Transform stage.

At this point, we have 49 fields:

Field List of Imported Dataset

We will only retain 13 fields, since the others are not relevant to the analysis we are performing.

I won’t go into detail of this process, but it is necessary to review the Data field definitions to understand what each field represents. That information can be found here.

This will be explored in the next post — stay tuned!

About Don

“It’s time for DIFFERENT”

Don is passionate about helping professionals and organizations keep up, develop and adapt to the changing business world that we operate in.

“What Do You Do?”

Connect with Don!

LinkedIn, Flipboard, Twitter, Snapchat

--

--

Don Tomoff
Let’s Excel

It’s time for DIFFERENT— On a mission to challenge the status quo to a more productive and effective end… #digital #Excel #data #analytics #genai #chatgpt