I received an email last week from a CPA client of mine who asked me…
“I have a REALLY large text file I downloaded from the IRS website listing exempt organizations. Is there a way I can get this into Excel and use it?”
It’s 2018. Every accountant should ABSOLUTELY UNDERSTAND how to approach this simple data management challenge!
There are 4 basic steps to consider (consider the “ETLR” process):
Extract and Transform
- Download or connect to the dataset.
- Import the data into Excel and “shape” it.
- Load the final data into Excel.
- Prepare analysis.
I looked at the dataset and quickly figured out that the text file download is >76 megabytes. This means is is LIKELY going to have more than 1,048,576 and I won’t be able to load it into an Excel worksheet.
No problem. And we will do this in LESS THAN 10 MINUTES!
Stepping Through The Process (less than 10 minutes!)
EXTRACT The Dataset
TIME — 1 minute, 30 seconds (01:30)
- Go to the IRS website download page.
- Download the ZIP file and extract the .txt file.
TRANSFORM the Dataset (using Excel’s Power Query Tool)
TIME — 5 minutes (05:00)
- Open Microsoft Excel (I’m using Excel 2016) and connect to .txt file to import it.
- Data layout screen — select “Transform”.
- Power Query editor opens — start transformation process.
- Completed “basic” transformation. Time to “Load” the data.
LOAD The Data (Into Excel’s “Data Model”)
TIME — less than 1 minute (01:00)
- I will load this data into the Data Model — this loads your data into the “background” in Excel. I AM NOT restricted on number of records since I am not loading it into the Excel worksheet (i.e., I won’t lose any data).
- 1,067,173 records are loaded into the Data Model. Had I loaded this into the Excel worksheet, over 20,000 records would be dropped.
REPORT On The Data (Develop Analysis)
TIME — 2 minutes (02:00)
As shown above, to start analysis, simply “Insert” a Pivot Table and Excel automatically expects that you will use the data contained in the Data Model and makes that selection for you.
Then just start getting the information you want!
- In the pivot table below, I have indicated the COUNT of exempt entities in Ohio, by City.
If this is new to you, it may be confusing.
But in 2018, this is a very BASIC dataset and Excel data conversion process.
If you want to look at the Excel file, you can download it here.
Now, you now longer have to say “I don’t know…”