Pivot Tables 101: Part One
90 seconds to your first pivot table in Excel…
If you are new to the basic functionality and uses of spreadsheet software like Excel, this series is meant to provide you with a few quick examples to get you comfortable with loading data, cleaning data, and transforming data into meaningful insights. This is meant to be no frills, rapid instruction, so lets jump in.
Today I’m using a data set entitled “COVID-19 Community Vulnerability Crosswalk — Rank Ordered by Score” from HealthData.gov. Loading a data set like this can be completed if a few simple ways.
- Directly open your downloaded “.csv” file with the Excel program. This is the quickest way, but you aren’t guaranteed to have the data in the format you desired (text data types might be interpreted as numeric, dates as integer, Etc.).
- Import your data within the your spreadsheet application. For Excel, you accomplish this through the following: In a new workbook select the Data Tab > Select the “Get Data From Text/CSV” > select the “.csv” file of interest and follow the prompts to import your data. The benefit here is that you can select “Transform Your Data” prior to loading to ensure you load it the way intended.
Deriving Rapid Insight From Data:
Once the data is loaded, one of the first things to do is verify that the data makes sense. Under the “Total Score” column there are a number of “Manual Verification Required” cells. These are essentially “Null” data fields and should be treated as so. There is also missing data in the “Low Income Area (LIA) Census Tract (Poverty Percentage)” column. But how do we know exactly how much is missing in excel? One way to do it is to utilize a PivotTable.
- Click on a cell inside the data table and select the “Insert” Tab, and then “Pivot Table”. A pivot table is simple a way to aggregate, or group your data to quickly gain insights.
- Add a pivot table to a new worksheet. It should look something like this:
In this case, I want to see the total “Null” or “NaN” in my data set for a few columns. To do this, I am going to start by adding “Total Score” to Rows and “Total Score” to the “Sum Values” box. The end result is something like this:
I’m not interested in seeing all the variables in my data, only the “Not Null” and “Null” which will be a count of the “Manual Verification Required” cells. To do this, I simply grouped the data:
And renamed it:
To change to a percentage, I changed the data type in the “Sum Values” box of the PivotTable Fields:
Instead of selecting out of the “Summarize Values By” list, click on the “Show Values As” tab and select “% of Column Total” or whatever calculation you are interested in. Now you have the percentage of the Total Score column that is “Null”
Rinse and Repeat with other columns of interest.
Excel is an excellent tool for quickly manipulating small data sets and getting some general insight into those data sets. Unfortunately, compared to Python or R, Excel is tedious and slow for the majority of data science tasks. Despite this, given the pervasiveness of spreadsheet use in most business applications, it’s worth having more than a general understanding of how to generate these types of insights quickly and efficiently within these applications.
In Pivot tables 101: Part Two, we will dig into how to link multiple charts and manipulate them in a dashboard. Till next time, Cheers!