Parsing JSON data in Excel
Easily enriching publicly available data (and really any other kind of data)
[Disclaimer: the JSON import offered by the Power Query add-in is only available in Excel for Windows. If you're on a Mac, upvote this UserVoice feature request so the team prioritizes adding the Power Query add-in for Mac.]
Data wrangling is a necessary step when working with publicly available data: enriching and preparing it for consumption by apps and services, or visualizing it directly with libraries such as D3.js. Online services such as the Microsoft Azure Maps API or the Google Maps API can help enrich location data by geocoding or reverse-geocoding, depending on the data you have and the format you need. In my case, I had a very simple dataset with (among many columns) one for the name of a US city and another for the US state in each row.
My goal: enrich each row in the dataset with latitude and longitude so I could use D3.js more efficiently.

The tools I used:
- Postman app — super easy to create and format requests to web APIs.
- VS Code — my favorite lightweight editor.
- Excel — the ultimate application to work with smaller datasets by using pivot tables, look-ups, etc.
- Microsoft Azure Maps API — used to query each city and state in order to obtain the address, geographic coordinates, and other useful pieces of data, such as county, state initials, etc.
Getting the data from Azure Maps using their API
[As a prerequisite, you need an Azure subscription — you can get some free credit when signing up for Azure]
The first step was to create an Azure Maps account from inside the Azure Portal and obtain an API key. You can see how to do it in the Azure Maps documentation here.
With the Azure Maps API key, I used the Postman app to create a few sample requests and ensure the key was working nicely. You can see how to submit them in the How to Search For an Address section in the Azure Maps docs.
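If you'd rather script the same request than click through Postman, a minimal sketch in Python follows. The endpoint and query parameters come from the Azure Maps Search Address API; the helper names (`build_search_url`, `geocode`) are my own, and you'd substitute a real subscription key:

```python
import json
import urllib.parse
import urllib.request

AZURE_MAPS_SEARCH = "https://atlas.microsoft.com/search/address/json"

def build_search_url(city: str, state: str, api_key: str, limit: int = 1) -> str:
    """Build a geocoding request URL for the Azure Maps Search Address API."""
    params = {
        "api-version": "1.0",
        "subscription-key": api_key,
        "query": f"{city}, {state}, US",
        "limit": limit,  # one result per query keeps the JSON easy to flatten later
    }
    return AZURE_MAPS_SEARCH + "?" + urllib.parse.urlencode(params)

def geocode(city: str, state: str, api_key: str) -> dict:
    """Send the request and parse the JSON response."""
    with urllib.request.urlopen(build_search_url(city, state, api_key)) as resp:
        return json.load(resp)

# Example call (requires a real key, so it is not executed here):
# geocode("Seattle", "Washington", "YOUR_AZURE_MAPS_KEY")
```

The `limit=1` parameter mirrors the choice I made later in the walkthrough: asking for a single result per address keeps every response the same shape.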
Once the basics worked, I built an Excel pivot table from the original dataset to list the distinct combinations of city and state, and used the (preview) batch request API to send one query covering all of them. This was a much better approach than querying for the same city and state hundreds or thousands of times.
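Constructing the batch request body by hand is tedious, so here is a sketch of how it could be generated from the distinct (city, state) pairs. The `batchItems` shape follows the Azure Maps Search Address Batch API docs, but this API was in preview when I used it, so verify the format against the current documentation; `build_batch_body` is a hypothetical helper of my own:

```python
import json

def build_batch_body(city_state_pairs):
    """Build the JSON body for an Azure Maps Search Address Batch request.

    Each batch item is itself a query string; limit=1 asks for a single
    result per address so every item maps to exactly one output row.
    """
    return {
        "batchItems": [
            {"query": f"?query={city}, {state}, US&limit=1"}
            for city, state in city_state_pairs
        ]
    }

# Distinct (city, state) combinations, e.g. copied out of the pivot table:
pairs = [("Paris", "Texas"), ("Seattle", "Washington")]
body = build_batch_body(pairs)
print(json.dumps(body, indent=2))
```

The resulting body is what you would POST (with your subscription key) to the batch endpoint, for example from Postman.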
I copied the response data into VS Code and saved it as a .json file so I could then work with it in Excel.
Wrangling the data in Excel (Windows only)
Open the JSON file using the Power Query add-in:
Data → Get Data → From File → From JSON
Once you select the file, you’ll see the root nodes in the JSON file. Note that these are links that you can navigate to.
The first step in the Power Query Editor is to convert the nodes into columns.
Record Tools → Convert → Into Table
They will look like this once they are converted:
Click the List item, since it contains all the records we're interested in. The summary node holds metadata about the API request itself and is not relevant for this process.
We'll apply the same Convert → Into Table transformation at this level:
After it is converted to a table, you'll see a small icon that lets you expand the contents into columns. Click it to expand the record object and see its contents:
After expanding, the table will show a record object and a list object in each row. This is because each row represents:
- A query record (the query we did for each row, such as Paris, Texas, US, or Seattle, Washington, US).
- A list of the associated search results for the query, which can contain one or more results depending on the request settings. In this case, to make my life easier, I requested only one result per query.
The Postman screenshot above shows the first few rows of the result; you can see that results is in fact a list of records.
With this table, we do the exact same thing again for each object column: Expand to New Rows.
After expanding the query summaries, we scroll right and expand the results object column:
This expansion changes the object type from list to record, because each list contains only one record:
We expand this again:
After expanding, we’ll see data, plus some more objects:
From this table, we will expand the address objects once more to get the actual addresses, and the position object to get the latitude and longitude too:
At this point, we’ll see the actual latitude and longitude. Whew!
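For readers who prefer code to clicking, the same flattening that Power Query performs can be sketched in a few lines of Python. The field names (`summary`, `results`, `address`, `position`) follow the Azure Maps Search response format, but the sample values below are made up for illustration:

```python
# A trimmed sample in the shape of one response from the Search API
# (field names follow the Azure Maps docs; the values are invented):
sample = {
    "summary": {"query": "paris, texas, us", "numResults": 1},
    "results": [
        {
            "address": {
                "municipality": "Paris",
                "countrySubdivision": "TX",
                "freeformAddress": "Paris, TX",
            },
            "position": {"lat": 33.66094, "lon": -95.55551},
        }
    ],
}

def flatten(item):
    """Mirror the Power Query expansion: one flat row per query."""
    result = item["results"][0]  # limit=1 means there is a single result
    return {
        "query": item["summary"]["query"],
        "address": result["address"]["freeformAddress"],
        "lat": result["position"]["lat"],
        "lon": result["position"]["lon"],
    }

row = flatten(sample)
print(row)
```

Each expanded row in the Power Query Editor corresponds to one such flattened dictionary: the query, the resolved address, and the latitude and longitude we were after.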
Load the data into the Excel worksheet
Once we’ve expanded the necessary objects (and if you want, hid unnecessary columns), we can save and load the table into Excel:
Once you load the data into the Excel sheet, it will be ready to use:
From this point on you can filter the data, create a pivot table, or do any other thing that you need to refine the dataset that you’re working with.
This guide covered the basic steps to import a JSON file into Excel and expand its contents to see the actual data returned by the Azure Maps API.
I loved using the API directly: it took about 20 minutes in total to geocode roughly 4,000 distinct addresses, which I then joined back into my main dataset.
The steps shown here worked well for my needs, but be aware that there are other ways to do this using Python, or even other services such as Google Maps. Depending on your specific needs, you might have to fine-tune the API request or filter specific fields returned by the API.
With this wrangling step complete, you should be able to import your data into D3.js or another application and start working on the visualization.