Converting nested JSON structures to Pandas DataFrames

Derek Mortensen
May 10, 2020 · 4 min read

The Problem

APIs and document databases sometimes return nested JSON objects and you’re trying to promote some of those nested keys into column headers but loading the data into pandas gives you something like this:

df = pd.DataFrame.from_records(results["issues"], columns=["key", "fields"])
Image for post

The problem is that the API returned a nested JSON structure and the keys that we care about are at different levels in the object.

Image for post

What we want is this:

Image for post

An example

We’re going to use data returned from the Jira API as an example. The Jira API often includes metadata about fields. Let’s say these are the fields we care about

  • key: Within the list of issue records this is a top-level JSON key so no problem pulling it into its own column.
  • summary: This is at the second level, inside the “fields” object.
  • status name: What we care about is the name of the status but it also includes fields for the status description, status id, etc. This is at the 3rd level.
  • issuetype: Again, what we care about here is the name not any of the metadata.
  • statusCategory name: This is actually a sub-object included in the status field for each issue returned by the API. What we care about is the name of the statusCategory, which is at the 4th nested level.

Example of data returned by the Jira API. Note that the fields we want to extract (bolded) are at 4 different levels in the JSON structure inside the issues list.

{
"expand": "schema,names",
"issues": [
{
"fields": {
"issuetype": {
"avatarId": 10300,
"description": "",
"id": "10005",
"name": "New Feature",
"subtask": False
},
"status": {
"description": "A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.",
"id": "5",
"name": "Resolved",
"statusCategory": {
"colorName": "green",
"id": 3,
"key": "done",
"name": "Done",
}
},
"summary": "Recovered data collection Defraglar $MFT problem"
},
"id": "11861",
"key": "CAE-160",
},
{
"fields": {
... more issues],
"maxResults": 5,
"startAt": 0,
"total": 160
}

A not-so-good solution

One option would be to write some code that goes in and looks for a specific field but then you have to call this function for each nested field that you’re interested in and .apply it to a new column in the DataFrame.

First, we would extract the objects inside the fields key up to columns:

df = (
df["fields"]
.apply(pd.Series)
.merge(df, left_index=True, right_index = True)
)
Image for post

Now we have the summary, but issue type, status, and status category are still buried in nested objects. Here’s a way to extract the issue type name.

# Extract the issue type name to a new column called "issue_type"df_issue_type = (
df["issuetype"]
.apply(pd.Series)
.rename(columns={"name": "issue_type_name"})["issue_type_name"]
)
df = df.assign(issue_type_name = df_issue_type)
Image for post

We could move this code into a function that took in the parent object name, key that we are looking forand new column name but would still need to call this for each field that we want. Not ideal.

A built-in solution, .json_normalize to the rescue

Thanks to the folks at pandas we can use the built-in .json_normalize function. From the pandas documentation:

Normalize[s] semi-structured JSON data into a flat table.

All that code above turns into 3 lines.

  • Identify the fields we care about using . notation for nested objects.
  • Pass our list of issues to .json_normalize
  • Filter the columns
FIELDS = ["key", "fields.summary", "fields.issuetype.name", "fields.status.name", "fields.status.statusCategory.name"]df = pd.json_normalize(results["issues"])df[FIELDS]
Image for post

Some extras

Record Path

Instead of passing in the list of issues with results["issues"] we can use the record_path argument and specify the path to the issue list in the JSON object.

# Use record_path instead of passing the list contained in results["issues"]pd.json_normalize(results, record_path="issues")[FIELDS]

Custom Separator

To separate column names with something other than the default . use the separgument.

# Separate level prefixes with a "-" instead of the default "."FIELDS = ["key", "fields-summary", "fields-issuetype-name", "fields-status-name", "fields-status-statusCategory-name"]pd.json_normalize(results["issues"], sep = "-")[FIELDS]

Control Recursion

If you don’t want to dig all the way down into each sub-object use the max_level argument. In this case, since the statusCategory.name field was at the 4th level in the JSON object it won't be included in the resulting DataFrame.

# Only recurse down to the second level
pd.json_normalize(results, record_path="issues", max_level = 2)

References

The Startup

Medium's largest active publication, followed by +756K people. Follow to join our community.

Derek Mortensen

Written by

I write about programming, statistics and tech. Studied electrical engineering (B.Sc., M.Sc.) and I work in software.

The Startup

Medium's largest active publication, followed by +756K people. Follow to join our community.

Derek Mortensen

Written by

I write about programming, statistics and tech. Studied electrical engineering (B.Sc., M.Sc.) and I work in software.

The Startup

Medium's largest active publication, followed by +756K people. Follow to join our community.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store