flatten_json on Python Package Index (PyPI)

I wrote a blog post last year about flattening JSON objects. I find myself using it often while manipulating data, and I’ve noticed that it’s getting a decent amount of hits. I decided to package it up and make it available on the Python Package Index (PyPI) so it’s easier to install and use in different projects:

pip install flatten_json

Usage

Let’s say you have the following object:

dic = {
    "a": 1,
    "b": 2,
    "c": [{"d": [2, 3, 4], "e": [{"f": 1, "g": 2}]}]
}

which you want to flatten. Just apply flatten:

from flatten_json import flatten
flatten(dic)

Results:

{'a': '1',
'b': '2',
'c_0_d_0': '2',
'c_0_d_1': '3',
'c_0_d_2': '4',
'c_0_e_0_f': '1',
'c_0_e_0_g': '2'}
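To make the key naming concrete, here is a minimal recursive sketch of how nested dicts and lists can be turned into single-level keys: each level's key is joined to its parent's with the separator, and list positions serve as keys. The function `flatten_sketch` is hypothetical and written only for illustration; it is not the code shipped in flatten_json, and unlike the results above it leaves leaf values unchanged (so `1` stays an integer).

```python
def flatten_sketch(obj, separator="_", prefix=""):
    """Flatten nested dicts/lists into a single-level dict (illustrative sketch)."""
    flat = {}
    if isinstance(obj, dict) and obj:
        items = obj.items()
    elif isinstance(obj, list) and obj:
        items = enumerate(obj)  # list positions become part of the key
    else:
        # Leaf value (or empty container): store it under the accumulated key.
        flat[prefix] = obj
        return flat
    for key, value in items:
        new_key = f"{prefix}{separator}{key}" if prefix else str(key)
        flat.update(flatten_sketch(value, separator, new_key))
    return flat


dic = {
    "a": 1,
    "b": 2,
    "c": [{"d": [2, 3, 4], "e": [{"f": 1, "g": 2}]}],
}
print(flatten_sketch(dic))
# {'a': 1, 'b': 2, 'c_0_d_0': 2, 'c_0_d_1': 3, 'c_0_d_2': 4, 'c_0_e_0_f': 1, 'c_0_e_0_g': 2}
```

The recursion bottoms out at anything that is not a non-empty dict or list, which is why every key in the output ends at a leaf value.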

Usage with Pandas

For the following object:

dic = [
    {"a": 1, "b": 2, "c": {"d": 3, "e": 4}},
    {"a": 0.5, "c": {"d": 3.2}},
    {"a": 0.8, "b": 1.8},
]

We can apply flatten to each element in the array and then use pandas to capture the output as a dataframe.

dic_flattened = [flatten(d) for d in dic]

which creates an array of flattened objects:

[{'a': '1', 'b': '2', 'c_d': '3', 'c_e': '4'},
{'a': '0.5', 'c_d': '3.2'},
{'a': '0.8', 'b': '1.8'}]

Finally, you can use pd.DataFrame to capture the flattened array:

import pandas as pd
df = pd.DataFrame(dic_flattened)

The final result as a Pandas dataframe:

     a    b  c_d  c_e
0    1    2    3    4
1  0.5  NaN  3.2  NaN
2  0.8  1.8  NaN  NaN
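The NaN cells appear because pd.DataFrame uses the union of keys across all records as its columns and fills in a missing value wherever a record lacks one of them. The stdlib-only sketch below mimics that alignment step so the behavior is visible without pandas. `to_rows` is a hypothetical helper, `None` stands in for NaN, and the column order here (order of first appearance) is a choice made for the sketch.

```python
def to_rows(records):
    """Align a list of flat dicts into a header plus rows (illustrative sketch).

    Columns are the union of all keys, in order of first appearance; a record
    that lacks a column gets None, which plays the role of pandas' NaN here.
    """
    # dict.fromkeys keeps the first occurrence of each key and drops duplicates.
    columns = list(dict.fromkeys(key for record in records for key in record))
    rows = [[record.get(col) for col in columns] for record in records]
    return columns, rows


dic_flattened = [
    {"a": "1", "b": "2", "c_d": "3", "c_e": "4"},
    {"a": "0.5", "c_d": "3.2"},
    {"a": "0.8", "b": "1.8"},
]
columns, rows = to_rows(dic_flattened)
print(columns)  # ['a', 'b', 'c_d', 'c_e']
for row in rows:
    print(row)
# ['1', '2', '3', '4']
# ['0.5', None, '3.2', None]
# ['0.8', '1.8', None, None]
```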

Custom separator

By default, _ is used to separate nested elements. You can change this by passing the desired separator as the second argument:

flatten({"a": [1]}, '|')

returns:

{'a|0': 1}
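One reason to choose a different separator: if the original keys already contain `_`, two different paths can flatten to the same key, and with a simple flattener one value silently overwrites the other. The sketch below uses a hypothetical dict-only flattener, `flatten_dicts` (not the flatten_json implementation), to show the collision and how a separator such as `|`, which appears in none of the keys, keeps the paths distinct.

```python
def flatten_dicts(obj, separator="_", prefix=""):
    """Minimal dict-only flattener used to illustrate separator collisions."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Merging with update() means a nested path that maps to an
            # existing key replaces the earlier value without warning.
            flat.update(flatten_dicts(value, separator, new_key))
        else:
            flat[new_key] = value
    return flat


data = {"a_b": 1, "a": {"b": 2}}

print(flatten_dicts(data))       # {'a_b': 2}  (the value 1 was overwritten)
print(flatten_dicts(data, "|"))  # {'a_b': 1, 'a|b': 2}
```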

Thanks to @jvalhondo, @drajen, and @azaitsev for pointing this out.

Here’s flatten_json on GitHub and PyPI.


Clapping shows how much you appreciated Amir Ziai’s story.