I wrote a blog post last year about flattening JSON objects. I find myself using it often while manipulating data and I’ve noticed that it’s getting a descent amount of hits. I decided to package it up and make it available on Python Package Index (PyPI) so it’s easier to install and use in different projects:
pip install flatten_json
Usage
Let’s say you have the following object:
dic = {
"a": 1,
"b": 2,
"c": [{"d": [2, 3, 4], "e": [{"f": 1, "g": 2}]}]
}
which you want to flatten. Just apply flatten:
from flatten_json import flatten
flatten(dic)
Results:
{'a': '1',
'b': '2',
'c_0_d_0': '2',
'c_0_d_1': '3',
'c_0_d_2': '4',
'c_0_e_0_f': '1',
'c_0_e_0_g': '2'}
Usage with Pandas
For the following object:
dic = [
{"a": 1, "b": 2, "c": {"d": 3, "e": 4}},
{"a": 0.5, "c": {"d": 3.2}},
{"a": 0.8, "b": 1.8},
]
We can apply flatten to each element in the array and then use pandas to capture the output as a dataframe.
dic_flattened = [flatten(d) for d in dic]
which creates an array of flattened objects:
[{'a': '1', 'b': '2', 'c_d': '3', 'c_e': '4'},
{'a': '0.5', 'c_d': '3.2'},
{'a': '0.8', 'b': '1.8'}]
Finally you can use pd.DataFrame to capture the flattened array:
import pandas as pd
df = pd.DataFrame(dic_flattened)
The final result as a Pandas dataframe:
a b c_d c_e
0 1 2 3 4
1 0.5 NaN 3.2 NaN
2 0.8 1.8 NaN NaN
Custom separator
By default _
is used to separate nested element. You can change this by passing the desired character:
flatten({"a": [1]}, '|')
returns:
{'a|0': 1}
Thanks to @jvalhondo, @drajen, and @azaitsev for pointing this out.