Python json read helper 2

Hu Zhang
Sep 5, 2018 · 4 min read

I need to map arbitrary JSON objects onto a predefined JSON schema in Python.

The first challenge I met was that I could not read a JSON value in Python when the key is held in a variable, for example a value such as json_dict['key']['sub-key'][0]['item1']. Most of the solutions I found read JSON with the get() method or the [] operator and a literal property key. But I need the key path to be a variable: define something like property_path = "['key1']['key2']", then use json_dict[property_path] or something similar to get the value. My first article, Python json read helper, shows how I solved this challenge.
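For readers without the first article at hand, a variable key path can be resolved with a small loop. The sketch below is only a stand-in: get_by_path is a hypothetical name, the dotted path format is assumed from the mapping rules used later in this post, and the real DictQueryHelper may behave differently. When the walk meets a list, it collects the value from every item, which matches the behaviour described further down.

```python
def get_by_path(data, path, default=None):
    """Hypothetical stand-in for DictQueryHelper(data).get(path)."""
    current = data
    for key in path.split("."):
        if isinstance(current, list):
            # Fan out over the list and collect the value from each dict item.
            current = [item.get(key) for item in current if isinstance(item, dict)]
        elif isinstance(current, dict):
            current = current.get(key)
        else:
            # A scalar was reached before the path was exhausted.
            return default
        if current is None:
            return default
    return current


sample = {
    "data": {
        "body": {
            "userProfile": {"displayName": "Tiger Zhang"},
            "friends": [{"name": "friend 1"}, {"name": "friend 2"}],
        }
    }
}
display_name = get_by_path(sample, "data.body.userProfile.displayName")
friend_names = get_by_path(sample, "data.body.friends.name")
```

Because the path is an ordinary string, it can be stored in a mapping rule and passed around like any other value.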

Secondly, I need to map different JSON objects to a defined JSON schema for use later in the pipeline. The schema looks like the following; call it PERSON (this is only an example, the real schema is much more complex):

{
  "names": [
    {
      "raw": "",
      "displayName": ""
    }
  ],
  "emails": [{
    "raw": "",
    "email": ""
  }],
  "locations": [{
    "raw": "",
    "city": "",
    "province": "",
    "country": ""
  }],
  "gender": ""
}

Say I have an object like this (object_data) and want to map it to the defined schema:

{
  "data": {
    "statusCode": 200,
    "timestamp": "2018-08-23T21:21:15.453Z",
    "body": {
      "userProfile": {
        "displayName": "Tiger Zhang",
        "homeTown": null,
        "location": {
          "address": "XXX St XXX Ave",
          "city": "Edmonton",
          "province": "AB"
        },
        "email": "tigzhanghit@gmail.com",
        "gender": "male",
        "profilePhoto": "https://xxxxx"
      },
      "educationHistory": null,
      "workHistory": null,
      "friends": [
        {
          "name": "friend 1",
          "photo": "url1"
        },
        {
          "name": "friend 2",
          "photo": "url2"
        }
      ]
    }
  }
}

Since we collect data from different sources and their schemas differ, we need a general transformer to do the mapping.

The idea is to define a mapping rule based on the JSON object and the target schema (person_schema_format), then write a utility that maps JSON objects into the person_schema_format schema.

The mapping rule for this object (object_map_rule) looks like this:

{
  "names": {
    "raw": "data.body.userProfile.displayName",
    "displayName": "data.body.userProfile.displayName"
  },
  "emails": {
    "raw": "data.body.userProfile.email",
    "email": "data.body.userProfile.email"
  },
  "gender": {
    "raw": "data.body.userProfile.gender"
  },
  "locations": {
    "raw": "data.body.userProfile.location.address",
    "city": "data.body.userProfile.location.city",
    "province": "data.body.userProfile.location.province"
  }
}

Then, with the JSON read helper (DictQueryHelper), we can write the utility that does the mapping (it needs to handle both the object and the array cases). The code looks like this:

def map_social_data_to_identity(person_schema_format, object_map_rule, object_data):
    """
    General method which maps any object data to PERSON, based on the
    map rule defined according to the object data structure.
    DictQueryHelper is the JSON read helper from the first article.
    """
    query = DictQueryHelper(object_data)
    for prop in person_schema_format:
        if prop in object_map_rule:
            item = {}
            for key in object_map_rule[prop]:
                # the value in the map rule is the path used to extract the value from the data
                key_path = object_map_rule[prop][key]
                value = query.get(key_path)
                # skip missing (None) and empty values
                if value:
                    item[key] = value

            if item:
                # array case such as `names`
                if isinstance(person_schema_format[prop], list):
                    person_schema_format[prop].append(item)
                # object case such as `gender`
                else:
                    person_schema_format[prop] = item

    return person_schema_format

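After running this function, we get a PERSON object close to the one below, which is what we need. Two details differ in what the function as written returns: gender comes back as the object {"raw": "male"} because its rule has a raw key, and country is left out because no rule provides it.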

{
  "names": [
    {
      "raw": "Tiger Zhang",
      "displayName": "Tiger Zhang"
    }
  ],
  "emails": [{
    "raw": "tigzhanghit@gmail.com",
    "email": "tigzhanghit@gmail.com"
  }],
  "locations": [{
    "raw": "XXX St XXX Ave",
    "city": "Edmonton",
    "province": "AB",
    "country": ""
  }],
  "gender": "male"
}
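One thing to keep in mind when running the function over several sources: it appends into the person_schema_format object that is passed in, so reusing one schema object would accumulate entries from every call. A minimal sketch of one way around this, assuming a deep copy per call is acceptable, is shown below. The names map_with_fresh_schema and toy_mapper are hypothetical and exist only for this illustration; toy_mapper merely imitates the append behaviour.

```python
import copy


def map_with_fresh_schema(mapper, schema_template, map_rule, data):
    """Run `mapper` on a deep copy so `schema_template` is never mutated.

    `mapper` is any function with the same signature as
    map_social_data_to_identity (hypothetical wrapper, not from the article).
    """
    return mapper(copy.deepcopy(schema_template), map_rule, data)


# Toy mapper used only to demonstrate the wrapper; it imitates the
# "append into the list" behaviour of the real mapping function.
def toy_mapper(schema, rule, data):
    schema["names"].append({"raw": data["name"]})
    return schema


template = {"names": []}
first = map_with_fresh_schema(toy_mapper, template, {}, {"name": "A"})
second = map_with_fresh_schema(toy_mapper, template, {}, {"name": "B"})
```

Each call starts from the untouched template, so results from different sources do not leak into each other.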

The code example above doesn't show how arrays are mapped (for example, `friends` in the example data is an array).

If you add a friends mapping rule to object_map_rule (friends is an array in person_schema_format), say:

# friends schema in person_schema_format
{
  "friends": [{
    "name": "",
    "image": ""
  }]
}
# friends mapping rule in object_map_rule
{
  "friends": {
    "name": "data.body.friends.name",
    "image": "data.body.friends.photo"
  }
}

When you run the utility code above, you get a result like this:

{
  "friends": [
    {
      "name": [
        "friend 1",
        "friend 2"
      ],
      "image": [
        "url1",
        "url2"
      ]
    }
  ]
}

The reason is that our JSON read helper (DictQueryHelper) traverses any list it meets in the JSON object and collects all the values. Nothing is wrong with the helper, but the result is not what we expected. We are looking for something like this:

{
  "friends": [
    {
      "name": "friend 1",
      "image": "url1"
    },
    {
      "name": "friend 2",
      "image": "url2"
    }
  ]
}

In this case we just need to flat-map the friends output into the friends schema. The code looks like this:

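def flat_map_item(data):
    """Turn a dict of parallel lists into a list of dicts, one per index."""
    items = []
    sizes = [len(data[prop]) for prop in data]
    max_size = max(sizes)
    for i in range(max_size):
        item = {}
        for prop in data:
            # guard against lists of unequal length
            if i < len(data[prop]):
                item[prop] = data[prop][i]
        items.append(item)
    return items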
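As a quick check of the idea, the same transformation (a dict of parallel lists turned into a list of dicts) can also be written with zip. This is only a sketch of an alternative, and flat_map_item_zip is a hypothetical name. One difference worth noting: zip stops at the shortest list, whereas a loop up to the longest list keeps going.

```python
def flat_map_item_zip(data):
    """Hypothetical zip-based variant: dict of parallel lists -> list of dicts."""
    keys = list(data)
    # zip(*values) yields one tuple per index, pairing up the parallel lists.
    return [dict(zip(keys, values)) for values in zip(*data.values())]


friends = {"name": ["friend 1", "friend 2"], "image": ["url1", "url2"]}
result = flat_map_item_zip(friends)
```

Here result holds one dict per friend, each with a name and an image key.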

Finally, we have a simple JSON object mapping utility in Python that can map “any” JSON object to a specific schema. The version of the code that handles arrays follows; I’ll upload the full example to my GitHub later. Please feel free to comment below and let me know if you have any questions.

Thanks!

def map_social_data_to_identity(person_schema_format, object_map_rule, object_data):
    # DictQueryHelper is the JSON read helper from the first article
    query = DictQueryHelper(object_data)
    for prop in person_schema_format:
        if prop in object_map_rule:
            item = {}
            list_flag = False
            for key in object_map_rule[prop]:
                key_path = object_map_rule[prop][key]
                value = query.get(key_path)
                # skip missing (None) and empty values
                if value:
                    # remember that at least one path resolved to a list
                    if isinstance(value, list):
                        list_flag = True
                    item[key] = value

            if item:
                # array case such as `names`
                if isinstance(person_schema_format[prop], list):
                    if not list_flag:
                        person_schema_format[prop].append(item)
                    else:
                        # a dict of lists becomes a list of dicts
                        items = flat_map_item(item)
                        person_schema_format[prop].extend(items)
                # object case such as `gender`
                else:
                    person_schema_format[prop] = item

    return person_schema_format
