NotSoCoolCoder
2 min readOct 2, 2019

Handling JSON data for DynamoDB using Python

JSON is a very common data format. You may come across plenty of scenarios where you have JSON data as input and you need to push that in database. Let’s say you are using DynamoDB as your database and you wish to store below data in DynamoDB database

{
"id":1,
"personal_detail":{"name":"bean", "siblings":["Alice", "Bob"]},
"education":["BTECH", "MBA"]
}

This is just random personal data (Don’t ask me what kind of application collect this kind of data , I have no clue !!!)

Wait, there is one more requirement. Once the data would be added to database, I would be able to query the data at ‘sibling’ level.

Here is how this will be stored to database.

Look at the data stored in database. It’s JSON, isn’t it? However, there is a difference between what we have as input and what is there in database now. The different is not at ‘data’ level, but DynamoDB needs to know DataType associated with data. Look at below

We have two ways to add these json data to DynamoDB in required format. One is a bit lengthy way using low level client as part for boto3. With this, you need to explicitly specify datatype with values. To do that, you need to parse the entire json message, iterate through elements and values.

The other easy way is to use resource which like high level database client. Here is the code to put the said data to database.

import boto3
import json
def put_item_in_database(jsondata):
#API expect data in dictionary format
datadict = json.loads(jsondata)
database = boto3.resource('dynamodb')
table = database.Table(<tablename as string>)
table.put_item(Item = datadict)

Simple. isn’t it?

Now, you might have wondered , why we don’t just store as string if the data coming as json string? There is a reason having ‘list’, ‘map’ and other datatype and using them while storing the data.

If I would have stored the entire data as just on string with id =1, I would have hard time querying persons with some custom criteria like

Get me siblings for the item has id =1

I had to get the entire item, parse the json item and find the availability of sibling. This can be accomplished by below code with reference to earlier code we wrote.

table.get_item(Key = {'id':1}, ProjectionExpression = "personal_detail.siblings")

High level interface bypass the need to using datatype explicitly. This may reduce effort at developer end and make code data centric.