Python: SQL to JSON and beyond!

Getting your data out of your database and into JSON for the purpose of a RESTful API is becoming more and more at the center of even the most casual backend development. There are already a lot of existing tools to aid in doing this. Just in the Python language alone we have the Django REST Framework, Flask-RESTful, the now depricated simplejson and it’s new replacement, the json builtin function.

Django REST will rip your rows and columns out of your database and return some very neatly formatted JSON. Using Flask-RESTful with SQLAlchemy achieves a very similar effect.

However, what if you want to write some raw queries? What if you need some speed there? Or what if you are just bored and want to see how much you can do with as little as possible.

Let me present to you a case study centering around a raw SQL query and how the same technique could be applied to any situation where you have rows and columns ( spreadsheets, csv files, ect… you know the drill )

The first solution is pulling the data from our query, and just hard coding the key names ( see line 11 ) . While hard coding data like this isn’t an uncommon practice, it really should be. It is not dynamic, it is not dry and you will end up having to change things in multiple places or face breakage.

So this was the second solution

Here I’m drawing the key names from cursor.description, this ensures that my key names would be updated if column names changed. Then I’m pushing it into a dictionary with a dictionary comprehension. This would be better if readability wasn’t a factor, but I think most developers would agree readability is going to come first.

Using list comprehensions, zip and dict, we can get this mess of loops and dict comprhensions into a one readable line.

Let’s break down what’s actually going on here. we are using two list comprehensions, zip and dict.

The first list comprehension

[key[0] for key in cursor.description]

Helps us sort out garbage data that cursor.description can return ( hence the [0] ), and build a list out of the column names.

The second one, which is essentially

[ … for row in result ]

Cycles though our rows in the query itself.

Combining these two without zip or dict would give us something like this

[([key for key in cursor.description], row) for row in result]

Which is fairly readable now that we know how both halfs act. However, this would just return us a list of disassociated list with column names and tuples with values from the rows in the result. We need to push that data together. This is where zip and dict will come in.

The first thing we want to do is zip the column and rows, i.e. combine them into one thing.

[(zip([key for key in columns], row)) for row in table]

However, this just gives us a list of zip objects. Not quite JSON serializable. To get them JSON serializable the final step is to add dict.

[dict(zip([key[0] for key in cursor.description], row)) for row in result]

Which we can pump right out with json.dumps or any other serializer we want.

Application To Other Areas:

So most of us would agree this is a pretty cool little design pattern for pushing rows and columns into a dictionary, but what if we wanted to use it with a CSV file or Spreadsheet?

And here it is… This looks a bit reduced and that is mostly because we are not dealing with squlite. Instead of having to convert the cursor.description object into a list, we just are just assuming our columns will come in a list.

The second part of the list comprehension remains mostly unchanged, as we are still dealing with rows.

This formula should work for just about anything where you want to assign a fixed set of key names to a lot of data. We don’t necessarily need to be dumping it out to JSON or even storing the dictionaries in a list.

Let me know if you found this helpful, found any bugs in the code or have any suggestions!