Building a nested hierarchy of JSON in a relational DB

How to best store and retrieve a tree-style in a relational DB? This pattern is more common than we think. Examples would be hierarchical tags and threaded comments or discussions.

Let’s start with the basic table structure with a self reference to a parent genre.

Here’s how our proposed hierarchy looks like:

followed by insert statements to populate the table,

Now, we’re all set. In order to facilitate finding of items in a tree hierarchy easier, we first build a materialized path of our items. This is a serialized column which stores all the ancestor ids of the item. We shall represent this using postgres array type. To construct a materialized path, we use a feature called CTEs(Common Table Expressions). It is a fancy term for the following type of expression:

WITH custom_table AS ( 
SELECT a,b,c FROM existing_table WHERE id = 1
) SELECT a FROM custom_table;

Here’s a sample CTE for fetching all the root genre items from the genres table.

WITH root_genres AS ( 
SELECT id, name FROM genres WHERE parent_id IS NULL
) SELECT * FROM root_genres;

CTEs can be recursive, in which case, they will be prefixed with WITH RECURSIVE. This implies that I can refer to the CTE table being constructed within the CTE itself, just like recursive functions in most programming languages. Let’s take a stab at constructing the materialized path table using recursive CTEs.

We can break it down into 2 parts. The first part fetches all root node items with an empty materialized path array. The second part fetches the children by appending the current parent to the path column. Note that || indicates array concatenation in PostgreSQL. For instance, ARRAY[4,5,6] || 7 gives {4,5,6,7}.

To get a subtree of a given item, all we have to add is an extra WHERE clause to check if the given id is the last item in the path.

This will fetch all the children of item whose id is 15.

We might want to do this for every item in genres. So, we convert the above CTE into a PostgreSQL function. Now, we can apply it to every row in genres. Also, a tabular form might not fit a hierarchical representation. So, inside the function, we convert the above output to JSON while maintaining the hierarchy.

Running this through the genre table, we get:

SELECT id, name, COALESCE(get_children(id), '[]') AS children FROM genres;

Now, for the final act, i.e., representing the whole data from the above query as a single nested JSON object. For this, we cross the realm of PosgreSQL and move to Javascript land, without leaving the console ;) This can be faciliated by enabling the plv8 extension. Note that this requires PostgreSQL version >= 9.1.

CREATE EXTENSION plv8;

Why Javascript? For one, you need not learn another language to operate with your data. Besides, I personally find it easy to manipulate JSON in Javascript. The syntax for defining a Javascript function looks like this:

Our Javascipt function iterates through every row from the genre children table and builds a JSON object by inserting the row item under the appropriate parent. I found a recursive Javascript function at stackoverflow which finds an object given by key in a deeply nested JSON blob. I’ve adopted it for our needs. getObject finds an object by id and returns it.

The buildTree function calls getObject for each and every row in the materialized table of genres. If the object is not found, then it is added at the root level. Let’s call this and build our tree, again, using CTEs.

This will get us the nested JSON tree. Here’s the prettified version of it.

The genres table can contain other genre related dynamic properties, like date_added, count etc. This could be represented by a separate column in the table of type json. That’s correct. You get the best of both NoSQL and SQL in PostgreSQL. This will be the subject of another post.


Originally published at www.lakshminp.com on January 14, 2016.

One clap, two clap, three clap, forty?

By clapping more or less, you can signal to us which stories really stand out.