Fancy uses of JSON on PostgreSQL
PostgreSQL has supported JSON columns in its tables for quite a while now. Basic at the beginning, the support has grown into a lot of functionality that comes in handy when you have a PostgreSQL database and use some of its cool features (like the ton of functions in PostGIS), but also need to store unstructured data or data with a user-defined schema. A while ago, this was our case.
At Worldsensing we have used JSON on PostgreSQL a lot in some of our micro-services. In this post I will explain some of the more interesting things that can be done with JSON fields on PostgreSQL, finishing with a real case that stays somewhat complex even after reading the documentation or a few tutorials.
Support on SQLAlchemy
First of all, you can use JSON on top of an ORM like SQLAlchemy. Even though at many points we resorted to raw SQL queries to extract the full potential of PostgreSQL, you can do basic things like inserting and retrieving data without leaving the ORM.
For example, creating a model with a JSON field in Python using SQLAlchemy is as simple as this:
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy.dialects import postgresql

db = SQLAlchemy()

class Data(db.Model):
    __tablename__ = "my_data"
    id = db.Column(db.String, primary_key=True)
    my_json_field = db.Column(postgresql.JSONB)
We use the JSONB type instead of the plain JSON type because it allows many operations that are not possible with the older type. For the rest of this post, though, I’ll refer to our data type simply as JSON.
Now we can query an object normally, and access the JSON field as a dictionary:
data = Data.query.filter_by(id="my_id").first()
my_dict = data.my_json_field
Indexing by JSON columns
Now that we have a JSON field, we can create an index on it so that we can perform complex queries using the data stored there.
CREATE INDEX ON my_data USING gin(my_json_field jsonb_path_ops);
Now this index can be used by our queries, allowing rows to be found by the keys and values in the JSON object. The documentation offers more information about the different ways to create indexes over JSON, like indexing over a specific key of the object. However, we found that the jsonb_path_ops operator class supports all the types of queries we needed.
Querying JSON objects
Find by values of a key
You can check for the presence of a key-value pair in a JSON field by using the @> operator. You can then use -> to extract a field of the object; by doubling the > symbol (i.e. writing ->>) you obtain the value as text. These operators only go one level deep, so if you need to go deeper you must use the #> operator, which takes a path (a list of keys) to reach the element at the lower level.
SELECT id, my_json_field #>> '{field_depth_1, field_depth_2}'
FROM my_data
WHERE my_json_field @> '{"other_field": 3}'::jsonb;
This will return the id and the value for the path field_depth_1.field_depth_2 for rows where my_json_field contains the key other_field with a value equal to 3.
You can also use the #>> operator with an empty path ('{}') to obtain the full JSON object as text.
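To make these operators concrete, here’s a rough pure-Python model of @> containment and #>> path extraction (a sketch only; jsonb_contains and jsonb_path_text are hypothetical names, and the text conversion of scalars is only approximated):

```python
import json

def jsonb_contains(left, right):
    # Rough model of jsonb @>: every pair in `right` must appear in `left`,
    # recursively for nested objects, element-wise for arrays.
    if isinstance(right, dict):
        return isinstance(left, dict) and all(
            k in left and jsonb_contains(left[k], v) for k, v in right.items()
        )
    if isinstance(right, list):
        return isinstance(left, list) and all(
            any(jsonb_contains(item, v) for item in left) for v in right
        )
    return left == right

def jsonb_path_text(obj, path):
    # Rough model of #>>: walk a list of keys, return the result as text.
    # An empty path returns the whole object as text.
    for key in path:
        obj = obj.get(key) if isinstance(obj, dict) else None
    if obj is None:
        return None
    return json.dumps(obj) if isinstance(obj, (dict, list)) else str(obj)

row = {"other_field": 3, "field_depth_1": {"field_depth_2": "hello"}}
print(jsonb_contains(row, {"other_field": 3}))                    # True
print(jsonb_path_text(row, ["field_depth_1", "field_depth_2"]))   # hello
```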
Convert to table
If you need to look further into the contents of a JSON object, you can use jsonb_each, which converts the field into a table with the columns key and value, and a row for each key-value pair. You can do the same for a JSON array with the function jsonb_array_elements, which creates a table with a single column value containing the elements of the array. The value column is of type JSON, but by appending _text to the function names (jsonb_each_text, jsonb_array_elements_text) you obtain the values as text.
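For intuition, the two functions behave roughly like iterating over a Python dict or list (hypothetical stand-ins, not the real implementations):

```python
def jsonb_each(obj):
    # Rough model of jsonb_each: one (key, value) row per top-level pair.
    return list(obj.items())

def jsonb_array_elements(arr):
    # Rough model of jsonb_array_elements: one row per array element.
    return list(arr)

print(jsonb_each({"ref1": 2, "ref2": 3}))   # [('ref1', 2), ('ref2', 3)]
print(jsonb_array_elements([1, 2, 3]))      # [1, 2, 3]
```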
Creating new JSON objects from results
We have seen how to use the information inside a JSON field to query our data, but obviously we will also want to modify it. PostgreSQL provides several ways to update a JSON object or create a new one from a set of data. Let’s see the ones that have been more useful for us.
Concatenation
In one of our use cases for JSON fields, we needed to support partial updates of objects, i.e. accept that the user may send only a subset of all the keys in the object. Therefore, any time the JSON was updated, new keys could be introduced (since no full initialization was required) and existing keys could be updated. Luckily, that’s very easy to do in PostgreSQL:
UPDATE my_data SET my_json_field = my_json_field::jsonb || '{"new_field": 1, "existing_field": 2}'::jsonb;
If our old value was {"existing_field": 1, "other_field": 2}, after the update we will have {"existing_field": 2, "other_field": 2, "new_field": 1}.
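One detail worth knowing: || performs a shallow merge, so nested objects in the new value replace the old ones wholesale instead of being merged key by key. In Python terms (jsonb_concat is a hypothetical stand-in):

```python
def jsonb_concat(old, new):
    # Rough model of jsonb ||: a *shallow* merge. Top-level keys from
    # `new` overwrite those in `old`; nested objects are replaced whole.
    return {**old, **new}

old = {"existing_field": 1, "other_field": 2}
print(jsonb_concat(old, {"new_field": 1, "existing_field": 2}))
# {'existing_field': 2, 'other_field': 2, 'new_field': 1}

# Nested objects are NOT merged recursively:
print(jsonb_concat({"a": {"x": 1}}, {"a": {"y": 2}}))
# {'a': {'y': 2}}
```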
Build an object
If you have several fields that you want to output as JSON, you can do it with the function json_build_object. You just pass it alternating keys and values and receive a JSON object back.
SELECT json_build_object('id', id, 'json_data', my_json_field::jsonb, 'type', 'my_type')::jsonb FROM my_data;
This will return an object like this:
{ "id": "element1",
"json_data": {"a_field_i_had": 3},
"type": "my_type" }
You can insert this object into a JSONB field or just retrieve it with a query.
Build an array
Similarly, you can aggregate values from several rows into a JSON array with the function json_agg. Note that it takes a single expression, not a list of key-value pairs:
SELECT json_agg(my_json_field) FROM my_data;
For a table with a single row, this will return an array like this:
[{"a_field_i_had": 3}]
If you want each element of the array to combine several columns, wrap them with json_build_object first.
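The semantics of both builders can be sketched in Python (hypothetical stand-ins for the real SQL functions):

```python
def json_build_object(*args):
    # Rough model of json_build_object: alternating keys and values.
    return {args[i]: args[i + 1] for i in range(0, len(args), 2)}

def json_agg(rows):
    # Rough model of json_agg: collect one value per input row into an array.
    return list(rows)

obj = json_build_object("id", "element1", "json_data", {"a_field_i_had": 3},
                        "type", "my_type")
print(json_agg([obj]))
# [{'id': 'element1', 'json_data': {'a_field_i_had': 3}, 'type': 'my_type'}]
```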
Complex JSON processing
We’ve seen a few of the things we can do, and you can find many more in the documentation. So far, it’s not that complicated. However, you can combine these functions to perform something more complex.
In this example, we have a table with three fields:
- id: just a numerical unique id.
- custom_fields: a set of custom fields that are used by the application.
- refs: a set of references to other elements by id.
We want to obtain the element with a specific id as a JSON object, together with all the elements it references (the values in its refs field) in an array presented as a field of that object. Let’s use the pieces we have seen to build the query step by step. First, we can obtain the ids of the elements referenced by the one we are looking for:
SELECT refs.value::int
FROM my_data m,
jsonb_each_text(m.refs) refs
WHERE m.id = 1;
Now we can get the elements that have these ids, as a JSON array, by building an object for each row with json_build_object and aggregating them all into a list with json_agg:
SELECT json_agg(json_build_object(
'id', j.id,
'custom_fields', j.custom_fields::jsonb
)::jsonb)
FROM my_data as j
WHERE j.id=ANY(
  ... -- previous query
);
Finally, we can create a JSON object that contains all the fields in the row with our desired id and the references as another field:
SELECT json_build_object(
'id', m.id,
'custom_fields', m.custom_fields::jsonb,
'refs',
coalesce((
  ... -- previous query
),
'[]'::json)
)::jsonb as json
FROM my_data m where m.id = 1;
The full query looks like this:
SELECT json_build_object(
'id', m.id,
'custom_fields', custom_fields::jsonb,
'refs',
coalesce((
SELECT json_agg(
json_build_object(
'id', j.id,
'custom_fields', j.custom_fields::jsonb
)::jsonb)
FROM my_data as j
WHERE j.id=ANY(
SELECT refs.value::int
FROM my_data m,
jsonb_each_text(m.refs) refs
WHERE m.id = 1
)),
'[]'::json))::jsonb as json
FROM my_data m where m.id = 1;
So this returns a JSON object just like we wanted:
postgres=# select * from my_data;
    custom_fields     | id |          refs
----------------------+----+------------------------
 {"a_field": 2}       |  1 | {"ref1": 2, "ref2": 3}
 {"another_field": 5} |  2 | {"ref1": 1, "ref2": 3}
 {"a_field": 1}       |  3 | {}
(3 rows)

postgres=# SELECT jsonb_pretty(json_build_object( ... ;
                json
------------------------------------
{ +
"id": 1, +
"refs": [ +
{ +
"id": 2, +
"custom_fields": { +
"another_field": 5 +
} +
}, +
{ +
"id": 3, +
"custom_fields": { +
"a_field": 1 +
} +
} +
], +
"custom_fields": { +
"a_field": 2 +
} +
}
(1 row)
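As a sanity check, the logic of the full query can be mirrored in plain Python over the three sample rows (a sketch with hypothetical names; in production the database does this work):

```python
# Sample rows from the table above: (id, custom_fields, refs)
table = [
    (1, {"a_field": 2}, {"ref1": 2, "ref2": 3}),
    (2, {"another_field": 5}, {"ref1": 1, "ref2": 3}),
    (3, {"a_field": 1}, {}),
]

def element_with_refs(table, target_id):
    # Mirror of the full query: build the target element as a dict and
    # embed the elements it references as a list under "refs".
    by_id = {rid: (cf, refs) for rid, cf, refs in table}
    custom_fields, refs = by_id[target_id]
    ref_ids = refs.values()  # like jsonb_each_text + the ::int cast
    return {
        "id": target_id,
        "custom_fields": custom_fields,
        "refs": [{"id": rid, "custom_fields": by_id[rid][0]}
                 for rid in ref_ids],  # like json_agg(json_build_object(...))
    }

print(element_with_refs(table, 1))
```

The result matches the jsonb_pretty output above.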
This is a real case of what can be done with PostgreSQL. It has been very useful for us, even though some of the queries can get a bit complex, so I hope this post helps you extract all the power from PostgreSQL and JSON.