Getting Started with JSON and Postgres

Adam Gross
6 min read · Feb 12, 2018


To SQL or to NoSQL — that is the question, or at least it has been for many developers in recent years. The answer, of course, is that for many use cases it is not either/or; it is how and when to use both. Fortunately, starting with the introduction of the JSON type in Postgres 9.2 in 2012, and with more powerful JSON storage and manipulation capabilities in every release since, it has been getting easier to combine JSON-y NoSQL with a traditional SQL database.

In theory, this brings a number of benefits: the ability to store and manage JSON without having to operate a new data store, the application of Postgres indexing and query capabilities to JSON, and the ability to transform and join JSON data with traditional SQL commands and capabilities. In practice, taking advantage of these benefits requires some pretty arcane SQL that can make even the most die-hard regex fans blush. Hopefully this simple example will provide some guidance and inspiration to get over the hump and start taking advantage of JSON in your Postgres projects.

JSON / Postgres Example: Picture Color Sort using Google Cloud Vision API

To show off how to use JSON in Postgres, it helps to have a meaty JSON source. So this example will use the Google Cloud Vision API (https://cloud.google.com/vision/), a handy web service that takes an image (via payload or URL) and returns all kinds of interesting data, including content and color analysis.

As you will see from running the demo on that page, one of the functions the API performs is a dominant color analysis, which extracts the ten most representative colors from the image and returns them as a collection of RGB values. Below you can see a snippet of the API results for an image (the full version of which is here.)

"dominantColors": {
"colors": [
{ "color": {
"red": 224,
"green": 197,
"blue": 136
}, "score": 0.22920254,
"pixelFraction": 0.032422893
},

When you have the dominant colors for a set of images, one of the things you can do is sort them based on similarity. Put differently, for a given image, you can choose a dominant color, and write a query that orders the images based on their similarity to that color. Perhaps only slightly useful in the real world, but really useful for illustrating some of the basics of working with JSON in Postgres.

Schema Setup and Data

To create this in Postgres, we are going to need a table of image URLs and the corresponding Google Vision API image analysis. For convenience, a sample set of data is provided in the repo (if you want some handy Python code to populate that data yourself, see the repo as well). The core table is very simple: an ID, a VARCHAR to store the URL, and a JSONB field to store the image info returned by the Google API.

CREATE TABLE "images_sample_data" (
  "id" int4 NOT NULL,
  "url" varchar(255) COLLATE "default",
  "google_properties" jsonb
);
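To make the shape concrete, a row might be inserted like this — a minimal sketch with a hypothetical URL and a hand-trimmed payload (the real google_properties documents returned by the Vision API are much larger):

```sql
-- Sketch: a single row with a trimmed-down Vision API response.
-- The URL and id are placeholders, not real sample data.
INSERT INTO images_sample_data (id, url, google_properties)
VALUES (
  1,
  'https://example.com/some-image.jpg',
  '{"responses": [{"imagePropertiesAnnotation": {"dominantColors": {"colors": [
      {"color": {"red": 224, "green": 197, "blue": 136},
       "score": 0.22920254, "pixelFraction": 0.032422893}
  ]}}}]}'::jsonb
);
```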

The full schema and data dump is available here; once it's loaded into Postgres you will be ready to get to the real work of transforming the JSON into additional tables that will be used for the image comparison and sort.

Using Postgres JSON Functions

So now we have a simple table with the raw JSON results of the Google API call stored in the google_properties field. The next step, and the real trick, is to transform that JSON into a usable SQL format.

For each image there will be 10 dominant color records as RGB values, stored as an array of JSON elements in the shape shown above. To show a simple transformation, we will create a table that takes the first of those 10 entries for each image and transforms it into a standard table.

SELECT id, url,
  google_properties->'responses'->0->'imagePropertiesAnnotation'->'dominantColors'->'colors'->0->'color'->>'red' AS red,
  google_properties->'responses'->0->'imagePropertiesAnnotation'->'dominantColors'->'colors'->0->'color'->>'green' AS green,
  google_properties->'responses'->0->'imagePropertiesAnnotation'->'dominantColors'->'colors'->0->'color'->>'blue' AS blue
FROM images_sample_data;

This returns a table with columns for each of the elements we care about:

This is a pretty straightforward query, and is hopefully readable as is; the key is the ‘->’ operator, introduced to Postgres as part of JSON support, which allows for traversing the document path. Two special things to consider here: the ‘0’ after ‘colors’ is an array element selector — change it to ‘1’ and the query will return the second entry of the JSON array that contains the image colors. And if you look carefully you will notice the use of “->>” instead of “->” at the end — rather than returning the JSON node itself, this extracts the element’s value as text.
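The difference between the two operators is easy to see on a literal value; ‘->’ yields jsonb, while ‘->>’ yields text:

```sql
-- '->' keeps the result as jsonb; '->>' extracts it as text.
SELECT '{"color": {"red": 224}}'::jsonb -> 'color'           AS json_node, -- jsonb: {"red": 224}
       '{"color": {"red": 224}}'::jsonb -> 'color' ->> 'red' AS red_text;  -- text: 224
```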

This query returns one row for each image, but as the API returns 10 colors per image, it’s clearly incomplete. It would be possible to hard-code the query for the other nine array elements, but you’d be right to hope that there must be a better way, as that query would be messy and brittle. And good news — there is! But it’s going to get funky quickly.

Transforming JSON Arrays to Table and Rows

To transform all the JSON into a target Postgres representation, it’s helpful to decompose the process into multiple steps, which can be chained together using CTEs (Common Table Expressions). With CTEs, each step, or query, in the process can emit a table, which is then transformed again (and again). While not always the most efficient method, they are also helpful for maintaining readability (and the author’s sanity).

For this transformation, the CTE query will have three steps. First, the JSON will be normalized so that each “dominantColors” entry is converted into its own row via the jsonb_array_elements function. Crucially, those rows will all share a common ID, so they can be referenced back to the source image.

with q_a as (
  select id,
    jsonb_array_elements(google_properties->'responses'->0->'imagePropertiesAnnotation'->'dominantColors'->'colors') as jarray
  from images_sample_data
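On its own, jsonb_array_elements fans a single row out into one row per array element, which is exactly the normalization this step relies on. A standalone illustration:

```sql
-- One input row carrying a 3-element array becomes three output rows,
-- each sharing the same id.
SELECT t.id, jsonb_array_elements(t.doc) AS elem
FROM (VALUES (1, '[{"n": 1}, {"n": 2}, {"n": 3}]'::jsonb)) AS t(id, doc);
```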

Next, each of these rows will be given its own number (from 1 to 10), so that each color entry (for each image) is both ordered and uniquely addressable. In addition, this step pulls the ‘score’ and ‘pixelFraction’ metadata out of the JSON and into their own columns.

select x.color, x.score, x."pixelFraction", q_a.id,
  row_number() over (partition by q_a.id order by x."pixelFraction" desc)
from q_a, jsonb_to_record(q_a.jarray) as x(color jsonb, score float, "pixelFraction" float)

Finally, the color fields in the JSON are cast into their respective columns using jsonb_to_record.

select * from q_b, jsonb_to_record(q_b.color) as x(red int, green int, blue int)
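Chained together, the three steps look roughly like this (a sketch reconstructed from the fragments above; the authoritative version lives in the repo):

```sql
WITH q_a AS (
  -- Step 1: one row per dominant-color entry, keyed back to the image id.
  SELECT id,
         jsonb_array_elements(google_properties->'responses'->0
             ->'imagePropertiesAnnotation'->'dominantColors'->'colors') AS jarray
  FROM images_sample_data
), q_b AS (
  -- Step 2: number each color per image, and lift score/pixelFraction
  -- into real columns.
  SELECT q_a.id, x.color, x.score, x."pixelFraction",
         ROW_NUMBER() OVER (PARTITION BY q_a.id
                            ORDER BY x."pixelFraction" DESC) AS color_rank
  FROM q_a,
       jsonb_to_record(q_a.jarray) AS x(color jsonb, score float, "pixelFraction" float)
)
-- Step 3: cast the RGB fields into integer columns.
SELECT q_b.*, x.red, x.green, x.blue
FROM q_b, jsonb_to_record(q_b.color) AS x(red int, green int, blue int);
```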

And when you run it all together, you get the Postgres representation of the JSON you are looking for:

The full query is available in the repo along with the rest of the sample. One can imagine an easy modification wrapping the statement in CREATE TABLE AS to create a permanent cache of the results.

Querying for Color Similarity

Now that all the data is in one place, it’s ready for something useful: in this case, sorting all of the images based on their similarity to a given color, or to be more precise, the similarity of one of their dominant colors (as determined by the Vision API) to the given color.

To sort the images, the query will use a simple Pythagorean (Euclidean) distance calculation. The full query is available here, and fortunately doesn’t require any JSON magic, as at this stage everything is in standard SQL types.
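The heart of that query is the distance expression in the ORDER BY. A minimal sketch, assuming the transformed colors have been cached in a table named image_colors (a hypothetical name) and using a target color of RGB (224, 197, 136):

```sql
-- Distance in RGB space from the target color (224, 197, 136).
-- Smaller distance means a closer color match.
SELECT id, url, red, green, blue,
       sqrt( power(red   - 224, 2)
           + power(green - 197, 2)
           + power(blue  - 136, 2)) AS color_distance
FROM image_colors
ORDER BY color_distance;
```

The first rows returned are the images whose dominant color most resembles the target.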

Hopefully this example can help you get over the hump and start using JSON in your Postgres apps — as you can see from the above examples, mastering a few esoteric functions can get you far down the path of bridging the worlds of JSON and SQL.
