Photo by <a href=”https://unsplash.com/@kristinwilson?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Kristin Wilson</a> on <a href=”https://unsplash.com/t/business-work?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>

Introduction to JSON and JSON Normalization with PostgreSQL

Aaron Schlegel
5 min readJul 27, 2021

If you’ve been working with data for any period of time, you’ve likely run into the JSON data format. JSON, short for JavaScript Object Notation, is a widely popular and standard format of data. It is built on two universal data structures that are supported by nearly all modern programming languages. According to the json.org website:

JSON is built on two structures:
A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array. An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

JSON has a basic structure that can be nested any number of times to create more complex structures suited to the data being written. In this post, we will introduce the basics of the JSON format before exploring the JSON support functions added to PostgreSQL in version 9.2. The JSON support functions allow easier interaction with JSON through PostgreSQL. They can often alleviate the need to transform or reshape the data with a library such as pandas before inserting it into a SQL data table.

Basics of JSON (JavaScript Object Notation)

JSON is an object that consists of an unordered set of key/value pairs that begin and end with a curly brace. For example, the following key/value pairs describing my cat, Baxter, is valid JSON:


{
“name”: “Baxter”,
“species”: “Cat”,
“color”: “Black”,
“gender”: “Male”,
“age”: 11,
“neutered”: true,
“declawed”: false
}

In the above example, the “keys” are "name", “species", ”color", “gender", “age", “neutered", “declawed", and the “values” are "Baxter", “Catt", “Black", “Male", 11, true, false. Values can be double-quoted strings, a number, true, false or null, an object, or an array. As mentioned earlier, JSON objects can be nested any number of times. For example, we can change the “color” key/value pair to a nested JSON object to specify all of Baxter’s colors.


{
“name”: “Baxter”,
“species”: “Cat”,
“colors”: {
“color1”: “Black”,
“color2”: “White”
},
“gender”: “Male”,
“age”: 11,
“neutered”: true,
“declawed”: false
}

JavaScript Object Notation also allows for arrays, which are an ordered collection of values.


{
“name”: “Baxter”,
“species”: “Cat”,
“colors”: {
“color1”: “Black”,
“color2”: “White”
},
“gender”: “Male”,
“age”: 11,
“neutered”: true,
“declawed”: false,
“tags”: [
“cute”,
“cuddly”,
“playful”
]
}

As we can see, JSON is a data format that can be adapted to represent any kind of real-world data through nesting and its use of key/value pairs. JSON is also tiny in size and has hence become the standard data format for web data. Despite the advantages the JSON data format offers, sometimes ‘normalization’ of the data into a tabular form such as comma-separated values (CSV) is needed. This format is more familiar with analytics and BI tools that are commonly used in many organizations. Fortunately, the developers of PostgreSQL recognized a need for the ability to work with and normalize JSON data more efficiently and in the 9.2 version release provided a large number of JSON support functions for creating and manipulating JSON data. In the next section, we will explore several of the more commonly used JSON support functions available in PostgreSQL for transforming JSON data that may be received from an API or a web form into a ‘normalized’ format.

‘Normalizing’ JSON Data with PostgreSQL Functions

PostgreSQL, starting with version 9.2, offers several different support functions for creating and manipulating JSON objects. For example, consider the following JSON that we created earlier in the post:


{
“name”: “Baxter”,
“species”: “Cat”,
“color”: “Black”,
“gender”: “Male”,
“age”: 11,
“neutered”: true,
“declawed” false
}

Let’s say we wanted to normalize this JSON as a table of key/value pairs. The json_each function in PostgreSQL expands the outermost JSON object into a set of key/value pairs.


SELECT * FROM
json_each(‘{“name”: “Baxter”, “species”: “Cat”, “color”: “Black”, “gender”: “Male”, “age”: 11, “neutered”: true, “declawed”: false}’);
```
key, value
name,”Baxter”
species,”Cat”
color,”Black”
gender,”Male”
age,11
neutered,true
declawed,false

However, we see the string values are parsed with double quotes, which isn’t the optimal approach for storing strings in database tables. The PostgreSQL function json_each_texttransforms all of the values in a text data type.


SELECT * FROM
json_each_text(‘{“name”: “Baxter”, “species”: “Cat”, “color”: “Black”, “gender”: “Male”, “age”: 11, “neutered”: true, “declawed”: false}’);
key,value
name,Baxter
species,Cat
color,Black
gender,Male
age,11
neutered,true
declawed,false

The downside to this approach is the age value 11and the Trueand False values are also interpreted as a text data type. The json_object_keys function returns the set of keys in the outermost object of the JSON object.


SELECT * FROM
json_object_keys(‘{“name”: “Baxter”, “species”: “Cat”, “color”: “Black”, “gender”: “Male”, “age”: 11, “neutered”: true, “declawed”: false}’);
json_object_keys
name
species
color
gender
age
neutered
declawed

Several JSON support functions in PostgreSQL allow us to expand a JSON object to a row whose columns match the record defined by a data table. For example, let’s say we have a pet table in a PostgreSQL database.

CREATE TABLE pet (
name TEXT,
species TEXT,
color TEXT,
gender TEXT,
age INT,
neutered TEXT,
declawed TEXT
);

By taking advantage of the json_populate_record function, we can expand the JSON object to match the table’s columns. Any keys not matching the columns of the base table will be ignored.


SELECT * FROM
json_populate_record(null::pet, ‘{“name”: “Baxter”, “species”: “Cat”, “color”: “Black”, “gender”: “Male”, “age”: 11, “neutered”: true, “declawed”: false, “test”: 1}’);
name,species,color,gender,age,neutered,declawed
Baxter,Cat,Black,Male,11,true,false

The json_populate_record, however, only accepts a JSON object and not an array. Fortunately, the json_populate_recordset allows us to pass a collection of JSON objects which will then convert each object into a table row similar to the json_populate_record.


SELECT * FROM json_populate_recordset(null::pet, ‘[{“name”: “Baxter”, “species”: “Cat”, “color”: “Black”, “gender”: “Male”, “age”: 11, “neutered”: true, “declawed”: false, “test”: 1}, {“name”: “Baxter”, “species”: “Cat”, “color”: “Black”, “gender”: “Male”, “age”: 11, “neutered”: true, “declawed”: false, “test”: 1}]’);

name,species,color,gender,age,neutered,declawed
Baxter,Cat,Black,Male,11,true,false
Baxter,Cat,Black,Male,11,true,false

As a final example, let’s consider the following JSON that we investigated earlier:


{
“name”: “Baxter”,
“species”: “Cat”,
“colors”: {
“color1”: “Black”,
“color2”: “White”
},
“gender”: “Male”,
“age”: 11,
“neutered”: true,
“declawed”: false
}

The json_extract_path and json_extract_path_text functions allows us to extract a value from a path of JSON keys. For example, let’s say we wanted to extract the value of the “color1” key in the nested “colors” object.


SELECT * FROM json_extract_path(‘{“name”: “Baxter”, “species”: “Cat”, “colors”: {“color1”: “Black”, “color2”: “White”}, “gender”: “Male”, “age”: 11, “neutered”: true, “declawed”: false}’, ‘colors’, ‘color1’);
json_extract_path
“Black"
SELECT * FROM json_extract_path_text(‘{“name”: “Baxter”, “species”: “Cat”, “colors”: {“color1”: “Black”, “color2”: “White”}, “gender”: “Male”, “age”: 11, “neutered”: true, “declawed”: false}’, ‘colors’, ‘color1’);json_extract_path
Black

Conclusion

In this post, we introduced many of the support functions PostgreSQL provides for making it easier to work with JSON objects and arrays. The json_populate_record and json_populate_recordset functions are particularly useful for manipulating JSON data into a form that can be inserted into a data table as they automatically ignore keys that are not a column in the base table.

Although PostgreSQL gives us multiple tools for working with JSON, sometimes it is easier to insert the JSON object directly into a table column with a json or jsonb data type and query the data now. This approach can be more efficient when the keys and values of the JSON change or are not consistent, which could potentially cause issues with the method of manipulating the JSON into a form to insert into a downstream table. In the following post, we will walk through how to store JSON data directly in a PostgreSQL data table and query the JSON directly.

--

--

Aaron Schlegel

Seeker of knowledge, truth pilgrim, builder of things, science fiction lover, math and stats aficionado, data scientist. https://aaronschlegel.me