How to store JSON documents in PostgreSQL

Satya Thirumani
11 min readApr 25, 2023

--

Can we store documents in Postgres? Is PostgreSQL can be used as a NoSQL Document database? Can we store semi-structured data in (Aurora) Postgres database clusters?

The answer is YES.

PostgreSQL offers two datatypes for storing JSON documents data: json and jsonb.

JSON JSONB in PostgreSQL

For more than a decade, JSON/JSONB have been primitive data types in PostgreSQL database. JSON object in a column of JSON datatypes, and by using this one can use PostgreSQL as a schema-less NoSQL database.

JSON (JavaScript Object Notation) type stores exact input copy, faster to ingest (writes are quick), which requires to reparse on each execution (reads might be slower).

  • JSON preserves the original formatting/whitespaces and ordering of the keys. JSON preserves duplicate keys.

JSONB (JSON Binary) stores in decomposed binary format which makes it slightly slower during ingestion, but provides significantly faster retrieval time as no parsing is needed (reads are fast).

JSONB takes more space, compared to JSON, operations are significantly faster.

jsonb does not preserve white space between tokens, as well as the order of keys, and does not keep duplicate object keys.

jsonb supports advanced indexing and easier updates on its columns.

-- Basic DDL/DML commands on JSONB and JSON columns in Postgres
create table json_data(id int, doc json);
create table test1_jsonb(Id bigint, val jsonb);
CREATE TABLE emp(emp_id int NOT NULL, data jsonb);

ALTER TABLE employees ADD COLUMN new_json_column JSONB DEFAULT NULL;

INSERT INTO jsonb_test VALUES('{"a": 1, "b": 2, "c": 3}');
INSERT INTO emp VALUES (1, '{"name": "John", "hobbies": ["Movies", "Football", "Hiking"]}');
insert into test1_jsonb values(100, '{"val1": "Json Vs JsonB test", "val2": 1, "val3": "2021–08–31", "val4": ["test1", "test2", "test3"]}');

JSON/JSONB operators

  • -> to check the existence of key or by array index, will return json/jsonb, similar to json_extract_path function
  • ->> to check the existence of key or by array index, will return text, similar to json_extract_path_text function
  • #> to check the existence of json path by keys or by array index, will return json/jsonb, similar to json_extract_path function
  • #>> to check the existence of json path by keys or by array index, will return text, similar to json_extract_path_text function

JSONB operators

  • ? existence operator, to check the existence of key or array element, similar to LIKE
  • ?| existence operator, to check the existence of any value or array elements, similar to IN clause
  • ?& existence operator, to check the existence of all values or array elements, similar to AND clause
  • @> containment operator, path operator, to check key-value present or not, will return true/false, left jsonb contains right jsonb, similar to WHERE
  • <@ containment operator, path operator, to check key-value present or not, will return true/false, right jsonb contains left jsonb, similar to WHERE
  • @? match operator, jsonpath operator, does JSON path return any item for the specified JSON value
  • @@ match operator, jsonpath operator, returns the result of a JSON path predicate check for the specified JSON value
  • || concatenation operator, concatenates two jsonb values
  • - deletes a key/value from a JSON object, or a matching string value(s) from a JSON array, delete the array element with a specified index
  • #- deletes the field or array element at the specified path, where path elements can be either field keys or array indexes
  • <, <=, >, >=, =, <>, != comparison operators

One of the defects of relational (RDBMS) databases is that it is not flexible for column addition/deletion. But using JSON datatypes, it can be flexible with column addition/deletions.

Postgres does not store column statistics for JSON/JSONB columns. No statistics for json and jsonb columns.

-- select commands/functions on JSON and JSONB table columns in PostgreSQL

SELECT '66'::json;
SELECT '[1, 2, "foo", null]'::json;
SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::jsonb;
SELECT '{"reading": 1.230e-5}'::json, '{"reading": 1.230e-5}'::jsonb;
SELECT '[1, 2, 3]'::jsonb @> '[3, 1]'::jsonb;

select '[11,55,99]'::json -> 1; -- returns 55
SELECT '{"list":[2,4,6,8]}'::json -> 'list' -> 2; -- returns 6
SELECT '{"list":[2,4,6,8]}'::json #> ARRAY['list','2']; -- returns 6
select '[{"a":"foo"},{"b":"bar"},{"c":"baz"}]'::json -> -3;
select '{"a": {"b":"foo"}}'::json -> 'a';
select '{"a": {"b": ["foo","bar"]}}'::json #>> '{a,b,1}';
SELECT ('{"a": 1}'::jsonb)['a'];
SELECT ('{"a": {"b": {"c": 111}}}'::jsonb)['a']['b']['c'];
SELECT ('[33, "8", null]'::jsonb)[1];
select '{"b":2}'::jsonb <@ '{"a":1, "b":2}'::jsonb;
select '["a", "b", "c"]'::jsonb ? 'k';
select '{"a": "b"}'::jsonb || '{"c": "d"}'::jsonb;
select '[1, 2]'::jsonb || '3'::jsonb;
select '["a", "b", "c", "b"]'::jsonb - 'b';
select '["a", "b","z"]'::jsonb - 1;
select '["a", {"b":1}]'::jsonb #- '{1,b}';
select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] ? (@ > 2)';
select '{"a":[1,2,3,4,5]}'::jsonb @@ '$.a[*] > 2';

select * from emp;
select id, details::json from planets;
select data from emp;
select data -> 'name' from emp;
SELECT data -> 'key1' -> 'key2' FROM table;
SELECT data -> 'key1', data -> 'key2' FROM table;
select doc -> 'address' ->> 'pin' from jsondata;
select doc #>> {'address,pin'} from jsondata;
select id, details -> 'tags' -> 0, details -> 'tags' ->> 0 from customers; -- index number, starts with 0
select data #> '{characteristics,2}' from users;
select (data #- '{characteristics,2}') -> 'characteristics' from users;
select id, details -> 'name', details -> 'contacts' ?| array['fax','mobile'] from customers;
select id, details -> 'name', details -> 'contacts' ?& array['email','mobile'] from customers;

SELECT * FROM table_name WHERE data['key'] = '"value"';
SELECT * FROM emp WHERE data -> 'name' = '"Marc"';
select id, details from customers where details -> 'tags' ? 'techie';
select id, data from cdata where data -> 'hobbies' ?& array['Googling','Movies'];
SELECT * FROM test_table WHERE data @> '{"key": "123"}'::jsonb;
SELECT * FROM test_table WHERE data @> jsonb_build_object('key', '123');
SELECT doc->'site_name' FROM websites WHERE doc->'tags' @> '[{"term":"paris"}, {"term":"food"}]';
select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb;
select * from test1_jsonb where val ->> 'val3' = '2021–08–10';
select id,data -> 'City' from cdata where data -> 'City' is not null;
select data #>> '{enabled,requiredState}' from cdata where id=5;
select data #>> '{featured,included}' from cdata where (data #>> '{enabled,skus,0,region}')::jsonb @> '"CANDADA"'::jsonb;
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @? '$.tags[*] ? (@ == "qui")';
SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"';
select * from books where data @@ '$.publisher == "kjKZ1tvq"';

select data #>> '{enabled,offered,sku}' from cdata where data @? '$.feature.associations.action ? (@ == "full")';
select data #>> '{enabled,offered,sku,0}' from cdata where data @? '$.metadata.**.action ? (@ == "select")';
select data #>> '{featured,included,0,hierarchy}' from cdata where data @? '$.metadata.**.skus.** ? (@ == "UK")';
select data from doc where $.track.segments[*] ? (@.HR > 130)."start time";
select data from doc where $.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size();
select id, (details ->> 'orbit period')::real as orbit from planets where (details ->> 'orbit period')::real > 2;
select details -> 'address' ->> 'city', count('city') from customers group by details -> 'address' ->> 'city';
select details -> 'address' ->> 'code', to_char(sum(cast(details ->> 'spent' as real)),'9,999.99') as total from customers group by details -> 'address' ->> 'code';

select to_jsonb(row(42, 'Satya DBA said "Hi."'::text));
SELECT to_jsonb(employees) FROM employees LIMIT 5;

select row_to_json(row(1,'foo'));
select row_to_json(row(i,j,k,l,m,n)) from a;
select row_to_json(t) from (select id, name from customers) as t;
select array_to_json('{{1,5},{99,100}}'::int[]);
select array_to_json(array_agg(row_to_json(t)), true) from (select id, name from customers) as t;

select json_build_array(1, 2, 'foo', 4.8, 5667);
select '[1, 2]'::jsonb || jsonb_build_array('[3, 4]'::jsonb);
select jsonb_build_object('foo', 1, 2, row(3,'bar'));
select json_object('{a, 1, b, "def", c, 3.5}');
select jsonb_object('{a,b}', '{1,2}');
select json_object_keys('{"f1":"abc","f2":{"f3":"a", "f4":"b"}}');
select jsonb_object_keys(details -> 'contacts') from customers;
select json_agg(name) from emp;
select json_object_agg(name) from emp;
select json_object_agg(name, qty) from (select name, sum(qty) as qty from products group by 1) r;
select jsonb_object_agg(name) from emp;
select jsonb_pretty('[{"f1":1,"f2":null}, 2]');
select jsonb_pretty(data) from cdata where id = 5;
select json_typeof('-123.4');
select json_typeof(description -> 'laptop') from product;
select jsonb_typeof(jsonb_extract_path(data,'enabled','offered','addOns')) from cdata where id=6;

select json_extract_path('{"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}', 'f4', 'f6');
select json_extract_path('{"a": [11, 55, 99]}', 'a', '1');
select id, json_extract_path(details::json,'address','city'), json_extract_path_text(details::json,'address','city') select json_extract_path_text('{"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}', 'f4', 'f6');
from customers;
select product from orders where jsonb_path_exists(product,'$.wafflecone');
select jsonb_path_exists('{"a":[1,2,3,4,5]}', '$.a[*] ? (@ >= $min && @ <= $max)', '{"min":2, "max":4}');
select * from books where jsonb_path_exists(data,'$.publisher ?(@ == "ktjKEZ1tvq")');
select * from books where jsonb_path_exists(data, '$.prints[*] ?(@.style=="hc" && @.price == 100)');
select jsonb_path_match('{"a":[1,2,3,4,5]}', 'exists($.a[*] ? (@ >= $min && @ <= $max))', '{"min":2, "max":4}');
select product from orders where jsonb_path_match(product, '$.*.scoops == "2"');

select jsonb_path_query('[2]', '$[0] + 3');
select jsonb_path_query('[4]', '2 * $[0]');
select jsonb_path_query(custinfo,'$.name') from orders limit 2;
select jsonb_path_query('{"m": [11, 15]}', '$.m.size()');
select jsonb_path_query('{"h": 1.7}', '$.h.floor()');
select id, jsonb_path_query(product,'$.**{2}') from orders;
select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a[*] ? (@ >= $min && @ <= $max)', '{"min":2, "max":4}');
select jsonb_path_query('["2015–8–1", "2015–08–12"]', '$[*] ? (@.datetime() < "2015–08–2".datetime())');
select jsonb_path_query_array('[1, 2, 1, 3]', '$[*] ? (@ != 1)');
select jsonb_path_query('[1, 3, 7]', '$[*] ? (@ > 1 && @ < 5)');
select jsonb_path_query('[{"name": "John", "parent": false}, {"name": "Chris", "parent": true}]', '$[*] ? (@.parent == true)');
select jsonb_path_query('["Sachin Satya", "Underwood", "Dexter Johnson"]', '$[*] ? (@ starts with "Sachin")');
select jsonb_path_query('{"x": [1, 2], "y": [2, 4]}', 'strict $.* ? (exists (@ ? (@[*] > 2)))');
select jsonb_path_query(data, '$.prints[$.size()]') from books where id = 129;
select jsonb_path_query(data, '$.prints[*] ?(@.style=="hc") ?(@.price ==100)') from books where id = 921;

select jsonb_path_query_array(product, '$.cone[*]') from orders where product ? 'cone';
select jsonb_path_query_array('{"x": [2,3,4]}', '- $.x');
select jsonb_path_query_array('[1, "2", {}]', '$[*].type()');
select jsonb_path_query_array(data, '$.**.skus.** ? (@ != "Australia")') from cdata where id=5;
select jsonb_path_query_array('{"a":[1,2,3,4,5]}', '$.a[*] ? (@ >= $min && @ <= $max)', '{"min":2, "max":4}');
select jsonb_path_query_array('{"x": "20", "y": 32}', '$.keyvalue()');
select jsonb_path_query_array('["abc", "abd", "aBdC", "abdacb", "babc"]', '$[*] ? (@ like_regex "^ab.*c")');
select jsonb_path_query_array('["abc", "abd", "aBdC", "abdacb", "babc"]', '$[*] ? (@ like_regex "^ab.*c" flag "i")'); - case-insensitive match

select jsonb_path_query_first('{"a":[1,2,3,4,5]}', '$.a[*] ? (@ >= $min && @ <= $max)', '{"min":2, "max":4}');
select jsonb_path_query_first(product, '$.cone[*]') from orders where product ? 'icecream';

select json_array_elements('[1,true, [2,false]]');
select json_array_elements_text('["foo", "bar"]');
select jsonb_array_length('[1,2,3,{"f1":1,"f2":[5,6]},4]');
select json_each('{"a":"foo", "b":"bar"}') ;
select json_each(description) from product;
select json_each_text('{"a":"foo", "b":"bar"}');
SELECT distinct(jsonb_array_elements_text(data -> 'hobbies')) AS hobbies FROM emp;
select jsonb_array_elements(details -> 'tags'),jsonb_array_elements_text(details -> 'tags') from customers;
SELECT jsonb_agg(j) FILTER (WHERE j->>'name' IS NOT NULL) FROM table_name t, jsonb_array_elements(jsonb_col) j GROUP BY t.jsonb_col;
SELECT emp_id, data FROM emp WHERE data -> 'hobbies' @> '["Movies"]'::jsonb;
select * from users where data -> 'characteristics' @> '["semi-loyal"]';
select * from jsonb_to_recordset('[{ … }]') as x(id int, name text);
SELECT * FROM jsonb_to_recordset('[{"name": "batman"}, {"name": "superman"}]'::jsonb) AS x(name TEXT);
select key, value from planets, json_each_text(planets.details) where planets.planet='Neptune';
select jsonb_set(data, '{info, age}', (select data -> 'age' from users u1 where u1.id=u2.id)) from users u2;
select jsonb_set('[{"f1":1,"f2":null},2]', '{0,f3}', '[2,3,4]');
select jsonb_set_lax('[{"f1":1,"f2":null},2,null,3]', '{0,f1}', null);
select jsonb_set('{"a":{"b":2}'::jsonb, '{a,b}', '[1,2,3]');

create type subrowtype as (d int, e text); create type myrowtype as (a int, b text[], c subrowtype);
select * from json_populate_record(null::myrowtype, '{"a": 1, "b": ["2", "a b"], "c": {"d": 4, "e": "a b c"}, "x": "foo"}');
SELECT q.* FROM test_jsonb, jsonb_populate_record(NULL::test, data) AS q;
create type twoints as (a int, b int);
select * from json_populate_recordset(null::twoints, '[{"a":1,"b":2}, {"a":3,"b":4}]');
select * from jsonb_to_record('{"a":1,"b":[1,2,3],"c":[1,2,3],"e":"bar","r": {"a": 123, "b": "a b c"}}') as x(a int, b text, c int[], d text, r myrowtype);
select * from json_to_recordset('[{"a":1,"b":"foo"}, {"a":"2","c":"bar"}]') as x(a int, b text);
select jsonb_insert('{"a": [3,9,12]}', '{a, 1}', '"new_value"', true);
select jsonb_insert(data, '{hobbies, 1}', '"wfh"') from cdata where id=1;
select json_strip_nulls('[{"f1":1, "f2":null}, 2, null, 3]');

select sum(pg_column_size(id) + pg_column_size(val)) from test1_jsonb;
explain (analyze, buffers) select * from test1_jsonb where val ->> 'val3' = '2021–08–10';
explain analyze select * from assets where spec @> '{"ramsize":"4"}'::jsonb;

WITH items AS (
SELECT jsonb_array_elements(data -> 'items')
AS item FROM test
)
SELECT * FROM items
WHERE item ->> 'value' = 'aaa';

WITH items AS (
SELECT jsonb_each(data -> 'items')
AS item FROM test
)
SELECT (item).key FROM items
WHERE (item).value ->> 'status' = 'true';

UPDATE table_name SET data['key'] = '1';
update assets set spec = spec || '{"price":6780}'::jsonb where serial_number='RX100';
update users set data = jsonb_set(data, '{"age"}', '18') where id=111;
update class_data set jdata = jsonb_set(jdata, '{class}', '9', false) where id=22;
update class_data set jdata = jsonb_set(jdata, '{city}', '"Hyd"', true) where id=22;
update staff set staff_info = jsonb_insert(staff_info, '{address, city}', city::jsonb);
update cdata set data = jsonb_insert(data, '{hobbies, 1}', '"wfh"', true) where id=1;

select data - 'City' from cdata where id=3; -- deleting one key and it's value
select data #- '{City}' from cdata where id=3;

update cdata set data = (data #- '{City}') where id=3;
update catalog set properties = jsonb_set(properties, '{attributes}', (properties->'attributes') - 'is_new');
update cdata set data = jsonb_set(data, '{hobbies}', (data->'hobbies') - 'Tennis') where id=2;
update cdata set data = jsonb_set(data, '{enabled, offered}', (data->'enabled'->'offered')- 'addOns') where id=6;
update catalog set properties = properties - 'attributes' || jsonb_build_object('attributes','[is_gluten_free,is_lactose_free,is_new]') where properties ? 'attributes' returning *;
UPDATE users SET notifications = jsonb_path_query_array(notifications, '$ ? (@.notification_id != "<notification_id_for_removal>")') WHERE user_id = "<user_uuid4>";

Indexing

json and jsonb columns support indexing, which can be a significant advantage during query execution.

Popular choices are GIN, B-tree and Hash.

Not all index types support all operator classes.

B-tree indexes are the most common index type in relational databases.

PostgreSQL B-Tree indexes are multi-level tree structures, where each level of the tree can be used as a doubly linked list of pages. Each leaf page contains tuples that point to table rows.

If you index an entire JSONB column with a B-tree index, the ONLY useful operators are “=”, <, <=, >, >= (GIN indexes don’t support these operators).

If you only have queries with the “=” operator, then Hash indexes become useful. Hash indexes also tend to be smaller in size, than B-tree or GIN indexes.

-- Creating B-Tree and Hash indexes on JSON column data types
create index val3_idx on test1_jsonb((val ->> 'val3')); -- btree index
create index test_idx on test1_jsonb( lowercase(val) );
create index idxcustcode on customers((details -> 'address' ->> 'postcode'));
create index idx_users_age on users using btree( ((data->>'age')::int ));
create index ind_prd_hash on orders using hash((product->>'name'));

GIN — Generalized Inverted Index — useful for data types that have multiple values in a column, like Arrays, Range types, JSONB, Hstore — key/value pairs.

Works on @> and <@ and &&.

The primary goal of GIN indices is support for highly scalable, full-text search in PostgreSQL. GIN indexes can be used to efficiently search for keys or key/value pairs occurring within a large number of jsonb documents.

An inverted index will store a set of (key, posting list) pairs, where the posting list is a set of heap rows in which the key occurs.

GIN index supports @@ and @? operators, which perform jsonpath matching.

GIN “operator classes”:

  • (default) jsonb_ops supports queries with top-level key-exists. Creates multiple entries for each key, so requires more storage, comparatively less performant. Can use ?, ?|, ?&, @>, @@, @?
  • jsonb_path_ops supports indexing the @> operator only. Supports ONLY queries with the @>, @@ and @? operators. Creates a single entry for the whole path, so requires less storage, comparatively more performant. No index entries for JSON structures with empty values.

GIN indexes support the “gin_trgm_ops” class that can be used to index the data in JSONB. JSONB has great built-in support for indexing arrays.

-- SQL queries to create GIN indexes on JSON data types
create index jsonb_idx_test1 on test1_jsonb using GIN (val jsonb_ops);
create index jsonb_idx_path_test1 on test1_jsonb using GIN (val jsonb_path_ops);
create index jsonb_idx_val4_test1 on test1_jsonb using GIN (( docdb -> 'tags' ) );
create index jsonb_idx_val1_test1 on test1_jsonb using GIN (( val ->> 'val1' ) gin_trgm_ops) ;
CREATE INDEX words_idx ON words USING GIN (word gin_trgm_ops);

select * from test1_jsonb where val -> 'val4' ? 'test1' ;
select * from test1_jsonb where (val ->> 'val1')::text ~ 'test86391' ;

explain (analyze, buffers) select * from test1_jsonb where (val ->> 'val1')::text ~ 'test345' ;

jsonpath

jsonpath type implements support for the SQL/JSON path language in PostgreSQL to efficiently query JSON data.

  • . Dot is used for member access.
  • [] Square brackets are used for array access.
  • $ A variable representing the JSON value being queried (the context item).
  • $varname A named variable. Its value can be set by the parameter vars of several JSON processing functions.
  • @ A variable representing the result of path evaluation in filter expressions.

JsQuery extension

JsQuery is PostgreSQL extension that requires PostgreSQL 9.4 or higher.

psql DB -c "CREATE EXTENSION jsquery;"
CREATE TABLE js ( id serial, data jsonb,
CHECK (data @@ '
name IS STRING AND
similar_ids.#: IS NUMERIC AND
points.#:(x IS NUMERIC AND y IS NUMERIC)'::jsquery));
SELECT gin_debug_query_path_value('x = 1 AND (*.y = 1 OR y = 2)');
SELECT gin_debug_query_value_path('x = 1 AND (*.y = 1 OR y = 2)');

Summary

PostgreSQL has great support for JSON and JSONB datatypes, since 2012.

Prefer PostgreSQL JSON and JSONB datatypes if your application needs full-text search.

You may prefer storing JSON documents in RDBMS like Postgres, if your dataset has nested objects in some applications, it is easier to handle them in JSON instead of denormalizing the data into columns or multiple tables.

Consider limiting JSON documents to a manageable size to decrease row lock contention among DML transactions.

Whenever possible, try to create many separate JSONB columns rather than a single larger column.

Index the column(s) only what you need to index, not necessarily the whole JSONB column.

--

--

Satya Thirumani

Cloud DBA.. and Data Science/Machine Learning Enthusiast..