Modeling Status Updates on PostgreSQL using JSON
--
Many applications require records to keep a status (‘complete’, ‘cancelled’, ‘in progress’, etc). Usually, there is also an associated traceability constraint: we need to know when the status was modified and who modified it.
Traditionally, this would be accomplished using 2 tables: a “base table” for the object’s data and a “status table” for status updates. However, taking advantage of PostgreSQL’s support of JSON, there is a better alternative.
The Traditional Route
Let’s try to model an example using the “2 tables” approach.
We can imagine a ticket system with tables like these:
Case 1: Getting a ticket’s full history
SELECT *
FROM status_updates
WHERE ticket_id = 99;
With a very simple query, we are able to get the complete ticket’s history.
Case 2: Getting a ticket’s current status
SELECT status
FROM status_updates
WHERE ticket_id = 99
ORDER BY created_at DESC LIMIT 1;
Getting a ticket’s current status requires sorting through all status_updates
with a matching ticket_id
and choosing the most recent one.
This query is not too hard to figure out, and as long as we keep an index on ticket_id
, it will perform well.
Case 3: Getting all tickets with status OPEN
WITH tickets_with_status AS (
SELECT DISTINCT ON(ticket_id)
ticket_id, status
FROM status_updates
ORDER BY ticket_id, created_at DESC
)
SELECT *
FROM tickets_with_status
WHERE status = 'OPEN';
This is where things get a bit more complicated. The problem is that we need to execute 2 separate steps:
- Figure out each ticket’s current status by looking at all
status_updates
and choosing the most recent one . - Filter based on the
status
we’re looking for, ie:OPEN
.
Not only is this query a bit harder to figure out, it will also perform badly, since there is no way to index it.
The JSON route
Taking advantage of PostgreSQL’s support for JSON, we can model the same ticket system using a single table.
Creating the tickets table
CREATE TABLE tickets (
id SERIAL PRIMARY KEY,
description TEXT,
reporter INTEGER NOT NULL REFERENCES users(id),
assignee INTEGER REFERENCES users(id),
status_updates JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL
);
Notice how the column status_updates
has type JSONB
(JSONB is a more efficient version of JSON).
Creating a new Ticket
There are several ways to create JSON objects in PostgreSQL. The easiest way is to construct a string and then cast it to JSON using ::json
.
'[{"status": "OPEN", "author": 3}]'::json
However, since we need a dynamic value in our JSON, ie: NOW()
, we’re going to use the slightly more verbose json_build_array
and json_build_object
.
INSERT INTO tickets (
description, reporter, status_updates, created_at
)
VALUES (
'Something went wrong.',
3,
jsonb_build_array(
jsonb_build_object(
'status', 'OPEN',
'author', 3,
'created_at', NOW()
)
),
NOW()
);
The status_updates
value for our new ticket should look like this:
[{
"status": "OPEN",
"author": 3,
"created_at": "2017-02-08T00:10:15.288522-05:00"
}]
Adding a new Status Update
UPDATE tickets
SET status_updates = jsonb_build_array(
jsonb_build_object(
'status', 'CLOSED',
'author', 2,
'created_at', NOW()
)
) || status_updates
WHERE id = 1;
Notice the use of PostgreSQL’s concatenation operator ||
to insert a new value to the start of the existing array.
After this update, the status_updates
value for our new ticket should look like this:
[
{
"status": "CLOSED",
"author": 2,
"created_at": "2017-02-08T00:40:09.926089-05:00"
},
{
"status": "OPEN",
"author": 3,
"created_at": "2017-02-08T00:39:56.437954-05:00"
}
]
Case 1: Getting a ticket’s full history
SELECT status_updates FROM tickets WHERE id = 99;
Since the complete history is in the status_updates
column, fetching it is pretty straightforward.
Case 2: Getting a ticket’s current status
If you follow the rule of always inserting new status updates to the start of the status_updates
array, you can use this simple query:
SELECT status_updates->0->>'status'
FROM tickets
WHERE id = 99;
Case 3: Getting all tickets with status OPEN
SELECT id
FROM tickets
WHERE status_updates->0->>'status' = 'OPEN';
This is where JSON shines. Not only is this query super easy to write, it is also easy to index (PostgreSQL support indexes on expressions).
CREATE INDEX current_status_idx
ON tickets ((status_updates->0->>'status'));
Bonus: Adding constraints
Constraints allow us to prevent data corruption and keep our data clean. JSON data types should not be exempted from using them.
Since we’re expecting 3 values inside each status update (status
, author
, and created_at
), let’s add some constraints for these fields.
ALTER TABLE tickets-- make sure status has a valid string
ADD CONSTRAINT status_is_valid CHECK(COALESCE(
status_updates->0->'status' ?| ARRAY[
'OPEN', 'CLOSED', 'WAITING_USER_FEEDBACK'
], false
)),-- make sure author is a number
ADD CONSTRAINT author_is_number CHECK(COALESCE(
jsonb_typeof(status_updates->0->'author') = 'number',
false
)),-- make sure created_at is a timestamptz
ADD CONSTRAINT created_at_is_timestamptz CHECK(COALESCE(
status_updates->0->>'created_at' ~
'^\d{4}-\d{2}-\d{2}T\d{2}\:\d{2}\:\d{2}\.\d{6}[+-]\d{2}:\d{2}$',
false
));
Also, you might have noticed there is a big IF in the above queries:
If you follow the rule of always inserting new status updates to the start of the
status_updates
array …
As long as you insert using SET status_updates = <new_status> || status_updates
this condition will hold. However, let’s just add one more check in case we accidentally try inserting to the end of the array.
ALTER TABLE tickets
ADD CONSTRAINT is_unshift CHECK(
-- check the array has only one element, or
jsonb_array_length(status_updates) = 1 OR COALESCE( -- check the N-1 element has a more recent created_at than the N
-- element. -- Since dates are compared as the amount of time passed since
-- epoch, a more recent date is 'bigger' than an older date (status_updates -> jsonb_array_length(status_updates) - 2 ->>
'created_at')::timestamptz
>
(status_updates -> jsonb_array_length(status_updates) - 1 ->>
'created_at' )::timestamptz , false)
);
Conclusion
If you are willing to look past the 1NF theory, you can take advantage of the simplicity and performance boosts that PostgreSQL’s JSON types can offer.
Status updates are a very good fit for JSON types since they’re rarely useful outside the context of the original object.
Can you think of other scenarios where JSON types are useful? Have you solved the “status updates” problem in a different way? Let me know in the comments.