Modeling Status Updates on PostgreSQL using JSON

Many applications require records to keep a status (‘complete’, ‘cancelled’, ‘in progress’, etc). Usually, there is also an associated traceability constraint: we need to know when the status was modified and who modified it.

Traditionally, this would be accomplished using 2 tables: a “base table” for the object’s data and a “status table” for status updates. However, taking advantage of PostgreSQL’s support of JSON, there is a better alternative.

The Traditional Route

Let’s try to model an example using the “2 tables” approach.

We can imagine a ticket system with tables like these:

Case 1: Getting a ticket’s full history

SELECT *
FROM status_updates
WHERE ticket_id = 99;

With a very simple query, we are able to get the complete ticket’s history.

Case 2: Getting a ticket’s current status

  SELECT status 
FROM status_updates
WHERE ticket_id = 99
ORDER BY created_at DESC LIMIT 1;

Getting a ticket’s current status requires sorting through all status_updates with a matching ticket_id and choosing the most recent one.

This query is not too hard to figure out, and as long as we keep an index on ticket_id, it will perform well.

Case 3: Getting all tickets with status OPEN

WITH tickets_with_status AS (
SELECT DISTINCT ON(ticket_id)
ticket_id, status
FROM status_updates
ORDER BY ticket_id, created_at DESC
)
SELECT *
FROM tickets_with_status
WHERE status = 'OPEN';

This is where things get a bit more complicated. The problem is that we need to execute 2 separate steps:

  1. Figure out each ticket’s current status by looking at all status_updates and choosing the most recent one .
  2. Filter based on the status we’re looking for, ie: OPEN.

Not only is this query a bit harder to figure out, it will also perform badly, since there is no way to index it.

The JSON route

Taking advantage of PostgreSQL’s support for JSON, we can model the same ticket system using a single table.

Creating the tickets table

CREATE TABLE tickets (
id SERIAL PRIMARY KEY,
description TEXT,
reporter INTEGER NOT NULL REFERENCES users(id),
assignee INTEGER REFERENCES users(id),
status_updates JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL
);

Notice how the column status_updates has type JSONB (JSONB is a more efficient version of JSON).

Creating a new Ticket

There are several ways to create JSON objects in PostgreSQL. The easiest way is to construct a string and then cast it to JSON using ::json.

'[{"status": "OPEN", "author": 3}]'::json

However, since we need a dynamic value in our JSON, ie: NOW(), we’re going to use the slightly more verbose json_build_array and json_build_object.

INSERT INTO tickets (
description, reporter, status_updates, created_at
)
VALUES (
'Something went wrong.',
3,
jsonb_build_array(
jsonb_build_object(
'status', 'OPEN',
'author', 3,
'created_at', NOW()
)
),
NOW()
);

The status_updates value for our new ticket should look like this:

[{
"status": "OPEN",
"author": 3,
"created_at": "2017-02-08T00:10:15.288522-05:00"
}]

Adding a new Status Update

UPDATE tickets
SET status_updates = jsonb_build_array(
jsonb_build_object(
'status', 'CLOSED',
'author', 2,
'created_at', NOW()
)
) || status_updates
WHERE id = 1;

Notice the use of PostgreSQL’s concatenation operator || to insert a new value to the start of the existing array.

After this update, the status_updates value for our new ticket should look like this:

[
{
"status": "CLOSED",
"author": 2,
"created_at": "2017-02-08T00:40:09.926089-05:00"
},
{
"status": "OPEN",
"author": 3,
"created_at": "2017-02-08T00:39:56.437954-05:00"
}
]

Case 1: Getting a ticket’s full history

SELECT status_updates FROM tickets WHERE id = 99;

Since the complete history is in the status_updates column, fetching it is pretty straightforward.

Case 2: Getting a ticket’s current status

If you follow the rule of always inserting new status updates to the start of the status_updates array, you can use this simple query:

SELECT status_updates->0->>'status'
FROM tickets
WHERE id = 99;

Case 3: Getting all tickets with status OPEN

SELECT id
FROM tickets
WHERE status_updates->0->>'status' = 'OPEN';

This is where JSON shines. Not only is this query super easy to write, it is also easy to index (PostgreSQL support indexes on expressions).

CREATE INDEX current_status_idx
ON tickets ((status_updates->0->>'status'));

Bonus: Adding constraints

Constraints allow us to prevent data corruption and keep our data clean. JSON data types should not be exempted from using them.

Since we’re expecting 3 values inside each status update (status, author, and created_at), let’s add some constraints for these fields.

ALTER TABLE tickets
-- make sure status has a valid string
ADD CONSTRAINT status_is_valid CHECK(COALESCE(
status_updates->0->'status' ?| ARRAY[
'OPEN', 'CLOSED', 'WAITING_USER_FEEDBACK'
], false
)),
-- make sure author is a number
ADD CONSTRAINT author_is_number CHECK(COALESCE(
jsonb_typeof(status_updates->0->'author') = 'number',
false
)),
-- make sure created_at is a timestamptz
ADD CONSTRAINT created_at_is_timestamptz CHECK(COALESCE(
status_updates->0->>'created_at' ~
'^\d{4}-\d{2}-\d{2}T\d{2}\:\d{2}\:\d{2}\.\d{6}[+-]\d{2}:\d{2}$',
false
));

Also, you might have noticed there is a big IF in the above queries:

If you follow the rule of always inserting new status updates to the start of the status_updates array …

As long as you insert using SET status_updates = <new_status> || status_updates this condition will hold. However, let’s just add one more check in case we accidentally try inserting to the end of the array.

ALTER TABLE tickets
ADD CONSTRAINT is_unshift CHECK(
-- check the array has only one element, or
jsonb_array_length(status_updates) = 1 OR COALESCE(
  -- check the N-1 element has a more recent created_at than the N
-- element.
  -- Since dates are compared as the amount of time passed since
-- epoch, a more recent date is 'bigger' than an older date
  (status_updates -> jsonb_array_length(status_updates) - 2 ->>
'created_at')::timestamptz
>
(status_updates -> jsonb_array_length(status_updates) - 1 ->>
'created_at' )::timestamptz
  , false)
);

Conclusion

If you are willing to look past the 1NF theory, you can take advantage of the simplicity and performance boosts that PostgreSQL’s JSON types can offer.

Status updates are a very good fit for JSON types since they’re rarely useful outside the context of the original object.

Can you think of other scenarios where JSON types are useful? Have you solved the “status updates” problem in a different way? Let me know in the comments.