Better API Design with postgres JSON

Using postgres jsonb data type for storing events

Published in

atulverma

5 min readFeb 9, 2020

It is no small feat that RDBMS have stood the test of time and have gotten better every year. It was around 2011 when i first used a nosql database on a real production system. The database (mongodb) was development friendly — easy to setup and easy to use. It supported JSON as first class objects.

It fit in very well with our REST microservices architecture. The apis were consuming json requests and returning json objects or collections. JSON and javascript was becoming the goto stack.

Over time as we used mongo it always felt something was missing. Was it better query support, indexes constraints or maybe it was just familiarity with writing sql queries in console and debugging easily.

So in another project when there was a requirement to store json objects with somewhat similar structure, and also to build basic event timeline and dashboards, i really wanted to try postgres with its json capabilities.

Before we move on want to mention that this article is not about event sourcing!

JSON and JSONB datatypes

Postgres gives two json datatypes: json and jsonb. The json type stores data as exact json input text and preserves order of object keys. In contrast jsonb is stored “in a decomposed binary format” and does not preserve whitespace or order of keys. jsonb is the recommended way to store data as it gives better performance. From postgres documentation

“In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialised needs, such as legacy assumptions about ordering of object keys”

Use case: Storing patient timeline events

The application we were building was centered around treating mental health patients using CBT. Patients could enroll and do online guided courses. It involved storing lots of data to track patient activity and their progress. Being able to present a timeline view of patient journey was an important product and usability requirement.

Schema Design

knex.schema.createTable('patient_event', table => {
  table.increments('id').primary();
  table.integer('authorId');
  table.integer('patientId');
  table.string('type').notNullable();
  table.jsonb('context').notNullable();
  table.jsonb('content').notNullable();
  table.boolean('showInTimeline').notNullable().defaultTo(false);
});// knex is the node.js library for db migrations and queries

Patient Events

An event patient can be generated when an action is done on or by a patient in the system. The event can be either generated automatically in background as a side effect of an action e.g patient status change.Or it can be generated manually in case of adding a note for a patient or sending a messages etc. Every event must be associated with a single patient

Event type
Each event has an event type identifier stored in table. This allows to group or filter events easily

Context (jsonb)

Event context json blob contains metadata about the event. Any information needed now or later to interpret the event. Context would have attributes needed to easily render the event on frontend timeline. We wanted events to be independent self contained entities and not dependent on any other service calls.
e.g. if patient completed a diagnostic questionnaire, there would be an event with type “QUESTIONNAIRE COMPLETED”. Context would have questionnaire name, questionnaire id, patient symptoms at the time etc.

Important information like who is the treating therapist at the time event occurs could be stored in context. All this information is good for reporting later on.

Events timeline would be shown at multiple places on frontend. At this time calling multiple services to make sense of the event would increase latency and is probably a bad frontend experience.It made more sense to keep more than just IDs in context. In microservices architecture services sometimes become very chatty and we wanted to avoid that. We also dint want to write database migrations to associate more contextual data later so the idea was to keep as much info as possible with the events from the start.

Content (jsonb)

Content is the actual event data. It contains data specific to the event type. e.g. For events related to taking patient notes content would contain:

content: {
 headline: { text: ‘Weekly checkup’ }, // can be html or text
 body: ‘notes-rich-text’,
 activity-tags: [‘progress’, ‘weekly-checkup’, …], 
}

For a simple event like patient finishes an online lesson which triggers an event in system:

content: {
  headline: [{
      text: ‘patient-first-name patient-last-name finished lesson-      name in course xyz’
  }]
}

API Design

We had initially thought that for creating patient notes we would have multiple api endpoints one for each type of note. We had also imagined that we would create a timeline service or endpoints which would have to mashup data from multiple sources in a linear and chronological sequence to be able to render it on frontend.

But what we ended up with was:

One simple `PUT /notes` api to allow therapists to enter patient notes for different cases — catchup, symptoms, risk, change of contact time, etc
and another `GET /timeline` api to view timeline based on filters.

Request json was a structured json which we could just dump in our json content blob in the table. But we could now decorate the content with more metadata and context that we wanted.

We had multiple event handlers which could be invoked either with incoming decorated request or could be invoked in background as a side effect of any status change or patient activity. It became simpler to develop and maintain multiple events handlers. The database schema could accommodate all the scenarios we had at least for now. With one table as patient event source it became easy to query and filter data.

It also helped us a great deal with simple reporting we needed. We had pretty decent visual dashboards to track how the patients were doing in a course over a time period.

e.g. How many patients completed a lesson in last week or month. How many patients had elevated risk in last week or month. How many patients a therapist has treated

# folder structure/event-handlers
 |
 | — -on-lesson-completed
 | — -on-status-changed
 | — -on-risk-changed
 | and more . . .

Indexes

Postgres allows indexing jsonb column using GIN or usual BTREE indexes. Its recommended to have BTREE indexes on specific json attributes if possible for better performance both for index creation/updates and query lookups. GIN indexes are really powerful but the use case should fit the need, as there is an overhead of creation and updation and the size of index.

Up until now we dint really need to use indexes in the application and we would wait till the data grows larger and there is a real need to create indexes. Event timeline lookup is filtered by `patientId` or `createdAt` timestamp in most queries.

So thats a brief overview of how we leveraged postgres jsonb data type to our advantage all the while retaining the option to be able to write sql queries. The experience was really good and I would probably be confident to use postgres than an all out nosql in future too.