Kaji: a general purpose FHIR clinical data repository

Ben Spencer
Mar 28, 2019 · 7 min read

火事 kaji: (n) fire; conflagration

Kaji is a general purpose clinical data repository (CDR) implementing a large portion of the FHIR STU3 spec. It differs from other implementations in the space in various ways:

  • general purpose: no specific use case envisaged
  • pragmatism: focus on developing features that are genuinely useful rather than blindly attempting to cover the entire standard
  • performance over deployment flexibility: use any database (as long as it’s Postgres)
  • data integrity above all else: clinical data must not be lost (even if, as is often the case in real life, it is inconsistent)
  • multi-tenanted: we want to be able to support multiple datasets on the same hardware, particularly for test deployments

We have deployed a sandbox instance at https://kaji.healthforge.io to allow the public to experiment with the server and run the Crucible test suite against it. We’ve also put the Docker image on Docker Hub (MIT license) for anyone who wants to deploy their own instance.

Technology stack

A requirement for the server was that it should run on the JVM, and this has influenced certain other decisions about the stack.

HAPI FHIR

HAPI FHIR is the Java reference implementation for the standard, so it made sense to start here, but it is not without its issues, described later. We use its FHIR data model, marshalling and validation capabilities, but not any of its REST frontend or JPA backend support.

Scala / Finagle

Finagle is a high-performance networking framework built by Twitter on top of the popular Netty asynchronous I/O library. It is written in Scala, but provides APIs that can be used from either Java or Scala. We prefer Scala as a language, mainly due to its reduced verbosity and great support for functional programming constructs.

PostgreSQL

PostgreSQL, as (accurately) described by its developers, is “the world’s most advanced open source database”. At its core is the relational model, but the developers don’t shy away from providing object or document database features where they are useful. PostgreSQL is particularly suitable as the backend to a FHIR server. This can be attributed to many of its advanced features, but jsonb is particularly prominent. I am not personally a fan of the JSON encoding of FHIR resources (it involves some ugly kludges that betray the underlying XML-centric data model), but the power of jsonb is sufficient to outweigh these concerns, so we use it as our primary storage format. The GIN index support is extremely useful and allows us to optimise a wide range of FHIR queries out of the box.
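To make the jsonb point a little more concrete, here is a minimal sketch of how a FHIR token search such as identifier=system|value could be turned into a jsonb containment query of the kind a GIN index can serve. It is written in Python purely for illustration (Kaji itself is Scala). The table name, its columns, the function name and the %s placeholder style are all assumptions made for this example, not Kaji's actual schema or query layer, and the token handling is simplified.

```python
import json

# Hypothetical schema, not Kaji's actual one: one row per resource, with the
# FHIR JSON held in a jsonb column and a GIN index to serve containment (@>).
SCHEMA = """
CREATE TABLE resource (
    tenant        text  NOT NULL,
    resource_type text  NOT NULL,
    id            text  NOT NULL,
    content       jsonb NOT NULL,
    PRIMARY KEY (tenant, resource_type, id)
);
CREATE INDEX resource_content_gin
    ON resource USING GIN (content jsonb_path_ops);
"""


def identifier_search(tenant, resource_type, token):
    """Translate a FHIR token search (system|value) into SQL plus parameters."""
    system, _, value = token.rpartition("|")
    match = {}
    if system:
        match["system"] = system
    if value:
        match["value"] = value
    # Containment: the stored document must include an identifier entry with
    # these fields; any extra fields in the stored entry are ignored.
    pattern = json.dumps({"identifier": [match]}, sort_keys=True)
    sql = (
        "SELECT content FROM resource "
        "WHERE tenant = %s AND resource_type = %s AND content @> %s::jsonb"
    )
    return sql, (tenant, resource_type, pattern)


sql, params = identifier_search("demo", "Patient", "http://example.org/mrn|12345")
print(sql)
print(params[2])
```

The appeal of this approach is that one index on the whole document covers many search parameters, rather than needing a dedicated index for each.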

Docker

We deploy Kaji as a Docker container based on the standard openjdk base image. Current production deployments run on Amazon ECS with an RDS backend and on Google Kubernetes Engine with a Cloud SQL backend.

We have found Kubernetes preferable to ECS for a variety of reasons, but Amazon’s EKS service was not available when we first deployed.

Issues

HAPI


HAPI (as a data model library) is not necessarily an ideal building block for a general purpose FHIR server. As a reference implementation, it is required to cater for all possible requirements, and as a result comes with a fair amount of baggage that we don’t need. A general purpose server isn’t that interested in the specifics of individual resources (except perhaps in some specialised cases such as audit records), so it is not particularly useful to have separate Java classes for each and every one (not to mention the fact that mutable classes with getters and setters don’t really fit our Scala development style). What is really needed is a library that implements the underlying FHIR data model at a lower level, supplemented with functionality for marshalling, validation and so on.
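As a rough illustration of what “lower level” could mean here, the sketch below treats every resource as a generic tree and walks it by path, with no per-resource classes. It is written in Python for brevity; the helper name values_at and the simplified dotted-path syntax are inventions for this example and are not part of HAPI or Kaji.

```python
def values_at(node, path):
    """Return every value reached by following a dotted path through a resource.

    Repeating elements (JSON arrays) are flattened at each step, so callers
    need not know which elements of a given resource type repeat.
    """
    nodes = [node]
    for name in path.split("."):
        found = []
        for current in nodes:
            if isinstance(current, dict) and name in current:
                child = current[name]
                found.extend(child if isinstance(child, list) else [child])
        nodes = found
    return nodes


patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [
        {"family": "Smith", "given": ["Alice", "Jane"]},
        {"family": "Jones", "given": ["Alice"]},
    ],
}

print(values_at(patient, "name.family"))  # ['Smith', 'Jones']
print(values_at(patient, "name.given"))   # ['Alice', 'Jane', 'Alice']
print(values_at(patient, "birthDate"))    # []
```

The same function works unchanged for any resource type, which is the property a general purpose server cares about.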

Scala

The Scala compiler is notoriously slow (it performs around 25 passes over the code), particularly when making heavy use of features such as implicits and macros. The codebase can be adapted to improve matters somewhat, but that in itself requires a fair amount of work for diminishing returns. Scala IDEs tend to be sluggish for the same reason. Personally I cope with this by working on machines with the latest Core i7s and 64 GB of RAM, but my colleagues insist on using MacBooks and suffer as a result.

Finagle

Finagle is an industrial-grade RPC framework that provides sometimes deceptively simple abstractions over the complexity and power of both its internals and those of the underlying netty library. This makes it easily approachable for new users, but abstractions tend to leak and eventually it becomes necessary to dig deeper. For example, consider the default client stack:

Figure 1. Finagle default client stack

Each of these components (filters) provides useful functionality that users tend to take for granted, until something goes wrong. Similarly, the default LocalScheduler just works out of the box, but a deeper understanding of its internals is necessary to achieve optimal performance.
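The filter idea can be sketched in a few lines. What follows is a toy, synchronous model in Python and not Finagle's API: real Finagle services return Futures, and the default client stack contains many more modules than shown here. The names and_then, retry_filter, stats_filter and flaky_backend are hypothetical.

```python
# A "service" is any callable taking a request and returning a response.
# A "filter" takes a request plus the next service and decides how to call it.


def and_then(filter_fn, service):
    """Wrap a service in a filter, yielding a new service."""
    return lambda request: filter_fn(request, service)


def retry_filter(attempts):
    """Retry the wrapped service when it raises ConnectionError."""
    def apply(request, service):
        for attempt in range(attempts):
            try:
                return service(request)
            except ConnectionError:
                if attempt == attempts - 1:
                    raise
    return apply


def stats_filter(stats):
    """Count requests and failures passing through this point in the stack."""
    def apply(request, service):
        stats["requests"] = stats.get("requests", 0) + 1
        try:
            return service(request)
        except Exception:
            stats["failures"] = stats.get("failures", 0) + 1
            raise
    return apply


def flaky_backend():
    """A stand-in backend that fails on its first call only."""
    calls = {"n": 0}

    def service(request):
        calls["n"] += 1
        if calls["n"] == 1:
            raise ConnectionError("connection reset")
        return f"200 OK for {request}"
    return service


stats = {}
client = and_then(
    retry_filter(attempts=2),
    and_then(stats_filter(stats), flaky_backend()),
)
print(client("GET /Patient/1"))  # 200 OK for GET /Patient/1
print(stats)                     # {'requests': 2, 'failures': 1}
```

The caller sees a single successful request, while the retry layer has quietly absorbed a failure that only the metrics reveal. That is the convenience, and also the leak, described above.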

When things go wrong with Finagle (and we have seen some strange interactions with other low-level networking software such as Amazon’s ELBs), it can be difficult to debug. Error messages are often unhelpful, and you need both access to and a deep understanding of the vast wealth of metrics that the framework provides.

PostgreSQL

A common complaint about Postgres is that it is not as easy to scale horizontally as more recently developed distributed databases. It would be possible to implement manual sharding of the database in Kaji, but given the availability of vertical scaling (for example, AWS offers huge RDS servers such as the db.x1e.32xlarge), I don’t think this is a particularly high priority.
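For illustration only, manual sharding could be as simple as routing each tenant to one of a fixed set of databases by hashing its name. Kaji does not implement this; the connection strings and the function name below are hypothetical, and the sketch is in Python rather than Scala for brevity.

```python
import hashlib

# Hypothetical connection strings; Kaji does not currently shard.
SHARDS = [
    "postgresql://db-0.example.internal/kaji",
    "postgresql://db-1.example.internal/kaji",
    "postgresql://db-2.example.internal/kaji",
]


def shard_for(tenant, shards=SHARDS):
    """Pick a shard from a stable hash of the tenant name.

    hashlib is used rather than the built-in hash(), which is randomised
    per process for strings and so would route inconsistently.
    """
    digest = hashlib.sha256(tenant.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


print(shard_for("demo") == shard_for("demo"))  # True
```

Simple modulo routing like this remaps most tenants whenever the number of shards changes, which is part of why manual sharding is more work than it first appears.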

Case study: Vision Coach

Vision Coach is a novel digital platform for managing patients with diabetic macular edema (DME), a form of diabetic eye disease. You can read a detailed description of it here. In short, the platform consists of a mobile app for patients, built using React Native for iOS and Android, and a web app for the ophthalmologists who manage them, built using React. The main components are summarised in Figure 2.

Figure 2. Vision Coach platform.

We took a FHIR-first approach to the design of the Vision Coach data model. Since interoperability in some form is an almost inevitable requirement for any healthcare app — for example, integration with other apps and tools, or even compliance with legal requirements such as the GDPR right to data portability — we regard this approach as justified. In the case of Vision Coach, an obvious early manifestation of this requirement was the need to avoid duplicating data entry, something busy clinicians simply don’t have time for.

Being forced to align your model with an interoperable standard from the start can save a lot of effort down the line when these requirements hit. It’s not a panacea: FHIR is merely a platform standard, and there may well be some mapping work to do to comply with a target profile, but it’s a good starting point. The alternative is a custom database designed purely to meet the (initial) requirements of the app. This is certainly a lot faster to get going with, but it’s short-sighted, not only for interoperability reasons, but because requirements can and do change. Sure, you can migrate the data model as new requirements arise, but using a general-purpose CDR like Kaji from the start obviates this need in important ways. It’s also worth noting that (especially for newcomers to the healthcare domain), FHIR provides a baseline data model that is likely to be more in line with existing practice than something drawn up from a blank slate.

Figure 3 shows how the Vision Coach data model maps onto a selection of FHIR resources.

Figure 3. Vision Coach data model mapping to FHIR

As you can see, there is not a 1:1 mapping between entities from the UI point of view and resources in the FHIR database. I think this is likely to be a common problem in FHIR-capable apps and is another good reason to design the mapping up-front.
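As a hypothetical illustration of a mapping that is not 1:1 (the actual Vision Coach mapping is the one in Figure 3 and is not reproduced here), suppose an app records a single vision test entry with a score for each eye. One such entry could fan out into two Observation resources. The entry shape, its field names, the "letters" unit and the function name are all invented for this sketch, and the coding is left as free text rather than guessing at terminology codes.

```python
def vision_test_to_observations(entry):
    """Map one hypothetical UI entry to one FHIR Observation per eye tested."""
    observations = []
    for eye in ("left", "right"):
        if eye not in entry["scores"]:
            continue  # an eye that was not tested produces no resource
        observations.append({
            "resourceType": "Observation",
            "status": "final",
            "code": {"text": "Visual acuity"},
            "bodySite": {"text": f"{eye.capitalize()} eye"},
            "subject": {"reference": f"Patient/{entry['patient_id']}"},
            "effectiveDateTime": entry["taken_at"],
            # Assumed unit, for illustration only.
            "valueQuantity": {"value": entry["scores"][eye], "unit": "letters"},
        })
    return observations


entry = {
    "patient_id": "123",
    "taken_at": "2019-03-28T09:30:00Z",
    "scores": {"left": 70, "right": 65},
}
for observation in vision_test_to_observations(entry):
    print(observation["bodySite"]["text"], observation["valueQuantity"]["value"])
```

Going the other way, the UI has to regroup the two resources into one entry, which is why it pays to decide on the mapping before the data starts accumulating.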

Future work

There are three obvious areas for future work on the project:

FHIR R4

Kaji currently implements FHIR STU3. It would definitely make sense to add support for R4, which is the current version of the standard and includes normative content.

Missing features

FHIR is a large standard, not typically intended to be implemented in its entirety outside reference implementations. We are missing a few broad areas which would be useful, such as transactions.

Beyond the base standard

There are various extensions to, profiles on, and draft additions to the base standard that would be useful to implement. For example: SMART on FHIR, various IHE standards, Structured Data Capture and GraphQL. It may also be useful to provide access to the same underlying clinical data using non-FHIR standards such as DICOMweb, CTS2, LDAP / IHE HPD and openEHR.

Watch this space for more updates and releases over the course of 2019.


Healthforge makes tools for teams building healthcare software and makes bespoke software for healthcare enterprises.
