There is no right way to store FHIR®

Nick Hatt
Published in fhirbase dojo
Oct 16, 2018 · 3 min read


If you’re wondering how to design a persistence layer behind HL7 FHIR®, you may get wildly different, opinionated answers. This post outlines some things I consider first principles: things that will always be true and that can help put your mind at ease so you can develop the next killer digital health product.

Principle #1 Don’t Panic

First, don’t panic. Some of the world’s largest FHIR® deployments are running MUMPS. Literally any choice you make about data storage can be made to both work and scale. If that was all you were worried about, go build something cool.

FHIR®, like any other standard, is a one-size-fits-most solution. Data storage is always a one-size-fits-you solution. The problem is you don’t know anything about your product when first starting. Projects like HAPI FHIR use an ORM, so you don’t ever have to worry about the database if you don’t want to. Real problems arise with scale. Most modern databases are blazingly fast and can tolerate some poor design up to many millions of rows/documents/whatevers.

Principle #2 Make the data close to the database primitives

FHIR® has design features that do not scale well under certain conditions. A perfect example of this is Extensions. Extensions mean that FHIR® may be extended nearly infinitely beyond the base spec. From a data model perspective, they follow an Entity-Attribute-Value (EAV) pattern. This is considered an anti-pattern for relational databases, and it’s easy to see why with a quick FHIR® example.

Patient does not include fields like race, ethnicity, or nationality for reasons explained in the Patient resource documentation. How would you go about designing a table to hold extensions? Do you have one table for all extensions across all resources? Do you make an extensions table for each resource? Do you store extensions in the Patient table, as something like a jsonb datatype in Postgres? Do you just pluck out the extensions you care about and turn them into columns? Aidbox/Health Samurai have a great idea, one that makes sense both for Postgres and in general, that they call first-class extensions. Essentially they add a “race” property to Patient.

As you can imagine, each of those approaches has tradeoffs. For example, here is a great writeup of jsonb and the associated tradeoffs. Naturally, these tradeoffs will change as the number of rows in your database grows.
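To make the "pluck out the extensions you care about" option concrete, here is a minimal Python sketch that lifts a race extension out of the generic extension list and into a top-level property, in the spirit of first-class extensions. The function name `promote_race`, the top-level `race` key, and the use of the US Core race extension URL are illustrative assumptions, not Aidbox's actual format.

```python
import copy

# Assumption: the extension we care about is the US Core race extension.
RACE_URL = "http://hl7.org/fhir/us/core/StructureDefinition/us-core-race"


def promote_race(patient: dict) -> dict:
    """Return a copy of `patient` with the race extension lifted to a top-level 'race' key."""
    result = copy.deepcopy(patient)
    remaining = []
    for ext in result.get("extension", []):
        if ext.get("url") == RACE_URL:
            # Keep only the human-readable 'text' sub-extension in this sketch.
            for sub in ext.get("extension", []):
                if sub.get("url") == "text":
                    result["race"] = sub.get("valueString")
        else:
            remaining.append(ext)
    # Leave any extensions we did not promote where they were.
    if remaining:
        result["extension"] = remaining
    else:
        result.pop("extension", None)
    return result
```

A promoted property like this maps directly onto a column or a jsonb key you can index, at the cost of having to reverse the transformation when you serve the resource back out as standard FHIR®.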

Principle #3 Database logic is application logic

Inherent in every SQL statement, every MongoDB query, and every Redis command is some level of logic. Every application development team must ask the question:

How much of our business logic should be maintained in the database?

When storing FHIR®, however, you need to consider what logic external sources of data have baked into their resources. Many resources, such as MedicationRequest, have status properties to convey a state. A system receiving MedicationRequests may need to trigger certain actions based on the status of the resource. Regardless of where you define these triggers, you should sanitize data coming in.

FHIR® is an exchange standard. This means that third parties will be sending you data. The old mantra "garbage in, garbage out" applies. It is essential that you protect your database from bad values or potentially broken logic. In a relational database, there are abundant tools for this: datatypes, foreign keys, and constraints.
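As a rough illustration of those tools, the sketch below uses Python's built-in `sqlite3` module to reject a MedicationRequest row with an unknown status or a dangling patient reference. The table layout and the helper names `make_db` and `try_insert` are assumptions made for this example, and the status list is taken from the FHIR® R4 MedicationRequest status value set, so check it against the FHIR® version you actually receive.

```python
import sqlite3

# Status codes from the MedicationRequest status value set (assumed: FHIR R4).
STATUSES = ("active", "on-hold", "cancelled", "completed",
            "entered-in-error", "stopped", "draft", "unknown")


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a CHECK constraint and a foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE patient (id TEXT PRIMARY KEY)")
    allowed = ", ".join(f"'{s}'" for s in STATUSES)
    conn.execute(f"""
        CREATE TABLE medication_request (
            id         TEXT PRIMARY KEY,
            status     TEXT NOT NULL CHECK (status IN ({allowed})),
            patient_id TEXT NOT NULL REFERENCES patient(id)
        )""")
    return conn


def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    """Insert (id, status, patient_id); return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO medication_request VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, a row whose status is not in the value set, or whose `patient_id` does not match a known patient, never reaches the table, no matter which application code path tried to write it.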

Conclusion

You should avoid premature optimization when it comes to backing any interoperability solution with a data store.

As you dig deeper into FHIR® and integrate with more parties, you’ll find more edge cases that are difficult to store. Do you have a reference that points to a contained resource? How about a reference that points to a resource you don’t even know exists because of version skew? Reasoning about each of these new cases and applying these principles can help you end up with a healthy database over time.
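For the contained-resource case, FHIR® uses a local reference of the form `#id` that points into the parent resource's `contained` array. A minimal Python sketch of resolving one follows; the function name `resolve_contained` and the choice to return `None` for anything it cannot resolve are assumptions for illustration.

```python
from typing import Optional


def resolve_contained(resource: dict, reference: str) -> Optional[dict]:
    """Resolve a local '#id' reference against resource['contained'].

    Returns None when the reference is not local or no contained resource matches.
    """
    if not reference.startswith("#"):
        return None  # not a contained reference; it needs a lookup elsewhere
    local_id = reference[1:]
    for item in resource.get("contained", []):
        if item.get("id") == local_id:
            return item
    return None
```

Whatever storage layout you choose has to decide whether such a contained resource stays embedded in its parent row or document, since it has no identity of its own outside that parent.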
