Using Fybrik to create a privacy-aware framework to access FHIR data

Published in

fybrik

6 min read · Apr 14, 2022

As the COVID-19 pandemic has shown, the ability to exchange healthcare data, both for research and for patient care, is critical. In fact, the push for interoperability of healthcare data started well before the pandemic; in the US, the Office of the National Coordinator for Health Information Technology (ONC) has been advocating the use of Fast Healthcare Interoperability Resources (FHIR) since the mid-2010s. Subsequently, the ONC’s “Cures Act Final Rule” mandates that certain developers of health IT must provide a certified FHIR API to their customer base by December 31, 2022.

While the sharing of healthcare data has tremendous potential for improved patient care and research, the last several years have given rise to an increased emphasis on the protection of personal digital data. For example, the European Union’s General Data Protection Regulation (GDPR) went into effect in 2018, and strictly defines rules on the access, storage and transfer of personal data.

FHIR, however, was not initially developed with patient privacy in mind. It was only with Release 4 that the FHIR standard began to mandate security and privacy requirements.

The FHIR standard creates a comprehensive model of the healthcare environment through a collection of resources, where a FHIR resource is the smallest data package that can be exchanged electronically. Currently, there are around 145 different resources defined (such as Patient and Observation), and each resource is described by a collection of attributes, such as id, first name and last name. A more recent addition to the FHIR standard defines five groups of tags to be associated with resources, one of which expresses the level of security associated with the data in that resource.

Originally designed as an open, free and standards-based API to allow healthcare applications to be launched from Electronic Health Records (EHR) systems, SMART on FHIR has been gaining popularity as a means of implementing the security and privacy requirements of FHIR. SMART, however, can only enforce privacy requirements at the FHIR resource level, although it does support enforcement for a given access type (i.e. read or write). For example, in SMART, specifying “Observation.read” would allow read access to the entire contents of Observation resources. This “all or nothing” access to FHIR resources does not allow for more fine-grained, attribute-level access to data, such as redaction of the sensitive “subject” attribute in the Observation resource, which would reveal patient names. Consequently, it restricts the ability to provide on-the-fly anonymization of data from restricted resources, and it does not support more general policy-driven data rules, for example, restricting EU health data from being exported outside of the EU. Additionally, the implementation of SMART requires support from the EHR vendor, which can rule out older legacy systems in hospitals.

In the EU-sponsored H2020 project, HEIR, we have taken a different approach to protecting the privacy of FHIR data. We created a privacy-aware framework built on top of the open-source project Fybrik. Extending Fybrik with a custom FybrikModule designed around HEIR requirements, we created a prototype application that allows third-party researchers to execute ad-hoc queries on a hospital’s FHIR server, subject to policy-defined data constraints.

Leveraging Fybrik’s Policy Manager connector, we use Open Policy Agent as our data governance engine and the Rego language to define HEIR’s fine-grained, powerful policy rules for the protection of FHIR data. These rules can take into account virtually any type of access requirements, such as the geographical locations of the data requester and store, and the intended use of the data.
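To make this kind of rule concrete, here is a minimal sketch written in Python rather than Rego, purely for illustration. It is not HEIR's actual policy. The field names, the region list and the deny-by-default behaviour are all assumptions; only the idea of deciding on requester location, data location and intent of use comes from the text.

```python
# Hypothetical, illustrative subset of EU country codes (not a complete list).
EU_REGIONS = {"DE", "FR", "IT", "ES", "NL"}

# Hypothetical set of intents that the hospital is assumed to accept.
ALLOWED_INTENTS = {"research"}


def decide(request: dict) -> dict:
    """Return an allow/deny decision for one access request.

    `request` is an assumed structure with three keys:
    data_location, requester_location and intent.
    """
    data_in_eu = request["data_location"] in EU_REGIONS
    requester_in_eu = request["requester_location"] in EU_REGIONS

    # Example rule from the text: EU health data must not be exported outside the EU.
    if data_in_eu and not requester_in_eu:
        return {"allow": False, "reason": "EU data may not leave the EU"}

    # Assumed rule: deny any intent of use that is not explicitly listed.
    if request["intent"] not in ALLOWED_INTENTS:
        return {"allow": False, "reason": "intent of use is not permitted"}

    return {"allow": True, "reason": "request satisfies all rules"}
```

In a real deployment these conditions would live in Rego and be evaluated by Open Policy Agent, so that changing them does not require touching application code.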

The idea behind Fybrik

The idea behind Fybrik is to create a secured, controlled, data path between the data producer/consumer and the data sink/source, abstracting away from the user details such as key management for connectivity to the data, format translations between the data source and destination and governance rules. Using Fybrik, all accesses to a data source must go through an automatically configured Fybrik data path endpoint, which is secured using Kubernetes and Istio. Based on policies defined by the hospital Data Governance Officer, Fybrik will build the data path that will automatically perform the required redaction operations on the data.

Typically, there would be a number of personas involved in the configuration and deployment of the Fybrik environment, including:

The hospital’s Data Governance Officer, who is responsible for adherence to policies and regulations mandating the use and retention of healthcare data. The Data Governance Officer will define the data policies in a policy manager describing how data can be used. This officer will also register the hospital’s data sources in the hospital’s data catalog, defining sensitive fields in the data’s schema.
The hospital’s IT Administrator, who is responsible for the Kubernetes cluster in the hospital running Fybrik, as well as the deployment of Fybrik Modules and Fybrik Applications on behalf of external requesters.
The data requester, typically a data scientist who belongs to an organization outside of the hospital and wants to obtain portions of the hospital’s FHIR data for analysis through standard FHIR queries.

The Fybrik Control Plane automatically creates the required data path based on declarative input files, which include:

A description of the data access policy, typically defined by the hospital’s Data Governance Officer.
A request for access from a data requester, which indicates, among other parameters, the required data source, the name of the requesting organization, and the intent of use for the data.
A classification of the data source, typically assigning tags such as “PII” (Personally Identifiable Information) to attribute values within that data source. For example, using the hospital’s FHIR server as a data source, we can tag attributes as PII, either globally (e.g., “id in all resources”) or at a resource level (“id in Observations”). These tags can subsequently be used in the data access policy rules, e.g., “redact all PII values if requester intent is ‘research’”. This would typically be performed by the hospital’s Data Governance Officer and would be persisted in the hospital’s data catalog.
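The interaction between classification tags and a policy rule can be sketched as follows. This is a simplified Python illustration, not Fybrik's catalog format: the `CATALOG_TAGS` structure, the `"*"` key for global tags and the function names are hypothetical. The one rule modelled is the example from the text, “redact all PII values if requester intent is ‘research’”; what should happen for other intents is not covered here.

```python
# Hypothetical tag catalog. The "*" key holds global tags ("id in all resources");
# other keys hold resource-level tags ("subject in Observations").
CATALOG_TAGS = {
    "*": {"id": {"PII"}},
    "Observation": {"subject": {"PII"}},
}


def tags_for(resource_type: str, attribute: str) -> set:
    """Combine the global and resource-level tags for one attribute."""
    global_tags = CATALOG_TAGS.get("*", {}).get(attribute, set())
    local_tags = CATALOG_TAGS.get(resource_type, {}).get(attribute, set())
    return global_tags | local_tags


def attributes_to_redact(resource_type: str, attributes: list, intent: str) -> list:
    """Apply the single example rule: redact PII when the intent is 'research'."""
    if intent != "research":
        # This sketch models only the one rule; other intents yield no redactions here.
        return []
    return [a for a in attributes if "PII" in tags_for(resource_type, a)]
```

With this catalog, a research request for Observation attributes `id`, `subject` and `status` would mark `id` and `subject` for redaction, while the same request against Patient would mark only `id`.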

The Fybrik Control Plane leverages a collection of Modules deployed by the IT admin and interfaces with a Policy Manager. A Fybrik read Module serves as the intermediary in the path between the data requester and the data source. The Policy Manager uses the predefined data access policy, the data source description, and other parameters such as the intent of use specified by the data requester to evaluate a policy decision, which is then passed to the Fybrik Module via the Fybrik control plane for enforcement. Based on the policy decision, the Fybrik Module can block a request to a resource, redact attributes in the returned resource records, or return just a statistical summary of attributes from returned data fields.
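The three enforcement outcomes named above (block, redact, statistical summary) can be sketched in Python as below. The shape of the `decision` dictionary, the `REDACTED` placeholder and the use of flat dictionaries for records are assumptions made for brevity; real FHIR resources are nested JSON, and the actual module's interface may differ.

```python
from statistics import mean

REDACTED = "XXXXX"  # hypothetical placeholder for a redacted value


def enforce(decision: dict, records: list):
    """Apply a policy decision to a list of flat record dictionaries."""
    action = decision["action"]

    if action == "block":
        # The request to the resource is refused outright.
        raise PermissionError("access to this resource is blocked by policy")

    if action == "redact":
        # Replace the listed attributes in every returned record.
        hidden = set(decision["attributes"])
        return [
            {key: (REDACTED if key in hidden else value) for key, value in record.items()}
            for record in records
        ]

    if action == "statistics":
        # Return only an aggregate over one numeric attribute, never the raw rows.
        attribute = decision["attribute"]
        values = [r[attribute] for r in records if isinstance(r.get(attribute), (int, float))]
        return {
            "attribute": attribute,
            "count": len(values),
            "mean": mean(values) if values else None,
        }

    raise ValueError(f"unknown policy action: {action}")
```

Keeping enforcement as a pure function of the decision and the data mirrors the separation described above: the Policy Manager decides, and the module only carries out the decision.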

An illustration of this concept can be seen in Figure 1.

How this can be used

The use case envisions a hospital environment hosting a FHIR server. Third parties requesting access to the hospital’s FHIR records will need to work with the hospital’s Data Governance Officer to determine which data will be provided, and the context under which data will be made available. The hospital’s Data Governance Officer will then compose the access policy file for the third party, feed it into the OPA governance engine, and approve invocation of the Fybrik Application to create the data path for the third party. If access policies change in the future, the policy file just needs to be resubmitted to the OPA governance engine. Since policy logic is separated from application logic, no recoding of the Fybrik components is required to enable the Fybrik data path to handle the new policies.

An entry point to the Fybrik data path will be exposed outside of the hospital with a public IP address. The third-party researcher wishing to perform ad-hoc queries must attach a JSON Web Token (JWT), which authenticates the researcher, to the header of each FHIR query request. The researcher then submits this query to the exposed URL, just as would be done using a standard FHIR server. If the identity of the researcher in the JWT matches the identity of the authorized researcher set in the Fybrik Application at invocation time, and the policy allows the researcher access to the requested FHIR resource, appropriately redacted query results will then be returned.
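A rough Python sketch of both sides of this exchange follows, using only the standard library. Several details are assumptions not stated in the article: that the token travels in an `Authorization: Bearer` header, that the researcher's identity is held in the `sub` claim, and the example host name. The demo token is deliberately unsigned and its payload is read without verification; a real deployment must verify the signature before trusting any claim.

```python
import base64
import json
import urllib.request


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_demo_token(claims: dict) -> str:
    """Build an UNSIGNED demo token (alg 'none'). For illustration only."""
    header = {"alg": "none", "typ": "JWT"}
    return ".".join([_b64url(json.dumps(header).encode()), _b64url(json.dumps(claims).encode()), ""])


def read_claims(token: str) -> dict:
    """Decode the payload segment WITHOUT verifying a signature (illustration only)."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def build_fhir_request(base_url: str, query: str, token: str) -> urllib.request.Request:
    """Client side: attach the JWT to the header of a FHIR query (assumed Bearer scheme)."""
    return urllib.request.Request(f"{base_url}/{query}", headers={"Authorization": f"Bearer {token}"})


def is_authorized(token: str, authorized_identity: str, claim: str = "sub") -> bool:
    """Data-path side: compare the token's identity claim with the configured researcher."""
    return read_claims(token).get(claim) == authorized_identity


# Usage with hypothetical values; the request object is built but not sent.
token = make_demo_token({"sub": "researcher-1"})
request = build_fhir_request("https://fybrik.example.org/fhir", "Observation", token)
```

The identity check is only the first gate: a matching identity still has to pass the policy evaluation for the requested resource before any results are returned.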

To provide auditability, we write all data requests, both successful and blocked, to a Kafka topic, where they can be logged by another component.
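As an illustration of what such an audit record might contain, the Python sketch below serializes one event per request. The field names and the topic name are hypothetical, and a plain `publish(topic, value)` callable stands in for a Kafka producer so that the example stays self-contained; it does not reproduce the project's actual message format.

```python
import json
from datetime import datetime, timezone


def audit_event(requester: str, query: str, outcome: str, when=None) -> bytes:
    """Serialize one audit record as UTF-8 JSON (field names are assumptions)."""
    when = when or datetime.now(timezone.utc)
    record = {
        "timestamp": when.isoformat(),
        "requester": requester,
        "query": query,
        "outcome": outcome,
    }
    return json.dumps(record).encode("utf-8")


def log_request(publish, topic: str, requester: str, query: str, allowed: bool) -> None:
    """Record both successful and blocked requests via the supplied publish callable."""
    outcome = "allowed" if allowed else "blocked"
    publish(topic, audit_event(requester, query, outcome))


# Usage: an in-memory list stands in for the Kafka topic.
sent = []
log_request(lambda topic, value: sent.append((topic, value)), "fhir-audit", "researcher-1", "Observation", False)
```

Passing the publisher in as a parameter keeps the audit logic independent of any particular Kafka client library.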

Kick the tires

The code and installation guide described here can be found at:

https://github.com/fybrik/REST-read-example

The research leading to these results has received funding from the European Community’s Horizon 2020 Research and Innovation Programme under grant agreement n° 883275.

Written by Eliot Salant