FHIR Data De-identification using Cloud Healthcare API (Part 2)

Sudharma Mokashi
Google Cloud - Community
4 min read · Jan 3, 2023

Welcome back! In this blog we will continue to learn about FHIR data transformation, in particular how data de-identification can be done on GCP using Cloud Healthcare API. We will use the FHIR data store that we created in the previous blog. If you haven’t followed the previous blog, please go through Part 1 of this series first.

As mentioned, this blog focuses specifically on FHIR data de-identification and on loading the de-identified FHIR data into BigQuery.

You might be wondering what data de-identification is and why it is needed.

What is data de-identification?

Data de-identification is the process of removing or obfuscating Personally Identifiable Information (PII) from documents or other records that also contain that person’s Protected Health Information (PHI). To learn more, refer to the data de-identification standard provided by HIPAA.

Data de-identification
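To make the definition concrete, here is a toy sketch of what "removing or obfuscating" identifiers could look like on a single FHIR Patient resource held as a Python dictionary. This is only an illustration and not how Cloud Healthcare API works internally; the function name, the salt, the "[REDACTED]" placeholder, and the sample patient are all made up for this example.

```python
import copy
import hashlib


def deidentify_patient(patient: dict, salt: str = "demo-salt") -> dict:
    """Return a copy of a FHIR Patient dict with direct identifiers obscured.

    Hypothetical helper for illustration only; a real pipeline would rely on
    the Cloud Healthcare API de-identify operation shown later in this blog.
    """
    result = copy.deepcopy(patient)  # never modify the caller's record

    # Obfuscate identifier values with a salted hash (truncated for brevity).
    for identifier in result.get("identifier", []):
        raw = (salt + str(identifier.get("value", ""))).encode("utf-8")
        identifier["value"] = hashlib.sha256(raw).hexdigest()[:16]

    # Remove the name parts outright by replacing them with a placeholder.
    for name in result.get("name", []):
        if "family" in name:
            name["family"] = "[REDACTED]"
        if "given" in name:
            name["given"] = ["[REDACTED]" for _ in name["given"]]

    return result


# Made-up sample record, not taken from the data store used in this series.
patient = {
    "resourceType": "Patient",
    "identifier": [{"system": "urn:example:mrn", "value": "12345"}],
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "gender": "female",
}
deidentified = deidentify_patient(patient)
```

Fields that are not direct identifiers, such as gender in this sample, are left untouched, which is what keeps the data useful for analysis.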

What is the use of data de-identification in Healthcare?

De-identified data can be used for analysis, training, and evaluating machine learning models and for sharing with non-privileged parties, while protecting patient privacy.

Now that we know what data de-identification is and why it is important, let’s do a quick demo to understand how FHIR data can be de-identified using Google Cloud Platform’s Cloud Healthcare API.

Steps to de-identify the FHIR data:

Step-1 : If you have not created a FHIR data store yet, please follow the steps provided in part-1 of this series.

Step-2 : Let’s create one more Healthcare dataset and FHIR data store to save the de-identified data.

Creation of de-identified dataset

The created dataset will be shown on the Healthcare browser screen

Healthcare Browser

Now, let’s create a FHIR data store within de-identified-dataset.

Data Store Type and ID

Configure the data store

Data Store Configuration

Leave the "Stream resource changes to BigQuery" option as is, select or create a Pub/Sub topic for notifications, and click on Create.

Create Data Store

You should now see the newly created data store under de-identified-dataset.

Newly Created Dataset
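If you prefer scripting over the console, Step-2 can also be done through the Cloud Healthcare REST API. The sketch below only builds the request URLs and JSON bodies; sending them (with an OAuth access token) is left out. The project ID, location, FHIR store ID, and the R4 version are assumptions made for this example, while de-identified-dataset is the dataset name used above.

```python
from urllib.parse import urlencode

BASE_URL = "https://healthcare.googleapis.com/v1"


def dataset_create_request(project: str, location: str, dataset_id: str):
    """Build the POST URL and body for creating a Healthcare dataset."""
    parent = f"projects/{project}/locations/{location}"
    query = urlencode({"datasetId": dataset_id})
    # The dataset resource needs no mandatory fields, so the body is empty.
    return f"{BASE_URL}/{parent}/datasets?{query}", {}


def fhir_store_create_request(project: str, location: str, dataset_id: str,
                              fhir_store_id: str, version: str = "R4"):
    """Build the POST URL and body for creating a FHIR store in a dataset."""
    parent = f"projects/{project}/locations/{location}/datasets/{dataset_id}"
    query = urlencode({"fhirStoreId": fhir_store_id})
    # Assumption: the store uses FHIR R4; match your source store's version.
    return f"{BASE_URL}/{parent}/fhirStores?{query}", {"version": version}


# Hypothetical project, location, and store ID for illustration.
dataset_url, dataset_body = dataset_create_request(
    "my-project", "us-central1", "de-identified-dataset")
store_url, store_body = fhir_store_create_request(
    "my-project", "us-central1", "de-identified-dataset",
    "de-identified-fhir-store")
```

Both requests are HTTP POST calls and must carry an Authorization header with a valid access token.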

Step-3 : Let’s create one more BigQuery dataset to store de-identified data.

Create BigQuery Dataset to store de-identified data

Step-4 : Let’s go back to the Healthcare browser and select the ‘healthcare-fhir-test’ dataset.

select healthcare-api-fhir-test dataset

Step-5 : Select De-identify from the Actions menu of the data store.

select De-identify from the Actions

Step-6 : Select the destination dataset and data store where the de-identified data should be stored.

De-identify FHIR store

Step-7 : On the pop-up screen, click on Append to continue.

Append to continue

Click on De-identify to start de-identifying the data.
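Steps 4 to 7 correspond to the deidentify method on a FHIR store in the REST API. Below is a minimal sketch that builds that request. The project, location, and both store IDs are hypothetical, and the empty "fhir" config is an assumption meant to request default handling; check the DeidentifyConfig reference before relying on it.

```python
BASE_URL = "https://healthcare.googleapis.com/v1"


def fhir_store_name(project: str, location: str, dataset_id: str,
                    fhir_store_id: str) -> str:
    """Return the full resource name of a FHIR store."""
    return (f"projects/{project}/locations/{location}"
            f"/datasets/{dataset_id}/fhirStores/{fhir_store_id}")


def deidentify_request(source_store: str, destination_store: str,
                       config: dict):
    """Build the POST URL and body for de-identifying a FHIR store.

    The de-identified copies are written to destination_store; the
    source store itself is not modified.
    """
    url = f"{BASE_URL}/{source_store}:deidentify"
    body = {"destinationStore": destination_store, "config": config}
    return url, body


# Hypothetical project, location, and store IDs for illustration.
source = fhir_store_name(
    "my-project", "us-central1", "healthcare-fhir-test", "fhir-store")
destination = fhir_store_name(
    "my-project", "us-central1", "de-identified-dataset",
    "de-identified-fhir-store")

# Assumption: an empty "fhir" section asks for the default behaviour.
deid_url, deid_body = deidentify_request(source, destination, {"fhir": {}})
```

The call returns a long-running operation rather than the de-identified data itself, which is why the next step checks the operation status.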

Step-8 : Let’s check in the Operations section whether the de-identification is complete.

Logs of data de-identification operation
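The same check can be scripted: a GET request on the operation name returns a long-running operation object whose "done" field tells you whether it has finished. The helper below is a small sketch that interprets such a response; the function name, the state labels, and the sample payloads are made up for illustration.

```python
def operation_state(operation: dict) -> str:
    """Summarise a long-running operation response as a simple state.

    Hypothetical helper: "done" is absent or false while the operation
    is running; once done, the response carries either "error" or
    "response".
    """
    if not operation.get("done", False):
        return "RUNNING"
    if "error" in operation:
        return "FAILED"
    return "SUCCEEDED"


# Illustrative payloads, not captured from a real project.
running = {"name": "projects/my-project/locations/us-central1/datasets/"
                   "healthcare-fhir-test/operations/123"}
finished = {"name": running["name"], "done": True, "response": {}}
failed = {"name": running["name"], "done": True,
          "error": {"code": 7, "message": "permission denied"}}

states = [operation_state(op) for op in (running, finished, failed)]
```

In a script you would typically repeat the GET request every few seconds until the state is no longer "RUNNING".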

Step-9 : Let’s check the de-identified data by opening the FHIR viewer from the Actions menu of the de-identified data store.

Open FHIR viewer of de-identified data store

Step-10 : Search for the Patient resource type, as our main data store contains only Patient resources.

select Patient resource

Here is the de-identified data, where we can clearly see that the identifiers, given name, and family name have been de-identified.

De-identified patient resource

Note : To learn how Cloud Healthcare API de-identifies the data, refer to the Cloud Healthcare API de-identification documentation.

Step-11 : Let’s export the de-identified data into BigQuery using the same export option that we used in the previous part.

Export de-identified data to BigQuery
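For reference, the console export in Step-11 maps to the export method on a FHIR store. The sketch below builds that request under a few assumptions: the project, location, FHIR store ID, and BigQuery dataset name are hypothetical, and the ANALYTICS schema type is simply one of the available options, not necessarily the one the console picks.

```python
BASE_URL = "https://healthcare.googleapis.com/v1"


def export_to_bigquery_request(fhir_store: str, bq_project: str,
                               bq_dataset: str,
                               schema_type: str = "ANALYTICS"):
    """Build the POST URL and body for exporting a FHIR store to BigQuery."""
    url = f"{BASE_URL}/{fhir_store}:export"
    body = {
        "bigqueryDestination": {
            # BigQuery destinations use the bq://PROJECT.DATASET form.
            "datasetUri": f"bq://{bq_project}.{bq_dataset}",
            "schemaConfig": {"schemaType": schema_type},
        }
    }
    return url, body


# Hypothetical names for illustration.
deidentified_store = ("projects/my-project/locations/us-central1/datasets/"
                      "de-identified-dataset/fhirStores/"
                      "de-identified-fhir-store")
export_url, export_body = export_to_bigquery_request(
    deidentified_store, "my-project", "deidentified_fhir_data")
```

Like de-identification, the export runs as a long-running operation, so its progress can be followed in the Operations section as well.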

Step-12 : Let’s explore the de-identified dataset and patient table in BigQuery.

BigQuery de-identified dataset

Done! We have successfully de-identified the FHIR Patient resources and exported them to a BigQuery dataset.

In the upcoming blogs, we will learn more about HL7v2 messages and DICOM datasets. So stay tuned! Thank you!
