FHIR Data De-identification Using Cloud Healthcare API (Part 2)
Welcome back! In this blog, we will continue to learn about FHIR data transformation, specifically how data de-identification can be done on GCP using the Cloud Healthcare API. We will be using the FHIR data store that we created in the previous blog. If you haven’t followed the previous blog, please go through Part 1 of this series first.
As mentioned, this blog focuses specifically on FHIR data de-identification and on loading the de-identified FHIR data into BigQuery.
You might be wondering what data de-identification is and why it is needed.
What is data de-identification?
Data de-identification is the process of removing or obfuscating Personally Identifiable Information (PII) from any documents or other information that also contains that person’s Protected Health Information (PHI). To learn more, refer to the data de-identification standard provided by HIPAA.
What is the use of data de-identification in healthcare?
De-identified data can be used for analysis, for training and evaluating machine learning models, and for sharing with non-privileged parties, all while protecting patient privacy.
Now that we know what data de-identification is and why it is important, let’s do a quick demo to understand how FHIR data can be de-identified using Google Cloud Platform’s Cloud Healthcare API.
Steps to de-identify the FHIR data:
Step-1: If you have not created a FHIR data store yet, please follow the steps provided in Part 1 of this series.
Step-2: Let’s create one more Healthcare dataset and FHIR data store to save the de-identified data.
The created dataset will be shown on the Healthcare browser screen.
Now, let’s create a FHIR data store within de-identified-dataset.
Configure the data store.
Keep the “Stream resource changes to BigQuery” setting as is, select or create a Pub/Sub topic for notifications, and click Create.
You should now be able to see the newly created data store under de-identified-dataset.
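For readers who prefer scripting over the console, Step-2 can be sketched as two REST requests against the Cloud Healthcare API. This is only a minimal sketch: the project ID, location, data store ID, and Pub/Sub topic below are placeholders that do not come from this blog, the FHIR version is assumed to be R4 (use whatever version your source store has), and the helpers only build the requests so that you can send them with the authenticated HTTP client of your choice.

```python
import json

BASE_URL = "https://healthcare.googleapis.com/v1"


def create_dataset_request(project_id, location, dataset_id):
    """Build (method, url, body) for creating a Healthcare dataset."""
    parent = f"projects/{project_id}/locations/{location}"
    url = f"{BASE_URL}/{parent}/datasets?datasetId={dataset_id}"
    return "POST", url, {}


def create_fhir_store_request(project_id, location, dataset_id,
                              fhir_store_id, version="R4", pubsub_topic=None):
    """Build (method, url, body) for creating a FHIR store in a dataset."""
    parent = f"projects/{project_id}/locations/{location}/datasets/{dataset_id}"
    url = f"{BASE_URL}/{parent}/fhirStores?fhirStoreId={fhir_store_id}"
    body = {"version": version}  # assumption: same FHIR version as the source store
    if pubsub_topic:
        # Assumption: notifications are configured via notificationConfigs.
        body["notificationConfigs"] = [{"pubsubTopic": pubsub_topic}]
    return "POST", url, body


if __name__ == "__main__":
    # All identifiers except "de-identified-dataset" are hypothetical.
    requests_to_send = [
        create_dataset_request("my-project", "us-central1", "de-identified-dataset"),
        create_fhir_store_request(
            "my-project", "us-central1", "de-identified-dataset",
            "de-identified-fhir-store",
            pubsub_topic="projects/my-project/topics/fhir-notifications",
        ),
    ]
    for method, url, body in requests_to_send:
        print(method, url)
        print(json.dumps(body, indent=2))
```

Each request needs an OAuth 2.0 bearer token in the Authorization header when it is actually sent.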
Step-3: Let’s create one more BigQuery dataset to store the de-identified data.
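If you would rather create the BigQuery dataset programmatically, a rough sketch of the request for the BigQuery REST API is shown below. The project ID, dataset ID, and location are placeholders, not values from this blog. One thing worth noting is that BigQuery dataset IDs may only contain letters, numbers, and underscores, so a name with hyphens such as de-identified-dataset cannot be reused here; the helper checks for that.

```python
import re

BQ_BASE_URL = "https://bigquery.googleapis.com/bigquery/v2"


def create_bq_dataset_request(project_id, dataset_id, location="US"):
    """Build (method, url, body) for creating a BigQuery dataset."""
    # BigQuery dataset IDs allow only letters, numbers, and underscores.
    if not re.fullmatch(r"[A-Za-z0-9_]{1,1024}", dataset_id):
        raise ValueError(f"Invalid BigQuery dataset ID: {dataset_id!r}")
    url = f"{BQ_BASE_URL}/projects/{project_id}/datasets"
    body = {
        "datasetReference": {"projectId": project_id, "datasetId": dataset_id},
        "location": location,  # assumption: pick the location that suits you
    }
    return "POST", url, body


if __name__ == "__main__":
    # "my-project" and "deidentified_fhir" are hypothetical names.
    print(create_bq_dataset_request("my-project", "deidentified_fhir"))
```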
Step-4: Let’s go back to the Healthcare browser and select the ‘healthcare-fhir-test’ dataset.
Step-5: Select De-identify from the Actions menu of the data store.
Step-6: Select the destination dataset and data store where the de-identified data should be stored.
Step-7: On the pop-up screen, click Append to continue.
Click De-identify to start de-identifying the data.
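Steps 4 to 7 roughly correspond to a single call to the `deidentify` method of the source FHIR store. Here is a minimal sketch that builds that request. The project ID, location, and both data store IDs are placeholders, and the empty `fhir` config is an assumption meaning "use the default FHIR de-identification behavior"; the console may send a different configuration.

```python
import json

BASE_URL = "https://healthcare.googleapis.com/v1"


def fhir_store_name(project_id, location, dataset_id, fhir_store_id):
    """Return the full resource name of a FHIR store."""
    return (f"projects/{project_id}/locations/{location}"
            f"/datasets/{dataset_id}/fhirStores/{fhir_store_id}")


def deidentify_request(source_store, destination_store):
    """Build (method, url, body) for de-identifying a FHIR store."""
    url = f"{BASE_URL}/{source_store}:deidentify"
    body = {
        "destinationStore": destination_store,
        # Assumption: an empty FHIR config requests the default behavior.
        "config": {"fhir": {}},
    }
    return "POST", url, body


if __name__ == "__main__":
    # Store IDs, project, and location are hypothetical.
    source = fhir_store_name("my-project", "us-central1",
                             "healthcare-fhir-test", "source-fhir-store")
    destination = fhir_store_name("my-project", "us-central1",
                                  "de-identified-dataset", "de-identified-fhir-store")
    method, url, body = deidentify_request(source, destination)
    print(method, url)
    print(json.dumps(body, indent=2))
```

The call returns a long-running operation rather than the de-identified data itself, which is why the next step checks the operation status.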
Step-8: Let’s check whether the de-identification is complete in the Operations section.
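The same check can be scripted by polling the long-running operation until its `done` field is true. The sketch below keeps the HTTP part out of the way: `get_operation` is a hypothetical callable that you supply, which should fetch `https://healthcare.googleapis.com/v1/<operation name>` and return the parsed JSON. The demo at the bottom uses canned responses instead of a real API call.

```python
import time


def wait_for_operation(get_operation, operation_name,
                       poll_seconds=5.0, max_polls=60):
    """Poll a long-running operation until it finishes.

    get_operation: callable taking the operation name and returning the
    operation as a dict (hypothetical helper supplied by the caller).
    """
    for _ in range(max_polls):
        operation = get_operation(operation_name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"Operation failed: {operation['error']}")
            return operation.get("response", {})
        time.sleep(poll_seconds)
    raise TimeoutError(f"{operation_name} did not finish after {max_polls} polls")


if __name__ == "__main__":
    # Canned responses standing in for real API replies.
    replies = iter([{"done": False}, {"done": True, "response": {}}])
    result = wait_for_operation(lambda name: next(replies),
                                "operations/example", poll_seconds=0)
    print("Operation finished:", result)
```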
Step-9: Let’s check the de-identified data by opening the FHIR viewer from the Actions menu of the de-identified data store.
Step-10: Search for the Patient resource, as we only have Patient resources in our main data store.
Here is the de-identified data, where we can clearly see that the identifiers, given name, and family name have been de-identified.
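Outside the FHIR viewer, the same inspection can be approximated with a FHIR search for Patient resources followed by a small summary of the fields we care about. The sketch below assumes the search response is a standard FHIR Bundle; the store name and the sample Bundle are made-up placeholders used only to exercise the helper, not real output from the API.

```python
BASE_URL = "https://healthcare.googleapis.com/v1"


def patient_search_url(fhir_store):
    """Return the URL that searches all Patient resources in a FHIR store."""
    return f"{BASE_URL}/{fhir_store}/fhir/Patient"


def summarize_patients(bundle):
    """Extract identifiers and names from the Patient entries of a Bundle."""
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Patient":
            continue
        names = resource.get("name", [])
        rows.append({
            "id": resource.get("id"),
            "identifiers": [i.get("value") for i in resource.get("identifier", [])],
            "family": [n.get("family") for n in names],
            "given": [g for n in names for g in n.get("given", [])],
        })
    return rows


if __name__ == "__main__":
    # Placeholder Bundle for illustration only.
    sample_bundle = {
        "resourceType": "Bundle",
        "entry": [{"resource": {
            "resourceType": "Patient",
            "id": "example-id",
            "identifier": [{"value": "PLACEHOLDER"}],
            "name": [{"family": "PLACEHOLDER", "given": ["PLACEHOLDER"]}],
        }}],
    }
    for row in summarize_patients(sample_bundle):
        print(row)
```

Running the summary against both the source store and the de-identified store makes it easy to compare the two side by side.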
Note: To check how the Cloud Healthcare API de-identifies FHIR data, refer to the official Cloud Healthcare API de-identification documentation.
Step-11: Let’s export the de-identified data into BigQuery using the same export option that we used in the previous part.
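For reference, the export can also be expressed as a call to the `export` method of the de-identified FHIR store. This is a sketch only: the store name, project, and BigQuery dataset are placeholders, and the `ANALYTICS` schema type and `WRITE_APPEND` disposition are assumptions that may differ from what you selected in the console in the previous part.

```python
import json

BASE_URL = "https://healthcare.googleapis.com/v1"


def export_to_bigquery_request(fhir_store, bq_project, bq_dataset,
                               write_disposition="WRITE_APPEND"):
    """Build (method, url, body) for exporting a FHIR store to BigQuery."""
    url = f"{BASE_URL}/{fhir_store}:export"
    body = {
        "bigqueryDestination": {
            "datasetUri": f"bq://{bq_project}.{bq_dataset}",
            "schemaConfig": {"schemaType": "ANALYTICS"},  # assumed schema type
            "writeDisposition": write_disposition,
        }
    }
    return "POST", url, body


if __name__ == "__main__":
    # All names below are hypothetical.
    store = ("projects/my-project/locations/us-central1"
             "/datasets/de-identified-dataset/fhirStores/de-identified-fhir-store")
    method, url, body = export_to_bigquery_request(
        store, "my-project", "deidentified_fhir")
    print(method, url)
    print(json.dumps(body, indent=2))
```

Like de-identification, the export runs as a long-running operation, so its status can be checked in the Operations section in the same way.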
Step-12: Let’s explore the de-identified dataset and the Patient table in BigQuery.
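To explore the table with a query instead of the preview tab, something like the following could be used. The helper only builds the SQL text; the project and dataset names are placeholders, and the column names (`id`, `identifier`, `name`) are assumed to follow the FHIR Patient resource as exported in the previous step, so adjust them if your table looks different.

```python
def patient_preview_query(project_id, dataset_id, limit=10):
    """Build a SQL query that previews rows of the exported Patient table."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    table = f"`{project_id}.{dataset_id}.Patient`"
    # Assumed columns: id, identifier, and name, mirroring the FHIR resource.
    return (
        "SELECT id, identifier, name\n"
        f"FROM {table}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # "my-project" and "deidentified_fhir" are hypothetical names.
    print(patient_preview_query("my-project", "deidentified_fhir"))
```

The generated query can be pasted into the BigQuery console or run through any BigQuery client.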
Done! We have successfully de-identified the FHIR Patient resources and exported them to a BigQuery dataset.
In the upcoming blogs, we will learn more about HL7v2 messages and DICOM datasets. So stay tuned! Thank you!
References