FHIR Data Ingestion using GCP’s Cloud Healthcare API (Part 1)
In this article we will explore how FHIR data ingestion and analytics can be performed on GCP using the Cloud Healthcare API and BigQuery.
Part 1 covers how FHIR data can be ingested into a FHIR store on GCP and analyzed using BigQuery’s analytics capabilities.
Part 2 will cover how we can perform de-identification operations on FHIR data.
Part 3 will cover how DICOM datasets can be ingested into GCP and de-identified later.
For those who are new to FHIR (Fast Healthcare Interoperability Resources): it is a data standard used across the healthcare industry to ease the interoperability of healthcare data. Learn more about FHIR.
This article covers the following topics:
- Cloud Healthcare API
- Ingestion of FHIR data into GCP’s FHIR store
- Export FHIR data to BigQuery
What is the Cloud Healthcare API?
The Cloud Healthcare API is a bridge between healthcare systems and applications built on Google Cloud Platform. Using the Healthcare API, you can connect your data to advanced Google Cloud capabilities, including streaming data processing with Cloud Dataflow, scalable analytics with BigQuery, and machine learning with Cloud Machine Learning Engine.
Google Cloud provides detailed guidance on how it supports compliance with HIPAA in the US, PIPEDA in Canada, and other global privacy standards at cloud.google.com/security/compliance.
The Cloud Healthcare API treats data location as a core component of the API. You can choose the storage location for each dataset from the currently available locations, which correspond to distinct geographic areas aligned with Google Cloud’s regional structure.
Let’s dive into a quick demo covering FHIR data ingestion into Google Cloud Platform using the Cloud Healthcare API. For this demo we use FHIR data stored in a GCS bucket in .ndjson (newline-delimited JSON) format.
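For reference, an NDJSON FHIR file holds one complete resource per line, with no enclosing array. The sketch below builds a tiny two-patient file and checks that each line parses as a standalone FHIR resource (the patient values are made up purely for illustration):

```shell
# A minimal, hypothetical NDJSON file: one complete Patient resource per line
cat > patients.ndjson <<'EOF'
{"resourceType":"Patient","id":"pat-1","gender":"female","birthDate":"1980-04-12","name":[{"family":"Doe","given":["Jane"]}]}
{"resourceType":"Patient","id":"pat-2","gender":"male","birthDate":"1975-09-30","name":[{"family":"Doe","given":["John"]}]}
EOF

# Every line must be valid standalone JSON with a resourceType field
python3 -c 'import json
for line in open("patients.ndjson"):
    print(json.loads(line)["resourceType"])'
# prints "Patient" once per line
```

A file in this shape, uploaded to a GCS bucket, is what the import step later in this article expects.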
Steps to ingest FHIR data into GCP
1. Enable the Healthcare API: follow the link and click Enable.
2. Create a BigQuery dataset to store the exported data for analysis. Go to the BigQuery Console and create the dataset.
3. Run the following commands in Cloud Shell to grant the required IAM roles to the Healthcare API’s service account.
export PROJECT_ID=$(gcloud config list --format 'value(core.project)')
export PROJECT_NUMBER=$(gcloud projects list --filter=projectId:$PROJECT_ID \
--format="value(projectNumber)")
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member=serviceAccount:service-$PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com \
--role=roles/bigquery.dataEditor
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member=serviceAccount:service-$PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com \
--role=roles/bigquery.jobUser
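To confirm the bindings took effect, you can list the roles held by the Healthcare service agent (a sketch; it assumes the PROJECT_ID and PROJECT_NUMBER variables exported above are still set in your Cloud Shell session):

```shell
# List every role bound to the Healthcare API service agent in this project
gcloud projects get-iam-policy $PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:service-$PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com" \
  --format="table(bindings.role)"
```

You should see roles/bigquery.dataEditor and roles/bigquery.jobUser in the output.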
4. Create a healthcare dataset by searching for Healthcare in the navigation menu and clicking Create Dataset.
Set the following properties for the healthcare dataset.
5. Dataset creation takes some time; once it completes, you will see the dataset in the Healthcare browser as shown below.
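If you prefer the CLI, the same dataset can be created from Cloud Shell (a sketch; my-healthcare-dataset and us-central1 are placeholder values; pick the location that matches your data residency needs):

```shell
# Create the healthcare dataset in a chosen region
gcloud healthcare datasets create my-healthcare-dataset \
  --location=us-central1

# Confirm the dataset now exists
gcloud healthcare datasets list --location=us-central1
```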
6. Click on the recently created dataset to create a FHIR store.
Set the following properties for the data store:
- Select the FHIR store version and click Next.
- Make no changes in the BigQuery streaming section and move on to the Receive Cloud Pub/Sub notifications section; select the Create a Topic option.
- Provide a topic ID and an encryption mechanism, then click Create Topic.
- Click Create to create the FHIR data store.
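The console steps above map to two CLI calls, one for the notification topic and one for the store itself (a sketch; the topic, store, dataset, and location names are placeholders, and PROJECT_ID is the variable exported earlier):

```shell
# Create the Pub/Sub topic that will receive FHIR store notifications
gcloud pubsub topics create fhir-notifications

# Create an R4 FHIR store attached to that topic
gcloud healthcare fhir-stores create my-fhir-store \
  --dataset=my-healthcare-dataset \
  --location=us-central1 \
  --version=R4 \
  --pubsub-topic=projects/$PROJECT_ID/topics/fhir-notifications
```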
7. Import FHIR data from Google Cloud Storage bucket to the recently created FHIR data store.
8. Before importing the dataset, grant the Storage Object Viewer role to the Healthcare service account by running the following command in Cloud Shell.
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member=serviceAccount:service-$PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com \
--role=roles/storage.objectViewer
9. Now select the appropriate project, GCS location, and content structure, and start the import.
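The import can also be started from Cloud Shell (a sketch; the bucket path and resource names are placeholders, and --content-structure=resource matches NDJSON with one complete resource per line):

```shell
# Import newline-delimited FHIR resources from GCS into the FHIR store
gcloud healthcare fhir-stores import gcs my-fhir-store \
  --dataset=my-healthcare-dataset \
  --location=us-central1 \
  --gcs-uri=gs://my-fhir-bucket/*.ndjson \
  --content-structure=resource
```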
10. Go to Operations to check the logs and progress.
Here you can see that all the resources were imported successfully.
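The same long-running operations can be inspected from the CLI (a sketch; OPERATION_ID is a placeholder taken from the list command’s output):

```shell
# List long-running operations for the dataset
gcloud healthcare operations list \
  --dataset=my-healthcare-dataset \
  --location=us-central1

# Inspect a specific operation by ID
gcloud healthcare operations describe OPERATION_ID \
  --dataset=my-healthcare-dataset \
  --location=us-central1
```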
11. Select Open in FHIR viewer from the data store window to check the recently imported data.
12. As part of this demo, we have inserted sample Patient resource data, so search for the Patient resource.
13. The Patient FHIR data will be available in the FHIR store explorer.
14. You can also select an individual record and view its data as elements or as raw JSON on the right-hand side (inside the FHIR viewer, as shown in the step above).
- Elements
- JSON
15. Export the data to BigQuery for analysis using the Export option on the Data Store page.
Provide the export options and click Export.
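The export also has a CLI equivalent (a sketch; fhir_analytics stands in for the BigQuery dataset created in step 2, and the analytics schema flattens resources into columns suitable for SQL queries):

```shell
# Export the FHIR store contents to BigQuery using the analytics schema
gcloud healthcare fhir-stores export bq my-fhir-store \
  --dataset=my-healthcare-dataset \
  --location=us-central1 \
  --bq-dataset=bq://$PROJECT_ID.fhir_analytics \
  --schema-type=analytics
```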
16. As with the import, you can check the logs of this operation in the Operations window by selecting the required operation (in this case, the export).
Since we exported Patient resource data, the export creates a table named Patient in the specified BigQuery dataset.
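Once the Patient table lands, a quick query confirms the data is ready for analysis (a sketch; fhir_analytics is the placeholder dataset name used above, and the gender column assumes the analytics schema, which surfaces simple FHIR fields as top-level columns):

```shell
# Count patients by gender in the exported table
bq query --use_legacy_sql=false \
  "SELECT gender, COUNT(*) AS patient_count
   FROM \`$PROJECT_ID.fhir_analytics.Patient\`
   GROUP BY gender"
```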
That’s it! You’re ready to perform the analysis in BigQuery. Thanks for reading the blog, see you next time!