[GSoC][LibreHealth] FHIR Analytics Using Spark SQL
I was finally able to execute Spark SQL on top of FHIR data stored in a Cassandra database. Getting there was a good learning curve, since it involved several technologies: Spark, Spark SQL, Cassandra, Bunsen, the Spark Cassandra Connector and Spring. The following diagram shows the overall architecture of this module.
As shown in the architecture diagram above, the module loads data into Spark from Cassandra via the Spark Cassandra Connector provided by DataStax. This data is then converted to Spark data frames using the Bunsen library, after which users can query it via Spark SQL. The LibreHealth Analytic UI provides the capability to execute Spark SQL against the FHIR data loaded into Spark. The module also provides UIs for key resources such as Patient, Observation and Encounter, so that these resources can be queried easily via their key attributes.
Bunsen converts each FHIR resource to a data frame that is compatible with Spark SQL. For example, here is the table view in Spark after converting the FHIR Patient resource to a data frame.
FHIR Analytics Using Spark SQL
The LibreHealth FHIR Analytic Module provides a UI to execute Spark SQL against the data available in the system. For example, suppose someone wants to get the patients who have observations with a valueQuantity greater than 15. The following Spark SQL query returns the required data.
SELECT patient.id, observation.id, observation.subject, observation.valueQuantity
FROM patient
INNER JOIN observation ON observation.subject.reference = patient.id
WHERE observation.valueQuantity.value > 15
Likewise, users can execute complex queries against the FHIR data available in Cassandra.
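To make the semantics of the example query concrete, here is a minimal sketch in plain Python of what it computes. It does not use Spark or Bunsen; the sample records are made up for illustration, the function name `patients_with_high_observations` is hypothetical, and it assumes (as the query does) that `subject.reference` holds the bare patient id and that patient ids are unique.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[str, str, Dict[str, Any], Dict[str, Any]]


def patients_with_high_observations(
    patients: List[Dict[str, Any]],
    observations: List[Dict[str, Any]],
    threshold: float = 15,
) -> List[Row]:
    """Mimic the inner join plus filter from the example query.

    Returns (patient.id, observation.id, observation.subject,
    observation.valueQuantity) for each matching observation.
    """
    # Assumption: patient ids are unique, so a set lookup is equivalent
    # to the inner join on observation.subject.reference = patient.id.
    patient_ids = {p["id"] for p in patients}
    rows: List[Row] = []
    for obs in observations:
        subject = obs.get("subject") or {}
        quantity = obs.get("valueQuantity") or {}
        reference = subject.get("reference")
        value = quantity.get("value")
        # Observations without a numeric value never satisfy "> threshold".
        if reference in patient_ids and value is not None and value > threshold:
            rows.append((reference, obs["id"], subject, quantity))
    return rows


# Illustrative sample data only (not taken from the real system).
sample_patients = [{"id": "p1"}, {"id": "p2"}]
sample_observations = [
    {"id": "o1", "subject": {"reference": "p1"}, "valueQuantity": {"value": 20.0}},
    {"id": "o2", "subject": {"reference": "p2"}, "valueQuantity": {"value": 10.0}},
    {"id": "o3", "subject": {"reference": "p1"}, "valueQuantity": None},
]

# Only observation o1 passes both the join and the value filter.
for row in patients_with_high_observations(sample_patients, sample_observations):
    print(row)
```

In the module itself this work is done by Spark SQL over the data frames; the sketch only shows which rows the join and the filter keep.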
FHIR Resource Based Analytics
Some users might find it difficult to write Spark SQL queries. Hence this module provides a simple UI for performing analytics on key FHIR resources such as Patient.
As shown in the image above (I still need to add good-looking styles!), the user wants to find patients whose identifiers have a system containing ‘oid’. If the user wants an exact match, the ‘Contains’ checkbox should be left unchecked. If the user checks the ‘Contains’ checkbox, the module queries patients that have the term ‘oid’ anywhere in the system value.
By default, the module executes a combined search in which all the attributes are combined with AND. For example, if the user fills in the family name and the identifier system, the search returns results that match both the family name and the identifier system. If the user wants all the data that matches either the given family name or the identifier system, they can check the ‘Individual’ checkbox.
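The search behaviour described above can be sketched as a small query builder. This is only an illustration in plain Python, not the module's actual code: the function names and the flat column names `family_name` and `identifier_system` are hypothetical (the real Bunsen schema nests these fields), and values containing quotes or wildcard characters are simply rejected rather than escaped.

```python
from typing import Dict, Tuple


def build_predicate(column: str, value: str, contains: bool) -> str:
    """Build one condition: exact match, or substring match for 'Contains'."""
    # Assumption: reject quotes, backslashes and LIKE wildcards instead of
    # escaping them; a real implementation should bind parameters safely.
    if any(ch in value for ch in "'\\%_"):
        raise ValueError(f"unsupported character in search value: {value!r}")
    if contains:
        return f"{column} LIKE '%{value}%'"
    return f"{column} = '{value}'"


def build_patient_query(
    filters: Dict[str, Tuple[str, bool]], individual: bool = False
) -> str:
    """Combine the filled-in fields into one query string.

    filters maps a column name to (value, contains_checked).
    individual=False joins the conditions with AND (the default search);
    individual=True joins them with OR (the 'Individual' checkbox).
    """
    if not filters:
        return "SELECT * FROM patient"
    joiner = " OR " if individual else " AND "
    where = joiner.join(
        build_predicate(column, value, contains)
        for column, (value, contains) in filters.items()
    )
    return f"SELECT * FROM patient WHERE {where}"


# Hypothetical form input: exact family name, 'Contains' checked for system.
form = {"family_name": ("Smith", False), "identifier_system": ("oid", True)}
print(build_patient_query(form))
print(build_patient_query(form, individual=True))
```

The first call produces the default combined (AND) search, and the second produces the ‘Individual’ (OR) variant for the same form input.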
After filling in all the necessary fields, the user can see the query results at the bottom of the page. I’m currently working on providing full support for patient search while making the user interface cleaner.