Using MuleSoft to Implement a Decentralized Clinical Trial

Integration with wearables and an IoT platform: the new normal in Clinical Research Organizations.

Anandasankar Joardar
Another Integration Blog
8 min read · Oct 10, 2022


Introduction:

Clinical trial and clinical research organizations (CROs or CTOs) face many challenges in completing trials on time and keeping patients engaged throughout the trial process. These issues were magnified during the pandemic, and most CROs realized that their usual trial approaches needed to improve. It often takes a decade or more to complete the cycle from discovery to market release for a new drug or therapy; however, pandemic-related drugs and therapies needed to be available to patients in a much shorter time. Now, CROs are reimagining a faster, patient-centric trial process to keep up with demand. In this blog, I will discuss the decentralized clinical trial process and highlight key considerations in a MuleSoft design that extracts patient data and uploads it to a data lake.

Decentralized Clinical Trial

A decentralized clinical trial aims to bring the trial process closer to the patients/participants. Traditionally, the trial process requires patients to periodically travel to trial facilities; however, frequent travel and close contact during the trial can become a burden to patients and cause discomfort (especially during and after the pandemic). If patient engagement is too low or the drop-out rate is high, the overall reliability of the trial results becomes questionable. Requiring patients to come to trial facilities also limits the diversity of the patient population, as only people in close proximity to the trial site can take part.

A decentralized clinical trial, on the other hand, allows more flexibility for the patients. First, patients can be monitored remotely from any location, allowing for a more diverse participant pool. Second, trial drugs and therapies can be sent directly to the patients’ homes. These elements make a 100% remote trial possible, if needed, or a hybrid option where patients can choose a local trial location (for example, picking up a trial drug from a nearby pharmacy). This creates a better patient experience and increases the likelihood that a participant stays until the end of the trial. Decentralized clinical trials also expedite the overall trial process because they are more patient-centric and create more insight through a larger, more diverse set of data.

Technology is the backbone of the decentralized clinical trial process. A flexible, scalable, and resilient platform like MuleSoft is a great tool to achieve this.

Monitoring Using MuleSoft

Now, let’s focus on the technical architecture of a decentralized clinical trial. Specifically, I will describe continuous, remote monitoring of patients. The overall process is established through a layered architecture involving different platforms and applications. The decentralized clinical trial will involve the following:

  1. Patient vitals are continuously captured through wearable devices
  2. Devices are registered to an IoT platform and send real-time updates on patient biometrics to that platform
  3. MuleSoft is used to create scheduled applications that will periodically extract patient vitals from the IoT platform by invoking the REST API exposed by the IoT platform
  4. MuleSoft applications will transfer and load the patient data to a centralized data lake
  5. The data in the data lake is analyzed to create meaningful insights that researchers can use to make fast and accurate decisions at different stages of the trial process

The following picture demonstrates the overall trial process implementation:

High-level Depiction of the Decentralized Trial Process

The MuleSoft application can be scheduled using the out-of-the-box (OOTB) MuleSoft Scheduler or exposed as a MuleSoft API that a third-party scheduler invokes (refer to this blog for more information). On each run, it will invoke the IoT REST API to extract the patient biometrics (most IoT platforms offer REST APIs to interact with the platform).
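
To make those two triggering options concrete, here is a minimal Mule 4 configuration sketch. The flow names, listener port, polling frequency, and the /trigger/vitals-extraction path are illustrative assumptions only, and the shared extraction logic is expanded in later sketches.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:http="http://www.mulesoft.org/schema/mule/http"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="
        http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
        http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd">

  <http:listener-config name="Scheduler_API_Listener_Config">
    <http:listener-connection host="0.0.0.0" port="8081"/>
  </http:listener-config>

  <!-- Option 1: OOTB MuleSoft Scheduler as the flow source -->
  <flow name="scheduled-vitals-extraction-flow">
    <scheduler>
      <scheduling-strategy>
        <fixed-frequency frequency="15" timeUnit="MINUTES"/>
      </scheduling-strategy>
    </scheduler>
    <flow-ref name="extract-and-load-vitals-flow"/>
  </flow>

  <!-- Option 2: a lightweight API so a third-party scheduler can trigger the same logic -->
  <flow name="scheduler-api-vitals-extraction-flow">
    <http:listener config-ref="Scheduler_API_Listener_Config" path="/trigger/vitals-extraction"/>
    <flow-ref name="extract-and-load-vitals-flow"/>
  </flow>

  <!-- Shared extraction and load logic (expanded in a later sketch) -->
  <flow name="extract-and-load-vitals-flow">
    <logger level="INFO" message="Starting patient vitals extraction from the IoT platform"/>
    <!-- http:request to the IoT platform REST API goes here -->
  </flow>
</mule>
```

Keeping the trigger flows separate from the extraction flow also makes it easy to add the on-demand, manual trigger discussed later in this post.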

API-led connectivity for the decentralized clinical trial can be envisioned as follows:

MuleSoft API Led Connectivity — Loosely Coupled Decentralized Trial Architecture

In the diagram above, the experience layer is omitted because we are not directly creating any experience; the patient experience is created by the wearable devices connected to the IoT platform. Instead, the process-layer orchestration is triggered by a third-party scheduler (or the OOTB MuleSoft Scheduler component). This is one of the integration patterns for a decentralized clinical trial implemented with MuleSoft, moving data from a device to data storage (a data lake).

Key Considerations:

Some of the key considerations for this kind of architecture are as follows:

  1. Use the MuleSoft Batch scope and batch integration patterns when implementing the Process API/application. IoT workloads typically involve a very high volume of messages, while individual message sizes are small. The MuleSoft Batch scope supports streaming OOTB and is designed to handle large amounts of data (see the Batch variant sketched after this list).
  2. Use aggregated or bulk loading into the data lake to achieve better performance. It is generally better to use UPSERT instead of INSERT when loading into the data lake, although this depends on the processing functionality the target supports.
  3. Implement a pagination technique when extracting data from the IoT API. Since any IoT-driven implementation means a continuous stream of data, pagination keeps each extraction to a manageable size. If the IoT API itself supports offsets and limits to extract data in chunks, that should be leveraged in the MuleSoft application design. Otherwise, MuleSoft needs to implement some criteria to limit the volume of data in each extraction, for example based on a timestamp (i.e., last updated date-time). You should also implement a technique to watermark the last extracted message metadata to avoid data duplication.
  4. A watermark technique to preserve the reference of the last successfully loaded data chunk is required. A table holding the watermark information in a third-party database is recommended. Every time the MuleSoft process is kicked off by the scheduler, it should retrieve the last successful data extraction reference from the watermark column and start data extraction from the next range.
  5. After a successful data extraction and upload to the data lake, update the watermark table with the reference of the data chunk just loaded. If an error occurs, ensure the watermark table still holds the reference of the last successful extraction and load (i.e., do not update the watermark table with the reference of the chunk under processing if an error occurs at any stage of the process). The retry should then start from the last successful data chunk.
  6. Here is a high-level process overview: once the MuleSoft application is kicked off by the scheduler, the process reads the watermark column from the table to retrieve the last successful data chunk reference. Then, the process starts fetching the next data chunk from that reference point by calling the IoT REST API. The next step is to bulk UPSERT the data into the data lake. Once the data load is successful, the watermark column is updated with the reference of the data chunk that was just processed. If any stage of the process fails, roll back and do not update the watermark column with the reference of the data chunk that failed to process. A configuration sketch of this flow follows this list.
  7. Any data chunk that fails to load into the data lake needs to be tracked in a custom error table in a third-party database. This does not mean storing the entire payload in a database table; just keep a reference or metadata of the failed data chunk. This allows the same data chunk to be extracted again from the IoT platform as part of a retry attempt.
  8. It is advisable to retry a finite number of times if a failure occurs while extracting or loading the data. Only when all the retry attempts are exhausted and the data is still not successfully loaded should you update the custom error table (with the failed data chunk reference) as described in point 7. The operations team can track all the failed data chunks from that table.
  9. There needs to be a provision to manually invoke the extraction and load process. This allows the operations team to extract any data chunk on demand from the IoT platform and load it into the data lake in case of a failed data load. For example, all retries may be exhausted during a long outage of a downstream application; the operations team can manually load that data chunk once the outage is over. Essentially, the MuleSoft implementation can be triggered manually, on demand, in addition to the regular triggering of the process by the scheduler.
  10. A high-availability architecture is recommended for MuleSoft application deployment. On CloudHub, deploy MuleSoft applications on at least two workers; MuleSoft automatically load balances the workload across the workers. This ensures reliable processing of high data volumes.
  11. Ensure Process APIs and System APIs are exposed through separate load balancers (assuming the Process API is scheduled through a third-party scheduler). On CloudHub, it is recommended to use one dedicated load balancer for the process layer and a separate dedicated load balancer for the System API to achieve better scalability; a shared load balancer is not recommended for this type of integration pattern. Whitelist only the Anypoint VPC CIDR range on the dedicated load balancer (firewall) for the System API so that the System API is exposed only within the VPC, safeguarding the underlying data lake from external attack. Also, use MuleSoft OOTB security policies to further secure the layered architecture.
  12. Be careful when using custom variables in the Mule flow implementation. Use remove-variable at the end of the process to release all custom variables used. This optimizes heap space utilization and avoids unnecessary out-of-memory issues while processing a high volume of data.
Basic processing flow diagram
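
The following Mule 4 sketch pulls considerations 2 through 8 and 12 together: it reads the watermark, extracts the next paginated chunk from the IoT API with a finite number of retries, bulk upserts it into the data lake, moves the watermark forward only after a successful load, records failed chunk references for the operations team, and releases the custom variables at the end. Everything specific in it is an assumption for illustration: the host names, the /v1/patients/vitals path and its since/limit query parameters, the extraction_watermark, failed_chunks, and patient_vitals tables, the expectation that the IoT response is a JSON array with recordedAt fields, and the MySQL-flavored upsert statement. For brevity, it also collapses the process and system layers into one flow; in a strict API-led deployment the HTTP and Database operations would live in separate System APIs invoked over HTTP.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:http="http://www.mulesoft.org/schema/mule/http"
      xmlns:db="http://www.mulesoft.org/schema/mule/db"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="
        http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
        http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd
        http://www.mulesoft.org/schema/mule/db http://www.mulesoft.org/schema/mule/db/current/mule-db.xsd">

  <!-- Placeholder connection details; externalize real values as properties -->
  <http:request-config name="IoT_Platform_HTTP_Config">
    <http:request-connection host="iot.example.com" port="443" protocol="HTTPS"/>
  </http:request-config>
  <db:config name="Ops_DB_Config">
    <db:my-sql-connection host="ops-db.example.com" port="3306"
                          user="${ops.db.user}" password="${ops.db.password}" database="trial_ops"/>
  </db:config>
  <db:config name="DataLake_DB_Config">
    <db:generic-connection url="${datalake.jdbc.url}" driverClassName="${datalake.jdbc.driver}"
                           user="${datalake.user}" password="${datalake.password}"/>
  </db:config>

  <flow name="extract-and-load-vitals-flow">

    <!-- 1. Read the last successfully processed chunk reference (watermark) -->
    <db:select config-ref="Ops_DB_Config">
      <db:sql>SELECT last_chunk_reference FROM extraction_watermark WHERE job_name = :job</db:sql>
      <db:input-parameters>#[{ job: 'patient-vitals-extract' }]</db:input-parameters>
    </db:select>
    <set-variable variableName="watermark" value="#[payload[0].last_chunk_reference]"/>

    <!-- 2. Extract the next chunk from the IoT platform, paginating with since/limit,
         retrying a finite number of times before giving up -->
    <until-successful maxRetries="3" millisBetweenRetries="30000">
      <http:request method="GET" config-ref="IoT_Platform_HTTP_Config" path="/v1/patients/vitals">
        <http:query-params><![CDATA[#[output application/java
---
{ "since": vars.watermark, "limit": 500 }]]]></http:query-params>
      </http:request>
    </until-successful>

    <!-- Reference of the chunk now being processed (assumes a JSON array sorted by recordedAt) -->
    <set-variable variableName="newWatermark" value="#[payload[-1].recordedAt default vars.watermark]"/>

    <!-- 3. Bulk upsert the chunk into the data lake (MySQL-style syntax for illustration only) -->
    <db:bulk-insert config-ref="DataLake_DB_Config">
      <db:sql>
        INSERT INTO patient_vitals (patient_id, recorded_at, metric, metric_value)
        VALUES (:patientId, :recordedAt, :metric, :metricValue)
        ON DUPLICATE KEY UPDATE metric_value = VALUES(metric_value)
      </db:sql>
      <db:bulk-input-parameters>#[payload map {
        patientId: $.patientId,
        recordedAt: $.recordedAt,
        metric: $.metric,
        metricValue: $.value
      }]</db:bulk-input-parameters>
    </db:bulk-insert>

    <!-- 4. Only after a successful load, move the watermark forward -->
    <db:update config-ref="Ops_DB_Config">
      <db:sql>UPDATE extraction_watermark SET last_chunk_reference = :ref WHERE job_name = :job</db:sql>
      <db:input-parameters>#[{ ref: vars.newWatermark, job: 'patient-vitals-extract' }]</db:input-parameters>
    </db:update>

    <!-- 5. Release custom variables to keep heap usage low on high-volume runs -->
    <remove-variable variableName="watermark"/>
    <remove-variable variableName="newWatermark"/>

    <error-handler>
      <!-- On failure: leave the watermark untouched and record the failed chunk reference
           so the operations team can replay it on demand -->
      <on-error-propagate>
        <db:insert config-ref="Ops_DB_Config">
          <db:sql>INSERT INTO failed_chunks (job_name, chunk_reference, failed_at)
                  VALUES (:job, :ref, CURRENT_TIMESTAMP)</db:sql>
          <db:input-parameters>#[{ job: 'patient-vitals-extract', ref: vars.watermark }]</db:input-parameters>
        </db:insert>
      </on-error-propagate>
    </error-handler>
  </flow>
</mule>
```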
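
For very large chunks, the load step above can instead be implemented with the Batch scope, as suggested in consideration 1. One important nuance: the Batch Job executes asynchronously from the calling flow, so the success-dependent watermark update belongs in the On Complete phase, guarded by the job result, rather than after the batch scope. The sketch below assumes each record is already a map keyed by the named SQL parameters; the table, job, and flow names are again illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:db="http://www.mulesoft.org/schema/mule/db"
      xmlns:batch="http://www.mulesoft.org/schema/mule/batch"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="
        http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
        http://www.mulesoft.org/schema/mule/db http://www.mulesoft.org/schema/mule/db/current/mule-db.xsd
        http://www.mulesoft.org/schema/mule/batch http://www.mulesoft.org/schema/mule/batch/current/mule-batch.xsd">

  <db:config name="DataLake_DB_Config">
    <db:generic-connection url="${datalake.jdbc.url}" driverClassName="${datalake.jdbc.driver}"
                           user="${datalake.user}" password="${datalake.password}"/>
  </db:config>

  <flow name="load-vitals-with-batch-flow">
    <batch:job jobName="vitals-datalake-load-job">
      <batch:process-records>
        <batch:step name="upsert-vitals-step">
          <!-- Aggregate records and write them in bulk rather than row by row;
               the aggregated list in the payload is used as the bulk input parameters by default -->
          <batch:aggregator size="200">
            <db:bulk-insert config-ref="DataLake_DB_Config">
              <db:sql>
                INSERT INTO patient_vitals (patient_id, recorded_at, metric, metric_value)
                VALUES (:patientId, :recordedAt, :metric, :metricValue)
                ON DUPLICATE KEY UPDATE metric_value = VALUES(metric_value)
              </db:sql>
            </db:bulk-insert>
          </batch:aggregator>
        </batch:step>
      </batch:process-records>
      <batch:on-complete>
        <!-- Batch runs asynchronously from the calling flow, so the watermark update
             belongs here, guarded by the job result -->
        <choice>
          <when expression="#[payload.failedRecords == 0]">
            <logger level="INFO" message="All records loaded; safe to move the watermark forward"/>
            <!-- e.g. a flow-ref to a watermark-update flow goes here -->
          </when>
          <otherwise>
            <logger level="WARN" message="Some records failed; keep the previous watermark and record the failed chunk"/>
          </otherwise>
        </choice>
      </batch:on-complete>
    </batch:job>
  </flow>
</mule>
```

Whether to use the synchronous bulk upsert or the Batch scope is a throughput-versus-simplicity trade-off; either way, the watermark semantics described in the list above should be preserved.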

Conclusion:

Clinical research organizations (CROs) are reimagining the clinical trial process to create a better experience for the patients, expedite the trial process, and improve the accuracy of trial results. A scalable and resilient platform like MuleSoft is essential in the end-to-end architecture for a decentralized trial process. More and more clinical trial organizations are opting for MuleSoft to create a data pipeline between the IoT platform and a centralized data lake, and wearable devices are gaining popularity for collecting patient vitals remotely. This blog has attempted to cover some architectural aspects of MuleSoft in the context of a decentralized clinical trial process. I hope it gives a high-level idea of the architecture of a decentralized clinical trial and how MuleSoft applications can play a pivotal role in it.
