Pre-processing Module for Digital Biomarker Development

Geetika Singh
Digital Biomarker Discovery
4 min read · Mar 1, 2021

Wearable devices enable continuous monitoring of physiological signals, which can be used for predictive, diagnostic, and preventive purposes. More of us have started using these devices to monitor our health. The prediction that 1,105 million people worldwide will use wearable devices by 2022 [1] highlights how prevalent these devices are becoming.

One of the goals of digital health research is to develop digital biomarkers. Digital biomarkers are defined as digitally collected data from devices like wearables and implantables that can be used as indicators of health outcomes. You can learn more about digital biomarkers in the BIG IDEAS Lab’s DBDP introductory article. Digital biomarker development requires in-depth domain knowledge and computing skills [2]. To facilitate digital biomarker research, we have developed the Digital Biomarker Discovery Pipeline (DBDP).

In a recent survey of digital medicine professionals [3], the comma-separated values (.csv) format was found to be one of the most common file formats used in digital medicine research. The survey also revealed that the top three reported devices or sensors were smartphones (iPhone or Android), Apple Watch, and Fitbit (Figure 1).

Figure 1: Sensors used by researchers in digital medicine [3]. (A) Types of devices and sensors reported by researchers in the survey. (B) Sensors associated with physiological measurements.

Before data is analyzed or passed to one of the DBDP modules, it has to be processed into a standard data structure. This includes converting values to standard data types (for example, Python datetime objects, integers, and strings). To reduce the time it takes to get data into the right structure, we have developed a pre-processing module as a precursor to the other modules of the DBDP. The module transforms raw data (for example, a .tcx file) into an accessible format for research. The goal is to let researchers upload raw files from their selected wearable device and convert them to a pandas dataframe (Python), an R dataframe, and/or a .csv file.
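To make the idea of a "standard data structure" concrete, here is a minimal sketch of the kind of type conversion described above. It is not the module's actual code: the column names, the sample data, and the `load_standardized` helper are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical raw export: every value arrives as plain text.
RAW_CSV = """Time Stamp,Heart Rate
2021-02-01 08:00:00,62
2021-02-01 08:00:05,64
2021-02-01 08:00:10,63
"""


def load_standardized(source):
    """Read a raw CSV export and coerce each column to a standard dtype.

    Illustrative helper only; the column names are assumed, not taken
    from the DBDP module.
    """
    # Read everything as text first so the conversions below are explicit.
    df = pd.read_csv(source, dtype=str)
    # Timestamps become proper datetime values instead of strings.
    df["Time Stamp"] = pd.to_datetime(df["Time Stamp"], format="%Y-%m-%d %H:%M:%S")
    # Heart rate becomes an integer column; unparseable values become missing.
    df["Heart Rate"] = pd.to_numeric(df["Heart Rate"], errors="coerce").astype("Int64")
    return df


df = load_standardized(io.StringIO(RAW_CSV))

# The same dataframe can then be written back out as a clean .csv file.
csv_text = df.to_csv(index=False)
```

In practice, `source` would be a path to a file exported from the device, and `csv_text` could be written to disk with `df.to_csv("clean.csv", index=False)`.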

The module cleans up column names (for example, by removing spaces) and converts the data to uniform data types (for example, standard datetime objects and integers). It also calculates the time elapsed since the start of data collection, which can be used for time-based analysis (progression). For wearables that export a separate comma-separated file for each sensor (for example, Biovotion), the module combines all the physiological parameters into a single data structure for easy analysis. The module can also remove multi-line headers, which would otherwise be misplaced in the dataframe and hinder exploratory data analysis and modeling, and convert the file into a dataframe that can be analyzed directly (as a set of attributes and tuples). Finally, it adds a column with the watch name so that each device's data can be easily identified in detailed analysis.
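The steps in this paragraph can be sketched with pandas as follows. This is an illustration under stated assumptions, not the module's implementation: the two per-sensor files, the two-line metadata header, the column names, the device name "ExampleBand", and the `read_sensor_file` and `preprocess` helpers are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical per-sensor exports, each with two metadata lines above the header.
HEART_RATE_CSV = """Device: ExampleBand
Export version: 1
Time Stamp,Heart Rate
2021-02-01 08:00:00,62
2021-02-01 08:00:05,64
"""
SKIN_TEMP_CSV = """Device: ExampleBand
Export version: 1
Time Stamp,Skin Temp
2021-02-01 08:00:00,33.1
2021-02-01 08:00:05,33.2
"""


def read_sensor_file(source, header_lines=2):
    """Read one sensor file, skipping metadata lines and tidying column names."""
    # Skip the extra header lines so the real column header is parsed correctly.
    df = pd.read_csv(source, skiprows=header_lines)
    # Replace spaces in column names, e.g. "Heart Rate" -> "Heart_Rate".
    df.columns = [name.strip().replace(" ", "_") for name in df.columns]
    df["Time_Stamp"] = pd.to_datetime(df["Time_Stamp"])
    return df


def preprocess(sources, watch_name):
    """Combine per-sensor files into one dataframe with elapsed time and device name."""
    frames = [read_sensor_file(source) for source in sources]
    combined = frames[0]
    for frame in frames[1:]:
        # An outer join keeps timestamps that appear in only one sensor file.
        combined = combined.merge(frame, on="Time_Stamp", how="outer")
    combined = combined.sort_values("Time_Stamp").reset_index(drop=True)
    # Seconds elapsed since the first recorded sample.
    start = combined["Time_Stamp"].iloc[0]
    combined["Elapsed_Seconds"] = (combined["Time_Stamp"] - start).dt.total_seconds()
    # Tag every row with the device it came from.
    combined["Watch_Name"] = watch_name
    return combined


data = preprocess(
    [io.StringIO(HEART_RATE_CSV), io.StringIO(SKIN_TEMP_CSV)],
    watch_name="ExampleBand",
)
```

The outer join is a design choice for this sketch: sensors often sample at different rates, so keeping unmatched timestamps (with missing values) avoids silently dropping data.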

The current version of the pre-processing module supports devices from Apple, Fitbit, Garmin, Miband, Biovotion, and Empatica, as well as .EDF files from common electrocardiogram patches (Figure 2). We are in the process of adding more devices to our pre-processing module. Let us know in the comments which devices you would like to see represented!

Figure 2: Devices included in the DBDP Pre-processing Module as of Feb '21.

The pre-processing module is simple to use, either as a standalone tool or alongside other modules of the DBDP. This is our attempt to make the tedious task of pre-processing easier for researchers.

All you need to do is clone the GitHub repository and get started. Please find the details of the code in the README file and refer to our User Guide to get started.

Continued Development

We frequently update the module with new devices and insights from the DBDP. We plan to ingest wearable device data directly through APIs and to provide a de-identification tool for researchers who wish to publish or open-source their data.

If you are interested in collaborating and/or contributing to the DBDP pre-processing module, please contact us at dbdp.org. If you or your organization would be interested in a collaboration to get your wearable device on the DBDP pipeline, we would love to hear from you!

Resources

  1. Connected wearable devices worldwide 2016–2022. Published by Statista Research Department, Jan 22, 2021. https://www.statista.com/statistics/487291/global-connected-wearable-devices/
  2. Bent, B., Wang, K., Grzesiak, E., Jiang, C., Qi, Y., Jiang, Y., Cho, P., Zingler, K., Ogbeide, F.I., Zhao, A., Runge, R., Sim, I., Dunn, J. (2020). The Digital Biomarker Discovery Pipeline: An open source software platform for the development of digital biomarkers using mHealth and wearables data. Journal of Clinical and Translational Science, 1–28. doi:10.1017/cts.2020.511
  3. Bent B, Sim I, Dunn J, Digital Medicine Community Perspectives and Challenges: Survey Study, JMIR Mhealth Uhealth 2021;9(2):e24570, URL: https://mhealth.jmir.org/2021/2/e24570, DOI: 10.2196/24570

Geetika Singh
Digital Biomarker Discovery

Biomedical Data Manager II at PathAI | Ex-Neuroscience Data Engineer at DataJoint | Biomedical engineer | Duke University | BIG IDEAS Lab | Health Data Science