Beginner’s Guide to the Digital Health Data Repository (DHDR)
A curated collection of digital health datasets for use with the DBDP and for digital biomarker discovery
As part of the Digital Biomarker Discovery Pipeline (DBDP) developed in the Big Ideas Lab at Duke University, the Digital Health Data Repository (DHDR) shares DBDP’s mission of providing an open-source end-to-end modular framework for digital biomarker discovery. By aggregating existing digital health datasets across multiple categories (Cardiovascular, Activity, Mental Health, Nutrition, etc.), the DHDR provides example datasets with which to test DBDP modules. Running a module on one of these example datasets shows you how the method works before you apply it to novel data!
Note that some datasets featured in the DHDR are published open-access datasets, while others are hosted on platforms such as PhysioNet and the UCI Machine Learning Repository, where some datasets require credentialed access. The DBDP team is hard at work incorporating more example datasets into the DHDR. If you know of any datasets we don’t currently have in the DHDR, drop us a comment below and let us know!
tl;dr
Step 1: Choose the data category you are looking for.
Step 2: Read the dataset descriptions and find the one that best fits your research.
Step 3: Open the data source page linked in the README, carefully review the information about the dataset, download the dataset, and cite the source properly.
Step 4: Refer to the README for helpful analysis tools, such as DBDP modules, if needed. Don’t forget to properly cite the DBDP if you decide to use its modules.
Navigating through the datasets
The datasets in the DHDR are organized into comprehensive categories and subcategories. The main README provides an overview of the repository and a table of contents. The README that accompanies each dataset serves as a usage guide (see next section for details).
The general structure of the DHDR is illustrated below.
Digital_Health_Data_Repository
├── Category
│   ├── Dataset
│   │   └── README
│   └── Subcategory
│       └── Dataset
│           └── README
└── Main README
For example, here is a subset of the datasets in the cardiovascular category:
Cardiovascular
├── Dataset_STEP
│   └── README.md
└── Electrocardiogram\ (ECG)
    ├── Dataset_European_ST_T
    │   └── README.md
    ├── Dataset_MHEALTH
    │   └── README.md
    └── Dataset_MIT_BIH_Arrhythmia
        └── README.md
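If you have a local copy of the repository, the layout above can be explored programmatically. The snippet below is a minimal sketch, not part of the DHDR itself. It assumes that dataset folders are named with a `Dataset_` prefix and contain a `README.md`, as in the cardiovascular example; the function name `find_datasets` and the local path you pass in are hypothetical.

```python
from pathlib import Path


def find_datasets(repo_root):
    """List (category, dataset) pairs found under a local copy of the DHDR.

    Assumption: every dataset lives in a folder whose name starts with
    "Dataset_" and that folder contains a README.md, as in the example tree.
    """
    root = Path(repo_root)
    results = []
    for readme in root.rglob("README.md"):
        dataset_dir = readme.parent
        # Skip the main README and any README outside a dataset folder.
        if dataset_dir == root or not dataset_dir.name.startswith("Dataset_"):
            continue
        # The category is the dataset folder's parent path relative to the
        # repository root, e.g. "Cardiovascular/Electrocardiogram (ECG)".
        category = dataset_dir.parent.relative_to(root).as_posix()
        results.append((category, dataset_dir.name))
    return sorted(results)
```

Calling `find_datasets` on the folder where you saved the repository returns a sorted list of pairs, which you can print or filter by category before opening the individual READMEs.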
An exploded view of the README
Each dataset in the DHDR is accompanied by a README file that contains the information you need to get started with the dataset. A brief walkthrough of each section is provided below.
Name of dataset
The title of the README corresponds to the name of the dataset(s).
Dataset
This section redirects users to the original website of the dataset. Users can visit the link to learn more about and download the dataset.
Description of Dataset
This section provides users a brief introduction to the dataset. This includes, but is not limited to, the data type, sample size, applications, data acquisition device, and acquisition field.
Citing this Dataset
This section provides the citations users need when using the dataset. These include the source of the dataset and the publication for which the dataset was originally collected.
Analyzing this Dataset
This section provides helpful tools for users to analyze this dataset, as well as the citation for the DBDP if its modules are used on this dataset.
Previous Studies utilizing Dataset
This section lists studies that have previously utilized the dataset. Users can take advantage of the methodology from the listed studies to process and analyze the dataset.
Additional Usage Information
This section provides users some additional usage notes.
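When preparing a README for a new dataset, it can help to check that all of the sections described above are present. The following is a minimal sketch rather than an official DHDR tool. It assumes that the README is written in Markdown and that each section appears as a `#`-style heading with the exact titles used in this guide; `EXPECTED_SECTIONS` and `missing_sections` are hypothetical names.

```python
import re

# Section titles as listed in this guide (the README title itself,
# which is the dataset name, is not checked).
EXPECTED_SECTIONS = [
    "Dataset",
    "Description of Dataset",
    "Citing this Dataset",
    "Analyzing this Dataset",
    "Previous Studies utilizing Dataset",
    "Additional Usage Information",
]


def missing_sections(readme_text):
    """Return the expected section titles that do not appear as headings.

    Assumption: sections are Markdown headings such as "## Dataset".
    Matching ignores letter case.
    """
    headings = set()
    for line in readme_text.splitlines():
        # Match "#" to "######" followed by the heading text,
        # ignoring optional trailing "#" characters.
        match = re.match(r"^#{1,6}\s+(.*?)\s*#*\s*$", line)
        if match:
            headings.add(match.group(1).strip().lower())
    return [s for s in EXPECTED_SECTIONS if s.lower() not in headings]
```

An empty list means every expected section was found; otherwise the returned titles tell you which sections still need to be written.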
We are growing!
The DBDP is an open-source software platform that welcomes contributions to both the DBDP code base and the DHDR! Please follow our contributions guide and make sure your code is well-documented. For the DHDR, ensure that the dataset descriptions follow the README structure described above. Happy discovering!