Proceed With Caution

Using internet data in precision medicine research

Kadija Ferryman, PhD
Data & Society: Points
Apr 25, 2018

Image via Daniel Friedman

This post was co-authored by Data & Society Postdoctoral Scholar Kadija Ferryman and Elaine O. Nsoesie, PhD, from the Institute for Health Metrics and Evaluation.

Precision medicine is the integration of genetics with other types of data, including electronic health records and environmental and lifestyle data, to tailor and personalize disease diagnosis and treatment to the individual. Although there has been some success using genetic variations to tailor medication dosing and to treat certain cancers, the field is still emerging, with researchers, patients, clinicians, and others hopeful that the increasing availability of health data, coupled with the advances in data storage and analysis, will lead to more insights about the multiple factors that influence health, disease, and medical care.

Several research efforts, including the National Institutes of Health’s All of Us Research Program, Google’s Project Baseline, and New York University’s HUMAN Project, aim to advance precision medicine by collecting multiple forms of data from large numbers of people. While these research projects plan on integrating relatively novel forms of data, such as digital data from wearable devices, less attention has been given to including data from internet-based applications (such as social media, search queries, forums, and online communities), despite evidence showing that these data can reveal important health information. For example, researchers have shown how social media data can be used to provide signals on mental health, including suicidal ideation and postpartum depression.

Despite the public attention and backlash stemming from the Cambridge Analytica scandal, which began with an academic inquiry and resulted in at least 87 million Facebook profiles being disclosed, researchers argue that Facebook and other social media data can be used to advance knowledge, as long as these data are accessed and used in a responsible way. We argue that data from internet-based applications can be a relevant resource for precision medicine studies, provided that these data are accessed and used with care and caution. This is especially true as new sources of data become available, such as through the partnership Facebook formed with the Social Science Research Council in the wake of the Cambridge Analytica scandal to “make data available for the first time to social science researchers via an independent, transparent, peer-review process.”

Image via Marcin Ignac

Improving Understanding

Internet-based applications can be used in precision medicine research in two ways:

  • For patient recruitment and engagement; and
  • For the study of health.

First, internet-based applications can enable targeted recruitment of groups who may be otherwise difficult to reach. For example, a randomized controlled trial on depression successfully recruited participants using targeted internet advertising. In addition, patient-led communities and groups on social media can facilitate patient engagement in precision medicine research. Online communities exist around specific diseases, such as cystic fibrosis, or health goals, like the sharing of information and data for research. For example, PatientsLikeMe, an online community of nearly half a million people, has partnered with pharmaceutical companies and academic organizations for multiple research studies. Information shared within these communities can aid in the identification of treatment side effects and can influence clinical trial research. For instance, a patient community group on Facebook provided more comprehensive reports on issues related to the use of a medical device when compared to reports submitted by physicians. This patient-generated information can influence the direction of research studies by focusing on aspects of a disease or treatment that researchers may not otherwise consider.

Data from internet-based platforms can also assist in developing a comprehensive picture of a patient’s health. Termed the “digital phenotype,” or the expression of a person’s health state via the use of digital technology, health data from internet-based applications have been shown to provide important insights about health. For example, research suggests web searches can predict the onset of pancreatic cancer and can identify adverse drug reactions. Social media data can also reveal lifestyle choices, potentially contributing to an understanding of how an individual interacts with their environment. For example, with patients’ consent, researchers can access information about physical activity, food choices, and product purchases, which is typically shared on social media platforms. This information could be used to understand everyday choices and their potential association with an individual’s health.

Health data from internet-based applications can also be used to assess health care quality, which can be analyzed alongside individual-level data. Reviews of health care quality on business review and social media sites (such as Twitter), including instances of discrimination, can provide additional insights into a person’s social and environmental determinants of health. Through the increasing availability of business reviews, discriminatory experiences can be revealed, giving patients the opportunity to shed light on negative experiences and to avoid them in the future. Discriminatory experiences themselves are known to harm health outcomes, and poorer quality of care due to discrimination can harm them further.

Illustration via Gary Zamchick

Proceeding With Caution

Despite these opportunities, we must proceed with caution and consideration for the limitations regarding access, privacy, and representation that disproportionately affect historically disadvantaged and marginalized populations.

Although social media platforms can be useful for recruitment and engagement of study participants, not everyone is connected to social media or connected to relevant groups within these platforms. An example is the difference in public Facebook groups for cystic fibrosis and sickle cell disease. Although there are approximately 30,000 people with cystic fibrosis and about 100,000 people with sickle cell disease in the United States, the largest public cystic fibrosis group has over 25,000 members whereas the largest public sickle cell group has about 11,000 members (information accessed on 07/13/2017). These disparities are significant, especially if replicated across groups focused on diseases that largely affect minority or underserved populations. The Cambridge Analytica scandal also highlights the ethical concerns regarding recruitment of participants using internet-based applications, including recruitment within a participant’s network and the online discussion of study involvement and results.
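To make the scale of the group-size gap concrete, here is a minimal sketch that expresses each largest public Facebook group as members per 100 U.S. patients. It uses only the approximate figures quoted above. The variable and function names are hypothetical, and group membership is only a rough proxy, since groups can include relatives, caregivers, and people outside the United States.

```python
# Approximate figures quoted in the text (information accessed 07/13/2017).
# Assumption: group membership is treated as a rough proxy for online
# connectedness; members may include caregivers or people outside the U.S.
conditions = {
    "cystic fibrosis": {"us_patients": 30_000, "largest_group_members": 25_000},
    "sickle cell disease": {"us_patients": 100_000, "largest_group_members": 11_000},
}


def members_per_100_patients(us_patients, largest_group_members):
    """Return group members per 100 people living with the condition in the U.S."""
    return 100 * largest_group_members / us_patients


# Hypothetical helper output: one ratio per condition.
ratios = {
    name: members_per_100_patients(**figures)
    for name, figures in conditions.items()
}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f} group members per 100 U.S. patients")
```

With the quoted figures, this works out to roughly 83 group members per 100 patients for cystic fibrosis versus about 11 per 100 for sickle cell disease.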

The value of health information gleaned from social media might differ across communities. Certain communities, such as African Americans and Native Americans, have experienced a history of abuse at the hands of medical researchers and institutions. Therefore, it would be unsurprising if individuals in these groups chose not to share, or were more reticent about sharing, personal health information on their accounts or within social media groups compared with others. For example, a recent study suggested data from internet-based applications may be less useful for tracking food-borne illness in lower-income communities due to a lower incidence of reporting, despite studies suggesting that rates of food-borne illness might be higher in these communities.

Finally, there is no comprehensive guidance on the use of internet-based applications for researchers and institutional review boards (IRBs). Efforts such as the Connected and Open Research Ethics Program (CORE), the Pervasive Data Ethics for Computational Research (PERVADE) project, and SSRC’s review committees are working to develop this kind of guidance, and there should be more support for this kind of work. Our goal here is to focus on minority and marginalized groups, as the ethical issues we discuss may be concerning for everyone but may have different resonances and impacts on these groups.

Internet-based applications present a potential opportunity for precision medicine, an emerging field that aims to collect, integrate, and analyze multiple forms of health data. But there must be careful consideration of the challenges of integrating this kind of data. Researchers should take a measured approach to using social media for research recruitment and engagement in precision medicine or, in other words, proceed with caution.

For more on these themes, check out Fairness in Precision Medicine and What is Precision Medicine? by Data & Society Postdoctoral Scholar Kadija Ferryman and Researcher Mikaela Pitcan.

Kadija Ferryman is a Postdoctoral Scholar at Data & Society and is the author of “Fairness in Precision Medicine,” the first report to deeply examine the potential for biased and discriminatory outcomes in the emerging field of “precision medicine,” or “the effort to collect, integrate, and analyze multiple sources of data in order to develop individualized insights about health and disease.” Her research examines the challenges and opportunities of using big data technologies in medical research and healthcare. She earned a BA in Anthropology from Yale and a PhD in Anthropology from the New School for Social Research, where she studied the ethical impacts of genomics research. You can find her on Twitter @KadijaFerryman.

Elaine O. Nsoesie is an Assistant Professor at the Institute for Health Metrics and Evaluation and the Department of Global Health at the University of Washington in Seattle. Her work is primarily in the field of digital epidemiology and global health. She evaluates the use of digital data and technology for public health surveillance. She has also written about digital data and health for The Conversation, Quartz, and NPR. You can find her on Twitter @ensoesie.
