Crunching big data for good healthcare policy

The healthcare industry in the United States is enormous. In 2019, U.S. healthcare spending was $3.8 trillion — 17.7 percent of gross domestic product (GDP) and $11,518 per person, according to the Centers for Medicare & Medicaid Services. This industry of patient care also is enormously complex, with the accumulated data of myriad patients, clinicians, support staff, and institutional providers including hospitals.

That’s where data science comes in. A multifaceted “system of systems,” healthcare benefits greatly from the use of data science tools — like analytics, machine learning, causal inference, and network science — to sort through all the data points to tease out trends and make recommendations that inform better policy and patient outcomes.

As an example, I work with administrative claims data to try to answer vital health questions. Whenever you go to the doctor, a claim is sent to the insurer. That claim contains a ton of information — any procedure performed, any diagnosis made, the date of the service, the type of provider, and so forth. Claims data is especially good for aggregate analysis, because there is so much of it — hundreds of millions of claims a year for the millions of people included in the dataset.

Claims data is particularly useful for analyzing policies, clinical interventions, and changes in trends. For instance, in one study, I assessed the impact of the CDC’s U.S. Medical Eligibility Criteria for Contraception Use (MEC), a new contraception guideline for women with chronic health conditions. The guidelines identified 20 medical conditions — such as hypertension, diabetes, epilepsy and HIV — that increase a woman’s risk for an adverse outcome in the event of an unplanned pregnancy. To mitigate that risk, the MEC recommended long-acting, highly effective contraceptive methods (e.g., IUDs and implants). In our study, employing data science methodologies and statistical analysis across a large dataset of 250,000 women, we found an increase in contraceptive use overall, but not in highly effective contraceptive methods across all conditions.

While my primary discipline is industrial engineering, I work closely with researchers in numerous other disciplines, including psychology, computer science, health care policy, and medicine. I strive for my work to be impactful, and each of these collaborations, tapping interdisciplinary expertise, has uniquely supported the content relevance of my work. To achieve this goal, it is imperative to have a clinician on board to incorporate medical knowledge and the day-to-day limitations for a practicing clinician. Beyond merely undertaking data analysis projects, I identify differences or improvements that are both statistically and clinically significant.

I have started to incorporate information from census data, in order to better understand underserved populations, as well as the environment that contributes to their health (or health disparities), which may relate to race, class, gender, or the urban/rural divide. Being able to bring in lots of different publicly available datasets and connect them is important for a fuller understanding of health outcomes.

For example, health disparities and their impact on labor productivity cost the European Union 1.4 percent of its GDP and cost hundreds of billions of dollars in the U.S. annually. Considering the discrepancies in life expectancy, maternal and infant mortality, and access to quality healthcare, to name a few aspects, health inequity is also an ethical problem.

I explore the effects of inequity in the context of healthcare. My research focuses on low-income, disabled, and chronically ill people. I have learned how ZIP Code is one of the best predictors of health, how physician bias leads to lower-quality healthcare, and that medical issues are the primary cause of U.S. bankruptcies. My interests stem from a desire to support vulnerable populations by deciphering the reasons why they are at risk.

As I continue to work in health data science, I am becoming aware of yet another manifestation of inequity: a greater threat to data privacy for underserved populations. Data has become an increasingly valuable commodity, and the volume of data continues to grow. This growth endangers the security of sensitive healthcare data and the privacy of individuals included in these datasets.

Today, I already see the beginnings of a system in which privacy is a luxury good, driven by a gap between those who can pay for it and those who cannot. This divergence leads to poorer populations being susceptible to data misuse, the most commonly known form of which is financial identity theft. However, the idea of identity is evolving to include social and biomedical identities. Thus, identity protection must also consider mainstream technologies, such as those provided by Facebook and 23andMe. Researchers are stewards of people’s data, so we need to make sure we maintain public trust, and continue to offer valuable findings in exchange for public support through data as well as taxpayer-funded research.

A closing thought: As AI tools become a bigger part of healthcare policy, it is important to note that we want them to supplement the clinician and policymaker, providing additional information to enable better decisions. We need to ensure that these data science tools augment human decisions rather than replacing them. We need to remain cognizant of the role and value of human intuition and interactions as we continue to conduct research and advance in this expanding area.

Toyya Pujol, PhD

Assistant Professor, School of Industrial Engineering and Regenstrief Center for Healthcare Engineering

Member, Purdue Engineering Initiative in Data and Engineering Applications

College of Engineering, Purdue University




Pioneering groundbreaking technology, unlocking revolutionary ideas and advancing humankind across the country, planet and universe. Explore how leading educators, thinkers and innovators at the Purdue University College of Engineering are shaping the future — and beyond.

Recommended from Medium

A Comparative Analysis of New York Streets, in Terms of Bicycle Infrastructure

Proportion Estimation and its Discontents

3+1 Tools that can help you create a Literature review — Research Matrix of papers — Scholarcy…

Using Machine Learning and AI to understand Hip Hop songs evolution

Playing around with name generators and crystal language

Data Science & Linear Regression (Part I)

If your algorithm is not providing you right answer , probably the information learnt by it is not enough!

Engineering inventions and the pandemic’s impact on the global economy

Wilcoxon test in R: how to compare 2 groups under the non-normality assumption

Wilcoxon test in R: how to compare 2 groups under the non-normality assumption

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Purdue College of Engineering

Purdue College of Engineering

Known as the “Cradle of Astronauts,” with a long list of pioneers includes Neil Armstrong and Amelia Earhart. Ranked Top 10 nationwide by USNWR.

More from Medium

The Story of 5G

F1 was better in…

Open Data Week 2022: How to ignite the Third Wave of Open Data in New York City

Q&A with MindTrace Founders