Using econometrics to close the medical research gap

Dan Zou
Published in tech-protenus
4 min read · Nov 1, 2017

In healthcare, the possibilities presented by the proliferation of electronic data are endless: faster and more accurate disease recognition, improved treatment, and lower costs.

Markets see the potential, too. Investors are putting record-breaking dollars into digital health companies aiming to gain clinical insights from electronic health record (EHR) data, in applications ranging from pharmaceutical research to cancer diagnosis.

The insights to be gained from this new trove of data, however, have yet to be unlocked. A lack of deep, large-scale data, which until recently was a defining feature of the healthcare industry, is no longer the challenge. The challenge now is drawing causal relationships from the reams of unstructured data contained inside EHRs.

Enter econometrics. It’s a powerhouse subfield of statistics that looks at a data set and asks, “What caused an outcome to change?” At its core, econometrics is about data richness: using as much data as you have at your disposal, in order to be as precise as possible about the nature of a relationship between data points.

And when econometrics is used together with machine learning, which asks, “What will an outcome be?” and is dedicated to accurate prediction, the potential for dramatic improvement in healthcare is boundless.

A practical example is clinical research. Currently, clinical research largely relies on two kinds of data: small-scale clinical trials and large-scale correlations drawn from very shallow population-wide information.

Because the data sets in these cases are limited, the conclusions drawn from these studies are typically about average effects, i.e., whether a new treatment was on average helpful or harmful when the data was collected. Additional demographic information is sometimes available and can narrow the finding further, showing, for example, that a treatment is more effective for a certain age group.
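To make the difference between an average effect and a subgroup effect concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data is synthetic, the two age groups and the effect sizes (2.0 and 0.5) are invented, and `mean_difference` is a hypothetical helper, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 10_000

# Hypothetical synthetic trial: none of these numbers come from real data.
older = rng.integers(0, 2, size=n)            # 1 = patient is in the older age group
treated = rng.integers(0, 2, size=n)          # randomized treatment assignment
true_effect = np.where(older == 1, 2.0, 0.5)  # assumed effect size for each group
outcome = 1.0 + true_effect * treated + rng.normal(0.0, 1.0, size=n)


def mean_difference(y, t):
    """Treated-minus-control difference in mean outcome."""
    return y[t == 1].mean() - y[t == 0].mean()


# One number for everyone, then one number per age group.
average_effect = mean_difference(outcome, treated)
effect_older = mean_difference(outcome[older == 1], treated[older == 1])
effect_younger = mean_difference(outcome[older == 0], treated[older == 0])

print(f"average effect:       {average_effect:.2f}")
print(f"effect, older group:  {effect_older:.2f}")
print(f"effect, younger group: {effect_younger:.2f}")
```

In this made-up trial the average effect lands between the two group effects, so it overstates the benefit for younger patients and understates it for older ones. With only a handful of demographic fields, this is roughly as far as the analysis can go.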

But for an individual with his or her own medical history who is trying to figure out whether a new treatment is a good idea, the information available is thin. In some cases, the science is overwhelmingly convincing; in many cases, it’s not.

What if small-scale clinical studies and large-scale population-wide data sets could be supplemented with a third type of data: data on conditions, outcomes, and related causes, found in electronic health records, with insights brought to the surface using econometric analysis?

In other industries, which are decades ahead in data collection and analysis, econometrics has dramatically changed the research infrastructure. In marketing research, the widespread availability of data has shifted the industry away from focus groups. Target now targets pregnant women with baby-related ads based on information inferred from a couple of purchases common to early pregnancy. MBA students at the University of Chicago and other programs are moving beyond more qualitative marketing courses and instead learning about panel data, regression techniques, and experimental design from marketing PhDs like me.

In actuality, econometrics is already at the heart of clinical research: figuring out what caused an outcome to change is at the core of any treatment study. Does a new cancer treatment actually cause higher remission rates? Will a new drug to control heart problems really reduce the incidence of heart attacks?
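A small simulation shows why the causal question is harder than it looks once the data is observational rather than randomized. The sketch below is an illustration under stated assumptions, not a recipe for real EHR data: the data is synthetic, `severity` is an invented confounder that we assume is fully observed, `TRUE_EFFECT` is a made-up number, and ordinary least squares stands in for the much larger econometric toolbox.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible
n = 20_000

# Hypothetical observational data: sicker patients are more likely to be treated.
severity = rng.normal(0.0, 1.0, size=n)           # assumed, fully observed confounder
p_treat = 1.0 / (1.0 + np.exp(-1.5 * severity))   # treatment probability rises with severity
treated = (rng.random(n) < p_treat).astype(float)

TRUE_EFFECT = 1.0  # assumed: treatment improves the outcome by one unit
outcome = TRUE_EFFECT * treated - 2.0 * severity + rng.normal(0.0, 1.0, size=n)

# Naive comparison: treated vs. untreated, ignoring why patients were treated.
naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Regression adjustment: ordinary least squares with severity as a control.
X = np.column_stack([np.ones(n), treated, severity])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
adjusted = coef[1]  # coefficient on the treatment indicator

print(f"naive difference:  {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

Here the naive comparison makes a helpful treatment look harmful, because the treated patients were sicker to begin with. Controlling for severity recovers an estimate close to the assumed effect, but only because the confounder was measured and the model matches how the data was generated. Real records rarely make it that easy.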

Imagine using all the data available in electronic health records to find new insights as deep as those we discover from small-scale clinical research. Econometrics excels at helping us draw conclusions from rich, poorly structured data by using all the tools in the toolbox.

The technology required to gain actionable insights from this new trove of EHR data is gaining in maturity as well. At Protenus, for example, we use artificial intelligence to understand EHR data in all of its variable forms, integrating hundreds of data sources from inside a healthcare organization.

We use those insights to help hospitals, insurers, and health information exchanges protect patient privacy, prevent drug diversion, and generally ensure compliance and integrity across the entire healthcare enterprise. The same is possible in clinical research. Data integration is the first step toward finding causal relationships between factors, with current clinical researchers lending their expertise to use the data and pave the way.

Because econometrics focuses on teasing apart causation from mere correlation, and is able to work with data of varying depth and breadth, it deserves to be among the standard tools of choice in the next generation of clinical research: research that includes the context provided by the EHR data currently proliferating across healthcare organizations. It’s just one example of the endless possibilities that data provides.
