Healthcare’s Big Data Problem

Why the “Big Data Revolution” hasn’t reached healthcare — and what we can do about it

Tyler Beauchamp
Oct 11, 2015

“Big data” has become an incredibly hot topic in nearly every industry. Its definition varies widely depending on who you ask, but it generally refers to the valuable insights that can be gained from analyzing vast amounts of information. The term gained popularity in the mid-1990s, when advancements in computer storage and processing power made it possible to analyze large, detailed data sets in a matter of minutes rather than days. During this time, technology companies began integrating big data into their business models. For example, shortly after Amazon.com’s founding in 1994, the online retail giant began using vast amounts of data collected from its consumers’ purchasing behavior to intelligently predict product demand on a grand scale. Big data now allows Amazon to offer personalized product recommendations to consumers, and it helps the company sell up to 27 million items per day (more than 300 items per second). Amazon is so confident in its algorithms’ predictive power that it has even patented a concept called “anticipatory shipping.” To reduce shipping times, the company predicts which product a consumer is likely to buy and ships it toward a distribution center near their home, days before the “add to cart” button is ever clicked.

Amazon’s patent filing for “Anticipatory Shipping”

It may seem strange that big data has been used with such success by an online retailer for over two decades, but has been largely ignored by American healthcare, an industry that has relied heavily on data-driven innovations for centuries. But many health industry leaders think that the time may be ripe for big data to start transforming healthcare. Philip Bourne, Associate Director for Data Science at the National Institutes of Health, believes that big data is “going to become the dominant driver in what happens in healthcare.” Recent attention-grabbing headlines on the topic include “How Big Data is Changing Healthcare Forever” and “Can Big Data Cure Cancer?”

The prospect of a ‘big data revolution’ in healthcare is certainly exciting, but I think that many have simply gotten caught up in the hype; blanket statements about how big data could soon lead to a utopian healthcare system may be overly optimistic. Healthcare industry leaders rarely mention the structural challenges in healthcare that must be solved before such a revolution would be possible. While big data has already shown incredible potential, the American healthcare system is not yet ready to fully benefit from it.

There are many obstacles preventing the widespread utilization of big data technology in healthcare, but two of the main problems involve how health data is collected. In order to take advantage of big data, data must meet two very basic criteria. First, it must be well-organized and recorded consistently; second, it has to be accessible for analysis and interpretation. While these may sound like very simple requirements, health data doesn’t always meet them.

Vital signs like body temperature, respiration, blood pressure, and heart rate are some of the pieces of data normally recorded during every hospital visit, but these metrics are not good indicators of overall health on their own. Some hospitals or physicians may consistently collect additional health data, but there are many differences between hospitals and between physicians in terms of what data is deemed important to collect, and how that data is collected. This can make it incredibly difficult to extract meaning from health data on a large scale.
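To make the consistency problem concrete, here is a minimal sketch in Python of what it takes to combine vital signs from just two sources. Everything in it is an assumption made for illustration: the two hospitals, their field names, and the shared `Vitals` format are hypothetical, not taken from any real record system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vitals:
    """A shared format that both hypothetical hospitals are mapped into."""
    temperature_c: float
    heart_rate_bpm: int
    systolic_mmhg: Optional[int] = None
    diastolic_mmhg: Optional[int] = None


def from_hospital_a(record: dict) -> Vitals:
    # Hypothetical Hospital A: Fahrenheit, blood pressure stored as "120/80".
    systolic, diastolic = (int(part) for part in record["bp"].split("/"))
    return Vitals(
        temperature_c=round((record["temp_f"] - 32) * 5 / 9, 1),
        heart_rate_bpm=record["pulse"],
        systolic_mmhg=systolic,
        diastolic_mmhg=diastolic,
    )


def from_hospital_b(record: dict) -> Vitals:
    # Hypothetical Hospital B: Celsius, and blood pressure is not recorded.
    return Vitals(
        temperature_c=record["temperature_celsius"],
        heart_rate_bpm=record["hr"],
    )


combined = [
    from_hospital_a({"temp_f": 98.6, "pulse": 72, "bp": "120/80"}),
    from_hospital_b({"temperature_celsius": 37.2, "hr": 80}),
]
```

Even in this toy case, each source needs its own hand-written translation, and the missing blood pressure reading from the second hospital cannot be recovered after the fact. Multiply that by thousands of hospitals and the scale of the problem becomes clearer.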

Proper recording of health data is only half the battle; in order to draw insights from the data, it must be extracted, processed, and analyzed. Health data is incredibly diverse; it includes simple numeric readings like vital signs that can be stored, accessed, and analyzed without much trouble, but it can also include unstructured forms of data that are not as easily analyzed, like photographs, MRI scans, physician notes, or (more recently) genetic information. In order to manage all of this data, hospitals in the United States each use their own combination of electronic medical record systems and practice management software packages. These varied data management systems can make it difficult for physicians, hospitals, or public health organizations to make use of the health data they contain.

Fortunately, new technologies are helping to solve some of the challenges associated with big data in healthcare. Electronic health records (EHRs) have already helped hospitals achieve more thorough data recording, and health information exchanges (HIEs) allow patients, hospitals, providers, and insurance companies to securely share electronic health records with one another. Together, these two technologies make up a solid foundation upon which big data technologies can be built, but the high cost of implementing them has limited their growth. Federal incentive programs, established under the HITECH Act of 2009, reward providers who adopt EHRs, which may help offset those costs.

But even if EHRs and HIEs were used by every hospital in America, big data could still not be leveraged to its full potential. Stringent government regulations outlined by HIPAA greatly limit access to health data. There are many private data analytics companies that have sought access to American health data, and nearly all have been denied. But in 2013, after extensive negotiations with the U.S. Department of Health and Human Services, Mayo Clinic was granted permission to share the detailed (yet anonymized) clinical health data of approximately five million patients with Optum Labs, a technology company that uses population analytics algorithms to deliver useful medical and logistics advice to hospitals. This partnership has resulted in the largest accessible source of detailed patient health data in the country.

Optum Labs applies its population analytics algorithms to anonymized patient data in order to gain valuable medical insights. Source: https://www.optum.com/optumlabs
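For readers wondering what “anonymized” can mean in practice, here is a rough sketch in Python. It is not a description of how Mayo Clinic or Optum Labs prepare their data, and it would not satisfy HIPAA on its own; real de-identification involves considerably more. The field names are hypothetical. The idea is simply to drop fields that identify a person directly and to coarsen fields that could identify them indirectly.

```python
# Hypothetical field names that point directly at one person.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email", "address"}


def deidentify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed
    and indirect ones (birth date, ZIP code) made less specific."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "birth_date" in cleaned:
        # Keep only the year, assuming dates look like "1980-05-17".
        cleaned["birth_year"] = cleaned.pop("birth_date")[:4]
    if "zip" in cleaned:
        # Keep only the first three digits of the ZIP code.
        cleaned["zip3"] = cleaned.pop("zip")[:3]
    return cleaned


patient = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "birth_date": "1980-05-17",
    "zip": "55905",
    "systolic_mmhg": 142,
}
shareable = deidentify(patient)
```

The clinical reading survives while the name and Social Security number do not. The tension is that the more detail is stripped away for privacy, the less useful the data becomes for analysis.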

While improbable in the near future, assume for a moment that all of the legal and logistical challenges associated with the use of big data in healthcare are solved, and that anonymized, detailed clinical health data from hundreds of millions of patients is now available to trusted companies and organizations. What useful insights could this amount of data provide?

Public health organizations could utilize large amounts of data to solve common health issues. Some of the biggest public health problems in the United States, like high blood pressure and diabetes, are difficult to treat, but are relatively easy to prevent through public health initiatives. By analyzing the past health records of millions of Americans who suffer from these health problems, public health groups could identify early warning signs and inform physicians what to look out for, so that they can take appropriate actions. These at-risk populations could then be tracked after interventions are taken to see which ones are the most effective.

Insights from big data could be incredibly useful for assessing the performance of hospitals and physicians. Since every patient’s health history includes information about what entities were involved in care, it would not be difficult for regulatory agencies to assess the quality of specific hospitals, or for hospitals to assess the performance of specific physicians. Insight into how hospitals or physicians are performing could help identify areas where care needs to be improved.

Big data could also transform America’s multi-billion dollar pharmaceutical industry by facilitating large-scale, low-cost studies that give valuable feedback on drug development. Currently, the safety and efficacy of prescription drugs are assessed in the third phase of the FDA’s clinical drug approval process. To identify how the drug affects the health of the study’s participants, researchers record detailed health data from participants throughout this phase and often long after. But since these trials involve a relatively small number of participants (usually only a few thousand), some side effects and potential drug interactions will almost certainly be missed. Of course, drugs are also closely monitored after they go to market, but rare side effects can cause a great deal of harm before they are detected. For example, after nearly 20 years and millions of filled prescriptions, Viagra is now being blamed for causing temporary blindness in a small number of patients. If detailed health data from every prescribed patient were collected and analyzed, harmful effects like this could be identified by big data algorithms before the drug does too much harm. Drugs can also have unintended positive side effects. Although Viagra was originally developed as a cardiovascular drug, its unintended side effect made it one of the most prescribed drugs in the world. Perhaps big data analysis could help reveal less apparent positive side effects of other medications in the future.
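A quick calculation shows why trial size matters so much for rare side effects. The sketch below, in Python, assumes a made-up side effect that strikes 1 in 10,000 patients and treats each patient as independent; both are simplifying assumptions for illustration, not figures from any real drug.

```python
def prob_at_least_one_case(rate: float, participants: int) -> float:
    """Chance that a side effect with the given per-patient rate
    appears at least once among the participants."""
    return 1 - (1 - rate) ** participants


assumed_rate = 1 / 10_000  # hypothetical: affects 1 in 10,000 patients

trial = prob_at_least_one_case(assumed_rate, 3_000)          # a typical-sized trial
population = prob_at_least_one_case(assumed_rate, 1_000_000)  # records at scale

print(f"3,000 participants: {trial:.0%}")
print(f"1,000,000 patients: {population:.0%}")
```

Under these assumptions, a 3,000-person trial has only about a one-in-four chance of seeing even a single case, and one case is rarely enough to tie a symptom to a drug. Across a million patient records, the same side effect would be expected to appear around a hundred times.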

But as promising as the benefits of big data sound, the insights it provides should be approached with caution. While much can be learned from analyzing the health of populations, big data insights should not become an excuse for a one-size-fits-all approach to medicine. Failing to take into account the uniqueness of individual patients can result in treatments that are unnecessary, ineffective, or even harmful. There will certainly be instances when big data analysis leads to incorrect advice, and instances when this advice is blindly followed. Healthcare providers and companies should weigh their own expertise and intuition before putting too much trust in software-generated suggestions.

Electronic health records and health information exchanges will play important roles in increasing the body of data that can be used for big data analysis in healthcare, and recent partnerships between providers and data analytics companies are already showing the new insights that big data analysis could provide. But current health data restrictions are limiting the number and depth of these insights. In order for big data to truly “revolutionize” healthcare, large amounts of anonymized health data will need to be made more accessible to companies or organizations that have the power to analyze it. This could be done through more negotiations between providers and private companies (like between Mayo Clinic and Optum Labs), but these negotiations would require more government approval, which could take years to receive.

A better option would be to relax HIPAA’s stringent health data regulations without weakening necessary privacy protections. This will certainly be challenging, but surely the law can change to support both progress and privacy in healthcare. Admittedly, giving more organizations access to private health data could lead to data breaches, but I believe that big data’s benefits are worth this risk. Big data has already shown the potential to deliver incredibly valuable insights that could drive research, save lives, and improve care quality for millions of Americans; it is time for restrictive legislation to change so that big data can be used to its full potential in healthcare.

Thanks for reading! If you’d like to see what else I’m up to, get in touch or check out my website.

Tyler Beauchamp

UX & visual designer. I like to write about design, science, technology, and politics.