Why Big Data is a Big Deal for Health
Q&A with Atul Butte, MD, PhD
As told to Patricia Meagher
Butte wears many hats at UCSF: professor of pediatrics, director of the UCSF Bakar Computational Health Sciences Institute, and the Priscilla Chan and Mark Zuckerberg Distinguished Professor. He is also the chief data scientist for UC Health, the infrastructure uniting all six UC medical centers: UCSF, UC Davis, UC Irvine, UCLA, UC Riverside, and UC San Diego. Butte and his team are harnessing the collective power of UC’s systemwide biomedical data — which he sees as a first step in building a massive global database that will someday enable precise, targeted, accountable care in California and around the world.
What kind of data are we talking about?
Some data is easy to get at — like electronic medical records [EMRs], lab results, admissions notes. Some is more scattered — like DNA samples, gene expression data, clinical trial records. We’ve invested millions of dollars in EMR systems and cool gizmos like wearable medical devices to generate more and more data. Much of it is accessible with the right governance and permissions, and it’s more than enough to make a difference. But we’re really not doing much with it at this point. Data by itself does nothing. We have to turn it into knowledge to effect changes in policy and behavior. All this data is just sitting there, waiting for us to ask the right questions.
How do we make sense of it?
We need to train people to ask, “What can I do with it?” Biomedical big data is, by definition, big, raw, and messy. The more we have, the more amazing it is. But the hard part is figuring out what to do with it. The solution is to educate — and inspire — more data scientists, people trained in biomedical and computer sciences and statistics. Companies offer high salaries to snap up these folks, so it takes dedication for them to stay in academia. We might need to start training and recruiting even earlier, in high school.
What kinds of problems can you solve with big data?
Say you’re researching a treatment for liver cancer. You could start with millions of chemicals and petri dishes full of cells and eventually get a drug into a clinical trial, which costs a billion dollars and takes 15 years. But start with the data instead, and a dedicated researcher can launch a data-driven experiment for just $50,000. In fact, Institute for Computational Health Sciences (ICHS) researcher Bin Chen did exactly that for hepatocellular carcinoma. [See the image above.] It’s known as drug repositioning: We take data on tens of thousands of drugs, some already approved for human use, and match them with gene expression data on a given disease, looking for drugs made for another purpose that can affect this disease. It’s like Match.com for drugs. Data can also lead us to other solutions, like designing a more specific blood test, or eliminating unnecessary blood transfusions, or creating maps of disease and death that show us how a disease will behave over time.
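For readers curious what that kind of matching might look like in practice, here is a minimal sketch in Python. It is not the method Bin Chen or the ICHS actually used; it assumes one common way of framing the problem, in which a drug is a promising candidate if its effect on gene expression runs opposite to the disease's. The gene names, drug names, and numbers are all made up for illustration, and the scoring rule (a plain Pearson correlation) is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_drugs(disease_signature, drug_signatures):
    """Rank drugs from most to least 'reversing' of the disease signature.

    Assumption: a strongly negative correlation means the drug pushes
    gene expression in the opposite direction from the disease.
    """
    genes = sorted(disease_signature)
    disease = [disease_signature[g] for g in genes]
    scores = {}
    for drug, signature in drug_signatures.items():
        drug_vector = [signature[g] for g in genes]
        scores[drug] = pearson(disease, drug_vector)
    # Most negative correlation first.
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical data: expression changes (disease vs. healthy tissue).
disease_signature = {"GENE_A": 2.0, "GENE_B": 1.5, "GENE_C": -1.0, "GENE_D": -2.0}

# Hypothetical data: expression changes (treated vs. untreated cells).
drug_signatures = {
    "drug_x": {"GENE_A": -1.8, "GENE_B": -1.2, "GENE_C": 0.9, "GENE_D": 2.1},
    "drug_y": {"GENE_A": 1.9, "GENE_B": 1.4, "GENE_C": -1.1, "GENE_D": -1.8},
    "drug_z": {"GENE_A": 0.5, "GENE_B": -0.5, "GENE_C": 0.5, "GENE_D": -0.5},
}

ranked = rank_drugs(disease_signature, drug_signatures)
for drug, score in ranked:
    print(f"{drug}: {score:+.2f}")
```

In this toy example the made-up drug_x lands at the top of the list because its pattern is the mirror image of the disease's, while drug_y, which mimics the disease, lands at the bottom. A real analysis would involve thousands of genes, careful statistics, and lab validation before anyone called a drug a match.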
Where is all this leading us?
If we want to change the world, we need to do something with our data and discoveries; we can’t just keep writing papers. For example, the ICHS is working with all six UC medical centers to aggregate 15 million patient records — strictly regulated to safeguard privacy — into one safe, secure, reliable repository. There’s no other set of academic medical systems in the U.S. with as much patient data and as much computing power to analyze it as UC has. This is where we need to start if we are going to get all our data in one place. Not just UCSF’s data, not just UC’s data, but everyone’s data — what care we provided, what worked, what didn’t work. We can then predict what will happen with any given patient or any given disease in any environment over the next 90 days or year or 10 years. We’ve got to get there so we can provide truly customized, precise, and accountable medical care for everyone.