How do researchers use big data — and how do they protect a patient’s data?

According to Ursula Rogers, ‘informatics analyst’ is a fancy title that boils down to one thing: she works with a lot of data.

“My job is to know where data come from, how to extract and export data, and how to bring the data and analyses together in a meaningful way.”

Originally a software developer with IBM, GE, and the UN, Ursula was inspired to shift her career focus towards healthcare data after the healthcare system’s inefficiencies affected someone very important to her: her daughter.

“When my daughter was born, she had a lot of trouble and had to stay at Duke Children’s ICU for three months. For the next four years, we had nurses in our home, along with ventilators and other equipment to keep our daughter alive while she went through a series of major airway surgeries.”

The family caught errors in medical record-keeping and was sometimes unable to share records from one provider to another. Although they felt the providers were engaged and competent, the family was fighting not only illness but also inefficiency. “When life-and-death decisions have to be made, you want your provider to have accurate and complete information.”

“My husband and I had so many frustrations with the system because of those experiences. We both worked in software already, and we became passionate about applying our knowledge to health care.”

Ursula’s husband transferred to Watson Health, an IBM initiative looking to revolutionize healthcare through artificial intelligence, and Ursula landed at Duke, first in the Duke Cancer Institute and now in the Clinical and Translational Science Institute.

At the Duke Cancer Institute, she built databases for small clinical trials of several hundred participants. Now, most of her time is spent working with electronic health record data for the Southeastern Diabetes Initiative (SEDI), a database of more than 350,000 people.

The database includes the ‘clinical story’ of every patient in Durham who has interacted with the Duke Health system over the past 10 years — a complete picture of their electronic health record (EHR). The story includes their diagnoses, encounters with the clinic or emergency room, vital signs for each visit, and any treatments or interventions they’ve undergone.

“The difference between small trial data and SEDI is that small trials only gather the data they need to answer specific questions. With SEDI, we’re gathering large amounts of longitudinal data, and the specific questions come later. Having so much data is very valuable to researchers not because we need all of it, but because the more we have, the more we can drill down into it and ask very specific questions, look at trends over time, and get answers that are accurate for large populations.”

Although the data are often used to study patients who have diabetes, they also help researchers address questions about many other issues, such as medication adherence. Ursula said, “I’ve worked on research projects that are asking, ‘When people are prescribed a medicine, do they take it? Does their economic or geographic status predict whether they’ll take it?’ Another researcher is looking at the relationship between diabetes and mental health, so it’s a versatile population health data resource.”

Although the dataset is huge and available to Duke investigators, it’s not something they can simply log into and browse. First, they must have a specific research project already designed that will make effective and safe use of patients’ data. Then, they are required to complete Duke’s Institutional Review Board (IRB) process to get approval to view and study the protected information.

“Everyone on the project has to be approved before they can touch the data: the investigators, the statisticians, the other informaticists. The process of developing a question, identifying data needs, and submitting to the IRB (sometimes more than once, with adjustments) can be a long but important one.”

Ursula identifies and prepares the data needed for each project and then extracts the appropriate data from the database. Statisticians then apply complex statistical methods or predictive models to understand what is currently happening in the data and, even more importantly, what might happen next.

Ursula’s doing exactly what she jumped into health care to do — making sure the data researchers are using to make decisions are accurate and complete: “My job is to dig into the nuances and make sure the data are clean and accurate. We’re handing this data off to researchers who are going to turn around and publish it for doctors treating real people each day. So it has to be right.”