Predictive Analytics in Global Health

Gleb Drobkov
GAMMA — Part of BCG X
Mar 30, 2021

By Anni Coden, Gleb Drobkov, Jeremy Ferlic, and Emily Serazin

Health systems in developing countries will encounter great strains as they roll out their COVID-19 vaccine programs. Gravely aware of these strains, the WHO recently warned of an imminent “catastrophic moral failure” if these systems fail to provide their citizens with equal access to the COVID-19 vaccine.[1] As data science practitioners, we can do something to help.

Based on our experience, we believe that healthcare organizations (HCOs) with sufficient data resources can work with these health systems to integrate personalized medicine and data-driven interventions into their existing workflows. By embedding predictive insights and recommendations such as risk scores into clinical workflows, these HCOs can empower rural community health workers to help reverse inequality in vaccine roll-out — as well as to provide improved health benefits across the board.

We recently had the opportunity to evaluate the use of data science methods to identify children in developing countries who, if they drop out of contact with community health programs, run a greater risk of missing COVID-19 vaccinations or of acquiring comorbidities such as pneumonia. Our research found that data captured through the mobile health workflows of community workers and clinics can be used to predict which children face the greatest risks.

In the process of identifying and analyzing the data sets for this project, we learned the importance of consistent and comprehensive data collection. Based on our observations after analyzing data from a number of healthcare organizations, we believe that the use of data mining will enable HCOs to create the same level of personalization of patient treatments as is commonly seen in retail applications. Such personalization, when applied to medical care, has the potential to increase the health and well-being of children around the world.

The Goal: Improve Health Outcomes for Children

The stakeholders in this project included healthcare organizations (HCOs) in Africa and Southeast Asia, which serve more than 600,000 children per year across their combined catchment areas. Our goal, given the existing limitations of the fields captured and the years of data available for each organization, was to use a research-based approach to derive as much insight as possible into how children’s health outcomes can be improved.

Our main focus was on the internal data available through the operational systems of these HCOs. We also incorporated externally available data produced by governmental statistical agencies to obtain the rate of urbanization by geographic area.

Project Context

The data was aggregated through a mobile health tool built by Dimagi, our U.S.-based mobile-health partner. The project was initiated by our foundation sponsor, the Bill and Melinda Gates Foundation. The work was also partially funded by BCG through a Social Impact Practice investment.

Data Mining versus Traditional Research

Typical approaches to understanding healthcare provider-beneficiary interactions involve commissioning time-consuming primary market and ethnographic research. This type of research uses interviews, focus groups, and surveys, and usually requires both a large team of survey administrators and a significant investment of time.

Our goal was to show that a data-mining approach using existing operational records can, in cases such as identifying at-risk children, provide insights more quickly and scale more effectively than a traditional research approach. The initial focus of our effort was immunization, and our models were designed to predict the likelihood that a child would “drop out” or lose contact with the health system before completing her or his full vaccine schedule.[2]

Our hypothesis was that the data captured through day-to-day operations could be used to generate models to inform a health worker’s recommendation and drive a reduction in the clinic’s dropout rate.

Developing a Risk Score-Based Recommendation Tool

To build these models, we used data collected from existing workflows and augmented it with information from national statistical agencies. In doing so, we developed a risk score that can be calculated in real-time and embedded in the tablet-based application used by clinical workers when triaging new patients.
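As a rough illustration of the augmentation step, the sketch below joins visit records to an area-level urbanization rate. The field names (`area`, `days_late`), the lookup table, and every value in it are hypothetical; the article does not describe the actual schema or how missing areas were handled, so the median fallback is only one possible choice.

```python
from statistics import median

# Hypothetical area-level urbanization rates (share of population in urban areas),
# standing in for figures published by a national statistical agency.
URBANIZATION_BY_AREA = {"district_a": 0.72, "district_b": 0.18, "district_c": 0.41}


def augment_records(records, urbanization_by_area):
    """Return copies of the records with an `urbanization_rate` field added.

    Areas missing from the lookup fall back to the median known rate, and the
    fallback is flagged so that downstream analysis can treat it separately.
    """
    fallback = median(urbanization_by_area.values())
    augmented = []
    for record in records:
        rate = urbanization_by_area.get(record["area"])
        augmented.append({
            **record,
            "urbanization_rate": fallback if rate is None else rate,
            "urbanization_imputed": rate is None,
        })
    return augmented


visits = [
    {"child_id": 1, "area": "district_a", "days_late": 0},
    {"child_id": 2, "area": "district_z", "days_late": 21},  # area not in lookup
]
augmented_visits = augment_records(visits, URBANIZATION_BY_AREA)
for row in augmented_visits:
    print(row)
```

Flagging imputed values keeps the external data from silently masking gaps in geographic coverage.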

This approach mimics personalization systems data scientists often build in retail or healthcare settings. The main difference is that this model is trained to predict a child’s dropout risk instead of a likelihood that a customer will, for example, purchase a product. With the right automation and data systems, an HCO team could use insights from this model to run a pilot program capable of launching interventions for children flagged as “high-risk,” and to measure the impact of these interventions on a target population as compared to a control group.

During a child’s standard clinical visit, community workers and clinics collect a number of useful data points, including whether the child is underweight or malnourished for their age group, whether they are behind on their vaccine schedule, and how far from the clinic the child lives. Not all children’s health risk factors can be easily spotted by an untrained worker. But a machine-learning algorithm can process the combination of factors and spot historical patterns of risk, which can create a clearer picture of a child’s health journey than a worker can create on their own.

Once the model is trained, it can produce a score for each child reflecting the estimated probability that he or she will drop out after the next visit. By analyzing dozens of features and comparing them to historical examples of children in similar positions, the model can accurately flag the at-risk children for further motivational interventions, such as giving them a gift when they return for their next scheduled visit.
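To make the scoring idea concrete, here is a minimal sketch of a dropout-risk model. The article does not name the model family used in the project; logistic regression is assumed here only because it is simple to show end to end. The three features, the synthetic training data, and the 0.5 threshold are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_risk(weights, bias, features):
    """Estimated probability of dropout for one child."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic(rows, labels, epochs=500, lr=0.1):
    """Fit weights by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = predict_risk(weights, bias, x) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


# Synthetic data: [is_underweight, weeks_late, distance_km / 10].
# Labels are simulated so that later and more distant children drop out more often.
rng = random.Random(0)
rows, labels = [], []
for _ in range(400):
    x = [float(rng.random() < 0.3), rng.uniform(0, 4), rng.uniform(0, 2)]
    p = sigmoid(-3.0 + 1.0 * x[0] + 0.8 * x[1] + 1.0 * x[2])
    rows.append(x)
    labels.append(1 if rng.random() < p else 0)

weights, bias = train_logistic(rows, labels)

RISK_THRESHOLD = 0.5  # hypothetical cut-off; a real program would tune this
low = predict_risk(weights, bias, [0.0, 0.0, 0.1])
high = predict_risk(weights, bias, [1.0, 4.0, 2.0])
print(f"on-time child near clinic: {low:.2f}")
print(f"late, underweight, distant child: {high:.2f}, flagged={high >= RISK_THRESHOLD}")
```

Because `predict_risk` is a single weighted sum passed through a sigmoid, a score of this kind is cheap enough to compute on a tablet at the point of care, which is the property the real-time use case depends on.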

Creating Personalized Healthcare Interventions

Embedding personalized insights like these into an existing clinical mobile application can drive significant improvements in child health outcomes. It can also enable resource-strapped HCOs to optimize how they allocate staff and equipment to the areas of highest need. As a next step, we would like to see the predictive model A/B tested in a real-life setting to confirm how large an impact these insights might have.
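One simple way such an A/B test could be read out is a two-proportion z-test on dropout rates in the control and intervention arms. The sketch below assumes children were randomly assigned to the two arms and that both arms contain at least some dropouts and some completions; the counts are made up for illustration and are not results from this project.

```python
import math


def two_proportion_z_test(dropouts_a, n_a, dropouts_b, n_b):
    """Two-sided z-test for a difference between two dropout rates.

    Returns (rate_a - rate_b, z statistic, p-value), using the pooled
    standard error under the null hypothesis of equal rates.
    """
    rate_a, rate_b = dropouts_a / n_a, dropouts_b / n_b
    pooled = (dropouts_a + dropouts_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a - rate_b, z, p_value


# Made-up counts for illustration only: control arm (a) vs. intervention arm (b).
diff, z, p_value = two_proportion_z_test(dropouts_a=120, n_a=500, dropouts_b=90, n_b=500)
print(f"difference in dropout rate: {diff:.3f}, z = {z:.2f}, p = {p_value:.4f}")
```

A positive difference here means the control arm dropped out more often than the intervention arm. The size of that difference, not only the p-value, is what would tell an HCO whether the intervention is worth its cost.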

Lessons for Going Forward

Our analyses over the course of this project showed that by embedding predictive recommendations into their workflows, HCOs can drive valuable improvements in health outcomes.

To that end, we would like to share three lessons with data science practitioners embarking on a similar path to improve health outcomes:

1. Secure consent early: First, make sure to acquire the consent of patients or their caregivers to allow the data to be used in this way. And do not assume that you have access to the data until you have secured written approval and internal support to receive the proper credentials.

2. Start small and build up fast: Even if a data set has a small number of records or features, a data scientist can still use it to help identify high-level trends and their drivers.

→ Identify a clear metric (in our case, dropout) and generate analyses to investigate correlations in the data and map the most influential factors affecting this metric.

→ If the data asset is large and contains enough historical breadth to split into training and testing sets, then the data scientist can go further and build a predictive model to generate a personalized score for each patient.

3. Do not forget to distinguish between predictability and causality: Many of the patterns we observe in data may be confounded by unseen factors. Since this type of analysis has the potential to impact the lives of thousands of children, it is very important to confirm with the frontline health workers providing care that the suggested insights are, in fact, actionable.
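The two steps under the second lesson can be sketched as follows: rank features by their correlation with the chosen metric, then hold out a test set before building any model. The records and field names are invented for illustration, Pearson correlation is just one possible measure of association, and the random split is an assumption; with long historical data, splitting by date may be a closer match to how the model would be used.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(records, feature_names, target):
    """Sort features by absolute correlation with the target metric."""
    ys = [r[target] for r in records]
    scores = {name: pearson([r[name] for r in records], ys) for name in feature_names}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and split off a held-out test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Tiny made-up data set; `dropped_out` is the metric of interest.
records = [
    {"weeks_late": 0, "distance_km": 2, "dropped_out": 0},
    {"weeks_late": 1, "distance_km": 9, "dropped_out": 0},
    {"weeks_late": 3, "distance_km": 4, "dropped_out": 1},
    {"weeks_late": 4, "distance_km": 8, "dropped_out": 1},
    {"weeks_late": 0, "distance_km": 7, "dropped_out": 0},
    {"weeks_late": 5, "distance_km": 3, "dropped_out": 1},
    {"weeks_late": 2, "distance_km": 6, "dropped_out": 0},
    {"weeks_late": 4, "distance_km": 5, "dropped_out": 1},
    {"weeks_late": 1, "distance_km": 1, "dropped_out": 0},
    {"weeks_late": 3, "distance_km": 10, "dropped_out": 1},
]

ranking = rank_features(records, ["weeks_late", "distance_km"], "dropped_out")
train, test = train_test_split(records, test_fraction=0.2)
print(ranking)
print(len(train), "training records,", len(test), "test records")
```

In line with the third lesson, a high-ranking feature in this kind of analysis is a prompt for a conversation with frontline workers, not evidence that changing the feature would change the outcome.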

With Opportunities Come Challenges

Overall, this project opened our eyes to the need for data scientists to use their skills to help create more positive social impact. The hard work involved in developing predictive models does not feel tiring when the outcome has the potential to drive real improvements in the lives of children throughout the developing world.

At the same time, data scientists must keep in mind that global health work is complicated. For many of these organizations, other ground-level priorities may take precedence over data-based insights, and building data analytics capabilities may not be at the top of their list.

Nonetheless, light-touch and actionable insights from operational data can provide HCOs with tangible benefits. A data-savvy and digitally enabled workforce can generate sustained improvements in clinical workflows and resource allocation. And with an investment in analytics, a few surgical changes to existing procedures can drive real benefits for child health outcomes.

We look forward to working more on BCG Global Health projects, and we welcome a discussion of this analysis on Medium, LinkedIn, or via email.

Acknowledgements and Thanks

This project would not have been possible without the hard work of our team: Jonathan Lim, Alice Chou, and Pranjal Bajaj.

Nor would it have been possible without the generous support and engagement of Dimagi and the Bill and Melinda Gates Foundation: WenFeng Gong, Suhail Agha, Mike O’Donnell, and Neal Lesh.

Our work for these healthcare organizations is ongoing, so we respect and preserve their anonymity in this article. But we are committed to donating annually to support these HCOs, and we encourage you to support these community health initiatives as well.

Footnotes:

[1] “WHO chief warns against ‘catastrophic moral failure’ in COVID-19 vaccine access,” UN News, January 18, 2021.

[2] Each health network followed the vaccine schedule recommended by its in-country ministry of health, in accordance with the schedules issued by the WHO.
