Using Ensemble Deep Learning to Deliver Aid Better in Post-Flood Pakistan

Published in The Center for Effective Global Action (CEGA), May 15, 2023

Felix Agyemang (Lecturer in Data Science, University of Manchester), Sean Fox (Associate Professor in Global Development, University of Bristol), Rashid Memon (Assistant Professor, Qatar University), and Levi Wolf (Senior Lecturer in Quantitative Geography, University of Bristol) developed a new machine learning algorithm that uses satellite imagery to identify areas likely to contain poor households, helping the Sindh Social Protection Strategy Unit target aid. This study was supported by CEGA’s Targeting Aid Better initiative.

A woman cooks roti outside her tent | Abdul Majeed, EU Civil Protection and Humanitarian Aid

In the summer of 2020, as the COVID pandemic was crushing economies around the world, devastating floods in Pakistan led to hundreds of deaths, left tens of thousands displaced, and damaged hundreds of thousands of homes. In the wake of these disasters, the recently formed Sindh Social Protection Strategy Unit (SPSU) was looking for ways to improve targeting of social protection resources to reach its most vulnerable citizens, particularly in rural areas.

Half of Sindh’s population lives in rural areas, and 37 percent of rural residents live below the national poverty line. Yet a key obstacle to reaching them was a lack of information about where exactly the poorest households are located. This is not uncommon in low- and middle-income countries: half of all countries lack sufficient data to produce accurate poverty maps. The challenge was especially urgent in Sindh, where the poverty rate in some of the most flood-prone areas is as high as 53 percent.

To support the SPSU’s response, our team was tasked with developing an approach to rural poverty mapping that could be done quickly and cheaply — without the need for time-consuming and expensive household surveys. Building on recent research, we applied deep learning techniques to satellite imagery and other geospatial data to predict where the poorest households were likely to be found.

This approach requires high-quality training data. Fortunately, our partnership with the SPSU gave us access to geo-referenced data from a poverty survey covering nearly two million anonymized households across 14 of Sindh’s 24 districts. We then mapped these data to grid cells of 1 km².
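To make the gridding step concrete, here is a minimal sketch of assigning households to 1 km cells and summarising each cell by its median household. It assumes (this is not stated in the article) that household coordinates have already been projected to a metric coordinate system such as UTM, and that each household carries a single numeric poverty score; the function names and the tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import median

CELL_SIZE_M = 1_000  # side length of a grid cell in metres (1 km)


def cell_id(x_m: float, y_m: float, size: float = CELL_SIZE_M) -> tuple[int, int]:
    """Return the (column, row) index of the grid cell containing a projected point."""
    # Floor division keeps cells aligned even for negative coordinates.
    return int(x_m // size), int(y_m // size)


def median_score_by_cell(households):
    """Group (x_m, y_m, score) records by grid cell and return {cell: median score}.

    Hypothetical input format: coordinates in metres, one numeric score per household.
    """
    scores = defaultdict(list)
    for x_m, y_m, score in households:
        scores[cell_id(x_m, y_m)].append(score)
    return {cell: median(vals) for cell, vals in scores.items()}


households = [
    (1200.0, 3400.0, 10.0),
    (1800.0, 3900.0, 30.0),
    (1500.0, 3100.0, 20.0),
    (2500.0, 3400.0, 40.0),
]
print(median_score_by_cell(households))  # {(1, 3): 20.0, (2, 3): 40.0}
```

Summarising by the median rather than the mean keeps a handful of unusually rich or unusually poor households from dragging a cell's value up or down.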

Next, we trained three convolutional neural networks to predict whether the median household in every inhabited grid cell was likely to be chronically poor. The training used the survey data, a locally defined poverty threshold, and three inputs containing important information on local economic geography: daytime satellite imagery from the Copernicus Open Access Hub, nighttime satellite imagery from NASA, and a global map of accessibility. We combined predictions from the three individual neural networks to make a final, ensemble consensus prediction of poverty.
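The article does not say exactly how the three networks' outputs were combined. As a hedged illustration, the sketch below shows two common consensus rules, soft voting (averaging predicted probabilities) and hard majority voting, alongside a binary training label defined by a poverty threshold. The threshold direction (lower score means poorer), the 0.5 cutoff, and all names are assumptions for illustration only.

```python
from statistics import mean


def is_poor(median_score: float, threshold: float) -> bool:
    """Training label for a cell: True if its median household is below the threshold.

    Assumes (hypothetically) that lower scores indicate poorer households.
    """
    return median_score < threshold


def soft_vote(probs: list[float], cutoff: float = 0.5) -> bool:
    """Consensus by averaging each model's probability that the cell is poor."""
    return mean(probs) >= cutoff


def majority_vote(votes: list[bool]) -> bool:
    """Consensus by hard voting: poor if more than half of the models say so."""
    return sum(votes) > len(votes) / 2


# Three hypothetical model outputs for one cell.
probs = [0.9, 0.4, 0.3]
print(soft_vote(probs))                          # True  (mean is about 0.53)
print(majority_vote([p >= 0.5 for p in probs]))  # False (only 1 of 3 votes "poor")
```

The two rules can disagree, as here: one confident model can pull the average over the cutoff even when it is outvoted, so the choice of rule matters most for cells where the models diverge.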

To determine the accuracy of our predictions, we used a three-stage validation framework. The first two stages employed “hold-out” validation, which works by randomly withholding some of the training data (i.e., the poverty scores of cells from the initial survey) while the model is trained, and then using the withheld data to check the model’s predictions. These results were promising.
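A random hold-out split of the kind described above can be sketched in a few lines. The 20 percent validation share, the fixed seed, and the function name are illustrative choices, not details reported in the study.

```python
import random


def holdout_split(cell_ids, holdout_frac: float = 0.2, seed: int = 0):
    """Randomly withhold a share of labelled cells for validation.

    Returns (train_ids, validation_ids); the two lists never overlap.
    """
    ids = list(cell_ids)
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(ids)
    n_val = int(len(ids) * holdout_frac)
    return ids[n_val:], ids[:n_val]


train_ids, val_ids = holdout_split(range(100))
print(len(train_ids), len(val_ids))  # 80 20
```

A model would be fitted only on `train_ids` and scored only on `val_ids`, so the validation accuracy reflects cells the model never saw during training.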

In the third stage, we did something less conventional: we made predictions for cells in a district for which no survey data had been collected, using only satellite imagery and the accessibility map. Working with Gallup, we then surveyed 7,000 new households in that district. Checking predictions against fresh survey data from a district the model has never seen is the most demanding test of its accuracy, and we were relieved (and frankly a bit surprised) to find that the results corroborated our hold-out validation results.
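In code terms, the third stage resembles a spatial hold-out: every cell in the test district is kept out of training, so the evaluation is not flattered by nearby, similar cells from the same district appearing in the training set. The sketch below assumes each cell is tagged with a district name (the names here are hypothetical). In the study the test district had no survey labels at all until the Gallup follow-up, so this is an analogy rather than a reproduction of the procedure.

```python
def district_holdout(cells, test_district):
    """Split (cell_id, district) pairs so that one whole district is held out.

    Returns (train_ids, test_ids); no district contributes to both.
    """
    train_ids = [cell for cell, district in cells if district != test_district]
    test_ids = [cell for cell, district in cells if district == test_district]
    return train_ids, test_ids


def leave_one_district_out(cells):
    """Yield (district, train_ids, test_ids), holding out each district in turn."""
    for district in sorted({d for _, d in cells}):
        train_ids, test_ids = district_holdout(cells, district)
        yield district, train_ids, test_ids


cells = [("c1", "A"), ("c2", "B"), ("c3", "A"), ("c4", "C")]
for district, train_ids, test_ids in leave_one_district_out(cells):
    print(district, train_ids, test_ids)
```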

Figure 1: Cross-validation performance of Ensemble CNN model in selected districts. Notes: Hits = observed and predicted poor; Misses = observed poor and predicted not poor.
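The hits and misses in Figure 1 can be combined into a single hit rate: the share of observed-poor cells that the model also flagged as poor (often called recall or sensitivity). A minimal sketch follows, with hypothetical example data. For reference, a fair coin flip would be expected to catch about half of the poor cells; the hit rate on its own also ignores false alarms (cells predicted poor that were observed not poor), so it is best read alongside that count.

```python
def confusion_counts(observed, predicted):
    """Count hits (observed and predicted poor) and misses (observed poor, predicted not poor)."""
    hits = sum(1 for o, p in zip(observed, predicted) if o and p)
    misses = sum(1 for o, p in zip(observed, predicted) if o and not p)
    return hits, misses


def hit_rate(hits: int, misses: int) -> float:
    """Share of observed-poor cells the model correctly predicted as poor."""
    total = hits + misses
    if total == 0:
        raise ValueError("no observed-poor cells to evaluate")
    return hits / total


observed = [True, True, True, False]
predicted = [True, False, True, True]
hits, misses = confusion_counts(observed, predicted)
print(hits, misses, round(hit_rate(hits, misses), 2))  # 2 1 0.67
```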

Overall, our ensemble deep learning approach was significantly better at identifying cells likely to contain chronically poor households than a simple flip of the coin. Combining this approach with gridded population estimates could provide a cost-effective means of improving geographical targeting at high resolution — and enhance social protection in Sindh.
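One way to combine cell-level predictions with gridded population estimates is to rank cells by their predicted probability of poverty and select them until a coverage budget is reached. The sketch below is a hypothetical greedy rule for illustration only; the article does not describe a specific targeting procedure, and the budget, names, and numbers are made up.

```python
def prioritise_cells(prob_poor: dict, population: dict, max_people: float):
    """Select cells in descending order of predicted poverty probability.

    Any cell whose population would push total coverage above max_people is
    skipped, and lower-ranked cells are still considered afterwards.
    Returns (selected cell ids, total people covered).
    """
    selected, covered = [], 0.0
    for cell in sorted(prob_poor, key=prob_poor.get, reverse=True):
        pop = population.get(cell, 0.0)
        if covered + pop > max_people:
            continue
        selected.append(cell)
        covered += pop
    return selected, covered


prob_poor = {"a": 0.9, "b": 0.7, "c": 0.6}
population = {"a": 500, "b": 800, "c": 300}
print(prioritise_cells(prob_poor, population, max_people=1_000))  # (['a', 'c'], 800.0)
```

Skipping rather than stopping lets smaller, lower-ranked cells fill leftover capacity; stopping at the first overflow would instead keep the selection strictly in rank order at the cost of unused budget.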

Recently, the SPSU launched an initiative with the World Bank to strengthen social protection; the initiative relies on a multidimensional poverty index rather than poverty prevalence estimates. While this approach represents a clear improvement on past targeting processes, governments and local NGOs are likely to soon be able to conduct their own rapid, cost-effective poverty assessments at high spatial resolution without the need for large and costly surveys. As machine learning and artificial intelligence evolve rapidly, the rising accuracy of remote poverty prediction will increasingly offer a viable and robust alternative that expands the suite of tools for fighting poverty worldwide.
