Forecasting Food Insecurity Levels in Near Real-Time Using a Machine Learning Framework

By Shahrzad Gholami, Erwin Knippenberg, James Campbell and Juan Lavista Ferres

Data & Policy Blog
Oct 28, 2022

In this blog, Shahrzad Gholami and colleagues introduce their research recently published in Data & Policy, which demonstrated, in a case study in Malawi, the feasibility and accuracy of an early warning system for food insecurity.

Millions today face hunger. Nearly 690 million people were hungry in 2020, or 8.9% of the world population, according to the Food and Agriculture Organization of the United Nations. This food insecurity is exacerbated by increasingly frequent and severe droughts, floods, and other shocks driven by climate change.

In order for humanitarian actors to prepare for, manage and deliver assistance in a timely way, they need early warning systems that allow aid workers and organizations to quickly forecast the incidence of hunger as conditions change. Early detailed predictions of which households are most likely to suffer from food insecurity would help to mitigate adverse effects, such as nutritional deprivation experienced by children early in life, which can have lifelong consequences.

Microsoft and Catholic Relief Services (CRS) conducted a joint study that applied machine learning techniques to this very problem. The data came from household surveys administered on the ground in southern Malawi by community-based CRS teams. Machine learning algorithms were trained on this data to create a model that could predict future food insecurity at the household level.

The results are promising. The model had an 83% accuracy rate in predicting food security outcomes and could generate accurate forecasts up to 4 months in the future. These results act as a proof-of-concept that showcases how, by combining recurrent survey data collected by embedded enumerators with machine learning algorithms, one can gain the ability to forecast household-level food security outcomes in near real-time, offering predictive insights in the context of ongoing programmatic and policy decisions.

Context: food insecurity and flooding in Malawi, CRS and the UBALE program

Malawi is a land-locked country in southeastern Africa. In the first few months of 2015, flooding in Malawi displaced an estimated 230,000 people, damaged about 64,000 hectares of land, and destroyed the asset wealth of many. By August 2015, the floodwaters had receded, and most of those displaced returned to rebuild their lives.

In 2016, a consortium led by Catholic Relief Services (CRS) implemented the United in Building and Advancing Life Expectations (UBALE) program, a USAID-funded project that served disaster-prone districts in Malawi with interventions including agriculture, livelihood, nutrition, and community disaster risk reduction activities. UBALE aimed to sustainably reduce food insecurity and to build resilience by reaching 250,000 vulnerable households in 284 communities.

The MIRA data collection protocol

To capture household data for resilience analysis, CRS devised the Measurement Indicators for Resilience Analysis (MIRA) protocol, a set of surveys conducted at community-based sites. CRS first administered a baseline survey, followed by a more comprehensive survey each year, collecting demographic, livelihood, economic and shock data. Additional follow-up surveys, taking less than 10 minutes each, were administered to the same households on a monthly basis, collecting data about food security and hunger, exogenous shocks, and status self-evaluation indicators.

Overall, the surveys collected data on 4 key indicators: the Household Dietary Diversity Score (HDDS), the Household Hunger Score (HHS), the Food Consumption Score (FCS), and the Reduced Coping Strategies Index (rCSI), along with a number of determinants associated with food security and household vulnerability. This provided essential information that tracked household well-being trajectories over time in a shock-prone environment. The collected data was uploaded on a monthly basis to a cloud-based database and was immediately available for analysis.

The Coping Strategy Index is one of the modules that is collected on a monthly basis and asks questions that measure a household’s experience with food insecurity and coping strategies employed in response. Weights (w) were assigned to each strategy and used to compute an overall household reduced coping strategy score (rCSI), which was used as the food insecurity metric.

Coping strategy index module:

In the past 7 days, if there have been times when you did not have enough food or money to buy food, how many days has your household had to:

  • Rely on less preferred and less expensive foods? (w=1)
  • Borrow food, or rely on help from a friend or relative? (w=2)
  • Engage in piece work or other menial labor? (w=1)
  • Send children out to beg? (w=4)
  • Reduce number of meals eaten in a day? (w=1)
  • Reduce size of meals eaten in a day? (w=1)
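To make the scoring concrete, here is a minimal sketch of how a household's rCSI could be computed from the module above. It assumes the score is the weighted sum of the number of days each strategy was used, which is the usual construction of the index but is not spelled out in this post. The dictionary keys and the function name are hypothetical labels chosen for illustration; only the weights come from the list above.

```python
# Weights (w) copied from the coping strategy module above.
# The strategy keys are hypothetical labels, not MIRA field names.
RCSI_WEIGHTS = {
    "less_preferred_foods": 1,
    "borrow_food": 2,
    "piece_work": 1,
    "children_beg": 4,
    "reduce_meal_count": 1,
    "reduce_meal_size": 1,
}


def rcsi_score(days_by_strategy):
    """Return the weighted sum of days (0-7) each strategy was used.

    Assumption: rCSI = sum over strategies of weight * days in the past 7 days.
    Strategies missing from the input are treated as 0 days.
    """
    score = 0
    for strategy, weight in RCSI_WEIGHTS.items():
        days = days_by_strategy.get(strategy, 0)
        if not 0 <= days <= 7:
            raise ValueError(f"{strategy}: days must be between 0 and 7")
        score += weight * days
    return score


# Hypothetical household: 3 days of cheaper foods, 1 day borrowing food,
# 2 days with fewer meals.
example_household = {
    "less_preferred_foods": 3,
    "borrow_food": 1,
    "reduce_meal_count": 2,
}
example_score = rcsi_score(example_household)  # 3*1 + 1*2 + 2*1
```

Under this assumption, higher scores mean more frequent or more severe coping, so a household that sends children out to beg contributes four points per day to its score.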

Applying Machine Learning methods to the MIRA raw dataset

By surveying households consistently and at high frequency, CRS was able to construct a panel dataset with repeated observations for interviewed households. Microsoft then analyzed this data using supervised machine learning methods in order to predict which surveyed households would be at greatest risk for future food insecurity. Supervised machine learning is a machine learning technique that uses labeled data to classify or predict outcomes.

For this study, we selected households from the MIRA dataset that regularly participated in the MIRA program for at least 20 consecutive months, from October 2017 to November 2019, which would be long enough to capture seasonal fluctuations. This resulted in a subset of 1,886 households with 37,720 observation records available for our study.
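The reported counts are consistent with 20 records per household (1,886 × 20 = 37,720). As a rough illustration of the selection rule, the sketch below keeps households whose survey history contains at least 20 consecutive months. The record layout `(household_id, year, month)` and all function names are assumptions made for this example, not the format of the MIRA database or the study's actual filtering code.

```python
from collections import defaultdict

MIN_CONSECUTIVE_MONTHS = 20  # threshold stated in the text


def month_index(year, month):
    """Map a calendar month to a running integer so consecutive months differ by 1."""
    return year * 12 + (month - 1)


def longest_run(indices):
    """Length of the longest streak of consecutive integers in `indices`."""
    best = run = 0
    previous = None
    for idx in sorted(set(indices)):
        run = run + 1 if previous is not None and idx == previous + 1 else 1
        best = max(best, run)
        previous = idx
    return best


def eligible_households(records, min_months=MIN_CONSECUTIVE_MONTHS):
    """Return the ids of households surveyed in at least `min_months` consecutive months."""
    months_by_household = defaultdict(list)
    for household_id, year, month in records:
        months_by_household[household_id].append(month_index(year, month))
    return {
        household_id
        for household_id, months in months_by_household.items()
        if longest_run(months) >= min_months
    }


# Hypothetical data: household "A" answers 20 months in a row starting
# October 2017; household "B" misses one month in the middle.
full = [("A", 2017 + (9 + k) // 12, (9 + k) % 12 + 1) for k in range(20)]
gappy = [("B", 2017 + (9 + k) // 12, (9 + k) % 12 + 1) for k in range(20) if k != 10]
selected = eligible_households(full + gappy)
```

In this toy data only household "A" qualifies, because the missing month splits household "B" into runs of 10 and 9 months.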

Figure 1. Machine learning workflow developed to study food security status of households based on the Measurement Indicators for Resilience Analysis protocol.

We focused on two goals with policy relevance: first, a community-level analysis to determine which features are most important for predicting food insecurity; and second, forecasting at the household level to predict vulnerability to future food insecurity.

Step I: Identifying key predictors of future food insecurity at the community level

The first step was to calculate the top predictors of future food insecurity at the community level. We used the entire MIRA dataset to train, test and compare several classical machine learning models, and determined the top-performing and most robust model.

Then, using that model, along with the dataset and rCSI scores, we applied a black-box interpretability technique known as Shapley additive explanations (SHAP) to analyze the data. The SHAP framework uses game-theoretical notions to determine the contributions of predictor features to final predictions. This approach assumes that predictor features are similar to players in a coalitional game where the game payoff, which is the predicted probability in our problem, is distributed among the features based on Shapley concepts in game theory.
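The coalitional-game idea can be illustrated with a toy example. The sketch below computes exact Shapley values by enumerating every coalition of features, which is only feasible for a handful of features; the SHAP framework relies on more efficient estimation for real models. The payoff function, its numbers, and the three feature names are invented for illustration and are not taken from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, features):
    """Exact Shapley value of each feature for a payoff function over coalitions.

    `value` takes a frozenset of feature names and returns the game payoff
    (here, a predicted probability).
    """
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Probability that a coalition of this size precedes the feature
            # in a uniformly random ordering of all features.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                members = frozenset(coalition)
                total += weight * (value(members | {feature}) - value(members))
        phi[feature] = total
    return phi


def toy_payoff(coalition):
    """Hypothetical predicted probability of food insecurity given known features."""
    p = 0.2  # baseline prediction with no features known
    if "past_rcsi" in coalition:
        p += 0.3
    if "location" in coalition:
        p += 0.1
    if "past_rcsi" in coalition and "recent_shock" in coalition:
        p += 0.2  # interaction: a shock matters more for already-insecure households
    return p


features = ["past_rcsi", "location", "recent_shock"]
phi = shapley_values(toy_payoff, features)
```

In this toy game the interaction term is split evenly between the two features that create it, and the attributions sum to the difference between the full prediction and the baseline, which is the property that makes the payoff "distributed among the features".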

This analysis allowed us to tease variables out from the data and shortlist which features have the greatest predictive power for future food insecurity and should be monitored accordingly.

While these results should not be interpreted causally, using the SHAP framework we found that, for the regions studied here, a small subset of 20 indicators out of 126 accounts for most of the variation in food insecurity.

The locations of the households were found to be among the top predictors, reflecting how food insecurity is concentrated in specific areas where households are disproportionately poor and lack access to safety nets. Another highly predictive feature included self-reported subjective notions of one’s current and future welfare.

Other predictors were previous food insecurity, indicating its persistence within households, and experience with past shocks. This provides important insights for designing future ‘rapid-response’ surveys that could be administered over the phone or by SMS to target a larger population and nowcast their food security status.

Step II: Predicting vulnerability to future food insecurity at the household level

Having identified 20 key predictors for food insecurity, the next goal was to forecast the vulnerability of each household to food insecurity based on their historical records of these key predictors. To do this we leveraged the MIRA dataset’s time-series characteristics and compared specific neural network architectures as well as random forest performances.

The model parameters could be updated in real time based on the most up-to-date data collected from the field. As these indicators come in, the algorithm can update its projections several months into the future to inform the planned delivery of assistance. To that end, we demonstrated the performance of our proposed approach up to 4 months in the future, to allow for the typical operational response cycle. The model had an 83% accuracy rate in predicting food security outcomes up to 4 months in the future.
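One common way to frame this kind of forecast is to turn each household's monthly series into pairs of (recent history, outcome several months later) and to score predictions by the share that match. The sketch below shows that framing in plain Python. The window length of 6 months, the function names, and the idea of a single label per month are assumptions for illustration; the post does not specify how the study constructed its training examples.

```python
def make_forecast_pairs(series, history_len, horizon):
    """Slide a window over a monthly series.

    Each pair holds `history_len` consecutive observations and the value
    observed `horizon` months after the last one in the window.
    """
    pairs = []
    last_start = len(series) - history_len - horizon
    for start in range(last_start + 1):
        history = series[start:start + history_len]
        target = series[start + history_len + horizon - 1]
        pairs.append((history, target))
    return pairs


def accuracy(predicted, actual):
    """Fraction of predictions that equal the observed label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Hypothetical household with 20 monthly observations, labeled 0..19 so the
# indexing is easy to follow.
monthly = list(range(20))
pairs = make_forecast_pairs(monthly, history_len=6, horizon=4)
```

With 20 months of data, a 6-month history and a 4-month horizon, each household yields 11 training pairs; the first pairs months 0 to 5 with the outcome in month 9. Any model, such as the neural networks or random forest mentioned above, could then be fitted on these pairs, and refitted as new monthly data arrives.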

Policy Relevance

These results provide a use case for machine learning models trained on household-level data collected by embedded enumerators to predict future levels of food insecurity. They show that awareness of previous levels of food insecurity, combined with 20 other key indicators, is sufficient to make predictions of community and household food insecurity up to 4 months into the future. This would provide sufficient time for government agencies and humanitarians to mobilize and distribute assistance.

Monitoring of these features could be scaled rapidly and at a low cost as part of an early warning system. While our empirical analysis is based on the MIRA survey data collected in southern Malawi, our proposed methodology could be applied to datasets in other communities facing recurrent food insecurity.

Read the full research article (open access) here.

About the authors

Shahrzad Gholami and Juan Lavista Ferres are affiliated with the Microsoft AI for Good Research Lab, in Redmond, Washington, USA.

Erwin Knippenberg works in Poverty and Equity Global Practice at The World Bank, Washington DC, USA.

James Campbell is based at the Food Security Monitoring and Evaluation Programs, at the Catholic Relief Services, Baltimore, Maryland, USA.

***

This is the blog for Data & Policy (cambridge.org/dap), a peer-reviewed open access journal exploring the interface of data science and governance. Read on for five ways to contribute to Data & Policy.

Blog for Data & Policy, an open access journal at CUP (cambridge.org/dap). Eds: Zeynep Engin (Turing), Jon Crowcroft (Cambridge) and Stefaan Verhulst (GovLab)