Joshua Blumenstock: The Knowns and Unknowns of Big Data and Poverty Alleviation

This post, written by Lisa Bauer, Program Manager at The Blum Center for Developing Economies, was originally published on the Blum Center’s website.

In international development circles, the application of machine learning to poverty measurement and alleviation has become a much-discussed aspiration. However, Joshua Blumenstock, assistant professor at the UC Berkeley School of Information and director of the Data-Intensive Development Lab, cautioned at a recent Blum Center Faculty Salon that unknowns abound and that new digital methods may serve more as a complement than a replacement to traditional approaches, especially in the area of economic assessment.

At the salon, Blumenstock highlighted two ways big data is altering the field of international development: first, in measuring quality of life and welfare in low-income countries; and second, in offering financial inclusion applications for poor populations. His colleague Moritz Hardt, assistant professor of electrical engineering and computer science, provided a lead response, drawing from his decade of research on fairness and machine learning. Together, they highlighted that over the past five years big data sets — from mobile phone companies, satellite imagery, social media platforms, and international development organizations — paired with advances in machine learning technology, have generated fascinating and controversial work.

“Over less than a decade we have experienced a global explosion of data, bringing us to this fairly nascent intersection of big data and poverty alleviation efforts,” said Blumenstock. “With the mass availability of large-scale data sets, we now have access to new sources of data on previously remote, low-resource settings.”

A key contributor to these new data sets is the stunning rise in cell phone adoption. According to the World Bank, more households in developing countries own a mobile phone than have access to electricity or clean water, and nearly 70 percent of the bottom fifth of the population in developing countries owns a mobile phone (note: not a smartphone). An increase in satellite and remote sensing data has also contributed to the data explosion. The combination of these data sources, with machine learning, means that data can be synthesized and applied in new ways.

Blumenstock said that satellite imagery in particular is becoming a key source for development research because it reveals basic physical infrastructure and quality of life trends, such as roof material, road quality, and land plot size. This information can help researchers estimate the basic traits of a town, including average household wealth and population density. Blumenstock is currently conducting research with Facebook to provide a publicly available map of micro-regional estimates of wealth and poverty.

“Leveraging machine learning to analyze these forms of data, we can draw conclusions about certain aspects of quality of life with nearly the same accuracy as traditional, multi-million dollar field surveys,” Blumenstock explained.
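The satellite-based approach described above can be illustrated with a deliberately tiny sketch: fit a model relating an imagery-derived feature to ground-truth survey wealth in surveyed villages, then use it to estimate wealth where no survey exists. The feature (share of metal roofs) and all numbers below are hypothetical toy data; real systems extract many features, often with deep networks over raw imagery.

```python
# Minimal sketch: estimate village wealth from a satellite-derived
# feature. Toy data and a single hand-rolled least-squares fit; the
# "metal-roof share" feature and all values are illustrative only.

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Surveyed villages: (metal-roof share from imagery, survey wealth index)
roof_share = [0.10, 0.35, 0.60, 0.80]
wealth     = [0.9,  1.8,  2.9,  3.7]

slope, intercept = fit_line(roof_share, wealth)

# Estimate wealth for an unsurveyed village where 50% of roofs are metal
estimate = intercept + slope * 0.5
```

The point of the sketch is the workflow, not the model: expensive field surveys provide labels for a small sample, and cheap, frequently updated imagery extends the estimates everywhere else.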

Given the time and cost savings, international multilateral organizations like the World Bank and United Nations are eager to put these big data methods into practice. Likewise, many governments in developing countries are eager to bypass traditional data collection methods in favor of machine learning-assisted data analysis because of the large time and monetary costs of national census surveys.

Blumenstock is hopeful that by supplementing traditional poverty indices with high-frequency estimates based on satellite and digital data, development practitioners can have low-cost options for impact evaluations and project monitoring. He said this data-plus-machine-learning approach could help open up major innovations in three areas: 1) targeting specific populations for program implementation; 2) monitoring and mitigating the effects of natural disasters, health epidemics, and migration patterns by allowing, for example, aid workers to deliver needed resources to hard-hit areas; and 3) enabling different approaches to impact evaluation, specifically randomized control trials, which can cost millions of dollars.

Financial inclusion was the other area Blumenstock highlighted as potentially benefiting from algorithm-based decision making. He pointed out that globally 1.7 billion people lack a bank account, half of whom are women in poor, remote regions — yet about two-thirds of this population have access to a mobile phone. Services like M-Pesa, launched in Kenya in 2007, provide wide-scale mobile phone-based money transfer, financing, and micro-financing. As a result, there has been a surge in “digital credit” banking led by the private sector in low-income countries, which is increasing financial inclusion for populations without formal credit.

Using data to analyze phone use patterns, some banks and intermediary financial technology (fintech) companies are testing ways to develop alternative digital credit scores to provide uncollateralized loans to the unbanked. By aggregating digital trace data — including Internet searches, email composition, and even browser and smartphone choices — and then using machine learning to assess the data, banks can formulate digital credit scores that predict who is most likely to default on a loan. One of the largest entities to use this approach is a Kenyan digital savings and loan product called M-Shwari, which is built on M-Pesa and run by the Commercial Bank of Africa and the mobile network operator Safaricom. Using M-Shwari, customers who lack a bank account and credit history can take out loans. Beyond increasing access to loans, digital credit also has the potential to dramatically reduce transaction costs and provide immediate disbursement.
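The credit-scoring pipeline described above can be sketched as a simple logistic model over phone-usage features. Everything here is a hypothetical placeholder — the features, weights, and approval threshold are illustrative; an actual lender would learn the weights from repayment histories rather than set them by hand.

```python
import math

# Hedged sketch of digital credit scoring: a few phone-usage features
# are combined into a default-risk probability via a logistic model.
# Features and weights below are invented for illustration only.

def default_risk(features, weights, bias):
    """Logistic model: probability of default in [0, 1]."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 / (1 + math.exp(-z))

# Hypothetical applicant: [avg daily calls, mobile-money top-ups/month,
# months of account history], each roughly normalized to [0, 1].
applicant = [0.6, 0.8, 0.4]
weights   = [-0.9, -1.5, -0.7]   # more activity/history -> lower risk
bias      = 0.5

risk = default_risk(applicant, weights, bias)
approve = risk < 0.5  # lend only if predicted default risk is low
```

The same structure also makes the bias concern concrete: if the weights are learned from historical data that under-represents certain groups, the resulting scores inherit those distortions.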

Providing loans to previously unbanked populations can stimulate critical economic growth. Yet Blumenstock was quick to point out that the concept of digital credit scoring and its rapid growth across developing economies raises several concerns. First, most of these loans are short-term with very high interest rates, which can indebt customers. Second, leaning too heavily on algorithms to churn out credit scores can create a variety of biases.

Blumenstock recently visited Kenya, where digital loans have quickly risen in popularity, to gain greater insight into the mobile banking process. According to a 2018 study led by FSD-Kenya, more than one in four Kenyans had taken out a digital loan over the past five years, comprising an estimated 6 million Kenyan borrowers. At the time of the study, more than half of these digital borrowers had at least one outstanding loan, and 14 percent had digital loans from multiple banks. Among the long-term implications of digital credit are credit bubbles, over-indebtedness, and the overall impact on social welfare.

“There’s a lot of allure to using AI to leapfrog traditional methods, from digital currency to data collection,” said Blumenstock. “But it creates a silver bullet fallacy problem. We’re still grossly unaware of its impacts and what exacerbating issues it could lead to.”

Lead discussant Moritz Hardt spoke on the limitations of machine learning, particularly in relation to gender and race biases, and the corresponding consequences for everything from credit scores to healthcare predictions to child services to decisions in the criminal justice system.

“It’s not easy to define discrimination in algorithmic decision-making processes,” said Hardt. “We are at a sobering stage right now; people are becoming aware of the limitations and questioning possible structural issues.”

Hardt provided an example of how risk assessment algorithms are used as a predictive tool to determine which individuals are at high risk of missing their court date following an arrest. If deemed “high risk” pretrial by the algorithm, an arrested individual is held in jail until their court date, with often dire consequences for their income and family circumstances. Such predictive algorithms are similarly used to inform criminal justice officials’ decisions on how high to set bail, what sentence to impose, and who gets early release.

“What is often neglected in designing algorithms are the structural and complex socio-cultural challenges unique to each person,” Hardt said.

Blumenstock responded that “we need to endogenize social sciences into machine learning,” warning that taking off-the-shelf algorithms for ad targeting and plopping them into poverty targeting would have obvious negative results.

“Off-the-shelf tools typically assume that the social processes being modeled are static,” he said. “But these processes are inherently dynamic, changing over time and over subpopulations. The appropriate use of machine learning in such contexts requires a more nuanced understanding of the people who are being targeted, and what assumptions might be reasonable or, more often, totally implausible.”