Global Research Insights Using Non-Traditional Data

Everywhere we look, markets are being transformed by new remote sensing technologies, digital data streams, and analytical approaches. CEGA’s recent Measuring Development: Data Integration and Data Fusion conference showcased how researchers are using these advances to address problems of poverty and inequality.

The Center for Effective Global Action (CEGA)
May 12, 2020

This post is a roundup of talks delivered at CEGA’s March 31st conference, Measuring Development: Data Integration and Data Fusion. Summaries were written by CEGA Program Manager Samuel Fishman.

New sources of non-traditional data, including cell phones, satellites, and social media, are revolutionizing research in low- and middle-income countries. They enable detailed measurements of social and economic outcomes more cheaply and more frequently than traditional data collection (e.g. field surveys). As COVID-19 lockdowns hamper in-person surveys, non-traditional data are especially valuable: they let researchers continue their important work without relying on household visits, while helping governments and NGOs respond to the crisis in real time.

On March 31st, CEGA and the World Bank hosted Measuring Development: Data Integration and Data Fusion virtually, over Zoom. The conference highlighted numerous ways in which mobile and remote sensing data can be leveraged to measure socioeconomic welfare, infrastructure, disease spread, firm behavior, judicial outcomes, and more.

Below is a roundup of some of this groundbreaking research and its potential to inform policy globally.

MeasureDev Research Roundup

Responding to COVID-19

Non-traditional data methods don’t just help keep major research projects on track during the COVID crisis; they also open up new ways to predict and combat the spread of the virus itself.

  • Sveta Milusheva (World Bank) showed how combining call detail records (CDR) with health surveillance data can track and predict the proliferation of disease on a national scale, using a case study in Senegal. She is using her framework and findings to help the Bank predict the course of COVID-19 in other countries.
  • Marelize Gorgens (World Bank) summarized how the World Bank is using data to: 1) predict the spread of COVID-19; 2) analyze ‘hotspot’ locations and populations; 3) monitor, improve, and evaluate policy responses; and 4) understand the wider economic impacts. The Bank is able to do this by tapping into vast digitized datasets and using cutting-edge artificial intelligence.
  • Emmanuel Letouzé (Data-Pop Alliance) also presented on how big data is being harnessed to address COVID-19, and discussed particular vulnerabilities faced by the Global South. Letouzé noted that we lack systematic approaches for using data to respond to crises in the most vulnerable countries, and outlined a “comprehensive and contextualized” approach for using abundant real-time data, such as mobile phone data, to respond to the crisis in low-income economies.
  • Walter Kerr (Zenysis Technologies) discussed how data integration software developed by Zenysis can rapidly transform governments’ information management systems to handle multi-layered data, enabling the real-time analytics required to respond to COVID-19.

Mobile Phone Data

  • Emily Aiken (UC Berkeley) showed how a machine learning algorithm using CDR data could identify ultra-poor households in Afghanistan about as accurately as survey-based measures. These granular socioeconomic predictions could help governments target programs at scale with an efficiency that survey data alone cannot achieve.
  • Sylvan Herskowitz (IFPRI) discussed research using cellphone records to show how firms in Afghanistan respond to insecurity. The research found that a deadly terrorist attack can lead to a 4–6% decrease in the number of firms in a district, and that employees move in large numbers between provincial capitals and Kabul to escape violence. Because CDRs log the time of each call and the location of the cell phone tower that handled it, the researchers were able to use corporate phone records to pinpoint the movements of firms and employees in the wake of violence.
  • Xavier Espinet (World Bank) presented an approach that combines network science models of road systems, vulnerability modeling, and CDR data to calculate demand for public transport. The resulting model accounts for seasonal disruptions, climate change, and other hazards, and can help produce recommendations for infrastructure investments in areas of high vulnerability and high demand.
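To make the Herskowitz bullet concrete: one common, generic way to turn tower-stamped call records into movement data is to assign each subscriber a "modal" location for each period (the district of the tower they used most often) and flag changes between periods. The sketch below illustrates that idea only; it is not the researchers' actual pipeline. The record layout, the `TOWER_DISTRICT` mapping, the sample rows, and the function names `modal_district_by_week` and `detect_moves` are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical CDR rows: (subscriber_id, ISO timestamp, tower_id).
# Real CDR schemas vary by operator; this layout is an assumption.
CDR = [
    ("A", "2020-01-06T09:00", "T1"),
    ("A", "2020-01-07T12:30", "T2"),
    ("A", "2020-01-08T18:45", "T3"),
    ("A", "2020-01-13T10:00", "T3"),
    ("A", "2020-01-14T11:15", "T3"),
    ("B", "2020-01-06T08:00", "T3"),
    ("B", "2020-01-13T08:30", "T3"),
]

# Hypothetical mapping from cell tower to the district it sits in.
TOWER_DISTRICT = {"T1": "Kabul", "T2": "Kabul", "T3": "Herat"}


def modal_district_by_week(records, tower_district):
    """Map (subscriber, (iso_year, iso_week)) to the district seen most often that week."""
    counts = defaultdict(Counter)
    for sub, ts, tower in records:
        year, week, _ = datetime.fromisoformat(ts).isocalendar()
        counts[(sub, (year, week))][tower_district[tower]] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def detect_moves(modal):
    """List (subscriber, week, from_district, to_district) whenever the modal district changes."""
    by_sub = defaultdict(list)
    for (sub, week), district in modal.items():
        by_sub[sub].append((week, district))
    moves = []
    for sub, seq in by_sub.items():
        seq.sort()  # chronological order of (iso_year, iso_week)
        for (_, prev), (week, cur) in zip(seq, seq[1:]):
            if cur != prev:
                moves.append((sub, week, prev, cur))
    return moves


if __name__ == "__main__":
    modal = modal_district_by_week(CDR, TOWER_DISTRICT)
    for move in detect_moves(modal):
        print(move)  # ('A', (2020, 3), 'Kabul', 'Herat')
```

Aggregating moves like these by district and week, before and after an event, is the kind of signal that can reveal displacement; real analyses must also handle sparse activity, shared phones, and tower outages, which this sketch ignores.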

Text and Administrative Data

  • Data collaboration and sharing platforms are increasing the value of public administrative data. Sam Asher (Johns Hopkins School of Advanced International Studies) presented on a geographic platform called SHRUG, which is building a sharable database for Indian public administrative data. This large-scale rationalization of public sector data (emerging from social programs, census data, etc.) can help governments target services and guide private sector investment.

Satellite Data

  • David Newhouse (World Bank) presented a method for measuring household characteristics at a granular level using a combination of survey and satellite data. In Tanzania and Sri Lanka, the researchers were able to measure these household characteristics with similar or better accuracy than survey estimates for the regions, and saw measurement efficiency gains equivalent to quadrupling the sample size of survey-based research.
  • Mattia Marconcini (German Aerospace Center, DLR) discussed the team’s work with MindEarth on a novel and widely applicable approach to high-resolution mapping of wealth inequality, using freely available earth observation data and in-situ data from wealth surveys (DHS/LSMS). Given the wide availability of these data sources, the researchers expect the approach to generalize across national contexts. In addition to Nigeria, where they have already piloted their methods, they identify 20 other countries with similar openly available data, including India, Pakistan, the Philippines, and Ethiopia.
  • Hannah Kerner (University of Maryland) discussed her work with NASA Harvest, which is revolutionizing the analysis of agricultural satellite data. Using massive quantities of multispectral and hyperspectral data collected at high frequency (11 terabytes a day), NASA Harvest applies machine learning techniques to map crop types and planting timelines. These data can in turn inform the design and implementation of policies such as farm insurance.
  • Tara O’Shea (Planet) presented Planet’s efforts to use satellite data to monitor forest carbon stocks and emissions. By combining airborne Light Detection and Ranging (LiDAR) data with Planet’s own high temporal resolution satellite imagery, researchers can use machine learning models to measure aboveground carbon on a massive scale. They have already mapped carbon stocks for the entirety of Peru using these data fusion and machine learning approaches, and found that existing estimates vastly underestimate the carbon released by deforestation.
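The Newhouse result above is stated as an efficiency gain “equivalent to quadrupling the sample size.” As a rough aid to interpretation, assume the textbook case of a simple random sample of size n with outcome standard deviation σ (an assumption; the estimators used in that work are more involved). The standard error of an estimated mean then scales with 1/√n:

```latex
\operatorname{SE}(\bar{y}_n) = \frac{\sigma}{\sqrt{n}},
\qquad
\operatorname{SE}(\bar{y}_{4n}) = \frac{\sigma}{\sqrt{4n}} = \frac{1}{2}\cdot\frac{\sigma}{\sqrt{n}}
```

Read this way, a fourfold efficiency gain corresponds to cutting the sampling variance from σ²/n to σ²/(4n), which halves the standard error.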

MeasureDev was an equally great opportunity to learn about the pitfalls of non-traditional data approaches. In his keynote on manipulation-proof machine learning, Josh Blumenstock (CEGA co-faculty director) discussed methods developed to deal with one of the growing threats to non-traditional data research: increasingly sophisticated efforts to “game” algorithms. Additional challenges, including the under-representation of marginalized populations in many datasets and other failures of data to reflect realities on the ground, make it critically important to apply mixed approaches that combine traditional and non-traditional data. Still, given the robust set of contextually grounded methods and research on display at MeasureDev, we’re optimistic that big data can be harnessed to respond to poverty and crisis around the globe.

If you would like to suggest a correction to any of the above summaries, please email CEGA Communications Associate, Dustin Marshall, at dmarsh1231@berkeley.edu.

CEGA is a hub for research on global development, innovating for positive social change.