Introducing CEGA’s Geo4Dev Tutorials

By The Center for Effective Global Action, published in CEGA
Jul 22, 2021 · 6 min read

In this post, CEGA Intern Natalie Ayers (MS-CAPP ’22, University of Chicago) and Technology and Data Science for Development Program Manager Sam Fishman highlight a new suite of tutorials that the Geo4Dev Initiative is building to make novel geospatial datasets as well as analytical tools and approaches accessible to a wider community of researchers and partners.

The Lena River Delta in Russia, derived from a high-resolution stereo digital elevation model. Credit: Dan Coe

The physical characteristics of our world, both natural and artificial, have shaped societies in ways that have historically been difficult to quantify. However, as satellites, remote sensors, and geolocation technologies collect ever more data worldwide and data processing capabilities continue to improve, new possibilities are emerging for researchers and policymakers to use geographic data to answer pressing research questions. Fields as diverse as environmental science, urban studies, public health, agriculture, and conflict research are beginning to incorporate geographic information into their work, yielding previously unavailable insights.

While many of these data are readily available, the perception that they require highly specialized technical skills discourages their use. This perception is reinforced by the absence of educational tools produced in tandem with novel data and research: researchers often publish their code, but rarely is there structured material to help beginners use new geospatial data and methods.

CEGA’s Geo4Dev Learning Module Project

The Geospatial Analysis for Development (Geo4Dev) initiative, a partnership between CEGA, New Light Technologies (NLT), and the International Initiative for Impact Evaluation (3ie), has launched a new effort to overcome these barriers by providing accessible, research-ready tutorials that walk users through accessing, handling, and analyzing novel or technically challenging geographic data sources. The tutorials draw inspiration from an open-source Nighttime Lights learning module developed by the World Bank and NLT, which was first presented at the Geo4Dev Symposium and Workshop in December 2020.

Below, we highlight the first set of Geo4Dev tutorials, alongside the World Bank’s original learning module. Each tutorial is designed to provide users with the tools and code they need to begin leveraging unique and widely applicable geographic data in their own research or analysis. Geographic sources including cloud cover, city lighting, crop land, subway systems, conflict areas, and many others can provide value and produce insights across a wide range of fields. The Geo4Dev learning modules are created to ensure geographic data sources aren’t limited only to those with technical backgrounds, but instead truly fulfill their open-source promise and potential.

World Bank Open Nighttime Lights

Data Source: DMSP-OLS Nightly Imagery (1993–2017); VIIRS DNB Nightly Imagery (2012–2020)

Language: Python

The World Bank’s Open Nighttime Lights tutorial provides a thorough introduction to DMSP-OLS and VIIRS DNB Nighttime Lights data and their use, including ingestion into Python using Google Earth Engine, data cleaning, basic operations and analysis, visualization, and extraction of summary statistics. This tutorial assumes no prior knowledge of Python or satellite imagery. For more information on the tutorial and the World Bank’s Light Every Night project, see the Light Every Night blog and dataset.

Deforestation and Land Cover

Data Source: MODIS Vegetation Continuous Fields (VCF)

Language: R

Information about vegetation coverage of the earth’s surface — particularly changes in coverage — enables the study of questions in fields ranging from the natural to the social sciences. This Geo4Dev tutorial provides an approachable introduction to and demonstration of the use of the MODIS Vegetation Continuous Fields (VCF) product.

VCF is a representation of global surface coverage from 2000 to 2020, comprising data on tree cover, non-vegetative cover, and cloud cover. Given its global span and two-decade time frame, VCF has proven a valuable tool for researchers across disciplines. This tutorial aims to make VCF accessible to a wider audience by providing all the tools required to use the data. Users are first introduced to remote sensing technology and the VCF dataset before being provided with a demonstration of the code required to download the VCF data, select particular geographic areas and types of land cover, visualize the coverage, and produce numerical statistics on coverage levels for use in any additional analyses. Finally, a demonstration is provided that replicates the use of VCF in “The Ecological Impact of Transportation Infrastructure” (2020) by Sam Asher, Teevrat Garg, and Paul Novosad. This paper uses VCF tree cover data to determine whether road and highway construction impacted deforestation in India. The tutorial will provide users in academia, public service, or the private sector with all the tools required to begin leveraging VCF land cover data in their work, enabling a new wave of research incorporating these high-potential data.
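To give a flavor of the "select an area, then summarize coverage" step described above, here is a minimal sketch in Python. It is not taken from the Geo4Dev tutorial (which is written in R) and does not touch real VCF files: the rasters are synthetic arrays, and the names `tree_cover_2000`, `tree_cover_2020`, `region`, and `region_mean` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Synthetic stand-in for a percent-tree-cover raster (values 0-100).
# This is NOT real VCF data; it only mimics the shape of the problem.
rng = np.random.default_rng(0)
tree_cover_2000 = rng.uniform(0, 100, size=(50, 50))

# Hypothetical later year: cover drops by up to 10 points inside one block.
tree_cover_2020 = tree_cover_2000.copy()
tree_cover_2020[10:30, 10:30] = np.clip(
    tree_cover_2020[10:30, 10:30] - 10, 0, 100
)

# Boolean mask selecting the area of interest (a rectangular window here;
# in practice this would come from an administrative boundary or buffer).
region = np.zeros_like(tree_cover_2000, dtype=bool)
region[10:30, 10:30] = True


def region_mean(raster, mask):
    """Return the mean raster value over the cells where mask is True."""
    return float(raster[mask].mean())


mean_2000 = region_mean(tree_cover_2000, region)
mean_2020 = region_mean(tree_cover_2020, region)
change = mean_2020 - mean_2000

print(f"Mean tree cover in region: {mean_2000:.1f}% -> {mean_2020:.1f}%")
print(f"Change: {change:.1f} percentage points")
```

With real data, the arrays would be read from the downloaded VCF tiles and the mask derived from a study-area boundary; the summary step itself stays the same.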

Subways

Data Source: Global Subways Data from “Subways and Urban Growth: Evidence from Earth” (2018), Marco Gonzalez-Navarro and Matthew Turner

Language: R

Subways are a popular form of public transportation worldwide, given their speed, capacity, and lack of reliance on road infrastructure and conditions. Proponents argue that the ease of movement that subways provide promotes growth, helps sustain urbanization, and reduces pollution by diverting traffic from other forms of transportation. However, subways are expensive to build and maintain, and some argue they can divert city resources away from forms of transportation and infrastructure that better serve low-income populations. With these conflicting effects and differing city priorities, studies of subway systems and growth in metropolitan areas can provide tremendous insight for policymakers and urban planners trying to best serve their cities.

A new dataset from “Subways and Urban Growth: Evidence from Earth” (2018), by Marco Gonzalez-Navarro and Matthew Turner, makes incorporating subways into analyses possible for researchers worldwide by providing a record of all global subway stations with their geographic coordinates, dates of construction, mapped routes, and ridership over time. Now updated through 2017, these thoroughly compiled records are a previously unavailable trove of information.

The Geo4Dev tutorial makes these novel data accessible to all audiences regardless of technical ability, demonstrating all the code required to import the subway data, select locations or stations of interest, create summary statistics, and build maps and graphical visualizations. It concludes by replicating the first-difference regressions used in “Subways and Urban Growth: Evidence from Earth” to identify the effect of subway systems on ridership of both subways and buses. Users of this tutorial will come away with the tools they need to begin using this resource in research across disciplines.
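For readers unfamiliar with first-difference regressions, the idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's specification or the tutorial's R code: the data are synthetic, and the variable names (`stations_t0`, `ridership_t1`, `true_beta`, and so on) are hypothetical. Differencing each city's outcome and regressor across two periods removes any time-invariant city characteristic before the slope is estimated.

```python
import numpy as np

rng = np.random.default_rng(42)
n_cities = 200
true_beta = 0.5  # assumed effect, used only to generate the synthetic data

# Time-invariant city characteristic (e.g. geography) that we cannot observe.
city_effect = rng.normal(0, 5, n_cities)

# Hypothetical station counts in two periods; systems only grow here.
stations_t0 = rng.poisson(20, n_cities).astype(float)
stations_t1 = stations_t0 + rng.poisson(5, n_cities)

# Outcome in each period = city effect + beta * stations + noise.
ridership_t0 = city_effect + true_beta * stations_t0 + rng.normal(0, 1, n_cities)
ridership_t1 = city_effect + true_beta * stations_t1 + rng.normal(0, 1, n_cities)

# First differences: the city effect cancels out of d_y.
d_y = ridership_t1 - ridership_t0
d_x = stations_t1 - stations_t0

# Ordinary least squares of d_y on a constant and d_x.
X = np.column_stack([np.ones(n_cities), d_x])
coef, *_ = np.linalg.lstsq(X, d_y, rcond=None)
intercept_hat, beta_hat = coef

print(f"Estimated slope on change in stations: {beta_hat:.3f}")
```

Because the unobserved city effect drops out in the differences, the estimated slope should land close to the value used to generate the data, even though the levels regression would have to contend with that effect.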

Only the Beginning

These learning modules are a starting point. We are currently developing three additional modules: an introduction to the use of Radiance Calibrated Nighttime Lights, a machine learning module for crop type mapping based on work by Atlas AI and the World Bank, and a module demonstrating the use of a new machine learning process, MOSAIKS, which provides accessible satellite imagery features for numerous predictive use cases without requiring any interaction with the raw satellite imagery. In the future, CEGA hopes to develop tutorials for a wide range of novel data and methods, including crop mapping, geocoded conflict data, and machine learning methods for analyzing satellite image data.

Students, analysts, and policymakers can use these resources to explore new analytic possibilities and begin integrating novel data and approaches into their research. Researchers pioneering geospatial approaches can amplify engagement with their findings and have greater impacts on their research ecosystems. It’s great if people find a paper with novel geospatial methods, but it’s even better if those readers can quickly access and use the same data and leverage research insights in new contexts.

We encourage any users with ideas for new modules or feedback on existing modules to contact Geo4Dev at info@geo4.dev.

About Geo4Dev

The Geospatial Analysis for Development (Geo4Dev) Initiative, hosted by CEGA in partnership with New Light Technologies (NLT) and the International Initiative for Impact Evaluation (3ie), promotes the use of geospatial data for the targeting, design, and evaluation of social and economic development programs, especially in low- and middle-income countries. The initiative drives the development and adoption of new data, tools, and methods for conducting geospatial analysis, while supporting their application across diverse sectors including agriculture and food security, urbanization, climate change, impact evaluation, humanitarian crisis, and disaster response.

CEGA is a hub for research on global development, innovating for positive social change.