Where is everybody? This simple question is critical to policy decisions in a wide range of fields, including public health, food security, and disaster risk reduction. The census, which dates back all the way to ancient Egypt, is the most basic tool for answering it. Today, countries around the world conduct censuses and release the results in different languages and formats. How can we hope to assemble all of them into a single global picture of the world’s population?
Fortunately, the Center for International Earth Science Information Network (CIESIN) at Columbia University has already done the hard work. They gathered the population data from all the world’s censuses in order to create a single dataset, the Gridded Population of the World. The fourth version of this dataset (known as GPWv4) was recently released under a Creative Commons Attribution license and is now available in Google Earth Engine. Let’s take a look!
To start with a simple example, how many people live within 100 kilometers of Google’s headquarters in Mountain View?
You can answer this question yourself using the Earth Engine Code Editor. Here’s a simple script that loads the 2015 population count data, computes a 100km buffer around the coordinates of Google’s headquarters, and finally adds up all the population count pixels within that region:
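var count2015 = ee.Image('CIESIN/GPWv4/population-count/2015');
var region = ee.Geometry.Point(-122.085, 37.422).buffer(100000);
// Sum every population-count pixel that falls inside the region.
print(count2015.reduceRegion(ee.Reducer.sum(), region));
// Result: 8090751.943975031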
The answer: just over eight million people live within 100km of Google HQ.
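To make the idea of "adding up the pixels inside a buffer" concrete, here is a minimal plain-JavaScript sketch that needs no Earth Engine account. It assumes a spherical Earth and uses three made-up cells rather than real GPWv4 pixels. The names `haversineKm`, `sumWithinRadius`, and `toyCells` are hypothetical. Earth Engine's own reducer also deals with projections and partially covered pixels, which this toy version ignores.

```javascript
// Toy illustration of "sum the population of every cell within a radius".
// The cells below are made-up values, not GPWv4 data.
const EARTH_RADIUS_KM = 6371.0088; // mean Earth radius, spherical approximation

// Great-circle distance between two lon/lat points (haversine formula).
function haversineKm(lon1, lat1, lon2, lat2) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Sum the counts of all cells whose centers lie within radiusKm of the center.
function sumWithinRadius(cells, centerLon, centerLat, radiusKm) {
  return cells
    .filter((c) => haversineKm(centerLon, centerLat, c.lon, c.lat) <= radiusKm)
    .reduce((total, c) => total + c.count, 0);
}

const toyCells = [
  { lon: -122.085, lat: 37.422, count: 1200 }, // at the center
  { lon: -122.42, lat: 37.77, count: 5400 },   // roughly 50 km away
  { lon: -118.24, lat: 34.05, count: 9000 },   // several hundred km away
];

console.log(sumWithinRadius(toyCells, -122.085, 37.422, 100)); // 6600
```

Only the first two cells fall inside the 100km radius, so the distant third cell is left out of the total.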
GPWv4 includes population estimates for the years 2000, 2005, 2010, and 2015, plus projections for 2020. Let’s take a look at how the population of this region, which takes in the San Francisco Bay Area, has been changing over time:
var count = ee.ImageCollection('CIESIN/GPWv4/population-count');
print(ui.Chart.image.series(count, region, ee.Reducer.sum()));
That script loads all five years of population data as an image collection and then makes a chart by summing all the pixels in our region for each year. The result looks a little like this:
(I made that chart using a slightly longer script that also sets the chart’s title, axis labels, and so forth. If you like, you can check out the complete script that I used to make all the figures in this article on GitHub.)
The GPWv4 dataset comes in four flavors. So far we’ve been working with the population-count dataset, whose values represent the estimated number of people contained within each pixel. This version is useful when you want to add up the values of all the full-resolution pixels in some region, like we’ve been doing. (All versions use a 30 arc-second pixel grid, or approximately 1km per pixel at the equator.)
This version of the dataset has several downsides, though. First, because the exact area of the pixels varies over the surface of the Earth, you cannot directly compare population count values from different locations. Second, because the meaning of the values is tied to a particular pixel grid, you cannot use the values directly in any other grid. As a result, for many purposes it is better to use the population-density dataset instead, whose values represent the number of people per square kilometer.
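To see why counts from different latitudes are not directly comparable, here is a rough plain-JavaScript sketch of how the area of a 30 arc-second cell shrinks away from the equator. It assumes a perfectly spherical Earth with a radius of 6371.0088 km, which is a simplification and not necessarily the model used to build GPWv4. The function names are hypothetical.

```javascript
// Approximate area of a 30 arc-second grid cell on a spherical Earth.
// Assumption: sphere of radius 6371.0088 km; GPWv4 itself may use a different model.
const R_KM = 6371.0088;
const CELL_DEG = 30 / 3600; // 30 arc-seconds expressed in degrees

function cellAreaKm2(centerLatDeg) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const south = toRad(centerLatDeg - CELL_DEG / 2);
  const north = toRad(centerLatDeg + CELL_DEG / 2);
  // Area of a lat/lon rectangle on a sphere: R^2 * dLon * (sin(north) - sin(south)).
  return R_KM * R_KM * toRad(CELL_DEG) * (Math.sin(north) - Math.sin(south));
}

// Convert a per-cell count into people per square kilometer.
function countToDensity(count, centerLatDeg) {
  return count / cellAreaKm2(centerLatDeg);
}

// The same count of 100 people implies a different density at each latitude.
for (const lat of [0, 37.4, 60]) {
  console.log(lat, cellAreaKm2(lat).toFixed(3), countToDensity(100, lat).toFixed(1));
}
```

Under these assumptions a cell at 60 degrees latitude covers about half the area of a cell at the equator, so the same count there corresponds to roughly twice the density.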
Let’s take a look at population density at a few locations. In my script I hard-coded six locations in a table named cities, using a 10km buffer around the approximate center of each city, but you could load your own regions of interest. Earth Engine makes it easy to print a chart of the mean population density at each location over time:
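var density = ee.ImageCollection('CIESIN/GPWv4/population-density');
// Chart the mean density in each city's region for every year.
print(ui.Chart.image.seriesByRegion(
    density, cities, ee.Reducer.mean()).setChartType('ColumnChart'));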
Notice how the population density of Lagos, Nigeria, is skyrocketing!
Both of the datasets that we’ve looked at so far (population counts and densities) are based directly on national censuses and population registers. GPWv4 also includes two other variations that have been adjusted so that they match the country totals from the 2015 Revision of the UN World Population Prospects. In many countries, the UN estimates disagree with the national population estimates. Using the UN-adjusted data is probably a good idea for many global and regional analyses, whereas if you’re working at the national or sub-national level then you may prefer to use the original census data without adjustment.
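One way to picture what "adjusted to match country totals" could mean is a single scale factor per country. The sketch below assumes exactly that, a purely proportional rescaling, which is a simplification and may not match the procedure CIESIN actually used. The function name and all numbers are hypothetical.

```javascript
// Hypothetical proportional adjustment: scale every cell in a country so that
// the cells sum to a target national total. All numbers are made up.
function adjustToTotal(cellCounts, targetTotal) {
  const currentTotal = cellCounts.reduce((sum, c) => sum + c, 0);
  if (currentTotal === 0) return cellCounts.slice(); // nothing to scale
  const factor = targetTotal / currentTotal;
  return cellCounts.map((c) => c * factor);
}

const nationalCells = [100, 250, 650];               // sums to 1000
const adjusted = adjustToTotal(nationalCells, 1200); // pretend UN total of 1200
console.log(adjusted); // every cell is scaled by the same factor of 1.2
```

Under this assumption the relative distribution of people within the country is unchanged and only the national total moves.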
Let’s load both versions to look at the difference in 2015:
// Load both the raw and UN-adjusted 2015 population density data.
var density = ee.Image('CIESIN/GPWv4/population-density/2015');
var adjustedDensity =
    ee.Image('CIESIN/GPWv4/unwpp-adjusted-population-density/2015');
// Compute the log ratio between the two.
var ratio = adjustedDensity.divide(density);
var logRatio = ratio.where(ratio.neq(0), ratio.log());
// Make a pretty visualization. (The min and max below are assumed values.)
Map.addLayer(logRatio, {
  min: -1, max: 1,
  palette: ['Crimson', 'Silver', 'MediumBlue'],
});
Most countries are gray, which means that the UN estimates agree closely with the national estimates. Several countries are bright blue, indicating that the national estimates are much higher than the UN estimates: by 60% in Turkmenistan, by almost 2X in Qatar, and by over 3X in Equatorial Guinea! On the other hand, the UN estimates are higher in other countries, shown in red, e.g. by 42% in Oman. The GPWv4 documentation includes more information about the differences between the raw and UN-adjusted population data.
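A log ratio is a convenient measure here because it is symmetric: one estimate being twice the other gives the same magnitude whichever way round it is. Here is a small plain-JavaScript sketch of that arithmetic, including the guard that leaves zero ratios at zero. The function names are hypothetical and the inputs are made up.

```javascript
// Log ratio of adjusted to raw density, with zero ratios left as zero
// (mirroring the ratio.where(ratio.neq(0), ratio.log()) step).
function logRatio(adjusted, raw) {
  const ratio = raw === 0 ? 0 : adjusted / raw; // treat division by zero as 0
  return ratio !== 0 ? Math.log(ratio) : ratio;
}

// Convert a log ratio back to a percent difference of adjusted relative to raw.
function percentDifference(logRatioValue) {
  return (Math.exp(logRatioValue) - 1) * 100;
}

console.log(logRatio(200, 100)); // log(2), about 0.693
console.log(logRatio(100, 200)); // log(0.5), about -0.693
console.log(percentDifference(logRatio(142, 100))); // about 42
```

Without the zero guard, empty pixels would produce a logarithm of negative infinity and swamp the visualization.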
GPWv4 also includes some additional data grids that you can use to assess the population data in different locations. One particularly important layer is the “mean administrative unit area”. The population grids were made by taking the population numbers from censuses and spreading them uniformly over the land area within each census unit. If the census unit is large, the population estimates can be less accurate at high resolution. The mean administrative unit area is therefore one measure of the spatial uncertainty of the population data. Let’s take a look:
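var grids = ee.Image('CIESIN/GPWv4/ancillary-data-grids');
var logArea = grids.select('mean-administrative-unit-area').log10();
// Display the log of the unit area. (The min and max below are assumed values.)
Map.addLayer(logArea, {
  min: 0, max: 5,
  palette: ['Crimson', 'Silver', 'MediumBlue'],
});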
We can infer from this map that the accuracy of the population distribution at a high resolution is likely quite high in the US, much of Europe, and a few other places like New Zealand and eastern Brazil. On the other hand, it may be relatively low in much of Africa, northern Canada, Russia, and central Australia, where fine-grained census data is not available. Fortunately, many of those areas have low population density in the first place.
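The uniform spreading described above can be sketched in a few lines of plain JavaScript. This is only an illustration under the assumption that each cell receives population in proportion to its land area. The function name and the numbers are hypothetical, and the real GPWv4 processing is certainly more involved.

```javascript
// Hypothetical areal weighting: spread one census unit's population over its
// cells in proportion to each cell's land area. All values are made up.
function spreadUniformly(unitPopulation, cellLandAreasKm2) {
  const totalLand = cellLandAreasKm2.reduce((sum, a) => sum + a, 0);
  if (totalLand === 0) return cellLandAreasKm2.map(() => 0);
  const density = unitPopulation / totalLand; // people per km^2, same everywhere
  return cellLandAreasKm2.map((a) => a * density);
}

// A census unit covering four cells: two all land, one half water, one all water.
const smallUnit = spreadUniformly(2000, [0.8, 0.8, 0.4, 0.0]);
console.log(smallUnit); // the all-water cell gets 0, the half-land cell gets half
```

Because every cell in a unit gets the same density, a unit spanning thousands of cells says nothing about where within it people actually live, which is why large administrative units imply greater spatial uncertainty.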
To improve the spatial distribution of population data in places like Africa, several research teams have begun developing techniques to automatically identify buildings and settlements in high-resolution satellite imagery. I’ll take a look at their progress in a future article.