The Populated Density Metric: Technical Notes

Aaron Le Compte
Jan 4, 2023


Technical details used in the development and validation of the Populated Density metric, including the impact of cell resolution, the selection of summary metrics, and the source datasets.

Impact of population dataset resolution

The Kontur population dataset provides population counts at H3 Level 8 resolution. Figure 1 shows the impact of using lower-resolution data such as H3 Levels 4–7. In particular, the ability to identify high-density populated areas is reduced as the larger cell sizes tend to capture both the high-density areas and surrounding low-density areas. Levels 7 and 8 show relative stability, suggesting some degree of convergence:

Fig 1: Change in categorical density distributions for H3 Level 4 resolution to H3 Level 8 resolution
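
The smoothing effect of coarser cells can be illustrated with a toy example. This is a minimal sketch using hypothetical numbers rather than the Kontur data, and it merges four adjacent equal-area cells at a time (a real H3 parent cell covers roughly seven children). The function names are illustrative only.

```python
# Toy illustration (hypothetical numbers, not Kontur data): merging fine cells
# into coarser ones averages a dense core with its sparse surroundings.

def coarsen(populations, factor):
    """Sum the population of each run of `factor` adjacent equal-area cells."""
    return [sum(populations[i:i + factor])
            for i in range(0, len(populations), factor)]

def peak_density(populations, cell_area_km2):
    """Highest density (persons/km^2) found in any single cell."""
    return max(populations) / cell_area_km2

fine_area_km2 = 1.0
fine = [50, 80, 12000, 90, 60, 40, 30, 20]   # one dense cell among sparse ones

coarse = coarsen(fine, 4)                     # each coarse cell covers 4 km^2
fine_peak = peak_density(fine, fine_area_km2)
coarse_peak = peak_density(coarse, 4 * fine_area_km2)

print(fine_peak)     # 12000.0
print(coarse_peak)   # 3055.0
```

The total population is unchanged by the merge, but the dense cell no longer registers as high density once it is averaged with its neighbours.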

Similarly, the join between H3 cells and the geometry defined by Natural Earth boundaries can be another cause of discrepancy. For example, a cell may straddle a country boundary, so its population may be allocated to the wrong country or to more than one country. The results in Table 1 show that the discrepancy reduces as H3 resolution increases; at Level 8 the overall discrepancy is 0.3% relative to the expected whole-world total of 7.67 billion people:

Table 1: Total cells and population for selected cell resolutions after joining to Natural Earth dataset
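
To give a sense of scale, the discrepancy can be read as a relative difference between the population summed after the join and the expected world total. In this sketch only the 7.67 billion total and the 0.3% figure come from the article; the joined total and the function name are hypothetical.

```python
def discrepancy_pct(joined_total, expected_total):
    """Absolute difference between joined and expected totals, as a percentage."""
    return abs(joined_total - expected_total) / expected_total * 100.0

EXPECTED_WORLD_TOTAL = 7.67e9   # from the article

# Hypothetical joined total, chosen only to show the scale of a 0.3% gap.
joined_total = 7.647e9
pct = discrepancy_pct(joined_total, EXPECTED_WORLD_TOTAL)
print(round(pct, 1))  # 0.3
```

A 0.3% gap on 7.67 billion corresponds to roughly 23 million people.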

Validation of country population totals

The Natural Earth dataset can also be used to validate country-level population totals obtained by aggregating the Kontur population data. Beyond analytical processing errors, differences in country totals between the datasets can arise from population counts taken at different dates, different methodologies for collecting population data, and disputed territorial boundaries.

Table 2 shows that 90% of the population was allocated to countries with less than 5% difference (“prc_total”):

Table 2: Level of agreement of population totals between Natural Earth and transformed density dataset
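
One way such an agreement figure could be derived is sketched below, assuming per-country totals are available from both datasets. The countries, values and function names are hypothetical; only the 5% threshold and the idea of a population share ("prc_total") come from the article.

```python
def pct_difference(transformed, reference):
    """Absolute percentage difference relative to the reference total."""
    return abs(transformed - reference) / reference * 100.0

def share_within(countries, threshold_pct):
    """Percentage of reference population in countries below the threshold."""
    total = sum(ref for _, ref, _ in countries)
    matched = sum(ref for _, ref, trans in countries
                  if pct_difference(trans, ref) < threshold_pct)
    return matched / total * 100.0

# (name, reference total, transformed total): hypothetical values in millions
countries = [
    ("A", 100.0, 101.0),   # 1% difference
    ("B", 50.0, 46.0),     # 8% difference
    ("C", 250.0, 245.0),   # 2% difference
]
prc_total = share_within(countries, 5.0)
print(prc_total)  # 87.5
```

Weighting by population rather than counting countries means a single large country with a poor match can move the figure more than many small ones.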

Table 3 shows that Europe, Asia and Oceania have the highest rates of match between Natural Earth and transformed dataset population totals. North America is next at 87%; Canada, at a 6.5% difference, accounts for most of the population falling into the higher error bands. Results for Africa are significantly lower than for other regions and may reflect the reliability of data from several countries in the region.

Table 3: Proportion of population totals with < 5% difference between Natural Earth totals and transformed density dataset by continent

Population density values

The Populated Density metric is a weighted average of population density across all n cells within a country:

Definition of the Populated Density metric
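
A minimal sketch of one reading of this definition follows. It assumes the weights are the cell populations, so that each cell's density counts in proportion to the number of people living in it; the text above states only that the metric is a weighted average, so this weighting is an assumption. The cell values and function names are hypothetical.

```python
def populated_density(cells):
    """Population-weighted mean density over (population, area_km2) cells.

    Assumption: weights are the cell populations; the article only states
    that the metric is a weighted average over all n cells.
    """
    total_pop = sum(pop for pop, _ in cells)
    weighted = sum(pop * (pop / area) for pop, area in cells)
    return weighted / total_pop

def overall_density(cells):
    """Conventional density: total population / total area."""
    return sum(p for p, _ in cells) / sum(a for _, a in cells)

# Hypothetical country: one dense cell and three sparse ones, 1 km^2 each
cells = [(9000, 1.0), (500, 1.0), (400, 1.0), (100, 1.0)]
print(overall_density(cells))    # 2500.0
print(populated_density(cells))  # 8142.0
```

Under this weighting, unpopulated cells contribute nothing, which is why the result sits well above the conventional (total population) / (total land area) figure.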

The numerical value of the Populated Density metric is compared to density values from the (total population) / (total land area) method for a number of sources in Table 4 (full results):

Table 4: Comparison of Populated Density metric and average population density for high-density countries

The numerical magnitude of the Populated Density metric is substantially higher than that of the other density metrics. For example, the Populated Density of Singapore is 26,026 persons/km² compared to 7,617–8,235 persons/km² for overall density sources. This suggests that the population of Singapore is concentrated in approximately ⅓ of the land area of the country. When accounting for public parks, commercial areas, retail, schools and port facilities, the proportion of area for residential occupation is a fraction of the total land area.
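
The "approximately ⅓" figure follows from dividing the overall density by the Populated Density. The sketch below uses the Singapore values quoted above; treating this ratio as an occupied-area fraction is a rough reading that assumes the population is spread fairly evenly over the occupied area.

```python
singapore_populated = 26026            # persons/km^2, from the article
overall_low, overall_high = 7617, 8235 # persons/km^2, from the article

ratio_low = overall_low / singapore_populated
ratio_high = overall_high / singapore_populated
print(round(ratio_low, 2), round(ratio_high, 2))  # 0.29 0.32
```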

Density distributions

The high-resolution density dataset allows creating a distribution of population density across a country. It is also often desirable to rank countries in some form to communicate locations that are densely populated versus sparsely populated.

A particular challenge arises in that the shape of the distributions differs substantially between different types of countries. Fig 2 demonstrates that the distribution shapes for Singapore and Sudan are markedly different from that of the USA, which itself is different to those of Australia, Canada and the UK:

Fig 2: Population density distributions for selected countries

Since the distributions cannot be readily parameterised, percentile-based metrics could potentially help. For example, the 90th percentile of population could be used to construct a ranking. This raises the question of which percentile to use: the 90th, the 10th, or the range between the 5th and 95th percentiles.
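
A population-weighted percentile could be computed along the following lines. This is a sketch under the assumption that each cell carries a density and a population; the cell values and function name are hypothetical, and the article does not state how such a percentile would be implemented.

```python
def weighted_percentile(cells, q):
    """Density at or below which q percent of the population lives.

    cells: iterable of (density, population) pairs.
    """
    ordered = sorted(cells)                       # ascending density
    total = sum(pop for _, pop in ordered)
    target = total * q / 100.0
    running = 0
    for density, pop in ordered:
        running += pop
        if running >= target:
            return density
    return ordered[-1][0]

# Hypothetical cells: (density in persons/km^2, population)
cells = [(100, 200), (1500, 300), (6000, 400), (20000, 100)]
print(weighted_percentile(cells, 50))  # 1500
print(weighted_percentile(cells, 90))  # 6000
print(weighted_percentile(cells, 95))  # 20000
```

The jump between the 90th and 95th percentiles in this toy data shows how sensitive a ranking can be to the percentile chosen.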

Another possibility is to use the mode of the distributions to summarise density for a country. This would answer the question: if we were to randomly select a resident of the country, what type of density situation would that resident most likely be living in?

The mode-based metric may work well for countries such as Singapore, where a substantial majority of the population live in high-density surroundings. However, as shown in Fig 3, countries such as China and Greece have a mixture of densities across their population base. Thus choosing the mode would give a biased impression that a majority live in a specific type of density situation when in reality only 20–30% of the population are in that category.
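
The weakness of the mode can be seen by computing the population share of each density category. In this sketch the band edges, most labels and the cell values are hypothetical (the article's category thresholds are not given here); only the idea of categorical shares comes from the text.

```python
def category_shares(cells, bands):
    """Share of population (percent) in each density band.

    bands: list of (label, lower bound in persons/km^2), ascending.
    """
    totals = {label: 0 for label, _ in bands}
    for density, pop in cells:
        # Pick the highest band whose lower bound the density reaches.
        label = [name for name, lower in bands if density >= lower][-1]
        totals[label] += pop
    grand_total = sum(totals.values())
    return {label: 100.0 * pop / grand_total for label, pop in totals.items()}

# Hypothetical band edges and labels, for illustration only.
bands = [("low", 0), ("medium", 1000), ("high", 5000), ("ultra-high", 15000)]
cells = [(200, 250), (3000, 300), (8000, 250), (20000, 200)]

shares = category_shares(cells, bands)
mode = max(shares, key=shares.get)
print(mode, shares[mode])  # medium 30.0
```

Here the modal category holds only 30% of the population, so reporting it alone would hide the other 70%.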

Fig 3: Populated Density of selected countries [Tableau Public]

A limitation of the weighted-average Populated Density metric is that it is simply not possible to summarise such varying distributions of density across countries in a single monotonic rank. Thus comparing the categorical distributions provides a much richer method of analysis than any single ranking system.

Source datasets

The source datasets comprise both the data used in the computation of the Populated Density metric and the population density rankings from Wikipedia, World Bank and CIA Factbook used for comparison.

Data pre-processing was performed in Python / Pandas, analytical computation in Snowflake using CartoDB Analytics Toolbox Core, and visualisation using GeoPandas.

Limitations

The Populated Density metric represents an alternative approach to characterising an element of life in various countries. There are some known limitations to this approach which can impact interpretation of the results and can be improved upon in future iterations of the metric:

  • The source data is not from official government sources. Thus there may be data quality issues with both the population counts and country boundaries used in this analysis.
  • The overall population density metric depends on the resolution of the dataset. The H3 Level 8 resolution from the Kontur Population dataset has been used in this article, with each cell representing approximately 0.737 square kilometres. Results will differ with lower-resolution data, particularly the ability to identify ultra-high population situations.
  • Categories such as “ultra-high density” could represent several different styles of high-density living. For example, high-rise towers with substantial space between buildings would show a similar density result to large numbers of medium-height buildings in close proximity with few public areas.
  • Country boundaries can intersect cells, and the country to which such a cell is allocated will depend on the join algorithms employed by the database.
  • Categories such as “suburban housing” do not capture elements such as proximity to a major city. Thus the category cannot currently differentiate between the outer suburb of a large city and a small regional town.
  • Cultural factors can influence the use of buildings and thus the interpretation of density. For example, multi-generational households with shared bedrooms may be common in some countries, whereas other cultures would use a similar-sized building for guest rooms and single-occupancy bedrooms.
