Urban Mapping: 54+M buildings in Russia

Geoalert platform
4 min read · Dec 10, 2019

We’ve completed version 0.0.1 of our most ambitious big map data project, Urban Mapping: 54,364,789 buildings all over Russia, available through our platform’s API. In the demo below we converted the polygons into points (centroids) and compared them with OpenStreetMap “state of the map” data, using vector tiles to visualize both layers.

https://geoalert.github.io/urban-mapping/ You can find the project description on our GitHub.
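The polygon-to-centroid conversion mentioned above can be sketched in a few lines. This is only an illustration, not the code behind the demo: the function name `polygon_centroid` is hypothetical, and it treats coordinates as planar, which is a reasonable assumption for footprints as small as a building.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...].

    Assumption: planar coordinates and a single outer ring without holes.
    """
    ring = list(ring)
    # Close the ring if the last vertex does not repeat the first one.
    if ring[0] != ring[-1]:
        ring.append(ring[0])

    area2 = 0.0  # twice the signed area (shoelace formula)
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * area2), cy / (3.0 * area2)


# A 2 x 2 square footprint collapses to its middle point.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(polygon_centroid(square))
```

Points are much lighter than polygons, which is one plausible reason to use them when two country-wide layers have to be drawn on the same map.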

Below is the story behind the project…

How to count people from space?

I’ve long wondered how scientists estimate the world population. If a report tells you there are more than 7 billion people, and that this number is projected to grow by a few billion more in the coming years, where does the raw data come from? Is it only the census data supplied by UN member countries, and how is it verified and updated for each country?

Counting people worldwide turns out to be hard: differences in accuracy at the country and region level introduce significant errors into the regression and weighted models used to estimate the population of a specific area. What impresses us most is how remote sensing imagery has become a key part of global projections like these. Satellite imagery lets us delineate urban areas at different levels of resolution and generate high-resolution population maps based on objective and continuously updated data.

However, the satellite data used in population modeling has its own limitations: sensor capabilities can also lead to inaccuracy. Many researchers point out the limitations of 30 m Landsat imagery in rural areas, where this resolution is insufficient to detect the sparse, scattered man-made objects that define populated areas. A partial solution comes in the form of nightlight imagery. For example, the Global Rural-Urban Mapping Project uses it to add urban and rural boundaries. But this brings another challenge, as in the pictures below: why is there so little light (and population?) in the northern part of the Korean Peninsula but a lot along the coast? And why are there some very bright spots the size of large cities in northern Siberia?
You’re more than welcome to try your luck.

The nightlights: left — Korean Peninsula; right — Russian Siberia, Tyumen region

Recent global projects

Last year the Facebook AI team presented their method for generating high-resolution population density maps at a global scale. They used a tailored CNN model to detect man-made structures in satellite imagery at 0.5 m resolution. So far they have released population density maps, allocated to 1 arc-second cells, for ~30 countries, mostly in Africa. (https://ai.facebook.com/blog/mapping-the-world-to-help-aid-workers-with-weakly-semi-supervised-learning)
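To get a feeling for how fine a 1 arc-second grid is, here is a rough back-of-the-envelope sketch. It assumes a spherical Earth with the mean radius below, and the function name is hypothetical; it is not taken from the Facebook pipeline.

```python
import math

# Assumption: spherical Earth with the mean radius in meters.
EARTH_RADIUS_M = 6_371_008.8


def arcsecond_cell_size_m(lat_deg, arcseconds=1.0):
    """Approximate (east_west_m, north_south_m) size of a grid cell."""
    angle = math.radians(arcseconds / 3600.0)
    north_south = EARTH_RADIUS_M * angle
    # Meridians converge toward the poles, so the cell gets narrower.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south


for lat in (0, 30, 60):
    ew, ns = arcsecond_cell_size_m(lat)
    print(f"lat {lat:>2}: {ew:.1f} m x {ns:.1f} m")
```

Under this approximation a cell is roughly 31 m from north to south everywhere, and about half as wide at 60° latitude as at the equator.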

You may have heard about the Microsoft buildings project (https://github.com/microsoft/USBuildingFootprints).
The most recent update of Facebook’s RapiD (https://github.com/facebookincubator/RapiD), a fork of the iD editor, can display these buildings as a supplement to what is not yet present in OpenStreetMap. A selected feature can then be added and edited by the user.

Auto mapping at the building scale

In the meantime, a small team in Moscow decided “we are in the game” and started auto-mapping buildings all over Russia.

The entire Russian territory became our primary focus for the following reasons:

  • The size of the territory is challenging: we have to figure out how to reduce our processing time and costs.
  • We hope to make an impact on Russian spatial data services such as surveying and insurance, which are still far from complete and ubiquitous.
    “…Estimates of the housing stock also vary significantly. According to the Federal Statistics Service there are 2,125,211 multi-apartment buildings in Russia, but 1,009,696 houses on the “Reform GKH” portal, and 1,586,048 houses in “GIS GKH...”*
  • Most of our training and validation data covers Russian cities. It’s better to start from the place you know best.

We applied a segmentation CNN to low-resolution satellite imagery to get a primary binary classification: whether an area is probably populated.
This gave us a ~30x reduction in the total number of tiles to be processed at the HDM zoom, which comes to about 130 mln tiles of 256x256 pixels. Scaling the Geoalert data processing workflow was an important step toward acceptable overall performance, so the HDM model runs on a GPU cluster that is part of Skoltech’s high-performance computing infrastructure (https://www.skoltech.ru/en/2019/01/zhores-supercomputer-presented-at-skoltech/).
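The coarse-to-fine idea described above can be sketched with standard XYZ (slippy map) tile arithmetic. Everything specific here is an assumption made for illustration: the zoom levels, the function names, and the `is_populated` callback, which stands in for the low-resolution classifier. It is not the actual Geoalert workflow.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Standard XYZ tile index containing a lon/lat point (Web Mercator)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def child_tiles(x, y, zoom, target_zoom):
    """Yield all (x, y, zoom) tiles at target_zoom covered by a coarser tile."""
    if target_zoom < zoom:
        raise ValueError("target_zoom must not be lower than zoom")
    factor = 2 ** (target_zoom - zoom)
    for cx in range(x * factor, (x + 1) * factor):
        for cy in range(y * factor, (y + 1) * factor):
            yield cx, cy, target_zoom


def tiles_to_process(coarse_tiles, is_populated, target_zoom):
    """Expand only the coarse tiles that the cheap classifier flags."""
    for x, y, zoom in coarse_tiles:
        if is_populated(x, y, zoom):  # hypothetical stand-in for the CNN
            yield from child_tiles(x, y, zoom, target_zoom)


# Toy run: 4 coarse tiles, only one flagged as populated.
coarse = [(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
selected = list(tiles_to_process(coarse, lambda x, y, z: (x, y) == (1, 0), 3))
print(len(selected), "of", 4 ** 3, "fine tiles need the expensive model")
```

The expensive high-resolution model only sees the children of flagged tiles, so the saving grows with the share of empty territory.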

Geoalert’s buildings on top of the Black Marble nightlight basemap

We keep working in the following directions:

  • Improving the CNN output using validation data. Our main approach is to merge auto-mapping results with spatially referenced external data such as OpenStreetMap and Reforma GKH. About 2 mln buildings have been auto-validated, and we look forward to getting more. Validation, building “heighting” (https://medium.com/geoalert-platform-urban-monitoring/buildings-height-estimation-7babe6420893), and semantic enrichment of the data are a huge topic that is worth a separate story.
  • Post-processing issues such as polygonization, which can be done much better and with fewer unexpected headaches.
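As an illustration of the first direction, one simple way to match detected footprints against external data is a point-in-polygon check. This is a simplified stand-in, not our actual validation rule: the function names are hypothetical, and it assumes the external source provides one reference point per known building.

```python
def point_in_polygon(point, ring):
    """Ray-casting test: is the point inside a simple polygon [(x, y), ...]?"""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def auto_validate(detected_rings, reference_points):
    """Return indices of detected footprints containing a reference point."""
    return [
        idx
        for idx, ring in enumerate(detected_rings)
        if any(point_in_polygon(p, ring) for p in reference_points)
    ]


detected = [
    [(0, 0), (2, 0), (2, 2), (0, 2)],         # confirmed by a reference point
    [(10, 10), (12, 10), (12, 12), (10, 12)],  # no external evidence
]
reference = [(1, 1), (5, 5)]
print(auto_validate(detected, reference))
```

A production version would need a spatial index instead of the nested loop, and probably an overlap measure between polygons rather than a single point.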

All of this is implemented in the Geoalert platform, which lets you set up multiple workers and organize them into workflows that connect data sources and scale up processing.

We appreciate your comments and proposals. To stay tuned on the Urban Mapping project, subscribe to our GitHub page (https://github.com/Geoalert/urban-mapping) or sign up for our newsletter.



Geoalert platform

We apply machine learning to the automated analysis of Earth observation data