Open Urban Mapping — Russia

GeoAlert
Oct 22 · 7 min read

Hooray! We have recently completed automatic building detection for the entire territory of Russia using Mapbox Satellite imagery.

What? Haven’t you done that already?

Nope. Last year we started a project called Urban Mapping, powered by our Geoalert platform, launched a demo project, and published the “zero” version of the “building footprints” database for the Russian Federation. Since then, we’ve been working on a commercial product, validating the data and enriching it with semantics (building heights, addresses, etc.), and our artificial intelligence for automatic mapping is already stretching its tentacles to other countries. So we’d be glad to give you a sneak peek into the capabilities of our platform.

But this post is about another big idea. We’ve long been looking for a way to contribute to Open Data, and when it comes to spatial data, OpenStreetMap (OSM) is the place to go.

Open Data for Russia — input imagery

The variety of satellite imagery mosaics that can be used for digitizing in OSM has become simply gorgeous since big players such as Maxar and Microsoft started to contribute to OSM. For the territory of Russia, OSM has more than 20 million building features and almost 10 million km of roads (as of September 2020).

But only Mapbox has explicitly extended its license with permission to digitize its imagery and produce derivative data via “third-party software” for non-commercial purposes, and in particular for OSM. Therefore, to avoid conflicts with our own commercial product and to prevent any issues with the OSM license, we had to process Russia from scratch, this time using Mapbox imagery. On the one hand, many thanks to Mapbox for actively supporting neural network developers. On the other, their satellite mosaic for Russia is a patchwork of uneven quality 😢 😢 😢. For some regions of Russia, it has only low-resolution Landsat imagery (up to about zoom 14), and even where higher-resolution imagery shows up once you zoom in, it is often winter and/or grayscale.
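
To give a feel for what “zoom 14” means on the ground, here is a rough sketch of the standard Web Mercator resolution formula. It assumes 256-pixel tiles (Mapbox can also serve larger tiles, which changes the figure), and the latitude used in the loop is only an illustrative value close to Moscow.

```python
import math

# Metres per pixel at the equator for zoom 0, assuming 256-px tiles:
# Earth's equatorial circumference (2 * pi * 6378137 m) divided by 256.
EQUATOR_M_PER_PX_Z0 = 2 * math.pi * 6378137 / 256


def ground_resolution(lat_deg: float, zoom: int) -> float:
    """Approximate metres per pixel of a Web Mercator tile at a given latitude."""
    return EQUATOR_M_PER_PX_Z0 * math.cos(math.radians(lat_deg)) / (2 ** zoom)


# Illustrative latitude only (roughly Moscow); pick your own area of interest.
for zoom in (14, 16, 18):
    print(zoom, round(ground_resolution(55.75, zoom), 2), "m/px")
```

Each extra zoom level halves the pixel size, so a mosaic that stops at about zoom 14 is several times coarser than what is comfortable for tracing individual buildings.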

Image for post
An example of automatic building detection using Mapbox Satellite

Fortunately, neural networks do not experience any emotional pain when working with low-quality imagery. Ours can even cope with clouds and snow, and can sometimes recognize buildings in dark panchromatic images just as it does in RGB.

Nowadays we’re so used to viewing high-quality imagery on Google Maps that we take it for granted and forget that raw imagery is often panchromatic and/or cloudy, to name just a couple of details that make a cartographer’s work harder. Winter imagery of areas like Russia is yet another challenge: snow hides the original colors of the terrain and infrastructure, so it becomes harder for both the human eye and a neural network to recognize buildings.

Image for post
Mapbox imagery and the detected buildings for the Republic of Chechnya, Russia.

Segmentation tackles these challenges tolerably well, but building classification (residential/commercial/etc.), which is an important feature of our commercial product, will likely perform poorly.

The numbers and the proposed Release plan

To test the data and stay updated, check out our GitHub repository.

We’ll be publishing the data region by region, starting with the regions where our building count exceeds the current OSM count by the largest ratio (Geoalert (Free) / OSM).
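
The ordering described above can be sketched in a few lines. The function name and the counts below are made up for illustration and are not taken from the published statistics. Treating a region with no OSM buildings as an infinite ratio is our assumption here.

```python
def rank_by_count_ratio(counts):
    """Rank regions by Geoalert/OSM building count ratio, highest first.

    counts maps a region name to (geoalert_count, osm_count).
    """
    ranked = []
    for region, (geoalert, osm) in counts.items():
        # Assumption: a region with no OSM buildings goes to the very top.
        ratio = float("inf") if osm == 0 else geoalert / osm
        ranked.append((region, ratio))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up counts for illustration only; not Geoalert or OSM statistics.
example = {"region_a": (1000, 100), "region_b": (500, 400), "region_c": (300, 0)}
for region, ratio in rank_by_count_ratio(example):
    print(region, ratio)
```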

Image for post

Aggregated statistics for the regions are already available via the link in the repository.

Image for post
Top regions of the Russian Federation by the count ratio of buildings (Geoalert / OSM)

Republic of Chechnya

First place in the rating goes to the Republic of Chechnya, a rather remote and rural region in which only the capital city of Grozny is mapped in OSM, while most of the other municipalities contain only administrative boundaries and main roads. Coverage of Chechnya is no better in commercial maps such as Google, 2GIS, or Yandex Maps, which normally have the most detailed data for Russia.

Private housing has been actively built and rebuilt over the past years, so the Mapbox-based result differs from our output based on premium satellite imagery by more than a factor of two (220K vs. 490K buildings).

To see how the building footprints are distributed among the municipalities of Chechnya, we queried OSM for administrative boundaries and found 314 of the 360 officially declared ones. This indicates that most settlements (55%) can be uploaded to OSM as is, without the risk of data conflicts. Here are a couple of graphs for clarity:
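
For readers who want to try a similar boundary lookup, here is a hedged sketch that builds an Overpass QL query and sends it to the public Overpass endpoint. The post does not say which query we ran. The ISO code RU-CE, the admin_level value of 8, and both function names are assumptions to adjust for your own case.

```python
import json
import urllib.parse
import urllib.request

OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # public Overpass endpoint


def build_boundary_query(iso_code: str, admin_level: int) -> str:
    """Overpass QL for administrative boundary relations inside one region."""
    return (
        "[out:json][timeout:180];\n"
        f'area["ISO3166-2"="{iso_code}"]->.region;\n'
        f'relation["boundary"="administrative"]["admin_level"="{admin_level}"](area.region);\n'
        "out tags;"
    )


def fetch_boundary_names(iso_code: str, admin_level: int) -> list:
    """Run the query and return the 'name' tag of every boundary found."""
    payload = urllib.parse.urlencode(
        {"data": build_boundary_query(iso_code, admin_level)}
    ).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=payload, timeout=200) as response:
        elements = json.load(response)["elements"]
    return [element.get("tags", {}).get("name", "") for element in elements]


# Assumed values: RU-CE for Chechnya, admin_level 8 for municipalities.
print(build_boundary_query("RU-CE", 8))
```

Calling fetch_boundary_names("RU-CE", 8) would hit the live server, so the sketch only prints the query. The number of boundaries returned depends on the admin_level you choose and on the current state of OSM.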

Image for post

Moscow region

As you can see from the rating above, even the well-covered Moscow region makes it to the top. However, the difference between the results obtained from commercial imagery and those from Mapbox Satellite is relatively small.

Since the Mapbox Satellite mosaic is of better quality for the Moscow region than for Chechnya or Tyva, the generated dataset has fewer missing objects (measured by Recall) as well as fewer false positives (measured by Precision). The predicted building classes are also included (see the class_id attribute).
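
As a reminder of how the two metrics relate to missed and false detections, here is a minimal sketch using the textbook definitions. The counts are invented for the example and are not our validation results. How a detection is matched to a reference building (for instance by overlap) is left out.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Textbook precision and recall from object-level match counts."""
    detected = true_positives + false_positives   # everything the model output
    actual = true_positives + false_negatives     # everything that really exists
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Invented counts: 80 correct detections, 20 false positives, 10 missed buildings.
precision, recall = precision_recall(80, 20, 10)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Fewer false positives push precision up, and fewer missed buildings push recall up.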

This dataset contains more than 2.6M features!

Computing statistics within the settlement boundaries (data from OSM) shows that approximately 9% of them (900+ settlements) do not contain any building features. These are mostly “dachas” (small settlements) with an area smaller than 1 sq. km, but some of them are larger than 3 sq. km.
67% of settlements contain fewer buildings in OSM than in Geoalert Open Urban Mapping. The count ratio for the total area is 2.8.
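
The three figures above can be reproduced from per-settlement counts along these lines. This is a sketch under assumptions: it reads the 9% as settlements with no buildings in OSM, the function name is hypothetical, and the counts are invented.

```python
def summarize_settlements(settlements):
    """Summarize a list of (osm_count, geoalert_count) pairs, one per settlement."""
    if not settlements:
        raise ValueError("no settlements given")
    total = len(settlements)
    empty_in_osm = sum(1 for osm, _ in settlements if osm == 0)
    fewer_in_osm = sum(1 for osm, geoalert in settlements if osm < geoalert)
    osm_total = sum(osm for osm, _ in settlements)
    geoalert_total = sum(geoalert for _, geoalert in settlements)
    return {
        "empty_in_osm_share": empty_in_osm / total,
        "fewer_in_osm_share": fewer_in_osm / total,
        "count_ratio": geoalert_total / osm_total if osm_total else float("inf"),
    }


# Invented counts for four settlements; not the Moscow region data.
print(summarize_settlements([(0, 40), (10, 30), (50, 45), (20, 25)]))
```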

Image for post

Data Downloading, Validation and Import — what to do next

All statistics to play with can be found here. You are welcome to copy and reuse them as you wish.

All datasets can be downloaded via the link posted on the project’s GitHub: https://github.com/Geoalert/urban-mapping

An obvious question that arises when preparing data for import into OSM is how to avoid conflicts with overlapping data. The Geoalert platform automatically merges the predicted building footprints with the current OSM data, which it fetches through the Overpass API. At this stage the algorithm compares the predicted footprints with those present in OSM for the given area, and if a pair overlaps sufficiently (as measured by IoU, intersection over union), it replaces the model output with the OSM footprint and merges the attributes. Such features have their “is_osm” attribute set to True and should be excluded from the import.
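
The idea of the merge step can be illustrated with a simplified sketch. It is not the platform’s actual code: footprints are reduced to axis-aligned boxes instead of polygons, the 0.5 threshold is an assumed value, and the function names and the feature layout are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (min_x, min_y, max_x, max_y)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def merge_with_osm(predicted, osm, iou_threshold=0.5):
    """Replace predictions that sufficiently overlap an OSM footprint."""
    merged = []
    for pred in predicted:
        # Best-matching OSM footprint for this prediction, if any exist.
        best = max(osm, key=lambda feat: box_iou(pred["box"], feat["box"]), default=None)
        if best is not None and box_iou(pred["box"], best["box"]) >= iou_threshold:
            # Keep the OSM geometry, merge attributes, and flag the feature.
            attrs = {**pred["attrs"], **best["attrs"], "is_osm": True}
            merged.append({"box": best["box"], "attrs": attrs})
        else:
            merged.append({"box": pred["box"], "attrs": {**pred["attrs"], "is_osm": False}})
    return merged


# Invented footprints: the first prediction overlaps an OSM building, the second does not.
predicted = [
    {"box": (0, 0, 10, 10), "attrs": {"class_id": 1}},
    {"box": (50, 50, 60, 60), "attrs": {"class_id": 2}},
]
osm = [{"box": (1, 1, 11, 11), "attrs": {"building": "yes"}}]
for feature in merge_with_osm(predicted, osm):
    print(feature)
```

A real implementation would compute IoU on the polygons themselves and use a spatial index rather than comparing every prediction with every OSM footprint.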

Image for post
Footprints merged with those in OSM are shown in green (Kaah-Khem, Tyva)

The other question users asked us is how to reduce the data size so that it can be loaded into JOSM (a stand-alone OSM editor) without slowing down the application.

In the future, we plan to provide our data in small batches, but for now the suggested way is to use GDAL or QGIS to clip it to the smaller areas you’re going to validate and import.
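
If you prefer a script, the same reduction can be approximated in plain Python on a GeoJSON export. This is only a stand-in for a proper GDAL or QGIS clip: it keeps whole features that lie entirely inside a bounding box rather than cutting geometries, and it also drops features flagged with is_osm. The function names are hypothetical, and the sketch assumes is_osm is stored as a boolean.

```python
def iter_positions(coords):
    """Yield (x, y) positions from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def select_for_import(collection, bbox):
    """Keep non-OSM features lying fully inside bbox = (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = bbox
    kept = []
    for feature in collection["features"]:
        # Assumes a boolean is_osm; a string such as "False" would need parsing first.
        if feature["properties"].get("is_osm"):
            continue  # already present in OSM, so not part of the import
        positions = list(iter_positions(feature["geometry"]["coordinates"]))
        if positions and all(
            min_x <= x <= max_x and min_y <= y <= max_y for x, y in positions
        ):
            kept.append(feature)
    return {"type": "FeatureCollection", "features": kept}


def square(x, y, is_osm):
    """Build a tiny square footprint feature for the example below."""
    ring = [[x, y], [x + 1, y], [x + 1, y + 1], [x, y + 1], [x, y]]
    return {
        "type": "Feature",
        "properties": {"is_osm": is_osm},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Invented coordinates: two footprints inside the box (one already in OSM), one outside.
sample = {
    "type": "FeatureCollection",
    "features": [square(0, 0, False), square(2, 2, True), square(100, 100, False)],
}
print(len(select_for_import(sample, (-5, -5, 5, 5))["features"]))
```

The resulting collection can be written out with json.dump and opened in JOSM as a much smaller file.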

The OpenStreetMap community has strict rules as to what data can be imported and how it must be imported. To abide by these rules, we have created a page for our project in the OSM Wiki (https://wiki.openstreetmap.org/wiki/Geoalert_Open_Urban_Mapping).

We hope that the OSM community will help us with validation of Open Urban Mapping data according to the established rules.

As a conclusion

Whether or not to use automatically generated data is always a trade-off between the desired quality of the cartographic work and the time devoted to it. By our estimates, for some cartographic tasks it can speed up the whole process tenfold or more.

As the number of projects around OSM that use or implement AI-assisted mapping grows, we expect more companies to contribute and to permit the use of more recent and/or better-quality imagery for humanitarian response and for filling gaps in the world map, which is still far from complete.

Image for post
OSM populated area coverage statistics. Source: https://disaster.ninja/.

More to come. Stay tuned!

References

Geoalert platform

About Geoalert platform for new applications of the Earth Observation data powered by Deep learning analysis

GeoAlert

Written by

GeoAlert

We apply Machine learning to automated analysis over Earth observation data
