What? Haven’t you done that already?
Nope. Last year we started a project called Urban Mapping, powered by our Geoalert platform: we launched a demo project and published the “zero” version of the building footprints database for the Russian Federation. Since then, we’ve been working on a commercial product, validating the data and enriching it with semantics (building heights, addresses, etc.), and our artificial intelligence for automatic mapping is already stretching its tentacles to other countries. So we’d be glad if you took a sneak peek at the capabilities of our platform.
But this post is about other big ideas. We’ve long been looking for a way to contribute to Open Data, and when it comes to spatial data, OpenStreetMap (OSM) is the place to go.
Open Data for Russia — input imagery
The variety of satellite imagery mosaics that can be used for digitizing in OSM has grown impressively since big players such as Maxar and Microsoft started contributing to OSM. For the territory of Russia, OSM has more than 20 million building features and almost 10 million km of roads (as of September 2020).
But only Mapbox has explicitly extended its license to permit digitizing its imagery, producing derivative data with “third-party software”, and using it for non-commercial purposes, in particular for OSM. Therefore, to avoid conflicts with our own commercial product and prevent any possible issues with the OSM license, we had to process Russia from scratch, this time using Mapbox imagery. On the one hand, many thanks to Mapbox for actively supporting neural network developers. On the other hand, their satellite mosaic for Russia is a patchwork of uneven quality 😢 😢 😢. For some regions of Russia it only has low-resolution Landsat imagery (around zoom level 14), and even where higher-resolution imagery appears as you zoom in, it is often winter and/or grayscale.
Why do other companies process data based on premium imagery?
Microsoft uses its own Bing Maps imagery. Facebook, which generates roads and started with “population density maps”, received special permission from satellite data providers to trace their images with automatic algorithms for OpenStreetMap. A company like ours, which develops data-driven services, has to figure out how to build licensing into its business model.
Fortunately, neural networks feel no emotional pain when working with low-quality imagery. A network can even work around clouds and snow, and sometimes recognizes buildings in dark panchromatic images just as well as in RGB.
Nowadays we are so used to viewing high-quality imagery on Google Maps that we take it for granted and forget that raw imagery often comes panchromatic and/or cloudy, to name just a few of the details that make a cartographer’s work harder. Winter images of areas like Russia are yet another challenge: snow hides the original colors of the terrain and infrastructure, so it becomes harder both for the human eye and for a neural network to recognize buildings.
Segmentation tackles these challenges tolerably well, but building classification (residential/commercial/etc.), an important feature of our commercial product, will likely perform poorly.
The numbers and the proposed Release plan
To test the data and stay updated, check out our GitHub repository.
We’ll be publishing the data region by region, starting with the regions where our building count exceeds the current OSM count by the largest margin (the Geoalert (Free) / OSM count ratio).
Aggregated statistics for the regions are already available via a link in the repository.
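The release order boils down to sorting regions by that count ratio. Below is a minimal sketch, assuming per-region building counts are already available; the `RegionStats` structure and its field names are hypothetical, and the sample numbers are placeholders rather than real statistics.

```python
from dataclasses import dataclass


@dataclass
class RegionStats:
    """Building counts for one region (hypothetical structure)."""
    name: str
    geoalert_count: int  # footprints in the free (Mapbox-based) dataset
    osm_count: int       # building features currently in OSM


def coverage_ratio(stats: RegionStats) -> float:
    """Geoalert (Free) / OSM count ratio; infinite when OSM has no buildings."""
    if stats.osm_count == 0:
        return float("inf")
    return stats.geoalert_count / stats.osm_count


def release_order(regions: list[RegionStats]) -> list[RegionStats]:
    """Sort regions so the largest gain over current OSM coverage comes first."""
    return sorted(regions, key=coverage_ratio, reverse=True)


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real statistics.
    sample = [
        RegionStats("Region A", geoalert_count=120_000, osm_count=60_000),
        RegionStats("Region B", geoalert_count=50_000, osm_count=5_000),
        RegionStats("Region C", geoalert_count=300_000, osm_count=250_000),
    ]
    for region in release_order(sample):
        print(f"{region.name}: {coverage_ratio(region):.1f}")
```

Regions with no buildings in OSM at all get an infinite ratio and therefore land at the top, which matches the idea of starting where the data adds the most.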
Republic of Chechnya
The top of the ranking is the Republic of Chechnya, a rather remote and rural region for which only the capital, Grozny, is mapped in OSM, while most other municipalities contain only administrative boundaries and main roads. Coverage of Chechnya is no better in commercial maps such as Google, 2GIS, or Yandex Maps, which normally have the most detailed data for Russia.
Private housing has been actively built and rebuilt over the past years, so the result obtained from Mapbox imagery differs by more than a factor of two from our output based on premium satellite imagery (220K vs 490K).
To see how the building footprints are distributed among the municipalities of Chechnya, we queried OSM for administrative boundaries and found 314 of the 360 officially declared boundaries. This indicates that most settlements (55%) can be uploaded to OSM as is, without the risk of data conflicts. Here are a couple of graphs for clarity:
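For readers who want to repeat the boundary lookup, here is a minimal sketch of one way to query OSM through the public Overpass API. It is not necessarily the query we used: the region is selected by its ISO 3166-2 code (`RU-CE` for Chechnya), and which `admin_level` corresponds to settlements in Russia is an assumption you should check against the OSM wiki.

```python
import json
import urllib.parse
import urllib.request

# Public Overpass API endpoint; other Overpass instances accept the same requests.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def admin_boundaries_query(iso_code: str, admin_level: int) -> str:
    """Overpass QL selecting administrative boundary relations inside a region.

    The admin_level that matches settlements is an assumption to verify.
    """
    return (
        "[out:json][timeout:120];"
        f'area["ISO3166-2"="{iso_code}"]->.region;'
        f'relation(area.region)["boundary"="administrative"]["admin_level"="{admin_level}"];'
        "out tags;"
    )


def fetch_boundary_names(iso_code: str, admin_level: int) -> list[str]:
    """Run the query (needs network access) and return the boundary names."""
    body = urllib.parse.urlencode({"data": admin_boundaries_query(iso_code, admin_level)})
    with urllib.request.urlopen(OVERPASS_URL, data=body.encode()) as response:
        elements = json.load(response)["elements"]
    return [element.get("tags", {}).get("name", "") for element in elements]


# Example (performs a network request):
# names = fetch_boundary_names("RU-CE", 8)
# print(len(names))
```

Because the area filter matches relations with any member inside the region, the result may also include a few boundaries of neighboring regions that share border ways, so filter or de-duplicate before comparing against the official count.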
As you can see from the ranking above, even the well-covered Moscow region makes it into the top. However, the difference between the results obtained from commercial imagery and those from Mapbox Satellite is relatively small.
Since the Mapbox Satellite mosaic is of better quality for the Moscow region than for Chechnya or Tyva, the generated dataset has fewer missed objects (measured by recall) and fewer false positives (measured by precision). Predicted building classes are also included (see the class_id attribute).
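As a reminder of what the two metrics mean, here is a short sketch computing them from object counts. It assumes predicted footprints have already been matched to reference buildings (for example with an IoU threshold); the matching step and the sample counts are illustrative, not our evaluation pipeline.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from matched-object counts.

    tp: predicted buildings that match a reference building
    fp: predicted buildings with no matching reference building (false positives)
    fn: reference buildings the model missed (false negatives)
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up counts for illustration only.
    p, r = precision_recall(tp=80, fp=20, fn=40)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.67
```

Fewer missed buildings raises recall; fewer spurious detections raises precision, which is why better imagery improves both.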
This dataset contains more than 2.6M features!
Computing statistics within settlement boundaries (taken from OSM) shows that approximately 9% of settlements (900+) do not contain any building features. Most of these are “dachas” (small summer-cottage settlements) with an area under 1 sq. km, but some of them cover 3 sq. km or more.
67% of settlements contain fewer buildings in OSM than in Geoalert Open Urban Mapping. The count ratio (Geoalert / OSM) for the total area is 2.8.
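Settlement-level figures like these can be derived from per-settlement building counts. The sketch below assumes such counts have already been computed (for example by a spatial join of footprints with settlement polygons, not shown); the `SettlementCounts` structure is hypothetical.

```python
from typing import NamedTuple


class SettlementCounts(NamedTuple):
    """Building counts inside one settlement boundary (hypothetical structure)."""
    name: str
    osm: int       # buildings in OSM within the boundary
    geoalert: int  # buildings in Geoalert Open Urban Mapping within the boundary


def summarize(settlements: list[SettlementCounts]) -> dict[str, float]:
    """Share of settlements with fewer buildings in OSM, and the overall count ratio."""
    n = len(settlements)
    fewer_in_osm = sum(1 for s in settlements if s.osm < s.geoalert)
    total_osm = sum(s.osm for s in settlements)
    total_geoalert = sum(s.geoalert for s in settlements)
    return {
        "share_fewer_in_osm": fewer_in_osm / n if n else 0.0,
        # Ratio of totals over the whole area, not the mean of per-settlement ratios.
        "total_count_ratio": total_geoalert / total_osm if total_osm else float("inf"),
    }
```

Note the design choice in the last line: a ratio of totals weights large settlements more heavily, whereas averaging per-settlement ratios would be dominated by small settlements with few mapped buildings.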
Data Downloading, Validation and Import — what to do next
All statistics to play with can be found here. You are welcome to copy and reuse them as you wish.
The full dataset can be downloaded via the link posted on the project’s GitHub: https://github.com/Geoalert/urban-mapping
An obvious question when preparing data for import into OSM is how to avoid conflicts with overlapping data. The Geoalert platform automatically merges the predicted building footprints with the current OSM data, fetched through the Overpass API. At this stage, the algorithm compares the predicted footprints with those already present in OSM for the given area, and if two footprints overlap sufficiently (measured by intersection over union, IoU), it replaces the model output with the OSM geometry and merges the attributes. Such features have the attribute “is_osm” set to True and should be excluded from the import.
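The merge step can be illustrated with a simplified sketch. To stay dependency-free it computes IoU on axis-aligned bounding boxes instead of the real polygons (in practice you would use true polygon intersection and union, e.g. via a geometry library), and the 0.5 threshold and the rule that OSM attributes win on conflicts are assumptions, not the platform’s actual settings.

```python
from dataclasses import dataclass, field

BBox = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class Footprint:
    bbox: BBox  # stand-in for the footprint polygon
    attributes: dict = field(default_factory=dict)
    is_osm: bool = False


def bbox_iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def merge_with_osm(predicted: list[Footprint], osm: list[Footprint],
                   threshold: float = 0.5) -> list[Footprint]:
    """Replace predictions that overlap an OSM building with the OSM version."""
    merged = []
    for pred in predicted:
        best = max(osm, key=lambda o: bbox_iou(pred.bbox, o.bbox), default=None)
        if best is not None and bbox_iou(pred.bbox, best.bbox) >= threshold:
            # Keep the OSM geometry; model attributes fill gaps, OSM tags win conflicts.
            attrs = {**pred.attributes, **best.attributes}
            merged.append(Footprint(best.bbox, attrs, is_osm=True))
        else:
            merged.append(pred)
    return merged


def import_candidates(features: list[Footprint]) -> list[Footprint]:
    """Features already in OSM (is_osm=True) must not be imported again."""
    return [f for f in features if not f.is_osm]
```

The brute-force search over all OSM buildings is fine for a small area; for a whole region a spatial index would be needed to keep the matching fast.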
Another question users asked us was how to reduce the data size so it can be loaded into JOSM (the stand-alone OSM editor) without slowing the application down.
In the future, we plan to provide our data in small batches, but for now we suggest using GDAL or QGIS to clip it to the smaller areas you’re going to validate and import.
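One way to do this with GDAL is to split the region into a grid of tiles and extract each tile with `ogr2ogr`. The sketch below only builds and runs the commands; it assumes GDAL’s `ogr2ogr` is installed and that the downloaded file is in a format GDAL can read, such as GeoJSON. The file names, extent, and tile size are hypothetical.

```python
import math
import subprocess
from pathlib import Path

Extent = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def tile_bboxes(extent: Extent, step: float) -> list[Extent]:
    """Split an extent into step x step tiles (edge tiles may be smaller)."""
    min_x, min_y, max_x, max_y = extent
    cols = math.ceil((max_x - min_x) / step)
    rows = math.ceil((max_y - min_y) / step)
    tiles = []
    for row in range(rows):
        for col in range(cols):
            x0 = min_x + col * step
            y0 = min_y + row * step
            if x0 >= max_x or y0 >= max_y:
                continue  # guard against float rounding producing an empty tile
            tiles.append((x0, y0, min(x0 + step, max_x), min(y0 + step, max_y)))
    return tiles


def ogr2ogr_command(src: str, dst: str, bbox: Extent) -> list[str]:
    """ogr2ogr call keeping only features that intersect bbox, written as GeoJSON."""
    min_x, min_y, max_x, max_y = bbox
    return ["ogr2ogr", "-f", "GeoJSON",
            "-spat", str(min_x), str(min_y), str(max_x), str(max_y),
            dst, src]  # ogr2ogr takes the destination before the source


def split_dataset(src: str, out_dir: str, extent: Extent, step: float) -> None:
    """Write one GeoJSON file per tile into out_dir (requires GDAL)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, bbox in enumerate(tile_bboxes(extent, step)):
        subprocess.run(ogr2ogr_command(src, str(out / f"tile_{i:04d}.geojson"), bbox), check=True)


# Hypothetical usage:
# split_dataset("region_buildings.geojson", "tiles", (37.0, 55.0, 38.0, 56.0), 0.1)
```

Using `-spat` rather than `-clipsrc` keeps whole buildings instead of cutting them at tile edges; the trade-off is that a building straddling two tiles ends up in both files, so watch for duplicates when importing neighboring tiles.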
The OpenStreetMap community has strict rules as to what data can be imported and how it must be imported. To abide by these rules, we have created a page for our project in the OSM Wiki (https://wiki.openstreetmap.org/wiki/Geoalert_Open_Urban_Mapping).
We hope that the OSM community will help us with validation of Open Urban Mapping data according to the established rules.
In conclusion
To use or not to use automatically generated data is always a trade-off between the desired quality of the cartographic work and the time devoted to it. By our estimates, for some cartographic tasks it can speed up the whole process tenfold or more.
As the number of OSM-related projects using or implementing AI-assisted mapping grows, we expect more companies to contribute and to permit the use of more recent and/or better-quality imagery for humanitarian response and for filling gaps in the world map, which is still far from complete.
More to come. Stay tuned!