Challenges of large open-source datasets for building detection in Africa

Tips, tricks and warnings when creating a building detection machine learning model using open-source reference data.

Published in Sentinel Hub Blog, Apr 18, 2023

Written by Sara Verbič. Work performed by Sara Verbič, Devis Peressutti, Nejc Vesel, Matej Batič, Žiga Lukšič, Jan Geršak, Matic Lubej and Nika Oman Kadunc.

We wanted to create a large training dataset for automated building detection in Africa, so we reviewed open-source datasets for that purpose. Machine-generated datasets lack the accuracy to be used as reference training data and are highly dependent on the satellite imagery they were inferred from. Manual labeling on the desired target satellite imagery remains the most accurate option to create ML-ready training datasets, although it is very laborious and expensive.

World population and settlements continue to grow rapidly and much of this transition is particularly noticeable in developing countries. Identifying the locations and footprints of buildings provides data for various practical and scientific purposes, such as population mapping, urban management, and environmental sciences. This data is particularly valuable in developing regions, where alternative data sources, especially from local authorities, may not be available.

Different methods for estimating the location and extent of buildings are viable. Although very accurate, the manual processing of an aerial/satellite image is not scalable: it’s time-consuming and laborious. Machine learning (ML), computer vision and remote sensing have come a long way in automatically and reliably delineating buildings, in part due to the increasing availability of very high-resolution imagery (with a spatial resolution of less than 1 m). But open-source imagery with sufficient resolution, which is usually 50 cm, is typically only available for a small number of locations around the world. This presents a challenge, as it’s crucial to ensure the training dataset is geographically diverse and includes a variety of rural and urban locations with different building styles. However, not even ML methods coupled with very high-resolution imagery, such as imagery provided by Airbus Pleiades and Maxar WorldView, are sufficient to derive accurate estimates of building footprints. This is mostly due to the challenges posed by optical satellite image acquisitions, as shown in Figure 1. In addition, differences in spectral, temporal and spatial resolution between satellite image providers need to be taken into consideration when choosing the appropriate target imagery.

Fig 1. Examples of different acquisition conditions for the same location for Maxar WorldView imagery (WorldView © 2020 MAXAR Technologies). The examples show differences in sun azimuth and elevation angle, and satellite viewing angle, which affect how buildings are depicted and the shadows they cast. ML models might struggle to deal with such variations in appearances.

On the other hand, we need to bear in mind that reference labels used in ML may be subject to errors. In addition, identifying buildings remains a challenging task in many scenarios, considering:

geological and vegetation features that can be confused for built-up structures;
areas characterised by small buildings, which can appear only a few pixels wide at this resolution;
buildings constructed with natural materials that tend to blend in with the surrounding rural or desert areas;
clusters of buildings that are very close together and may not be easily identified.
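To make the point about small buildings concrete, the short sketch below converts a building's side length into the approximate number of pixels it spans at a given spatial resolution. The function name and the example sizes are our own illustrative assumptions, not values taken from the datasets.

```python
def width_in_pixels(building_width_m: float, resolution_m: float) -> float:
    """Approximate number of pixels spanned by one side of a building."""
    if resolution_m <= 0:
        raise ValueError("resolution_m must be positive")
    return building_width_m / resolution_m


# Hypothetical example: a 4 m wide hut seen at 0.5 m and at 10 m resolution.
for resolution in (0.5, 10.0):
    print(f"{resolution} m/pixel -> {width_in_pixels(4.0, resolution)} pixels")
```

At 0.5 m resolution such a building is only a handful of pixels wide, and at coarser resolutions it is smaller than a single pixel.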

Such scenarios are common in Africa, which accounts for approximately 20% of the Earth’s total land area and presents a wide range of terrain and building types. Africa’s scarcity of reference data makes the validity of the building footprints all the more important.

In the following, we will focus on reviewing a few data sources and labels openly available for Africa, which are Microsoft’s Building Footprints (MBF [1]), Google’s Open Buildings (GOB [2]) and the Replicable AI for Microplanning (RAMP [3]) datasets. We review such datasets with the intent of using them as reference labels for our Hierarchical Detector (HIECTOR) [4]. As we are interested in running HIECTOR on the entire continent of Africa, we will be looking at different regions and areas. If you are interested in the topic of building segmentation, we recommend the excellent overview provided in Azavea’s blog-post series [5]. We are also aware of and have looked at other open-source datasets, such as the SpaceNet challenge datasets [6], and the manual building footprints available on Radiant Earth MLHub [7]. However, these datasets have limited spatial coverage, which limits their use as training datasets for predicting buildings on the entire continent. An additional resource of open-source labels worth mentioning is the Datasets section of the curated Satellite Imagery Deep Learning repository [8].

Like many things, the analysed data sources have their pros and cons. Their main advantage is that they are accessible and can often be used under permissive licences, suitable for both academic and commercial applications. Built-up areas are generally detected, but the accuracy of the building footprint polygons differs significantly between datasets and locations. More on that later! Focusing first on coverage, the MBF dataset does not cover the whole of Africa, meaning that building footprints are missing for certain areas. Microsoft did not process imagery if tiles were dated before 2014 or had a low probability of detection [9].

A drawback of both the MBF and GOB datasets is that the acquisition dates of the underlying imagery are largely unknown. GOB predictions were created in August 2022 [2], but the most recent image for some locations was at that time several years old or not available at all, and the dataset carries no information about the acquisition year of the satellite imagery used. The MBF dataset carries an image acquisition date attribute for each building footprint where the vintage of the imagery could be deduced. However, our locations of interest did not have such information. All that is known is that the imagery used is from Bing Maps, including Maxar and Airbus imagery taken between 2014 and 2022 [9]. The lack of this information makes working with the data much more difficult, as it is not possible to interpret the data on the appropriate underlying imagery. The same must be taken into consideration when assessing the quality of the building footprint labels, because datasets do not necessarily reflect the state of the (latest) underlying satellite imagery. In our case, this applies to the MBF, GOB and HIECTOR datasets.

To see how building footprint polygons differ between datasets and types of location, we look at 4 locations of interest in different regions of Africa. They were chosen to represent different types of settlements: rural, urban, and high-rise. As an additional point of comparison, we also include labels detected using HIECTOR, which are for the moment only available for Dakar, Senegal.

Intermezzo: building footprint vs rooftop

A general definition of a building footprint is a polygon, or set of polygons, representing a specific building in the physical world, providing a ground-centred representation of a building’s location, shape, dimensions, and area [10]. Getting all this information from overhead satellite imagery might not be possible, so algorithms often provide an approximation of the footprint, depending on the image acquisition conditions and the shape of the building. For instance, for some of the high-rise buildings shown here, shadows and the orientation of the building occlude the actual building footprint. In other cases, such as terraced houses or blocks, the separation of building footprints does not correspond to physically visible features. For this reason, some automated algorithms are more successful in detecting and delineating building rooftops than the actual footprint.

Comparison

For a fair comparison, we visualise the datasets on the corresponding satellite imagery used to infer the building footprints. Comparison across different satellite imagery is challenging due to differences in image acquisition conditions and processing. However, despite the challenges, the estimated building footprints should provide a reliable estimate of the actual building position regardless of the image they were derived from.

Below you will find image examples taken from the larger areas of interest (AOI) that we investigated. Red bounding boxes surrounding buildings represent predictions of Microsoft’s Building Footprints dataset, with Bing Maps as the underlying satellite imagery. Green bounding boxes represent Google’s Open Buildings dataset, whose underlying imagery is Google Earth satellite imagery. RAMP predictions are marked with yellow bounding boxes, and the detection results were obtained using high-resolution Pleiades imagery. Unlike the MBF and GOB datasets, the RAMP project provides the model and excellent instructions to derive the footprint polygons for any AOI. Additionally, the location of interest in Dakar includes detections from HIECTOR, which are marked with blue bounding boxes and were also obtained using Pleiades imagery.

Fig 2. Bounding box colour and underlying satellite imagery source for each dataset.

Serrekunda — Gambia

What first catches the eye is that the building footprints obtained through the RAMP prediction model have a distinctively amorphous shape, lacking well-defined edges, and consequently do not accurately represent the ground truth. Another limitation of RAMP, visible in some building footprints, is the incomplete extraction of larger buildings and the inability to fully encompass the structure visible in the satellite imagery. This is of course a limitation of the model and not of the imagery, and it is likely due to a lack of generalization to new areas. One issue observed for GOB in all locations is that partial buildings are predicted, probably due to the stitching of satellite tiles from different acquisitions.

The MBF dataset tends to be less precise when detecting large-sized buildings, as these structures are frequently combined into a single block of polygons, resulting in a loss of detail. This factor should be considered particularly in densely populated areas where attached buildings are common. The RAMP prediction model is subject to a similar issue, but with smaller-sized buildings.

Fig 3. Example location from Serrekunda, Gambia. Images show the considered open-source dataset overlaid onto the imagery used for their inference for a fairer comparison. Top-left, MBF dataset shown in red on Bing Maps imagery. Top-right, GOB dataset shown in green on Google Earth imagery. Bottom-left, RAMP detection shown in yellow on Airbus Pleiades imagery. Bottom-right, MBF, GOB and RAMP datasets shown together on Airbus Pleiades imagery from 2021 (© CNES 2021, Distribution AIRBUS DS).

Cairo — Egypt

Areas featuring tall buildings generally exhibit a higher accuracy due to their orderly designs. Such a scenario is observable in districts of Cairo. However, despite the homogeneity of structures, certain building footprints in the GOB dataset are fragmented. This can be attributed to the presence of smaller constructions situated on the rooftops of high-rise buildings. Such structures have diverse reflection characteristics and varying roof heights, leading to their identification as individual buildings. In contrast, MBF groups high-rise buildings together in the same bounding box, despite their clear separation. The GOB dataset also struggles to represent high-rise buildings consistently: in this specific location, the delineation of its bounding boxes varies. Specifically, certain bounding boxes capture the outline of the building’s roof, while others delineate the outline of the structure on the ground.

Fig 4. Example location from Cairo, Egypt. Images show the considered open-source dataset overlaid onto the imagery used for their inference for a fairer comparison. Top-left, MBF dataset in red shown on Bing Maps imagery. Top-right, GOB dataset in green shown on Google Earth imagery. Bottom-left, RAMP detection shown in yellow on Airbus Pleiades imagery. Bottom-right, MBF, GOB and RAMP datasets shown together on Airbus Pleiades imagery from 2021 (© CNES 2021, Distribution AIRBUS DS).

Gatumba — Bujumbura

Quickly changing scenery is not a rare occurrence in Africa, so datasets and satellite imagery do not always reflect the current state of the area of interest. This is highlighted by the observed differences in detections and the temporal diversity of the underlying satellite imagery in the following comparison. As previously mentioned, it is unclear which specific dates the MBF and GOB datasets refer to, which can create difficulties in utilizing these two datasets. A notable issue is that the model fails to detect numerous objects, including both smaller objects in the north and larger objects in the south of the location of interest. This presents a challenge to the accuracy and reliability of the model.

Fig 5. Example location from Gatumba, Bujumbura. Images show the considered open-source dataset overlaid onto the imagery used for their inference for a fairer comparison. Top-left, MBF dataset in red shown on Bing Maps imagery. Top-right, GOB dataset in green shown on Google Earth imagery. Bottom-left, RAMP detection shown in yellow on Airbus Pleiades imagery. Bottom-right, MBF, GOB and RAMP datasets shown together on Airbus Pleiades imagery from 2021 (© CNES 2021, Distribution AIRBUS DS). In this example, large differences in land cover can be seen between images, making it difficult to assess the temporal veracity.

Modderspruit — South Africa

The limitations of outdated datasets and inaccuracies in building detection models are clearly evident in this particular location of interest. The MBF dataset only includes buildings that were present before the year 2017, while the RAMP prediction model shows significant inaccuracies in detecting buildings in this location, with a large number of buildings going undetected and several false detections of larger size.

Upon comparing the number of detections across the datasets, it is evident that the GOB dataset stands out with a higher number of smaller-sized detections, some of which may not actually be buildings, but rather rocks or vegetation. Detections in the dataset are already filtered and include only those with a confidence score of 0.6 or greater. Google recommends filtering the detections based on confidence scores to achieve a desired precision level depending on the application. The dataset quality varies per location, and Google provides a CSV file with suggested score thresholds to obtain the recommended precision level for each download tile.
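As a minimal sketch of such filtering, the snippet below keeps only detections above a chosen confidence score and a minimum area. The column names follow our reading of the GOB CSV format and should be checked against the actual download; the sample rows, the 0.75 threshold and the 10 m² area limit are made up for illustration, and in practice the threshold would come from Google's per-tile CSV file.

```python
import csv
import io

# Hypothetical excerpt of a GOB-style CSV; the rows are invented for illustration.
SAMPLE_CSV = """latitude,longitude,area_in_meters,confidence
-25.70,27.80,12.5,0.62
-25.71,27.81,85.0,0.91
-25.72,27.82,6.0,0.71
-25.73,27.83,140.2,0.78
"""


def filter_detections(rows, min_confidence, min_area_m2=0.0):
    """Keep detections at or above both the confidence and the area threshold."""
    return [
        row
        for row in rows
        if float(row["confidence"]) >= min_confidence
        and float(row["area_in_meters"]) >= min_area_m2
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
# Assumed thresholds: 0.75 confidence and 10 square metres.
kept = filter_detections(rows, min_confidence=0.75, min_area_m2=10.0)
print(f"{len(rows)} detections -> {len(kept)} after filtering")
```

Raising the thresholds trades recall for precision, so the right values depend on whether missed buildings or false detections are more harmful for the application.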

Fig 6. Example location from Modderspruit, South Africa. Images show the considered open-source dataset overlaid onto the imagery used for their inference for a fairer comparison. Top-left, MBF dataset in red shown on Bing Maps imagery. Top-right, GOB dataset in green shown on Google Earth imagery. Bottom-left, RAMP detection shown in yellow on Airbus Pleiades imagery. Bottom-right, MBF, GOB and RAMP datasets shown together on Airbus Pleiades imagery from 2021 (© CNES 2021, Distribution AIRBUS DS).

Dakar — Senegal

In Dakar, we selected an urban area of interest, where buildings are densely positioned in close proximity to one another. Upon comparing the datasets, we observed that HIECTOR’s detections were the most comprehensive. However, there is still much room for improvement, as some of the bounding boxes overlap and there are some false detections, such as parking spaces and random sections of roads. The RAMP prediction model was largely unsuccessful in extracting individual building footprints. Most of the detected footprints contain multiple buildings, which poses a significant challenge for accurate analysis and evaluation of the dataset. The MBF dataset presents a comparable challenge, albeit with fewer such instances observed. In addition, its main disadvantage is that plenty of buildings were not detected. Analysing the GOB dataset has proven to be challenging due to the different viewing angle of the underlying satellite imagery. However, the high frequency of smaller-sized detections remains a persistent issue.
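One simple way to flag detections that merge several buildings is to count how many reference buildings have their centre inside each predicted box. The sketch below is a hypothetical diagnostic rather than the procedure we used: it assumes axis-aligned boxes in a projected coordinate system (metres), and all names and coordinates are invented for illustration.

```python
def centroid(box):
    """Centre point of a box given as (xmin, ymin, xmax, ymax)."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def contains(box, point):
    """True if the point lies inside or on the edge of the box."""
    return box[0] <= point[0] <= box[2] and box[1] <= point[1] <= box[3]


def merged_detections(predictions, references):
    """Return (prediction, count) for predictions covering 2+ reference centres."""
    merged = []
    for pred in predictions:
        covered = sum(contains(pred, centroid(ref)) for ref in references)
        if covered >= 2:
            merged.append((pred, covered))
    return merged


# Made-up example: three adjacent houses detected as one block, plus one clean hit.
references = [(0, 0, 8, 8), (9, 0, 17, 8), (18, 0, 26, 8), (40, 0, 48, 8)]
predictions = [(0, 0, 26, 8), (40, 0, 48, 8)]
print(merged_detections(predictions, references))
```

With real footprints, the same idea applies to polygons instead of boxes, using a geometry library for the point-in-polygon test.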

Fig 7. Example location from Dakar, Senegal. Images show the considered open-source dataset overlaid onto the imagery used for their inference for a fairer comparison. Top-left, MBF dataset in red shown on Bing Maps imagery. Top-right, GOB dataset in green shown on Google Earth imagery. Bottom-left, RAMP detection shown in yellow on Airbus Pleiades imagery. Bottom-right, HIECTOR predictions shown in blue on Airbus Pleiades imagery from 2021 (© CNES 2021, Distribution AIRBUS DS).

Discussion

The above review was carried out to assess open-source building footprint datasets for use as training data over a large AOI, i.e., Africa, for our own building detection model HIECTOR. The presented quality assessment might not apply to other use-cases, for instance a rough estimation of buildings in a given area. However, for our use-case, we offer the following tips and warnings:

Consider in advance which satellite imagery will be used as the base layer, and be aware of the variations introduced by different acquisition conditions, particularly if you plan to use multiple sources of imagery.
Manually labelled or validated building footprints provide the most accurate estimation of building footprints, although their spatial coverage is very limited. Make sure to check the open-source datasets for manually labelled data.
If you target large areas and manually labelled footprints are not an option, consider machine-generated datasets. However, the accuracy and coverage of machine-generated building footprints vary greatly across regions, so make sure to evaluate their accuracy using the target imagery of choice.
Although machine-generated datasets might not be accurate enough to be used as training labels, they might present a good starting point to speed up manual labelling and validation. This, again, depends on the region and on the complexity of the buildings and landscape being depicted.
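To evaluate a machine-generated dataset against a small manually labelled sample, one common approach is to match detections to reference labels by intersection over union (IoU) and report precision and recall. The sketch below is a simplified illustration under our own assumptions: it uses axis-aligned boxes instead of polygons, a greedy one-to-one matching, and an IoU threshold of 0.5; all boxes are made up.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def precision_recall(predictions, references, iou_threshold=0.5):
    """Greedily match each prediction to at most one unused reference box."""
    unmatched = list(references)
    true_positives = 0
    for pred in predictions:
        best = max(unmatched, key=lambda ref: box_iou(pred, ref), default=None)
        if best is not None and box_iou(pred, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(references) if references else 0.0
    return precision, recall


# Made-up boxes in metres: two good detections, two false ones, one missed building.
references = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10)]
predictions = [(1, 1, 10, 10), (20, 0, 30, 12), (60, 0, 70, 10), (100, 0, 105, 5)]
precision, recall = precision_recall(predictions, references)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Running the same comparison separately per region and per settlement type makes it easier to see where a dataset is usable as is and where manual correction is needed.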

Conclusion

Accurate and up-to-date building footprint data is crucial for various practical and scientific purposes. New technologies have made it possible to automatically delineate buildings. However, limitations of the input imagery and reference labels still pose challenges, particularly in developing areas where accurate data may be scarce. To address this issue, we explored various open-source datasets available for Africa. We pointed out some of their drawbacks and showed that the quality of the datasets varies from location to location. We therefore believe it is very important to evaluate the suitability and limitations of these datasets for specific regions and applications. Further efforts are needed to improve the accuracy and coverage of such datasets. Nevertheless, they provide a promising path towards more accurate and comprehensive building footprint data, especially for regions where alternative data sources may not be available.

References

[1] https://www.microsoft.com/en-us/maps/building-footprints

[2] https://sites.research.google/open-buildings

[3] https://rampml.global/

[4] https://github.com/sentinel-hub/hiector

[5] https://www.azavea.com/blog/2022/10/26/automated-building-footprint-extraction-open-datasets/

[6] https://spacenet.ai/datasets/

[7] https://mlhub.earth/datasets?tags=building+footprints

[8] https://github.com/satellite-image-deep-learning/datasets

[9] https://github.com/microsoft/GlobalMLBuildingFootprints

[10] https://www.safegraph.com/blog/building-footprint

The project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Grant Agreement 101004112, Global Earth Monitor project.