Building footprint comparison in Cambodia: Orbital Insight and OpenStreetMap
Azavea and Orbital Insight are proud to announce the release of a map that shows the difference between two building footprint datasets that cover all of Cambodia: (1) building footprints from OpenStreetMap (OSM) and (2) building footprints generated by Orbital Insight’s artificial intelligence/machine learning (AI/ML) analysis of satellite imagery.
View the building footprint comparison.
The technical objective of this analysis is to identify mismatching clusters of geometries and to determine where OSM is deficient relative to AI/ML-detected buildings and vice versa. While the open mapping community is increasingly active in providing high-confidence updates to foundation mapping data around the world, AI/ML allows analytics to keep pace with the ever-increasing supply of satellite imagery and indicate where dedicated, human-centric feature updates are most needed.
Some of the underlying goals for the analysis were to: (1) show an interesting test case for Azavea’s open source library, VectorPipe, and (2) create a visually appealing demonstration of Orbital Insight’s capabilities.
1. Pull the latest OSM PBF extract for Cambodia and convert it to an ORC file with the osm2orc utility
2. Do a three-part check to identify building overlap:
a. Discard every pair of geometries (one from each dataset) that do not intersect
i. This step vastly reduces the number of computations that need to be performed
b. Check if Orbital Insight’s building footprints contain the centroid of an OSM footprint
i. We test containment in this direction because Orbital Insight’s footprints generally cover larger areas and a single footprint can span several tightly packed buildings, such as apartment or office complexes. In developer speak, there’s effectively a one-to-many relationship from AI/ML footprints to OSM footprints, even though ideally it would be one-to-one.
c. For each pair of geometries (one from each dataset) that passed the intersection check but failed the centroid check, compute the intersection of the two geometries. If the ratio of the intersection area to the AI/ML geometry area is above some threshold(*), assume a match.
i. (*) We used a ratio of 0.75
ii. This check is primarily responsible for catching buildings where the centroid is not actually within the building footprint. Think U-shaped buildings or buildings with a central open courtyard.
iii. We do this check last, after first attempting the centroid check, because computing intersections is much more expensive. We aim to perform as few of these computations as possible.
iv. In practice, we found that the intersection is computed for fewer than 10% of the geometries that passed the first intersection check.
3. Generate a vector tile layer that is a union of the two datasets. It contains the following attributes in addition to the geometry from the relevant layer:
a. “source”: One of “osm”, “oi” or “both” indicating which datasets this geometry was detected in
b. “name”: The value of the OSM tag “name” for this geometry, if applicable
c. “buildingType”: The value of the OSM tag “building” for this geometry, if applicable
4. Visualize using Mapbox GL JS.
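To make the three-part check in step 2 concrete, here is a minimal sketch in plain Python. It is not the VectorPipe implementation, and every function name in it is hypothetical. It rests on several simplifying assumptions: coordinates are planar, polygons are lists of vertices in counter-clockwise order, a bounding-box overlap test stands in for the initial intersection check, and the AI/ML footprint is convex so that Sutherland-Hodgman clipping can compute the intersection area. The `label_sources` helper is one guess at how the “source” attribute from step 3 could be derived from the matches.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # counter-clockwise vertices, first point not repeated

THRESHOLD = 0.75  # area ratio quoted in the post


def edges(poly: Polygon):
    """Yield consecutive vertex pairs, wrapping around to the first vertex."""
    return zip(poly, poly[1:] + poly[:1])


def area(poly: Polygon) -> float:
    """Shoelace formula; returns 0.0 for an empty polygon."""
    return abs(sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in edges(poly))) / 2


def centroid(poly: Polygon) -> Point:
    """Area-weighted centroid of a simple polygon."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in edges(poly):
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * a), cy / (3 * a)


def bboxes_overlap(p: Polygon, q: Polygon) -> bool:
    """Cheap pre-filter (assumption: stands in for a true intersection test)."""
    (px, py), (qx, qy) = zip(*p), zip(*q)
    return (min(px) <= max(qx) and min(qx) <= max(px)
            and min(py) <= max(qy) and min(qy) <= max(py))


def side(pt: Point, a: Point, b: Point) -> float:
    """Positive if pt lies left of the directed line a->b, zero if on it."""
    return (b[0] - a[0]) * (pt[1] - a[1]) - (b[1] - a[1]) * (pt[0] - a[0])


def convex_contains(poly: Polygon, pt: Point) -> bool:
    """Point-in-polygon test, valid only for convex counter-clockwise polygons."""
    return all(side(pt, a, b) >= 0 for a, b in edges(poly))


def clip(subject: Polygon, clipper: Polygon) -> Polygon:
    """Sutherland-Hodgman: clip subject against a convex clipper polygon."""
    out = list(subject)
    for a, b in edges(clipper):
        current, out = out, []
        for p, q in edges(current):
            sp, sq = side(p, a, b), side(q, a, b)
            if (sp >= 0) != (sq >= 0):  # edge p->q crosses the clip line
                t = sp / (sp - sq)
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            if sq >= 0:
                out.append(q)
    return out


def is_match(oi: Polygon, osm: Polygon, threshold: float = THRESHOLD) -> bool:
    if not bboxes_overlap(oi, osm):          # (a) discard non-intersecting pairs
        return False
    if convex_contains(oi, centroid(osm)):   # (b) cheap centroid check
        return True
    return area(clip(osm, oi)) / area(oi) > threshold  # (c) costly area ratio


def label_sources(oi_polys: List[Polygon], osm_polys: List[Polygon]):
    """Tag each footprint 'both' if it matched anything, else 'oi' or 'osm'."""
    oi_hit = [False] * len(oi_polys)
    osm_hit = [False] * len(osm_polys)
    for i, oi in enumerate(oi_polys):
        for j, osm in enumerate(osm_polys):
            if is_match(oi, osm):
                oi_hit[i] = osm_hit[j] = True
    return (["both" if h else "oi" for h in oi_hit],
            ["both" if h else "osm" for h in osm_hit])
```

The brute-force double loop in `label_sources` is only for illustration. At country scale the pairs would need to be narrowed with a spatial index or a distributed join before any geometry is compared, which is the role the first check plays in the list above.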
Orbital Insight, a geospatial data analytics company, uses computer vision techniques to segment land use and classify objects from satellite imagery at scale. In this case, Orbital Insight’s Land Use classification algorithm automatically derives building footprints from PlanetScope 3–5m resolution imagery, among other classes such as roads, natural forest, planted forest, mines, water, agriculture, and more not displayed here. This data is used by NGOs, Fortune 500 companies, national governments, and the international development community to study urbanization, conduct large scale land mapping projects, and monitor the drivers of deforestation up to country-wide scale.
- In practice, there was not a 1-to-1 mapping of geometries between the datasets. In an ideal world with perfect AI/ML detection there might be, but we’re not there yet.
- Geometric computations are expensive and we needed to do a lot of them. There was some measure of trial and error before we came up with the multi-step checks described above.
- Future work doing large-scale vector data comparisons
- Understanding the opportunities and challenges for relying on OpenStreetMap data
- Using AI/ML as a first pass for identifying regions with incomplete data to inform future mapping campaigns