The SpaceNet Challenge Off-Nadir Buildings: Introducing the winners

SpaceNet’s mission is to accelerate geospatial machine learning and is supported by the SpaceNet LLC member organizations.

The SpaceNet Challenge: Round 4, Off-Nadir Building Footprint Extraction hosted by TopCoder is complete! Congratulations to cannab, selim_sef, MaksimovKA, number13, and XD_XD for taking home the top 5 prizes. Check out this GitHub repository for their solution algorithm code. Their solutions represented a 1.5-fold improvement over our initial baseline model’s performance. This is the first post in a series where we’ll dig into the competitors’ solutions to find out how they applied machine learning to identify buildings in overhead imagery taken at look angles from 7 to 54 degrees off-nadir.

The Challenge

In Round 4 of the SpaceNet Challenge we asked competitors to identify building footprints from 27 different satellite collects taken during a single pass of a WorldView 2 satellite over Atlanta. The data comprised both south-facing and north-facing collects, with look angles ranging from 7 degrees off-nadir to a whopping 54 degrees off-nadir. For more details about the challenge (or if you’re unfamiliar with that terminology), check out these earlier posts describing the competition:

Introducing the SpaceNet Off-Nadir Imagery Dataset

A Baseline Model for the SpaceNet Off-Nadir Building Detection Challenge

Challenges with the SpaceNet 4 off-nadir satellite imagery

We had 238 registrants, and 21 competitors beat our baseline model in this extremely challenging competition.

Key takeaways

Before we get into the nitty-gritty, here are our main takeaways from running and scoring the challenge:

Building detection in “ideal” imagery is still far from perfect

In these perfectly cloudless, high-resolution WorldView 2 images over Atlanta, even when we restrict to the images looking nearly straight down (look angle of <25 degrees off-nadir, the “nadir” set), the best algorithm still missed about 21% of buildings.

The winning algorithm correctly identified about eight out of ten buildings in images taken at nadir.

Furthermore, about one out of every six buildings the winner predicted in the nadir set did not match up to a building on the ground (i.e. was a “false positive”).

For every six predictions the winner made from nadir images, roughly five corresponded to buildings.

There’s still a lot of room for improvement in automated analysis of overhead imagery!
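The two numbers above are what machine learning folks call recall (the fraction of real buildings that were found) and precision (the fraction of predictions that were real buildings). As background, the SpaceNet Metric used for scoring is an F1 score, which combines the two. Here is a minimal sketch of that arithmetic. The counts are hypothetical, chosen only to reproduce the rough nadir ratios quoted above; they are not taken from the challenge scoring.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from building-detection counts."""
    precision = true_pos / (true_pos + false_pos)  # share of predictions that are real
    recall = true_pos / (true_pos + false_neg)     # share of real buildings found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts (an assumption for illustration): out of 1000 real
# buildings, 790 are found and 210 are missed, and 158 predictions match
# nothing on the ground, so 5 out of every 6 predictions are correct.
precision, recall, f1 = precision_recall_f1(true_pos=790, false_pos=158, false_neg=210)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, a model has to do well on both precision and recall to score well; finding every building while also predicting lots of phantom ones won’t cut it.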

It is very hard to identify buildings at very off-nadir look angles

Performance at look angles >40 degrees off-nadir was, on average, 30% worse than at <25 degrees off-nadir. The winners correctly identified only about 50% of the buildings in these very off-nadir images:

The winning algorithm correctly identified about five out of ten buildings in images taken very off-nadir.

…and about one in three of their predictions were false positives:

For every six predictions the winner made from very off-nadir images, about four corresponded to buildings.

If you want to use algorithms to identify objects in very off-nadir imagery, be aware of these performance limitations.

Big ensembles of models help, but the juice may not always be worth the squeeze

The winning algorithm, an ensemble of 28 convolutional neural networks (CNNs) with a gradient boosting machine (GBM) filter to remove bad building predictions, yielded only a 5% improvement over the #5 algorithm, which comprised only 3 CNNs. This came at a substantial computing cost: the winning algorithm takes more than 10 times longer to identify buildings in new images than the 5th place competitor’s algorithm! With inference taking 6.5 seconds per square km for the 5th place algorithm versus 68 seconds for the winner, this may represent the difference between a deployable solution and an algorithm that is only academically relevant.

Let’s dive into the details!

Winners’ algorithm performance

A summary of the winners’ results alongside the baseline model is in the table below.

Competitors’ scores in the SpaceNet 4: Off-Nadir Building Detection Challenge compared to the baseline model. Each score represents the SpaceNet Metric for the entire image set (Overall) or subsets of the imagery with specific look angles: Nadir, 7–25 degrees; Off-nadir, 26–40 degrees; Very off-nadir, >40 degrees.

The results were remarkably close: cannab’s winning solution represents only a 5% improvement over XD_XD’s 5th place solution. Much of the competitors’ improvement over the baseline came in the off-nadir and very off-nadir image subsets, where the winning solutions represented 79% and 170% improvements over the baseline model, respectively. By contrast, the algorithm with the best performance in the nadir set only improved on the baseline by 32%:

Improvement in competitor algorithm over baseline, stratified by look angle. Nadir: 7–25 degrees, Off-nadir: 26–40 degrees, Very off-nadir: >40 degrees.

So, how did the competitors achieve these improvements? This will be the topic of the remainder of this post and several deep dives to come.

A summary of the winners’ algorithms

Every top 10 competitor who submitted a solution used convolutional neural networks (CNNs) to identify building footprints. Each competitor produced a binary output image with each pixel labeled as 1 (contains a building) or 0 (does not contain a building), which they then converted to polygons outlining individual buildings. The intricacies of the competitors’ algorithms, summarized below, varied dramatically.
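To give a feel for the mask-to-buildings step, here is a minimal sketch that splits a binary mask into separate building instances by grouping touching pixels. This is not any competitor’s code. The function names are made up for illustration, and it reports a bounding box per building as a simple stand-in for the polygon outlines the competitors actually produced.

```python
from collections import deque


def label_buildings(mask):
    """Group 4-connected pixels equal to 1 into separate building instances.

    Returns a list of pixel-coordinate lists, one per connected blob.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    instances = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited building pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            instances.append(pixels)
    return instances


def bounding_box(pixels):
    """Axis-aligned box (min_row, min_col, max_row, max_col) around one instance."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


# Toy 4x5 mask (hypothetical) containing three separate blobs of building pixels.
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
buildings = label_buildings(toy_mask)
boxes = [bounding_box(p) for p in buildings]
print(len(buildings), boxes)
```

One thing this makes obvious: two buildings whose masks touch get merged into a single blob, which is part of why the post-processing step matters so much for densely packed buildings.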

A summary of winners’ algorithms. See the end of the post for references. *: Training time and inference time were measured on a server with 16 Intel Xeon 3.5 GHz CPUs, 4 Titan Xp GPUs with 12 GB GPU memory each, 256 GB of RAM, and a 2 TB SSD for data storage. Inference was run using only one of the four GPUs.

Let’s go through a couple of details of the competitors’ solutions:

Neural net architectures

The vast majority of competitors’ algorithms performed semantic segmentation using ensembles of U-Nets with a variety of state-of-the-art image classification encoders. Number13 used Mask R-CNN, a combined object detection/classification/segmentation algorithm.

Algorithm ensembles

Every competitor in the top 5 used an “ensemble” of CNNs, i.e., they trained multiple models and combined the results for their final output. Every top-5 competitor except for number13 trained every model on imagery from all of the collects combined, and then averaged the results from all of the models. By contrast, number13 trained a separate model to identify building footprints in imagery from each collect. These approaches seem to be similarly performant, though each has its downsides: large averaging ensembles take much longer for inference, whereas the one-model-per-collect approach is less likely to generalize well to new imagery.
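The averaging approach can be sketched in a few lines. This assumes each model outputs a per-pixel building probability and that the averaged map is binarized at 0.5. Both are assumptions for illustration; the competitors’ actual combination rules and thresholds may differ.

```python
def average_ensemble(prob_maps, threshold=0.5):
    """Average per-pixel building probabilities across models, then binarize."""
    n_models = len(prob_maps)
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    mask = []
    for r in range(rows):
        row = []
        for c in range(cols):
            mean_prob = sum(m[r][c] for m in prob_maps) / n_models
            row.append(1 if mean_prob >= threshold else 0)
        mask.append(row)
    return mask


# Three hypothetical model outputs for the same 2x2 image tile.
model_a = [[0.9, 0.2], [0.6, 0.1]]
model_b = [[0.8, 0.4], [0.3, 0.2]]
model_c = [[0.7, 0.7], [0.8, 0.0]]
ensemble_mask = average_ensemble([model_a, model_b, model_c])
print(ensemble_mask)
```

In the top-right pixel, model_c alone would call a building, but the other two models outvote it. That smoothing of individual models’ mistakes is the appeal of averaging, and every extra model is another full inference pass, which is where the run time goes.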

Performance vs. prediction time: a trade-off

The results of this challenge highlight the cost-performance trade-off associated with using very large ensembles of very deep CNNs: cannab’s winning code was carefully optimized to fit training and inference for as many CNNs as possible into the time limits provided, an effort that was rewarded with the first place prize. However, this algorithm needs a lot of computing time to identify buildings in new imagery, at 68 seconds per square kilometer. This means that the algorithm would need about 5–6 hours to identify all of the buildings in a single WorldView 2 collect. By contrast, the #5 solution provided by XD_XD could do the same in approximately 25–30 minutes with only a 5% drop in performance. An awareness of this balance is essential when considering model selection for a deployable product.
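If you want to run this back-of-the-envelope math for your own area of interest, here is a small sketch. The per-square-kilometer rates come from this post. The collect area is a hypothetical figure picked for illustration, not a number we’ve stated anywhere, so swap in your own.

```python
def inference_seconds(seconds_per_km2, area_km2):
    """Total inference time for an area, assuming time scales linearly with area."""
    return seconds_per_km2 * area_km2


WINNER_RATE = 68.0       # cannab, seconds per square km (from this post)
FIFTH_PLACE_RATE = 6.5   # XD_XD, seconds per square km (from this post)
ASSUMED_COLLECT_AREA_KM2 = 270.0  # hypothetical area, an assumption for illustration

winner_hours = inference_seconds(WINNER_RATE, ASSUMED_COLLECT_AREA_KM2) / 3600
fifth_minutes = inference_seconds(FIFTH_PLACE_RATE, ASSUMED_COLLECT_AREA_KM2) / 60
slowdown = WINNER_RATE / FIFTH_PLACE_RATE

print(f"winner: {winner_hours:.1f} h, 5th place: {fifth_minutes:.1f} min, "
      f"slowdown: {slowdown:.1f}x")
```

The linear scaling is itself a simplification: it ignores fixed costs like loading models and tiling imagery, so treat the output as a rough planning number rather than a benchmark.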

Competitor code

The competitors’ solutions can be found in this GitHub repository. Stay tuned for their own summaries of the code and instructions on how to download their trained models for use in inference.

There’s more coming soon!

We’ll be back shortly with another post that digs deeper into the advances these algorithms represent over winners of past SpaceNet Challenges, including input data selection, model training details (loss functions, training objectives, cross-validation), and post-processing strategies. Stay tuned, and follow us on Twitter @CosmiQWorks and @NickWeir09 for more!

References:

  1. https://arxiv.org/abs/1505.04597
  2. https://arxiv.org/abs/1709.01507
  3. https://arxiv.org/abs/1707.01629
  4. https://arxiv.org/abs/1409.0575
  5. https://arxiv.org/abs/1512.03385
  6. https://arxiv.org/abs/1608.06993
  7. https://arxiv.org/abs/1602.07261
  8. https://arxiv.org/abs/1612.03144
  9. https://arxiv.org/abs/1703.06870
  10. https://arxiv.org/abs/1405.0312
  11. https://www.crowdai.org/challenges/mapping-challenge
  12. https://arxiv.org/abs/1409.1556
  13. https://arxiv.org/abs/1801.05746