A deep dive into the SpaceNet 4 winning algorithms

Augmentation approaches, loss functions, and learning objectives

Nick Weir
Feb 21, 2019 · 5 min read

Note: SpaceNet’s mission is to accelerate geospatial machine learning and is supported by the SpaceNet member organizations. To learn more visit https://spacenet.ai.

The SpaceNet Challenge Round 4: Off-Nadir Building Detection Challenge hosted by TopCoder completed recently, and we’ve had an opportunity to examine the competitors’ solutions. In this post we highlight a few key differentiators that improved segmentation in the winning algorithms. For a high-level overview of the competition see our earlier post, and for a summary of the solutions see this post.

Challenges with the SpaceNet Off-Nadir Dataset

Identifying buildings in the SpaceNet 4 Off-Nadir Dataset posed a few unusual challenges. For example, building density varies dramatically across different regions of Atlanta, the city covered in the dataset. Therefore algorithms had to accommodate both dense areas and sparse areas:

Building appearance was very different across the images collected at different angles: buildings appeared “normal” in the images taken directly overhead, while they were very distorted in the off-nadir collections. Furthermore, the apparent resolution dropped significantly as look angle increased:

Finally, some collections looked directly into shadowed areas of buildings, whereas others had very bright reflections from sunlit areas:

Most computer vision tasks do not face these challenges, and “standard” computer vision algorithms don’t necessarily address them well. For example, our baseline model was a simple implementation of a popular deep learning model for computer vision, and it did a very poor job of identifying buildings in very off-nadir imagery. The competitors’ solutions improved on this baseline by almost 300% in these very off-nadir images. How did they do it? After analyzing their solutions, we think three key details helped: augmentation strategy, loss functions, and learning objectives. Let’s look at each in turn.

Augmentation strategy

Several competitors discussed image augmentation in their solution descriptions. As in our baseline model, many competitors started out by flipping and rotating images. However, they found that these augmentations did not improve their models’ performance. In fact, they harmed their scores! After thinking more about off-nadir imagery, we think we know why. Placing building footprints correctly in off-nadir images requires not only identifying the buildings, but also accounting for distortion. In off-nadir looks, building roofs are displaced relative to their footprints on the ground:

To effectively place footprints in off-nadir images, algorithms needed to not only find the building but also to adjust for this displacement. Algorithms needed to “know” which direction the footprint was displaced — but this is hard to learn if images are rotated or flipped, changing the direction at random.
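To make the argument concrete, here is a minimal sketch on a toy example (the array sizes and the 3-pixel/5-pixel displacement are made up for illustration, not taken from the dataset). It measures the roof-to-footprint offset before and after a flip and a rotation, and shows that each augmentation changes the offset direction the model would have to learn.

```python
import numpy as np

def centroid(mask):
    """Return the (row, col) centroid of a boolean mask."""
    rows, cols = np.nonzero(mask)
    return np.array([rows.mean(), cols.mean()])

def roof_offset(roof, footprint):
    """Displacement of the roof centroid relative to the footprint centroid."""
    return centroid(roof) - centroid(footprint)

# Toy scene: a 10x10 px footprint, with the roof displaced 3 px up and 5 px right
# (hypothetical values standing in for an off-nadir look).
footprint = np.zeros((32, 32), dtype=bool)
footprint[10:20, 10:20] = True
roof = np.roll(footprint, shift=(-3, 5), axis=(0, 1))

original = roof_offset(roof, footprint)
flipped = roof_offset(np.fliplr(roof), np.fliplr(footprint))
rotated = roof_offset(np.rot90(roof), np.rot90(footprint))

print("original offset (rows, cols):", original)
print("after horizontal flip:       ", flipped)
print("after 90 degree rotation:    ", rotated)
```

In a real collection every building in an image shares roughly the same displacement direction, so a model can learn it as a constant. Random flips and rotations turn that constant into noise.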

There’s an important corollary to the approach the competitors took: these algorithms are unlikely to generalize to new off-nadir looks with building roofs displaced in a different direction.

Loss functions

One major challenge for segmenting buildings in overhead imagery is their relative sparsity compared to other object types in natural photographs. The objects classified in the ImageNet dataset comprise a substantial fraction of the pixels in the image; by contrast, buildings make up fewer than 5% of the pixels in the SpaceNet Off-Nadir dataset. This poses a major challenge to segmentation algorithms because few loss functions can overcome the “all-zero valley”: predicting that no pixels correspond to buildings already yields more than 95% pixel accuracy. To overcome this challenge, competitors generally used a composite loss function comprising a binary cross-entropy loss variant alongside a loss function that specifically targets positive predictions: either Dice coefficient loss or Jaccard loss. The top two competitors trained their models with a composite of Dice and the relatively new Focal Loss, a binary cross-entropy variant that down-weights pixels the model already classifies confidently, so that hard, misclassified pixels dominate the loss. These loss functions, combined with the competitors’ advanced segmentation objective masks, yielded high-fidelity building footprint extraction.
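As a rough illustration of why a composite loss helps, the sketch below implements a soft Dice loss and a basic focal loss in NumPy and compares an “all background” prediction against a good one. The equal loss weights, `gamma=2.0`, and the toy 4% building mask are assumptions for this example, not the competitors’ actual settings, and the focal loss omits the optional class-balancing term.

```python
import numpy as np

def dice_loss(probs, target, eps=1e-7):
    """Soft Dice loss: 1 - 2*|P*T| / (|P| + |T|). Driven by the building pixels."""
    intersection = np.sum(probs * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(probs) + np.sum(target) + eps)

def focal_loss(probs, target, gamma=2.0, eps=1e-7):
    """Binary focal loss: cross-entropy scaled by (1 - p_t)**gamma,
    which shrinks the contribution of confidently correct pixels."""
    probs = np.clip(probs, eps, 1.0 - eps)
    p_t = np.where(target == 1, probs, 1.0 - probs)  # probability of the true class
    return np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t))

def composite_loss(probs, target, dice_weight=1.0, focal_weight=1.0):
    """Weighted sum of the two losses (weights are illustrative assumptions)."""
    return dice_weight * dice_loss(probs, target) + focal_weight * focal_loss(probs, target)

# Toy tile: 4% of pixels are building.
target = np.zeros((100, 100))
target[:20, :20] = 1.0

all_background = np.full_like(target, 0.01)   # "all-zero valley" prediction
good = np.where(target == 1, 0.99, 0.01)      # near-perfect prediction

# Pixel accuracy of the all-background prediction is 0.96 despite finding nothing.
accuracy = np.mean((all_background > 0.5) == (target == 1))
print("accuracy of all-background prediction:", accuracy)
print("composite loss, all background:", composite_loss(all_background, target))
print("composite loss, good prediction:", composite_loss(good, target))
```

The all-background prediction scores well on accuracy but poorly on the Dice term, which is what pushes a model out of the valley.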

Objective Masks

Neural networks for segmentation are generally trained to generate “pixel masks”: maps of values between 0 and 1 denoting the probability that each pixel belongs to an object class (a semantic segmentation mask). However, our building detection challenges don’t stop there: we ask competitors to generate polygons labeling every building separately, making this an instance segmentation task. This is challenging because competitors must ensure that segmentation outputs for neighboring buildings don’t touch one another, or else the algorithm may fuse them into one instance prediction:

To aid their algorithms in learning building separation, many competitors added one or two additional channels to their neural net objectives:

  1. Building outline labels
  2. Contact points between very closely juxtaposed buildings

These two additional channels are roughly equivalent to providing additional classes for the algorithm to learn.
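One way such a three-channel target could be built from a labeled instance image is sketched below. This is an assumption-laden illustration, not any competitor’s code: the function name `build_target`, the one-pixel outline, and the `contact_radius` parameter are all hypothetical choices.

```python
import numpy as np
from scipy import ndimage

def build_target(instances, contact_radius=2):
    """Build a (3, H, W) target from an instance image.

    instances: 2D int array, 0 = background, k > 0 = building id.
    Channels: 0 = footprint, 1 = outline, 2 = contact between neighbors.
    """
    footprint = instances > 0
    outline = np.zeros_like(footprint)
    nearby_count = np.zeros(instances.shape, dtype=int)
    for building_id in np.unique(instances[footprint]):
        mask = instances == building_id
        # Outline: building pixels removed by a one-pixel erosion.
        outline |= mask & ~ndimage.binary_erosion(mask)
        # Count how many buildings lie within contact_radius of each pixel.
        nearby_count += ndimage.binary_dilation(mask, iterations=contact_radius)
    contact = nearby_count >= 2
    return np.stack([footprint, outline, contact]).astype(np.uint8)

# Toy example: two 10x10 px buildings separated by a 2 px gap.
instances = np.zeros((20, 32), dtype=int)
instances[5:15, 5:15] = 1
instances[5:15, 17:27] = 2

target = build_target(instances)
print("footprint pixels:", target[0].sum())
print("outline pixels:  ", target[1].sum())
print("contact pixels:  ", target[2].sum())
```

Here the contact channel lights up only in the narrow gap between the two buildings, which is exactly where a plain footprint mask tends to fuse neighbors.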

Combining these three channels shows the layout of the objective that the competitors trained their algorithms to predict:

In post-processing, competitors subtracted the outline and contact regions and used watershed algorithms to separate very nearby buildings. Notably, this approach did not work for everyone: the 5th place competitor, XD_XD, indicated that labeling contact points did not improve his algorithm.
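The subtraction step can be sketched as follows. Note that this is a simplified stand-in: instead of a true watershed, it labels the remaining building cores and then assigns every footprint pixel to its nearest core. The function name, the 0.5 threshold, and the toy prediction are assumptions for illustration only.

```python
import numpy as np
from scipy import ndimage

def separate_buildings(probs, threshold=0.5):
    """Split a (3, H, W) prediction (footprint, outline, contact) into instances."""
    footprint, outline, contact = probs > threshold
    # Subtract outline and contact regions so neighboring buildings come apart.
    cores = footprint & ~outline & ~contact
    seeds, count = ndimage.label(cores)
    # Stand-in for watershed: give each footprint pixel the id of its nearest core.
    _, nearest = ndimage.distance_transform_edt(seeds == 0, return_indices=True)
    instances = seeds[nearest[0], nearest[1]]
    instances[~footprint] = 0
    return instances, count

# Toy prediction: two neighboring buildings whose footprints fused into one blob.
probs = np.zeros((3, 20, 32))
probs[0, 5:15, 5:27] = 0.9                 # fused footprint
for c0, c1 in [(5, 15), (17, 27)]:         # one-pixel outline of each building
    probs[1, 5:15, c0:c1] = 0.9
    probs[1, 6:14, c0 + 1:c1 - 1] = 0.0
probs[2, 5:15, 15:17] = 0.9                # contact region in the gap

instances, count = separate_buildings(probs)
print("buildings found:", count)
```

Thresholding the footprint channel alone would return a single blob here, while the subtraction recovers two separate buildings. In practice the labeled cores would serve as markers for a watershed, for example the one in scikit-image.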

Conclusion and looking forward

Competitors used their loss functions, learning objectives, and augmentation strategy to address the unique challenges posed by the SpaceNet Off-Nadir Buildings Challenge task and data. They cut common image augmentations (rotation and flipping) from their pipelines so their algorithms could learn the offset between roofs and footprints. They used loss functions suited to a low foreground-to-background class ratio to ensure algorithms learned to find the relatively uncommon building pixels. Finally, they used advanced learning objectives to effectively separate buildings for this instance segmentation task.

Though this deep dive covered many details of how competitors identified buildings in off-nadir imagery, it doesn’t cover everything. In the next post, we will look at where algorithms performed well and where they failed: did different competitors’ models miss the same buildings, or did each model have unique failures? What would have happened if we had set different thresholds for building segmentation — for example, an IoU cutoff of 0.75 instead of 0.5? What types of objects yielded false positive predictions in the competitors’ solutions? For this and more, follow us at https://medium.com/the-downlinq and on Twitter @CosmiQWorks and @NickWeir09!

The DownLinQ

Welcome to the archived blog of CosmiQ Works, an IQT Lab

As of March 2021, CosmiQ Works has been folded into IQT Labs. An archive will remain here to showcase historical work from CosmiQ Works that took place July 2016 — March 2021.

Written by

Nick Weir

Data Scientist at CosmiQ Works and SpaceNet 4 Challenge Director at the SpaceNet LLC. Advancing computer vision and ML analysis of geospatial imagery.
