Preface: SpaceNet LLC is a nonprofit organization dedicated to accelerating open-source applied research in artificial intelligence for geospatial applications, specifically foundational mapping (i.e., building footprint and road network detection). SpaceNet is run in collaboration with CosmiQ Works, Maxar Technologies, Intel AI, Amazon Web Services (AWS), Capella Space, Topcoder, and IEEE GRSS.
The SpaceNet 6 Challenge is the first multimodal SpaceNet challenge, inviting participants to work with optical imagery as well as synthetic aperture radar imagery to design a foundational mapping algorithm. This post describes a newly released baseline algorithm, which could be used as an example or starting point for SpaceNet 6 participants.
The goal of the challenge is to produce an algorithm that takes a synthetic aperture radar (SAR) image and returns a list of building footprints (i.e., outlines), expressed as vector polygons instead of just pixel maps. To enable supervised learning methods such as deep learning, training data has been provided. The training data includes tiles of SAR imagery, along with the corresponding building footprints for each tile. The training data also includes optical imagery of those same tiles. Although it’s possible to train a SAR→footprint algorithm without using the optical data at all, having it available builds intuition and opens up additional possibilities for how to approach this problem.
The baseline algorithm presented here is built using Solaris, an end-to-end Python framework for geospatial deep learning. The code for this baseline can be found on CosmiQ's GitHub under "CosmiQ_SN6_Baseline." In some ways this baseline draws on the baseline and winning submission from SpaceNet 4. However, it also has features that are specific to the new dataset. SAR imagery is affected by the direction from which the data was collected (through effects such as layover). Therefore, the baseline rotates tiles so the model is trained on SAR images of matching directionality. Also, the baseline takes advantage of the optical imagery through transfer learning. Training the model on the optical imagery first and the SAR imagery second leads to higher performance than training the model on the SAR imagery alone. Lastly, since small buildings (<20 m²) are not used for evaluation, they are removed from the training data. This baseline earns a SpaceNet metric score of just above 0.2, and we look forward to seeing challenge participants improve on that score by trying a variety of techniques.
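As a concrete illustration of the last step, filtering out sub-20 m² footprints from the training labels can be sketched with Shapely. This is a minimal sketch, not the baseline's actual code: `filter_small_buildings` is a hypothetical helper, and it assumes the footprints are already in a projected, meter-based coordinate system.

```python
from shapely.geometry import Polygon

MIN_AREA_M2 = 20.0  # buildings below this area are not scored in SpaceNet 6

def filter_small_buildings(footprints, min_area=MIN_AREA_M2):
    """Drop footprints smaller than min_area square meters.

    Hypothetical helper for illustration; assumes geometries use a
    projected CRS with meter units.
    """
    return [poly for poly in footprints if poly.area >= min_area]

# A 10 m x 10 m building (kept) and a 3 m x 3 m shed (dropped):
big = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])   # 100 m^2
small = Polygon([(0, 0), (3, 0), (3, 3), (0, 3)])     #   9 m^2
kept = filter_small_buildings([big, small])
print(len(kept))  # 1
```

In practice the same filter would be applied per tile, before rasterizing the labels into training masks.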
At the heart of the baseline algorithm is a neural network, trained to take an image and return a pixel segmentation map. The neural net used here is a U-Net with a VGG-11 encoder. This architecture, also called a TernausNet, is the same one used for the SpaceNet 4 baseline. The only difference here is that the neural network takes four-band images as input. That’s because our quad-pol SAR imagery contains four polarization channels.
To make transfer learning as simple as possible, the pan-sharpened RGB optical imagery is converted into four-channel images so it can be treated as ersatz SAR imagery. (Red is used for the HH polarization, green for VV, and blue for both HV and VH.) That way, the model can be trained on the optical and then the SAR with no swapping out of layers required.
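That conversion might look like the following NumPy sketch. The HH/HV/VH/VV channel ordering and the function name are assumptions for illustration; only the red→HH, green→VV, blue→HV-and-VH mapping comes from the text.

```python
import numpy as np

def rgb_to_pseudo_sar(rgb):
    """Map a 3-band RGB tile (H, W, 3) to a 4-band pseudo-SAR tile.

    Per the channel assignment described above: red stands in for HH,
    green for VV, and blue for both HV and VH. The output channel
    order (HH, HV, VH, VV) is an assumption for this sketch.
    """
    red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return np.stack([red, blue, blue, green], axis=-1)

tile = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
pseudo = rgb_to_pseudo_sar(tile)
print(pseudo.shape)  # (64, 64, 4)
```

Because the optical tiles now have the same shape as the SAR tiles, a single four-band model can be trained on one and fine-tuned on the other.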
Training is done for 150 epochs on optical imagery followed by 50 epochs on SAR imagery. Optical imagery is augmented by random rotations of 0, 90, 180, or 270 degrees, but no such augmentation is done with the SAR data because of the directionality mentioned above. About 1/5 of the training data is set aside for validation, with this data being drawn from the easternmost part of the training data region. Instead of just using the model weights after the final epoch, the model weights that yield the highest performance on the validation data are used. Because the model is susceptible to overfitting on this data, the "best" version sometimes comes well before the final epoch.
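The optical-only rotation augmentation can be sketched as follows. The helper is hypothetical, not the baseline's code; the key point is that it is applied to optical tiles only, since rotating SAR tiles would break the collection-direction consistency described above.

```python
import random
import numpy as np

def random_rot90(image, mask):
    """Rotate an (H, W, C) image and its label mask by a random
    multiple of 90 degrees, keeping the two aligned.

    Applied to optical tiles only; SAR tiles are left unrotated to
    preserve collection directionality.
    """
    k = random.choice([0, 1, 2, 3])  # number of 90-degree turns
    return (np.rot90(image, k, axes=(0, 1)).copy(),
            np.rot90(mask, k, axes=(0, 1)).copy())

img = np.zeros((256, 256, 4))  # one 4-band tile
msk = np.zeros((256, 256, 1))  # its building mask
aug_img, aug_msk = random_rot90(img, msk)
print(aug_img.shape, aug_msk.shape)
```

Rotating the image and mask by the same `k` keeps the labels registered to the pixels, which is essential for segmentation training.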
Training is done with an AdamW optimizer. Model performance is evaluated using a loss function that combines Dice and focal loss, with the latter being given ten times the weight of the former. This choice of optimizer and loss function is based on the winning SpaceNet 4 algorithm. Training is completed in about ten hours on 4 Titan Xp GPUs; testing in about half an hour.
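A minimal PyTorch sketch of such a combined loss is below. The focal-loss gamma and the learning rate in the comment are assumed values, and Solaris ships its own composite-loss implementation, so this is illustrative only.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss computed on sigmoid probabilities."""
    probs = torch.sigmoid(pred)
    inter = (probs * target).sum()
    return 1 - (2 * inter + eps) / (probs.sum() + target.sum() + eps)

def focal_loss(pred, target, gamma=2.0):
    """Binary focal loss; gamma=2.0 is an assumed default."""
    bce = F.binary_cross_entropy_with_logits(pred, target, reduction="none")
    pt = torch.exp(-bce)  # model's probability for the true class
    return ((1 - pt) ** gamma * bce).mean()

def combined_loss(pred, target, focal_weight=10.0):
    """Dice plus 10x focal, per the weighting described in the post."""
    return dice_loss(pred, target) + focal_weight * focal_loss(pred, target)

# The baseline pairs a loss like this with AdamW, e.g.:
# opt = torch.optim.AdamW(model.parameters(), lr=1e-4)  # lr is assumed

logits = torch.randn(2, 1, 64, 64)                   # raw model outputs
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()      # binary building mask
loss = combined_loss(logits, mask)
print(loss.item())
```

Weighting focal loss heavily pushes the model to focus on hard, misclassified pixels, while the Dice term guards against the class imbalance between building and background pixels.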
It should be emphasized that there are many ways to approach this challenge, with the given baseline providing only one such approach. For example, a completely different way to use the optical data is to train a deep learning model to convert SAR images to optical-style images (see here and here), effectively turning a SAR→footprint task into an optical→footprint task.
Using the SpaceNet metric, the baseline model achieves a score of 0.21±0.02 for the SpaceNet 6 Challenge. If transfer learning with optical data is not used, the score drops to 0.135±0.002. If transfer learning is not used and, furthermore, the SAR imagery is augmented with random rotation (thereby failing to respect directionality), the score sinks slightly lower, to 0.12±0.03. During training, the lowest loss when training on optical data was almost three times smaller than the lowest loss when training on SAR, suggesting that interpreting SAR is a fundamentally harder problem than interpreting optical imagery, at least when the optical imagery is nearly on-nadir. Several years ago, the baseline for the very first SpaceNet challenge, which concerned finding building footprints from optical imagery, clocked in at a score of 0.21, roughly equal to the current baseline for SAR. The results of that first challenge, and subsequent building footprint challenges with optical data, saw ever-increasing performance. If this trend holds true for the realm of SAR, then achievable performance may far exceed what's shown in this baseline.