The SpaceNet 5 Baseline — Part 2: Training a Road Speed Segmentation Model

Adam Van Etten
Published in The DownLinQ · Oct 9, 2019 · 5 min read

Preface: SpaceNet LLC is a nonprofit organization dedicated to accelerating open source, artificial intelligence applied research for geospatial applications, specifically foundational mapping (i.e. building footprint & road network detection). SpaceNet is run in collaboration with CosmiQ Works, Maxar Technologies, Intel AI, Amazon Web Services (AWS), Capella Space, and Topcoder.

The SpaceNet 5 Challenge seeks to determine road networks and route travel times directly from satellite imagery. To lower the barrier of entry for this rather complex challenge, our previous post walked readers through the requisite data preparation steps. This post utilizes that prepped data to train a deep learning segmentation model to decipher roads and speed limits from SpaceNet 5 data. We use the City-scale Road Extraction from Satellite Imagery (CRESI) framework for our baseline road detection model, which is tuned to detect roads in satellite imagery. For a more general-purpose segmentation approach to remote sensing imagery, see the Solaris package.

1. Dataset

This section briefly summarizes Part 1 of this series. SpaceNet data can be freely downloaded from AWS, with detailed instructions at spacenet.ai. Once the data is downloaded, we install the CRESI framework and fire up a Docker container. Data can then be prepared via the scripts from Part 1 of this series, yielding something akin to Figure 1.

Figure 1. Left: sample training image in Mumbai. Middle: typical binary training mask. Right: multi-channel training mask, where red corresponds to 21–30 mph, green corresponds to 31–40 mph, and blue corresponds to 41–50 mph.
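Before training, it is worth sanity-checking the prepared data. The snippet below is a minimal sketch for visualizing an image/mask pair; the file paths are illustrative, and it assumes scikit-image and matplotlib are installed:

import matplotlib.pyplot as plt
import skimage.io

# Illustrative paths: substitute an actual 8-bit image and its multi-channel mask
im_path = "/path/to/data/cresi_data/8bit/train/PS-RGB/sample_chip.tif"
mask_path = "/path/to/data/cresi_data/train_mask_binned/sample_chip.tif"

im = skimage.io.imread(im_path)      # (H, W, 3) RGB image
mask = skimage.io.imread(mask_path)  # (H, W, C) multi-channel speed mask

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(12, 6))
ax0.imshow(im)
ax0.set_title("8-bit RGB image")
ax1.imshow(mask[:, :, :3])  # render the first three mask channels as RGB, as in Figure 1
ax1.set_title("First three mask channels")
plt.show()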

2. Setting up the JSON

CRESI uses a JSON file to store parameters for training and inference; we will use the sn5_baseline.json file.

Figure 2. Snippet of an example JSON file.

The first step is to replace the sample paths above with the appropriate paths to the locations of the CRESI codebase and the training dataset, e.g.

"path_src": "/path/to/cresi/cresi",
"path_data_root": "/path/to/data/cresi_data/",

For this model we use 3-band RGB imagery and 8 output channels, hence:

"num_channels": 3,
"num_classes": 8,

Using multiple cross-validated models can improve performance, though at the expense of increased run time at inference. For our baseline we use only a single fold, and randomly withhold 20% of the data for validation during training. We use this validation data to assess performance after each epoch, and halt training early if the validation loss fails to decrease for 8 epochs.

"num_folds": 1,
"default_val_perc": 0.20,
"early_stopper_patience": 8,

Finally, we define our network. The winning implementation of SpaceNet 3, submitted by competitor albu, used ResNet34 as the encoder with a U-Net-like decoder, and we adopt a similar architecture (in fact, albu’s SpaceNet 3 submission forms much of the framework for the CRESI codebase). We adopt a custom loss function composed of 25% Dice Loss and 75% Focal Loss, and train for up to 70 epochs:

"network": "resnet34",
"loss": {"soft_dice": 0.25, "focal": 0.75},
"nb_epoch": 70,

There are a number of other variables in the JSON file that can be altered to enhance training and testing, though we’ll leave the remainder alone for now.

3. Training a Model

We assume that the reader has built the CRESI Docker container (see Part 1 for instructions), which we now attach to on a GPU-enabled machine:

docker attach cresi_container

With the Docker container running, we can execute Python commands. First, we need to create our cross-validation folds; in our case, with just a single fold, this simply splits the data into training and validation sets:

python /path/to/cresi/cresi/00_gen_folds.py jsons/sn5_baseline.json
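With num_folds set to 1, this step amounts to a random 80/20 train/validation split over the image chips, conceptually like the sketch below (an illustrative sketch assuming numpy; the actual script also handles bookkeeping such as persisting the split for the training script to consume):

import numpy as np

def split_train_val(image_ids, val_frac=0.2, seed=42):
    # randomly withhold val_frac of the image ids for validation
    rng = np.random.RandomState(seed)
    ids = np.array(image_ids)
    rng.shuffle(ids)
    n_val = int(len(ids) * val_frac)
    return ids[n_val:], ids[:n_val]  # (train_ids, val_ids)

# e.g. train_ids, val_ids = split_train_val(all_chip_ids, val_frac=0.20)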

Now we have all the pieces in place to kick off training:

python /path/to/cresi/cresi/01_train.py jsons/sn5_baseline.json --fold=0

Although the JSON allows up to 70 epochs, let’s train for just 10 epochs (which takes roughly 10 hours on a single Titan X GPU). At the end of the 10th epoch our validation loss is:

Fold 0; Epoch 10 eval: [dice_loss=0.59513, focal=0.02761, tot_loss=0.14331]

4. Running inference

To run inference with our trained model, we must first ensure that testing imagery is rescaled in the same manner as the training images. Once the test data is downloaded, run the create_8bit_images.py script:

python /path/to/cresi/cresi/data_prep/create_8bit_images.py \
-indir=/path/to/data/SN5_roads/AOI_8_Mumbai/PS-MS \
-outdir=/path/to/data/cresi_data/8bit/public_test/PS-RGB \
-rescale_type=perc \
-percentiles=2,98 \
-band_order=5,3,2
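The heart of this script is clipping each band to its 2nd–98th percentile values and rescaling to the 0–255 range. A rough numpy sketch of that band-wise conversion (GeoTIFF I/O and georeferencing omitted):

import numpy as np

def rescale_to_8bit(band, low_perc=2, high_perc=98):
    # clip a single band to its [low, high] percentiles, then scale to 0-255
    lo, hi = np.percentile(band, (low_perc, high_perc))
    band = np.clip(band, lo, hi)
    return ((band - lo) / max(hi - lo, 1e-6) * 255.0).astype(np.uint8)

# Build the RGB composite from multispectral bands 5, 3, 2 (1-indexed, per -band_order):
# rgb = np.dstack([rescale_to_8bit(ms[..., b - 1]) for b in (5, 3, 2)])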

Now set the appropriate test paths in the JSON file:

"test_data_refined_dir": "/path/to/data/cresi_data/8bit/public_test/PS-RGB",
"test_results_dir": "sn5_baseline",

Finally, let’s execute the inference script. For the 825 images in our test set, this takes about 10 minutes on our single GPU.

python 02_eval.py jsons/sn5_baseline.json
model successfully loaded
26%|############ | 27/104 [02:48<07:29, 5.84s/it]

Now let’s inspect the outputs. Figures 3–5 illustrate that even though the loss has not yet converged after 10 hours of training, the model is nevertheless able to recognize road structure and speed limits quite well.

Figure 3. Inference example over Moscow (SN5_roads_test_public_AOI_7_Moscow_PS-RGB_chip154.tif). Red = 11–20 mph, green = 21–30 mph, blue = 31–40 mph.
Figure 4. Inference example over Mumbai (SN5_roads_test_public_AOI_8_Mumbai_PS-RGB_chip115.tif). Red = 21–30 mph, green = 31–40 mph, blue = 41–50 mph.
Figure 5. Inference example over San Juan (SN5_roads_test_public_AOI_9_San_Juan_PS-RGB_chip405.tif). Red = 21–30 mph, green = 31–40 mph, blue = 41–50 mph.
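To reproduce false-color overlays like those in Figures 3–5, one can map three of the predicted speed-bin channels to red, green, and blue. A sketch (the path and channel indices are illustrative, and depend on how the speed bins were ordered during mask creation):

import numpy as np
import skimage.io

# Illustrative path to one of the 8-bit multi-channel prediction masks
pred = skimage.io.imread("/path/to/results/sn5_baseline/sample_chip.tif")  # (H, W, C)

# Map three speed-bin channels to red/green/blue (indices are illustrative)
rgb = np.dstack([pred[..., 1], pred[..., 2], pred[..., 3]])
skimage.io.imsave("pred_rgb_overlay.png", rgb.astype(np.uint8))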

5. Conclusions

In this post we described how to train a deep learning segmentation model to extract roads and speed limit estimates from SpaceNet data. After setting up our CRESI environment and assigning the appropriate data paths in the JSON file, we trained a model for 10 hours on modest hardware (a single Titan X GPU). While the loss had not yet converged after 10 hours, the model nevertheless does a very respectable job of identifying road structure and differentiating speed limits for various road types. Stay tuned for the final installment of this series, which details the steps to turn these segmentation masks into actual road networks and create a submission for the SpaceNet 5 Challenge.
