The DownLinQ
Published in

The DownLinQ

Broad Area Satellite Imagery Semantic Segmentation (BASISS)

Extracting Road Masks from Massive SpaceNet Images

Road network detection from overhead imagery is of great interest to a diverse set of disciplines, from humanitarian crises to autonomous vehicles. Road network mapping remains especially challenging in broad expanses of underdeveloped regions. In order to help address this challenge, the latest SpaceNet dataset includes over 8000 km of road centerline labels with co-registered imagery. In a previous post we used these centerlines to create rasterized road masks. In this post we explore methods to derive road segmentation masks from SpaceNet satellite imagery, and demonstrate techniques to mitigate deep learning hardware limitations in order to infer maps over large areas. Attendant code is provided for the interested reader.

1. Existing Approaches and Data

Extracting road pixels in small image chips from aerial imagery has a rich history (e.g. Mnih and Hinton 2010, Wang et al 2016, Zhang et al 2017). These algorithms generally use lower resolution imagery (GSD >= 1 meter), and OpenStreetMap labels. The large dataset size, higher resolution (0.3 meter GSD), and hand-labeled and quality controlled labels of SpaceNet provide a significant enhancement over current datasets and provide an opportunity for algorithm improvement. SpaceNet image chips are 1300 x 1300 pixels in size (~400m x ~400m).

2. Mask Extraction

In a previous post we demonstrated how to extract road training masks from SpaceNet imagery and labels (see Figure 1). The code to create masks is slightly tweaked from the previous post, allowing either 3-band RGB or 8-band multispectral imagery to be utilized, and collects lists of files to be used for training or testing. Masks can be created by running the following bash command:

cd $basiss_path
conda env create -f src/apls_environment.yml
source activate apls_environment
python $basiss_path/ \
--path_data=/path_to_spacenet_data/AOI_2_Vegas_Train \
$basiss_path/packaged_data/AOI_2_Train_2m_file_locs.csv \
--buffer_meters=2 \
--n_bands=3 \
--make_plots=1 \
Figure 1. Example output of the script. In the upper-left we show the raw GeoJSON label, and the upper-right displays the corresponding 8-bit RGB image. The lower-left illustrates the output of the script: the pixel mask inferred from the GeoJSON label. The lower-right shows the road mask overlaid on the RGB image.

3. Segmentation Model

We apply a deep learning segmentation algorithm to determine which SpaceNet pixels belong to roads, and which to background. We experiment with multiple network architectures, though note little difference in performance among the various models. Accordingly, in this blog we focus on one of the simplest recent architectures: U-Net. We cast the training masks created in Section 2 into a 2-layer stack consisting of source (road) and background.

def unet(input_shape, n_classes=2, kernel=3,     
loss='binary_crossentropy', data_format='channels_last'):
print ("UNET input shape:", input_shape)
inputs = Input(input_shape)
conv1 = Conv2D(32, (kernel, kernel), activation='relu',
conv1 = Conv2D(32, (kernel, kernel), activation='relu',
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Conv2D(64, (kernel, kernel), activation='relu', padding='same')(pool1)...

4. Image Slicing

Even for a relatively simple architecture such as U-Net, typical GPU hardware (NVIDIA Titan X GPU with 12 GB memory) will saturate for images > 1000 pixels in extent and reasonable batch sizes > 4. Therefore, in order to evaluate even the SpaceNet image chips (let alone the larger full DigitalGlobe images), we must subdivide the imagery and training masks into smaller chunks, as detailed in the code below.

def slice_ims(im_arr, mask_arr, names_arr, slice_x, slice_y, 
stride_x, stride_y,
pos_columns = ['idx', 'name', 'xmin',
'ymin', 'slice_x',
'slice_y', 'im_x', 'im_y'],
'''Slice images into patches, assume ground truth masks
are present'''

if verbose:
print ("Slicing images and masks...")
t0 = time.time()
mask_buffer = 0
count = 0
im_list, mask_list, name_list, pos_list = [], [], [], []
nims,h,w,nbands = im_arr.shape
for i, (im, mask, name) in enumerate(zip(im_arr, mask_arr,
seen_coords = set()
if verbose and (i % 100) == 0:
print (i, "im_shape:", im.shape,
"mask_shape:", mask.shape)

# dice it up
# after resize, iterate through image
# and bin it up appropriately
for x in range(0, w - 1, stride_x):
for y in range(0, h - 1, stride_y):

xmin = min(x, w-slice_x)
ymin = min(y, h - slice_y)
coords = (xmin, ymin)

# check if we've already seen these coords
if coords in seen_coords:

# check if we screwed up binning
if (xmin + slice_x > w) or (ymin + slice_y > h):
print ("Improperly binned image,")
# get satellite image cutout
im_cutout = im[ymin:ymin + slice_y,
xmin:xmin + slice_x]

# skip if the whole thing is black
if np.max(im_cutout) < 1.:
count += 1

# get mask cutout
x1, y1 = xmin + mask_buffer, ymin + mask_buffer
mask_cutout = mask[y1:y1 + slice_y, x1:x1 + slice_x]
# set slice name
name_full = str(i) + '_' + name + '_' \
+ str(xmin) + '_' + str(ymin) + '_' \
+ str(slice_x) + '_' + str(slice_y) \
+ '_' + str(w) + '_' + str(h)
pos = [i, name, xmin, ymin, slice_x, slice_y, w, h]
# add to arrays

# convert to np arrays
del im_arr
del mask_arr
name_out_arr = np.array(name_list)
im_out_arr = np.array(im_list)
mask_out_arr = np.array(mask_list)

# create position datataframe
df_pos = pd.DataFrame(pos_list, columns=pos_columns)
df_pos.index = np.arange(len(df_pos))

if verbose:
print (" im_out_arr.shape;", im_out_arr.shape)
print (" mask_out_arr.shape:", mask_out_arr.shape)
print (" mask_out_arr[0] == mask_out_arr[1]?:",
np.array_equal(mask_out_arr[0], mask_out_arr[1]))
print (" Time to slice arrays:", time.time() - t0,

return df_pos, name_out_arr, im_out_arr, mask_out_arr
Figure 2. Process of slicing a large satellite image (top) and ground truth road mask (bottom) into smaller cutouts for algorithm training or inference.

5. Training Procedure

Sliced mask cutouts (400 pixels in extent by default) form the training images for our segmentation algorithm. We train a separate classifier with Keras for each SpaceNet city (Las Vegas, Paris, Shanghai, Khartoum). We hold back 10% of the training set for validation purposes. We use binary cross entropy for our loss function, and utilize early stopping (we stop training when the validation loss has not improved for four consecutive steps). For the Las Vegas dataset, for example, training converges after 17 epochs, which takes 15 hours. Training can be accomplished via the command:

# train Las Vegas SpaceNet 3-band data with unet, 
# and sliced into 400 pixel cutouts
cd $basiss_path
nohup python -u src/ \
--path $basiss_path \
--model unet \
--mode train \
--file_list AOI_2_Train_2m_file_locs.csv \
--slice_x 400 --slice_y 400 \
--stride_x 300 --stride_y 300 \
--n_bands 3 \
--n_classes 2 \
--batchsize 32 \
--validation_split 0.1 \
--early_stopping_patience 4 \
--epochs 128 \
--gpu 0 \
--prefix $outname > \
results/$outname.log \
& tail -f results/$outname.log

6. Testing Procedure

Test images are sliced as in Section 4 above, and run through the trained model. The multiple cutouts for each image are then aggregated into a final image, as detailed in Figure 3.

Figure 3. Create road prediction map for image of arbitrary size. 1. Run an overlapping sliding window over the large input image. 2. Run image chip through the segmentation algorithm. 3. Stitch image chips into total map. 4. Divide this map by the coverage map (how many times each pixel appeared in a sliding window). 5. Final output is the normalized road mask.

Testing is invoked with the following command:

# test Las Vegas SpaceNet 3-band data with unet, 
# and sliced into 400 pixel cutouts
cd $basiss_path
nohup python -u src/ \
--path $basiss_path \
--model unet \
--mode test \
--file_list massive_file_list.csv \
--model_weights AOI_2_Vegas_unet_2m_train_model_best.hdf5 \
--slice_x 400 --slice_y 400 \
--stride_x 300 --stride_y 300 \
--n_bands 3 \
--n_classes 2 \
--batchsize 16 \
--gpu 3 \
--prefix $outname > \
results/$outname.log & \
tail -f results/$outname.log

7. Results

Figure 4. 400m chips over Vegas. Ground truth labels are shown at left, with the predicted mask on the right. Predictions for both of these image chips are quite good (pixel-based F1 > 0.8).
Figure 5. More challenging 400m chips in Khartoum (Top: F1 = 0.59. Bottom: F1 = 0.44). Dirt roads are particularly difficult, and dirt paths are often confused with roads.

Using the procedure detailed in Figure 3, we can also evaluate much larger images. In Figure 6 we illustrate road predictions on large images over 8000 pixels in extent. GPU inference time is ~15 seconds for images 2.5 km in extent.

Figure 6. 2.5 x 2.5 km image chips (left column) and road prediction mask (right column) over Las Vegas. F1 = 0.75 for both images.

8. Conclusions

The SpaceNet Road Detection and Routing Challenge aims to extract road network graphs directly from satellite imagery. In this post we explore one of the steps in the algorithmic chain, namely how to infer road masks from SpaceNet imagery. GPU memory limitations constrain segmentation algorithms to inspect images of size ~1000 pixels in extent, yet any eventual application of road inference must be able to process images larger than ~300m in extent. Accordingly, we demonstrate methods to infer road network masks for images of arbitrary size, and illustrate examples for images 2.5 km (>8000 pixels) in extent. Subsequent posts will explore how to turn these masks into graph structures. We encourage the interested reader to explore our github repository, as well as the SpaceNet challenge hosted on TopCoder.

* Thanks to lporter for helpful comments and David Lindenbaum for assistance with ground truth evaluation




As of March 2021, CosmiQ Works has been folded into IQT Labs. An archive will remain here to showcase historical work from CosmiQ Works that took place July 2016 — March 2021.

Recommended from Medium

Basic Convolutional Neural Network (CNN) Architecture

A step-by-step approach to building a machine learning model

Evaluate ML Classifier Performance using Statistical Hypothesis Testing in Python

Consider these 3 things when choosing your Machine Learning partner

Algorithms In Context #7: Decision Trees & Alpha-Beta Pruning

Build a chatbot using Rasa

MIT Deep Learning for Self-Driving Cars and the DeepTraffic Project

Start your first research project applying machine learning to real-world data in 2022

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Adam Van Etten

Adam Van Etten

More from Medium

The Potent Mix of Machine Learning, Satellite Imagery, Vehicles, and Roads: Diet Hadrade

Building a coral segmentation model using sparse data

A timelapse of coral reefs turning white from dying.


Image Classification on Satellite Images with Deep Learning