Extracting Road Networks at Scale with SpaceNet

Adam Van Etten
The DownLinQ
Jul 26, 2019


The fifth SpaceNet Challenge will launch in just a few weeks, focused on road networks and optimized routing via travel time estimates. In preparation for SpaceNet 5, this post discusses how one might build upon the results from open challenges, such as the first SpaceNet roads challenge (SpaceNet 3).

Specifically, we summarize our arXiv paper from a few months ago (April 2019), which aims to extract road networks at scale, an approach we call City-scale Road Extraction from Satellite Imagery (CRESI). Road network extraction at scale is currently of high interest (e.g. [1]), and applicable to a number of fundamental societal challenges.

1. Narrow-Field Baseline Algorithm

As a first step, we train a model on the ~400 x 400 meter SpaceNet image chips. We utilize the hand-labelled road centerline GeoJSONs to build a road mask for input into a deep learning model (see Figure 1).

Figure 1. SpaceNet Training Data. Left: SpaceNet GeoJSON label. Middle: Image with road network overlaid in orange. Right: Segmentation mask of road centerlines.
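The mask-building step can be sketched as follows. This is a toy rasterizer working directly in pixel coordinates; real pipelines typically use geospatial tooling (e.g. GDAL) to burn georeferenced GeoJSON geometries into a raster, and the function name and road half-width below are illustrative assumptions, not the actual SpaceNet tooling.

```python
import numpy as np

def rasterize_centerlines(segments, shape, half_width=1):
    """Burn centerline segments into a binary mask by dense sampling,
    dilated to a fixed half-width (a stand-in for road buffer width)."""
    mask = np.zeros(shape, dtype=np.uint8)
    for (r0, c0), (r1, c1) in segments:
        # sample enough points along the segment to touch every pixel
        n = int(max(abs(r1 - r0), abs(c1 - c0))) + 1
        for t in np.linspace(0.0, 1.0, n):
            r = int(round(r0 + t * (r1 - r0)))
            c = int(round(c0 + t * (c1 - c0)))
            rlo, rhi = max(r - half_width, 0), min(r + half_width + 1, shape[0])
            clo, chi = max(c - half_width, 0), min(c + half_width + 1, shape[1])
            mask[rlo:rhi, clo:chi] = 1
    return mask
```

For example, a single horizontal centerline across a 10 x 10 chip yields a 3-pixel-wide band of road pixels in the mask.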

We train an ensemble of four segmentation models inspired by the winning SpaceNet 3 algorithm submitted by albu, using a ResNet34 encoder with a U-Net-inspired decoder. We include skip connections at every layer of the network, use the Adam optimizer, and adopt a custom loss function of the form:

L = α · BCE + (1 − α) · (1 − Dice)

where BCE is binary cross entropy, Dice is the Dice coefficient, and α is a weighting parameter.
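In NumPy, this combined loss can be sketched as follows; the weight `alpha=0.75` is purely illustrative (the actual value is a training hyperparameter), and the helper names are ours, not the paper's.

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    # pixel-averaged binary cross entropy
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1.0 - y_true) * np.log(1.0 - y_pred)))

def dice_coeff(y_true, y_pred, eps=1e-7):
    # Dice coefficient: 2|A ∩ B| / (|A| + |B|)
    inter = np.sum(y_true * y_pred)
    return float((2.0 * inter + eps) / (np.sum(y_true) + np.sum(y_pred) + eps))

def combined_loss(y_true, y_pred, alpha=0.75):
    # weighted sum of BCE and the Dice loss (1 - Dice); alpha is illustrative
    return (alpha * bce_loss(y_true, y_pred)
            + (1.0 - alpha) * (1.0 - dice_coeff(y_true, y_pred)))
```

A perfect prediction drives both terms to (near) zero, while an all-background prediction against an all-road mask is heavily penalized by both.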

We also attempt to close small gaps and remove spurious connections by removing unconnected subgraphs, cleaning out hanging edges, and connecting terminal vertices to nearby disconnected nodes. The final narrow-field baseline algorithm consists of the steps detailed in Table 1 and illustrated in Figure 2.

Figure 2. Initial baseline algorithm. Left: Using road masks, we train a segmentation model to infer road masks from SpaceNet imagery. Left center: These output masks are then refined and smoothed. Right center: A skeleton is created from this refined mask. Right: Finally, this skeleton is rendered into a graph structure.
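The graph-cleaning heuristics above can be sketched with NetworkX. The function name, thresholds, and `length` edge attribute here are illustrative assumptions, not the exact CRESI implementation (which also reconnects nearby terminal vertices, a step omitted here for brevity).

```python
import networkx as nx

def clean_graph(G, min_nodes=4, max_spur_len=3.0):
    """Illustrative post-processing: drop tiny disconnected subgraphs and
    prune short hanging edges. Assumes each edge carries a 'length'
    attribute; the threshold values are examples, not those used in CRESI."""
    G = G.copy()
    # remove small unconnected subgraphs
    for comp in list(nx.connected_components(G)):
        if len(comp) < min_nodes:
            G.remove_nodes_from(comp)
    # clean hanging edges: terminal (degree-1) nodes on very short spurs
    for n in [node for node, deg in G.degree() if deg == 1]:
        nbr = next(iter(G.neighbors(n)))
        if G.edges[n, nbr].get("length", 0.0) < max_spur_len:
            G.remove_node(n)
    return G
```

Run on a toy graph with a five-node main road, a one-unit spur, and a detached two-node fragment, this keeps only the main road.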

2. Comparison With OSM

OpenStreetMap (OSM) is a great crowd-sourced resource curated by a community of volunteers, consisting primarily of hand-drawn road labels. Though a valuable resource, OSM is incomplete in many areas (see Figure 3).

Figure 3. Potential issues with OSM data. Left: OSM roads (orange) overlaid on Khartoum imagery; the road traveling left to right across the image is missed. Right: OSM road labels (orange) and SpaceNet building footprints (yellow); in some cases road labels are misaligned and pass through buildings.

As a means of comparing OSM and SpaceNet labels, we use our baseline algorithm to train two models on SpaceNet imagery: one uses ground truth masks rendered from OSM labels, while the other uses the exact same algorithm but with ground truth segmentation masks rendered from SpaceNet labels.

Table 2 displays APLS scores computed over a subset of the SpaceNet test chips, and demonstrates that the model trained and tested on SpaceNet labels is far superior to the other combinations, with a ≈ 60 − 90% improvement. Recall that APLS penalizes missed connections, spurious roads, and offset predictions. In this case, the model trained on SpaceNet data and tested on OSM data struggles because it predicts spurious roads and some predicted roads are offset from the ground truth. The poor score for the model trained and tested on OSM stems partly from the more uniform labeling schema and validation procedures adopted by the SpaceNet labeling team compared to OSM, and partly from offset labels.

3. Scaling to Large Images

The process detailed in Section 1 works well for small input images below ∼ 2000 pixels in extent, yet fails for larger images due to saturation of GPU memory. For example, even for a relatively simple architecture such as U-Net, typical GPU hardware (an NVIDIA Titan X GPU with 12 GB of memory) will saturate for images > 2000 pixels in extent at reasonable batch sizes > 4. In this section we describe a straightforward methodology for scaling up the algorithm to larger images, the approach we call City-scale Road Extraction from Satellite Imagery (CRESI). The first step in this methodology is provided by the Broad Area Satellite Imagery Semantic Segmentation (BASISS) methodology; this approach is outlined in Figure 4, and returns a road pixel mask for a large test image.

Figure 4. Process of slicing a large satellite image (top) and ground truth road mask (bottom) into smaller cutouts for algorithm training or inference.
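A minimal sketch of the slicing step is below; the actual BASISS code also handles georeferencing and stitching predictions back together, and the tile size and overlap values here are illustrative defaults, not the ones used in our experiments.

```python
import numpy as np

def tile_image(img, tile_size=512, overlap=64):
    """Slice a large image into overlapping tiles for training or inference.
    Returns (row, col, tile) triples so per-tile predictions can later be
    stitched back into a full-size mask; the overlap mitigates edge artifacts."""
    stride = tile_size - overlap
    h, w = img.shape[:2]
    tiles = []
    for r in range(0, max(h - overlap, 1), stride):
        for c in range(0, max(w - overlap, 1), stride):
            tiles.append((r, c, img[r:r + tile_size, c:c + tile_size]))
    return tiles
```

With these defaults, a 1000 x 1000 pixel image yields a 3 x 3 grid of overlapping cutouts.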

4. Results

We apply the CRESI algorithm to large test areas extracted from all four SpaceNet 3 cities. Solutions from the SpaceNet 3 challenge maxed out at an APLS score of 0.67. Testing over the four cities with CRESI yields an APLS score of 0.69 ± 0.02.

Figure 5. Output of CRESI inference as applied to a test region in Shanghai.

Since the algorithm output is a NetworkX graph structure, myriad graph algorithms can be easily applied. In addition, since we retain geographic information throughout the graph creation process, we can overlay the graph nodes and edges on the original GeoTIFF that we input into our model. Figures 6 and 7 display portions of Las Vegas and Paris, respectively, overlaid with the inferred road network. Figure 7 demonstrates that road network extraction is possible even for atypical lighting conditions and off-nadir observation angles, and also that CRESI lends itself to optimal routing in complex road systems.

Figure 6. Output of CRESI inference as applied to one of the Las Vegas test regions with predicted roads overlaid in yellow. The APLS score for this prediction is 0.86. There are 1023.9 km of labeled SpaceNet roads in this region, and 1029.8 km of predicted roads.
Figure 7. Optimal route (red) computed between two nodes of interest on the graph output of CRESI for a subset of the Paris test region. This figure illustrates that road extraction and route planning is possible even for atypical (i.e. dark) lighting conditions.
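Because the output is a NetworkX graph, routing reduces to a standard shortest-path call. The toy graph below stands in for a CRESI output graph whose edges carry a length (or, in SpaceNet 5, a travel-time) attribute; the node names and lengths are invented for illustration.

```python
import networkx as nx

# toy stand-in for a CRESI output graph; edge lengths in meters
G = nx.Graph()
G.add_edge("a", "b", length=100.0)
G.add_edge("b", "c", length=120.0)
G.add_edge("a", "c", length=300.0)

# optimal route between two nodes of interest, weighted by edge length
route = nx.shortest_path(G, "a", "c", weight="length")
dist = nx.shortest_path_length(G, "a", "c", weight="length")
# the 220 m route via "b" beats the 300 m direct edge
```

Swapping `weight="length"` for a travel-time attribute gives time-optimal rather than distance-optimal routes, which is precisely the framing of the upcoming SpaceNet 5 challenge.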

5. Inference Speed

Inference code has not been optimized for speed, but even so inference runs at a rate of 160 km2 (approximately the area of Washington D.C.) per hour on a single-GPU machine. On a four-GPU cluster the speed is a minimum of 370 km2/hour.

6. Conclusions

Optimized routing is crucial to a number of challenges, from humanitarian to military. Satellite imagery may aid greatly in determining efficient routes, particularly in cases of natural disasters or other dynamic events where the high revisit rate of satellites may be able to provide updates far faster than terrestrial methods.

In this blog we summarized methods detailed in our arXiv paper to extract city-scale road networks directly from remote sensing imagery. We demonstrated methods to infer road networks for input images of arbitrary size, which can subsequently be used for a multitude of purposes in resource-starved or dynamic environments.

Stay tuned for more updates on road network extraction in the lead-up to the September 2019 launch of SpaceNet 5.