Extracting Road Networks at Scale with SpaceNet
The fifth SpaceNet Challenge will launch in just a few weeks, focused on road networks and optimized routing via travel time estimates. In preparation for SpaceNet 5, this post discusses how one might build upon the results from open challenges, such as the first SpaceNet roads challenge (SpaceNet 3).
Specifically, we summarize our arXiv paper from a few months ago (April 2019), which aims to extract road networks at scale, an approach we call City-scale Road Extraction from Satellite Imagery (CRESI). Road network extraction at scale is currently of high interest (e.g. [1]) and is applicable to a number of fundamental societal challenges.
1. Narrow-Field Baseline Algorithm
As a first step, we train a model on the ~400 x 400 meter SpaceNet image chips. We use the hand-labelled road centerline GeoJSONs to build a road mask for input into a deep learning model (see Figure 1).
We train an ensemble of four segmentation models inspired by the winning SpaceNet 3 algorithm submitted by albu, and use a ResNet34 encoder with a U-Net-inspired decoder. We include skip connections at every layer of the network, and train with an Adam optimizer and a custom loss function that combines binary cross entropy (BCE) with the Dice coefficient.
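The exact form and weighting of the loss are not reproduced in this text. As a rough illustration only, the sketch below assumes a weighted sum of the form alpha * BCE + (1 - alpha) * (1 - Dice). The function names, the weight alpha, and the use of (1 - Dice) so that better overlap lowers the loss are all assumptions made for this example, not details taken from the paper.

```python
import math

def bce(pred, target, eps=1e-7):
    """Mean binary cross entropy over flattened pixel probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)

def dice(pred, target, eps=1e-7):
    """Soft Dice coefficient: 2 * overlap / (sum(pred) + sum(target))."""
    overlap = sum(p * t for p, t in zip(pred, target))
    return (2.0 * overlap + eps) / (sum(pred) + sum(target) + eps)

def combined_loss(pred, target, alpha):
    """Assumed form: alpha * BCE + (1 - alpha) * (1 - Dice)."""
    return alpha * bce(pred, target) + (1.0 - alpha) * (1.0 - dice(pred, target))

# Illustrative call with an arbitrary weight of 0.5 (not a value from the paper).
good = combined_loss([1.0, 0.0, 1.0], [1, 0, 1], alpha=0.5)
bad = combined_loss([0.0, 1.0, 0.0], [1, 0, 1], alpha=0.5)
```

A perfect prediction drives both terms toward zero, while an inverted prediction is penalized by both the BCE term and the Dice term.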
We also attempt to close small gaps and remove any remaining spurious connections by removing unconnected subgraphs, cleaning out hanging edges, and connecting terminal vertices near non-connected nodes. The final narrow-field baseline algorithm consists of the steps detailed in Table 1 and illustrated in Figure 2.
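Two of these graph clean-up steps (dropping small unconnected subgraphs and pruning short hanging edges) can be sketched as follows. This is a minimal illustration rather than the CRESI implementation: the graph is a plain dictionary of the form adj[node] = {neighbor: edge_length} to keep the example dependency-free, and the thresholds min_component_len and min_spur_len are hypothetical parameters. The third step, connecting terminal vertices to nearby nodes, is not shown.

```python
def connected_components(adj):
    """Yield one set of nodes per connected component."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(m for m in adj[n] if m not in comp)
        seen |= comp
        yield comp

def component_length(adj, comp):
    """Total edge length; each undirected edge is stored twice, so halve."""
    return sum(adj[n][m] for n in comp for m in adj[n]) / 2.0

def remove_node(adj, n):
    """Delete node n and every edge that touches it."""
    for m in list(adj[n]):
        del adj[m][n]
    del adj[n]

def clean_graph(adj, min_component_len, min_spur_len):
    """Return a cleaned copy of adj; the input graph is left untouched."""
    adj = {n: dict(nbrs) for n, nbrs in adj.items()}
    # Step 1: drop subgraphs whose total length is below the threshold.
    for comp in list(connected_components(adj)):
        if component_length(adj, comp) < min_component_len:
            for n in comp:
                remove_node(adj, n)
    # Step 2: prune short dead-end edges that hang off a junction.
    for n in list(adj):
        if n in adj and len(adj[n]) == 1:
            (m, length), = adj[n].items()
            if length < min_spur_len and len(adj[m]) > 2:
                remove_node(adj, n)
    return adj

# Main road a-b-c-d, a short spur b-s, and an isolated fragment x-y.
roads = {
    "a": {"b": 100}, "b": {"a": 100, "c": 100, "s": 5},
    "c": {"b": 100, "d": 100}, "d": {"c": 100}, "s": {"b": 5},
    "x": {"y": 10}, "y": {"x": 10},
}
cleaned = clean_graph(roads, min_component_len=50, min_spur_len=20)
```

Requiring the neighbor to be a junction (degree above 2) keeps the pruning from eating away the ends of legitimate roads.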
2. Comparison With OSM
OpenStreetMap (OSM) is a crowd-sourced resource curated by a community of volunteers, and consists primarily of hand-drawn road labels. Though OSM is a great resource, it is incomplete in many areas (see Figure 3).
As a means of comparison between OSM and SpaceNet labels, we use our baseline algorithm to train two models on SpaceNet imagery. One model uses ground truth masks rendered from OSM labels, while the other model uses the exact same algorithm, but uses ground truth segmentation masks rendered from SpaceNet labels.
Table 2 displays APLS scores computed over a subset of the SpaceNet test chips, and demonstrates that the model trained and tested on SpaceNet labels is far superior to other combinations, with a ≈ 60–90% improvement. Recall that APLS penalizes missed connections, spurious roads, and offset predictions. In this case, the model trained on SpaceNet data and tested on OSM data struggles since spurious roads are predicted, and some predicted roads are offset from the ground truth. The poor score for the model trained and tested on OSM is due in part to the more uniform labeling schema and validation procedures adopted by the SpaceNet labeling team compared to OSM, and in part to offset labels.
3. Scaling to Large Images
The process detailed in Section 1 works well for small input images below ∼2000 pixels in extent, yet fails for larger images because GPU memory saturates. For example, even for a relatively simple architecture such as U-Net, typical GPU hardware (NVIDIA Titan X GPU with 12 GB memory) will saturate for images > 2000 pixels in extent and reasonable batch sizes > 4. In this section we describe a straightforward methodology for scaling up the algorithm to larger images. We call this approach City-scale Road Extraction from Satellite Imagery (CRESI). The first step in this methodology is provided by the Broad Area Satellite Imagery Semantic Segmentation (BASISS) methodology; this approach is outlined in Figure 6, and returns a road pixel mask for a large test image.
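One common way to keep per-inference memory bounded on a large image is to run the model over overlapping windows and stitch the results. The sketch below assumes that kind of sliding-window scheme, with overlapping predictions averaged; it is an illustration of the idea, not the BASISS code. The names tile_origins, stitch, and predict, and the window and stride values, are hypothetical.

```python
def tile_origins(extent, window, stride):
    """Start offsets of windows of size `window` that cover [0, extent)."""
    if stride > window:
        raise ValueError("stride must not exceed window, or pixels are skipped")
    if extent <= window:
        return [0]
    origins = list(range(0, extent - window + 1, stride))
    if origins[-1] != extent - window:
        origins.append(extent - window)  # last window sits flush with the edge
    return origins

def stitch(height, width, window, stride, predict):
    """Average per-window predictions into one full-size probability mask.

    predict(y0, x0, h, w) stands in for the segmentation model and must
    return an h x w nested list of road probabilities for that window.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for y0 in tile_origins(height, window, stride):
        for x0 in tile_origins(width, window, stride):
            h = min(window, height - y0)
            w = min(window, width - x0)
            tile = predict(y0, x0, h, w)
            for dy in range(h):
                for dx in range(w):
                    total[y0 + dy][x0 + dx] += tile[dy][dx]
                    count[y0 + dy][x0 + dx] += 1
    return [[total[y][x] / count[y][x] for x in range(width)]
            for y in range(height)]

# Toy run: a stand-in "model" that predicts road everywhere.
mask = stitch(5, 7, window=4, stride=3,
              predict=lambda y0, x0, h, w: [[1.0] * w for _ in range(h)])
```

Averaging in the overlap regions is one simple way to soften seams at window boundaries; the peak memory needed per inference call then depends on the window size rather than on the full image extent.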
4. Results
We apply the CRESI algorithm to large test areas extracted from all four SpaceNet 3 cities. Solutions from the SpaceNet 3 challenge maxed out at an APLS score of 0.67. Testing over the four cities with CRESI yields an APLS score of 0.69 ± 0.02.
Since the algorithm output is a NetworkX graph structure, myriad graph algorithms can be easily applied. In addition, since we retain geographic information throughout the graph creation process, we can overlay the graph nodes and edges on the original GeoTIFF that we input into our model. Figures 6 and 7 display portions of Las Vegas and Paris, respectively, overlaid with the inferred road network. Figure 7 demonstrates that road network extraction is possible even for atypical lighting conditions and off-nadir observation angles, and also that CRESI lends itself to optimal routing in complex road systems.
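As an example of the kind of graph algorithm that can be applied to the output, the sketch below runs Dijkstra's shortest-path search over a small road graph. It uses a plain dictionary of the form adj[node] = {neighbor: edge_weight} so that it runs without any dependencies; the node names and weights are made up for illustration. On an actual NetworkX graph the equivalent would be a call such as nx.shortest_path(G, source, target, weight="length"), where the edge attribute name "length" is an assumption.

```python
import heapq

def shortest_path(adj, source, target):
    """Dijkstra over adj[node] = {neighbor: weight}. Returns (cost, path)."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            # Walk the predecessor links back to the source.
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for nbr, weight in adj[node].items():
            new_cost = cost + weight
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    return float("inf"), []  # target not reachable from source

# Hypothetical intersections with edge lengths in meters.
roads = {
    "A": {"B": 100, "C": 250},
    "B": {"A": 100, "C": 100, "D": 400},
    "C": {"A": 250, "B": 100, "D": 120},
    "D": {"B": 400, "C": 120},
    "E": {},
}
cost, path = shortest_path(roads, "A", "D")
```

Swapping the edge weight from length to an estimated travel time would turn the same search into a fastest-route query.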
5. Inference Speed
Inference code has not been optimized for speed, but even so inference runs at a rate of 160 km² (approximately the area of Washington D.C.) per hour on a single GPU machine. On a four GPU cluster the speed is a minimum of 370 km²/hour.
6. Conclusions
Optimized routing is crucial to a number of challenges, from humanitarian to military. Satellite imagery may aid greatly in determining efficient routes, particularly in cases of natural disasters or other dynamic events where the high revisit rate of satellites may be able to provide updates far faster than terrestrial methods.
In this post we summarized methods detailed in our arXiv paper to extract city-scale road networks directly from remote sensing imagery. We demonstrated methods to infer road networks for input images of arbitrary size, which can subsequently be used for a multitude of purposes in resource-starved or dynamic environments.
Stay tuned for more updates on road network extraction in the lead-up to the September 2019 launch of SpaceNet 5.