Road Network and Travel Time Extraction from Multiple Look Angles with SpaceNet Data

Adam Van Etten
Published in The DownLinQ
May 21, 2020


Road network discovery from remote sensing data remains a challenging task, despite recent open source advancements (e.g. SN5 models). Even more challenging is extracting road network features (such as travel time) from non-ideal data. In this post we summarize a recent research paper written by the CosmiQ team (Adam Van Etten, Jake Shermeyer, Daniel Hogan, Nick Weir, Ryan Lewis), which has been accepted to the 2020 IEEE International Geoscience and Remote Sensing Symposium. We demonstrate that road networks can be accurately extracted from off-nadir satellite imagery (up to roughly 40 degrees off-nadir), and that road travel time estimates can be identified with very high precision.

1. Introduction

Using convolutional neural networks to interpret overhead imagery has applications in disaster response [1, 2], agriculture [3], and many other domains [4]. Remote sensing satellites with high spatial resolution often have to point their sensors off-nadir to capture areas of interest if they are not directly overhead. This is particularly common in real-world use cases when timely collection and analysis is required, necessitating collection from an oblique (off-nadir) look angle. This motivates an analysis of how viewing angle affects deep learning model performance. The effect of viewing angle on model performance for finding building footprints has been previously studied [5], but here an analogous study for roads is undertaken for the first time. The ability to construct a road network from a satellite image is representative of a broad class of geospatial deep learning problems, while also being intrinsically valuable for routing during rapidly-changing conditions. Key to this capability is going beyond pixel segmentation — to extracting and evaluating a graph-theoretic representation of a road network.

2. Dataset

In brief (see [5] for more detail), the SpaceNet Multi-View Overhead Imagery (MVOI) dataset comprises 27 distinct collects acquired during a single pass of the Maxar WorldView-2 satellite over Atlanta. These looks range from 7 to 53 degrees off-nadir, all covering the same 665 square kilometers of geographic area.

MVOI includes manually curated labels for machine learning: >120,000 building footprint polygons and ≈3,000 km of road network centerlines. The road network labels contain metadata indicating number of lanes, road type (residential surface road, major highway, etc.), and surface type. These attributes dictate estimated safe travel speed [6]. The images and labels are tiled into 450m × 450m (0.20 square kilometer) chips for machine learning, and multi-channel masks are created for training a segmentation model.
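To make the attribute-to-speed mapping concrete, here is a minimal sketch of how road metadata might be translated into an estimated safe travel speed. The attribute names and speed values below are illustrative assumptions for this post, not the official SpaceNet schema or the exact rules of [6].

```python
# Hypothetical speed lookup from road metadata (illustrative values only).
def estimated_speed_mph(road_type: str, num_lanes: int, paved: bool) -> int:
    """Return a rough safe-speed estimate (mph) for a road segment."""
    base = {
        "residential": 25,
        "secondary": 45,
        "motorway": 55,
    }.get(road_type, 30)      # fall back to a conservative default
    if not paved:
        base = min(base, 20)  # unpaved surfaces cap the safe speed
    if num_lanes >= 3:
        base += 10            # wider roads tend to support higher speeds
    return base

print(estimated_speed_mph("residential", 2, True))  # → 25
print(estimated_speed_mph("motorway", 4, True))     # → 65
```

A lookup of this form is what lets the training masks encode speed as a per-pixel class, as shown in Figure 1.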

Figure 1: Sample training data. (a) Example Atlanta training chip. (b) Multi-channel training mask with roads colored by speed (red = 25 mph, green = 45 mph, blue = 55 mph).

3. Algorithmic Approach

We utilize the open source City-scale Road Extraction from Satellite Imagery v2 (CRESIv2) algorithm [6] that served as the baseline for the recent SpaceNet 5 Challenge focused on road networks and optimized routing from satellite imagery. Figure 2 illustrates the process by which CRESIv2 rapidly extracts roads from satellite imagery. Inference runs at ~300 square kilometers per hour using a single Titan X GPU.

Figure 2. CRESI algorithm. First, images are segmented via a deep learning model; the multi-class mask is then refined and skeletonized. This skeleton is transformed into a road graph, and the speed bins are used to infer travel time for each road segment.
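The final pipeline step, attaching travel times to graph edges, can be sketched as follows. This is not the CRESI implementation; the segment endpoints, lengths, and speeds are made-up example values, and the graph library is NetworkX.

```python
# Minimal sketch: turn extracted road segments (with inferred speeds)
# into a routable graph whose edge weights are travel times.
import networkx as nx

segments = [
    # (node_a, node_b, length_m, speed_mph) — illustrative values
    ("A", "B", 400.0, 25.0),
    ("B", "C", 800.0, 45.0),
    ("A", "C", 1600.0, 55.0),
]

MPH_TO_MPS = 0.44704
G = nx.Graph()
for u, v, length_m, speed_mph in segments:
    travel_time_s = length_m / (speed_mph * MPH_TO_MPS)
    G.add_edge(u, v, length=length_m, travel_time=travel_time_s)

# The fastest route need not be the shortest: here the direct A–C highway
# beats the shorter (1200 m vs 1600 m) but slower A–B–C route on time.
fastest = nx.shortest_path(G, "A", "C", weight="travel_time")
print(fastest)  # → ['A', 'C']
```

This distinction between length-optimal and time-optimal paths is exactly what the APLS_length and APLS_time metrics below probe.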

4. Experiments

We begin by training a model solely on the most nadir (7 degree) imagery, to mimic typical collects of remote sensing imagery. To explore off-nadir data, we train models by combining all data within a certain angle range. We use four nadir bins: ‘NADIR’ (≤25 degrees), ‘OFF’ (26–39 degrees), ‘VOFF’ (≥40 degrees), and ‘ALL’.

Scoring is accomplished via the graph-theoretic Average Path Length Similarity (APLS) metric [7]. This metric sums the differences in optimal path lengths between nodes in the ground truth graph G and the proposal graph G′. APLS scales from 0 (terrible) to 1 (perfect). The definition of shortest path is user-defined; we focus on the APLS_time metric, which measures differences in travel times between ground truth and proposal graphs, but we also consider geographic distance as the measure of path length (APLS_length).
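A simplified, one-directional sketch of the APLS idea is shown below: for each node pair, compare the optimal path length in the ground truth graph to that in the proposal graph, capping each error at 1. The full metric [7] also symmetrizes over both graphs and handles node snapping, so treat this as an approximation of the concept rather than the official implementation.

```python
# Simplified one-way APLS: 1 minus the mean capped relative path-length error.
import networkx as nx

def apls_one_way(G_gt, G_prop, weight="travel_time"):
    """Average Path Length Similarity, ground truth → proposal direction."""
    diffs = []
    nodes = list(G_gt.nodes)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if not nx.has_path(G_gt, u, v):
                continue
            L_gt = nx.shortest_path_length(G_gt, u, v, weight=weight)
            if u in G_prop and v in G_prop and nx.has_path(G_prop, u, v):
                L_p = nx.shortest_path_length(G_prop, u, v, weight=weight)
                diffs.append(min(1.0, abs(L_gt - L_p) / L_gt))
            else:
                diffs.append(1.0)  # missing node or path: maximal error
    return 1.0 - sum(diffs) / len(diffs) if diffs else 0.0

# Toy example: the proposal slightly overestimates one edge's travel time.
G_gt = nx.Graph()
G_gt.add_edge("A", "B", travel_time=10)
G_gt.add_edge("B", "C", travel_time=10)
G_prop = nx.Graph()
G_prop.add_edge("A", "B", travel_time=10)
G_prop.add_edge("B", "C", travel_time=12)
print(round(apls_one_way(G_gt, G_prop), 2))  # → 0.9
```

Note how a single mis-estimated edge penalizes every path that traverses it, which is why APLS rewards topologically and temporally accurate graphs.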

5. Results

Figure 3 displays the inferred speed for a selection of test chips, with successful differentiation of speed for different road types. Intriguingly, we observe near equal performance when weighting edges with length or travel time, with an APLS_length score only 0.03 higher than APLS_time. These results indicate that road speeds and travel times are extracted quite precisely, as any error in travel time would compound existing errors in the road network topology.

Figure 3: Speed inference results, ‘ALL’ model. Roads are colored by speed, from yellow (25 mph) to red (65 mph). APLS_time scores are displayed in the bottom left corner of each chip.

In Figure 4 we show predictions and ground truth for multiple nadir angles and test chips. Note that the algorithm frequently succeeds in connecting roads even when overhanging trees obscure the road. The model also has some limited success in connecting occluded roads behind buildings.

Figure 4: Results at different look angles, ‘ALL’ model. Each row is the same geographic chip, with each column a unique observation angle. Ground truth labels are colored in cyan, with model predictions in orange. APLS_time scores are displayed in the bottom right of each chip.

Figure 5 shows the aggregate APLS_time performance at each nadir angle for each of the five models. This figure renders south-facing nadir angles as negative, and illustrates that each model performs well in the angle bin it was trained in. Of particular import is that the model trained on all bins (i.e. ‘ALL’) matches the performance (within errors) of the bin-specific (e.g. ‘VOFF’) models. Evidently, the ‘ALL’ model incorporating all training angles is far more robust across the full range of angles than any bin-specific model, with APLS_time > 0.5 for nadir angles between -32 and +36 degrees. In the very off-nadir bin of 40 degrees or greater we observe a marked drop in performance, with APLS_time ≈ 0.2 at the highest nadir angle of 53 degrees.

Figure 5: Results for each model. APLS_time scores at each nadir angle for each of the five trained models. The x-axis is the value of the nadir angle (south-facing angles are <0). We also plot the standard error of the mean for the combined model.

6. Conclusions

We utilize the heretofore unexplored SpaceNet MVOI road labels to train models for road network and travel time extraction from both on- and off-nadir imagery. For a model trained solely on a single nadir collect (taken at a mere 7 degrees off-nadir) we achieve reasonable APLS scores out to ≈25 degrees off-nadir, though at higher nadir angles performance with this model drops precipitously. We find that incorporating all available data regardless of inclination angle into one model is far more robust than training bin-specific models on a subset of look angles. This global model achieves scores of APLS_time > 0.5 for nadir angles between -32 and +36 degrees, though road network inference at very high off-nadir angles of ≥45 degrees is extremely challenging.

Road network extraction performance at off-nadir angles has a somewhat different functional form than building extraction at off-nadir. Comparing Figure 5 to [5], we note that for buildings the bin-specific model outperforms the global model in the very off-nadir regime; yet for roads we find that a global model performs well at all angles. For roads, we observe a 65% drop in performance between nadir and 53 degrees off-nadir; contrast this to buildings, where published results indicate a 91% drop in score between nadir and 53 degrees off-nadir. It appears that inferring occluded roads in high off-nadir shots is easier than inferring building footprints. This may be due in part to the greater utility that context plays for roads; since roads are usually connected, surrounding roadways are able to inform occluded roads, while surrounding buildings yield less information about occluded buildings.

Surprisingly, the APLS_time and APLS_length scores are nearly identical across all look angles (6% average difference), despite the additional requirement to extract safe travel speed for estimating APLS_time. This indicates that CRESIv2 estimates safe travel speed with very high precision. As our estimate for safe travel speed is defined by road size, surface type, and context (e.g. residential road vs. major highway), this implies that the algorithm recognizes road attributes as well as geometries.

Automated extraction of road speeds and travel times from off-nadir satellite imagery applies to a great many problems in the humanitarian and disaster response domains; this post has demonstrated that such a task is not only possible, but available in the open source and far faster than manual annotation.