Optimal route selection from remote sensing imagery remains a significant challenge despite its importance in a broad array of applications. While identification of road pixels has been attempted before, estimation of route travel times from overhead imagery remains a novel problem. To this end, we build upon our previous post and explore road network extraction at scale with inference of semantic features of the graph, identifying speed limits and route travel times for each roadway. We call this approach City-Scale Road Extraction from Satellite Imagery v2 (CRESIv2). Including estimates for travel time permits true optimal routing, not just the shortest geographic distance.
This approach also serves as an initial baseline to the upcoming SpaceNet 5 challenge focused on road networks and optimized routing. See our arXiv paper for further details and related work, and our github repository for code.
1. Algorithm Overview
Our approach is to combine novel segmentation methods that encapsulate road speed with improved post-processing techniques for road vector simplification. We utilize satellite imagery and geocoded road centerline labels to build training datasets for the first step in our algorithm: segmentation.
We have two goals: extract the road network over large areas, and assess travel time along each roadway. In order to assess travel time we assign a speed limit to each roadway based on metadata tags such as road type, number of lanes, and surface construction. We begin by training a segmentation model. Two approaches are explored in the paper (both a multi-class model, as well as a continuous model), though in this post we only explore the multi-class model.
2. Multi-Class Segmentation
We create multi-channel training masks by binning the road labels into a 7-layer stack, with channel 0 detailing speeds between 1–10 mph, channel 1 between 11–20 mph, etc. (see Figure 1). We train a segmentation model in- spired by the winning SpaceNet 3 algorithm, and use a ResNet34 encoder with a U-Net inspired de- coder. We include skip connections every layer of the network, with an Adam optimizer and a custom loss function of:
where ‘focal’ is focal loss, ‘dice’ is the Dice coefficient, and α is a constant.
3. Graph Extraction Procedure
The output of the segmentation mask step detailed above is subsequently refined into road vectors. We begin by smoothing flattening, and filtering the final output mask to create a refined prediction mask. A skeleton and subsequently a road graph is created from this refined mask (see our previous post for further details).
4. Speed Estimation Procedure
We estimate travel time for a given road edge by leveraging the speed information encapsulated in the prediction mask. The majority of edges in the graph are composed of multiple segments (e.g. the edge in Figure 2 has 6 segments). Accordingly, we attempt to estimate the speed of each segment in order to determine the mean speed of the edge. The speed of the segment is estimated extracting a small 8 × 8 pixel patch at the segment midpoint. If the majority of the the high confidence pixels in the pre- diction mask patch belong to channel 3 (corresponding to 31–40 mph), we would assign the speed at that patch to be 35 mph. Travel time is then calculated as edge distance divided by mean speed.
5. Scaling to Large Images
The process detailed above works well for small input images, yet fails for large images due to a saturation of GPU memory. For example, even for a relatively simple architecture such as U-Net, typical GPU hardware (NVIDIA Titan X GPU with 12 GB memory) will saturate for images greater than ∼ 2000 × 2000 pixels in extent and reasonable batch sizes > 4. Accordingly we follow the BASISS and CRESIv1 approach to extract networks at scale.
5. APLS Metric
To measure the difference between ground truth and proposal graphs, the APLS metric sums the differences in optimal path lengths between nodes in the ground truth graph G and the proposal graph G’. The APLS metric scales from 0 (poor) to 1 (perfect). The definition of shortest path can be user defined; the natural first step is to consider geographic distance as the measure of path length (APLS_length), but any edge weights can be selected. Therefore, if we assign a travel time estimate to each graph edge we can use the APLS_time metric to measure differences in travel times between ground truth and proposal graphs.
6. SpaceNet 3 Test Corpus Results
We compute the APLS graph-theoretic metric performance for the 400 × 400 meter image chips in the SpaceNet 3 test corpus. An example result is displayed in Figure 3, and Table 2 shows computed scores. Reported errors (±1σ) reflect the relatively high variance of performance among the various test scenes in the four SpaceNet cities. Table 2 demonstrates that for the multi-class model the APLS score is still 0.58 when using travel time as the weight, which is only 13% lower than when weighting with geometric distance.
The APLS_length score of 0.68 is comparable to the winning SpaceNet 3 score of 0.67. This marginal gain in APLS_length is beneficial, though of course the primary focus of this work is road speed extraction. We also use only a single model (rather than the ensemble of albu) in the interest of inference speed.
7. Large Area SpaceNet Results
In addition to the SpaceNet 3 test corpus, we also test our large-scale algorithm on a large area test set covering over 600 square kilometers of the four SpaceNet 3 cities (Las Vegas, Paris, Shanghai, Khartoum). We achieve comparable results to Table 2, with APLS_length = 0.67 ± 0.04. APLS_time = 0.64 ± 0.03, which is only a 4% decrease in APLS score when using speed versus length as edge weights. This is somewhat improved from the decrease of 13% noted in Table 2, due primarily to the fewer edge effects from larger testing regions. Figures 4 and 5 display the graph output of the algorithm for various urban environments.
Satellite imagery may aid greatly in determining optimal routes in scenarios involving natural disasters or other dynamic events where the high revisit rate of satellites may be able to provide updates far more quickly than terrestrial methods. In this post we demonstrated methods to extract city-scale road networks with travel time estimates directly from remote sensing imagery.
Prediction is reasonably quick, with an inference speed of ≥ 280 km2 / hour on a single GPU. At this speed, a 4-GPU cluster could map the entire 9100 km2 area of Puerto Rico in ≈ 8 hours, a significant improvement over the two months required by human labelers (1) following hurricane Maria.
While automated road network extraction is by no means a solved problem, the CRESIv2 algorithm improves upon and extends existing methods, with potential benefits to applications such as disaster response where rapid map updates are critical to success. We look forward to even greater improvements in automated mapping from the SpaceNet 5 competition. Stay tuned for further details on large scale road inference, as well as an updated baseline with our Solaris library.