SpaceNet Road Detection and Routing Challenge — Part I

Adam Van Etten
The DownLinQ
Published in
7 min readOct 3, 2017


Determining optimal routing paths in near real-time is at the heart of many humanitarian, civil, military, and commercial challenges. In the humanitarian realm, for example, traveling to disaster stricken areas can be problematic for relief organizations, particularly if flooding has destroyed bridges or submerged thoroughfares. Autonomous vehicles is one of many examples on the commercial front, as self-driving cars rely heavily upon highly accurate road maps. Existing data collection methods such as manual road labeling or aggregation of mobile GPS tracks are currently insufficient to properly capture either underserved regions with infrequent data collection, or the dynamic changes inherent to road networks in rapidly changing environments. Looking ahead, we believe the frequent revisits of satellite imaging constellations may accelerate existing efforts to quickly update road network and routing information. Yet while satellites in theory provide an optimal method to rapidly obtain relevant updates, most existing computer vision research methods for extracting information from satellite imagery are neither fully automated, nor able to extract routing information from imaging pixels.

Accordingly, CosmiQ Works, in collaboration with SpaceNet partners DigitalGlobe, NVIDIA and hosted on AWS, plan to launch a third SpaceNet challenge focused on determining road networks and routing information directly from satellite imagery. In order to make significant advances in automated inference of routing information from satellite imagery, we recognize that metric selection for our competition is critical. As we will show below, the commonly used pixel-based F1 score is suboptimal for our task; this leads us to develop a new metric motivated by graph theory concepts that we believe will focus competitors on routing rather than just static road pixel identification. In the following post we describe our new evaluation metric in greater detail.

1. Current Labeling Practices

Current approaches to road labeling are often manually intensive. Classical computer vision techniques such as thresholding and edge detectors are often very helpful in identifying roads, though usually require manual inspection and validation. In the commercial realm, projects such as Bing Maps and Google Maps have been very successful in developing road networks from overhead imagery, though such processes are still labor intensive, and proprietary.

On the open source side, OpenStreetMap (OSM) is an extensive data set built and curated by a community of mappers. It primarily consists of streets, but contains some building and point of interest labels as well. The OSM community developed a robust schema for roads, including the type of road (e.g., residential, highway, etc), as well as other metadata (e.g., speed limit).

Figure 1. OSM data screenshot.

For many regions of the world, OSM road networks are remarkably complete, though in developing nations the networks are often poor (see Figure 2). Regardless of region, OSM labels are often outdated due to dynamic road networks, or poorly registered with overhead imagery (i.e., labels are offset from the coordinate system of the imagery).

Figure 2. Two regions of Khartoum demonstrating some of the issues of OSM labels when overlaid on SpaceNet data. Left: The east-west road through the middle of the image is not captured by the orange OSM road labels. Right: Poor registration often leads to roads that pass through buildings (yellow polygons denote SpaceNet building labels).

For disaster response the Humanitarian OpenStreetMap Team (HOT) often runs campaigns encouraging volunteers to label buildings and roads affected by natural disasters. For example, Project 104 from the OSM Task Manager aims to clean up and fix roads affected by Hurricane Harvey. Such efforts are laudable, yet obviously manually intensive and often not as timely as desired; automated processes may be able to markedly speed the completion of HOT campaigns.

2. SpaceNet Road Labels

The SpaceNet team is currently running an internal labeling campaign to produce high quality road centerlines for the four cities in SpaceNet Round 2: Las Vegas, Paris, Shanghai, and Khartoum. These labels should prove very beneficial to the goal of automating routing via satellite imagery, as they will be correctly registered to the existing SpaceNet imagery and undergo rigorous quality control.

3. Current Road Detection Techniques

Segmentation approaches (identifying which pixels in an image belong to which class of object) have shown some success in automatically locating roads from overhead imagery. Specifically, deep learning segmentation algorithms evaluated during a recent Kaggle competition demonstrated a respectable ability (Jaccard index ~ 0.6) to identify road pixels. Such an approach offers a promising first step in identifying routes, yet further work is required to use such segmentation masks in automated routing. Furthermore, scoring based on pixel masks does not always incentivize the desired behavior if routing is the end goal, as detailed in the following section.

4. F1 Metric

Pixel-level F1 score is a commonly used metric for computer vision segmentation evaluation. This metric weights each pixel equally, and a perfect score is only possible if all pixels are classified correctly as either road or background.

The F1 metric is suboptimal for routing purposes given that a slight error in road width is heavily penalized, though a brief break in an inferred road (from a shadow or an overhanging tree, for example) is lightly penalized, as illustrated in Figure 3.

Figure 3. Example of the pitfalls of the pixel-based F1 metric. Left: Ground truth road mask in green. Middle: proposal mask in orange, which achieves a decent F1 score of 0.82 since most pixels are correctly labeled despite some inconsistencies in road width. Right: proposal mask in cyan, which achieves a superior F1 score of 0.95 since fewer pixels are classified incorrectly. For routing purposes, however, the rightmost plot is clearly inferior since it misses an important intersection and severs a road. The fact that pixel-based F1 scores incentivize the rightmost plot over the middle plot is suboptimal.

5. APLS Metric

Our focus on road networks naturally leads us to consider graph theory as a means to evaluate proposals. While many proposals exist for graph similarity matching, these tend to focus on the logical topology (connections between nodes) of the graph, whereas we care about both logical topology as well as the physical topology of the roads. Since we are primarily interested in routing, we propose a graph theoretic metric based upon Dijkstra’s shortest path algorithm. In essence, we sum the differences in optimal paths between ground truth and proposal graphs. This is illustrated in Figure 4.

Figure 4. Demonstration of path length difference between sample ground truth and proposal graphs. Upper Left: Ground truth graph. Upper Right: Proposal graph with 30 edges removed. Lower Left: Shortest path between source (green) and target (red) node in the ground truth graph is shown in yellow, with a path length of ~948 meters. Lower Right: Shortest path between source and target node in the proposal graph, with a path length of ~1027 meters; this difference in length forms the basis for our graph similarity metric. Plotting is accomplished via the excellent osmnx package.

To measure the difference between ground truth and proposal graphs we propose the Average Path Length Similarity (APLS) metric that sums the differences in optimal path lengths between nodes in the ground truth graph G and the proposal graph G’. In effect, this metric repeats the path difference calculation shown in Figure 4 for all paths in the graph. Missing paths in the graph are assigned the maximum proportional difference of 1.0. The APLS metric scales from 0 (poor) to 1 (perfect).

Equation 1. APLS metric. Node a’ is the node in the proposal graph G’ nearest the location of node a in the ground truth graph G. L(a,b) denotes a path distance in the ground truth graph G, and L(a’, b’) denotes the path length between the corresponding nodes in the proposal graph.

Inherent to this metric is the notion of betweenness centrality, or the number of times a node appears along the shortest paths between other nodes. Missing nodes of high centrality will be penalized much more heavily by the APLS metric than missing nodes of low centrality. This feature is intentional, as heavily trafficked intersections are much more important than cul-de-sacs for routing purposes.

For small graphs the greedy approach of computing all possible paths in the network is entirely feasible (computing all possible paths for the graphs in Figure 4 takes less than 1 millisecond). For larger graphs one must decide which nodes and paths are of paramount interest, lest the computation load become burdensome. For the ground truth graph of Figure 4 there are 1720 possible paths, and we show the path length differences in Figure 5.

Figure 5. Left: Path length differences for all 1720 possible routes. Right: Path length difference histogram. Most routes are the same between the proposal and ground truth graphs, resulting in the many blue dots on the left plot and the large spike at path length 0.0 on the right plot. There are a few missing routes due to the disconnected node in the center of the proposal graph, and this results in the red dots on the left plot and the small spike at path length 1.0 on the right plot. Applying Equation 1 to this data yields APLS = 0.81.

Any proposed graph G’ with missing edges (e.g., if an overhanging tree is inferred to sever a road) will be heavily penalized by the APLS metric, so ensuring that roads are properly connected is crucial for a high score. There are additional subtleties to calculating the APLS metric for complex proposals (e.g., if proposal nodes are poorly aligned with ground truth nodes) which we will address in a subsequent post, but for the sake of brevity we refrain from addressing such complications here.

6. Satellite Imagery and the APLS Metric

The APLS metric is designed to reward correct node placement and connectivity, and so should prove a better metric than pixel-based F1 for automated route inference evaluation. As APLS is a graph-theoretic metric, a proposal graph must be constructed from the proposal mask output by segmentation algorithms. For the sake of brevity we leave the details of this procedure for a subsequent post. Figure 6 displays a proposal graph inferred from a deep learning segmentation mask, and illustrates the heavy penalty paid by the APLS metric for severed roads and missed intersections.

Figure 6. Comparison of F1 and APLS scores for a given proposal. Left: Sample satellite image with ground truth mask overlaid in orange. Middle: Proposal mask in yellow, yielding an F1 score of 0.72. Right: Proposal graph (red) inferred from the proposal mask; missing intersections and road segments are heavily penalized, yielding an APLS score of 0.25.

7. Conclusions

Optimized routing is crucial to a number of challenges, from humanitarian to military. Satellite imagery may aid greatly in determining efficient routes, particularly in cases of natural disasters or other dynamic events where the high revisit rate of satellites may be able to provide updates far faster than terrestrial methods.

Our upcoming SpaceNet Challenge will strive to explore the fidelity with which road networks can be inferred directly from satellite imagery. For this challenge, a large, high-quality dataset of road labels and attendant images will be released over multiple cities. In this post we demonstrated that the standard technique for scoring road detection algorithms (pixel-based F1 score) is suboptimal for routing purposes. Accordingly, for the SpaceNet Road Detection and Routing Challenge we propose the graph-theoretic APLS metric based on shortest paths between nodes in ground truth and proposal networks. This metric rewards correct road centerline and intersection determination to a far greater extent than pixel-based F1. Subsequent posts will discuss in further detail the pending dataset release, subtleties of comparing road networks, and methods for inferring graph structures from pixel masks.

* We thank Ryan Lewis and lporter for useful comments, and David Lindenbaum for assistance in plotting SpaceNet labels.