
SpaceNet 5 Results Deep Dive Part 1 — Geographic Diversity

Adam Van Etten
Dec 18, 2019 · 7 min read

Preface: SpaceNet LLC is a nonprofit organization dedicated to accelerating open source, artificial intelligence applied research for geospatial applications, specifically foundational mapping (i.e. building footprint & road network detection). SpaceNet is run in collaboration with CosmiQ Works, Maxar Technologies, Intel AI, Amazon Web Services (AWS), Capella Space, Topcoder, and IEEE GRSS.

In previous posts [1, 2, 3, 4] we discussed at length both the utility and challenges of the focus of SpaceNet 5: extracting road networks with travel time estimates directly from satellite imagery. In this post we outline the approaches of the winners, and discuss one key feature of SpaceNet 5: geographic diversity. For the initial public baseline of SpaceNet 5, we scored contestants on regions in Moscow, Russia, Mumbai, India, and San Juan, Puerto Rico. In an attempt to encourage contestants to create algorithms that generalize to new locales, the final standings were scored on different regions of those three cities, as well as on a final “mystery” holdback city: Dar Es Salaam, Tanzania. In the following sections we explore the robustness of models to unseen geographies, and find that for the SpaceNet dataset, neighborhood-level differences have a greater impact on model performance than inter-city variations.

1. SpaceNet 5 Cities and Road Properties

For training purposes we utilize both the SpaceNet 3 and SpaceNet 5 data, as described here. These six cities provide 8,900 km of training data and over 90,000 labeled roadways, split across 4,900 ~400 × 400 m image chips. Each hand-labeled roadway contains metadata features such as road type (highway, residential, etc.), number of lanes, and surface type (paved, unpaved). We use these features to infer road travel speed via APLS functionality (this post provides an example of speed limit inference). See Figures 1, 2, and 3 for dataset details.
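As a concrete illustration of this kind of metadata-to-speed conversion, the Python sketch below maps road type, surface, and lane count to a speed estimate. The lookup table and per-lane adjustment are hypothetical stand-ins for illustration only; the actual conversion procedure is described in the linked post.

```python
# A minimal sketch of inferring travel speed from road metadata.
# The speed table and per-lane bonus are illustrative assumptions,
# not the official SpaceNet conversion values.

# hypothetical lookup: (road type, surface) -> base speed in mph
BASE_SPEED_MPH = {
    ("motorway", "paved"): 65,
    ("primary", "paved"): 45,
    ("residential", "paved"): 25,
    ("residential", "unpaved"): 15,
}

def infer_speed_mph(road_type: str, surface: str, num_lanes: int) -> float:
    """Estimate travel speed for a labeled road segment."""
    base = BASE_SPEED_MPH.get((road_type, surface), 20)  # fallback for unknown combos
    return base + 5 * max(0, num_lanes - 1)              # assumed per-lane bonus

def travel_time_hours(length_km: float, speed_mph: float) -> float:
    """Edge travel time, converting mph to km/h."""
    return length_km / (speed_mph * 1.609344)
```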

Figure 3. Per-city breakdown of SpaceNet roads training metadata.

SpaceNet 5 utilized two different testing corpora. The test set used for the public leaderboard on Topcoder contained three cities and a total of ~1,000 km of hand-labeled roads. Final results were computed from a separate private test set with ~1,500 km of labeled roads; this test set was curated from distinct regions of the three initial test cities (Moscow, Mumbai, San Juan), plus the “mystery” city (Dar Es Salaam), whose existence was known to competitors, though of course its location was not revealed until after the challenge concluded.

2. SpaceNet 5 Algorithmic Approaches

All five of the top submissions followed the approach of CosmiQ’s baseline CRESI algorithm. CRESI casts the ground-truth GeoJSON labels into multi-channel masks, with each layer corresponding to a unique speed range. The next step is to train a segmentation model using a ResNet34 + U-Net backbone (also utilized in albu’s winning SpaceNet 3 submission). We then refine and clean the mask, skeletonize the refined mask, extract a graph structure from the skeleton, infer roadway speeds and travel times for each graph edge, and perform some final post-processing to clean up spurious edges and complete missing connections.
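To make the pipeline concrete, below is a heavily condensed sketch of the mask → skeleton → graph steps, using the sknw library to lift a skeleton into a NetworkX graph. The threshold, minimum blob size, and per-edge speed assignment are illustrative simplifications, not the exact CRESI implementation.

```python
# Condensed sketch: multi-channel speed mask -> skeleton -> road graph.
import numpy as np
from skimage.morphology import skeletonize, remove_small_objects
import sknw  # builds a NetworkX graph from a skeleton image

def masks_to_graph(speed_masks: np.ndarray, channel_speeds_mph, thresh=0.3):
    """speed_masks: (C, H, W) array of per-speed-bin segmentation scores."""
    # collapse the multi-channel prediction into one binary road mask
    binary = speed_masks.max(axis=0) > thresh
    binary = remove_small_objects(binary, min_size=300)  # drop spurious blobs

    skeleton = skeletonize(binary)                   # 1-pixel-wide centerlines
    G = sknw.build_sknw(skeleton.astype(np.uint16))  # nodes at junctions, edges along roads

    for u, v, data in G.edges(data=True):
        pts = data["pts"]  # pixel coordinates along this edge
        # pick the speed bin with the strongest mean response along the edge
        ch = speed_masks[:, pts[:, 0], pts[:, 1]].mean(axis=1).argmax()
        data["speed_mph"] = channel_speeds_mph[ch]
        # data["weight"] is skeleton length in pixels; real code first converts
        # pixels to meters via the image's ground sample distance
        data["travel_time"] = data["weight"] / data["speed_mph"]
    return G
```

The subsequent graph cleaning (removing spurious edges, bridging small gaps) then operates on the returned graph directly.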

The algorithms submitted to SpaceNet 5 all used this same general approach, though with significant differences in the segmentation models and the post-processing parameters. See Table 1 for details of the segmentation models used by competitors. A following post will dive deeper into the speed/performance trade-offs among the various approaches.

3. SpaceNet 5 Performance

Let’s take a look at how competitors performed on the challenge. Figure 4 shows performance on the public test set of image chips that were distributed to competitors without attendant labels. Note that scores in San Juan were consistently higher than in Moscow or Mumbai. Scores of APLS_time ~ 0.50 imply that while the road networks certainly are not perfect, they are generally still routable, with an expected error in arrival time of ~50% (inspection of Figures 7, 8, 10, and 11 may help elucidate this point).
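For intuition on what APLS_time ~ 0.50 means, the toy snippet below captures the core idea behind the metric: compare shortest-path travel times between matched node pairs in the ground-truth and proposal graphs. The full metric adds node snapping, symmetric evaluation, and other details (see the APLS repository); this is only a simplified sketch.

```python
# Toy illustration of the path comparison at the heart of APLS_time.
import networkx as nx

def path_time_difference(G_true, G_prop, source, target, weight="travel_time"):
    """Relative travel-time error for one node pair; 0 is perfect, 1 is worst.

    Assumes source/target have already been matched between the two graphs.
    """
    t_true = nx.shortest_path_length(G_true, source, target, weight=weight)
    try:
        t_prop = nx.shortest_path_length(G_prop, source, target, weight=weight)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return 1.0  # missing route: maximum penalty
    return min(abs(t_true - t_prop) / t_true, 1.0)

# An APLS-style score averages 1 - difference over many node pairs,
# so a score near 0.5 corresponds to roughly 50% mean travel-time error.
```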

Figure 5 displays the final results on the private test set that was not made available to competitors, and so results could not be optimized for this imagery.

Let’s dive into these results in a bit more depth.

3A. High variance in APLS scores

Even within cities, there is large variance in score between image chips, as evidenced by Figure 6, which shows the histogram of APLS_time scores for the winning algorithm. The reasons for this variance are myriad and will be explored in further detail in upcoming blogs. Certainly dirt roads and overhanging trees complicate road extraction, as illustrated by Figures 7 and 8.
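For readers reproducing this analysis, the distribution in Figure 6 amounts to a histogram over per-chip scores; a minimal sketch, assuming a hypothetical CSV of per-chip APLS_time values:

```python
# Sketch of the per-chip score histogram in Figure 6.
# The file and column names below are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

scores = pd.read_csv("apls_time_per_chip.csv")  # columns: chip_id, city, apls_time
scores["apls_time"].hist(bins=20)
plt.xlabel("APLS_time")
plt.ylabel("Number of chips")
plt.show()
```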

Figure 7. Predictions from XD_XD’s winning algorithm over the San Juan private test region, with ground truth in blue and predictions in yellow. Left: High scoring region, with APLS_time = 0.80. Right: Low scoring rural region outside San Juan with APLS_time = 0.21, due largely to confusion with dirt roads and breaks caused by overhanging trees.
Figure 8. Predictions from XD_XD’s winning algorithm over the Moscow private test region, with ground truth in blue and predictions in yellow. Left: Moderately scoring region, with APLS_time = 0.64. Right: Lower scoring image with APLS_time = 0.37; the long shadows, significant building tilt, and snow make such scenes quite difficult.

3B. Performance drops in private test, even within the same city

The drop in aggregate score between the public and private datasets surprised many contestants, and is illustrated in Figure 9.

Figure 9 indicates that even in cities with training data present, it is easy to overtrain a model for a specific test region. Overtraining certainly explains part of why some of the leading competitors on the public leaderboard dropped out of the top five for the private test set (compare Tables 1 and 2 in our previous blog). When the trained models were tested on a different region of the exact same city, performance dropped for all competitors and all cities (see Figure 10).

3C. Comparable scores on the “mystery” city

Across all five winning algorithms and the baseline model, we observe no significant decrease in performance on the “mystery” city of Dar Es Salaam compared to previously seen cities.

In Figure 12 we plot the APLS_time score on the private test set for the aggregate of the three training cities, along with the “mystery” city. We actually observe a slight increase in performance for Dar Es Salaam versus the training cities.

Another way to view the data in Figure 12 is to inspect the average drop in score between the public and private test sets for each of the three cities present in both (Moscow, Mumbai, and San Juan), as well as for their three-city mean. We then compare these drops to the difference between Dar Es Salaam’s private-test score and the mean score on the public test set (see Figure 13). Note that the unseen city drops by less than the three-city mean.
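The bookkeeping behind Figure 13 is straightforward; here is a minimal pandas sketch, with hypothetical file and column names:

```python
# Sketch of the Figure 13 comparison: per-city public-to-private score drop,
# versus the mystery city's gap to the public three-city mean.
import pandas as pd

df = pd.read_csv("apls_time_by_city.csv")  # columns: city, split, apls_time
means = df.groupby(["city", "split"])["apls_time"].mean().unstack("split")

drops = means["public"] - means["private"]  # per-city drop (NaN for the mystery city)
public_mean = means.loc[["Moscow", "Mumbai", "San Juan"], "public"].mean()
dar_gap = public_mean - means.loc["Dar Es Salaam", "private"]

print(drops)
print(f"Dar Es Salaam gap to public 3-city mean: {dar_gap:.3f}")
```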

Figures 12 and 13 illustrate that, when averaging all scores across the hundreds of testing chips in each city for multiple competitors, we see no significant reduction in performance when applying trained models to an unseen locale. In fact, we observe a slight (statistically insignificant) increase in performance.

Conclusions

In this post we dove into how the performance of the best SpaceNet 5 algorithms varies by geography. While all competitors used an approach similar to the baseline CRESI model (i.e., multi-class segmentation + skeletonization + graph extraction + post-processing + speed inference), there is significant variation in the segmentation models and post-processing techniques used. Nevertheless, we observe similar trends across all models: a large drop in score between public and private regions (especially in San Juan), high intra-city variability, and comparable aggregate performance on the mystery city of Dar Es Salaam. So what does this all mean?

As intended, the SpaceNet 5 challenge structure did indeed produce generalizable models that can be applied to unseen geographies with reasonable performance.

We can also conclude that with the training set provided by SpaceNet (6 cities on 4 continents, ~9,000 km of roads, and >90,000 individual labeled road segments), intra-city variations in performance are larger than inter-city variations. This is evidenced by the fact that for all six tested algorithms, aggregate performance on the unseen city of Dar Es Salaam is consistent with performance on the training cities. The variation within each city is high, implying that neighborhood-level details matter more to road network extraction than broader city-scale specifics such as road widths, background color, and lighting conditions.

Stay tuned for upcoming posts where we explore the neighborhood-level features that are predictive of road network model performance, as well as a look at speed/performance tradeoffs.

— — — — — — — — — — — — — — — — — — — — — — — — — —

*01 March 2020 edit: updated Table 1 to state the correct number of models for ikibardin

Thanks to Jake Shermeyer and Nick Weir
