Robustness of Limited Training Data: Part 4

Daniel Hogan · The DownLinQ · Sep 25, 2019 · 7 min read

When it comes to the relationship between geospatial neural network performance and the amount of training data, do geographic differences matter? In a previous post, we examined this question by training the same building footprint model using various amounts of data from four different cities: Las Vegas, Paris, Shanghai, and Khartoum. That led to a plot (Figure 1) of performance for each city, either using a model trained on the city in question or using a model trained on the combined data of all four cities. In this post, we’ll take a closer look at two questions that went unanswered in the previous post. First, what happens if we take a model trained on one city and apply it to a different one it’s never seen before? And second, why does just one of the cities — Khartoum, Sudan — respond to a training data deficit more resiliently than the others?

Figure 1: Average F1 score versus number of training images per city. “Individual model” denotes models trained only on the city for which their F1 score is shown, while “combined model” denotes models trained on all four cities (i.e., with four times as much training data). Reprinted from the previous post in this series.


To assess model transferability, models trained on one city are tested on each of the four cities. The training is repeated four times for each training city, using 759 randomly selected tiles each time. Figure 2 shows the resulting performance, as measured by average F1 score.

Figure 2: Mean F1 score for models trained on one city (horizontal axis) and tested on another (vertical axis). Training is done with 759 tiles that are 200m on a side.
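The cross-testing loop behind Figure 2 can be sketched as follows. Here `train_model` and `score_f1` are hypothetical stand-ins for the actual SpaceNet training and evaluation pipeline (which are not reproduced in this post); only the sampling, looping, and averaging structure mirrors the procedure described above.

```python
import itertools
import random

CITIES = ["Vegas", "Paris", "Shanghai", "Khartoum"]
N_RUNS = 4      # training repetitions per city
N_TILES = 759   # randomly selected 200m tiles per run

def train_model(city, tiles):
    """Hypothetical stand-in for training a building-footprint model."""
    return {"trained_on": city, "n_tiles": len(tiles)}

def score_f1(model, test_city):
    """Hypothetical stand-in for SpaceNet F1 evaluation (returns a dummy score)."""
    return random.random()

def cross_test_matrix():
    """Mean F1 for every (train city, test city) pair, averaged over runs."""
    totals = {pair: 0.0 for pair in itertools.product(CITIES, CITIES)}
    for train_city in CITIES:
        for _ in range(N_RUNS):
            tiles = random.sample(range(10_000), N_TILES)  # placeholder tile IDs
            model = train_model(train_city, tiles)
            for test_city in CITIES:
                totals[(train_city, test_city)] += score_f1(model, test_city)
    return {pair: total / N_RUNS for pair, total in totals.items()}
```

The result is a 4×4 table keyed by (training city, test city), matching the axes of Figure 2.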

Not surprisingly, the best results are achieved when a model is tested on the same city it was trained on: looking at the off-diagonal entries, none exceeds those on the diagonal. Beyond that, the matrix is strongly asymmetric in places. For example, a model trained on Khartoum and tested on Las Vegas does far better (F1 = 0.44) than one trained on Las Vegas and tested on Khartoum (F1 = 0.07). This illustrates that transferability is not symmetric: how well a model from city A performs on city B says little about how well a model from city B performs on city A.
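This kind of asymmetry can be quantified by comparing the F1 matrix with its transpose. A minimal sketch, using only the two cross-test scores quoted above (the diagonal cancels in the subtraction, so it is set to zero here rather than taken from Figure 2):

```python
import numpy as np

def most_asymmetric_pair(f1, cities):
    """Return the (train, test) pair where transfer direction matters most."""
    gap = f1 - f1.T  # positive where row→column beats column→row
    i, j = np.unravel_index(np.abs(gap).argmax(), gap.shape)
    return cities[i], cities[j], gap[i, j]

cities = ["Khartoum", "Vegas"]
f1 = np.array([[0.0, 0.44],   # Khartoum-trained, tested on (Khartoum, Vegas)
               [0.07, 0.0]])  # Vegas-trained, tested on (Khartoum, Vegas)
train, test, gap = most_asymmetric_pair(f1, cities)
```

Applied to the full 4×4 matrix of Figure 2, the same function would pick out the most direction-dependent city pair.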

We can also use Figure 2 as a way of understanding which cities are most similar in the appearance of their imagery, at least as concerns traits relevant to building footprint identification. For each pair of cities we assign a similarity score as defined in Figure 3.

Then, we can make a “map” by plotting the network of cities in such a way that the more dissimilar cities are pushed further apart, as shown in Figure 4.
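A layout like this can be produced with classical multidimensional scaling (MDS), which places points so that their pairwise distances approximate a given dissimilarity matrix. The similarity score from Figure 3 is not reproduced here, so the sketch below simply assumes dissimilarity = 1 − similarity for some symmetric similarity matrix; the numeric values are placeholders for illustration, not the post's actual scores.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Embed points in `dim` dimensions from a symmetric dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:dim] # keep the top `dim` eigenpairs
    scale = np.sqrt(np.clip(eigvals[idx], 0, None))
    return eigvecs[:, idx] * scale        # one row of coordinates per city

cities = ["Vegas", "Paris", "Shanghai", "Khartoum"]
# Placeholder dissimilarities (1 - similarity), NOT the values from Figure 3.
D = np.array([[0.0, 0.4, 0.6, 0.8],
              [0.4, 0.0, 0.5, 0.7],
              [0.6, 0.5, 0.0, 0.9],
              [0.8, 0.7, 0.9, 0.0]])
coords = classical_mds(D)  # (4, 2) map coordinates
```

With only four cities the exact embedding needs at most three dimensions, so a 2-D map is an approximation; more dissimilar cities land farther apart, which is the qualitative layout shown in Figure 4.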


Daniel Hogan, PhD, is a data scientist at CosmiQ Works, an IQT Lab.