The Satellite Utility Manifold: Object Detection Accuracy as a Function of Image Resolution

Adam Van Etten
Published in The DownLinQ
Apr 28, 2017


The expansion of commercial satellite imagery offers great promise, albeit with an implicit compromise between space and time. Established providers such as DigitalGlobe currently offer exquisite tasked imagery at 0.30–0.50 meter resolution from five satellites. The Space 3.0 company Planet promises daily revisits of 3–5 meter imagery using over 100 satellites, with 1m resolution in certain areas due to their recent acquisition of Google’s Terra Bella constellation. New constellations such as BlackSky promise to provide tasked 1m imagery with revisit rates as high as 40–70 times a day. In essence, these three paradigms occupy vastly different positions in [resolution, revisit, cost] space, which we shall call the satellite utility manifold.

In this post we seek to infer the shape of the manifold along the resolution axis via an approach similar to tomography. We utilize the Cars Overhead with Context (COWC) dataset, a large, high-quality set of annotated cars in overhead imagery. In a previous car localization post we detailed object detection accuracy on this dataset with the YOLT2 framework at 0.30m resolution. In the sections below we quantify the effects of resolution on object detection, with the aim of providing a cross-section of the manifold and informing tradeoffs in satellite design. For our particular dataset we show that objects need only be ~5 pixels in size to be localized with high confidence.

1. The Satellite Utility Manifold

The high revisit rates proposed by Space 3.0 constellations contribute value for many classes of problems, provided resolution remains high enough for the desired task. Many analytical methods still depend on the ability to detect and localize objects of interest; if those objects cannot be detected reliably, no revisit rate can salvage the loss of spatial fidelity. Yet there may exist a sweet spot in the satellite utility manifold where analytics are maximized for a given cost.

The utility and cost of a given constellation design will depend on many factors, though resolution (both spatial and temporal) is a primary driver. For this blog we adopt object detection performance as our measure of utility, though we acknowledge that there are multiple possible measures (object detection performance, segmentation accuracy, change detection fidelity, crop cover recall, etc.). In the plots below we illustrate the expected morphology of the satellite utility manifold.

Figure 1. Notional utility manifold as a function of resolution and revisit rate. Both utility and cost increase with revisit rate and with sharper resolution.

For a given satellite design one can estimate a cross-section of the manifold by downsampling existing data. For instance, data from a constellation with a revisit rate of six per day can be used to simulate revisit rates of three or two per day by only retaining every second or third image, respectively. Similarly, a high resolution dataset can be extrapolated to lower resolutions by convolution with a Gaussian kernel, thereby simulating imagery from smaller imaging apertures; a sketch of this degradation procedure follows below. Repeating this process for different hardware designs will allow one to gradually deduce the surface of the satellite utility manifold.
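As a concrete illustration, the snippet below degrades an image to a coarser ground sample distance via Gaussian blur followed by downsampling. It is a minimal sketch using OpenCV; the sigma heuristic and the file name are assumptions for illustration, not the exact parameters used in this post.

```python
import cv2

def degrade_resolution(image, native_gsd, target_gsd):
    """Simulate coarser imagery: Gaussian blur, then downsample.

    Minimal sketch; scaling the blur sigma to half the resolution
    ratio is a heuristic assumption, not the exact kernel used here.
    """
    ratio = target_gsd / native_gsd              # e.g. 0.60 / 0.15 = 4x
    sigma = ratio / 2.0                          # heuristic blur width
    blurred = cv2.GaussianBlur(image, (0, 0), sigmaX=sigma)
    new_size = (int(image.shape[1] / ratio),     # width, then height
                int(image.shape[0] / ratio))
    return cv2.resize(blurred, new_size, interpolation=cv2.INTER_AREA)

# Example: take 15 cm COWC imagery down to 60 cm GSD
# (hypothetical file name):
# img = cv2.imread('cowc_potsdam_tile.png')
# img_060 = degrade_resolution(img, native_gsd=0.15, target_gsd=0.60)
```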

Figure 2. Cross-sectional slices of the utility manifold along the resolution (A) and revisit (B) axes.

In this study we utilize a high resolution dataset and attempt to infer the shape of curve 2A: utility as a function of resolution.

2. Dataset

As detailed in the car localization post and Mundhenk et al, 2016, the COWC dataset is a very high resolution imagery corpus of over 30,000 unique cars in six different locales. Imagery is at a nadir view angle and at 15 cm resolution. We train on five regions and reserve the largest geographic region, Utah, for testing; the Utah test set contains over 20,000 labelled cars.

Figure 3 (from the car localization post). Sample COWC image over Potsdam at native 15 cm ground sample distance (GSD) with labels overlaid. Original labels are shown by a red dot located at each car centroid, while inferred 3 meter YOLT2 bounding box labels are shown in blue. Note that large trucks and other vehicles are not labelled, only cars. Imagery courtesy of ISPRS and Mundhenk et al, 2016.

3. Dataset Augmentation

COWC imagery has a resolution of 15 cm ground sample distance (GSD). To study the effects of resolution, we convolve the raw imagery with a Gaussian kernel and reduce the image dimensions to create additional training and testing corpora at [0.30, 0.45, 0.60, 0.75, 0.90, 1.05, 1.20, 1.50, 1.80, 2.10, 2.40, 3.00] meters. COWC labels consist of points centered on each car, from which we create bounding box labels by assuming a mean car size of 3.0 meters (see the sketch below). As detailed in the car localization post, Section 3, we test on 23 images over Utah containing a total of 25,980 cars.
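The conversion from centroid labels to bounding boxes amounts to drawing a square of the assumed car size around each point. A hypothetical helper, with the 3.0 meter edge length taken from the text above:

```python
def centroid_to_bbox(x, y, gsd, car_size_m=3.0):
    """Convert a COWC car-centroid label into a square bounding box.

    Hypothetical helper: the box edge length is the assumed mean car
    size (3.0 m) divided by the ground sample distance (m/pixel).
    """
    half_edge = (car_size_m / gsd) / 2.0      # half the box width, pixels
    return (x - half_edge, y - half_edge,     # (xmin, ymin,
            x + half_edge, y + half_edge)     #  xmax, ymax)

# At 0.30 m GSD a 3.0 m car spans 10 pixels:
# centroid_to_bbox(100, 100, gsd=0.30) -> (95.0, 95.0, 105.0, 105.0)
```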

Figure 4. COWC training data over Potsdam convolved and resized to various resolutions from the original 0.15m resolution (top left); bounding box labels are plotted in blue. Imagery courtesy of ISPRS and Mundhenk et al, 2016.

4. Object Detection Models

For object detection we apply the YOLT2 convolutional neural network object detection pipeline (1, 2, 3, 4). This pipeline trains on bounding box labels for objects of interest, and runs rapidly on arbitrarily large satellite images. One of the strengths of the YOLT2 detection pipeline is speed: image inference proceeds at 44 frames per second, which translates to less than one minute for images as large as 200 megapixels. We train a separate model for each resolution (0.15, 0.30, 0.45, 0.60, 0.75, 0.90, 1.05, 1.20, 1.50, 1.80, 2.10, 2.40, 3.00 meters), for thirteen models total. Creating a high quality labeled dataset at low resolution (2.4m GSD, for example) is only possible because we downsample from already labeled high resolution 0.15m data; low resolution data is typically very difficult to label with high accuracy.
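As a back-of-the-envelope check of the quoted speed, the arithmetic below assumes (hypothetically) that inference runs on roughly 544 x 544 pixel sliding-window chips; the actual chip size and overlap are not stated in this post.

```python
# Sanity check of 44 frames per second on a 200 megapixel image,
# assuming ~544 x 544 pixel sliding-window chips (an assumption).
chip_px = 544 * 544              # pixels per inference frame
image_px = 200e6                 # 200 megapixel test image
frames = image_px / chip_px      # ~676 chips, ignoring window overlap
seconds = frames / 44.0          # at 44 frames per second
print(f'{frames:.0f} frames -> {seconds:.0f} s')  # ~676 frames -> ~15 s
```

Even allowing a factor of two to three for window overlap, this stays comfortably under the one-minute figure quoted above.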

5. Object Detection Performance

The test procedure is outlined in the car localization post, Section 4. In brief, a true positive is defined as having a Jaccard index (also known as intersection over union) of greater than 0.25. The true positives, false positives, and false negatives are aggregated into a single value known as the F1 score, which varies from 0 to 1 and is the harmonic mean of precision and recall. We also compute the predicted number of cars in the scene as a fraction of the number of ground truth cars. For plotting purposes we adopt a color scheme of: blue = ground truth, green = true positive, red = false positive, yellow = false negative.
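For reference, a minimal sketch of the two metrics just described; the box format (xmin, ymin, xmax, ymax) is an assumption for illustration.

```python
def iou(a, b):
    """Jaccard index of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def f1_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall (true negatives unused)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# A detection counts as a true positive when iou(pred, truth) > 0.25.
```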

Figure 5. Object detection results on different resolutions on the same 800 x 800 pixel Salt Lake City cutout. The cutout on the left is at 0.15m GSD, with an F1 score for the entire 4000 x 4000 pixel scene of 0.94. The cutout on the right is at 0.90m GSD, with an F1 score for the entire scene of 0.84. Imagery courtesy of AGRC and Mundhenk et al, 2016.

In Figure 6 (below), we show a large urban test scene evaluated at increasing ground sample distance. At the end of the post we attach higher resolution images for the interested reader.

Figure 6. Object detection results on different resolutions (top: 0.3m, middle: 1.2m, bottom: 3.0m) on a large 4000 x 4000 pixel urban test image. Note that the false negative rate rises dramatically with GSD. Imagery courtesy of AGRC and Mundhenk et al, 2016. Higher resolution images are included at the bottom of the page.

The plots above demonstrate the degradation of performance with increasing GSD. In the plots below we display performance as a function of image resolution. We also display the object pixel size, defined as the quotient of the object size (~3 meters) and the GSD. A separate detection model is trained and evaluated at each of the thirteen resolutions.

Figure 7. Object detection F1 score for ground sample distances of 0.15–3.0 meters (bottom axis), corresponding to car sizes of 20 down to 1 pixel(s) (top axis). At each of the thirteen resolutions we evaluate test scenes with a unique model trained at that resolution. Each of the 23 thin lines displays the performance of an individual test scene; most of these lines are tightly clustered about the mean, denoted by the blue dashed line. The red band displays +/- one standard deviation. We fit a piecewise linear model to the data, shown as the dotted cyan line. Below the inflection point (large cyan dot) of 0.61 meters the F1 score degrades slowly with a slope of dF1/dGSD = -0.10; between 0.61m and 3m GSD the slope is steeper at -0.26. The F1 scores at 0.15m, 0.60m, and 3.0m GSD are 0.92, 0.87, and 0.27, respectively.
Figure 8. Ratio of the predicted number of cars to ground truth, with a unique model for each resolution (bottom axis) and object pixel size (top axis). A fraction of 1.0 means that the correct number of cars was predicted; a fraction below 1.0 means too few cars were predicted. The thin bands denote the performance of the 23 individual scenes, with the dashed blue line showing the weighted mean and the red band displaying +/- one standard deviation. We fit a piecewise linear model to the data, shown as the dotted cyan line. Below the inflection point (large cyan dot) of 0.86 meters the curve is essentially flat, with a slope of -0.03; between 0.86m and 3m GSD the slope is steeper at -0.20. For resolutions sharper than 0.86 meters the predicted number of cars is within 4% of ground truth.
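The piecewise linear fits in Figures 7 and 8 can be reproduced with a standard least-squares routine. A minimal sketch, assuming SciPy and using the Figure 7 values as the initial guess; the actual fitting code is not shown in this post.

```python
import numpy as np
from scipy.optimize import curve_fit

def piecewise_linear(x, x0, y0, k1, k2):
    """Two line segments of slope k1 and k2 joined at (x0, y0)."""
    return np.where(x < x0, y0 + k1 * (x - x0), y0 + k2 * (x - x0))

# gsd: array of ground sample distances; f1: mean F1 score at each GSD.
# Initial guess taken from Figure 7: inflection ~0.61 m, F1 ~0.87,
# slopes -0.10 and -0.26.
# params, _ = curve_fit(piecewise_linear, gsd, f1,
#                       p0=[0.61, 0.87, -0.10, -0.26])
```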

Object detection F1 score and enumeration fraction are two possible measures of satellite utility, and as such Figures 7 and 8 represent possible slices in the utility manifold of Figure 1.

We can also test the robustness of a single model applied across resolutions, as we demonstrate below.

Figure 9. Performance of the 0.3m model applied to various resolutions. The model peaks at F1 = 0.9 for the trained resolution of 0.3m, and rapidly degrades when evaluated with lower resolution data; it also degrades somewhat for higher resolution 0.15m data. Note that the plot is truncated at 0.9m, whereas Figures 7 and 8 extend all the way to 3.0m.

The curves of Figure 9 degrade far faster than those of Figures 7 and 8, illustrating that a single model trained at high resolution and applied across resolutions performs far worse than a series of models each trained at its respective resolution.

6. Conclusions

In this post we introduce the concept of the satellite utility manifold and illustrate how one might infer the manifold morphology via sampling of cross-sections. Taking object detection F1 score as our measure of utility, we compute the utility cross-section along the spatial axis using the COWC dataset and YOLT2 object detection framework. For objects ~3 meters in size we observe from Figure 7 that object detection performance degrades from F1=0.92 for objects 20 pixels in size to F1=0.27 for objects 1 pixel in size, with a mean error of 0.09. Interestingly, the F1 score degrades by only 5% as objects shrink from 20 to 5 pixels in size (0.15m to 0.60m GSD). At least for cars viewed from overhead, one can conclude that object sizes of 5 pixels or greater yield object detection scores of F1 > 0.85. Note that this performance is far better than the heading classification performance noted in Post 3, Figure 5, implying that object detection may be less sensitive to resolution than heading classification.

This is an exciting time for satellite imagery analytics, particularly given the ever-increasing number of satellites on orbit. The precise utility of various constellation designs remains a somewhat open question, however, given the complex relationship between spatial resolution, temporal resolution, cost, and capability. Quantifying the utility of various constellation designs is a matter of great import to the satellite industry, and motivates much of our upcoming research. In future posts we will continue to explore and quantify these relationships.

Thanks to David Lindenbaum for constellation expertise, and to Ryan Lewis and lporter for useful comments.

May 29, 2018 Addendum: See this post for paper and code details.

Appendix A: Higher resolution versions of Figure 6.

Figure 6 at higher resolution. Object detection results on different resolutions (top: 0.3m, middle: 1.2m, bottom: 3.0m) on a large 4000 x 4000 pixel urban test image. Note that the false negative rate rises dramatically with GSD. Imagery courtesy of AGRC and Mundhenk et al, 2016.
