Super-Resolution and Object Detection: A Love Story, Part 2

Jake Shermeyer & Adam Van Etten

In our previous post, we introduced our work exploring the relationships between super-resolution (SR) and object detection algorithms in satellite imagery. As outlined there, we generate enhancement levels of 2x, 4x, and 8x over five distinct resolutions ranging from 30cm to 4.8m, and we also produce a unique enhanced product: 15cm GSD super-resolved imagery. This post showcases more results, both qualitative and quantitative.

60cm GSD Input (Top Left) — 30cm GSD Ground Truth (Top Right)
30cm GSD RFSR Output (Bottom Left) — 30cm GSD VDSR Output (Bottom Right)

We chose the xView dataset for applying super-resolution techniques and quantifying object detection performance. The imagery consists of 1,415 sq. km of DigitalGlobe WorldView-3 pan-sharpened RGB imagery at 30cm GSD, spread across 56 distinct global locations. The labeled dataset contains one million object instances across 60 classes annotated with bounding boxes, including various types of buildings, vehicles, planes, trains, and boats. For our purposes, we discarded larger objects such as buildings and aggregated the remaining labels into five transportation classes: small vehicles, large vehicles, small aircraft, large aircraft, and boats.
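To make the aggregation step concrete, here is a minimal sketch in Python. The fine-grained class names and their groupings below are illustrative assumptions, not the exact mapping used in our pipeline:

```python
# Illustrative aggregation of fine-grained xView labels into five
# transportation super-classes. The fine-grained names and groupings
# here are assumptions for demonstration, not the authoritative mapping.
SUPER_CLASSES = {
    "small vehicle": {"Small Car", "Passenger Vehicle", "Pickup Truck"},
    "large vehicle": {"Truck", "Bus", "Cargo Truck"},
    "small aircraft": {"Small Aircraft"},
    "large aircraft": {"Fixed-Wing Aircraft", "Cargo Plane"},
    "boat": {"Motorboat", "Sailboat", "Fishing Vessel"},
}

def to_super_class(fine_label):
    """Return the super-class for a label, or None to discard it (e.g. buildings)."""
    for super_class, members in SUPER_CLASSES.items():
        if fine_label in members:
            return super_class
    return None
```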

All data were preprocessed consistently to simulate coarser-resolution imagery and to test the effects of our SR techniques across a range of resolutions. We simulate coarser-resolution satellite imagery as faithfully as possible by modeling the point-spread function (PSF) and using a more robust (inter-area) decimation algorithm. Our results are intended to show what is reasonably achievable given coarser satellite imagery, rather than simply what is possible under the idealized settings (no blurring and bicubic decimation) in which most new SR algorithms are introduced.

Our data were degraded from the native 30cm GSD using a variable Gaussian blur that simulates the satellite's PSF at each desired output resolution. A base Gaussian sigma of 1 was chosen and multiplied by the degradation scale: the more an image is degraded, the larger the initial Gaussian blur applied. Inter-area decimation was then used to degrade the imagery from 30cm to resolutions of 60cm, 1.2m, 2.4m, and 4.8m. SR models were then trained on these images at each resolution for 2x, 4x, and 8x enhancement levels, as sketched in the code below.
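A minimal sketch of this degradation pipeline, using OpenCV, is shown below. The function name, the default base sigma, and the assumption that chip dimensions divide evenly by the scale are ours for illustration; this is not our exact preprocessing code:

```python
import cv2
import numpy as np

def degrade(chip: np.ndarray, scale: int, base_sigma: float = 1.0) -> np.ndarray:
    """Simulate coarser-resolution imagery: PSF-style Gaussian blur,
    then inter-area decimation. `scale` is the degradation factor,
    e.g. scale=2 turns a 30cm GSD chip into 60cm GSD."""
    sigma = base_sigma * scale  # blur grows with the degradation scale
    blurred = cv2.GaussianBlur(chip, ksize=(0, 0), sigmaX=sigma)
    h, w = blurred.shape[:2]
    # inter-area interpolation is a more robust decimation choice than bicubic
    return cv2.resize(blurred, (w // scale, h // scale),
                      interpolation=cv2.INTER_AREA)

# Degrade a 30cm chip to 60cm, 1.2m, 2.4m, and 4.8m GSD:
# coarse = {s: degrade(chip, s) for s in (2, 4, 8, 16)}
```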

How does the world look at different resolutions? Shown: 30cm, 60cm, 1.2m, 2.4m, and 4.8m GSD.

As a reminder, all of our source code can be downloaded here:

Very Deep Super-Resolution For Geospatial (VDSR4Geo)

Random Forest Super-Resolution (RFSR)

Finally, we present some preliminary results and scores in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), which are standard performance metrics for evaluating super-resolved outputs against ground-truth imagery (Table 1). The next post will feature the final piece of the puzzle: the official results and findings on the relationships between object detection performance and super-resolution outputs.
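For reference, the scores in Table 1 are computed on the luma (Y) channel. A minimal sketch of such an evaluation with scikit-image follows; the helper name and the 8-bit Y-channel data range are our assumptions, not our exact evaluation code:

```python
import numpy as np
from skimage.color import rgb2ycbcr
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def luma_scores(ground_truth: np.ndarray, super_resolved: np.ndarray):
    """PSNR and SSIM on the luma (Y) channel of two RGB images."""
    y_true = rgb2ycbcr(ground_truth)[..., 0]
    y_pred = rgb2ycbcr(super_resolved)[..., 0]
    data_range = 235.0 - 16.0  # Y channel spans [16, 235] for 8-bit input
    psnr = peak_signal_noise_ratio(y_true, y_pred, data_range=data_range)
    ssim = structural_similarity(y_true, y_pred, data_range=data_range)
    return psnr, ssim
```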

Table 1. Quantitative evaluation of super-resolution performance (PSNR/SSIM scores for the luma component) reported for the xView validation dataset (281 WorldView-3 images). Although these are strong scores, recovering the finest features is what matters most for strong object detection performance.
This image depicts the native and super-resolved output in a tabular format. As resolution degrades, super-resolution is less effective at recovering higher-resolution details.
VDSR (Left) and RFSR (Right) 30cm SR outputs. 2x (60cm to 30cm), 4x (120cm to 30cm), and 8x (240cm to 30cm) enhancements.
60cm GSD (Input)
30cm RFSR 2x (Output)
30cm VDSR 2x (Output)
30cm Ground Truth — Across the previous four images, we can see that VDSR preserves more of the complex features at this construction site than RFSR. More recently developed neural nets have shown even stronger performance in these structurally complex settings.

One of our primary findings from this work is that super-resolution is far more difficult to apply effectively to coarser-resolution imagery. At these resolutions, mixed pixels become prevalent and small objects cannot be recovered. In our final blog(s) in this series, we will showcase object detection performance on both the super-resolved and native-resolution imagery. Special thanks to Adam Van Etten, Dave Lindenbaum, Ryan Lewis, & Nick Weir for their contributions.