Super-Resolution and Object Detection: A Love Story, Part 2
Jake Shermeyer & Adam Van Etten
In our previous post, we outlined our work exploring the relationship between super-resolution (SR) and object detection algorithms in satellite imagery. As previously stated, we generate enhancement levels of 2x, 4x, and 8x over five distinct resolutions ranging from 30cm to 4.8m. Additionally, we produce a unique enhanced product: 15cm GSD super-resolved imagery. This post showcases more results, both qualitative and quantitative.
We chose the xView Dataset for the application of super-resolution techniques and quantification of object detection performance. The imagery consists of 1,415 sq. km of DigitalGlobe WorldView-3 pan-sharpened RGB imagery at 30cm GSD spread across 56 distinct global locations. The labeled dataset contains 1 million object instances across 60 classes annotated with bounding boxes, including various types of buildings, vehicles, planes, trains, and boats. For our purposes, we discarded larger objects such as buildings and aggregated the dataset into five transportation classes: small vehicles, large vehicles, small aircraft, large aircraft, and boats.
All data were preprocessed consistently to simulate coarser resolution imagery and test the effects of our SR techniques across a range of resolutions. We simulate coarser resolution satellite imagery as faithfully as possible by modeling the point-spread function (PSF) and using a more robust (inter-area) decimation algorithm. Our results are intended to show what is reasonably achievable given coarser satellite imagery, rather than simply what is possible under the ideal settings (no blurring and bicubic decimation) in which most new SR algorithms are introduced.
Our data were degraded from the native 30cm GSD using a variable Gaussian blur to simulate the satellite's PSF at each desired output resolution. A base Gaussian sigma of 1 was chosen and then multiplied by the scale of degradation, so the more an image is degraded, the larger the Gaussian blur initially applied. Inter-area decimation was then used to degrade the imagery from 30cm to resolutions of 60cm, 1.2m, 2.4m, and 4.8m. SR models were then trained on these images at each resolution for enhancement levels of 2x, 4x, and 8x.
As a reminder, all of our source code can be downloaded here:
Finally, we present some preliminary results and scores in terms of PSNR and SSIM, which are standard performance metrics for evaluating super-resolved outputs vs. ground truth imagery (Table 1). The next post will feature the final piece of the puzzle: the official results and findings on the relationships between object detection performance and super-resolution outputs.
One of our primary findings from this work is that super-resolution is much more difficult for coarser resolution imagery. At these resolutions, mixed pixels become prevalent and small objects cannot be recovered. In the final blog(s) in this series, we will showcase object detection performance on the super-resolved and native resolution imagery. Special thanks to Adam Van Etten, Dave Lindenbaum, Ryan Lewis, & Nick Weir for their contributions.