Super-Resolution and Object Detection: A Love Story - Part 1
Jake Shermeyer & Adam Van Etten
The interplay between super-resolution techniques and object detection frameworks remains largely unexplored, particularly in the context of satellite or overhead imagery. Intuitively, super-resolution methods should increase object detection performance, as an increase in resolution should add more distinguishable features that an object detection algorithm can use for discrimination. Several foundational research questions have yet to be answered:
- Does the application of a super-resolution (SR) technique affect the ability to detect small objects in satellite imagery?
- Across what resolutions are these SR techniques effective?
- What is an ideal or minimum viable resolution for object detection?
- Can one artificially double or even quadruple the native resolution of coarser imagery to make the data more useful and increase the ability to detect fine objects?
Although some of our blog posts touch on these questions [1, 2] and several studies [1,2,3,4,5,6] have used SR as a pre-processing step, none have quantified the effect of SR on object detection performance in satellite imagery across multiple resolutions. Our work over the past months aims to accomplish that task by using SIMRDWN to train multiple custom object detection models to identify vehicles, boats, and planes in both native and super-resolved data. We then test the models' performance on the native (ground-truth) imagery and on super-resolved imagery of the same Ground Sample Distance (GSD). Additionally, this is the first research that we know of to demonstrate the output of super-resolved 15cm GSD satellite imagery. Although no native 15cm imagery exists from space for comparison, this data can be compared against coarser resolutions to test the benefits. We chose the xView Dataset to apply and test our super-resolution and object detection methods.
Super-resolution is ultimately conducted with two techniques (source code featured below). The first is a convolutional neural network technique called Very Deep Super-Resolution (VDSR). VDSR has been featured as a baseline for the majority of recent super-resolution research and exhibits near state-of-the-art performance. The second is an approach called Random-Forest Super-Resolution (RFSR) that was custom designed for this work. It requires minimal training time and exhibits high inference speeds (Table 1). Its closest counterpart in recent literature is Super-Resolution Forests (SRF). RFSR works by evaluating the lower-resolution pixels that neighbor a higher-resolution target pixel, learning the relationships between them, and then augmenting the low-resolution image to derive a new high-resolution output. We chose to include this simpler, less computationally intensive algorithm, which does not require GPUs, to test its effectiveness against a near state-of-the-art SR solution. The hypothesis is that even a simple technique may benefit object detection performance. Additionally, this technique has the advantage of being able to run inference on images of arbitrary size, much larger than the GPU limits of around 4000 x 4000 pixels. Ultimately, we generate enhancement levels of 2x, 4x, and 8x over five distinct resolutions ranging from 30cm to 4.8m.
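The resolution arithmetic behind these enhancement levels can be sketched in a few lines of Python. This is only an illustration, not part of the released code: the post gives the endpoints (30cm and 4.8m) but not the intermediate resolutions, so the sketch assumes each step doubles the GSD, and it enumerates every resolution and factor pairing without claiming that all of them were actually run. The function and constant names are hypothetical.

```python
# Assumption: the five native resolutions double at each step from 30 cm to
# 4.8 m. The post states only the two endpoints and the count of five.
NATIVE_GSD_CM = [30, 60, 120, 240, 480]
ENHANCEMENT_FACTORS = [2, 4, 8]


def output_gsd_cm(native_gsd_cm, factor):
    """GSD after enhancement: each output pixel spans 1/factor of the ground distance."""
    return native_gsd_cm / factor


def image_size_after(width_px, height_px, factor):
    """Pixel dimensions grow by the enhancement factor along each axis."""
    return width_px * factor, height_px * factor


def enhancement_grid(native_gsds=NATIVE_GSD_CM, factors=ENHANCEMENT_FACTORS):
    """Map (native GSD in cm, factor) to output GSD in cm for every pairing."""
    return {(g, f): output_gsd_cm(g, f) for g in native_gsds for f in factors}


if __name__ == "__main__":
    for (gsd, factor), out in sorted(enhancement_grid().items()):
        print(f"{gsd:>3} cm native at {factor}x -> {out:g} cm")
```

Under these assumptions, a 2x enhancement of native 30cm imagery gives the 15cm output mentioned earlier, and an 8x enhancement of 4.8m imagery gives 60cm.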
Finally, in this post we are pleased to announce the release of two fully pythonic codebases for super-resolution:
Look out for the next post in this series, where we showcase more results of the super-resolution work and give an overview of its performance across our range of resolutions. Special thanks to Adam Van Etten, Dave Lindenbaum, Ryan Lewis, & Nick Weir for their contributions.
Please check out Part 2 as well!