Super-Resolution on Satellite Imagery using Deep Learning, Part 3
In previous posts (Part 1 and Part 2), we trained a neural network to perform image enhancement using Peak Signal-to-Noise Ratio (PSNR) as a cost function and hypothesized that the enhancement would also improve object detection in satellite imagery. In this post, we show that our hypothesis appears to be correct.
As a general rule, the accuracy of detection algorithms decreases as the resolution of imagery decreases (see here and here for more details). Our strategy is to pre-process imagery with our super-resolution technique to effectively improve its resolution and, in turn, the accuracy of the detection algorithms. We quantify the effect of our super-resolution process on the performance of several detection algorithms that were developed for the SpaceNet competition.
The initial release of SpaceNet has two types of imagery: 3-band (RGB) images at approximately 50 cm GSD and 8-band (MSI) images at approximately 2 m GSD. We train two neural networks, one to enhance the 3-band imagery and one to enhance the 8-band imagery. The networks are trained using PSNR as a cost function (in practice the Mean Squared Error, since its derivative is cheaper to compute) and consist of 5 perturbative layers, each containing 3 convolutional layers. We scaled our imagery by a factor of 0.5 and by a factor of 0.25. (In other words, we increased the GSD by a factor of about 2 and 4, respectively.) To reduce the resolution of an image, we downsample it to a smaller image using bilinear interpolation and then upsample it back to the original image size, again using bilinear interpolation.
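The degradation step and the PSNR metric described above can be sketched in a few lines. This is a minimal illustration, not the CosmiQNet code: the function names, the pixel-centre coordinate convention, the single-band list-of-rows image format, and the 8-bit peak value of 255 are all assumptions made for the example. It also shows why training on Mean Squared Error is a reasonable stand-in for PSNR: for a fixed peak value, PSNR is a decreasing function of MSE, so minimizing one maximizes the other.

```python
import math

def resize_bilinear(img, new_h, new_w):
    """Resize a single-band image (list of rows) with bilinear interpolation."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(new_h):
        # Map the output pixel centre back into input coordinates (assumed convention).
        y = min(max((i + 0.5) * h / new_h - 0.5, 0.0), h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(new_w):
            x = min(max((j + 0.5) * w / new_w - 0.5, 0.0), w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

def degrade(img, scale):
    """Blur by downsampling by `scale`, then upsampling back to the original size."""
    h, w = len(img), len(img[0])
    small = resize_bilinear(img, max(1, round(h * scale)), max(1, round(w * scale)))
    return resize_bilinear(small, h, w)

def mse(a, b):
    """Mean squared error between two images of equal size."""
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n

def psnr(a, b, peak=255.0):
    """PSNR in dB; peak=255 assumes 8-bit imagery."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)

# Hypothetical 8x8 test image: a single bright pixel on a dark background.
image = [[0.0] * 8 for _ in range(8)]
image[3][4] = 255.0
for scale in (0.5, 0.25):
    print(scale, psnr(image, degrade(image, scale)))
```

In a real pipeline one would apply the same degradation to each band of the 3-band or 8-band imagery using an image library rather than nested Python lists; the pure-Python version is used here only to keep the sketch self-contained.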
The choice of starting weights significantly affects the training process. Using 3 convolutional layers per perturbative layer increased the risk of being trapped in a local optimum (much more so than 2 convolutional layers per perturbative layer). By trying several initial seeds, we were able to train a neural network with fewer layers and similar enhancement capability, as measured by PSNR.
The spatial distribution of PSNR gains on similar imagery is presented in Images 5 and 6 of Part 2. Experimenting with better starting weights, hyper-parameters, and training strategies has resulted in increased PSNR gains.
It is difficult to translate the dB gain directly into the ability of a human to perceive different sizes and types of objects. The National Image Interpretability Rating Scale (NIIRS) lists features that are visually perceptible at different resolutions. One example is that cars are identifiable at 50 cm GSD and become difficult to identify at 2 m GSD.
The images above illustrate that the super-resolution process recovers details of the original imagery from a blurred image. To the untrained eye, it is difficult to distinguish the enhanced image from the original at the 0.5 x scale. At the 0.25 x scale, the super-resolved image lacks the detail of the original imagery; its level of detail resembles the 0.5 x scale more than the original imagery.
Source code for training CosmiQNet is available at github/CosmiQ/super-resolution.
The experiment tests the hypothesis that PSNR is a relevant metric for image enhancement for the purposes of automated object detection. We perform this test using three entries from the SpaceNet competition (1st place wleite, 2nd place marek, and 4th place takahashi) and two neural networks from CosmiQ Works (CosmiQNet and YOLT2).
Details of the algorithms are available at the linked locations. The wleite entry won the competition and uses manually crafted features and random forests. The marek and takahashi entries are convolutional neural networks with some post-processing. YOLT2 is a customized version of the YOLO2 neural network discussed in this blog post. The CosmiQNet entry, discussed in this blog post, adapts the super-resolution network to perform object detection and requires significant post-processing to be competitive with the other algorithms in the comparison.
Each of the algorithms is tested against 5 datasets: the original imagery, imagery blurred by a factor of 0.5, imagery blurred by a factor of 0.5 and then enhanced using super-resolution, imagery blurred by a factor of 0.25, and imagery blurred by a factor of 0.25 and then enhanced using super-resolution.
The improvement in the F1 scores after enhancement supports the hypothesis that PSNR was a relevant metric for training the super-resolution process. Across the five object detection algorithms, the enhancement recovered most of the accuracy lost to the 0.5 x resolution blur: 97%±4% of the original F1 score, up from 68%±11% of the original F1 score. The recovery from the 0.25 x resolution blur was also significant: 55%±16% of the original F1 score, up from 24%±14% of the original F1 score.
One caveat applies to these results: each of the algorithms was trained to perform building footprint detection on the original imagery at its original resolution, and some algorithms are more robust to changes in resolution than others. The F1 score is the harmonic mean of precision and recall. The improvement in F1 score from the super-resolution pre-processing is mainly a result of improved recall.
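To make the metric concrete, the following minimal sketch computes the F1 score from detection counts and expresses a score as a percentage of the score on the original imagery, which is how the recovery figures above are reported. The function names and all of the counts are hypothetical and chosen only for illustration; they are not results from the experiment.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 score: the harmonic mean of precision and recall."""
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def percent_of_original(f1, f1_original):
    """Express an F1 score as a percentage of the original-imagery F1 score."""
    return 100.0 * f1 / f1_original

# Hypothetical building counts (not data from the post): blurring mostly causes
# missed buildings, so recall drops while precision stays the same.
original = f1_score(true_pos=80, false_pos=20, false_neg=20)
blurred = f1_score(true_pos=40, false_pos=10, false_neg=60)
enhanced = f1_score(true_pos=72, false_pos=18, false_neg=28)

print(percent_of_original(blurred, original))
print(percent_of_original(enhanced, original))
```

In this made-up example, precision is 0.8 in all three cases, so the entire change in F1 comes from recall, mirroring the observation that the gain from super-resolution is mainly a recall effect.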
CosmiQNet has demonstrated a capability to recover details in blurred images and significantly improve object detection. The natural application is to increase the utility of lower cost sensors, especially satellite imagery sensors, using advanced analytic post-processing.
In this post, we have limited the experiments to enhancing synthetically blurred imagery. We plan to explore the impact of super-resolution on other commercial sources of imagery.
Current training uses DigitalGlobe’s 50 cm GSD imagery from WorldView-2. We also plan to investigate training on aerial imagery to improve satellite imagery.
Keep following The DownlinQ for updates on these projects.