Super-Resolution on Satellite Imagery using Deep Learning, Part 1
Yann LeCun has compared supervised learning to the icing on a cake and unsupervised learning to the cake itself, asserting that we know how to make the icing but not the cake. In this post, we provide a “cake” recipe for training an unsupervised learning algorithm to enhance satellite imagery.
This research is motivated by increasing access to lower-cost satellite imagery in the emerging commercial space industry, where there is a trade-off among sensor quality, revisit rate, and price. We investigate whether advanced image processing can soften that trade-off by improving the imagery from a lower-quality sensor at the same price point.
We embed imagery details from a higher resolution image in a deep neural network (DNN) and extract the details to enhance geographically similar imagery. As part of this research, we develop a novel architecture for DNNs using perturbative layers that is well-suited to the enhancement task.
Super-Resolution
There are many forms of image enhancement, including noise reduction and color adjustment. For satellite imagery, one common measure of image quality is Ground Sampling Distance (GSD), the physical distance represented by one pixel in the image. Enhancement in this post refers to decreasing (i.e., improving) the GSD of satellite imagery, also called super-resolution. The super-resolution process synthesizes sub-pixel information in imagery to increase the resolution of the image. Typical synthesis techniques include:
- interpolation of nearby pixels within the image,
- interpolation of nearby frames within a video,
- frequency filtering, to reduce noise.
In this investigation, we extend these techniques to include:
- deep learning structures from geographically relevant imagery.
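As a concrete baseline, the first technique above, interpolation of nearby pixels, can be sketched in a few lines. This hypothetical helper linearly interpolates between neighboring pixel values to upscale a single row of pixels; real implementations work in 2-D (e.g., bilinear interpolation):

```python
def upsample_linear(row, factor=2):
    """Upscale a 1-D list of pixel values by linear interpolation."""
    n = len(row)
    out = []
    for i in range(n * factor):
        # Map each output index back to a (possibly fractional) input position.
        pos = i / factor
        lo = min(int(pos), n - 1)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1 - frac) * row[lo] + frac * row[hi])
    return out

print(upsample_linear([0, 10, 20]))  # → [0.0, 5.0, 10.0, 15.0, 20.0, 20.0]
```

Interpolation like this adds no new information, which is exactly the gap a learned, geography-aware model aims to fill.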
To quantify the effectiveness of our enhancement techniques, we compare the Peak Signal-to-Noise Ratio (PSNR) before the enhancement to the PSNR after the enhancement. Moreover, we show the geographic distribution of PSNR over the image and its relevance to further analysis.
PSNR is a natural choice to measure the generative ability of the super-resolution algorithm. We plan to present a future post on using a Generative-Adversarial Network to learn a better cost function for performing super-resolution.
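For reference, PSNR is computed from the mean squared error between a reference image and its estimate. The sketch below is a minimal pure-Python version for flattened 8-bit pixel lists:

```python
import math

def psnr(reference, estimate, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-size images,
    given as flattened pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

print(psnr([50, 100, 150, 200], [52, 98, 149, 203]))  # ≈ 41.6 dB
```

Because PSNR is logarithmic, a 1 dB gain corresponds to roughly 21% less mean squared error.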
Fully Convolutional Neural Networks with Perturbation Layers
Before going directly to the results, we take a detour to discuss the architecture developed to perform the super-resolution process. Standard DNNs such as AlexNet, ResNet, VGG, and GoogLeNet are great architectures for image classification and object detection on low-resolution imagery, but they are ill-suited to the exponentially large output space of super-resolution.
Inspired by ResNet, we decided to design a new DNN as a sequence of perturbations of the identity map, since super-resolution is essentially a perturbation of the lower resolution image. The network is extended a layer at a time by optimizing a convex combination of the previous layer and the current layer, producing a trainable weight (bypass parameter) for the new layer that measures its contribution to the final output.
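The idea can be sketched as follows: each new layer's output is blended with the identity map through a trainable bypass parameter alpha. The `sharpen` transform here is a hypothetical stand-in for the layer's learned convolutional transform:

```python
def bypass_combination(x, transform, alpha):
    """Convex combination of the identity map and a learned transform.
    alpha (the bypass parameter) measures the layer's contribution:
    alpha = 0 passes the input through unchanged."""
    return [(1 - alpha) * xi + alpha * ti for xi, ti in zip(x, transform(x))]

# With a small alpha, the layer is a mild perturbation of the identity.
sharpen = lambda x: [2 * v for v in x]   # stand-in for a learned transform
print(bypass_combination([1.0, 2.0], sharpen, alpha=0.1))  # ≈ [1.1, 2.2]
```

After training converges, the learned alpha values directly report how much each layer contributes to the final output.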
There are several benefits of this structure:
- The architecture fits well into modern strategies for training extremely deep neural networks, including skip connections and stochastic depth.
- The bypass parameters measure the contribution for each layer, giving feedback on how deep the network should be.
- Each layer performs a near-identity transformation that enhances the image using different structures.
Within each perturbative layer, we include at least two convolutional layers, with a non-linear ReLU layer between each pair of convolutional layers. Adding more convolutional layers within a perturbative layer increases its ability to enhance the image but makes training harder to converge. Alternatively, additional perturbative layers offer similar enhancement potential without the convergence issues.
The bypass parameters give direct feedback on the impact of each perturbative layer, which helps answer the question of how deep the network needs to be.
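A minimal sketch of one perturbative layer, using 1-D convolutions as a stand-in for the 2-D convolutional layers described above (the kernels and alpha are illustrative, not trained values):

```python
def conv1d(x, kernel):
    """'Same'-size 1-D convolution with zero padding."""
    k = len(kernel) // 2
    padded = [0.0] * k + list(x) + [0.0] * k
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(x))]

def relu(x):
    return [max(0.0, v) for v in x]

def perturbative_layer(x, k1, k2, alpha):
    """conv -> ReLU -> conv, blended with the identity map via the
    bypass parameter alpha."""
    t = conv1d(relu(conv1d(x, k1)), k2)
    return [(1 - alpha) * xi + alpha * ti for xi, ti in zip(x, t)]

# With identity kernels the layer reduces to the identity map, as expected.
out = perturbative_layer([1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0], 0.5)
```

Stacking several such layers, each a near-identity transform, builds the deep network a layer at a time.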
The Experiment
The initial experiment measures the ability of the DNN to enhance degraded, 3-band GeoTIFFs over the Panama Canal. We use two GeoTIFFs (very large satellite images), courtesy of DigitalGlobe: one for training and one for testing. Rather than enhancing the entire image in one pass of the DNN, we enhance a 27-pixel by 27-pixel region at a time. Since the GeoTIFFs are very large, sampling 27-pixel by 27-pixel regions provides sufficient training data for our DNN; access to more training imagery should improve the results. Using the two GeoTIFFs, we proceed with training the DNN:
- The two GeoTIFFs are rescaled to effectively reduce the resolution of the images.
- Regions from the first GeoTIFF are randomly sampled to train the DNN a layer at a time. We train the weights of the DNN to maximize the PSNR of the output of the DNN.
- The DNN is used to enhance both degraded GeoTIFFs.
- Results are compared to interpolation based enhancement algorithms.
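The patch-sampling step above can be sketched as follows; the 27x27 patch size matches the regions described earlier, while the image contents and sample count are illustrative:

```python
import random

PATCH = 27  # the post trains on 27x27-pixel regions

def sample_patches(image, num, patch=PATCH, rng=None):
    """Randomly sample square training patches from a large image,
    given as a list of equal-length pixel rows."""
    rng = rng or random.Random(0)
    h, w = len(image), len(image[0])
    patches = []
    for _ in range(num):
        r = rng.randrange(h - patch + 1)
        c = rng.randrange(w - patch + 1)
        patches.append([row[c:c + patch] for row in image[r:r + patch]])
    return patches

# Synthetic stand-in for a (much larger) degraded GeoTIFF.
big = [[(r * 31 + c) % 256 for c in range(100)] for r in range(80)]
batch = sample_patches(big, num=4)
print(len(batch), len(batch[0]), len(batch[0][0]))  # → 4 27 27
```

In the real pipeline, each sampled patch from the degraded image is paired with the corresponding patch from the original, and the network weights are trained to maximize PSNR between its output and the original patch.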
We use the TensorFlow framework to construct, train, and run inference with the DNN on a 2015 NVIDIA DevBox with four Titan X GPUs, although we used only one GPU for training. To train the network we used the Adam optimization algorithm, whose parameters affect training time and convergence rate. We did not fully explore the optimal choice of Adam parameters, but spent about 12 hours (on one Titan X GPU) training each perturbative layer. The rate at which the bypass parameters converged (as shown in Figure 5) guided our choice of the Adam parameters and, consequently, the training time.
Results
In this experiment, we have two GeoTIFF images around the Panama Canal, one for training and one for testing.
The first step is to create training data by degrading a GeoTIFF. By resizing the GeoTIFF, the resulting degraded image has an effective reduction of GSD, or resolution. Using linear interpolation as a starting point, we can plot the distribution of PSNR throughout the degraded image.
Figure 7 demonstrates that a single PSNR value is insufficient to describe the noise within a satellite image. Regions with more structure, such as boats, have lower PSNR in the degraded image than regions with less structure, such as water. When we train the super-resolution algorithm to enhance the degraded image, we want to enhance the regions we care about, which are often the regions with structure.
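A tile-wise PSNR map of the kind shown in Figure 7 can be sketched as follows. The tiny 2x2 tiles and pixel values are made up for illustration; the structured tile degrades more (lower PSNR) than the flat "water" tile:

```python
import math

def psnr(ref, est, max_val=255.0):
    """Standard PSNR in dB over flattened pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

def psnr_map(ref, est, tile=2):
    """PSNR computed per tile, revealing where degradation (or
    enhancement) is concentrated. Images are lists of pixel rows."""
    h, w = len(ref), len(ref[0])
    grid = []
    for r in range(0, h - tile + 1, tile):
        grid.append([])
        for c in range(0, w - tile + 1, tile):
            rv = [ref[i][j] for i in range(r, r + tile) for j in range(c, c + tile)]
            ev = [est[i][j] for i in range(r, r + tile) for j in range(c, c + tile)]
            grid[-1].append(psnr(rv, ev))
    return grid

# Left tile: high-contrast "boat" structure; right tile: flat "water".
ref = [[0, 255, 100, 100], [255, 0, 100, 100]]
est = [[10, 245, 101, 99], [245, 10, 101, 99]]
m = psnr_map(ref, est)
```

Plotting such a grid over a full GeoTIFF yields the geographic PSNR distribution discussed above.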
The results in Figure 9 are evidence that the DNN-based enhancement has improved performance in the regions with more structure. Even though the test image and the training image had the same GSD, different atmospheric conditions and cloud coverage impact the enhancement, partially explaining the improved performance on the testing image over the performance on the training image. Image clarity also affects the ability to label regions containing boats; less accurate labeling includes more water regions and would likely lower the dB gain in that region. Experiments that isolate these phenomena are beyond the scope of this post.
Alternative Research Directions
There are examples, such as SRCNN, of super-resolution performed on non-satellite images that demonstrate similar dB gains when trained on ImageNet. These approaches may be viable for enhancing satellite imagery, but they lack a fundamental advantage of our approach: the location information of the imagery. Our approach also differs for several other reasons:
- Satellite imagery tends to be a corner case for many DNN-based machine learning algorithms.
- Over-training may not be as detrimental to our algorithm as it is for a more diverse image set.
- Perturbative layers provide insight into the required depth of the DNN and the marginal performance improvement expected by increasing the depth.
- GeoTIFFs have the potential to contain more than just Red, Green, and Blue channels. Our approach is easily modified to take advantage of additional channels (such as 8-band imagery).
Finally, we have experimented with increasing the number of convolutional layers within each perturbation layer, with improved performance. We will present these results in part two, with a specific focus on 8-band images and SpaceNet.