Super-Resolution on Satellite Imagery using Deep Learning, Part 1

Patrick Hagerty
Published in The DownLinQ
Nov 18, 2016 · 7 min read

Yann LeCun has compared supervised learning to the icing on a cake and unsupervised learning to the cake itself, asserting that we know how to make the icing but do not know how to make the cake. In this post, we provide a “cake” recipe for training an unsupervised learning algorithm to enhance satellite imagery.

This research is motivated by increasing access to lower-cost satellite imagery in the emerging commercial space industry. In this industry, there is a trade-off among sensor quality, revisit rate, and price. We investigate the potential of advanced image processing to soften that trade-off by improving the imagery from a lower-quality sensor at the same price point.

Figure 1: Remote Sensing via Airplane, Commercial Exquisite, and Space 3.0. This figure is not to scale but is meant to convey the potential overlap of remote sensing activities. Aerial remote sensing can be used to enhance commercially exquisite satellite imagery. Commercially exquisite satellite imagery can be used to enhance lower-resolution satellite imagery.

We embed imagery details from a higher resolution image in a deep neural network (DNN) and extract the details to enhance geographically similar imagery. As part of this research, we develop a novel architecture for DNNs using perturbative layers that is well-suited to the enhancement task.

Super-Resolution

There are many forms of image enhancement, including noise reduction and color adjustment. For satellite imagery, one common measure of image quality is Ground Sampling Distance (GSD), the physical distance represented by one pixel in the image. Enhancement in this post refers to decreasing the GSD of satellite imagery (smaller is better), a process also called super-resolution. The super-resolution process synthesizes sub-pixel information to increase the resolution of the image. Typical synthesis techniques include:

  • interpolation of nearby pixels within the image,
  • interpolation of nearby frames within a video,
  • frequency filtering, to reduce noise.

In this investigation, we extend these techniques to include:

  • deep learning structures from geographically relevant imagery.

Figure 2: Super-Resolution. To transform super-resolution from an ill-posed optimization problem into a well-posed inverse problem, we start with higher-resolution imagery, degrade it, and optimize the super-resolution algorithm to reconstruct the original imagery from the degraded imagery. Peak Signal-to-Noise Ratio (PSNR) measures the difference between the original image and the reconstruction.

To quantify the effectiveness of our enhancement techniques, we compare the Peak Signal-to-Noise Ratio (PSNR) before the enhancement to the PSNR after the enhancement. Moreover, we show the geographic distribution of PSNR over the image and its relevance to further analysis.
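
As a concrete reference, here is a minimal PSNR computation in Python (NumPy). The peak value of 255 assumes 8-bit imagery; this helper is an illustrative sketch, not the code used in the experiments.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=255.0):
    """Peak Signal-to-Noise Ratio between two images, in dB.

    Higher values indicate the reconstruction is closer to the reference.
    """
    diff = reference.astype(np.float64) - reconstruction.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_value ** 2) / mse)
```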

PSNR is a natural choice to measure the generative ability of the super-resolution algorithm. We plan to present a future post on using a Generative-Adversarial Network to learn a better cost function for performing super-resolution.

Fully Convolutional Neural Networks with Perturbation Layers

Before going directly to the results, we take a detour to discuss the architecture developed to perform the super-resolution process. Standard DNNs such as AlexNet, ResNet, VGG, and GoogLeNet are great architectures for image classification and object detection on low-resolution imagery but are ill-suited to the exponentially large output space of super-resolution.

Inspired by ResNet, we designed a new DNN as a sequence of perturbations of the identity map, since super-resolution is essentially a perturbation of the lower-resolution image. The network is extended one layer at a time by optimizing a convex combination of the previous layer and the current layer, producing a trainable weight (the bypass parameter) for the new layer that measures its contribution to the final output.

Figure 3: Our convex perturbation layer compared to a ResNet layer. In both architectures, a convolutional block is combined with an identity function. The convex perturbation allows the optimal combination to be trained. As the beta values decrease, the layer’s contribution to the enhancement decreases.
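
Below is a minimal sketch of such a perturbative layer in TensorFlow (Keras). The filter counts, kernel sizes, and the sigmoid constraint keeping the bypass parameter in (0, 1) are our assumptions for illustration; the layers used in the post may differ.

```python
import tensorflow as tf

class ConvexPerturbationLayer(tf.keras.layers.Layer):
    """Trainable convex combination of the identity map and a small
    convolutional block: y = (1 - beta) * x + beta * F(x)."""

    def __init__(self, filters=64, **kwargs):
        super().__init__(**kwargs)
        self.conv1 = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")
        self.conv2 = tf.keras.layers.Conv2D(3, 3, padding="same")  # back to 3 bands

    def build(self, input_shape):
        # Unconstrained logit; a sigmoid keeps the bypass parameter in (0, 1).
        # Initialized negative so beta starts near 0 (a near-identity layer).
        self.beta_logit = self.add_weight(
            name="beta_logit", shape=(),
            initializer=tf.keras.initializers.Constant(-3.0), trainable=True)

    def call(self, x):
        beta = tf.sigmoid(self.beta_logit)
        return (1.0 - beta) * x + beta * self.conv2(self.conv1(x))
```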

There are several benefits of this structure:

  • The architecture fits well into modern strategies for training extremely deep neural networks, including skip connections and stochastic depth.
  • The bypass parameters measure the contribution for each layer, giving feedback on how deep the network should be.
  • Each layer performs a near-identity transformation that enhances the image using different structures.

Within each perturbative layer, we include at least two convolutional layers, with a non-linear ReLU layer between consecutive convolutional layers. More convolutional layers within a perturbative layer increase its capacity to enhance the image but make training harder to converge. Alternatively, additional perturbative layers offer similar enhancement potential without the convergence issues.

Figure 4: A Deep Neural Network with perturbative layers.

The bypass parameters give direct feedback on the impact of each perturbative layer. This feedback helps answer the question of how deep the network has to be.

Figure 5: Bypass Parameters during Training. The weights of the bypass parameters are plotted during the training process. For this particular training algorithm, training occurred in two stages per layer: first the parameters of the new layer were trained, then all previously trained parameters were jointly optimized with the new layer. The bypass parameter decreases as the network grows. Eventually, a new layer’s contribution (without being aggregated with other layers) will not affect the integer values of the pixels in an enhanced image; this defines a sub-pixel threshold.

The Experiment

The initial experiment measures the ability of the DNN to enhance degraded, 3-band GeoTIFFs over the Panama Canal. We use two GeoTIFFs (very large satellite images), courtesy of DigitalGlobe, in the experiment: one for training and one for testing. We do not enhance the entire image in one pass of the DNN; rather, we enhance a 27-pixel by 27-pixel region at a time. Since the GeoTIFFs are very large images, sampling 27-pixel by 27-pixel regions provides sufficient training data for our DNN. Access to more training imagery should improve the results. Using the two GeoTIFFs, we proceed with training the DNN:

  • The two GeoTIFFs are rescaled to effectively reduce the resolution of the images.
  • Regions from the first GeoTIFF are randomly sampled to train the DNN a layer at a time. We train the weights of the DNN to maximize the PSNR of the output of the DNN.
  • The DNN is used to enhance both degraded GeoTIFFs.
  • Results are compared to interpolation-based enhancement algorithms.
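
The degradation and random patch sampling in the first two steps might be implemented as follows. The factor-of-2 nearest-neighbor resizing and the helper names are assumptions for illustration; the experiment rescaled with linear interpolation.

```python
import numpy as np

def degrade(image, factor=2):
    """Simulate a coarser sensor: downsample, then upsample back to the
    original size. Nearest-neighbor resizing here is a stand-in for the
    linear-interpolation rescaling used in the experiment."""
    h, w = image.shape[:2]
    small = image[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)[:h, :w]

def sample_patches(original, degraded, patch=27, n=1024, seed=0):
    """Yield n randomly located, aligned (input, target) 27x27 patch pairs."""
    rng = np.random.default_rng(seed)
    h, w = original.shape[:2]
    for _ in range(n):
        y = int(rng.integers(0, h - patch))
        x = int(rng.integers(0, w - patch))
        yield degraded[y:y+patch, x:x+patch], original[y:y+patch, x:x+patch]
```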

We use the TensorFlow framework to construct, train, and run inference with the DNN on a 2015 NVIDIA DevBox with four Titan X GPUs, though we used only one GPU for training. We trained the network with the ADAM optimization algorithm; ADAM has associated parameters that affect training time and convergence rates. We did not fully explore the optimal choice of ADAM parameters, but spent about 12 hours (on one Titan X GPU) training each perturbative layer. The rate at which the bypass parameters converged (as shown in Figure 5) guided our choice of ADAM parameters and, subsequently, the training time.
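
For reference, a minimal training step under these choices might look like the following sketch. The learning rate and function names are assumptions; note that for a fixed peak value, maximizing PSNR is equivalent to minimizing mean squared error.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)  # ADAM parameters were not fully tuned

@tf.function
def train_step(model, degraded_batch, original_batch):
    with tf.GradientTape() as tape:
        restored = model(degraded_batch, training=True)
        # For a fixed peak value, maximizing PSNR == minimizing MSE.
        loss = tf.reduce_mean(tf.square(restored - original_batch))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```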

Results

In this experiment, we have two GeoTIFF images around the Panama Canal, one for training and one for testing.

Figure 6: Satellite Image of the Panama Canal. This is the original training image for the DNN.

The first step is to create training data by degrading a GeoTIFF. Resizing the GeoTIFF yields a degraded image with effectively reduced resolution (a coarser GSD). Using linear interpolation as a starting point, we can plot the distribution of PSNR throughout the degraded image.

Figure 7: The Distribution of PSNR in the Input to the DNN. The input to the DNN is a degraded satellite image that is resized (by a factor of 2 using linear interpolation) to match the dimensions of the original GeoTIFF. This plot shows the location of noise introduced by the degrading process. Blue regions have more noise introduced by the degrading process, while red regions have less. The blue regions tend to be areas with fine structure (like boats), while red regions tend to have more coarse features (like open water).

Figure 7 demonstrates that a single PSNR value is insufficient to describe the noise within a satellite image. Regions with more structure, such as boats, have lower PSNR in the degraded image than regions with less structure, such as water. When we train the super-resolution algorithm to enhance the degraded image, we want to enhance regions that we care about, which are often regions with structure.
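
Per-region PSNR maps like those in Figures 7 and 8 can be produced by evaluating PSNR over tiles. A minimal sketch, reusing the psnr() helper above (the tile size is our assumption):

```python
import numpy as np

def psnr_map(reference, reconstruction, tile=64):
    """Compute PSNR on a grid of tiles to show its geographic distribution."""
    h, w = reference.shape[:2]
    rows, cols = h // tile, w // tile
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            ref = reference[i*tile:(i+1)*tile, j*tile:(j+1)*tile]
            rec = reconstruction[i*tile:(i+1)*tile, j*tile:(j+1)*tile]
            out[i, j] = psnr(ref, rec)  # psnr() from the earlier sketch
    return out
```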

Figure 8: The PSNR Gain after Enhancement using the DNN. The distribution of the PSNR gain is plotted for the test image; the DNN was not trained on this image. Most regions benefited from enhancement. The blue regions are generally regions where there was significantly less noise in the original image. The gain is measured against the initial linear interpolation.
Figure 9: PSNR Gain versus Bicubic Interpolation. On the test GeoTIFF, we plot the difference in PSNR compared to bicubic interpolation. Regions that started with more noise also benefit.
Figure 10: PSNR Change in dB from Linear and Bicubic Interpolation to DNN-based Enhancement. The PSNR change is computed over the entire GeoTIFFs and over sub-regions of the GeoTIFFs that contain boats. The enhancement in regions with structure is significantly higher than in the water regions.

The results in Figure 10 are evidence that the DNN-based enhancement performs better in regions with more structure. Even though the test image and the training image had the same GSD, differing atmospheric conditions and cloud coverage affect the enhancement, partially explaining why performance on the test image exceeded performance on the training image. Image clarity also affects the ability to label regions containing boats; less accurate labeling includes more water regions and would likely lower the dB gain in those regions. Experiments that isolate these phenomena are beyond the scope of this post.

Figure 11: Example of Enhancement of a Boat-plus-Water Region. This figure shows the enhancement of a degraded boat. Since the region contains a large portion of water, the PSNR gain is lower than for a region containing just a boat.

Alternative Research Directions

There are examples, such as SRCNN, of super-resolution performed on non-satellite images that demonstrate similar dB gains when trained on ImageNet. These approaches may be viable for enhancing satellite imagery, but they lack a fundamental advantage of our approach: the location information of the image. Our approach also differs for several other reasons:

  • Satellite imagery tends to be a corner case for many DNN-based machine learning algorithms.
  • Over-training may not be as detrimental to our algorithm as it is for a more diverse image set.
  • Perturbative layers provide insight into the required depth of the DNN and the marginal performance improvement expected by increasing the depth.
  • GeoTIFFs have the potential to contain more than just Red, Green, and Blue channels. Our approach is easily modified to take advantage of additional channels (such as 8-band imagery).

Finally, we have experimented with increasing the number of convolutional layers within each perturbative layer, with improved performance. We will present these results in Part 2, with a specific focus on 8-band images and SpaceNet.
