Sentinel-2 Deep Resolution 2.0

Yosef Akhtman
11 min read · Dec 16, 2022


S2DR2: Effective 12-Band 10x Single Image Super-Resolution for Sentinel-2

This article documents the performance of S2DR2, a 12-band 10x single-image super-resolution model that upscales all 12 spectral bands of a Sentinel-2 L2A (or L1C) scene from the original 10, 20 and 60 m/px spatial resolution to a target resolution of 1 m/px.
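
For reference, a minimal sketch of the band layout this implies (the band names and native resolutions are standard Sentinel-2 L2A values; the 10x factor refers to the 10 m bands, while the 20 m and 60 m bands are brought onto the same 1 m/px grid by correspondingly larger factors):

```python
# Sentinel-2 L2A spectral bands and their native ground sampling
# distance (GSD) in metres. S2DR2 maps all of them onto a common
# 1 m/px grid. B10 (cirrus) is not part of L2A products, hence 12 bands.
NATIVE_GSD_M = {
    "B01": 60, "B02": 10, "B03": 10, "B04": 10,
    "B05": 20, "B06": 20, "B07": 20, "B08": 10,
    "B8A": 20, "B09": 60, "B11": 20, "B12": 20,
}

TARGET_GSD_M = 1  # target resolution of the super-resolved output

for band, gsd in NATIVE_GSD_M.items():
    print(f"{band}: {gsd} m/px -> {TARGET_GSD_M} m/px ({gsd // TARGET_GSD_M}x)")
```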

NOT PAN-SHARPENING! All output data is derived from a single Sentinel-2 L2A scene; no additional high-resolution input data is utilised in the upscaling process.

Deep Learning-based super-resolution has been a subject of intense research over the past few years, with multiple methodologies and projects demonstrating truly remarkable results. The Earth Observation (EO) community has - so far - met these technological advances with a fair dose of scepticism and reserve, as was clearly demonstrated by the heated discussion during the Super-Resolution in EO: hype or hope session of the recent ESA Open Earth Forum. The main dilemma in question is whether the reconstructed spatial details in a super-resolved image can be considered real information, or merely a decorative pleasantry hallucinated by the algorithm. On the one hand, the objective of the majority of SR methods in other domains is indeed mostly decorative. On the other hand, SR has recently become a routine part of the commercial offerings of major imaging satellite operators, including the industry leaders MAXAR and Airbus, and is marketed for mission-critical security and defence applications.

Correspondingly, the aim of this post is to establish S2DR2 as a viable source of high-revisit-rate, radiometrically calibrated, high-resolution satellite imaging data that is suitable for practical analytical applications. We maintain that such concepts as spatial resolution and sampling frequency remain poorly understood, and that significant and important information can indeed be added to the original satellite image through a process of non-linear enhancement such as Single Image Super-Resolution (SISR). We leave the conceptual and information-theoretic justification regarding the source of the new information in the super-resolved image to a separate discussion, and hereby focus on the actual performance of the current model.

In order to address some of the surprising controversy associated with the concept of super-resolution, particularly in the Earth Observation community, I would like to express my deepest respect for the ingenious, seminal work of Claude Shannon and Harry Nyquist, and reassure the purist readers that no fundamental laws of mathematics were violated and no entropy was harmed during the writing of this article.

Our objective is somewhat complicated by the fact that in most real-world imaging scenarios, and in the EO domain in particular, real ground truth for super-resolved imaging data does not exist. The evaluation of such SR algorithms is therefore limited to mostly qualitative rather than quantitative methods. Unfortunately, at this time, we do not offer a conclusive solution to this problem, but rather attempt to provide a combination of extensive empirical qualitative and best-effort quantitative evidence to support our results.

As a proxy for ground truth (GT), we henceforth utilise small samples of 8-bit RGB imaging data exported from Google Earth under the fair use provision. GT samples are histogram-matched to the corresponding S2DR2 images. It is important to note that GT and S2DR2 samples are not expected to match perfectly, as there are temporal (date and time of day), morphological (georeferencing and orthorectification), as well as semantic (objects may move and their appearance may change) differences between the image pairs. These differences, unfortunately, impose hard limitations on the precision of the presented quantitative analysis.
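
A minimal sketch of this preprocessing step, assuming both samples are co-registered (H, W, 3) arrays (the placeholders below stand in for real image data; match_histograms is scikit-image's standard histogram matching):

```python
import numpy as np
from skimage.exposure import match_histograms

# Illustrative placeholders: in practice these are the 8-bit RGB sample
# exported from Google Earth and the corresponding S2DR2 true-colour
# image, co-registered and cropped to the same extent.
rng = np.random.default_rng(0)
gt_rgb = rng.integers(0, 256, size=(400, 400, 3), dtype=np.uint8)
s2dr2_rgb = rng.integers(0, 256, size=(400, 400, 3), dtype=np.uint8)

# Match the GT histogram to the S2DR2 image channel by channel, so that
# residual differences reflect scene content rather than radiometry.
gt_matched = match_histograms(gt_rgb, s2dr2_rgb, channel_axis=-1)
```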

Fig 1. Visual assessment of the S2DR2 spatial resolution. © ESA © Google Earth

Firstly, we would like to substantiate the admittedly ambitious claim of a target super-resolved spatial resolution of 1 m/px. Clearly, the definition of spatial resolution here differs from its traditional meaning, as determined by the physical properties of the optical sensor used to collect the image. Rather, our definition is motivated by the information content of the processed image and its corresponding practical utility. In other words, we are trying to demonstrate that the generated 1 m/px data is useful and significant, rather than necessarily 100% accurate. Our claim of 1 m/px effective spatial resolution is corroborated by the examples depicted in Figures 1 and 2, where we compare the original Sentinel-2 10 m/px data (left-most column) with the corresponding S2DR2 images resampled to spatial resolutions of 4, 2, and the original 1 m/px. The corresponding 1 m/px ground truth image is also shown in the right-most column for reference.
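
The exact resampling kernel used for the intermediate panels is not specified here; block averaging is one plausible choice, sketched below (array names are illustrative):

```python
import numpy as np
from skimage.transform import downscale_local_mean

# Illustrative placeholder for the super-resolved image at 1 m/px, (H, W, C).
rng = np.random.default_rng(0)
s2dr2_1m = rng.random((400, 400, 3)).astype(np.float32)

# Block-average to coarser grids: factor 2 -> 2 m/px, factor 4 -> 4 m/px.
s2dr2_2m = downscale_local_mean(s2dr2_1m, (2, 2, 1))
s2dr2_4m = downscale_local_mean(s2dr2_1m, (4, 4, 1))
print(s2dr2_2m.shape, s2dr2_4m.shape)  # (200, 200, 3) (100, 100, 3)
```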

Comparing columns 3 and 4 of Figures 1 and 2, it is evident that any downsampling of the 1 m/px S2DR2 image results in a significant loss of spatial resolution and of the corresponding information content. From a practical standpoint, counting individual trees - for example - in the 1 m/px S2DR2 image (Figure 1, column 4) is a relatively straightforward task, while being very challenging in the 2 m/px image (column 3) and completely impossible in the 4 m/px image (column 2). It should be noted, however, that the S2DR2 output is a latent-space (as opposed to pixel-space) approximation of the GT. In the particular case of Figure 1, the model succeeds at reconstructing the periodic texture, but not necessarily the exact tree density and the corresponding tree count. In Figure 1, row 4, you may notice that the number of trees in the S2DR2 image (column 4) is slightly lower than in the GT (column 5). Likewise, not all gaps (missing trees) are accurately reconstructed.

Fig 2. Visual assessment of the S2DR2 spatial resolution. © ESA © Google Earth

Here we would like to foresee and address another common line of criticism, namely that any analytical product that can be derived from the S2DR2 image, such as field boundaries or a tree count, could be similarly derived directly from the S2 image, since, supposedly, all of the information comes from the original S2 data anyway. This line of reasoning is quite popular and is somewhat true in principle, yet completely wrong in many important practical cases, for two main reasons.

Firstly, the spatial information in the S2DR2 image comes from a combination of the original S2 multi-spectral image and the very strong prior provided by the S2DR2 model. The S2 image is a code, and the S2DR2 model is an interpretation code-book. A model that bypasses the SR step to directly yield an analytical product would need to have the aforementioned code-book embedded in it. This is true for most target objects on the scale of 30 meters or less, and is certainly true for sub-10-meter objects. In other words, one would need to train a single model that combines the SR and analytical steps. This would only make such a model more limited in scope and less predictable, but not necessarily smaller or more effective in any way.

Secondly, the training data for the development of direct S2-to-analytics models is, in most cases, not readily available or too expensive. To develop such a model, one would need a substantial amount of high-resolution ground truth that is perfectly matched to the corresponding S2 scenes. Such datasets are rare, or non-existent, for most applications. Conversely, S2DR2 allows for the labelling of small objects that are perfectly matched to their S2 originals, and the subsequent training of dedicated analytical models.

Fig 3. Visual assessment of the S2DR2 spatial resolution. © ESA © Google Earth

Furthermore, comparing S2DR2 (column 4) and GT (column 5) in Figure 2, we can see that the correct shape and orientation of individual planes are accurately reconstructed, even for small planes that are only 20 m (two S2 pixels) across, despite being completely incomprehensible in the original S2 image (column 1). Nevertheless, small reconstructed objects are somewhat deformed, and the exact spectral characteristics of the plane fuselages are not always accurately reproduced.

Figure 3 depicts additional examples of reconstructed spatial details, including single-pixel and even sub-meter features, such as the road markings that can be seen in row 2, columns 5 (S2DR2) and 6 (GT).

Ultimately, S2DR2 is a predictive model. It has an expected accuracy and a margin of error. Its performance may vary between locations and classes of land cover/objects that are better or worse represented in the training data. Various imperfections, artefacts and distortions can be expected in the processed data. Whether this particular model may be suitable for a specific application is a question of many parameters and use-case requirements. Nevertheless, we are convinced that the presented model can be highly beneficial for a broad scope of practical analytical applications, starting with the best-in-class, up-to-date field boundary detection provided by DigiFarm.

The state of the commercial satellite imaging industry remains challenging despite its great potential and significant investments. Practical commercial applications outside the domain of security and defence are surprisingly few and far between. In this context, S2DR2 demonstrates the feasibility of expanding the scope of capabilities of imaging satellite systems that are already in orbit, as well as creating novel design considerations for future systems. We will continue to document new applications of S2DR2 in our forthcoming posts.

We are convinced that the 5+ years of uninterrupted global coverage, the 5-day revisit rate, as well as the unparalleled quality of the Sentinel-2 radiometric calibration and atmospheric correction, make S2DR2 a unique EO asset of tremendous scientific and applied value.

An extended collection of additional S2DR2 examples can be found here. For more information, please contact ya at gamma dot earth.

Performance Evaluation

The main difficulty with the objective performance evaluation of such a model is that real ground truth (GT) that is temporally, geometrically and spectrally coherent with the original Sentinel-2 L2A image does not exist. In order to calculate the performance metrics, we therefore utilised small samples of high-resolution satellite imagery obtained from Google Earth. These samples contain 8-bit RGB pixel values, which allow for the evaluation of performance metrics for spatial, but not spectral, features.

Correspondingly, we evaluate two separate aspects of the achievable performance. Firstly, the accuracy of the spatial reconstruction is evaluated across the RGB bands using the high-resolution ground truth and the RMSE, PSNR and SSIM metrics. Secondly, the preservation of spectral characteristics across the 12 spectral bands of Sentinel-2 is evaluated using the R2 score between the original and upscaled pixels. The main purpose of this step is to ensure that the model does not introduce spectral distortions or biases into the original spectral data.
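
A minimal sketch of how the spatial metrics can be computed with standard libraries (function and array names are illustrative; both images are assumed to be histogram-matched, co-registered floats scaled to [0, 1]):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def spatial_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """RMSE, PSNR and SSIM between a super-resolved RGB image and its
    (approximate) ground truth; both (H, W, 3) floats scaled to [0, 1]."""
    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))
    psnr = float(peak_signal_noise_ratio(gt, pred, data_range=1.0))
    ssim = float(structural_similarity(gt, pred, channel_axis=-1, data_range=1.0))
    return {"RMSE": rmse, "PSNR": psnr, "SSIM": ssim}
```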

It should be noted that the proposed evaluation methodology is far from perfect. The ground truth can differ significantly from the Sentinel-2 image, as it may have been acquired on a different date, at a different time of day, and from a different angle. Thus, no perfect match between the super-resolved image and the ground truth can be expected. As is often the case with super-resolution models, the evaluation involves subjective manual inspection and assessment of the results.

We therefore present a comprehensive collection of examples generated across a broad range of locations, seasons and types of terrain, which demonstrate and document the achievable performance, as well as the expected limitations, of the model.

Each example contains the following:

  • Sentinel-2 True Color Image (TCI: B04, B03, B02), 40x40 px, 10 m/px (top left);
  • S2DR2 TCI, 400x400 px, 1 m/px (top centre);
  • Ground Truth RGB, 1 m/px (top right, © Google Earth);
  • Sentinel-2 Infra-Red Pseudo-Color Image (IRPCI: B05, B08, B11), 40x40 px, 10 m/px (bottom left);
  • S2DR2 IRPCI, 400x400 px, 1 m/px (bottom centre);
  • Scatter plot of pixel values across all 12 bands, Sentinel-2 L2A versus S2DR2, where different colors represent different spectral bands. The plot also contains the accuracy evaluation values for the RMSE, PSNR, SSIM and R2 metrics (a minimal sketch of this per-band comparison follows the list).
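
A sketch of the spectral-consistency check, under the assumption that the 1 m/px output is aggregated back onto the native 10 m grid before comparing per-band pixel values with the original L2A scene (function and variable names are illustrative):

```python
import numpy as np
from skimage.transform import downscale_local_mean

BANDS = ["B01", "B02", "B03", "B04", "B05", "B06",
         "B07", "B08", "B8A", "B09", "B11", "B12"]

def spectral_r2(s2_10m: np.ndarray, s2dr2_1m: np.ndarray) -> dict:
    """Per-band R2 between original 10 m/px reflectances, shape (H, W, 12),
    and the S2DR2 output block-averaged back from (10H, 10W, 12)."""
    back_10m = downscale_local_mean(s2dr2_1m, (10, 10, 1))
    scores = {}
    for i, band in enumerate(BANDS):
        orig = s2_10m[..., i].ravel()
        pred = back_10m[..., i].ravel()
        ss_res = np.sum((orig - pred) ** 2)
        ss_tot = np.sum((orig - orig.mean()) ** 2)
        scores[band] = float(1.0 - ss_res / ss_tot)
    return scores
```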

The overall average values of the performance metrics across all 10 evaluation datasets are: RMSE: 0.015; PSNR: 25 dB; SSIM: 0.85; R2: 0.8

Barispol dataset

RMSE: 0.0169; PSNR: 24.78 dB; SSIM: 0.85; R2: 0.79

Buchillon dataset

RMSE: 0.025; PSNR: 22.972 dB; SSIM: 0.775; R2: 0.82

Fresno dataset

RMSE: 0.029; PSNR: 20.809 dB; SSIM: 0.657; R2: 0.84

Kingsman dataset

RMSE: 0.042; PSNR: 23.431 dB; SSIM: 0.68; R2: 0.77

Giza dataset

RMSE: 0.043; PSNR: 22.827 dB; SSIM: 0.66; R2: 0.74

Ampitinova dataset

RMSE: 0.025; PSNR: 21.778 dB; SSIM: 0.68; R2: 0.82
