More accurate and flexible cloud masking for Sentinel-2 images

Google Earth and Earth Engine
Nov 9, 2020

By Justin Braaten, Kurt Schwehr, and Simon Ilyushchenko on behalf of the Earth Engine Data team

Clouds are wonderful to look at if you’re taking a leisurely walk and staring at the sky. But if you’re studying conditions on the ground using satellite imagery, clouds tend to get in the way of your work. They can interfere with machine learning tools you’re using to identify everything from fields to forests and prevent you from easily assembling cloud-free composites.

To help solve this problem when using Sentinel-2 imagery in Earth Engine, we have rolled out a new collection of cloud probability images calculated with the s2cloudless algorithm. By uniting Sentinel Hub’s algorithm with Google’s computing resources, we have calculated per-pixel cloud probability for the entire Sentinel-2 archive at 10 m scale; each new image added to the Earth Engine catalog is also accompanied by an s2cloudless image. The s2cloudless dataset provides a flexible method to accurately mask cloudy pixels in Level 1C (TOA) and 2A (SR) imagery for generating cloud-free composites and running classification procedures.

Cloud masking is an essential step in our work to map the world; the s2cloudless dataset is making it easier for us to include Sentinel-2 in these efforts. We hope you’ll find it just as beneficial in your own work. Read on to learn more about s2cloudless and how you can start working with it.

Sentinel-2 image from central Oregon, USA showing a variety of cloud cover types [top] | Corresponding s2cloudless image showing cloud probability on a gradient from 0 (black) to 100 (white) percent [bottom].

Cloud masks with greater flexibility

Cloud masking using the Sentinel-2 quality assurance band (QA60) has always been an option, but s2cloudless lets you fine-tune cloud masking, leading to more confidence in the final analysis. Because the QA60 band is only a binary classifier for thick clouds and cirrus clouds, it offers no way to adjust what is considered cloud. The s2cloudless image provides a cloud presence probability between 0 and 100 percent that you can use to customize the aggressiveness of your cloud masking procedure. There is always a balancing act between commission and omission errors, but with s2cloudless you can optimize masking to suit your project’s unique needs.

For instance, consider the following figure, where the QA60 mask omits a considerable number of cloudy pixels; with the s2cloudless image we can set the probability threshold at 50% and capture most of them.

Demonstration of the difference between s2cloudless and the QA60 band. Yellow is s2cloudless, red is the intersection of s2cloudless and the QA60 band. Notice that in this case the QA60 band has considerable cloud pixel omission error.
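
To make the idea of a probability threshold concrete, here is a minimal sketch in plain Python. It is not Earth Engine code: the tile values are made up, the `cloud_mask` helper is hypothetical, and treating a pixel as cloudy when its probability is strictly greater than the threshold is an assumption.

```python
# Cloud probabilities in percent (0-100) for a tiny, made-up 3x4 image tile.
cloud_prob = [
    [2, 5, 48, 97],
    [1, 35, 62, 99],
    [0, 3, 51, 88],
]


def cloud_mask(prob_rows, threshold):
    """Return a grid of booleans: True where probability exceeds threshold.

    Assumption: "cloudy" means strictly greater than the threshold.
    """
    return [[p > threshold for p in row] for row in prob_rows]


# Apply the 50% threshold mentioned in the text.
mask_50 = cloud_mask(cloud_prob, 50)

# True counts as 1, so summing the grid gives the number of cloudy pixels.
cloudy_count = sum(cell for row in mask_50 for cell in row)
print(cloudy_count)
```

Lowering the threshold flags more pixels as cloud, and raising it flags fewer, which is the adjustment the binary QA60 band cannot offer.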

With great power comes great responsibility

The s2cloudless dataset gives you the power to select a cloud probability threshold for defining cloud/non-cloud masks. It is important to choose the threshold value carefully, as the optimum value can vary by cloud type, cover type, location, etc. We recommend testing a few values on a sample of images in your study region to get a sense of the probability distribution and how sensitive the mask is to changes in the threshold. The following example demonstrates using 10% and 90% cloud probability as the mask threshold. Notice that at 90% thin clouds are omitted, and at 10% they are identified.

Difference between 10% (blue) and 90% (red) cloud probability threshold masking.
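
One simple way to test a few values is to count how much of a sample gets flagged at each candidate threshold. The sketch below does this in plain Python on made-up probabilities; the `cloudy_fraction` helper and the sample values are illustrative assumptions, not part of the s2cloudless dataset or the Earth Engine API.

```python
def cloudy_fraction(probabilities, threshold):
    """Fraction of pixels whose cloud probability (percent) exceeds threshold."""
    flagged = sum(1 for p in probabilities if p > threshold)
    return flagged / len(probabilities)


# Made-up probabilities mixing clear ground, thin cloud, and thick cloud.
sample = [1, 2, 4, 6, 15, 22, 30, 45, 60, 75,
          92, 95, 98, 99, 100, 3, 5, 8, 12, 97]

# Sweep a handful of thresholds to see how sensitive the mask is.
for threshold in (10, 30, 50, 70, 90):
    print(threshold, cloudy_fraction(sample, threshold))
```

In this toy sample the flagged fraction falls steadily as the threshold rises from 10 to 90, because the mid-probability pixels (the stand-ins for thin cloud) drop out of the mask.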

Another useful method for identifying a threshold value is interpretation of the cloud probability histogram. A reasonable threshold value to balance cloud commission and omission error lies in the trough between the two high-frequency peaks, which hold the majority of non-cloud and cloud pixels.

Cloud probability histogram for the image region above. Note the bimodal distribution; selecting a threshold value near the bottom of the trough created by the two points of high-frequency non-cloud and cloud probabilities will help balance cloud commission and omission error.
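
The histogram approach can also be sketched in code. The example below is a rough illustration in plain Python, assuming a clearly bimodal distribution with one peak in the lower half of the probability range and one in the upper half. The helper names, the 10% bin width, and the sample values are all hypothetical, and real histograms may need smoothing before a trough can be located reliably.

```python
def histogram(values, bin_width=10):
    """Count values in [0, 100] into bins that are bin_width percent wide."""
    n_bins = 100 // bin_width
    counts = [0] * n_bins
    for v in values:
        index = min(int(v) // bin_width, n_bins - 1)  # 100 goes in the last bin
        counts[index] += 1
    return counts


def trough_threshold(counts, bin_width=10):
    """Return the center of the least-populated bin between the two peaks.

    Assumption: one peak sits in the lower half of the range (non-cloud)
    and the other in the upper half (cloud).
    """
    half = len(counts) // 2
    low_peak = max(range(half), key=lambda i: counts[i])
    high_peak = max(range(half, len(counts)), key=lambda i: counts[i])
    if high_peak - low_peak < 2:
        # Peaks are adjacent: fall back to the boundary between them.
        return float(high_peak * bin_width)
    trough = min(range(low_peak + 1, high_peak), key=lambda i: counts[i])
    return trough * bin_width + bin_width / 2


# Made-up bimodal sample: many clear pixels, many cloudy pixels, few between.
sample = ([2] * 40 + [7] * 25 + [14] * 10 + [25] * 4 + [36] * 2 + [44] * 1
          + [55] * 2 + [68] * 5 + [77] * 9 + [85] * 14 + [93] * 30 + [99] * 20)

counts = histogram(sample)
threshold = trough_threshold(counts)
print(counts, threshold)
```

Treat the result as a starting point to check visually against the imagery, not as a final answer.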

The value selected as the cloud/non-cloud threshold can impact cloud-free composite building and classification work. Overly aggressive cloud masking will ensure that clouds are completely removed, but at the potential expense of removing clear pixels as well. On the other hand, a threshold value that is too conservative may retain cloudy pixels that can contaminate composites and confuse training/classification operations. If you have a lot of images that can be composited, you might err on the side of aggressive cloud masking. If you have few images, you might tend toward a moderate threshold value or use image-specific values to ensure the highest accuracy.
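
The trade-off can be illustrated with a toy median composite for a single pixel. This is only a sketch under stated assumptions: the reflectance and probability values are invented, `composite_pixel` is a hypothetical helper, and the median is used here as one common compositing choice, not as the method prescribed by the dataset.

```python
from statistics import median


def composite_pixel(observations, threshold):
    """Median reflectance of observations with cloud probability <= threshold.

    observations: (reflectance, cloud_probability_percent) pairs for one
    pixel across several image dates. Returns None if every observation
    is masked.
    """
    clear = [refl for refl, prob in observations if prob <= threshold]
    return median(clear) if clear else None


# Made-up time series for one pixel; cloudy dates are brighter.
many_images = [(0.08, 3), (0.09, 12), (0.10, 28),
               (0.21, 45), (0.35, 65), (0.62, 96)]
few_images = [(0.09, 12), (0.35, 65)]

conservative = composite_pixel(many_images, 80)  # keeps some cloudy dates
aggressive = composite_pixel(many_images, 20)    # keeps only the clearest dates
starved = composite_pixel(few_images, 10)        # nothing survives the mask

print(conservative, aggressive, starved)
```

With six dates, the aggressive threshold still leaves enough clear observations to composite. With only two dates, the same aggressiveness leaves a gap, which is why a moderate or image-specific threshold can be the better choice for sparse collections.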

Working with s2cloudless

To help get you started exploring and applying the new s2cloudless image, we’ve put together a tutorial that demonstrates joining the Sentinel-2 SR and s2cloudless collections, defining a cloud masking function, applying it to a sub-collection, and displaying the results. Bonus: it includes cloud shadow masking!

Earth Engine Colab notebook on using the new s2cloudless image for cloud and cloud shadow masking Sentinel-2 imagery.
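
For readers who want the shape of that workflow before opening the notebook, here is a conceptual sketch in plain Python. It is not Earth Engine code and does not reproduce the tutorial: plain dictionaries stand in for the two collections, the shared `id` key and both helper functions are hypothetical, and cloud shadow masking is left out.

```python
def join_by_id(sr_images, cloud_images):
    """Pair each reflectance record with the cloud record sharing its id.

    Records without a matching cloud probability record are dropped.
    """
    cloud_by_id = {img["id"]: img for img in cloud_images}
    return [
        {**sr, "cloud_prob": cloud_by_id[sr["id"]]["prob"]}
        for sr in sr_images
        if sr["id"] in cloud_by_id
    ]


def mask_clouds(image, threshold=50):
    """Replace pixels whose cloud probability exceeds threshold with None."""
    masked = [
        value if prob <= threshold else None
        for value, prob in zip(image["pixels"], image["cloud_prob"])
    ]
    return {"id": image["id"], "pixels": masked}


# Made-up stand-ins for the surface reflectance and cloud probability sets.
sr_collection = [
    {"id": "scene_a", "pixels": [0.08, 0.31, 0.09]},
    {"id": "scene_b", "pixels": [0.07, 0.08, 0.44]},
    {"id": "scene_c", "pixels": [0.10, 0.11, 0.12]},  # no cloud record
]
cloud_collection = [
    {"id": "scene_a", "prob": [4, 88, 10]},
    {"id": "scene_b", "prob": [2, 6, 93]},
]

# Join first, then apply the masking function to every joined image.
joined = join_by_id(sr_collection, cloud_collection)
masked_collection = [mask_clouds(img) for img in joined]
print(masked_collection)
```

The order of operations is the point: each reflectance image has to be paired with its own cloud probability image before a per-image masking function can be mapped over the collection.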

We’ve never had cloud masking for Sentinel-2 that is this comprehensive and customizable for Earth Engine users. We hope it will inspire you to power ahead with new research, knowing you can conduct cloud masking more accurately and in less time!
