Courtesy NASA

Obscured by Clouds: A Comparative Analysis of Cloud Removal Methods

David Nagy
Feb 4

This is the third post in a series documenting the design, development and implementation of a machine learning pipeline leveraging satellite imagery to predict drivers of deforestation in the Congo Basin rainforest. For more context, please read the first two articles here and here.

(co-authored by Zhenya Warshavsky)

Project Canopy is a new nonprofit aiming to bolster evidence-based policymaking by identifying where environmental actors can obtain the greatest returns on their conservation investments. One way we aim to do this is by developing a tool that allows users to track logging activity in the Congo Basin in near real-time. With this tool, users could quickly identify where primary forests are being cut down, and thus adopt remedial measures.

As data scientists at this organization, we decided to train a model to detect logging road locations; please see our first article for details. As explained in our second article, after trying out several alternatives, we decided to use Sentinel Hub as our main platform both for querying Sentinel-2 image products and downloading them. However, shortly before we actually implemented the data collection pipeline we had developed, we ran across some information that caused us to significantly modify our approach.

We encountered an academic article by Michael Schmitt, Lloyd Hughes, Chunping Qiu, and Xiao Xiang Zhu that detailed improvements to the way we had planned to obtain high-quality, (relatively) cloud-free Sentinel-2 images. Moreover, their method used Google Earth Engine, not Sentinel Hub. After spending two months learning and further improving upon this method, we decided it served our purposes better than any other cloud removal method we tried. It provided high-quality cloud-free images without requiring a large pool of image products to draw from, or resorting to "fake data" generated by deep-learning models and the like. Our previous post corrects the misconceptions we had about Google Earth Engine; this post details both our old method and our new one, describes the reasons why we switched, and compares them to the other major cloud removal procedures out there.

Cloud Masking

The average monthly cloud coverage over the Congo Basin rainforest in 2020

Until now, we had been using cloud masks. With cloud masking, an algorithm is applied to each satellite image to estimate the probability that each pixel is a cloud. You can then set a threshold value: any pixel with at least, say, a 50% chance of being a cloud counts as a "cloudy pixel." The result is a matrix of 1s and 0s (0 for cloud pixels, 1 for non-cloud pixels) of the same shape as the original image. This is the "cloud mask." By applying this cloud mask (multiplying each pixel value in the image by its counterpart in the mask), we can convert (likely) cloudy pixels to "no data" areas. This may not seem like an improvement, but we then mosaic together multiple images. Typically, mosaicking is used to combine several small images into a single big one. But what happens when you combine several images that cover the same area into a single, larger image?
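
As a concrete illustration, here is a minimal NumPy sketch of the threshold-and-multiply step just described; the array contents and the 50% threshold are placeholders, not output from any particular masking algorithm.

```python
import numpy as np

# Stand-ins for a single satellite band and a per-pixel cloud
# probability map of the same shape (values in [0, 1]).
image = np.random.rand(256, 256)
cloud_prob = np.random.rand(256, 256)

# Any pixel with at least a 50% cloud probability counts as cloudy.
# The mask is 0 for cloudy pixels and 1 for clear ones.
cloud_mask = (cloud_prob < 0.5).astype(np.uint8)

# Multiplying by the mask zeroes out the (likely) cloudy pixels,
# turning them into "no data" areas.
masked_image = image * cloud_mask
```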

You can think of this process as “stacking” multiple images on top of each other. Imagine that each image is a piece of paper. You have ten images, selected however you want (e.g., across a certain time range, and/or sorted by image quality) and you place them all in one stack, one on top of the other. But each paper has holes in it, corresponding to the image’s “no data” areas. So any holes in the top paper will be “filled in” by the paper directly underneath it.

The hole in the top image is “filled in” by the image below it

Because the top two images may still have overlapping “no data” areas, these gaps may be filled in by the third piece of paper, and so on. By masking out the cloudy areas, we enable those areas to progressively be “filled in” by other images. If we then compress this stack into a single piece of paper, we should be left with a single “mosaicked” image. Masking the clouds out of a bunch of images and combining them all into a single mosaic ideally results in a single, cloud-free image.
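
The paper-stacking analogy translates almost directly into code. Here is a minimal NumPy sketch of our own, with "no data" pixels represented as NaN (so that legitimate zero values aren't treated as holes), in which each layer fills whatever gaps remain above it:

```python
import numpy as np

def mosaic(layers):
    """Collapse a list of same-shaped images (first = top of the stack)
    into one image by taking, at each pixel, the topmost value that is
    not "no data" (represented here as NaN)."""
    result = np.full(layers[0].shape, np.nan)
    for layer in layers:
        holes = np.isnan(result)      # pixels still unfilled so far
        result[holes] = layer[holes]  # fill them from this layer
    return result

# Usage: stack three masked images of the same area, best one first.
# combined = mosaic([img_a, img_b, img_c])
```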

Types of Masks

There are three major cloud masking algorithms for Sentinel-2 images, and Google Earth Engine has recently introduced a fourth one. First is sen2cor, the algorithm used by the European Space Agency (ESA), the organization that operates the Sentinel-2 satellites. They apply this algorithm to all the image products they collect and then provide the resulting masks with those products. We used these at first, simply because the sen2cor masks are easy to access and comprehensive.

The other two major ones are MAJA, used by the French Space Agency (CNES), and FMask, used by the US Geological Survey (USGS). While Baetens et al argue that these are superior to sen2cor, when we investigated USGS, we found their product offerings far less comprehensive than ESA's, missing large time windows and geographical regions. In addition, GIS-Blog asserts that FMask is designed more for Landsat than for Sentinel. These weaknesses made the alternatives unsuitable for our purposes, but if they do not apply to your use case, they are perhaps worth investigating.

Finally, Google Earth Engine has recently released cloud masks derived from Sentinel Hub’s S2cloudless algorithm. We tested this ourselves, and found it to be generally superior to the sen2cor results. So if you want to use cloud masking, and are working with Google Earth Engine, we definitely recommend using the S2cloudless masks.
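
If you want to try this, here is a sketch of the commonly used pattern for attaching S2cloudless cloud probabilities to Sentinel-2 scenes in the Earth Engine Python API; the region, date range, and 50% threshold are placeholders, not our production settings.

```python
import ee

ee.Initialize()

# Placeholder region and dates; substitute your own ROI and timeframe.
roi = ee.Geometry.Point(15.0, 0.5)
start, end = '2020-01-01', '2020-12-31'

s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
      .filterBounds(roi)
      .filterDate(start, end))
probs = (ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY')
         .filterBounds(roi)
         .filterDate(start, end))

# Attach each scene's s2cloudless probability image to it by joining
# on the shared scene index.
joined = ee.Join.saveFirst('cloud_prob').apply(
    primary=s2,
    secondary=probs,
    condition=ee.Filter.equals(leftField='system:index',
                               rightField='system:index'))

def mask_clouds(img):
    img = ee.Image(img)
    prob = ee.Image(img.get('cloud_prob')).select('probability')
    # Keep only pixels under 50% cloud probability (a common starting
    # threshold; tune it for your region).
    return img.updateMask(prob.lt(50))

masked = ee.ImageCollection(joined).map(mask_clouds)
```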

However, S2cloudless suffers from the same fundamental issues as all cloud masking approaches, which we will cover next.

Drawbacks

While cloud masking is a theoretically solid idea and works to an extent, we ran into several drawbacks when we actually implemented it. (The two images below both result from applying the sen2cor mask.)

First, and most importantly, we often ran into a "pixelation" effect. Because we were "filling in" the holes of one picture with another, the final image could lose resolution or contain randomly scrambled pixels wherever those pictures looked different (as a result of different timestamps or other factors).

“Pixelation” effect and partial cloud removal using the sen2cor cloud mask

Second, and more challenging to address, was a “stitching” effect that made the images look patched together. This, along with some other “artefacts” of the mask/mosaic process, such as the pixelation described above, meant that we often ended up with a clearly non-contiguous image.

“Stitching” visible as the top part seems to clearly come from a different image. “Pixelation” effect can be seen in the bottom right corner (using the sen2cor cloud mask)

Third, no cloud detection algorithm is perfect. Sometimes clouds that the algorithm didn't detect remain in images, and sometimes the algorithm mistakes valuable parts of the image for clouds, causing us to lose them. Such errors are unavoidable, and the algorithms are continually improving: the S2cloudless algorithm is superior to sen2cor in this respect, for example. But since cloud masking inherently involves removing data, you are always at risk of losing something valuable. This is particularly important for us because we're interested in retaining tiny parts of the images, namely the logging roads themselves.

To sum up, then, here are the upsides of cloud masking:

  • The resulting images almost never have clouds in them
  • The cloud masks are easy to access and relatively straightforward to use

On the other hand, it also has these drawbacks:

  • Often results in "artefacts," such as the "pixelation" and "stitching" effects described above, that make the resulting image look bad
  • The masking algorithm often makes errors, leaving in clouds or masking out areas that contain useful data

While these drawbacks were not significant enough to derail the project, they did mean we were on the lookout for a better way — and we found one in the article by Schmitt et al.

A Better Way: Cloudfree Merging

Made by Project Canopy, based on the flowchart from Schmitt et al
  1. Querying — obtaining images of a specific area within a specified timeframe
  2. Quality Score — computing a "quality score" for each pixel based on its likelihood of being a cloud or cloud shadow
  3. Merging — using the quality scores to merge all the images into a single, cloud-free one

Querying

For an explanation of how to negotiate the apparent limitations on querying and downloading images in Google Earth Engine, see our previous post.

Quality Score

After getting the products, the next step is calculating a “quality score” for each pixel. Essentially, this quality score ranks the likelihood that the pixel contains useful information, as opposed to being either a cloud or the shadow of a cloud. In this regard, it’s a cloud mask like the ones we discussed earlier. But there are at least two key differences between this and the sen2cor cloud mask provided by the ESA.

First, the sen2cor and S2cloudless masks do not differentiate between clouds that humans can see through and clouds that they can’t. As Schmitt et al say, even images with extensive light cirrus clouds “are still largely visible with only mild haziness in some areas.” While this haze makes the image less visually appealing, it still preserves the important terrain information. Since we are interested in accurately capturing terrain features to feed into a machine learning model, even a completely hazy (yet still visible) image is preferable to a sharp image with some regions blotted by opaque clouds.

Second, the formula used by Schmitt et al recognizes that clouds are contiguous: a small, isolated cluster of "cloudy" pixels surrounded by clear ones (or vice versa) is almost certainly noise. To remove this noise, the formula applies morphological opening and closing.
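
To illustrate what opening and closing do to a noisy mask, here is a small sketch using SciPy on a boolean array; Schmitt et al work inside Earth Engine, so this NumPy version and its 5×5 structuring element are our own illustrative choices.

```python
import numpy as np
from scipy import ndimage

# Illustrative boolean mask where True = "cloudy" pixel.
cloudy = np.random.rand(256, 256) > 0.7

structure = np.ones((5, 5), dtype=bool)  # neighborhood for the operations

# Opening erodes then dilates: tiny isolated "cloudy" specks vanish.
# Closing dilates then erodes: small clear holes inside clouds get filled.
denoised = ndimage.binary_opening(cloudy, structure=structure)
denoised = ndimage.binary_closing(denoised, structure=structure)
```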

After calculating this quality score, Schmitt et al use it to create a single "Quality Pixel Percentage" number for each entire image in the collection. They select a threshold for the quality score: a pixel below that threshold counts as a "bad pixel" (i.e., a cloud or a cloud shadow), while one above it counts as a "good pixel." The percentage of pixels that are "good" constitutes the entire image's Quality Pixel Percentage, a single value that is used in the merging process.
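
Once each pixel has a score, the Quality Pixel Percentage reduces to a couple of lines. This sketch is ours, and the threshold is a placeholder rather than the value Schmitt et al use:

```python
import numpy as np

def quality_pixel_percentage(quality_score, threshold=0.5):
    """Percentage of pixels whose quality score clears the threshold."""
    good = quality_score >= threshold
    return 100.0 * good.mean()

# An image where 97% of pixels clear the threshold scores 97.0,
# making it eligible for the "best" collection described below.
```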

“The cloud-free image generation process. (a) The original image from the collection with the least cloud cover for Jacksonville, Florida in winter, (b) computed cloud score for the image, with a color scale from blue (low cloud probability) to red (high cloud probability), (c) cloud and shadow mask computed by thresholding the Quality Score with green representing the cloud contribution and blue the shadow, and (d) the final cloud-free image produced for the scene.” (image and caption from Schmitt et al)

Merging

At this point, we have an image collection, a Quality Pixel Percentage value for each image in that collection, and a Quality Score for each pixel in all the images. The latter two values are used to combine the entire collection into a single cloud-free image.

Made by Project Canopy, based on the flowchart from Schmitt et al

First, we select all the images with a Quality Pixel Percentage of 95% or more. Any such images are considered “good enough” to use independently, and it’s preferable to keep them whole rather than introduce noise and image artefacts by combining multiple images together, even if doing so would achieve marginal improvement of a few percentage points. This “best” collection is then sorted, so that the least cloudy images are on top, and then mosaicked together so that any missing areas in the top images are filled in by the bottom ones. These missing areas might be due to “nodata” pixels in the original products (this happens sometimes, presumably because the satellite just missed those areas). More commonly, it’s because the Region of Interest extends across multiple products, so those products need to be combined to achieve a single image that covers the entire ROI. The point is that no masking occurs. The pipeline is not removing any data — it is simply ensuring that any missing areas in the original images are filled in by data available in other images.
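
In Earth Engine terms, this first stage might look like the sketch below. It assumes the scoring step produced an ee.ImageCollection (here called scored) whose images carry an image-level QUALITY_PIXEL_PERCENTAGE property; both names are ours, for illustration only.

```python
# `scored` is assumed to be an ee.ImageCollection in which each image
# carries (a) a per-pixel quality-score band and (b) an image-level
# QUALITY_PIXEL_PERCENTAGE property set during the scoring step.
best = (scored
        .filter(ee.Filter.gte('QUALITY_PIXEL_PERCENTAGE', 95))
        .sort('QUALITY_PIXEL_PERCENTAGE'))

# mosaic() stacks images with later ones on top, so the ascending sort
# puts the highest-quality image on top of the pile.
best_mosaic = best.mosaic()
```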

Next, all the images in the collection are combined using the qualityMosaic method. This works very differently than the mosaicking we have been discussing up until now. The “quality mosaic” operates on a pixel-by-pixel basis. For each pixel location, it goes through the entire collection and selects the pixel with the highest Quality Score in that location. In other words, rather than prioritizing the least cloudy whole image, it prioritizes the least cloudy pixel. The resulting image may contain two adjacent pixels that come from wildly different products, if those pixels had the highest Quality Score. So instead of masking, which involves removing data, this “quality mosaic” scans the quality of all the pixels in all the images, and selects the best pixels regardless of how cloudy the image as a whole might be. For instance, an image might be 90% cloudy, but the remaining 10% may still be selected by the quality mosaic if they have a high quality score, thus allowing us to make use of even severely cloudy images.
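
The quality mosaic itself is a single call to qualityMosaic, a built-in ee.ImageCollection method. A minimal sketch, continuing the example above and assuming the per-pixel score was stored in a band we named quality_score:

```python
# For each pixel location, qualityMosaic() picks the value from whichever
# image scores highest in the named band at that location.
quality_mosaic = scored.qualityMosaic('quality_score')
```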

At this point, we have two “cloud-free” images, one obtained by mosaicking all of the “best” images, the other obtained through the quality mosaic. The last step is to combine these two via a regular mosaic, prioritizing the one obtained from the “best” images. Of course, sometimes there won’t be any “best” images; in those cases, the quality mosaic will be returned in its entirety. Either way, the end result is a single cloud-free image. Below are some examples — specifically of the same areas shown above in the Cloud Masking section.
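
The final combination is then one more regular mosaic, sketched below under the same assumptions as the previous two snippets:

```python
# A regular mosaic puts later images on top, so listing best_mosaic
# second gives the "best" images priority over the per-pixel composite.
final = ee.ImageCollection.fromImages([quality_mosaic, best_mosaic]).mosaic()

# Caveat: if no image cleared the 95% bar, `best` is empty; a real
# pipeline should branch on best.size() (e.g. with ee.Algorithms.If)
# and return quality_mosaic directly in that case.
```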

This Cloudfree Merging image performs visibly better than the Cloud Masking approach
While the “stitching” effect persists in this Cloudfree Merging image and the visible road is hazy, the overall image is contiguous

The overall advantages of Cloudfree Merging are best explained by reference to the other major ways to generate cloud-free images. We’ll discuss those next, and end with the pros and cons of Cloudfree Merging. We start with a simple but surprisingly effective method: merely taking the “least cloudy image” as-is.

Least Cloudy Image

The "least cloudy image" method involves simply selecting whichever image has the lowest "Cloudy Pixel Percentage" value, which comes with every Sentinel-2 product. As Schmitt et al say, "often already the least cloudy image provides a clean and artefact-free solution" (150), and when we tried this for ourselves, we came to the same conclusion. Even in areas like the Congo Basin that are frequently cloudy, if your timeframe is long enough (in our case, two years), you will usually find at least one image that's mostly free of clouds. Despite the simplicity of this method, it has a major benefit over all the more complicated ones: because you're only using one image, you'll never see any of the artefacts, "stitching," or weirdness that occasionally results from combining multiple images.
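
In the Earth Engine Python API, this method takes only a few lines, since the "Cloudy Pixel Percentage" is exposed as the CLOUDY_PIXEL_PERCENTAGE metadata property; the region and two-year window in this sketch are placeholders.

```python
import ee

ee.Initialize()

roi = ee.Geometry.Point(15.0, 0.5)  # placeholder region

# CLOUDY_PIXEL_PERCENTAGE is a standard metadata property on every
# Sentinel-2 product, so no custom scoring is needed here.
least_cloudy = ee.Image(
    ee.ImageCollection('COPERNICUS/S2_SR')
    .filterBounds(roi)
    .filterDate('2019-01-01', '2020-12-31')  # a long window, as in the text
    .sort('CLOUDY_PIXEL_PERCENTAGE')
    .first())
```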

However, there are also two major drawbacks to this method. First, it relies heavily on querying for images over a long enough timeframe, particularly for regions that are usually cloudy. If it doesn't matter to your project just when an image was taken, this isn't a problem. But if you specifically want image data from a relatively small timeframe (e.g., a few months), you may not be able to find a single image that's mostly devoid of clouds.

Second, recall our earlier point that the custom quality score Schmitt et al create differs from the sen2cor and S2cloudless algorithms: those classify thin, hazy clouds that you can still see through as "clouds," while the Schmitt et al algorithm does not. This means that selecting the image with the lowest "Cloudy Pixel Percentage" value will usually yield a relatively clear image that still has some opaque clouds, while the Cloudfree Merging method often gives you a hazy image where the terrain features are all still visible. If the goal is to capture as much terrain information as possible, then Cloudfree Merging is superior. (Of course, you could instead select images by their custom "Quality Pixel Percentage" value, but at that point, you might as well use the full merging procedure.)

To sum up, the least cloudy image method has some significant advantages:

  • There’s no need to combine multiple images, so you avoid any “artefacts” and other problems that are inherent to that
  • The result is usually a clear, high-quality, nice-looking image

But there are severe drawbacks as well:

  • It requires you to collect images over a broad timeframe, so it’s not useful if you’re looking for data from a specific date or range of dates
  • Its metadata score relies on the same cloud mask algorithms as conventional cloud masking does, so it passes over hazy images with visible terrain, since they're classified as cloudy

For these reasons, we decided that while the Least Cloudy Image method is surprisingly good much of the time, in the end it was inferior to Cloudfree Merging.

Inferential Approaches

Inferential approaches use a trained deep-learning model to infer what the terrain beneath the clouds looks like. This gives them a major advantage: as long as you already have the model, you only need to apply it to a single image to get a full, cloud-free one, whereas our method requires aggregating multiple images over a span of time. So if your chief concern is getting an image from a specific date, machine learning inference is essentially the only option. But this isn't the case for us (logging roads are not permanent, but they do last more than a few months), so we decided against this method.

In other words, inferential approaches have this benefit:

  • They can be used on a single image product to generate a full, cloud-free image

But to achieve this, they require a huge sacrifice:

  • Much of the resulting image will be “fake,” generated by an algorithm instead of an accurate snapshot taken by a camera

S2cloudless Cloud Mask

In our testing, we found S2cloudless to be very effective at producing good-quality cloud-free imagery, particularly when images were queried over a long timeframe. In fact, the quality was relatively good even when a short timeframe was queried. Compared with Cloudfree Merging, the S2cloudless mask tended to provide images that look better to the human eye, with much less haze, but in exchange it produced more artefacts that obscured important terrain features. See the images below for some examples, in False-Color rendering to emphasize the terrain features:

Comparison between (a) Sentinel Hub’s S2cloudless cloud masking method and (b) Cloudfree Merging for two separate ROIs during a 3 month period. False Color rendering is used to emphasize how each method captures terrain features (source: Project Canopy)

Overall, then, S2cloudless is perhaps the best method out there to

  • Provide cloud-free images that look good to the human eye

But it carries the downside of

  • Producing artefacts that sometimes obscure terrain features

Since we’re mostly concerned with detecting logging roads and other terrain features, we decided to go with Cloudfree Merging.

Overall Picture

All of the methods we’ve listed lie somewhere on this “low quality/short range → high quality/long range” continuum. The “least cloudy image” method, for example, can often leave you with an extremely high-quality image, but you’ll have to use a big timeframe to guarantee finding a single image that’s relatively cloud-free. At the other end are the inferential approaches, which can operate on a single image but have a high chance of resulting in inaccurate image data.

Each cloud removal approach involves a tradeoff between the minimum length of time you need to pull images from and the accuracy of the resulting cloudless image. Cloud Masking and Cloudfree Merging achieve a compromise, with other strengths and weaknesses

Within this framework, the Cloudfree Merging method attempts to achieve as small a timeframe as possible while still preserving a relatively high-quality image. In their article, Schmitt et al only pull images from a single season (3 months), and still often manage to end up with clear, quality images. While the cloudiness of the Congo Basin made such a small timeframe unfeasible for us (more on this in a future article), we were still often able to get good images in a shorter timeframe than other methods. Since we plan to do a time series analysis in the future, this is a major advantage for Cloudfree Merging and is one of the main reasons we ended up going with it, along with the prioritization of “hazy but visible” images that we discussed above.

All this being said, we don’t want to exaggerate the differences between approaches. Especially when there is a single (mostly) cloud-free image in your dataset, all these approaches will end up behaving very similarly. Additionally, the S2cloudless cloud mask may be more effective if your chief concern is making cloud-free images that look good to humans (who are likely to ignore small artefacts in the image). So while we ended up going with Cloudfree Merging, you should consider an alternate approach if your use case happens to differ from ours.

To summarize, then, Cloudfree Merging attempts to achieve a compromise between extending the timespan of images you collect and increasing the quality of the final cloud-free image. It attempts to let you decrease the timeframe as much as possible while still ending up with a good, full, cloudless image.

There is, finally, one last major benefit to Cloudfree Merging: while it can be done with any number of GIS tools, the source code was optimized for Google Earth Engine, which carries its own set of significant advantages, including cost, ease of use, and a large community forum.

Conclusion and Summary

Here are the main benefits of Cloudfree Merging:

  • Achieves a compromise between timeframe and image quality; creates a high-quality, largely cloudless image without necessarily needing a huge number of images to draw data from
  • Makes use of cloudy images in its "quality mosaic" (images that masking approaches would programmatically skip over)
  • Can be used with Google Earth Engine, gaining all the benefits of that platform

And here are its drawbacks:

  • Due to the way cloudiness is scored, you often end up with a visible but "hazy" image, which can be ugly to look at
  • There are occasionally “stitching” artefacts of the merging process

Below, we summarize the benefits and drawbacks of each method:

In our next article, we will discuss how we obtained and applied the cloud merging code used by Schmitt et al, as well as some modifications we made to it.
