Courtesy NASA

Obscured by Clouds: A Comparative Analysis of Cloud Removal Methods

David Nagy
Feb 4, 2021 · 17 min read


This is the third post in a series documenting the design, development and implementation of a machine learning pipeline leveraging satellite imagery to predict drivers of deforestation in the Congo Basin rainforest. For more context, please read the first two articles here and here.

(co-authored by Zhenya Warshavsky)

Project Canopy is a new nonprofit aiming to bolster evidence-based policymaking by identifying where environmental actors can obtain the greatest returns on their conservation investments. One way we aim to do this is by developing a tool that allows users to track logging activity in the Congo Basin in near real-time. With this tool, users could quickly identify where primary forests are being cut down, and thus adopt remedial measures.

As data scientists at this organization, we decided to train a model to detect logging road locations; please see our first article for details. As explained in our second article, after trying out several alternatives, we decided to use Sentinel Hub as our main platform for both querying and downloading Sentinel-2 image products. However, shortly before implementing the data collection pipeline we had developed, we ran across some information that caused us to significantly modify our approach.

We encountered an academic article by Michael Schmitt, Lloyd Hughes, Chunping Qiu, and Xiao Xiang Zhu that detailed improvements to the way we had planned to obtain high-quality, (relatively) cloud-free Sentinel-2 images. Moreover, their method used Google Earth Engine, not Sentinel Hub. After spending two months learning and further improving upon this method, we decided it served our purposes better than any other cloud removal method we tried. It provided high-quality cloud-free images without requiring a large pool of image products to draw from or resorting to “fake data” created by deep-learning models or the like. Our previous post corrects the misconceptions we had about Google Earth Engine; this post details both our old method and our new one, explains why we switched, and compares them to the other major cloud removal procedures out there.

Cloud Masking

One of the key hurdles to using visible-band satellite imagery is that many images are partially or almost entirely obscured by clouds. This is a particular problem for regions like the Congo Basin, our chief area of concern. According to our calculations, the region’s cloud cover averages 62% throughout most of the year. A large portion of the data science team’s time on the project so far has been dedicated to resolving this problem.

The average monthly cloud coverage over the Congo Basin rainforest in 2020

Until now, we were utilizing cloud masks. With cloud masking, an algorithm is applied to each satellite image to estimate the probability of each pixel being a cloud. You can then set a threshold value — e.g., any pixel with at least a 50% chance of being a cloud counts as a “cloudy pixel.” The result is a matrix of 1s and 0s (0 for cloud pixels, 1 for non-cloud pixels) of the same shape as the original image. This is the “cloud mask.” By applying this cloud mask (multiplying each pixel value in the image by its equivalent in the mask), we can convert (likely) cloudy pixels to “no data” areas. This may not seem like an improvement, but we then mosaic together multiple images. Typically, mosaicking is used to combine several small images into a single big one. But what happens when you combine several images that cover the same area into a single image?

You can think of this process as “stacking” multiple images on top of each other. Imagine that each image is a piece of paper. You have ten images, selected however you want (e.g., across a certain time range, and/or sorted by image quality) and you place them all in one stack, one on top of the other. But each paper has holes in it, corresponding to the image’s “no data” areas. So any holes in the top paper will be “filled in” by the paper directly underneath it.

The hole in the top image is “filled in” by the image below it

Because the top two images may still have overlapping “no data” areas, these gaps may be filled in by the third piece of paper, and so on. By masking out the cloudy areas, we enable those areas to progressively be “filled in” by other images. If we then compress this stack into a single piece of paper, we should be left with a single “mosaicked” image. Masking the clouds out of a bunch of images and combining them all into a single mosaic ideally results in a single, cloud-free image.
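To make this concrete, here is a toy numpy sketch of the mask-and-stack process. The function names, the 50% threshold, and the use of 0 as the “no data” value are our own illustrative choices:

```python
import numpy as np

def apply_cloud_mask(image, cloud_prob, threshold=0.5):
    """Zero out pixels whose cloud probability exceeds the threshold.
    The mask is 1 for (likely) clear pixels and 0 for cloudy ones."""
    mask = (cloud_prob < threshold).astype(image.dtype)
    return image * mask  # cloudy pixels become 0 ("no data")

def mosaic_stack(masked_images):
    """'Stack' the masked images: holes (zero pixels) in the top image
    are filled by the first image beneath that has data there."""
    result = masked_images[0].copy()
    for img in masked_images[1:]:
        holes = (result == 0)       # remaining no-data pixels
        result[holes] = img[holes]  # fill from the next image down
    return result
```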

Types of Masks

There are three major cloud masking algorithms for Sentinel-2 images, and Google Earth Engine has recently introduced a fourth one. First is sen2cor, the algorithm used by the European Space Agency (ESA), the organization that operates the Sentinel-2 satellites. They apply this algorithm to all the image products they collect and then provide the resulting masks with those products. We used these at first, simply because the sen2cor masks are easy to access and comprehensive.

The other two major ones are MAJA, used by the French Space Agency (CNES), and FMask, used by the US Geological Survey (USGS). While Baetens et al argue that these are superior to sen2cor, when we investigated the USGS offerings, we found them far less comprehensive than ESA’s, missing large time windows and geographical regions. In addition, GIS-Blog asserts that FMask is designed more for Landsat than for Sentinel. These weaknesses made the alternatives unsuitable for our purposes, but if they do not apply to your use case, they are perhaps worth investigating.

Finally, Google Earth Engine has recently released cloud masks derived from Sentinel Hub’s S2cloudless algorithm. We tested this ourselves, and found it to be generally superior to the sen2cor results. So if you want to use cloud masking, and are working with Google Earth Engine, we definitely recommend using the S2cloudless masks.
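As a reference point, here is a rough sketch of pairing Sentinel-2 images with their S2cloudless masks in Google Earth Engine’s Python API. The region, dates, and 50% threshold are illustrative placeholders, not values from our pipeline:

```python
import ee
ee.Initialize()

# Illustrative region and dates; substitute your own.
roi = ee.Geometry.Point(15.0, 0.5).buffer(10000)
start, end = '2020-01-01', '2020-04-01'

s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
      .filterBounds(roi).filterDate(start, end))
probs = (ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY')
         .filterBounds(roi).filterDate(start, end))

# Attach each image's cloud-probability counterpart by matching indices.
joined = ee.Join.saveFirst('cloud_prob').apply(
    primary=s2,
    secondary=probs,
    condition=ee.Filter.equals(leftField='system:index',
                               rightField='system:index'))

def mask_clouds(img):
    prob = ee.Image(ee.Image(img).get('cloud_prob')).select('probability')
    return ee.Image(img).updateMask(prob.lt(50))  # drop pixels >= 50% cloudy

masked = ee.ImageCollection(joined).map(mask_clouds)
composite = masked.mosaic()  # stack the masked images, as described above
```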

However, S2cloudless suffers from the same fundamental issues as all cloud masking approaches, which we will cover next.

Drawbacks

While cloud masking is a theoretically solid idea and works to an extent, we ran into several drawbacks when we actually implemented it. (The two images below both result from applying the sen2cor mask.)

First, and most importantly, we often ran into a “pixelation” effect. Because we were “filling in” the holes of one picture with another, the final image could lose resolution or contain randomly scrambled pixels wherever those pictures looked different (as a result of different timestamps or other factors).

“Pixelation” effect and partial cloud removal using the sen2cor cloud mask

Second, and more challenging to address, was a “stitching” effect that made the images look patched together. This, along with some other “artefacts” of the mask/mosaic process, such as the pixelation described above, meant that we often ended up with a clearly non-contiguous image.

“Stitching” visible as the top part seems to clearly come from a different image. “Pixelation” effect can be seen in the bottom right corner (using the sen2cor cloud mask)

Third, no cloud detection algorithm is perfect. Sometimes clouds the algorithm didn’t detect remain in the images, and sometimes the algorithm mistakes valuable parts of the image for clouds, causing us to lose them. This is unavoidable, and the algorithms are continually improving: the S2cloudless algorithm is superior to sen2cor in this respect, for example. But since cloud masking inherently involves removing data, you are always at risk of losing something valuable. This is particularly important for us because we’re interested in retaining tiny parts of the images, namely the logging roads themselves.

To sum up, then, here are the upsides of cloud masking:

  • The resulting images almost never have clouds in them
  • The cloud masks are easy to access and relatively straightforward to use

On the other hand, it also has these drawbacks:

  • Often results in “artefacts,” such as a “pixelation” or “stitching” effect, that make the resulting image look bad
  • The masking algorithm often makes errors, leaving in clouds or masking out areas that contain useful data

While these drawbacks were not significant enough to derail the project, they did mean we were on the lookout for a better way — and we found one in the article by Schmitt et al.

A Better Way: Cloudfree Merging

Schmitt et al claim that their method both solves the problems detailed above and provides better cloud-free images than some major competing approaches. We will summarize their approach here, but if you’re interested, please do read the article itself (available for free) or look at their source code (in JavaScript). They do not name their method, so in this article, we’ll call it “Cloudfree Merging” or “merging.” Cloudfree Merging involves the following steps:

Made by Project Canopy, based on the flowchart from Schmitt et al
  1. Querying — Obtain images of a specific area from a specified timeframe
  2. Quality Score — Compute a “quality score” for each pixel based on its likelihood of being a cloud or cloud shadow
  3. Merging — Use the quality scores to merge all the images into a single, cloud-free one

Querying

For an explanation of how to negotiate the apparent limitations on querying and downloading images in Google Earth Engine, see our previous post.

Quality Score

After getting the products, the next step is calculating a “quality score” for each pixel. Essentially, this quality score ranks the likelihood that the pixel contains useful information, as opposed to being either a cloud or the shadow of a cloud. In this regard, it’s a cloud mask like the ones we discussed earlier. But there are at least two key differences between this and the sen2cor cloud mask provided by the ESA.

First, the sen2cor and S2cloudless masks do not differentiate between clouds that humans can see through and clouds that they can’t. As Schmitt et al say, even images with extensive light cirrus clouds “are still largely visible with only mild haziness in some areas.” While this haze makes the image less visually appealing, it still preserves the important terrain information. Since we are interested in accurately capturing terrain features to feed into a machine learning model, even a completely hazy (yet still visible) image is preferable to a sharp image with some regions blotted by opaque clouds.

Second, the formula used by Schmitt et al recognizes that clouds are contiguous: a real cloud is almost never a tiny, isolated group of cloudy pixels surrounded by clear ones, nor the reverse. To remove this kind of noise from the mask, the method applies morphological opening and closing.
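In Earth Engine’s Python API, that cleanup might look roughly like the following. This is our own sketch rather than the paper’s exact code; the threshold and kernel radius are illustrative:

```python
# Assume `cloud_score` is an ee.Image band of per-pixel cloud likelihood.
cloudy = cloud_score.gt(0.3)  # hypothetical threshold: 1 = cloudy pixel

# Opening (erode, then dilate) deletes isolated cloudy specks:
opened = cloudy.focal_min(2).focal_max(2)
# Closing (dilate, then erode) fills small clear holes inside clouds:
cleaned = opened.focal_max(2).focal_min(2)
```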

After calculating this quality score, Schmitt et al use it to create a single “Quality Pixel Percentage” number for each image in the collection. They select a threshold for the quality score: below that threshold, a pixel counts as a “bad pixel” (i.e., a cloud or a cloud shadow); above it, a “good pixel.” The percentage of pixels that are “good” constitutes the entire image’s Quality Pixel Percentage. This single value is used in the merging process.
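A sketch of how that per-image statistic can be computed in Earth Engine, again under our own naming (the 'good' band and QUALITY_PIXEL_PERCENTAGE property are illustrative, not the paper’s identifiers):

```python
def add_quality_pct(img, good, roi):
    """Attach the image-level Quality Pixel Percentage as a property.
    `good` is a binary ee.Image (1 = good pixel); `roi` is the region."""
    fraction = good.rename('good').reduceRegion(
        reducer=ee.Reducer.mean(),  # mean of a 0/1 mask = fraction of good pixels
        geometry=roi,
        scale=60,                   # coarse scale keeps the computation cheap
        maxPixels=1e9,
    ).get('good')
    return img.set('QUALITY_PIXEL_PERCENTAGE', ee.Number(fraction).multiply(100))
```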

“The cloud-free image generation process. (a) The original image from the collection with the least cloud cover for Jacksonville, Florida in winter, (b) computed cloud score for the image, with a color scale from blue (low cloud probability) to red (high cloud probability), (c) cloud and shadow mask computed by thresholding the Quality Score with green representing the cloud contribution and blue the shadow, and (d) the final cloud-free image produced for the scene.” (image and caption from Schmitt et al)

Merging

At this point, we have an image collection, a Quality Pixel Percentage value for each image in that collection, and a Quality Score for each pixel in all the images. Both of these values are used to combine the entire collection into a single cloud-free image.

Made by Project Canopy, based on the flowchart from Schmitt et al

First, we select all the images with a Quality Pixel Percentage of 95% or more. Any such images are considered “good enough” to use independently, and it’s preferable to keep them whole rather than introduce noise and image artefacts by combining multiple images, even if doing so would achieve a marginal improvement of a few percentage points. This “best” collection is then sorted so that the least cloudy images are on top, and then mosaicked together so that any missing areas in the top images are filled in by the bottom ones. These missing areas might be due to “nodata” pixels in the original products (this happens sometimes, presumably because the satellite simply missed those areas). More commonly, it’s because the Region of Interest extends across multiple products, which must be combined to achieve a single image covering the entire ROI. The point is that no masking occurs. The pipeline is not removing any data — it is simply ensuring that any missing areas in the original images are filled in by data available in other images.
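In Earth Engine terms, this branch might look like the following sketch, reusing the hypothetical QUALITY_PIXEL_PERCENTAGE property from above:

```python
# Keep only images that are at least 95% "good" pixels.
best = (collection
        .filter(ee.Filter.gte('QUALITY_PIXEL_PERCENTAGE', 95))
        # mosaic() puts the *last* image in a collection on top, so sort
        # ascending: the highest-quality image ends up last, i.e. on top.
        .sort('QUALITY_PIXEL_PERCENTAGE'))
best_mosaic = best.mosaic()
```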

Next, all the images in the collection are combined using the qualityMosaic method. This works very differently from the mosaicking we have been discussing up until now. The “quality mosaic” operates on a pixel-by-pixel basis: for each pixel location, it goes through the entire collection and selects the pixel with the highest Quality Score at that location. In other words, rather than prioritizing the least cloudy whole image, it prioritizes the least cloudy pixel. The resulting image may contain two adjacent pixels that come from wildly different products, if those pixels had the highest Quality Scores. So instead of masking, which involves removing data, the “quality mosaic” scans the quality of all the pixels in all the images and selects the best pixels regardless of how cloudy each image as a whole might be. For instance, an image might be 90% cloudy, but the remaining 10% of its pixels may still be selected by the quality mosaic if they have high quality scores, allowing us to make use of even severely cloudy images.
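The corresponding Earth Engine call is a one-liner, assuming each image carries a per-pixel 'quality' band (the band name is our own choice):

```python
# At each pixel location, keep the value from whichever image in the
# collection has the highest 'quality' score there.
quality_mosaic = collection.qualityMosaic('quality')
```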

At this point, we have two “cloud-free” images: one obtained by mosaicking all of the “best” images, the other obtained through the quality mosaic. The last step is to combine these two via a regular mosaic, prioritizing the one obtained from the “best” images. Of course, sometimes there won’t be any “best” images; in those cases, the quality mosaic will be returned in its entirety. Either way, the end result is a single cloud-free image.
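As a rough sketch, reusing the best_mosaic and quality_mosaic variables from the snippets above:

```python
# A regular mosaic of the two intermediate results. The last image in
# the list lands on top wherever it has data, so the "best images"
# mosaic takes priority. (If no image passed the 95% bar, the pipeline
# would instead return quality_mosaic on its own.)
final = ee.ImageCollection.fromImages([quality_mosaic, best_mosaic]).mosaic()
```

Below are some examples — specifically of the same areas shown above in the Cloud Masking section.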

This Cloudfree Merging image performs visibly better than the Cloud Masking approach
While the “stitching” effect persists in this Cloudfree Merging image and the visible road is hazy, the overall image is contiguous

The overall advantages of Cloudfree Merging are best explained by reference to the other major ways to generate cloud-free images. We’ll discuss those next, and end with the pros and cons of Cloudfree Merging. We start with a simple but surprisingly effective method: merely taking the “least cloudy image” as-is.

Least Cloudy Image

In their article, Schmitt et al compare their method to three other common ones: (a) calculating the median value of each pixel; (b) selecting the greenest pixel (i.e., the one with the highest NDVI value); and (c) simply selecting the least cloudy image with no further processing. They conclude that their method is superior to all three. We did not independently test methods (a) and (b), since they did not seem particularly fruitful. Method (c), on the other hand, was more interesting, so we will discuss it here.

The “least cloudy image” method involves simply selecting whichever image has the lowest “Cloudy Pixel Percentage” value, which comes with every Sentinel-2 product. As Schmitt et al say, “often already the least cloudy image provides a clean and artefact-free solution” (150), and when we tried this for ourselves, we came to the same conclusion. Even in areas like the Congo Basin that are frequently cloudy, if your timeframe is long enough (in our case, two years), you will usually find at least one image that’s mostly free of clouds. Despite its simplicity, this method has a major benefit over all the more complicated ones: because you’re only using one image, you’ll never see any of the artefacts, “stitching,” or weirdness that occasionally results from combining multiple images.
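Because the Cloudy Pixel Percentage ships as product metadata, this method is a one-liner in Earth Engine. The collection, region, and date range below are illustrative placeholders:

```python
# Sort by the stock metadata property and keep the single least cloudy image.
least_cloudy = (ee.ImageCollection('COPERNICUS/S2_SR')
                .filterBounds(roi)  # `roi` assumed to be an ee.Geometry
                .filterDate('2019-01-01', '2021-01-01')  # long timeframe
                .sort('CLOUDY_PIXEL_PERCENTAGE')
                .first())
```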

However, there are also two major drawbacks to this method. First, it very much relies on querying for images over a long enough timeframe, particularly for regions that are usually cloudy. If it’s not important to your project just when an image was taken, this isn’t a problem. But if you specifically want image data from a relatively small timeframe (e.g., a few months), you may not be able to find a single image that’s mostly devoid of clouds.

Second, recall our earlier point that the custom “Quality Pixel Percentage” value Schmitt et al create differs from the scores produced by the sen2cor and S2cloudless algorithms. Specifically, those algorithms classify thin, hazy clouds that you can still see through as “clouds,” while the Schmitt et al algorithm does not. This means that selecting the image with the lowest “Cloudy Pixel Percentage” value will usually result in a relatively clear image that still has some opaque clouds, while the Cloudfree Merging method often gives you a hazy image where the terrain features are still visible. If the goal is to capture as much terrain information as possible, then Cloudfree Merging is superior. (Of course, you could just use their custom “Quality Pixel Percentage” value — but at that point, you might as well use the full merging procedure.)

To sum up, the least cloudy image method has some significant advantages:

  • There’s no need to combine multiple images, so you avoid any “artefacts” and other problems that are inherent to that
  • The result is usually a clear, high-quality, nice-looking image

But there are severe drawbacks as well:

  • It requires you to collect images over a broad timeframe, so it’s not useful if you’re looking for data from a specific date or range of dates
  • It relies on the same cloud mask algorithm for the metadata score as the conventional cloud masking method does, and so it overlooks hazy images with visible terrain since they’re classified as cloudy

For these reasons, we decided that while the Least Cloudy Image method is surprisingly good much of the time, in the end it was inferior to Cloudfree Merging.

Inferential Approaches

So far, we’ve only discussed mosaicking after masking out clouds — but it’s also possible to use machine learning to infer what data “should be” there instead of clouds. The obvious drawback to this approach is that you’re effectively creating completely new data and relying on the model’s accuracy, hoping its output lines up with what’s actually out there in the world. This would be a particular problem for our project, as logging roads are small and varied enough that it’s unlikely a model could infer them accurately, at least at present.

That said, there is a major advantage to inferential approaches: as long as you already have the model, you only need to apply it to a single image to get a full, cloud-free one — whereas our method requires aggregating multiple images over a span of time. So if your chief concern is getting an image at a specific date, there is essentially no other option than machine learning inference. But this isn’t the case for us — logging roads are not permanent, but they do last more than a few months — so we decided against this method.

In other words, inferential approaches have this benefit:

  • They can be used on a single image product to generate a full, cloud-free image

But to achieve this, it requires a huge sacrifice:

  • Much of the resulting image will be “fake,” generated by an algorithm instead of an accurate snapshot taken by a camera

S2cloudless Cloud Mask

Earlier, we discussed our first attempt to remove clouds with the sen2cor cloud mask. However, there are other cloud masks out there, and in our experience the best one is the S2cloudless algorithm. We don’t have space to explain the details here, so see this article for more information on the algorithm, and this page for a tutorial on how to use it in Google Earth Engine.

In our testing, we found S2cloudless to be very effective at producing good-quality cloud-free imagery, particularly when images were queried over a long timeframe; the quality was relatively good even over short timeframes. Compared with Cloudfree Merging, the S2cloudless mask tended to provide images that look better to the human eye, with much less haze, but in exchange it produced more artefacts that obscured important terrain features. See the images below for some examples, in False-Color rendering to emphasize the terrain features:

Comparison between (a) Sentinel Hub’s S2cloudless cloud masking method and (b) Cloudfree Merging for two separate ROIs during a 3 month period. False Color rendering is used to emphasize how each method captures terrain features (source: Project Canopy)

Overall, then, S2cloudless is perhaps the best method out there to

  • Provide cloud-free images that look good to the human eye

But it carries the downside of

  • Producing artefacts that sometimes obscure terrain features

Since we’re mostly concerned with detecting logging roads and other terrain features, we decided to go with Cloudfree Merging.

Other Methods?

There may very well be other methods we haven’t encountered yet in our research, and we’d be more than happy to hear suggestions or feedback about this. Leave us a comment!

Overall Picture

One way to compare all these methods is to think of them as involving tradeoffs between image quality and timeframe. The more you extend the timeframe of your image search, the more images you’ll collect, giving you more data and ultimately a higher-quality final cloud-free image. However, in doing so, you lose the ability to guarantee that your image will contain information from a specific date or a narrow date range. The higher you want your image quality to be, the more you’ll have to extend the date range.

All of the methods we’ve listed lie somewhere on this “low quality/short range → high quality/long range” continuum. The “least cloudy image” method, for example, can often leave you with an extremely high-quality image, but you’ll have to use a big timeframe to guarantee finding a single image that’s relatively cloud-free. At the other end are the inferential approaches, which can operate on a single image but have a high chance of resulting in inaccurate image data.

Each cloud removal approach involves a tradeoff between the minimum length of time you need to pull images from and the accuracy of the resulting cloudless image. Cloud Masking and Cloudfree Merging achieve a compromise, with other strengths and weaknesses

Within this framework, the Cloudfree Merging method attempts to achieve as small a timeframe as possible while still preserving a relatively high-quality image. In their article, Schmitt et al only pull images from a single season (3 months), and still often manage to end up with clear, quality images. While the cloudiness of the Congo Basin made such a small timeframe unfeasible for us (more on this in a future article), we were still often able to get good images in a shorter timeframe than other methods. Since we plan to do a time series analysis in the future, this is a major advantage for Cloudfree Merging and is one of the main reasons we ended up going with it, along with the prioritization of “hazy but visible” images that we discussed above.

All this being said, we don’t want to exaggerate the differences between approaches. Especially when there is a single (mostly) cloud-free image in your dataset, all these approaches will end up behaving very similarly. Additionally, the S2cloudless cloud mask may be more effective if your chief concern is making cloud-free images that look good to humans (who are likely to ignore small artefacts in the image). So while we ended up going with Cloudfree Merging, you should consider an alternate approach if your use case happens to differ from ours.

To summarize, Cloudfree Merging strikes a compromise between extending the timespan of images you collect and increasing the quality of the final cloud-free image. It lets you decrease the timeframe as much as possible while still ending up with a good, full, cloudless image.

There is, finally, one last major benefit to Cloudfree Merging: while it can be done with any number of GIS tools, the source code was optimized for Google Earth Engine, which carries its own set of significant advantages, including cost, ease of use, and a large community forum.

Conclusion and Summary

To summarize, then, here are the advantages of Cloudfree Merging:

  • Achieves a compromise between timeframe and image quality; creates a high-quality, largely cloudless image without necessarily needing a huge number of images to draw data from
  • Makes use of cloudy images (which other masking approaches would programmatically skip over) in its “quality mosaic”
  • Can be used with Google Earth Engine, gaining all the benefits of that platform

And here are its drawbacks:

  • Due to the way clouds are calculated, you often end up with a visible but “hazy” image, which can be ugly to look at
  • There are occasionally “stitching” artefacts of the merging process

Below, we summarize the benefits and drawbacks of each method:

  • Cloud Masking: reliably removes clouds, and the masks are easy to access and use; but it often produces “pixelation” and “stitching” artefacts, and masking errors can remove useful data
  • Least Cloudy Image: yields clear, artefact-free images; but it requires a long timeframe and overlooks hazy-but-usable images
  • Inferential approaches: work from a single image product; but much of the result is model-generated rather than observed
  • S2cloudless masking: produces the best-looking images to the human eye; but its artefacts can obscure terrain features
  • Cloudfree Merging: balances timeframe against image quality and exploits even severely cloudy images; but results can be hazy, with occasional “stitching”

In our next article, we will discuss how we obtained and applied the cloud merging code used by Schmitt et al, as well as some modifications we made to it.
