Land Cover Classification with eo-learn: Part 3

Pushing Beyond the Point of “Good Enough”

Matic Lubej
Sentinel Hub Blog
Feb 14, 2019


Transition of an area from the winter to the summer season, composed with Sentinel-2 images. Hints of the discriminative power of the snow cover can be noticed, as confirmed in the previous blog post.


These past few weeks must have been quite hard on you. We published the first and second parts of our series on country-scale land cover classification using eo-learn, the open-source package for bridging the gap between Earth Observation (EO) and Machine Learning (ML). However, in the accompanying Jupyter notebook we only provided example data and showed results for a small part of the whole area of interest (AOI). Big whoop… no big deal, right? I know that seems mediocre at best and, above all, quite rude on our part. And all this time you have been struggling to get a good night's sleep, wondering how to take all this knowledge to the next level.

Don’t worry… The third part of this blog series will provide you with the means to do just that! So go grab a cup of coffee, sit down, and get ready…

All our Data are Belong to You!

Are you sitting down yet? Maybe leave the hot coffee on your desk for just a bit longer and listen to the best news that you will hear all day…

Here at Sinergise, we have decided to share the dataset for the whole region of Slovenia for the year 2017. With all of you. For free. You can now get your hands on 200 GB of data in the form of about 300 EOPatches, each roughly the size of 1000 x 1000 pixels at 10 m resolution! You can read more about our EOPatch data format in one of our previous blog posts about eo-learn, but essentially it’s a data container for spatio-temporal EO and non-EO data and their derivatives.

We haven’t been cheap with our data, either. Each EOPatch contains Sentinel-2 L1C images, the corresponding s2cloudless cloud masks, and the official land use data in the form of a raster map!

The data is stored on the AWS S3 Cloud Object Storage and can be downloaded via this link:

Link to AWS S3 Bucket:

Each EOPatch is a container of EO and non-EO data. You can load an EOPatch in eo-learn with the following command:

You will obtain the EOPatch of the following structure:

It is possible then to access various EOPatch content via calls like:
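The command and the structure listing mentioned above are not reproduced in this copy of the post. As a rough illustration of what such a container holds, the toy class below mimics the layout of an EOPatch using plain NumPy arrays. Time-dependent features have shape (time, height, width, channels), and timeless features drop the time axis. It is a stand-in, not the eo-learn class. In eo-learn itself, a saved patch is loaded with `EOPatch.load(path)`. The feature names `BANDS`, `CLM` (cloud mask) and `LULC` (land use reference), as well as all array sizes, are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

import numpy as np


@dataclass
class MiniPatch:
    """Toy stand-in for an EOPatch-like container (NOT the eo-learn class)."""
    data: dict = field(default_factory=dict)           # time-dependent values, (T, H, W, C)
    mask: dict = field(default_factory=dict)           # time-dependent masks, (T, H, W, 1)
    mask_timeless: dict = field(default_factory=dict)  # timeless masks, (H, W, 1)
    timestamp: list = field(default_factory=list)      # one datetime per time frame


def make_toy_patch(n_times=5, height=4, width=4, n_bands=13, seed=0):
    """Fill a MiniPatch with random data; feature names are hypothetical."""
    rng = np.random.default_rng(seed)
    patch = MiniPatch()
    # Reflectance-like values in [0, 1) for every band at every time frame
    patch.data["BANDS"] = rng.random((n_times, height, width, n_bands), dtype=np.float32)
    # Boolean cloud mask: True where a pixel is flagged as cloudy
    patch.mask["CLM"] = rng.random((n_times, height, width, 1)) > 0.8
    # Integer land cover reference label per pixel (no time axis)
    patch.mask_timeless["LULC"] = rng.integers(0, 10, size=(height, width, 1), dtype=np.uint8)
    start = datetime(2017, 1, 1)
    patch.timestamp = [start + timedelta(days=5 * i) for i in range(n_times)]
    return patch


patch = make_toy_patch()
print(patch.data["BANDS"].shape)          # (5, 4, 4, 13)
print(patch.mask_timeless["LULC"].shape)  # (4, 4, 1)
print(patch.timestamp[0].date())          # 2017-01-01
```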

EOExecute Order 66

Great, the data is being downloaded. While we wait for it, let's take a look at a nifty feature of eo-learn that hasn't been showcased yet: the EOExecutor class. This module handles the execution and monitoring of a workflow and makes using multiprocessing intuitive and carefree. No more searching Stack Overflow for how to parallelise your workflow properly or how to make a progress bar work with multiprocessing; EOExecutor takes care of both!

Additionally, it handles any occurring errors and it can generate a summary of the execution process. The latter is crucial for making sure that your results are reproducible in the future, so you don’t lose precious company time tracing back your steps in order to find out which parameters you used to produce the results last Thursday at 9:42 AM after a whole night of drinking with friends (don’t drink and code!). It even produces a cool looking dependency graph of the workflow, which you can show to your boss!

Dependency graph of the tasks in the workflow, provided by eo-learn.
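EOExecutor's own interface is not shown in this post, so the following is not its code. It is a standard-library sketch of the same idea. The workflow runs once per set of arguments, typically once per EOPatch, and the runs execute in parallel. Errors are captured per run instead of crashing the whole batch, progress is reported, and a short summary is produced at the end. The names `run_all`, `summarize` and `toy_workflow` are hypothetical.

```python
import time
import traceback
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(workflow, execution_args, max_workers=4, on_progress=None):
    """Run workflow(**kwargs) once per dict in execution_args, in parallel.

    Each run is recorded separately (arguments, success flag, result or
    traceback, duration), so one failing run does not abort the others.
    """
    def run_one(run_id, kwargs):
        record = {"id": run_id, "args": kwargs, "ok": True, "result": None, "error": None}
        start = time.perf_counter()
        try:
            record["result"] = workflow(**kwargs)
        except Exception:
            record["ok"] = False
            record["error"] = traceback.format_exc()
        record["seconds"] = time.perf_counter() - start
        return record

    records = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one, i, kw) for i, kw in enumerate(execution_args)]
        for future in as_completed(futures):
            records.append(future.result())
            if on_progress is not None:
                on_progress(len(records), len(futures))
    # Restore submission order so the summary is reproducible
    return sorted(records, key=lambda r: r["id"])


def summarize(records):
    """One-line summary; a real report would also store parameters and logs."""
    n_ok = sum(r["ok"] for r in records)
    return f"{n_ok}/{len(records)} runs succeeded"


# Usage with a hypothetical per-patch workflow that fails on one patch
def toy_workflow(patch_id):
    if patch_id == 3:
        raise ValueError("corrupted patch")
    return patch_id * 2


records = run_all(toy_workflow, [{"patch_id": i} for i in range(5)],
                  on_progress=lambda done, total: print(f"{done}/{total} done"))
print(summarize(records))  # 4/5 runs succeeded
```

Threads keep the sketch portable. For CPU-heavy EO processing, `ProcessPoolExecutor` is the more usual choice. It requires the workflow function to be picklable, for example defined at module level.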

Experimenting with the ML Pipeline

As promised, this blog post is meant to show you how to start exploring different pipelines with eo-learn using the data we provided. Below, we present two experiments in which we study how clouds, and different choices of resampling after the temporal interpolation, affect the final result. Lastly, we have also started working with Convolutional Neural Networks (CNNs) and wanted to compare two different approaches to land cover classification: pixel-based decision trees and convolutional deep learning algorithms.

Unfortunately, there is no simple “yes” or “no” answer that would generalise well for all cases when deciding on which experiments to perform. You can study the problem and make some assumptions in order to decide if the effort is worth it, but in the end, improving the pipeline always comes down to the most fundamental method of problem-solving. Trial and error.

Playing with Clouds

Clouds are a nuisance in the world of EO, especially when working with machine learning algorithms, where you want to detect the clouds and remove them from your dataset in order to perform a temporal interpolation over the missing data. But how big of an improvement does this actually bring? Is the procedure really worth it? In their paper Multi-Temporal Land Cover Classification with Sequential Recurrent Encoders, Rußwurm and Körner even show that for deep learning the tedious procedure of cloud filtering might be completely unnecessary, since the classifier learns to recognise and filter out clouds by itself.

Activation of the input (top) and the modulation (bottom) gate over the sequence of observations for a particular cell in the neural network. This cell has learned cloud masking and filtering, as input and modulation gates clearly show different activations on cloudy and non-cloudy observations. (Page 9 in

As a reminder of this part of the data preparation (explained in detail in the previous blog post), let's revisit the cloud filtering procedure. After obtaining the Sentinel-2 image data, we first perform cloudy scene filtering: only the time frames in which the ratio of non-cloudy pixels is larger than 80% are kept (the threshold might vary for different areas of interest). Secondly, temporal interpolation is performed to evaluate pixel values at arbitrary dates in the given interval. In this step, the cloud masks are taken into account, so that the values of cloudy pixels do not affect the interpolation.
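The two steps above can be sketched with NumPy as follows. This is a minimal illustration, not eo-learn's implementation. It assumes a boolean cloud mask (True = cloudy), day-of-year timestamps, and plain linear interpolation per pixel and band. The 0.8 default mirrors the 80% threshold mentioned above.

```python
import numpy as np


def filter_cloudy_scenes(bands, cloud_mask, days, threshold=0.8):
    """Keep only time frames whose fraction of non-cloudy pixels exceeds `threshold`.

    bands: (T, H, W, C) values; cloud_mask: (T, H, W) or (T, H, W, 1) booleans,
    True where a pixel is cloudy; days: T acquisition days (e.g. day of year).
    """
    clear_fraction = 1.0 - cloud_mask.reshape(len(cloud_mask), -1).mean(axis=1)
    keep = clear_fraction > threshold
    return bands[keep], cloud_mask[keep], np.asarray(days)[keep]


def interpolate_masked(bands, cloud_mask, days, new_days):
    """Linearly interpolate every pixel and band to `new_days`, skipping cloudy values.

    Outside the range of clear observations np.interp repeats the nearest clear
    value; pixels with no clear observation at all are left as NaN.
    """
    t, h, w, c = bands.shape
    series = bands.reshape(t, -1)                             # (T, H*W*C)
    clear = ~np.repeat(cloud_mask.reshape(t, -1), c, axis=1)  # same layout as series
    days = np.asarray(days, dtype=float)
    out = np.full((len(new_days), series.shape[1]), np.nan)
    for j in range(series.shape[1]):
        ok = clear[:, j]
        if ok.any():
            out[:, j] = np.interp(new_days, days[ok], series[ok, j])
    return out.reshape(len(new_days), h, w, c)


# Toy example: four dates, a 2 x 2 patch, one band; the frame on day 10 is fully clouded
days = np.array([0, 10, 20, 30])
values = np.array([0.0, 100.0, 2.0, 3.0])  # 100.0 stands for a cloud-contaminated value
bands = np.broadcast_to(values[:, None, None, None], (4, 2, 2, 1)).copy()
clouds = np.zeros((4, 2, 2, 1), dtype=bool)
clouds[1] = True

kept_bands, kept_clouds, kept_days = filter_cloudy_scenes(bands, clouds, days)
resampled = interpolate_masked(kept_bands, kept_clouds, kept_days, np.array([10, 25]))
print(kept_days)              # days 0, 20 and 30 remain
print(resampled[:, 0, 0, 0])  # 1.0 and 2.5 (the same for every pixel)
```

In this toy case, skipping the scene filter and relying on the cloud mask alone gives the same interpolated values. The two steps overlap, which is what the pipeline variations below probe.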

Four possible variations of the pipeline present themselves:

  • A1) with scene filtering, cloud mask taken into account,
  • A2) without scene filtering, cloud mask taken into account,
  • A3) with scene filtering, cloud mask not taken into account,
  • A4) without scene filtering, cloud mask not taken into account.
A visual representation of a temporal stack of Sentinel-2 images over a randomly selected area. The transparent pixels on the left imply missing data due to cloud coverage. The stack in the centre represents the pixel values after cloudy scene filtering and temporal interpolation with cloud masking (case A1), while the stack on the right shows the case without cloudy scene filtering and without cloud masking during interpolation (case A4).

We have already executed pipeline variation A1 and shown its results, so it serves as the baseline for the other variations. Preparing the different pipelines and training the models is fairly straightforward at this point. You only need to make sure that you're not comparing apples to oranges: in all four variations of the pipeline, you have to train and validate on the same collection of pixels from the same collection of EOPatches, using the same train/test splits!
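Here is a minimal sketch of that bookkeeping. The split is computed once with a fixed seed, and every variant receives exactly the same training and test EOPatches. Several details are assumptions for illustration: splitting at the EOPatch level, the 20% test fraction, the seed, and the `fake_variant` placeholder, which stands in for feature preparation, training and validation.

```python
import random

# The four cloud-handling variants described above
VARIANTS = {
    "A1": {"scene_filtering": True, "cloud_masking": True},
    "A2": {"scene_filtering": False, "cloud_masking": True},
    "A3": {"scene_filtering": True, "cloud_masking": False},
    "A4": {"scene_filtering": False, "cloud_masking": False},
}


def shared_split(patch_ids, test_fraction=0.2, seed=42):
    """Deterministic train/test split of patch IDs (same input -> same split)."""
    ids = sorted(patch_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


def run_experiment(patch_ids, run_variant, seed=42):
    # The split is computed once and reused, so only the pipeline flags differ
    train_ids, test_ids = shared_split(patch_ids, seed=seed)
    return {name: run_variant(train_ids, test_ids, **flags) for name, flags in VARIANTS.items()}


def fake_variant(train_ids, test_ids, scene_filtering, cloud_masking):
    # Hypothetical placeholder for "prepare features, train, validate"
    return tuple(train_ids), tuple(test_ids)


results = run_experiment(range(10), fake_variant)
```

The same reasoning applies to the pixels sampled from each EOPatch. If they are drawn at random, use a fixed seed so that every variant sees the same pixels.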

The results are shown in the table below. You can see that in this particular application the clouds do not seem to affect the overall performance much! This might be because the reference map is of very high quality and the model is able to determine the correct land cover label based on just a few observations. However, this might only hold for this particular AOI, and the results probably do not generalise to all cases, so don't drop the cloud detection step from your workflow based on these results alone!

Results of overall accuracy and weighted F1 score for the different workflows with regard to cloud effects.
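The table itself is not reproduced in this copy. For reference, both reported metrics can be computed from per-pixel reference and predicted labels, as in the plain-Python sketch below. Weighting each class's F1 score by its number of reference samples is the usual definition of the weighted F1 score, and the same averaging scikit-learn applies with `average="weighted"`. The toy labels are made up.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 scores averaged with weights proportional to class support."""
    support = Counter(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        n_pred = sum(p == cls for p in y_pred)
        if tp == 0:
            continue  # precision and recall are both 0, so F1 = 0 for this class
        precision, recall = tp / n_pred, tp / n_true
        f1 = 2 * precision * recall / (precision + recall)
        score += f1 * n_true / len(y_true)
    return score


y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]
print(overall_accuracy(y_true, y_pred))       # 0.75
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7536
```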

Effects of Different Choices of Temporal Resampling

The choice of temporal resampling after the interpolation is not obvious. On the one hand, we want a relatively fine grid of sampled dates in order not to lose valuable data, but at some point all available information is already taken into account, so adding more sampling dates does not improve the result further. On the other hand, we are constrained by computing resources. Halving the interval step roughly doubles the number of time frames after the interpolation, and therefore the number of features used to train the classifier. Is the improvement of the result in this case large enough to justify the increased use of computing resources? Check the results below!
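A quick back-of-the-envelope check of that trade-off follows. The number of channels per time frame and the float32 storage are hypothetical. The uniform grid is assumed to start on the first day of the year.

```python
def n_frames(step_days, n_days=365):
    """Number of uniformly spaced dates 0, step, 2*step, ... within one year."""
    return len(range(0, n_days, step_days))


N_CHANNELS = 13        # hypothetical number of bands/indices per time frame
PIXELS = 1000 * 1000   # roughly one EOPatch at 10 m resolution
BYTES_PER_VALUE = 4    # assuming float32 features

for step in (16, 8):
    frames = n_frames(step)
    features = frames * N_CHANNELS
    gigabytes = PIXELS * features * BYTES_PER_VALUE / 1e9
    print(f"{step:>2}-day step: {frames} frames, {features} features per pixel, "
          f"~{gigabytes:.1f} GB if every pixel of one patch were kept")
```

The 16-day and 8-day steps give 23 and 46 frames respectively, so the 8-day step has exactly twice as many features per pixel. Training time and memory grow accordingly.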

For this experiment, we always use the pipeline variation A1 as the starting point. After the interpolation, we resample with the following variations:

  • B1) uniform resampling with a 16-day interval step,
  • B2) uniform resampling with an 8-day interval step,
  • B3) optimal “cherry-picked” dates, same number of dates as in B2,

where the selection in B3 is based on the most common dates for all EOPatches in the selected area of interest.

This plot shows the number of EOPatches that contain image data for each day of the year 2017 (blue). The overlaid lines (red) represent the optimal dates for the resampling choice, which were based on the Sentinel-2 acquisitions for the given AOI in 2017.
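The post does not spell out exactly how the B3 dates were chosen beyond "the most common dates for all EOPatches". One straightforward reading is sketched below. For each date, count how many EOPatches have an acquisition on it, then keep the N most common dates (in B3, N equals the number of dates in B2). The tie-breaking rule and the toy dates are assumptions.

```python
from collections import Counter
from datetime import date


def cherry_pick_dates(acquisitions_per_patch, n_dates):
    """Return the `n_dates` dates on which the most patches have an acquisition.

    acquisitions_per_patch: iterable of per-patch collections of `date` objects.
    Ties are broken in favour of the earlier date; the result is chronological.
    """
    # set() so a patch counts at most once per date
    counts = Counter(d for dates in acquisitions_per_patch for d in set(dates))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return sorted(d for d, _ in ranked[:n_dates])


patches = [
    [date(2017, 1, 3), date(2017, 1, 13), date(2017, 1, 23)],
    [date(2017, 1, 3), date(2017, 1, 6), date(2017, 1, 13)],
    [date(2017, 1, 6), date(2017, 1, 13), date(2017, 1, 26)],
]
print(cherry_pick_dates(patches, 2))  # 3 January and 13 January 2017
```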

Looking at the table below, one can see that the results are as anticlimactic as in the cloud effects experiment. In both the B2 and B3 cases, the amount of required computing resources approximately doubles due to the increased number of training features, while overall accuracy (OA) and the weighted F1 score improve by less than one per cent. Such improvements are too small to make a noticeable difference in a practical application, so resampling with a 16-day step seems to be a good fit for the given scenario.

Results of overall accuracy and weighted F1 score for the different workflows with regard to different resampling choices.

Deep Learning: Using a Convolutional Neural Network (CNN)

Deep learning methods have become the state of the art in many tasks in fields such as computer vision, language, and signal processing, thanks to their ability to extract patterns from complex, high-dimensional input data. Classical ML methods (such as decision trees) have been used in many EO applications to analyse temporal series of satellite images. CNNs, on the other hand, have been employed to analyse the spatial correlations between neighbouring observations, but mainly in single-date applications. We wanted to investigate a deep learning architecture capable of analysing both the spatial and the temporal aspects of satellite imagery simultaneously.

To do this, we used a Temporal Fully-Convolutional Network (TFCN), a.k.a. a temporal extension of a U-Net, implemented in TensorFlow. In more detail, the architecture exploits spatio-temporal correlations to maximise the classification score. It also represents spatial relationships at different scales, thanks to the encoder-decoder structure of the U-Net. As with the classical ML models, the network outputs a 2D label map, which is compared to the ground-truth labels.

Architecture of the TFCN deep learning model.
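To make the "different scales" point concrete: in a standard U-Net, each encoder level halves the spatial resolution. A single feature-map cell deep in the network therefore summarises a much larger ground area than one 10 m pixel. The helper below traces this for a hypothetical configuration: 128 × 128 pixel input tiles, four 2 × 2 pooling steps, and pooling applied only in space. The actual TFCN depth and tile size are not given in this post.

```python
def encoder_scales(height, width, depth=4, pool=2, pixel_size_m=10):
    """Feature-map size and ground footprint of one cell at each encoder level.

    Hypothetical configuration: `pool` x `pool` spatial pooling per level and
    no pooling along the time axis. The decoder mirrors these levels back up,
    so the output label map has the input's height and width again.
    """
    factor = pool ** depth
    # Otherwise upsampling in the decoder cannot restore the input size exactly
    if height % factor or width % factor:
        raise ValueError(f"height and width must be divisible by {factor}")
    return [
        {"level": k, "shape": (height // pool**k, width // pool**k),
         "cell_size_m": pixel_size_m * pool**k}
        for k in range(depth + 1)
    ]


for level in encoder_scales(128, 128):
    print(level)
```

With these assumptions, the deepest level works on an 8 × 8 grid in which each cell covers 160 m × 160 m. The skip connections carry the finer 10 m to 80 m levels over to the decoder.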

The trained model was used to predict the labels on the test sample, and the predictions were then validated against the ground truth. An overall accuracy of 84.4% and a weighted F1 score of 85.4% were achieved.

Comparison of different predictions of land cover classification. True colour image (top left), ground-truth land cover reference map (top right), prediction with the LightGBM model (bottom left), and prediction with the U-Net model (bottom right).

These results represent preliminary work on a prototype architecture, which was not optimised for the task at hand. Despite this, the results are in line with some of the work reported in the field. Optimisation of the architecture (e.g. number of features, depth of the network, number of convolutions) and of the hyper-parameters (e.g. learning rate, number of epochs, class weighting) is required to fully assess the potential of TFCNs. We are looking forward to doing some more deep exploring (pun intended), and we even plan to share our code once it's in a presentable format.
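Class weighting, one of the hyper-parameters listed above, is often initialised with inverse class frequencies. The sketch below uses the formula scikit-learn applies for `class_weight="balanced"`, which is n_samples / (n_classes × class count). The post does not say which weighting scheme, if any, was used for the TFCN, and the labels are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * count_of_class).

    Every class ends up with the same total weight (weight * count), so rare
    classes are not drowned out by common ones in a weighted loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


labels = [0] * 6 + [1] * 3 + [2] * 1  # toy, heavily imbalanced label set
weights = balanced_class_weights(labels)
print({cls: round(w, 3) for cls, w in weights.items()})  # {0: 0.556, 1: 1.111, 2: 3.333}
```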

Other Experiments

There are many more experiments that could still be done, but we can neither think of all of them nor perform them all ourselves. That's where you come in! Show us what you can do with this dataset and help us improve the results!

For example, one of our outside colleagues is starting an Earth on AWS internship with us, where they will work on a project with land cover classification based on a temporal stack of single image predictions with CNNs. The idea is that, for certain land cover classes, such as artificial surface, water, or certain types of forest, the spatial context might be sufficient to identify them without needing to take into account the temporal information. We are excited to see where this idea takes us and a dedicated blog post is also planned!
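The post does not say how the per-image predictions in such a temporal stack would be combined. Per-pixel majority voting is one simple option, shown here purely as an assumption. Each date contributes one predicted label map, and the most frequent label per pixel wins.

```python
import numpy as np


def majority_vote(predictions, n_classes):
    """Combine per-date label maps of shape (T, H, W) into one (H, W) map.

    Ties go to the lowest class index, because argmax returns the first maximum.
    """
    counts = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)


# Three hypothetical single-date predictions for a 2 x 2 area
preds = np.array([
    [[1, 0], [2, 2]],
    [[1, 1], [2, 0]],
    [[0, 1], [2, 1]],
])
print(majority_vote(preds, n_classes=3))
```

A weighted vote, for example one that ignores dates on which a pixel is cloudy, would be a natural refinement.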

You are also very welcome to join our team and help us think of new ideas and bring them to life. So do not hesitate to contact us at, we are hiring!

The End!

Hopefully, you have enjoyed reading and learning about land cover classification with eo-learn in this blog post trilogy. We feel that we have paved the way well enough for you to start exploring big data in EO on your own, and we can't wait to see what comes out of it.

We really believe in the open-source community and feel that it’s crucial for pushing the boundaries of the knowledge frontier. Thanks so much for participating and contributing!

Link to Part 1:

Link to Part 2:

eo-learn is a by-product of the Perceptive Sentinel European project. The project has received funding from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement 776115.




Data Scientist from Slovenia with a Background in Particle Physics.