Crowd-sourcing the open dataset for water detection

Label annotation for satellite images made easy with ClassificationApp

Devis Peressutti
Jun 20 · 6 min read

If you have ever worked on real-life machine learning applications, you have surely experienced the frustration of realising that the training labels you are working with are too few or of poor quality. Things get even worse if your application is based on Earth Observation data, where the availability of open-source labelled data is still very limited, even though terabytes of fresh spatial data are acquired daily. Well, we know exactly how you feel!

Luckily, as data scientists and software developers, we have experience in channelling these frustrations through our fingertips (using black magic) and transforming them into lines of beautiful code. This time around, the result of such coding is a restyled and improved version of ClassificationApp, a web application that helps you annotate satellite images in an easy and intuitive way.

ClassificationApp improvements

The major improvement is the ability to create your own labelling campaigns and make them available to other users for crowd-sourcing. Campaigns can be public or private: public campaigns are visible to any user of the application, while private campaigns are only visible to their creators and to users who have a link to them. All you need is a free Geopedia account, which you can create here.
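The visibility rule can be summarised in a few lines of code. The sketch below is purely illustrative: the `Campaign` record, its fields and the idea of a secret `share_token` carried by the private link are our own assumptions, not ClassificationApp's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Campaign:
    """Hypothetical campaign record; field names are illustrative only."""
    name: str
    owner: str
    public: bool
    share_token: str = ""  # hypothetical secret embedded in the private link


def is_visible(campaign: Campaign, user: str, token: Optional[str] = None) -> bool:
    """Return True if `user` may see `campaign` under the public/private rule."""
    if campaign.public:
        return True  # public: visible to any user of the application
    if user == campaign.owner:
        return True  # private: always visible to its creator
    # private: otherwise only to users who arrived through the shared link
    return bool(campaign.share_token) and token == campaign.share_token
```

For example, a private campaign owned by `alice` is hidden from `bob` unless he opens it through the link carrying the matching token.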

By making campaigns public, we hope people will contribute to the tedious yet crucial exercise of creating accurately labelled data, to be used, for instance, to train supervised machine learning algorithms. Contributors will be able to use the resulting labelled data-sets for their own purposes, as well as experience the unmatched joys of contributing to open-source projects.

User Interface to create a new campaign. Campaigns can be customised by data-source, labelling classes and area of interest, as well as the size of the labelled image.

When creating a new campaign, users can specify the area of interest over which labels will be collected, the imaging data-source with its corresponding parameters (such as cloud coverage and time range), the number and names of the labelling classes, and the size of the labelled image. Campaign owners also write instructions for the labelling task, so that both experts and non-experts can contribute. Once a campaign is created, the fun begins.
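As a rough picture of what a campaign definition holds, here is a hypothetical Python dataclass covering the options listed above, with basic sanity checks. The field names, the bounding-box convention and the validation rules are assumptions for illustration; the app's real schema may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class CampaignConfig:
    """Hypothetical campaign definition mirroring the creation options."""
    name: str
    bbox: Tuple[float, float, float, float]  # area of interest (min_lon, min_lat, max_lon, max_lat), WGS84 assumed
    data_source: str                         # e.g. "Sentinel-2 L1C"
    max_cloud_coverage: float                # fraction in [0, 1]
    time_range: Tuple[date, date]            # (start, end) of acquisitions to consider
    classes: List[str]                       # labelling class names
    patch_size: Tuple[int, int]              # labelled image size in pixels (height, width)
    instructions: str = ""                   # free-text guidance for contributors

    def validate(self) -> None:
        """Raise ValueError if any option is inconsistent."""
        min_lon, min_lat, max_lon, max_lat = self.bbox
        if not (min_lon < max_lon and min_lat < max_lat):
            raise ValueError("bbox must have min < max on both axes")
        if not 0.0 <= self.max_cloud_coverage <= 1.0:
            raise ValueError("max_cloud_coverage must be in [0, 1]")
        if self.time_range[0] > self.time_range[1]:
            raise ValueError("time_range start must not be after its end")
        if not self.classes or len(set(self.classes)) != len(self.classes):
            raise ValueError("classes must be non-empty and distinct")
        if min(self.patch_size) <= 0:
            raise ValueError("patch_size must be positive")
```

Keeping the checks next to the definition means a malformed campaign is rejected before any imagery is requested.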

Example UI for a labelling campaign. The left pane helps to understand the spatial context and allows visualisation of supporting data-sources, like high-resolution imagery, maps, and geometries. On the right pane, the canvas allows you to classify each pixel according to the classes you specified when creating the campaign. Paint and delete tools are available to speed up the task.

At the moment, only pixel-wise classification is supported, but we plan to support image-wise and multi-label classification as well. Once inside the app, the left side of the screen shows the area surrounding the image patch to label (i.e. the blue box) to help you understand the spatial context. Different visualisation layers can be displayed to help discern land cover classes or clouds. The location and acquisition time of the image are displayed, and a map from OpenStreetMap is available as an additional visualisation source. Other visualisation sources can easily be added, such as high-resolution imagery, vector geometries or, if you are bold enough, radar imagery.

On the right pane, the canvas lets you classify each pixel into one of the classes you specified when creating the campaign. You can use brushes and erasers of different sizes to annotate or delete pixels, and a bucket tool to fill the entire patch with a single click. Once you are happy, you can save the annotated image and get another patch to annotate, or skip saving and move on to another task.
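The canvas tools boil down to simple operations on a per-pixel label array. The following NumPy sketch assumes a circular brush, uses a hypothetical `UNLABELLED` value for erased pixels, and treats erasing as painting with that value. It is an illustration of the idea, not the app's front-end code, which is written in React.

```python
import numpy as np

UNLABELLED = -1  # hypothetical value for pixels not yet assigned to any class


def paint(mask: np.ndarray, row: int, col: int, radius: int, value: int) -> None:
    """Set every pixel within `radius` of (row, col) to `value`, in place.

    Painting uses a class index; erasing is painting with UNLABELLED.
    """
    rows, cols = np.ogrid[:mask.shape[0], :mask.shape[1]]
    disk = (rows - row) ** 2 + (cols - col) ** 2 <= radius ** 2
    mask[disk] = value


def bucket(mask: np.ndarray, value: int) -> None:
    """Fill the whole patch with a single class, as the bucket tool does."""
    mask[...] = value
```

A brush of radius 1 centred on a pixel marks that pixel and its four direct neighbours; larger radii approximate a round brush tip.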

Results of the tasks are saved to Geopedia. From here they can be retrieved as EOPatches using eo-learn.

ClassificationApp uses Geopedia as a back-end database, where all information about campaigns is stored. Using Geopedia as a geospatial database has many advantages, including support for raster and vector formats, easy querying of tables and results through the web-based API, and the possibility to retrieve results using eo-learn. Another advantage is that an existing labelled data-set can be uploaded to Geopedia and used as a starting point for a review campaign, as is the case for the water body annotation campaign described below.

Crowd-sourcing water surface detection in Sentinel-2 images

Example of the water surface level trend for a dam in South Africa. You can explore water bodies from all over the world here.

The water segmentation algorithm that powers BlueDot uses a water index derived from Sentinel-2 images and applies dynamic thresholding to find the threshold value that best separates water from soil. Despite its simplicity, the algorithm produces satisfactory results in the majority of cases. However, artefacts introduced by the cloud masking algorithm or by cloud shadows degrade the segmentation result, as shown below.

Example of segmentation artefacts introduced, in this case, by cloud shadows.
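The post does not say which water index or thresholding method BlueDot uses. As a stand-in, the sketch below computes the McFeeters NDWI from Sentinel-2 bands B03 (green) and B08 (near infrared) and picks a per-image threshold with Otsu's method, which maximises the between-class variance of the index histogram and is one common form of dynamic thresholding. Treat both choices as assumptions, not as BlueDot's implementation.

```python
import numpy as np


def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """McFeeters NDWI = (green - nir) / (green + nir), e.g. Sentinel-2 B03 and B08."""
    green = np.asarray(green, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    denom = green + nir
    # no-data pixels (zero denominator) are set to 0 instead of dividing by zero
    return np.divide(green - nir, denom, out=np.zeros_like(denom), where=denom != 0)


def otsu_threshold(values: np.ndarray, bins: int = 256) -> float:
    """Threshold that maximises the between-class variance of a 1-D sample."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    weights = hist.astype(np.float64)
    w0 = np.cumsum(weights)                # pixel count in bins 0..k
    w1 = w0[-1] - w0                       # pixel count in bins k+1..end
    m0 = np.cumsum(weights * centers)      # cumulative first moment
    mu0 = np.divide(m0, w0, out=np.zeros_like(m0), where=w0 > 0)
    mu1 = np.divide(m0[-1] - m0, w1, out=np.zeros_like(m0), where=w1 > 0)
    between = w0 * w1 * (mu0 - mu1) ** 2   # proportional to between-class variance
    k = int(np.argmax(between))
    return float(edges[k + 1])             # values >= this edge fall in the upper class


def water_mask(green: np.ndarray, nir: np.ndarray):
    """Return a boolean water mask and the threshold chosen for this image."""
    index = ndwi(green, nir)
    threshold = otsu_threshold(index.ravel())
    return index >= threshold, threshold
```

Because the threshold is recomputed for every image, it adapts to scene-specific conditions; the flip side is that cloud shadows or masking artefacts shift the histogram and, with it, the threshold.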

One possible way to make the water segmentation algorithm more robust is to use a machine learning model trained on labelled data of water bodies with global coverage under different artefact conditions. To collect labelled data suitable for training such a model, we have created a dedicated public campaign, Water-Body Segmentation Correction, in ClassificationApp. Once the segmented water bodies are uploaded to Geopedia, ClassificationApp randomly selects patches from a randomly chosen water body and lets the user correct any errors in the current segmentation. Because it relies on Geopedia, ClassificationApp can host campaigns that review and correct pre-existing classifications. The ability to review and improve existing classification maps is very valuable when training machine learning models iteratively.
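The patch-selection step can be sketched as a two-stage random draw: first a water body, then a patch inside it. In this sketch each water body is reduced to a hypothetical bounding box in a projected CRS (metres), and the function and parameter names are illustrative; the real app draws from the segmented water bodies uploaded to Geopedia.

```python
import random
from typing import Dict, Tuple

# Hypothetical representation: each water body reduced to its bounding box
# (min_x, min_y, max_x, max_y) in a projected CRS, with units in metres.
BBox = Tuple[float, float, float, float]


def _patch_origin(lo: float, hi: float, size: float, rng: random.Random) -> float:
    """Random lower coordinate for a patch of `size` that fits within [lo, hi]."""
    free = (hi - lo) - size
    if free <= 0:
        # body narrower than a patch: centre the patch on the body instead
        return (lo + hi - size) / 2
    return lo + rng.uniform(0.0, free)


def sample_task(water_bodies: Dict[str, BBox], patch_size: float,
                rng: random.Random) -> Tuple[str, BBox]:
    """Choose a random water body, then a random square patch inside its extent."""
    body_id = rng.choice(sorted(water_bodies))  # sorted so a seeded rng is reproducible
    min_x, min_y, max_x, max_y = water_bodies[body_id]
    x0 = _patch_origin(min_x, max_x, patch_size, rng)
    y0 = _patch_origin(min_y, max_y, patch_size, rng)
    return body_id, (x0, y0, x0 + patch_size, y0 + patch_size)
```

Drawing the water body first gives small lakes the same chance of being reviewed as large reservoirs, which helps the resulting data-set cover a wide variety of sites.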

Example of task suggested by the water body segmentation campaign.

After logging in and selecting the water body campaign on the campaign selection page, the user is presented with a task. On the left side, the Sentinel-2 image is overlaid with the contour of the water body generated by the current algorithm. Different band indices can be displayed to help discern the actual water surface level. On the right side, the existing water mask is overlaid onto the Sentinel-2 image, allowing the user to modify it and correct possible mistakes. After editing, the resulting labelled image is saved to Geopedia. The resulting data-set can be downloaded as EOPatches, which can be fed directly to a machine learning algorithm using eo-learn. The labelled data-set generated by this public campaign will be available to all contributors of the campaign.
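Once the corrected masks have been downloaded, two small steps are typical: measuring how much each contributor changed the original segmentation, and flattening images and masks into per-pixel samples for a classifier. The sketch below works on plain NumPy arrays and assumes they have already been read out of the EOPatches; it deliberately avoids guessing eo-learn feature names, and the function names are hypothetical.

```python
import numpy as np


def correction_stats(original: np.ndarray, corrected: np.ndarray) -> dict:
    """Summarise how a contributor changed an existing water mask."""
    original = np.asarray(original, dtype=bool)
    corrected = np.asarray(corrected, dtype=bool)
    added = corrected & ~original     # pixels newly marked as water
    removed = original & ~corrected   # pixels no longer marked as water
    return {
        "added": int(added.sum()),
        "removed": int(removed.sum()),
        "changed_fraction": float((added | removed).mean()),
    }


def to_training_pairs(bands: np.ndarray, mask: np.ndarray):
    """Flatten an (H, W, B) image and an (H, W) mask into per-pixel samples.

    Returns X with shape (H*W, B) and y with shape (H*W,), the layout that
    pixel-wise classifiers such as scikit-learn estimators expect.
    """
    h, w, b = bands.shape
    return bands.reshape(h * w, b), np.asarray(mask).reshape(h * w).astype(np.uint8)
```

The `changed_fraction` value is a cheap way to spot patches where the original segmentation was badly off, which are often the most informative examples to train on.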

This is, therefore, an official call to labelling arms: we invite anyone to contribute to the water body segmentation correction campaign. Every single click is important. The resulting open-source data-set could be used to build a global water detection algorithm using machine learning. Such a model could serve many applications, ranging from water level management and prediction to flood response all over the world.


Since we are a very generous and open-minded bunch, we have open-sourced the Python code that builds the back-end, as well as the JavaScript React code that builds the front-end. With little effort, you should be able to create your own web-based labelling application. Geopedia tables can be replaced with an SQL or equivalent database using the same data schema.
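For anyone taking the SQL route, a minimal sketch using Python's built-in `sqlite3` module is shown below. The post does not document the Geopedia data schema, so the tables, columns and helper functions here are hypothetical; they only illustrate the kind of information (campaign settings and per-task results) such a store has to hold.

```python
import sqlite3

# Hypothetical minimal schema: the actual Geopedia table layout is not
# described in this post, so table and column names are illustrative only.
SCHEMA = """
CREATE TABLE campaign (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    owner       TEXT NOT NULL,
    is_public   INTEGER NOT NULL DEFAULT 0,
    data_source TEXT NOT NULL,
    classes     TEXT NOT NULL          -- JSON-encoded list of class names
);
CREATE TABLE task_result (
    id          INTEGER PRIMARY KEY,
    campaign_id INTEGER NOT NULL REFERENCES campaign(id),
    contributor TEXT NOT NULL,
    bbox_wkt    TEXT NOT NULL,         -- patch footprint as WKT
    acquired    TEXT NOT NULL,         -- ISO-8601 acquisition date
    mask        BLOB NOT NULL          -- encoded label raster
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def count_results(conn: sqlite3.Connection, campaign_id: int) -> int:
    """Number of saved tasks for one campaign."""
    row = conn.execute(
        "SELECT COUNT(*) FROM task_result WHERE campaign_id = ?", (campaign_id,)
    ).fetchone()
    return row[0]
```

SQLite keeps the example dependency-free; a production deployment would more likely use a server database with spatial support for the footprints.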


If you have any questions about the water body campaign or ClassificationApp, please get in touch with us at eoresearch@sinergise.com. We will continue to improve the ClassificationApp front-end and back-end and add new functionalities. Please also contact us if you need assistance setting up a new campaign or need functionality that is not yet supported; we will be happy to oblige!

And if you would like to help us develop ClassificationApp and its functionalities even further, contact us at work@sinergise.com. We are hiring!

Sentinel Hub Blog

Stories from the next generation satellite imagery platform

Thanks to Matic Lubej and Grega Milcinski

Written by Devis Peressutti, data scientist passionate about earth and medical images.
