Crowd-sourcing the open dataset for water detection
Label annotation for satellite images made easy with ClassificationApp
If you have ever worked on real-life machine learning applications, you have surely experienced the frustration of realising that the training labels you are working with are too few or of poor quality. Things get even worse if your application is based on Earth Observation data, where the availability of open-source labelled data is still very limited, while terabytes of fresh spatial data are acquired daily. Well, we know exactly how you feel!
Luckily, as data scientists and software developers, we have experience in channelling these frustrations through our fingertips (using black magic) and transforming them into lines of beautiful code. This time around, the result of such coding is a restyled and improved version of ClassificationApp, a web-application tool to help you annotate satellite images in an easy and intuitive way.
The major feature improvement is the ability to create your own labelling campaigns and make them available to other users for crowd-sourcing. You can create public and private campaigns: public campaigns are visible to any user of the application, while private campaigns are visible only to their creators and to users who have a link to them. All you need is a free Geopedia account, which you can create here.
By making campaigns public, we hope people will contribute to the tedious yet crucial exercise of creating accurately labelled data to be used, for instance, to train supervised machine learning algorithms. Contributors will be able to use the resulting labelled data-sets for their own purposes, as well as experience the unmatched joys of contributing to open-source projects.
When creating new campaigns, users can specify the area of interest over which labels will be collected, the imaging data-source with corresponding parameters (such as cloud coverage and time range), the number and name of labelling classes, and the size of the labelled image. Along with these details, campaign owners will write instructions on the labelling task to facilitate contribution from experts and non-experts. Once a campaign is created, the fun begins.
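To make the list of campaign parameters more concrete, here is a minimal sketch of how such a configuration could be represented and sanity-checked. The class and field names (`CampaignConfig`, `area_of_interest`, `max_cloud_coverage`, and so on) are hypothetical and do not reflect the actual ClassificationApp or Geopedia schema; the example values are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CampaignConfig:
    """Hypothetical container for the campaign parameters described above."""
    name: str
    is_public: bool
    area_of_interest: tuple    # assumed (min_lon, min_lat, max_lon, max_lat)
    data_source: str
    max_cloud_coverage: float  # assumed to be a fraction in [0, 1]
    time_range: tuple          # (start_date, end_date)
    class_names: list
    patch_size_px: int
    instructions: str = ""

    def problems(self):
        """Return a list of detected problems; an empty list means none found."""
        found = []
        min_lon, min_lat, max_lon, max_lat = self.area_of_interest
        if not (min_lon < max_lon and min_lat < max_lat):
            found.append("area of interest must have min < max on both axes")
        if not 0.0 <= self.max_cloud_coverage <= 1.0:
            found.append("cloud coverage must be a fraction between 0 and 1")
        start, end = self.time_range
        if start > end:
            found.append("time range starts after it ends")
        if len(set(self.class_names)) < 2:
            found.append("at least two distinct class names are needed")
        if self.patch_size_px <= 0:
            found.append("patch size must be positive")
        return found


# Illustrative values only.
campaign = CampaignConfig(
    name="Example water campaign",
    is_public=True,
    area_of_interest=(14.0, 45.9, 14.2, 46.1),
    data_source="Sentinel-2",
    max_cloud_coverage=0.2,
    time_range=(date(2019, 1, 1), date(2019, 6, 30)),
    class_names=["water", "land"],
    patch_size_px=64,
    instructions="Mark every pixel that is covered by open water.",
)
print(campaign.problems())
```

Keeping the checks in one method means a campaign can be rejected with a readable list of reasons before any labelling task is generated.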
At the moment, only pixel-wise classification is supported, but we plan to support image-wise and multi-label classification as well. Once inside the app, the left side of the screen shows the area surrounding the image patch to label (i.e. the blue box) to help you understand the spatial context. Furthermore, different visualisation layers can be displayed to help discern land cover classes or clouds. The location and time of acquisition of the image are displayed, and a map from OpenStreetMap is available as an additional visualisation source. Other visualisation sources can easily be added, such as high-resolution imagery, vector geometries, or, if you are bold enough, radar imagery. On the right pane, the canvas lets you classify each pixel into one of the classes you specified at the time of campaign creation. You can use brushes and erasers of different sizes to annotate or delete pixels, and a bucket tool to fill the entire patch with a single click. Once you are happy, you can save the annotated image and get another patch to annotate, or just skip the saving and get another task.
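For readers curious about what the brush, eraser and bucket tools amount to in terms of data, the following is a rough sketch operating on a label mask stored as a list of rows. It is not the app's actual implementation: the function names, the square brush shape and the use of `0` for "unlabelled" are all assumptions made for illustration.

```python
UNLABELLED = 0  # assumed value for pixels that carry no class yet


def make_mask(height, width):
    """Create an empty label mask with one class id per pixel."""
    return [[UNLABELLED] * width for _ in range(height)]


def brush(mask, row, col, radius, class_id):
    """Paint a square of side 2 * radius + 1 centred on (row, col).

    The square is clipped at the patch border.
    """
    height, width = len(mask), len(mask[0])
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            mask[r][c] = class_id


def erase(mask, row, col, radius):
    """An eraser is a brush that paints the 'unlabelled' value."""
    brush(mask, row, col, radius, UNLABELLED)


def bucket(mask, class_id):
    """Fill the entire patch with a single class."""
    for row in mask:
        row[:] = [class_id] * len(row)


def count(mask, class_id):
    """Count the pixels assigned to a class."""
    return sum(row.count(class_id) for row in mask)


mask = make_mask(5, 5)
brush(mask, 2, 2, radius=1, class_id=1)   # 3 x 3 square in the centre
erase(mask, 2, 2, radius=0)               # remove the centre pixel again
print(count(mask, 1))
```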
ClassificationApp uses Geopedia as a back-end database, where all information about campaigns is stored. Using Geopedia as a geospatial database comes with many advantages, some of which are the support for raster and vector formats, the possibility to easily query tables and results through the web-based API, and the possibility to retrieve results using eo-learn. Another advantage is that an existing labelled data-set could be uploaded to Geopedia and used as a starting point for a review campaign, as is the case for the water level annotation campaign described below.
Crowd-sourcing water surface detection in Sentinel-2 images
Some time ago we released the BlueDot Observatory, an open-source web service to monitor the surface level of water bodies globally. You can read more about it in this blog post. The observatory allows you to browse open water bodies like lakes, dams and reservoirs from all around the globe (yes, globe, not disc), and instantly view the trend in the water surface ratio against its nominal value for the past three and a half years.
The water segmentation algorithm that powers BlueDot uses a water index derived from Sentinel-2 images and applies dynamic thresholding to determine the optimal threshold value that best separates water and soil. Despite being simple, the algorithm generates satisfactory results in the majority of cases. However, artefacts introduced by the cloud masking algorithm or cloud shadows negatively affect the segmentation result, as shown below.
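The index-plus-threshold idea can be sketched in a few lines. The text above does not say which water index or which thresholding method BlueDot uses, so this example swaps in two common choices as assumptions: the Normalised Difference Water Index, NDWI = (green - NIR) / (green + NIR), which for Sentinel-2 would use bands B03 and B08, and Otsu's method for picking the threshold. It uses plain Python lists of made-up reflectance values to stay dependency-free; real code would work on NumPy arrays.

```python
def ndwi(green, nir):
    """NDWI per pixel: (green - nir) / (green + nir), 0.0 where undefined."""
    return [(g - n) / (g + n) if (g + n) != 0 else 0.0
            for g, n in zip(green, nir)]


def otsu_threshold(values, bins=64):
    """Pick the histogram split that maximises between-class variance."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(h * c for h, c in zip(hist, centres))

    best_threshold, best_variance = lo, -1.0
    weight_low, sum_low = 0, 0.0
    for i in range(bins - 1):
        weight_low += hist[i]
        sum_low += hist[i] * centres[i]
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = lo + (i + 1) * width  # upper edge of bin i
    return best_threshold


def segment_water(green, nir):
    """Return a boolean water mask and the threshold that produced it."""
    index = ndwi(green, nir)
    threshold = otsu_threshold(index)
    return [v > threshold for v in index], threshold


# Made-up reflectances: three water-like pixels followed by three soil-like ones.
green = [0.10, 0.12, 0.11, 0.04, 0.05, 0.03]
nir = [0.02, 0.03, 0.02, 0.30, 0.28, 0.35]
water_mask, threshold = segment_water(green, nir)
print(water_mask)
```

Because the threshold is recomputed from each image's own histogram, a cloud shadow or a masking artefact that shifts the index distribution will also shift the threshold, which is one way the failure cases mentioned above can arise.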
One possible way to improve the robustness of the water segmentation algorithm is to use a machine learning model trained on labelled data of water bodies with global coverage under different artefact conditions. To collect labelled data suitable for training such a model, we have created the dedicated public campaign Water-Body Segmentation Correction on the classification application. Once segmented water bodies have been uploaded to Geopedia, ClassificationApp randomly selects patches from a randomly chosen water body and lets the user correct errors present in the current segmentation. Because the app relies on Geopedia, campaigns that review and correct pre-existing classifications can be created. The ability to review and improve existing classification maps is very valuable when training machine learning models iteratively.
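The two-stage random selection (first a water body, then a patch within it) could look roughly like the sketch below. How ClassificationApp actually samples tasks is not described here, so the function name, the dictionary of raster sizes and the uniform sampling are all assumptions.

```python
import random


def sample_patch(water_bodies, patch_size, rng=random):
    """Pick a random water body, then a random patch window inside its raster.

    `water_bodies` maps a (hypothetical) water body id to the
    (height, width) of its raster in pixels. The returned window is
    (row, col, height, width) and always lies inside the raster.
    """
    body_id = rng.choice(sorted(water_bodies))
    height, width = water_bodies[body_id]
    # If the raster is smaller than the patch, fall back to the whole raster.
    row = rng.randrange(0, max(1, height - patch_size + 1))
    col = rng.randrange(0, max(1, width - patch_size + 1))
    window = (row, col, min(patch_size, height), min(patch_size, width))
    return body_id, window


# Made-up raster sizes for two water bodies.
bodies = {"lake_a": (100, 80), "reservoir_b": (20, 20)}
print(sample_patch(bodies, patch_size=32, rng=random.Random(0)))
```

Note that choosing the water body first gives every water body the same chance of being reviewed regardless of its size; sampling patches uniformly over all rasters would instead favour large lakes.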
When you select the water body campaign from the campaign selection page after logging in, you are presented with a task. On the left side, the contour of the water body generated by the current algorithm is overlaid on Sentinel-2 images. Different band indices can be displayed to help discern the actual water surface level. On the right side, the existing water mask is overlaid onto the Sentinel-2 image, allowing you to modify it and correct possible mistakes. After editing, the resulting labelled image is saved to Geopedia. The resulting data-set can be downloaded as EOPatches, which can be fed directly to a machine learning algorithm using eo-learn. The labelled data-set generated by a public campaign will be available to all contributors of the campaign.
This is, therefore, an official call to labelling arms, where we invite anyone to contribute to the water body segmentation correction campaign. Every single click is important. The resulting open-source data-set could be used to build a global water detection algorithm using machine learning. Such a model could be used for many applications, ranging from water level management and prediction to managing the flood response all over the world.
If you have any questions related to the water body campaign or ClassificationApp, please get in touch with us at email@example.com. We will be continuously improving the ClassificationApp front-end and back-end and adding new functionalities. Please also contact us if you require assistance with setting up a new campaign or need functionality which is not yet supported. We will be happy to oblige!