Introducing eo-learn

Bridging the gap between Earth Observation and Machine Learning

The availability of open Earth Observation (EO) data through the Copernicus and Landsat programs represents an unprecedented resource for many EO applications, ranging from land use and land cover (LULC) monitoring, crop monitoring and yield prediction, to disaster control, emergency services and humanitarian relief. Given the large amount of high spatial resolution data at high revisit frequency, frameworks that can automatically extract complex patterns from such spatio-temporal data are required. eo-learn aims to provide a set of tools that make prototyping of complex EO workflows as easy, fast, and accessible as possible.

Example of a remote sensing workflow that can be built using eo-learn. This workflow is used to create a global service for water-level monitoring of reservoirs and water bodies.

So, what is eo-learn? eo-learn is an open-source Python library that acts as a bridge between Earth Observation/Remote Sensing and the Python ecosystem for data science and machine learning (ML). On one hand, its aim is to make entry into the field of remote sensing easier for non-experts. On the other, it aims to bring the state-of-the-art tools for computer vision, machine learning, and deep learning that exist in the Python ecosystem to remote sensing experts.

eo-learn is easy to use, modular in design, and encourages collaboration: sharing and reusing specific tasks in a typical EO value-extraction workflow, such as cloud masking, image co-registration, feature extraction and classification. Everyone is free to use any of the available tasks and is encouraged to improve upon them, develop new ones and share them with the rest of the community. The library is released under the MIT license, so one can use it even without sharing their own work. There is so much untapped potential in remote sensing that we are not too concerned about competition using our tools. Who knows, perhaps someone will save the planet with it. Everyone wins. That being said, we believe there should be more sharing in EO, so we’d love to see it done here as well.

In a nutshell

The library uses NumPy arrays and Shapely geometries to store and handle remote sensing data. It is currently available on our GitHub and coming soon to the Python Package Index. You can find documentation on ReadTheDocs.

The building blocks of eo-learn are EOPatch, EOTask and EOWorkflow objects. All data are stored in EOPatch instances, where dictionaries store NumPy arrays and Shapely geometries for time-dependent spatial information (e.g. Sentinel-2, Landsat 8 or Sentinel-1 bands, cloud masks, etc.), time-independent spatial information (e.g. Digital Elevation Model, target LULC maps, count of valid pixels, etc.) and time-dependent and time-independent scalar information (e.g. labels for change detection, sun angles, etc.). An EOPatch instance is uniquely defined by coordinates of a bounding box and the time-interval the stored data refers to. Information in any format readable by Python packages can also be stored in EOPatch objects.

Example of spatial data that can be stored in an EOPatch in raster and vector format. These data are needed to build a machine learning model for LULC map classification. In addition, non-spatial data as well as any data format readable in Python can be stored in an EOPatch.
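To make the layout described above more concrete, here is a minimal sketch of such a container. It is not the EOPatch class itself: `PatchSketch` and its field names are hypothetical, nested lists stand in for NumPy arrays, and the coordinates and values are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PatchSketch:
    """Hypothetical stand-in for a patch container (not the eo-learn API)."""

    bbox: tuple            # (min_x, min_y, max_x, max_y), illustrative values
    time_interval: tuple   # (start_date, end_date)
    # One dictionary per kind of information, keyed by a feature name.
    temporal_rasters: dict = field(default_factory=dict)   # [time][row][col]
    timeless_rasters: dict = field(default_factory=dict)   # [row][col]
    temporal_scalars: dict = field(default_factory=dict)   # [time]
    timeless_scalars: dict = field(default_factory=dict)   # single value


patch = PatchSketch(
    bbox=(14.0, 45.8, 14.2, 46.0),
    time_interval=(date(2017, 1, 1), date(2017, 12, 31)),
)

# Two acquisitions of a 2x2 pixel band (time-dependent spatial data).
patch.temporal_rasters["red"] = [
    [[0.10, 0.20], [0.30, 0.40]],
    [[0.20, 0.20], [0.30, 0.50]],
]
# A digital elevation model does not change over time.
patch.timeless_rasters["dem"] = [[410.0, 412.0], [415.0, 420.0]]
# One sun angle per acquisition (time-dependent scalar data).
patch.temporal_scalars["sun_zenith"] = [35.2, 48.7]
# A single label for the whole patch (time-independent scalar data).
patch.timeless_scalars["lulc_label"] = "cropland"
```

The bounding box and the time interval together identify the patch, while each dictionary can hold any number of named features of the same kind.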

Any operation on EOPatch instances is performed by EOTask instances. Tasks are grouped by scope and packaged into separate Python sub-packages, which currently are:

  • eo-learn-core — The core sub-package which implements the basic building blocks (EOPatch, EOTask and EOWorkflow) and commonly used functionalities.
  • eo-learn-io — Input/output sub-package that deals with obtaining data from Sentinel Hub services and Geopedia.
  • eo-learn-mask — Collection of tasks used for masking of data and calculation of cloud masks.
  • eo-learn-features — A collection of tasks for extracting data properties and feature manipulation. Examples include tasks for computing spatio-temporal and Haralick features, as well as interpolation tasks.
  • eo-learn-geometry — Sub-package to handle geometric transformations, such as vector to raster conversion, and sampling of label masks for generating training sets for ML methods.
  • eo-learn-ml-tools — Collection of ML utility tasks useful to set up or validate a ML model.
  • eo-learn-coregistration — Collection of tasks that implement different image co-registration techniques.

For a list of currently implemented EOTask classes, have a look here. If the task you are looking for is not yet implemented, worry not! Creating a new EOTask is as simple as this:
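The snippet below is a minimal sketch of the pattern rather than code from the library. It assumes a task is a class with an `execute` method that receives a patch and returns it. The base class `TaskSketch`, the task `CountValidPixels` and the plain dictionary used as a patch are hypothetical stand-ins that keep the example runnable without eo-learn installed.

```python
class TaskSketch:
    """Hypothetical stand-in for a task base class (not the eo-learn API)."""

    def __call__(self, patch):
        # Calling the task instance simply delegates to execute().
        return self.execute(patch)

    def execute(self, patch):
        raise NotImplementedError


class CountValidPixels(TaskSketch):
    """Counts, for every pixel, the number of time frames in which it is valid."""

    def __init__(self, mask_name, output_name):
        self.mask_name = mask_name
        self.output_name = output_name

    def execute(self, patch):
        frames = patch[self.mask_name]  # indexed as [time][row][col], values 0 or 1
        rows, cols = len(frames[0]), len(frames[0][0])
        patch[self.output_name] = [
            [sum(frame[r][c] for frame in frames) for c in range(cols)]
            for r in range(rows)
        ]
        return patch


# A plain dict plays the role of the patch in this sketch.
patch = {"is_valid": [[[1, 0], [1, 1]], [[1, 1], [0, 1]]]}
patch = CountValidPixels("is_valid", "valid_count")(patch)
```

Parameters such as feature names go into the constructor, so the same task class can be reused with different inputs and outputs.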

EOTask classes created by users can then be added to the code-base with a simple pull request, adding new tools and functionalities that can benefit the entire community.

Example of NDVI trends derived from Sentinel-2 over a year of observations. Red shows values for cultivated land, blue for built-up area, and green for grassland. eo-learn provides tasks to handle spatio-temporal processing such as masking and filtering of cloudy observations (empty circles), and interpolation of valid data (filled circles) to generate an interpolated time-series (continuous line). Different interpolation methods (e.g. linear, univariate spline, B-spline, Akima) have been implemented.
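The idea behind the figure can be sketched for a single pixel in a few lines of plain Python. This is an illustration, not the eo-learn implementation: the function names and the reflectance values are made up, the observation times are assumed to be sorted, and only the linear method is shown. NDVI is computed as (NIR - red) / (NIR + red).

```python
def ndvi(nir, red):
    """Normalised difference vegetation index for one pixel."""
    return (nir - red) / (nir + red)


def interpolate_linear(times, values, valid):
    """Replace invalid observations by linear interpolation between valid ones.

    Invalid observations before the first or after the last valid one take
    the value of the nearest valid observation. Times must be sorted.
    """
    good = [(t, v) for t, v, ok in zip(times, values, valid) if ok]
    if not good:
        raise ValueError("at least one valid observation is required")
    result = []
    for t, v, ok in zip(times, values, valid):
        if ok:
            result.append(v)
            continue
        before = [(tg, vg) for tg, vg in good if tg < t]
        after = [(tg, vg) for tg, vg in good if tg > t]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            result.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        else:
            result.append(before[-1][1] if before else after[0][1])
    return result


# Day of year, reflectances and cloud flags for one pixel (made-up values).
days = [10, 20, 30, 40]
nir = [0.40, 0.90, 0.50, 0.60]
red = [0.10, 0.80, 0.10, 0.10]
is_clear = [True, False, True, True]

series = [ndvi(n, r) for n, r in zip(nir, red)]
filled = interpolate_linear(days, series, is_clear)
```

The cloudy observation on day 20 is replaced by the value halfway between the clear observations on days 10 and 30, while clear observations are left untouched.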

Finally, a complete pipeline is built by connecting tasks using EOWorkflow. EOWorkflow allows a workflow to be defined as a directed acyclic graph, where EOTask instances are the vertices of the graph and EOPatch instances flow along the edges connecting them. Once the workflow has been defined, it can be run in parallel on different input EOPatch instances, making it possible to process large amounts of spatio-temporal data automatically. EOWorkflow also provides execution monitoring reports and logs, such as the input parameters of each EOTask, elapsed times, memory usage and raised exceptions, which facilitates execution control and versioning of complete ML pipelines.
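The sketch below illustrates the idea with the Python standard library only. It is not how EOWorkflow is implemented: `run_workflow`, the three toy tasks and the dictionaries used as patches are hypothetical, and a single patch is passed from task to task in dependency order, which simplifies a graph with branches.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # available from Python 3.9


def run_workflow(tasks, dependencies, patch):
    """Run every task once, in an order that respects the dependencies.

    tasks:        task name -> callable that takes a patch and returns a patch
    dependencies: task name -> set of task names that must run before it
    Returns the final patch and the elapsed time of each task in seconds.
    """
    report = {}
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    for name in TopologicalSorter(dependencies).static_order():
        start = time.perf_counter()
        patch = tasks[name](patch)
        report[name] = time.perf_counter() - start
    return patch, report


def load(patch):
    return {**patch, "bands": [0.2, 0.8, 0.4]}


def mask_clouds(patch):
    # Toy rule: very bright values are treated as cloudy.
    return {**patch, "clear": [value < 0.7 for value in patch["bands"]]}


def keep_clear(patch):
    kept = [v for v, ok in zip(patch["bands"], patch["clear"]) if ok]
    return {**patch, "bands": kept}


tasks = {"load": load, "mask": mask_clouds, "filter": keep_clear}
dependencies = {"load": set(), "mask": {"load"}, "filter": {"mask"}}

# The same workflow is applied to several input patches in parallel.
patches = [{"id": 1}, {"id": 2}]
with ThreadPoolExecutor() as pool:
    outputs = list(pool.map(lambda p: run_workflow(tasks, dependencies, p), patches))
```

Each entry of `outputs` holds the processed patch together with a small timing report, a reduced version of the execution monitoring described above.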

Check the README and the documentation for more technical information on how eo-learn works.

Example applications

eo-learn was designed to provide the most common operations for processing spatio-temporal data, so that complete remote sensing applications can be built with it. To showcase the potential of eo-learn in more detail, we will shortly post two blog series: one on land use and land cover classification at a country level using machine learning, and one on the creation of a complete service for automatic global water-level monitoring, both using eo-learn and Copernicus data. Some material to get you started on these use cases can already be found in the examples folder.

Example of water-level segmentation using multiple sources, in particular Sentinel-1 (left), Sentinel-2 (middle) and Digital Elevation Model (right). Using multiple sources leads to a more accurate delineation of the water body.

Given our well-known interest in working with time-series and creating time-lapses, in this blog post we share a simple EOWorkflow to automatically generate time-lapses given a bounding box and a time-range. To generate a time-lapse like the one shown below, the required tasks are S2L1CWCSInput, AddCloudMaskTask, SimpleFilterTask and a custom MakeGIFTask.

Time-lapse of Ouarzazate Solar power station in Morocco. Originally created by Simon Gascoin on Twitter.
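The filtering step in such a workflow can be pictured as follows. This is a hedged sketch of the idea and not the code of any of the tasks named above: the helper names, the 20% threshold and the toy cloud masks are assumptions made for illustration.

```python
def cloud_fraction(cloud_mask):
    """Share of pixels flagged as cloudy in a 2-D mask of 0/1 values."""
    pixels = [value for row in cloud_mask for value in row]
    return sum(pixels) / len(pixels)


def filter_frames(frames, cloud_masks, max_cloud_fraction=0.2):
    """Keep only the frames whose cloud coverage does not exceed the threshold."""
    return [
        frame
        for frame, mask in zip(frames, cloud_masks)
        if cloud_fraction(mask) <= max_cloud_fraction
    ]


# Acquisition dates stand in for the image frames in this sketch.
frames = ["2017-06-01", "2017-06-11", "2017-06-21"]
cloud_masks = [
    [[0, 0], [0, 0]],  # cloud free
    [[1, 1], [1, 0]],  # 75% cloudy
    [[0, 0], [0, 0]],  # cloud free
]

clear_frames = filter_frames(frames, cloud_masks)
```

Only the frames that pass the filter would then be handed to the step that assembles the GIF.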

And if the time-series is affected by orthorectification issues, as is often the case for Sentinel-2 images acquired prior to 2017, one can add a RegistrationTask to estimate and compensate for the misalignment existing between time-frames, as shown below. The script used to generate these GIFs can be found here.

Time-lapse after frame co-registration using a rigid transformation. Misalignments can be seen for initial frames as registration errors accumulate over the time-series.

Community

A key resource for the success of eo-learn is, of course, the community, both of remote sensing and machine learning experts. We therefore invite anyone with interests in developing large-scale remote sensing applications using spatio-temporal satellite imagery to try eo-learn out, give us feedback, and possibly contribute to it. We welcome code improvements, new EOTask classes, and new workflow examples. Users have already contributed some tasks, as is the case for the Haralick features created by developers at Magellium.

We are constantly adding new functionality and improving the stability and efficiency of tasks and workflows, so some things are likely to change in the future as the library grows. However, we will try to minimise breaking changes as much as possible in future releases. The first beta release on PyPI is planned in a couple of weeks.

We will be showcasing eo-learn at the International Conference on Knowledge Discovery and Data Mining in London on 19th-23rd August, so please stop by if you are planning to attend. Stay tuned for our series on land use and land cover classification and on how to set up a complete service for global water-level monitoring of reservoirs and water bodies.

eo-learn is a by-product of the Perceptive Sentinel European project. The project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Grant Agreement 776115.