Introducing eo-learn

Bridging the gap between Earth Observation and Machine Learning

Devis Peressutti
Published in Sentinel Hub Blog
Jul 31, 2018


The availability of open Earth Observation (EO) data through the Copernicus and Landsat programs represents an unprecedented resource for many EO applications, ranging from land use and land cover (LULC) monitoring, crop monitoring and yield prediction, to disaster control, emergency services and humanitarian relief. Given the large amount of high-spatial-resolution data at high revisit frequency, frameworks that can automatically extract complex patterns from such spatio-temporal data are required. eo-learn aims to provide a set of tools that make prototyping of complex EO workflows as easy, fast, and accessible as possible.

Example of a remote sensing workflow that can be built using eo-learn. This workflow is used to create a global service for water-level monitoring of reservoirs and water bodies.

So, what is eo-learn? eo-learn is an open-source Python library that acts as a bridge between Earth Observation/Remote Sensing and the Python ecosystem for data science and machine learning (ML). On the one hand, it aims to make entry into the field of remote sensing easier for non-experts. On the other, it brings the state-of-the-art computer vision, machine learning, and deep learning tools of the Python ecosystem to remote sensing experts.

eo-learn is easy to use, its design is modular, and it encourages collaboration: the sharing and reuse of specific tasks in a typical EO value-extraction workflow, such as cloud masking, image co-registration, feature extraction, and classification. Everyone is free to use any of the available tasks and is encouraged to improve upon them, develop new ones, and share them with the rest of the community. The library is released under the MIT license, so it can be used even by those who do not wish to share their own work. There is so much untapped potential in remote sensing that we are not too concerned about competitors using our tools. Who knows, perhaps someone will save the planet with it. Everyone wins. That being said, we believe there should be more sharing in EO, so we would love to see it happen here as well.

In a nutshell

The library uses NumPy arrays and Shapely geometries to store and handle remote sensing data. It is currently available on our GitHub and coming soon to the Python Package Index. You can find documentation on ReadTheDocs.

The building blocks of eo-learn are the EOPatch, EOTask and EOWorkflow objects. All data are stored in EOPatch instances, where dictionaries hold NumPy arrays and Shapely geometries for time-dependent spatial information (e.g. Sentinel-2, Landsat 8 or Sentinel-1 bands, cloud masks, etc.), time-independent spatial information (e.g. a Digital Elevation Model, target LULC maps, counts of valid pixels, etc.), and time-dependent and time-independent scalar information (e.g. labels for change detection, sun angles, etc.). An EOPatch instance is uniquely defined by the coordinates of its bounding box and the time interval the stored data refer to. Information in any format readable by Python packages can also be stored in EOPatch objects.
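To make this layout concrete, here is a minimal sketch of such a container. Note that `SimpleEOPatch` is a hypothetical stand-in written for illustration, not eo-learn's actual EOPatch class; the real class offers much more machinery, but the idea of feature dictionaries keyed by name is the same.

```python
import numpy as np

# Minimal stand-in for eo-learn's EOPatch: dictionaries of named features.
# Time-dependent spatial data is shaped (time, height, width, channels);
# time-independent spatial data is shaped (height, width, channels).
class SimpleEOPatch:
    def __init__(self, bbox, time_interval):
        self.bbox = bbox                    # (min_x, min_y, max_x, max_y)
        self.time_interval = time_interval  # (start, end) date strings
        self.data = {}           # time-dependent rasters, e.g. band stacks
        self.mask = {}           # time-dependent masks, e.g. cloud masks
        self.data_timeless = {}  # time-independent rasters, e.g. a DEM

# Ten timestamps of 4-band 64x64 imagery plus a single-band DEM.
patch = SimpleEOPatch(bbox=(500000, 5000000, 500640, 5000640),
                      time_interval=("2017-01-01", "2017-12-31"))
patch.data["BANDS"] = np.zeros((10, 64, 64, 4))
patch.mask["CLM"] = np.ones((10, 64, 64, 1), dtype=bool)
patch.data_timeless["DEM"] = np.zeros((64, 64, 1))
```

The bounding box and time interval together uniquely identify the patch, while the dictionaries can grow as tasks add derived features.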

Example of spatial data that can be stored in an EOPatch in raster and vector format. These data are needed to build a machine learning model for LULC map classification. In addition, non-spatial data as well as any data format readable in Python can be stored in an EOPatch.

Any operation on EOPatch instances is performed by EOTask instances. Tasks are grouped by scope and packaged into separate Python sub-packages, which currently are:

  • eo-learn-core: the core sub-package, which implements the basic building blocks (EOPatch, EOTask and EOWorkflow) and commonly used functionalities.
  • eo-learn-io: the input/output sub-package, which deals with obtaining data from Sentinel Hub services and Geopedia.
  • eo-learn-mask: a collection of tasks for masking data and computing cloud masks.
  • eo-learn-features: a collection of tasks for extracting data properties and manipulating features. Examples include tasks for computing spatio-temporal and Haralick features, as well as interpolation tasks.
  • eo-learn-geometry: a sub-package handling geometric transformations, such as vector-to-raster conversion, and sampling of label masks to generate training sets for ML methods.
  • eo-learn-ml-tools: a collection of ML utility tasks useful for setting up or validating an ML model.
  • eo-learn-coregistration: a collection of tasks implementing different image co-registration techniques.
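As an example of the kind of operation the geometry sub-package covers, sampling a label mask for training-set generation can be sketched in a few lines of NumPy. The helper name `sample_pixels_per_class` is hypothetical, written for illustration rather than taken from eo-learn's API:

```python
import numpy as np

def sample_pixels_per_class(label_mask, n_per_class, rng=None):
    """Randomly pick up to n_per_class pixel coordinates for each class
    present in a 2D label mask; returns {class: array of (row, col)}."""
    rng = np.random.default_rng(rng)
    samples = {}
    for cls in np.unique(label_mask):
        rows, cols = np.nonzero(label_mask == cls)
        n = min(n_per_class, rows.size)
        idx = rng.choice(rows.size, size=n, replace=False)
        samples[cls] = np.stack([rows[idx], cols[idx]], axis=1)
    return samples

# A toy 100x100 mask with three LULC classes (0, 1 and 2).
mask = np.zeros((100, 100), dtype=np.uint8)
mask[:40] = 1
mask[40:70] = 2
samples = sample_pixels_per_class(mask, n_per_class=50, rng=0)
```

Balanced sampling of this kind prevents the dominant LULC class from overwhelming the training set.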

For a list of currently implemented EOTask classes, have a look here. If the task you are looking for is not yet implemented, worry not! Creating a new EOTask only requires subclassing EOTask and implementing its execute method.
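A minimal sketch of that pattern follows. Since eo-learn may not be installed, a stub base class stands in for the real `EOTask` here, and the NDVI task and its band indices are illustrative assumptions:

```python
import numpy as np

# Stand-in for eo-learn's EOTask base class; the real one adds more
# machinery, but the user contract is the same: implement execute().
class EOTask:
    def execute(self, eopatch, **kwargs):
        raise NotImplementedError

class AddNdviTask(EOTask):
    """Compute NDVI from red/NIR bands and add it to the patch."""
    def __init__(self, red_idx=2, nir_idx=3):
        self.red_idx, self.nir_idx = red_idx, nir_idx

    def execute(self, eopatch):
        bands = eopatch["BANDS"]  # shape (time, height, width, channels)
        red = bands[..., self.red_idx]
        nir = bands[..., self.nir_idx]
        eopatch["NDVI"] = (nir - red) / (nir + red + 1e-8)
        return eopatch

# A plain dict stands in for an EOPatch in this sketch.
patch = {"BANDS": np.random.rand(5, 32, 32, 4)}
patch = AddNdviTask().execute(patch)
```

Because every task receives a patch and returns a patch, tasks compose naturally into workflows.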

EOTask classes created by users can then be added to the code-base with a simple pull request, adding new tools and functionalities that can benefit the entire community.

Example of NDVI trends derived from Sentinel-2 over a year of observations. Red shows values for cultivated land, blue for built-up areas, and green for grassland. eo-learn provides tasks to handle spatio-temporal processing, such as masking and filtering of cloudy observations (empty circles) and interpolation of valid data (filled circles) to generate an interpolated time-series (continuous line). Different interpolation methods (e.g. linear, univariate spline, B-spline, Akima) have been implemented.

Finally, a complete pipeline is built by connecting tasks with EOWorkflow. EOWorkflow defines a workflow as a directed acyclic graph, where EOTask instances are the vertices and EOPatch instances flow along the edges connecting them. Once the workflow has been defined, it can be run in parallel over different input EOPatch instances, allowing large amounts of spatio-temporal data to be processed automatically. EOWorkflow also provides execution monitoring reports and logs, such as the input parameters of each EOTask, elapsed times, memory usage and raised exceptions, facilitating execution control and versioning of complete ML pipelines.
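The simplest case of such a graph is a linear chain of tasks. The toy `LinearWorkflow` class below is a hypothetical stand-in that only threads a patch through a sequence of callables; the real EOWorkflow supports general acyclic graphs, parallel execution and logging:

```python
# Minimal stand-in for a linear workflow: tasks are callables that take
# and return a patch; execute() threads the patch through the chain.
class LinearWorkflow:
    def __init__(self, *tasks):
        self.tasks = tasks

    def execute(self, patch):
        for task in self.tasks:
            patch = task(patch)
        return patch

# Two toy tasks operating on a plain dict standing in for an EOPatch.
def load_bands(patch):
    patch["BANDS"] = [0.1, 0.4, 0.3]
    return patch

def add_mean(patch):
    patch["MEAN"] = sum(patch["BANDS"]) / len(patch["BANDS"])
    return patch

result = LinearWorkflow(load_bands, add_mean).execute({})
```

Because each task only sees the patch it receives, the same workflow can be mapped over many patches independently, which is what makes parallel execution straightforward.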

Check the README and the documentation for more technical information on how eo-learn works.

Example applications

eo-learn was designed to provide the most common operations for processing spatio-temporal data, allowing complete remote sensing applications to be built. To showcase the potential of eo-learn in more detail, we will shortly publish two blog series: one on land use and land cover classification at country level using machine learning, and one on building a complete service for automatic global water-level monitoring, both using eo-learn and Copernicus data. Some material to get you started on these use cases can already be found in the examples folder.

Example of water-level segmentation using multiple sources, in particular Sentinel-1 (left), Sentinel-2 (middle) and Digital Elevation Model (right). Using multiple sources leads to a more accurate delineation of the water body.

Given our well-known interest in working with time-series and creating time-lapses, in this blog post we share a simple EOWorkflow that automatically generates time-lapses given a bounding box and a time range. To generate a time-lapse like the one shown below, the required tasks are S2L1CWCSInput, AddCloudMaskTask, SimpleFilterTask and a custom MakeGIFTask.
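The filtering step in that chain keeps only the frames that pass a predicate, such as a low cloud fraction. A minimal NumPy sketch of that idea follows; the function name `filter_frames` and the threshold are illustrative assumptions, not eo-learn's actual SimpleFilterTask API, and the GIF-writing step is omitted:

```python
import numpy as np

def filter_frames(frames, cloud_masks, max_cloud_fraction=0.2):
    """Keep only time-frames whose cloud mask covers at most the given
    fraction of pixels - the role a frame filter plays in the workflow."""
    fractions = cloud_masks.reshape(cloud_masks.shape[0], -1).mean(axis=1)
    keep = fractions <= max_cloud_fraction
    return frames[keep], keep

# Five 8x8 frames; frames 1 and 3 are half-covered by clouds.
frames = np.random.rand(5, 8, 8)
clouds = np.zeros((5, 8, 8), dtype=bool)
clouds[1, :4] = True
clouds[3, :, :4] = True
clear, keep = filter_frames(frames, clouds)
```

Dropping cloudy frames before assembling the GIF is what keeps the resulting time-lapse smooth.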

Time-lapse of Ouarzazate Solar power station in Morocco. Originally created by Simon Gascoin on Twitter.

And if the time-series is affected by orthorectification issues, as is often the case for Sentinel-2 images acquired prior to 2017, one can add a RegistrationTask to estimate and compensate for the misalignment between time-frames, as shown below. The script used to generate these GIFs can be found here.
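For purely translational misalignment, one standard estimation technique is phase correlation, which finds the shift between two frames from the peak of their FFT cross-power spectrum. The sketch below illustrates that technique on a synthetically shifted image; it is not necessarily the method eo-learn's RegistrationTask implements:

```python
import numpy as np

def estimate_translation(ref, moving):
    """Estimate the integer (row, col) circular shift that maps `ref`
    onto `moving` via phase correlation (FFT cross-power spectrum)."""
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(moving))
    cross /= np.abs(cross) + 1e-12          # keep only phase information
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    shift = []
    for idx, size in zip(peak, corr.shape):
        signed = idx - size if idx > size // 2 else idx  # wrap to signed
        shift.append(-signed)
    return tuple(shift)

rng = np.random.default_rng(42)
ref = rng.random((64, 64))
moving = np.roll(ref, shift=(3, -5), axis=(0, 1))
shift = estimate_translation(ref, moving)
```

Once the shift is known, compensating for it is a matter of rolling or warping each frame back into alignment with the reference.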

Time-lapse after frame co-registration using a rigid transformation. Misalignments can be seen in the initial frames, as registration errors accumulate over the time-series.


A key resource for the success of eo-learn is, of course, the community, both of remote sensing and machine learning experts. We therefore invite anyone interested in developing large-scale remote sensing applications using spatio-temporal satellite imagery to try eo-learn, give us feedback, and possibly contribute to it. We welcome code improvements, new EOTask classes, and new workflow examples. Users have already contributed some tasks, such as the Haralick features created by developers at Magellium.

We are constantly working on new functionalities and improving the stability and efficiency of tasks and workflows, so some things are likely to change as the library grows. However, we will try to minimise breaking changes in future releases. The first beta release on PyPI is planned within a couple of weeks.

We will be showcasing eo-learn at the International Conference on Knowledge Discovery and Data Mining in London on 19th-23rd August, so please stop by if you are planning to attend. Stay tuned for our series on land use and land cover classification and on how to set up a complete service for global monitoring of water-level in reservoirs and water bodies.

eo-learn is a by-product of the Perceptive Sentinel European project. The project has received funding from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement 776115.




Data scientist passionate about earth and medical images.