Bridging the gap between Earth Observation and Machine Learning
The availability of open Earth Observation (EO) data through the Copernicus and Landsat programs represents an unprecedented resource for many EO applications, ranging from land use and land cover (LULC) monitoring, crop monitoring and yield prediction, to disaster control, emergency services and humanitarian relief. Given the large amount of high spatial resolution data at high revisit frequency, frameworks able to automatically extract complex patterns in such spatio-temporal data are required.
eo-learn aims at providing a set of tools to make prototyping of complex EO workflows as easy, fast, and accessible as possible.
So, what is eo-learn?
eo-learn is an open-source Python library that acts as a bridge between Earth Observation/Remote Sensing and the Python ecosystem for data science and machine learning (ML). On the one hand, it aims to make entry into the field of remote sensing easier for non-experts. On the other, it aims to bring the state-of-the-art tools for computer vision, machine learning, and deep learning that exist in the Python ecosystem to remote sensing experts.
eo-learn is easy to use, modular in design, and encourages collaboration: the sharing and reuse of specific tasks in a typical EO value-extraction workflow, such as cloud masking, image co-registration, feature extraction, and classification. Everyone is free to use any of the available tasks and is encouraged to improve upon them, develop new ones and share them with the rest of the community. The library is released under the MIT license, so you can use it even if you do not want to share your own work. There is so much untapped potential in remote sensing that we are not too concerned about competitors using our tools. Who knows, perhaps someone will save the Planet with it. Everyone wins. That being said, we believe there should be more sharing in EO, so we’d love to see it done here as well.
In a nutshell
The library uses NumPy arrays and Shapely geometries to store and handle remote sensing data. It is currently available on our GitHub and coming soon to the Python Package Index. You can find documentation on ReadTheDocs.
The building blocks of eo-learn are EOPatch, EOTask and EOWorkflow objects. All data are stored in EOPatch instances, where dictionaries store NumPy arrays and Shapely geometries for time-dependent spatial information (e.g. Sentinel-2, Landsat 8 or Sentinel-1 bands, cloud masks, etc.), time-independent spatial information (e.g. Digital Elevation Model, target LULC maps, count of valid pixels, etc.) and time-dependent and time-independent scalar information (e.g. labels for change detection, sun angles, etc.). An EOPatch instance is uniquely defined by the coordinates of a bounding box and the time interval the stored data refer to. Information in any format readable by Python packages can also be stored in an EOPatch.
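The layout described above can be pictured with a small stand-in container. This is a sketch only: `PatchSketch` and its field names are hypothetical and are not the eo-learn `EOPatch` class, and nested Python lists stand in for NumPy arrays so that the example runs with the standard library alone. The axis order (time, y, x, band) is an assumption made for the illustration, and all values are made up.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PatchSketch:
    """Hypothetical stand-in for an EOPatch (not the eo-learn class)."""

    bbox: tuple            # (min_x, min_y, max_x, max_y) of the area covered
    time_interval: tuple   # (start, end) of the period the data refer to
    timestamps: list = field(default_factory=list)
    data: dict = field(default_factory=dict)             # [time][y][x][band]
    data_timeless: dict = field(default_factory=dict)    # [y][x][band]
    scalar: dict = field(default_factory=dict)           # [time][value]
    scalar_timeless: dict = field(default_factory=dict)  # [value]


def shape(nested):
    """Return the dimensions of a regular nested list, outermost first."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(dims)


patch = PatchSketch(
    bbox=(14.0, 45.8, 14.2, 46.0),
    time_interval=(datetime(2018, 1, 1), datetime(2018, 1, 31)),
    timestamps=[datetime(2018, 1, 3), datetime(2018, 1, 13)],
)
# Two acquisitions of a 2x2 pixel tile with a single band.
patch.data["BANDS"] = [
    [[[0.1], [0.2]], [[0.3], [0.4]]],
    [[[0.5], [0.6]], [[0.7], [0.8]]],
]
# One elevation value per pixel, independent of time.
patch.data_timeless["DEM"] = [[[310.0], [312.0]], [[305.0], [308.0]]]
# One sun angle per acquisition.
patch.scalar["SUN_ANGLE"] = [[31.5], [33.0]]

print(shape(patch.data["BANDS"]))         # (time, y, x, band)
print(shape(patch.data_timeless["DEM"]))  # (y, x, band)
```

The point of the sketch is the separation of features by kind: anything with a time axis has as many entries along that axis as there are timestamps, while timeless features do not.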
Any operation on EOPatch instances is performed by EOTask instances. Tasks are grouped by scope and packaged into separate Python sub-packages, which currently are:
eo-learn-core: The core sub-package, which implements the basic building blocks (EOPatch, EOTask and EOWorkflow) and commonly used functionalities.
eo-learn-io: Input/output sub-package that deals with obtaining data from Sentinel Hub services and Geopedia.
eo-learn-mask: Collection of tasks used for masking of data and calculation of cloud masks.
eo-learn-features: A collection of tasks for extracting data properties and feature manipulation. Examples include tasks for computing spatio-temporal and Haralick features, as well as interpolation tasks.
eo-learn-geometry: Sub-package to handle geometric transformations, such as vector to raster conversion, and sampling of label masks for generating training sets for ML methods.
eo-learn-ml-tools: Collection of ML utility tasks useful to set up or validate an ML model.
eo-learn-coregistration: Collection of tasks that implement different image co-registration techniques.
For a list of currently implemented EOTask classes, have a look here. If the task you are looking for is not yet implemented, worry not! Creating a new EOTask is as simple as this:
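A minimal sketch of the idea follows. In eo-learn a new task subclasses `EOTask` and implements an `execute` method that receives a patch and returns it. To keep the example runnable without the library installed, the `TaskSketch` base class below is a hypothetical stand-in, a plain dictionary plays the role of the patch, and the `AddNdvi` task with its parameters is made up for illustration.

```python
class TaskSketch:
    """Hypothetical stand-in for the EOTask base class."""

    def execute(self, patch):
        raise NotImplementedError("subclasses implement execute()")

    def __call__(self, patch):
        # Calling the task instance simply runs execute().
        return self.execute(patch)


class AddNdvi(TaskSketch):
    """Illustrative task: add a normalised difference index to a patch."""

    def __init__(self, bands_name, index_name, red_idx, nir_idx):
        self.bands_name = bands_name
        self.index_name = index_name
        self.red_idx = red_idx
        self.nir_idx = nir_idx

    def _ndvi(self, pixel):
        red, nir = pixel[self.red_idx], pixel[self.nir_idx]
        total = nir + red
        # Guard against division by zero on empty pixels.
        return [(nir - red) / total if total else 0.0]

    def execute(self, patch):
        frames = patch[self.bands_name]  # nested lists: [time][y][x][band]
        patch[self.index_name] = [
            [[self._ndvi(pixel) for pixel in row] for row in frame]
            for frame in frames
        ]
        return patch


# One acquisition, 1x2 pixels, bands ordered as (red, nir); made-up values.
patch = {"BANDS": [[[[0.1, 0.5], [0.0, 0.0]]]]}
patch = AddNdvi("BANDS", "NDVI", red_idx=0, nir_idx=1)(patch)
print(patch["NDVI"])
```

Configuration goes into the constructor and the per-patch work into `execute`, so the same task instance can be applied to any number of patches.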
EOTask classes created by users can then be added to the code-base with a simple pull request, adding new tools and functionalities that can benefit the entire community.
Finally, a complete pipeline is built by connecting tasks using EOWorkflow. EOWorkflow allows the definition of a workflow in the form of an acyclic graph, where EOTask instances are vertices of the graph and EOPatch instances flow through the edges connecting the vertices. Once the workflow has been defined, it can be run in parallel on different input EOPatch instances, making it possible to process large amounts of spatio-temporal data automatically. EOWorkflow also provides execution monitoring reports and logs, such as input parameters of EOTask instances, elapsed times, memory usage and raised exceptions, facilitating execution control and versioning of complete ML pipelines.
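The graph-based execution model can be illustrated with the standard library alone. The sketch below is not the EOWorkflow API: `run_workflow`, the two toy tasks, the 50 % cloud-cover threshold and the dictionary-based patches are all hypothetical. It only shows tasks as vertices, data passed along edges in dependency order, per-task elapsed times collected as a simple report, and the same workflow applied to several patches in parallel.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+


def run_workflow(tasks, dependencies, patch):
    """Run every task once, in an order that respects the dependency graph.

    tasks:        name -> callable taking the outputs of its upstream tasks
    dependencies: name -> list of upstream task names (missing means none)
    patch:        input handed to tasks that have no upstream task
    """
    graph = {name: dependencies.get(name, []) for name in tasks}
    results, report = {}, {}
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    for name in TopologicalSorter(graph).static_order():
        inputs = [results[up] for up in graph[name]] or [patch]
        start = time.perf_counter()
        results[name] = tasks[name](*inputs)
        report[name] = time.perf_counter() - start  # elapsed seconds
    return results, report


def keep_clear_frames(patch):
    # Drop acquisitions whose cloud cover exceeds an arbitrary 50 % threshold.
    return {**patch, "cloud_cover": [c for c in patch["cloud_cover"] if c <= 0.5]}


def count_frames(patch):
    return {**patch, "n_frames": len(patch["cloud_cover"])}


tasks = {"count": count_frames, "filter": keep_clear_frames}
dependencies = {"count": ["filter"]}  # edge: filter -> count

# Made-up patches, each holding per-acquisition cloud cover fractions.
patches = [{"cloud_cover": [0.1, 0.9, 0.3]}, {"cloud_cover": [0.8, 0.2]}]


def process(patch):
    results, _report = run_workflow(tasks, dependencies, patch)
    return results["count"]


# The same workflow applied to several patches in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    outputs = list(pool.map(process, patches))

print([out["n_frames"] for out in outputs])
```

Threads are used here only for brevity; CPU-bound processing of real imagery would more likely be spread over processes.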
eo-learn was designed to provide the most common operations for processing spatio-temporal data, so that complete remote sensing applications can be built with it. To showcase the potential of eo-learn in more detail, we will shortly post two blog series: one on land use and land cover classification at a country level using machine learning, and one on the creation of a complete service for automatic global water-level monitoring, both using eo-learn and Copernicus data. Some material to get you started on these use cases can already be found in the examples folder.
Given our well-known interest in working with time-series and creating time-lapses, in this blog we share a simple EOWorkflow to automatically generate time-lapses given a bounding box and a time-range. To generate a time-lapse like the one shown below, the required tasks are
SimpleFilterTask and a custom
And if the time-series is affected by orthorectification issues, as is often the case for Sentinel-2 images acquired prior to 2017, one can add a RegistrationTask to estimate and compensate for the misalignment between time-frames, as shown below. The script used to generate these GIFs can be found here.
A key resource for the success of eo-learn is, of course, the community, both of remote sensing and machine learning experts. We therefore invite anyone with an interest in developing large-scale remote sensing applications using spatio-temporal satellite imagery to try eo-learn out, give us feedback, and possibly contribute to it. We welcome code improvements, new EOTask classes, and new workflow examples. Users have already contributed some tasks, as is the case for the Haralick features created by developers at Magellium.
We are constantly adding new functionality and improving the stability and efficiency of tasks and workflows, so some things are likely to change as the library grows. However, we will try to keep breaking changes to a minimum in future releases. The first beta release on PyPI is planned in a couple of weeks.
We will be showcasing eo-learn at the International Conference on Knowledge Discovery and Data Mining in London on 19th-23rd August, so please stop by if you are planning to attend. Stay tuned for our series on land use and land cover classification and on how to set up a complete service for global monitoring of water levels in reservoirs and water bodies.
eo-learn is a by-product of the Perceptive Sentinel European project. The project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Grant Agreement 776115.