Scale up your eo-learn workflow using the Batch Processing API

Maxim Lamare
Sentinel Hub Blog
Sep 17, 2020 · 7 min read

How to generate harmonized ML features over huge areas without going broke.

As Grega Milcinski noted recently in Sentinel Hub’s Roadmap, eo-learn has become widely adopted within the Earth Observation and Machine Learning communities. The library has been used for exercises both small and large, but for those working on very large use cases, the performance and cost of the synchronous data access powered by the Process API can pose issues.

2018 Corine Land Cover inventory over Europe. You could use this product as reference data in your workflow or generate it yourself, but it would require a LOT of input data and processing power!

To make the process simpler, faster and more efficient, we have developed a workflow that replaces the data import, resampling and interpolation steps in eo-learn with Batch Processing: an automatically managed workflow that enables you to request data for large areas. By offloading these tasks to Batch Processing, you will get results faster and reduce costs: on the service side, Batch Processing consumes three times fewer processing units, and on the computing side, the majority of the steps are performed by our services.

Sentinel-2 true-colour image of Slovenia: fetch the data stored in our bucket.

You may recall that a year and a half ago, Matic Lubej wrote a trilogy, read more than 50,000 times, on how to build a land-use/land-cover classification workflow based on machine learning fed with Sentinel-2 data using eo-learn. The last article even came with a freebie: 200 GB of EOPatches at 10 m resolution covering the whole of Slovenia!

In this piece, we will build upon the example workflow presented in Matic’s posts, showing you how to replace the data preparation steps with Batch Processing. To keep things simple, our example is based on the existing land-use/land-cover example for Slovenia that can be found in the eo-learn documentation. So that you can follow along and execute the different steps yourself, we have prepared an example Jupyter Notebook, which you can find on GitHub or in the EuroDataCube.

Process overview

Let’s recapitulate the steps performed in the eo-learn land-use/land-cover example. First, an AOI covering Slovenia was defined and split into ~300 smaller patches. Second, for each patch, Sentinel-2 bands were downloaded for each acquisition during 2017 using sentinelhub-py, and derived indices (NDVI, NDWI and NDBI) were calculated. Reference data, to be used as training data for the machine learning model, was then added from an external source. Finally, the values were interpolated over a regular time grid, in order to smooth out gaps caused by varying satellite acquisition dates and cloud cover. From there, the data was fed to the machine learning algorithm to predict land use/land cover.
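The first of those steps, splitting the AOI into patches, can be pictured with a small sketch. The real example uses sentinelhub-py’s splitter utilities; this pure-Python version, with illustrative coordinates for Slovenia, just shows the idea of dividing a bounding box into a regular grid:

```python
def split_bbox(min_x, min_y, max_x, max_y, n_cols, n_rows):
    """Split a bounding box into an n_cols x n_rows grid of sub-boxes."""
    dx = (max_x - min_x) / n_cols
    dy = (max_y - min_y) / n_rows
    patches = []
    for row in range(n_rows):
        for col in range(n_cols):
            patches.append((
                min_x + col * dx,        # patch min x
                min_y + row * dy,        # patch min y
                min_x + (col + 1) * dx,  # patch max x
                min_y + (row + 1) * dy,  # patch max y
            ))
    return patches

# Rough WGS84 bounding box of Slovenia (illustrative values only)
patches = split_bbox(13.38, 45.42, 16.61, 46.88, 25, 12)
print(len(patches))  # 300 patches, close to the ~300 used in the example
```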

Using the Batch Processing API allows you to move most of the tasks described above from your computer to the cloud. Following the same workflow, an AOI is first defined but not split into patches, as the Batch Processing service takes care of tiling automatically. Querying the satellite images, resampling them over a regular time grid, and calculating the indices are all done by Sentinel Hub, and the results are saved to an Amazon S3 bucket as Cloud-Optimised GeoTIFFs (COGs). A simple Python script then converts them to NumPy arrays and voilà: the EOPatches are available, either in the bucket or directly on your computer. At this point, you can add the reference data and continue with the eo-learn workflow as you normally would.
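Among the tasks moved to the service are the index calculations. In NumPy terms, NDVI, for instance, is a simple band ratio (the same formula applies on the service side; the epsilon guard below is our own addition to avoid division by zero):

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), with a tiny epsilon to
    avoid division by zero over no-data pixels."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + 1e-10)

# A healthy-vegetation pixel: high NIR reflectance, low red reflectance
print(ndvi([0.6], [0.2]))  # ≈ [0.5]
```

NDWI and NDBI follow the same normalised-difference pattern with different band pairs.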

Processing workflow difference between the “classic” eo-learn approach (top) and the use of the Batch Processing API in the eo-learn workflow (bottom). Processes located above the dashed line for each workflow are performed on Sentinel Hub servers, whereas processes drawn below the line are performed on your computer.

Batch Processing steps

What is Batch Processing exactly?

To get started, you can find an overview of the Batch Processing API in our Medium article published earlier this year. In summary, the Batch Processing API is an asynchronous REST service designed for querying data over large areas, delivering results directly to an Amazon S3 bucket. The API supports a number of user actions, allowing you to CREATE, ANALYSE, START or CANCEL the processing. For more information on setting up the bucket, the different commands, or to see a request example, refer to the documentation page.

Workflow overview of the Batch Processing commands, and the different statuses that can be triggered.

How do I replace eo-learn steps with a Batch request?

To run a Batch Request, you will first need an Evalscript and its accompanying payload. Instead of processing data on your computer with eo-learn, you describe in the Evalscript the steps to be performed:

  • Fetch all valid satellite bands for a given time range. To determine the validity of a pixel, we use the dataMask and the s2cloudless cloud mask.
  • Resample (linearly interpolate) the valid pixels to a uniform time step defined by the user.
  • Calculate indices (e.g. NDVI, NDWI and NDBI).
Part of the Evalscript showing the main steps performed with the Batch Processing API.
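The resampling step can be pictured in NumPy terms: for each pixel, keep only the observations flagged valid by the dataMask and s2cloudless, then linearly interpolate them onto a uniform time grid. This is a sketch of the idea, not the actual Evalscript (which runs as JavaScript inside Sentinel Hub):

```python
import numpy as np

def resample_pixel(days, values, valid, new_days):
    """Linearly interpolate one pixel's valid observations onto new_days.

    days:     acquisition dates as day numbers
    values:   band or index values at those dates
    valid:    boolean mask (dataMask and not-cloudy per s2cloudless)
    new_days: uniform time grid to resample onto
    """
    days = np.asarray(days, dtype=float)
    values = np.asarray(values, dtype=float)
    valid = np.asarray(valid, dtype=bool)
    # np.interp clamps to the first/last valid sample outside their range
    return np.interp(new_days, days[valid], values[valid])

# Irregular acquisitions with one cloudy (invalid) observation
days  = [0, 5, 12, 20]
ndvi  = [0.2, 0.9, 0.4, 0.6]      # 0.9 is a cloud artefact
valid = [True, False, True, True]
grid  = np.arange(0, 21, 10)      # uniform 10-day grid: days 0, 10, 20
print(resample_pixel(days, ndvi, valid, grid))
```

The cloudy sample is simply dropped before interpolation, so it never distorts the uniform time series.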

The payload defines the input (AOI, time range…) and output (returned format and bands) parameters to be passed to Sentinel Hub (see the API Reference page).

In our payload, we specify:

  • the bounds of our entire AOI and its coordinate reference system,
  • the input dataset,
  • the time-range over which the data will be queried,
  • the output bands and their format,
  • the Evalscript previously defined,
  • the tiling grid ID: you can choose from several available tiling grids; the Batch service will automatically divide the AOI into tiles and process each tile separately,
  • the spatial resolution at which the data will be processed (make sure that the resolution is supported by the tiling grid),
  • the Amazon S3 bucket name, in which the results will be stored,
  • and finally a description of your choice, for later reference.
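Put together, the payload is just a nested JSON document. The sketch below mirrors the field names in the Batch API reference, but treat the exact schema, and the placeholder geometry, bucket name and Evalscript, as assumptions to verify against the current documentation:

```python
evalscript = "//VERSION=3 ..."  # placeholder for the Evalscript described above

aoi_geometry = {  # placeholder AOI: replace with your own polygon
    "type": "Polygon",
    "coordinates": [[[13.4, 45.4], [16.6, 45.4], [16.6, 46.9],
                     [13.4, 46.9], [13.4, 45.4]]],
}

payload = {
    "processRequest": {
        "input": {
            "bounds": {
                "geometry": aoi_geometry,
                "properties": {"crs": "http://www.opengis.net/def/crs/EPSG/0/4326"},
            },
            "data": [{
                "type": "sentinel-2-l1c",  # input dataset
                "dataFilter": {"timeRange": {
                    "from": "2017-01-01T00:00:00Z",
                    "to": "2017-12-31T23:59:59Z",
                }},
            }],
        },
        "output": {
            "responses": [  # one GeoTIFF per requested output
                {"identifier": "bands", "format": {"type": "image/tiff"}},
                {"identifier": "indices", "format": {"type": "image/tiff"}},
            ],
        },
        "evalscript": evalscript,
    },
    "tilingGrid": {"id": 1, "resolution": 10},  # grid must support 10 m
    "bucketName": "my-batch-results-bucket",    # placeholder bucket name
    "description": "LULC Slovenia 2017 features",
}
```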

Once your Evalscript and corresponding payload are defined, you can run the Batch Processing commands in your Python script. After you CREATE the request, you can check that your Evalscript contains no errors and estimate the cost (in processing units) with the ANALYSE command. If all looks good, you can START the processing, grab a coffee and wait for your Amazon S3 bucket to be filled with data! To keep track of the processing, you can regularly check the progress of your request using our plotting tool, which shows the status of each tile.
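The CREATE → ANALYSE → START sequence can be driven over plain REST. This standard-library sketch shows one way to do it; the endpoint paths follow the Batch API documentation, but treat them, and the token handling, as assumptions to check against the current reference (in practice you also need an OAuth access token from your Sentinel Hub account):

```python
import json
import urllib.request

BATCH_API = "https://services.sentinel-hub.com/api/v1/batch/process"

def batch_url(request_id=None, action=None):
    """Build a Batch API URL, e.g. .../process/<id>/analyse."""
    parts = [BATCH_API]
    if request_id:
        parts.append(request_id)
    if action:
        parts.append(action)
    return "/".join(parts)

def _post(url, token, body=None):
    """POST a JSON body with a bearer token; return the parsed response."""
    data = json.dumps(body).encode() if body is not None else b""
    req = urllib.request.Request(
        url, data=data, method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read() or b"{}")

def create_request(payload, token):
    """CREATE: register the batch request and return its id."""
    return _post(batch_url(), token, payload)["id"]

def analyse_request(request_id, token):
    """ANALYSE: validate the Evalscript and estimate processing units."""
    _post(batch_url(request_id, "analyse"), token)

def start_request(request_id, token):
    """START: launch the processing; results land in your S3 bucket."""
    _post(batch_url(request_id, "start"), token)
```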

To track the progress of the Batch Request, we developed a function to visualise the status of the tiles.
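The progress check boils down to counting tile statuses. A minimal sketch, assuming the tiles endpoint returns a list of tile dicts each carrying a "status" field such as PROCESSED, PENDING or FAILED (field names per the Batch API docs; verify against the current schema):

```python
from collections import Counter

def tile_status_summary(tiles):
    """Count tile statuses and return (counts, fraction of tiles done)."""
    counts = Counter(tile["status"] for tile in tiles)
    total = sum(counts.values())
    done = counts.get("PROCESSED", 0)
    return counts, done / total if total else 0.0

# Example with a mocked API response
tiles = [{"status": "PROCESSED"}, {"status": "PROCESSED"}, {"status": "PENDING"}]
counts, fraction = tile_status_summary(tiles)
print(dict(counts), f"{fraction:.0%} done")  # {'PROCESSED': 2, 'PENDING': 1} 67% done
```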

Converting Batch Processing results to EOPatches

What do I do with all this data in my bucket?

Structure of the saved Batch Processing results in your AWS bucket.

Once the batch process has completed successfully, you will find all the outputs in the Amazon S3 bucket, organised into folders representing tiles (defined in the payload). In this example workflow, we requested the interpolated satellite bands and indices as individual GeoTIFF files, meaning that for each tile we have 6 satellite bands, 3 indices and 1 data mask.

The remaining step of the process is to convert the bands to EOPatches in order to continue the eo-learn workflow. To do so, we prepared a Python class that easily converts the Batch Processing results to EOPatches, storing them either in the same bucket as the results or locally.
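At the core of that conversion is a simple rearrangement: eo-learn stores raster features as (time, height, width, channels) arrays. A hedged sketch of the stacking step, assuming the per-band GeoTIFFs have already been read into NumPy arrays (e.g. with rasterio, one time slice per GeoTIFF band); the band names below are illustrative:

```python
import numpy as np

def stack_bands(band_arrays):
    """Stack per-band arrays into eo-learn's (time, height, width, bands) layout.

    `band_arrays` maps a band name to an array of shape (time, height, width).
    The insertion order of the dict fixes the channel order of the result.
    """
    return np.stack(list(band_arrays.values()), axis=-1)

# Synthetic example: 5 timesteps of a 4x4 tile with three outputs
t, h, w = 5, 4, 4
bands = {name: np.random.rand(t, h, w) for name in ("B04", "B08", "NDVI")}
features = stack_bands(bands)
print(features.shape)  # (5, 4, 4, 3)
```

The resulting array can then be assigned to an EOPatch data feature alongside the matching list of timestamps.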

The structure of the newly acquired EOPatches is identical to the one obtained by following the eo-learn example notebook, except for one difference: the reference data used to train the machine learning model is missing. In our example notebook, we show how to ingest the reference map for Slovenia, creating a raster mask from the vector polygons representing the different land classes. Et voilà! You now have your data ready and organised for predicting land use/land cover in a fraction of the time and at a fraction of the cost of the classic eo-learn approach. From this point onwards, you can construct and train the machine learning model, make the predictions for each patch and visualise the results.

Left: True Color RGB of an EOPatch (interpolated data) converted from a Batch Processing request; Centre: NDVI of the EOPatch; Right: reference Land Use / Land Cover data for the EOPatch, imported after having converted the Batch Processing request to EOPatches.

Of course, land use/land cover prediction is not the only application that can be run using Batch Processing in combination with eo-learn. The freedom of writing your own Evalscript offers the flexibility to tailor the acquisition and pre-processing of the data to fit your needs exactly. The example presented in our Jupyter Notebook is just another step towards making big data handling and analysis in Earth observation easier, and we look forward to seeing large-scale applications of eo-learn flourish!

If you are already a user of eo-learn and decide to use Batch requests to ramp up your processing, don’t hesitate to share your results, ideas or questions with us via our forum or Twitter. If you haven’t used eo-learn yet, we have plenty of resources to get you started: feel free to open an account and test the capabilities of our services!