Mean NDVI over Bonn, Germany, generated using Sentinel-2 imagery from June 2021 (🌐EO Browser)

Sharing your algorithms with Data on Demand on EDC

William Ray
Published in Euro Data Cube
Mar 28, 2022 · 9 min read

Shortening the path from remote sensing research to algorithms running in production

In the previous EDC blog post, we introduced Data on Demand, which allows the EO community to share and sell their expertise and knowledge through the EDC platform, and we showcased how users can access the available algorithms through EDC Browser. In this post we build on that by introducing the data provider side of Data on Demand. We will explain the concept of Bring Your Own Algorithm (BYOA) and walk through the steps required to develop and onboard your own algorithms to the EDC Marketplace, where others will be able to access them.

Why we developed BYOA

We should move to cloud. But how?

Our motivation to develop BYOA was to help bring together algorithm developers and users and to benefit the remote sensing community with a new space to collaborate and share in. There is a huge amount of knowledge already out there, but it is still fragmented and difficult to navigate. By creating a centralised space in the EDC Marketplace, we wanted to create a place where developers and scientists could contribute to the community and where users could purchase and access this expertise.

You can read more about the motivation to develop BYOA in the previous post: Getting Data on Demand with EDC Browser.

How It Works

BYOA is the publisher side of Data on Demand: you develop your algorithms and share them on the marketplace, where users can access them through Insights on demand. The advantage of using EDC as your platform is that, as a publisher, you do not need to develop and maintain your own marketplace, as this is already implemented and operated by the EDC consortium. All you need to take care of is your algorithm and your pricing plan, which means you can focus on what you’re good at. Next, let’s run through the steps needed to onboard your own algorithm to the EDC Marketplace.

Overview of the Bring Your Own Algorithm services showing how the platform acts as a bridge between consumers and providers of EO data.

How to onboard a simple algorithm to EDC

First, clone the example algorithm from the GitHub repository into your EOxHub Workspace. To do this, run git clone https://github.com/eurodatacube/byoa-sample-algorithm.git in your chosen command line interface.

In this repository, there are two Jupyter notebooks:

  • notebook.ipynb, which you will adapt to contain the algorithm that you wish to onboard.
  • estimate_costs.ipynb, which is used to calculate how many credits a user will have to purchase to run your algorithm through the EDC Data on Demand service.

If you wish to use the example in this post, it is also available on GitHub here, so you can examine the full Jupyter notebooks in your own time. However, to execute the cells in the notebook you will need an active EDC subscription, and it is strongly recommended that you clone the repository into your personal EOxHub Workspace.

Writing the script

Firstly, let’s open the notebook.ipynb file and run through what the cells contain. This notebook might appear a little different to the notebooks you are used to, as there are no markdown cells, making it similar to a traditional Python file. You can add markdown cells for your own documentation purposes, but the main purpose of this notebook is to be executed automatically so writing detailed markdown is not required.

The first cell is used to create some default parameters. In the default notebook, two are used: the spatial resolution, which is set in degrees (the unit of measurement in WGS84), and the AOI. These can be customised to suit the algorithm that you are onboarding. For example, if you are detecting ships in the ocean, it makes sense for your default AOI to be on the coast or in the ocean.

You can also add further default parameters, such as a default time range, which makes sense if you are examining seasonal phenomena. To do this, add the start and end dates as a daterange variable.
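As a minimal sketch of such a parameters cell: the variable names aoi, spatial_res and time_range, the WKT polygon around Bonn and the slash-separated date format are all assumptions made for illustration, so check the Execution parameters documentation for the exact format EDC expects.

```python
from datetime import date

# Hypothetical default parameters; names and values are illustrative only.
aoi = "POLYGON((7.05 50.70, 7.15 50.70, 7.15 50.76, 7.05 50.76, 7.05 50.70))"
spatial_res = 0.00018  # degrees (WGS84), roughly 20 m at the equator
time_range = "2021-06-01/2021-06-30"  # assumed "start/end" daterange format


def parse_time_range(value):
    """Split a 'start/end' string into two validated date objects."""
    start_text, end_text = value.split("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if end < start:
        raise ValueError("end date must not be before start date")
    return start, end


start_date, end_date = parse_time_range(time_range)
print(start_date, end_date)
```

Validating the range inside the notebook means a malformed default fails early, before any data is requested.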

You can find out more about the different variable types compatible with EDC in the documentation under the Execution parameters section.

In the third cell, you also need to specify a default output directory for any results you generate when testing the script. You can find the full file path of your output directory by running the pwd command in a terminal. When your algorithm is ready to be pushed to production, your output directory should always be defined as:

output_dir = Path("/home/jovyan/result-data")

This ensures that the results are saved to the local account of the user who executes the algorithm.

Next, in the fifth cell, you set the configuration required to generate the data cube. Here you choose which Sentinel Hub data collection you are using (S2L2A) and define other parameters such as the band_names, bbox, spatial_res and the time_range. This is the point where you can overwrite the parameters set in the first cell. For those unfamiliar with xcube: at this point no image data has been downloaded. With EDC, the cube is generated “on the fly”, meaning the data is only fetched from the cloud when your analysis actually requires it.
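The settings named above can be collected as in the sketch below. It is only an illustration: the wkt_bbox helper, the AOI polygon, the band choice and the dataset_name key are assumptions, and in the notebook these values would be passed to the xcube cube configuration rather than kept in a plain dictionary.

```python
import re

# Hypothetical inputs, mirroring the default parameters of the first cell.
aoi = "POLYGON((7.05 50.70, 7.15 50.70, 7.15 50.76, 7.05 50.76, 7.05 50.70))"
spatial_res = 0.00018
time_range = ["2021-06-01", "2021-06-30"]


def wkt_bbox(wkt):
    """Return (min_lon, min_lat, max_lon, max_lat) of a simple WKT polygon."""
    numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", wkt)]
    lons, lats = numbers[0::2], numbers[1::2]
    return (min(lons), min(lats), max(lons), max(lats))


# Plain dictionary standing in for the cube configuration (assumed key names).
cube_settings = {
    "dataset_name": "S2L2A",       # Sentinel-2 L2A collection
    "band_names": ["B04", "B08"],  # red and near-infrared, as needed for NDVI
    "bbox": wkt_bbox(aoi),
    "spatial_res": spatial_res,
    "time_range": time_range,
}
print(cube_settings["bbox"])
```

Deriving the bounding box from the AOI keeps the user-facing parameter (a polygon) and the cube extent in sync.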

Now that we have configured the data cube, we can implement the algorithm. In this example, we generate a mean NDVI for a given area of interest and time range:

Firstly, the NDVI formula is defined and the result is stored as a new variable in the data cube. Next, to calculate the mean NDVI, ndvi_sum and ndvi_count are computed along the time dimension and divided to give the mean NDVI for each pixel in the AOI over the defined time period.
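The sum-and-count approach can be sketched with plain NumPy on synthetic data. This is not the notebook's code, which works on the xcube dataset; the tiny red and nir arrays are made up for illustration and stand in for the Sentinel-2 red (B04) and near-infrared (B08) bands.

```python
import numpy as np

# Synthetic reflectance stacks shaped (time, y, x); NaN marks missing pixels.
red = np.array([[[0.1, 0.2]], [[0.1, np.nan]], [[0.3, 0.2]]])
nir = np.array([[[0.5, 0.6]], [[0.3, np.nan]], [[0.3, 0.6]]])

# NDVI = (NIR - red) / (NIR + red)
ndvi = (nir - red) / (nir + red)

# Sum the valid values and count them per pixel along the time axis.
ndvi_sum = np.nansum(ndvi, axis=0)
ndvi_count = np.sum(~np.isnan(ndvi), axis=0)

# Pixels without any valid observation stay NaN instead of dividing by zero.
mean_ndvi = np.where(
    ndvi_count > 0, ndvi_sum / np.maximum(ndvi_count, 1), np.nan
)
print(mean_ndvi)
```

Counting only valid observations matters because cloudy or missing scenes would otherwise pull the mean towards zero.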

Note that to save the new variable, we define a ‘long_name’ and the ‘units’ of the variable. This was explained previously in the Exploring Time and Space blog post. More documentation on this can be found here.

Mean NDVI generated using the Sentinel-2 L2A xcube displayed using Matplotlib

For testing purposes, you can use Matplotlib to plot the algorithm result in the notebook. Once you have finished developing your algorithm within the EOxHub Workspace, you can test whether the notebook runs fully automatically by executing the whole notebook in Jupyter Lab and debugging any cells that fail. If every cell runs without issues, the notebook should also work when executed automatically.

The next, more advanced testing step is to run the notebook headlessly with different sets of parameters, to validate that it fits the limits of the selected target environment. For example, you want to test whether your algorithm is scalable: does it still run over larger areas or longer time periods?

What is Headless Execution?
Headless execution is when you execute your code without a Graphical User Interface (Jupyter Lab in our use case). Instead, the code is run through a Command Line Interface and a network connection. Headless execution is used to test automated workflows such as your algorithm.

You can execute your notebook headlessly from your local CLI using a curl request.
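The general shape of such a request can be sketched in Python. Everything specific here is a placeholder: the endpoint URL, the token and the payload layout are hypothetical and must be replaced with the values given in the EDC documentation. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Every value below is a placeholder, not the real EDC API.
ENDPOINT = "https://example.invalid/notebook-executions"  # hypothetical URL
TOKEN = "your-access-token"  # hypothetical credential

# Assumed payload layout: which notebook to run and with which parameters.
payload = {
    "notebook": "notebook.ipynb",
    "parameters": {
        "aoi": "POLYGON((7.05 50.70, 7.15 50.70, 7.15 50.76, 7.05 50.76, 7.05 50.70))",
        "time_range": "2021-06-01/2021-06-30",
        "spatial_res": 0.00018,
    },
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {TOKEN}",
    },
    method="POST",
)

# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
print(request.get_method(), request.full_url)
```

Keeping the parameters in one dictionary makes it easy to loop over several AOIs or time ranges when testing how the algorithm scales.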

If headless execution fails, you can check for possible errors within Jupyter Lab. If you do get stuck, do not hesitate to get in contact with the support team, who will be happy to help.

Once you can run your notebook fully automatically, you are nearly there! The next step is to define the execution parameters your algorithm needs. These are needed to populate the graphical interface of the EDC Browser application.

Execution Parameters

You will also need to provide a parameters definition JSON file so that the algorithm can integrate successfully with EDC Browser. In our example, the algorithm takes three parameters: the AOI, the time range and the spatial resolution (an optional parameter).
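A definition for these three parameters could be sketched as follows. Only the name, description and type fields and the daterange type come from this post; the id, optional and default keys and the wkt and float type names are assumptions, so check the EDC documentation for the actual schema.

```python
import json

# Hypothetical parameter definitions; key names beyond "name",
# "description" and "type" are assumptions for illustration.
parameters = [
    {
        "id": "aoi",
        "name": "Area of interest",
        "description": "Polygon over which the mean NDVI is computed.",
        "type": "wkt",
        "optional": False,
    },
    {
        "id": "time_range",
        "name": "Time range",
        "description": "Start and end date of the analysis period.",
        "type": "daterange",
        "optional": False,
    },
    {
        "id": "spatial_res",
        "name": "Spatial resolution",
        "description": "Pixel size in degrees (WGS84).",
        "type": "float",
        "optional": True,
        "default": 0.00018,
    },
]

# Check that every definition carries the common fields before serialising.
for definition in parameters:
    missing = {"name", "description", "type"} - definition.keys()
    if missing:
        raise ValueError(f"{definition.get('id')}: missing {sorted(missing)}")

parameters_json = json.dumps(parameters, indent=2)
print(parameters_json)
```

Generating the file from Python rather than editing it by hand makes it harder for the definitions and the notebook's parameter names to drift apart.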

Each parameter is defined slightly differently, but they all share some common features such as the description, name and type. In addition, you can make each parameter compulsory or optional. You can find more information about the parameters supported by BYOA in the EDC documentation. You will also need to produce a pricing model, which is defined through the estimate_costs.ipynb file in your repository.

Estimating Costs Notebook

When you onboard your algorithm, you must also create a Jupyter Notebook that estimates the cost of an analysis for a user who wishes to purchase and run your algorithm. You should base this upon known input variables of your algorithm, such as the size of the AOI and the resolution at which the analysis is run.

In our example, we have created a formula that uses the time period and the area of interest to calculate the estimated cost for the user: we simply multiply the area by the number of days in the analysis period.
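A minimal sketch of this area-times-days formula is shown below. The function name, the use of square kilometres, the inclusive day count and the credits_per_km2_day rate are all assumptions for illustration, not the contents of estimate_costs.ipynb.

```python
from datetime import date


def estimate_cost(area_km2, start, end, credits_per_km2_day=1.0):
    """Estimate credits as area x number of days x a hypothetical rate."""
    # Count both the start and the end date (assumed convention).
    days = (date.fromisoformat(end) - date.fromisoformat(start)).days + 1
    if area_km2 <= 0 or days <= 0:
        raise ValueError("area and time period must be positive")
    return area_km2 * days * credits_per_km2_day


# 25 km² over the 30 days of June 2021 at the default rate.
cost = estimate_cost(25.0, "2021-06-01", "2021-06-30")
print(cost)
```

Because the estimate depends only on inputs the user chooses up front, it can be shown before the algorithm is run.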

You can make your pricing model as complex as you want, but it should be easily explainable to your users so that the costs of your analysis are easy to understand. For example, you can add complexities to your cost estimation, like providing different price plans depending on the spatial and temporal scale of your user’s request. In addition, it should reflect your processing costs too, so you don’t lose money whenever a user runs an analysis.

Next Steps

Once the above steps have been taken, you can get in touch with the EOx support team to say that your algorithm is ready to be added to the Marketplace. You will get access to your private Git repository with the necessary setup scaffolded. You will need to stage, commit and push your notebook.ipynb and estimate_costs.ipynb files to this private repository hosted by EOx. Once done, you will need to provide the Git commit ID and the parameters configuration file, and sign an algorithm provider agreement. This part of the process will differ from provider to provider. All of this is documented in much more detail here, but if you have any further questions, don’t hesitate to get in contact with the team.

Now that you know how to bring your own algorithm to EDC, we’d love to see you do it yourself! With the help of the community, we want to make the EDC Marketplace the prime destination for sharing and purchasing EO-derived products. We look forward to working with you in the future.

Further Information

Visit Euro Data Cube to find out about our subscription plans and take advantage of the free trial options. Additionally, the Network of Resources (NoR) initiated by ESA provides sponsored access to some of our services to qualifying researchers and entrepreneurs. You can follow these step-by-step instructions to apply for sponsorship.

To find out more about how to onboard your own algorithm, visit here. To learn how to use onboarded algorithms, check here. For more inspiration on the capabilities for innovation in EO, the notebooks section of the marketplace has a diverse collection of practical use cases and tutorials. If you run into trouble using any of our services and need support, or have great ideas to share, you are welcome to contact us or post questions on the Euro Data Cube forum. Follow us on Twitter and LinkedIn to stay informed on our latest developments.

Special Thanks to Bernhard Mallinger from EOx for his technical support in putting this post together. 👏
