Expanding the Universe of Cloud API for EO Data

Grega Milcinski
Published in Sentinel Hub Blog
Aug 25, 2020 · 10 min read

Sentinel Hub’s Roadmap for 2020/2021

We like to write down our annual roadmap so that users know what to expect, and to gather our thoughts internally. It also forces us to prioritise our ideas. That being said, such a write-up usually takes quite some time, and things often change in the interim…
Before talking about the future, let’s see what we have done in the past and how we see the overall situation in the field at the moment.

Erebus Ice Tongue, February 2019, Sentinel-2, contains modified Copernicus data

The past
Looking back at the plans we described in a post one year ago, I am happy to report that we have accomplished more or less all of our goals, at least to a large extent.

  • Sentinel Hub process API is stable, supported by our Python SDK and used by many users, who appreciate that the POST option lets them work with complex geometries. Our monthly statistics now regularly show more data being processed with the process API than with the OGC interfaces (a minimal request sketch follows after this list).
  • Custom Scripts have evolved to allow more controlled processing of data: users can choose the inputs they work with (e.g. "digital numbers" or "reflectance" for Sentinel-2), they can better handle overlaps of scenes and, most importantly, they can precisely define what happens to the output. We have kept the "auto-magical" defaults to allow for faster uptake, but experts appreciate the detailed level of control. Based on Pierre Markuse's idea we started a Custom Scripts contest, and it proved to be a big success, bringing in dozens of new scripts, which are now available open-source for anyone to use. It went so well that we decided to repeat it (a new one started just recently, so make sure you participate) and we will probably continue doing so. More information about the Contest is available here.
  • Bring your own data is a huge success. When designing this option, we assumed it would mostly serve to on-board data not yet supported by our service, e.g. commercial imagery, drone imagery, etc. This is indeed happening, with several precision-farming developers adding new data for their users. What surprised us, though, is seeing many machine learning experts ingesting their derivatives. Thinking about it, it makes perfect sense: it is not just about quick visualisation of the results, as one can also use the various analytical capabilities of Sentinel Hub, e.g. custom script-based comparison of results with ground-truth data, change detection, etc. An important consequence of BYOC was the introduction of commercial data integration, starting with PlanetScope and Airbus Pleiades and SPOT. We have made deals with these companies that allow users to access the higher-resolution data in a much more convenient way: not just technically convenient, but also avoiding long discussions with resellers to get a quote, large purchase commitments, etc. One can start using these data with as little as 100 EUR.
  • Machine learning has proved to be a massive driver of Sentinel Hub use. We have launched the Batch Processor to support cost-efficient ML at global scale. It is fun to observe how a single user can process as much data with Batch in a month as all other Sentinel Hub users combined. And it is not just about new features: we have also put a lot of effort into the further development of our open-source Python library eo-learn. A year and a half after its launch, the eo-learn package has become a popular tool within the EO and ML communities, with more than 14,000 total downloads and hundreds of contributions. Perhaps one of the most important updates in the last year is the pre-processing of cloud masks, based on our well-accepted s2cloudless algorithm, so that these can be accessed directly from Custom scripts, just like other bands and meta-data variables (a short s2cloudless sketch follows after this list). Having these available makes it possible to perform AOI-based cloud filtering and to create ML features directly in the Batch process. Among other eo-learn improvements, reading/writing to disk storage and support for new data formats and sources (e.g. S1 imagery, tiled high-resolution imagery, custom TIFFs/COGs, OSM extracts, the Sentinel Hub process API) have been added, as well as new tasks for processing S2 images (e.g. snow masking, multi-temporal cloud masking, time-series compositing and advanced interpolation). General improvements to the parallel execution and logging of workflows have facilitated the automatic execution of EO applications at global scale (e.g. water-level monitoring, urban settlement classification). New example applications (e.g. poverty-indicator estimation, land cover semantic segmentation, tree-cover prediction, S2 super-resolution) have been contributed, showcasing how eo-learn can be seamlessly coupled with the most popular ML frameworks (e.g. `scikit-learn`, `Keras`, `PyTorch`, `TensorFlow`, `fastai`). If you are interested in eo-learn, the workshop material is the ideal place to get started.
  • The most recent feature added to Sentinel Hub is data fusion (read about it here), which allows one to combine data from different sensors, e.g. SAR and MSI. We are still working to evolve this further, specifically to support multi-deployment data fusion, essentially making it possible to combine any data sources whatsoever.
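To give a flavour of the process API mentioned above, here is a minimal sketch of a request through the sentinelhub Python package. It computes NDVI with a simple V3 evalscript; the credentials and the area of interest are placeholders, and complex geometries can be passed via the `geometry` parameter instead of the bounding box used here.

```python
# A minimal process API request via the sentinelhub Python package
# (pip install sentinelhub). Credentials and AOI are placeholders.
from sentinelhub import (
    SHConfig, SentinelHubRequest, DataCollection, MimeType, BBox, CRS
)

config = SHConfig()
config.sh_client_id = "<your-client-id>"          # OAuth client credentials
config.sh_client_secret = "<your-client-secret>"

# A simple V3 Custom script computing NDVI from Sentinel-2 L2A bands
evalscript = """
//VERSION=3
function setup() {
  return {
    input: ["B04", "B08"],
    output: { bands: 1, sampleType: "FLOAT32" }
  };
}
function evaluatePixel(sample) {
  return [(sample.B08 - sample.B04) / (sample.B08 + sample.B04)];
}
"""

request = SentinelHubRequest(
    evalscript=evalscript,
    input_data=[
        SentinelHubRequest.input_data(
            data_collection=DataCollection.SENTINEL2_L2A,
            time_interval=("2020-07-01", "2020-07-31"),
        )
    ],
    responses=[SentinelHubRequest.output_response("default", MimeType.TIFF)],
    bbox=BBox([13.35, 45.85, 13.55, 46.0], crs=CRS.WGS84),  # placeholder AOI
    size=(512, 512),
    config=config,
)

ndvi = request.get_data()[0]  # numpy array with the NDVI values
```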
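And since cloud masks came up above: s2cloudless is open source, so the same classifier that powers our pre-processed masks can also be run locally. A minimal sketch, assuming the input is a numpy array of Sentinel-2 L1C reflectances with the ten bands the detector expects (here filled with random placeholder values):

```python
# Running the s2cloudless cloud detector locally (pip install s2cloudless).
import numpy as np
from s2cloudless import S2PixelCloudDetector

# Placeholder input: (time, height, width, 10) reflectances in [0, 1];
# the ten bands are B01, B02, B04, B05, B08, B8A, B09, B10, B11, B12.
bands = np.random.rand(3, 64, 64, 10)

cloud_detector = S2PixelCloudDetector(
    threshold=0.4,    # probability threshold for the binary mask
    average_over=4,   # smoothing window applied to the probabilities
    dilation_size=2,  # dilation of the resulting mask
)

cloud_probs = cloud_detector.get_cloud_probability_maps(bands)
cloud_masks = cloud_detector.get_cloud_masks(bands)  # 0 = clear, 1 = cloud

# Simple AOI-based filtering: keep timestamps with less than 5% cloud cover
clear_idx = [i for i, mask in enumerate(cloud_masks) if mask.mean() < 0.05]
```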

Another amazing development in 2019 was seeing new users coming in. This is especially gratifying given that marketing is not our strongest point. Our company's past experience focused on large-scale individual projects, so we lack expertise in addressing the masses. Our website, Twitter and this blog are more or less our only channels; we run no promotion campaigns, SEO, AdWords and the like. Well, there is also EO Browser, used by hundreds of thousands of people. Its prime objective was to showcase what can be built on top of our services, but it has proved to be an essential tool for raising awareness of EO data. It is such a joy to watch non-experts come up with interesting new observations all around the world.

Importantly, though, it is not just beginners who are coming in. What makes us especially happy is seeing well-established remote sensing experts, companies and research institutes integrating our services into their workflows. These are power users who have similar capabilities, feature-wise, available internally. Yet they choose Sentinel Hub because they have realised its efficiency, cost-wise and, probably more importantly, effort-wise, allowing them to focus their know-how on building added-value services.

The present
We have to admit that when we launched Sentinel Hub a few years ago, we had no grand vision for the future. We simply thought it was a cool idea and pushed on with it. Then we started working on various new features, taking care to remain focused. And things neatly fell into place.
Sentinel Hub has always been oriented towards empowering other developers to work with EO data more efficiently. We have always believed that added-value generation would happen elsewhere: in third-party web applications, in Python scripts running on users' VMs, and similar. Therefore, the user interface to our services was never that important. However, after building and operating EO Browser, initially an idea by the European Space Agency (ESA), for a while, we have noticed that it brings added value, not just promotion, to our subscribers as well as to the general community. People use it to develop and test new EO algorithms, to find relevant data and to monitor what is happening with our planet.
Another important evolution, which happened by chance, was the integration of Sentinel Hub into "Euro Data Cube", a project initiated by ESA. There, our partners added two very important services: integration with xcube, which enables the xarray-based data analysis so favoured by data scientists (sketched below), and hosted processing in JupyterLab.
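To illustrate what this combination looks like in practice, here is a hedged sketch using the xcube Sentinel Hub plugin (xcube-sh) developed by our partners; the parameter names follow its documented `CubeConfig`, but treat them as indicative, and the AOI and resolution are placeholders:

```python
# Opening Sentinel Hub data as an xarray-based cube via xcube-sh.
from xcube_sh.config import CubeConfig
from xcube_sh.cube import open_cube

cube_config = CubeConfig(
    dataset_name="S2L2A",              # Sentinel-2 L2A
    band_names=["B04", "B08"],
    bbox=(13.35, 45.85, 13.55, 46.0),  # placeholder AOI (WGS84)
    spatial_res=0.00018,               # roughly 20 m, expressed in degrees
    time_range=("2020-06-01", "2020-08-31"),
)

cube = open_cube(cube_config)          # an xarray.Dataset

# The kind of xarray analysis data scientists favour: lazy and concise
ndvi = (cube.B08 - cube.B04) / (cube.B08 + cube.B04)
monthly_mean = ndvi.resample(time="1M").mean()
```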
Suddenly, Sentinel Hub is no longer just an API but part of a facility supporting users in doing just about anything they want. Combining all the elements, it has capabilities similar to Google Earth Engine's. Something we had never dreamed of.

There are a couple of other services competing in the field, the most recent being the Descartes Labs Platform, but also SkyWatch, the EOS Platform and similar. None of these shares any information about usage, so it is difficult to say where we stand in this group, but with hundreds of millions of requests processed every month and a sustainable business model, we feel quite comfortable and confident about our long-term prospects. The main competition is probably not other service providers but rather open-source initiatives from Development Seed, Azavea, Astraea and others, which make it simpler to build such workflows oneself. Still, there is effort needed to set up and maintain these things, as well as the related ICT costs. We therefore believe that the added value our service provides outweighs the cost of operating it. This might not have been the case for large-scale processing, but with the introduction of Batch, which essentially reduces the cost by a factor of three, it is again a decent value proposition.

The future
One lesson we have learned is that once software gets real operational use, the maintenance work never ends. Errors are found, scalability issues arise that cannot be solved by simply adding new virtual machines, and the input data keeps changing. A significant part of our future will therefore be occupied simply by making sure that our services work well. This effort may not be easily visible, but it adds tremendous value for our users: it means they do not need to worry about it and can focus on the steps further down the line, creating even better products on their end. In the past year we have put a lot of effort into automating this, so that we can get our eyes off the infrastructure and back to the code, to address our plans for the coming year:

  • Evolution of our Sentinel-1 processing chain. Many experts we talk with are amazed to see that we are able to generate "analysis ready data" in real time: performing orthorectification, backscatter coefficient calculation, thermal noise removal, etc. (a request sketch follows after this list). One important part has been missing so far: radiometric terrain correction. We thought it was simply too complex, but further investigation showed that it should be possible. So we are working on it and, in parallel, adding more meta-data to the bundle so that we can get as close as possible to a CARD4L-ready product. Sentinel-1 is a powerful dataset due to its consistency through time, unchallenged by clouds in the sky.
  • Related to CARD4L, we are happy to announce that we are working with Geoscience Australia and NASA to power their Open Data Cube with on-demand generation of Sentinel-1 based cubes. We believe this will demonstrate the usability of “on-demand” data cubes, bridging the gap between “on-the-fly data cubes” (such as Sentinel Hub) and “pre-processed data cubes” (such as Open Data Cube).
  • While ingesting more than 100,000 tiles using the "Bring your own data" feature, our users have identified some limitations, such as the lack of support for multi-band files and for a more detailed meta-data structure; these things are to be added.
  • One of our oldest APIs is the "Statistical API", or "FIS", developed to allow easy and efficient statistical analysis of time-series, e.g. to show an NDVI chart over the year (a minimal example follows after this list). The service is becoming more and more widely used due to the object-based approach to machine learning. With increased use, deficiencies have shown up, both in terms of features and of performance and costs. We are planning to rewrite this API along the lines of the process API, giving users more control over the processing of statistical variables. And since our users often need to run this API over millions of polygons, we have realised that we need a "Batch processing" option for the Statistical API as well.
  • With the move from OGC standard interfaces (WMS, WCS, WMTS) to a proprietary API, first-time users need some more help to get a hold of its features. We have recognised that, so we are working on the Request Builder, a simple web-based user interface for generating requests more easily than with Postman. The first beta version is already available to test here.
  • We are also always looking for new data to add to our platform. Copernicus DEM is the first to appear: the 90-meter version is available to everyone (it improves on SRTM thanks to its global coverage), while the 30-meter and 10-meter versions are available to those with ESA's permission to use them. Copernicus Services variables are high on the agenda as well. Last but not least, we truly hope that USGS will provide access to Collection 2 Landsat data on AWS, so that we can integrate it too.
Salt Lake Mackay, December 2018, Sentinel-2, contains modified Copernicus data
  • We continue to expand our deployments. Having started in AWS eu-central-1 and us-west-2, Sentinel Hub is now also running on CreoDIAS, Mundi Web Services and ONDA, and we have recently added WEkEO and CODE-DE. These new deployments make it possible for us to on-board even more data, e.g. from ECMWF and DLR, and for our users to access the API locally.
  • While working with our partners on integrating Sentinel Hub and xcube, we have recognised the benefits for machine learning, so we are planning a deeper integration of eo-learn and xcube.
  • In addition to the core features, our other teams are working on practical applications. In the last year we demonstrated how one can set up a land cover monitoring system in Azerbaijan, essentially making it possible for the authorities to monitor what is happening in their country almost on a weekly basis. We have also completed a land cover classification in Turkey. These experiences will guide us in developing agriculture-related machine learning models in further detail, mostly to support the Common Agricultural Policy in Europe, the so-called "Area Monitoring".
  • There is also some clean-up going on. With our Custom script (EVALSCRIPT) definition reaching mature status in version 3, we are dropping support for older versions on November 1st, so that we can move forward faster.
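As promised in the Sentinel-1 item above, here is a hedged sketch of what such a request looks like at the raw process API level. The option names follow the public API documentation as we read it; the AOI is a placeholder, the token is assumed to come from the standard OAuth client-credentials flow, and the radiometric terrain correction discussed above is not yet among the options:

```python
# A process API payload for Sentinel-1 GRD with on-the-fly ARD options.
import requests

oauth_token = "<access-token>"  # assumed: obtained via the OAuth flow

# A simple V3 script returning the VV polarisation
evalscript_vv = """
//VERSION=3
function setup() {
  return { input: ["VV"], output: { bands: 1, sampleType: "FLOAT32" } };
}
function evaluatePixel(sample) { return [sample.VV]; }
"""

payload = {
    "input": {
        "bounds": {"bbox": [13.35, 45.85, 13.55, 46.0]},  # placeholder AOI
        "data": [{
            "type": "sentinel-1-grd",
            "dataFilter": {
                "timeRange": {
                    "from": "2020-07-01T00:00:00Z",
                    "to": "2020-07-31T23:59:59Z",
                }
            },
            "processing": {
                "orthorectify": True,             # orthorectification
                "backCoeff": "SIGMA0_ELLIPSOID",  # backscatter coefficient
            },
        }],
    },
    "output": {"width": 512, "height": 512},
    "evalscript": evalscript_vv,
}

response = requests.post(
    "https://services.sentinel-hub.com/api/v1/process",
    headers={"Authorization": f"Bearer {oauth_token}"},
    json=payload,
)
```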
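And to make the Statistical API item above concrete, here is a minimal sketch of an NDVI time-series query via the current FIS interface of the sentinelhub Python package. It assumes an "NDVI" layer configured in your Sentinel Hub dashboard, which is a placeholder name, and the polygon is likewise illustrative:

```python
# Per-date statistics (mean, min, max, standard deviation) over a polygon,
# ready for charting, using the current FIS endpoint.
from sentinelhub import SHConfig, FisRequest, Geometry, CRS, DataCollection

config = SHConfig()  # credentials / instance id assumed to be configured

field = Geometry(
    "POLYGON ((13.40 45.90, 13.42 45.90, 13.42 45.92, "
    "13.40 45.92, 13.40 45.90))",
    crs=CRS.WGS84,
)

fis_request = FisRequest(
    data_collection=DataCollection.SENTINEL2_L2A,
    layer="NDVI",  # placeholder: a layer from your dashboard configuration
    geometry_list=[field],
    time=("2020-01-01", "2020-08-01"),
    resolution="10m",
    config=config,
)

stats = fis_request.get_data()  # list of per-geometry, per-date statistics
```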

It’s gonna be an interesting year, working remotely and only meeting all of you virtually…
