Becoming a Data Driver in the Cloud for Earth Observation
Sentinel Hub’s Roadmap for 2019
Sentinel Hub has been operating in production for more than two years, processing millions of requests every day. Many people see it as a tool to produce nice images, powering apps like EO Browser and filling a Twitter feed. In reality, though, it is much more. Sentinel Hub is a data driver for satellite imagery stored in the cloud: it abstracts and generalizes the complexity of various formats, meta-data, processing chains, etc., and makes the data available faster. In the coming weeks we will be introducing a new API to make this process even more straightforward. Many new things will follow…
1. Sentinel Hub API v.2.0
When designing Sentinel Hub we always knew that there was huge value in satellite data, much more than just imagery. Searching for the most suitable interface to present to users, we chose the Open Geospatial Consortium (OGC) standards, specifically WMS, WCS and WMTS. The rationale was simple: these standards are well supported in many geospatial applications, from desktop tools such as QGIS and ESRI products to web toolkits like Leaflet, OpenLayers and Google Maps. The power of getting the whole Sentinel archive into a web app with four lines of code was magnificent. When using OGC interfaces in production, however, we encountered several difficulties: the infamous lat/lon coordinate-order differences between versions of WMS/WCS, the character limit on GET requests, and ineffective support for multi-temporal and multi-spectral datasets. Most importantly, though, WMS and WMTS were designed to serve maps, not complex raster data, and even WCS does not support all the required operations. We might embrace future OGC steps such as the WCS 2.0 EO Extension, WCPS and similar, but these are barely used in practice and their support in existing tools is (still) significantly limited, thus defeating the main advantage of using a standard.
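To illustrate why the OGC route was attractive: a WMS GetMap request is nothing more than a URL with well-known parameters. The sketch below builds one in Python. The instance ID is a placeholder and the layer name is an assumed example (layers are configured per Sentinel Hub instance); note the bbox convention in the comment, the source of the coordinate-order confusion mentioned above.

```python
from urllib.parse import urlencode

def wms_getmap_url(base_url, layer, bbox, width, height, time=None):
    """Build an OGC WMS 1.1.1 GetMap URL.

    In WMS 1.1.1 an EPSG:4326 bbox is (min_lon, min_lat, max_lon, max_lat);
    WMS 1.3.0 flips this to lat/lon order, the classic interoperability trap.
    """
    params = {
        "SERVICE": "WMS",
        "REQUEST": "GetMap",
        "VERSION": "1.1.1",
        "LAYERS": layer,
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(c) for c in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    if time:  # multi-temporal data is selected via a TIME parameter
        params["TIME"] = time
    return base_url + "?" + urlencode(params)

# "<INSTANCE_ID>" is a placeholder for a real Sentinel Hub configuration ID,
# and "TRUE-COLOR" is a hypothetical layer name.
url = wms_getmap_url(
    "https://services.sentinel-hub.com/ogc/wms/<INSTANCE_ID>",
    layer="TRUE-COLOR",
    bbox=(13.3, 45.4, 13.8, 45.9),
    width=512, height=512,
    time="2018-10-01/2018-10-31",
)
```

Point a WMS client (QGIS, Leaflet, OpenLayers) at the same endpoint and it builds such URLs for you, which is exactly the "four lines of code" convenience described above.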
Sentinel Hub API v.2.0 will, among other things, support:
- POST requests, removing the URL character limit and making it possible to clip the result to an arbitrarily detailed mask, e.g. exact field boundaries.
- Proper authentication, so that our services can be safely integrated in public apps without fear of abuse.
- Multi-part responses, e.g. GeoTiff (for raster data) plus JSON (for meta-data or script debugging information), or separate GeoTiffs per band.
- Perhaps most importantly, we got rid of the normalization step. We put extreme care into bringing the original pixel value to the output without any unwanted change. For example, if one wants the original reflectance values of a Sentinel-2 NIR band, she can ask for 10-meter resolution (to avoid resampling), the appropriate UTM coordinate system (to avoid reprojection) and a 16-bit unsigned-integer tiff, and the data will come through unmodified.
- In line with the above, NODATA values will be handled in a more transparent manner.
- We are also adding a new option to control exactly how data is assembled in areas where several scenes overlap, e.g. on the border of scenes or even orbits. The current version tries to abstract this away to make it simpler for the user, merging scenes taken on the same date. For scientifically accurate processing, however, more detail is needed, so we will add an option to run through each and every tile at the location and make the necessary calculations.
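Since the new API was not yet published at the time of writing, the sketch below is purely hypothetical: it shows how the features listed above could combine into a single POST body. All field names are illustrative assumptions, not the final interface.

```python
import json

# Hypothetical request body; field names are illustrative, not the final API.
request_body = {
    "input": {
        "data": [{
            "type": "S2L1C",
            "timeRange": {"from": "2018-10-01", "to": "2018-10-31"},
        }],
        # POST allows an arbitrarily detailed clipping geometry,
        # with no URL length limit to worry about.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [13.3, 45.4], [13.8, 45.4], [13.8, 45.9],
                [13.3, 45.9], [13.3, 45.4],
            ]],
        },
    },
    "output": {
        "crs": "EPSG:32633",   # native UTM zone, so no reprojection
        "resolution": 10,      # native 10 m, so no resampling
        "noData": 0,           # explicit NODATA handling
        # Multi-part response: raster and meta-data side by side.
        "responses": [
            {"identifier": "default", "format": "image/tiff; depth=16"},
            {"identifier": "userdata", "format": "application/json"},
        ],
    },
}

payload = json.dumps(request_body)
```

The point of the sketch is the shape, not the names: one authenticated POST carries the geometry, the data selection and the exact output specification that the bullet list above promises.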
The API is planned for a pilot release in Q1 2019. “Pilot” means the interface may still change in the coming months if we figure out that something would work much better; pilot users will always get at least two weeks’ advance notice to adapt. If you are interested in using the pilot version, drop us an e-mail.
2. Custom script v.2
We went beyond typical band combinations and simple indices from the very beginning, and our users love the fact that they can develop their own models, some doing really magnificent stuff, e.g. identifying burned areas or the growth cycle of crops.
To push the limits even further, we have had to change the Custom script patterns. Please check this page for details. Fear not: we took care of backward compatibility, so old scripts will keep working in the future as well.
A demonstration of the new capabilities is Leaf Area Index (LAI), which is based on a neural network. We wonder what interesting ideas this will trigger among our users.
This is actually already in production, so anyone can use it.
Some additional examples:
- Fraction of Absorbed Photosynthetically Active Radiation (fAPAR)
- Leaf Chlorophyll Content
- Canopy Chlorophyll Content
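For readers new to custom scripts, here is the idea in miniature: a script receives band values for each pixel and returns the output values. The classic one-line NDVI script below is in the original simple style (the platform evaluates these scripts in JavaScript; the new pattern adds more structure, as described on the page linked above). The Python function is just a mirror of the same per-pixel arithmetic for illustration.

```python
# A minimal NDVI custom script in the classic one-line style.
# B04 is the red band, B08 the near-infrared band of Sentinel-2.
NDVI_SCRIPT = "return [(B08 - B04) / (B08 + B04)];"

def ndvi(b04, b08):
    """Pure-Python mirror of the per-pixel logic above."""
    denom = b08 + b04
    # Guard against division by zero over NODATA pixels.
    return (b08 - b04) / denom if denom else 0.0
```

Anything expressible as such per-pixel (and, with the new patterns, multi-temporal) math over bands can become a layer, which is how models like the LAI example above are deployed.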
3. Support for “bring-your-own” data
One of the main gems of Sentinel Hub is the fact that it hides the complexity of various data formats, meta-data and mission-specific processing steps, which differ significantly when one goes from e.g. Sentinel-2 to Sentinel-1, or even from Landsat-8 to Sentinel-2. But there is another important advantage: the simplicity of accessing data through the same interface. In today’s cloud environment, most of the original satellite data is stored in object storage, accessible over a standard S3 interface. Just getting access to the data is not sufficient, though. One also has to allocate compute resources for processing, storage for intermediate steps, etc. Sentinel Hub hides this as well: it provides a uniform interface, specialized for EO data, which one can use in much the same way as S3, but instead of a raw file one gets pixel values in the form that fits best. Such access is in most cases also less costly, as we, the operators of Sentinel Hub, are able to optimize resources better, both through economy of scale and through a lot of engineering effort. One should not forget that in addition to ICT costs there is also the man-power required to maintain such solutions. Another dimension of cost optimization…
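A small taste of the bookkeeping a data driver hides when turning "raw file on S3" into "pixel value at (x, y)": every raster carries a geotransform mapping world coordinates to pixel indices. The sketch below uses the common GDAL-style six-element convention; a real driver additionally handles formats, compression, overviews and CRS transforms.

```python
def world_to_pixel(geotransform, x, y):
    """Map a world coordinate (x, y) to a (row, col) pixel index.

    `geotransform` follows the common GDAL convention for a north-up image:
    (origin_x, pixel_width, 0, origin_y, 0, -pixel_height).
    """
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    col = int((x - origin_x) / pixel_w)
    row = int((y - origin_y) / pixel_h)  # pixel_h is negative for north-up
    return row, col

# Example: a 10 m Sentinel-2-like grid with its origin at
# (600000, 5100000) in a UTM coordinate system.
gt = (600000.0, 10.0, 0.0, 5100000.0, 0.0, -10.0)
```

Multiply this by every format, mission and meta-data quirk and the value of a uniform interface becomes obvious.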
For these reasons we are getting more and more requests to support various other data sets, e.g. other satellite missions, aerial imagery or even processing results. We are planning to address this need by introducing support for data that our users will either upload to our platform or simply connect from their own S3 bucket. The end result should be access to these data with all of the Sentinel Hub functionality and processing power.
One of the applications of this feature is integration with the AWS Ground Station service, which was announced at re:Invent 2018. We expect there will be many satellite imagery providers using it, producing Cloud Optimized GeoTiffs (COGs) or similar as the final output and storing them on S3. Connecting this with Sentinel Hub will make these scenes available immediately, with all the additional features.
4. Data fusion
We already support this to some extent, a nice example being the Sentinel-1 GRD orthorectified product, which is processed in the AWS EU-1 region (or one of the DIAS-es) but requires digital elevation model (DEM) data stored in the AWS US-West region.
We plan to generalize this concept, making it possible for EO experts to access data from all of the supported data sources within the same processing script.
5. Support to machine learning
With the volume of data being created every day, ML is becoming more and more important, and many use Sentinel Hub within their magic processes. We have established the eo-learn and sentinelhub-py packages to make it easier than ever to build new models based on EO data. The way of processing in ML is different from web apps, though, and we have identified some features we have to add to our platform.
The first one is mass processing. SH was designed to serve on-the-fly requests in less than a second. This makes it somewhat limited when one wants to run a script over a larger area, usually requiring the user to split the area into smaller chunks (a 512x512 px size is currently recommended), store the results on their own infrastructure and combine them at the end. The first step to address this was the creation of the Sentinel-2 Global Mosaic service, which is able to process yearly sets of data for whole countries and come up with the most representative pixel. This often requires assessing trillions of pixels for one country alone. We have managed to optimize the process quite a bit: the result is not there in seconds, but it is there within a couple of hours. A similar approach could be taken for any other kind of algorithm, not just best-pixel selection, making this service invaluable. The other thing we are looking into is the ability to store intermediate processing steps (e.g. multi-temporal and multi-spectral chips) in the cloud, making them available in the same fashion as the rest of the data.
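The chunking step described above is mechanical enough to sketch. Assuming a bbox in a projected CRS (e.g. UTM, units of meters), the helper below splits a large area into sub-boxes of at most 512x512 pixels at a given resolution, which is the pattern large-area users currently follow before merging results.

```python
import math

def split_bbox(bbox, resolution, chunk=512):
    """Split bbox = (min_x, min_y, max_x, max_y), given in a projected CRS,
    into sub-boxes covering at most chunk x chunk pixels at `resolution`
    (CRS units per pixel). Edge tiles are clipped to the original bbox."""
    min_x, min_y, max_x, max_y = bbox
    step = chunk * resolution  # sub-box edge length in CRS units
    nx = math.ceil((max_x - min_x) / step)
    ny = math.ceil((max_y - min_y) / step)
    boxes = []
    for i in range(nx):
        for j in range(ny):
            boxes.append((
                min_x + i * step,
                min_y + j * step,
                min(min_x + (i + 1) * step, max_x),
                min(min_y + (j + 1) * step, max_y),
            ))
    return boxes

# A 100 km x 50 km area at 10 m resolution -> 5120 m (512 px) chunks.
tiles = split_bbox((500000, 5000000, 600000, 5050000), resolution=10)
```

The mass-processing feature is essentially about moving this loop, plus the storage and merging of intermediate results, from the user's infrastructure into the service.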
6. Integration across the field
The Sentinel Hub platform was initially running on AWS, and AWS remains our main workhorse. However, in order to get access to as many data sources as possible, and to get closer to our users, we first deployed our service to the Interactive Platform Testbed in Poland (now CreoDIAS), getting access to Sentinel-3, Sentinel-5P, Landsat 5, 7 and 8 in Europe, and Envisat MERIS, and then to the AWS US-West region, serving global Landsat-8, MODIS and digital elevation model datasets. During 2018 we successfully deployed our system to two more DIAS-es, Mundi and ONDA, and in 2019 we will have another one running: WEkEO. We are planning to integrate all these options so that one can access any of these data sources from wherever she is using the service.
7. Additional tools for integration
Sentinel Hub’s mission is to empower our users, be they scientists, commercial providers or simply EO enthusiasts, by offering an industrial-grade, trustworthy, reliable, high-performance and versatile “data mart” on the shores of all the “great lakes” of EO big data. The current volume of use and the growing number of subscribers hopefully tell us that we are going in the right direction. We are confident that the new features coming this year will push us even further towards our mission objectives.
For the readers who made it to the end: make sure you also check our blog posts on automatic land cover classification.
If you want to help us reach (and exceed) the aforementioned goals faster, do come and join us. We are hiring!