Combination of Sentinel-2 B08 and Sentinel-1 VV bands over Tewkesbury, Jan/Feb 2021 (🌐 EO Browser).

Exploring Time and Space

A guide to accessing, analysing and visualising data in the Euro Data Cube

William Ray
May 12, 2021 · 10 min read


Scared of data cubes? Never used them in your work? Then this is the post for you! This article follows on from Dorothy Rono’s excellent introduction to the Euro Data Cube; if you haven’t had a chance to read it yet, you can find it here.

In this post, we go further and demonstrate how you can use the EOxHub workspace and the xcube Python package to build your own cube! You will learn how to configure and build your data cube, visualise the data, create new variables and manipulate your data spatially and temporally. To showcase these functionalities, we will explore river flooding during the winter of 2020/21 on the River Severn in the United Kingdom using Sentinel-1 & 2 imagery obtained from Sentinel Hub. By the end of this post, you should be ready to apply what you have learnt to your own projects and create something really cool.

This post is accompanied by a Jupyter Notebook which can be found here.

Reintroducing the EOxWorkspace

The Jupyter Notebook showcased in this post runs on the hosted EOxHub cloud workspace. This means all the necessary computational and storage resources, library dependencies and credentials are already set up and ready to go, so you can start running your script and concentrate on developing your applications rather than wasting time setting up your environment! Before you start working on your own application or use case, you will need to sign up for an EDC account and activate the EOxHub and Sentinel Hub services. These come with a month-long free trial so you can test out the functionalities offered.

But why should you use a data cube in the first place? What is wrong with traditional methods of analysis? The answer lies in the efficiency of analysing large spatial and temporal datasets. Using traditional methods, you would need to search for, download and store large amounts of data, which would then be laborious to query and process. With EDC, the cube is generated “on the fly”, meaning you only access the data from the cloud when your analysis requires it. This makes it far quicker and easier than analysing the data using traditional methods.

Area of Interest centred on Tewkesbury, United Kingdom.

Building the data cube

The first thing we need to do is configure and build our data cube. This is really simple once you understand the configuration parameters. In the example below, we are building a data cube that uses Sentinel-2 Level 2A bands 2, 3, 4 and 8 (RGB & NIR). We have specified the bounding box for our Area of Interest (AOI) in WGS84 coordinates; our study area is centred on the River Severn in the UK, which frequently floods in the winter months. We also specify the spatial resolution (in degrees, as we are using WGS84), the time range (1st December 2020 to 28th February 2021) and a time tolerance between acquisitions of 30 minutes, so we don’t accidentally include any duplicate images in our cube.
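Here’s a minimal sketch of what that configuration looks like using the xcube_sh package (the bounding box values are illustrative approximations of the AOI, not the exact ones from the notebook):

```python
from xcube_sh.config import CubeConfig

# Sentinel-2 L2A cube over the River Severn AOI, winter 2020/21
cube_config = CubeConfig(
    dataset_name='S2L2A',
    band_names=['B02', 'B03', 'B04', 'B08'],  # blue, green, red & NIR
    bbox=(-2.25, 51.93, -2.05, 52.08),        # WGS84: lon_min, lat_min, lon_max, lat_max
    spatial_res=0.00018,                      # degrees, roughly 20 m in latitude
    time_range=['2020-12-01', '2021-02-28'],
    time_tolerance='30M',                     # treat acquisitions within 30 minutes as one
)
```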

Then, it is as easy as opening the cube on the fly and displaying the contents. It’s important to bear in mind that no computational resources are used until you begin any analysis, making it quick and easy to set up your data cube. Looking at the contents of the data cube (in the figure below), as well as the latitude and longitude coordinates we are all familiar with, we also have time coordinates. We can also see that the four bands we requested earlier appear under the data variables.
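Opening the cube is a single call. A sketch, assuming the cube_config defined above:

```python
from xcube_sh.cube import open_cube

# The cube is opened lazily: no pixels are fetched until we compute something
s2_cube = open_cube(cube_config)
s2_cube  # in a notebook, this renders the dimensions, coordinates and data variables
```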

The contents of the data cube we just configured and built.

Create a new variable & visualise it

Most of us will turn to multispectral sensors such as Sentinel-2 to visualise and analyse a flood event. We can interpret this data intuitively, as a True Colour composite shows the scene in colours the human eye is familiar with, and the Normalised Difference Water Index (NDWI) can be used to highlight areas of flood water. When conditions are clear, this approach works well. Let’s go ahead and create the new variable for NDWI:
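A sketch of the calculation (NDWI uses the green band, B03, and the NIR band, B08):

```python
# NDWI = (Green - NIR) / (Green + NIR)
ndwi = (s2_cube.B03 - s2_cube.B08) / (s2_cube.B03 + s2_cube.B08)

# Attach some metadata, then add the result to the cube as a new variable
ndwi.attrs['long_name'] = 'Normalised Difference Water Index'
ndwi.attrs['units'] = 'unitless'
s2_cube['ndwi'] = ndwi
```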

To calculate the new variable we use two existing variables, s2_cube.B03 and s2_cube.B08, and insert them into the formula for NDWI. Once ndwi has been calculated, it is attributed a long_name and units before being assigned to the cube as ndwi so that we can call it later in the notebook.

Flooding is commonly associated with high rainfall events, which almost certainly means there will be cloud cover over the AOI you are interested in. So while you might find a cloud-free image for a single date, over the winter months it’s unlikely you will have a time series that is cloud-free enough to be useful.

True Color and NDWI visualisations of a Sentinel-2 acquisition from 27th December 2020 centred on Tewkesbury, United Kingdom.

Sentinel-1 SAR Imagery

So how do we solve this cloud problem? This is where Sentinel-1 Synthetic Aperture Radar (SAR) imagery comes into its own! Collecting data in this part of the spectrum means there is no interference from cloud cover, so we can generate a full time series without gaps.

So let’s leave our comfort zone, throw away our Sentinel-2 data cube and create a new one using Sentinel-1. Don’t worry, I promise it won’t be too scary!

Conversion of DN to dB using xcube functions

So let’s see what we can find out. Firstly, we need to configure a new cube, this time using Sentinel-1 imagery, and call the VV & VH polarisations (you can see how this is done in Cell 11 of the Jupyter Notebook). Next, we need to create a new variable in the Sentinel-1 data cube. This is extremely easy to do: all you need is to define your new variable with the respective formula, attribute it a long name and units, and then set it as a new variable in your code.
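Here’s a sketch of both steps, again using xcube_sh (cube and variable names such as s1_cube are my shorthand, and the bounding box mirrors the illustrative Sentinel-2 one from earlier):

```python
import numpy as np
from xcube_sh.config import CubeConfig
from xcube_sh.cube import open_cube

# Sentinel-1 GRD cube with both polarisations, same AOI and time range as before
s1_config = CubeConfig(
    dataset_name='S1GRD',
    band_names=['VV', 'VH'],
    bbox=(-2.25, 51.93, -2.05, 52.08),
    spatial_res=0.00018,
    time_range=['2020-12-01', '2021-02-28'],
)
s1_cube = open_cube(s1_config)

# Convert the linear DN values to decibels
VV_dB = 10 * np.log10(s1_cube.VV)

# log10(0) produces -inf, so reassign any no-data values back to zero
VV_dB = VV_dB.where(np.isfinite(VV_dB), 0)

# Attribute some metadata, then assign the result as a data cube variable
VV_dB.attrs['long_name'] = 'VV polarisation backscatter'
VV_dB.attrs['units'] = 'dB'
s1_cube['VV_dB'] = VV_dB
```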

The above example does just this to convert DN to dB for the VV polarisation band. We calculate the conversion, reassign any no-data values back to zero, attribute some metadata to our new variable and, lastly, assign the newly created variable to the data cube so we can call it later in the script.

Let’s now visualise this, which can be done really easily using Matplotlib, and see whether using VV as a substitute for NDWI is feasible:
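For example (the date matches the figure below; the grey-scale stretch is just a sensible guess):

```python
import matplotlib.pyplot as plt

# Plot the VV backscatter in dB for a single acquisition
s1_cube.VV_dB.sel(time='2021-01-28', method='nearest').plot.imshow(
    cmap='gray', vmin=-25, vmax=5
)
plt.show()
```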

VV dB polarisation over the AOI, 28th January 2021.

This looks promising! Like the NDWI acquired the day before, the flood extent seems well defined against the response from dry land. The next step is to experiment with some thresholding. Again, this is fairly simple to do, although we are going to use some functions that might take a bit of time to wrap your head around. Luckily, we’ve gone through most of the head-spinning phase for you!

Create a new variable using a threshold

For the next example, we want to create a flood mask by applying a threshold to the VV_dB variable we just generated. In this instance, flooded pixels = 1 and non-flooded pixels = 0.

To do this we go through two steps: firstly, we allocate pixels below -20 dB a value of 1 (preserving the values of all other pixels); secondly, we use another where statement to allocate all pixels not equal to 1 a value of 0. Here’s how we do that in the notebook:
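A sketch of those two steps (flood_mask is my name for the new variable; remember that .where keeps the pixels where the condition is True and substitutes the second argument everywhere else):

```python
# Step 1: preserve pixels at or above -20 dB, assign 1 to everything below
flood_mask = s1_cube.VV_dB.where(s1_cube.VV_dB >= -20, 1)

# Step 2: preserve the 1s, assign 0 to every other pixel
flood_mask = flood_mask.where(flood_mask == 1, 0)

# As before: metadata, then add the mask to the cube for later use
flood_mask.attrs['long_name'] = 'Flood mask (VV backscatter below -20 dB)'
flood_mask.attrs['units'] = 'unitless'
s1_cube['flood_mask'] = flood_mask
```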

At first glance this may not make sense (at least it didn’t for me), since you might read the step 1 function as assigning a value of 1 to the pixels in VV_dB that are equal to or greater than -20. What is actually happening is that the .where function preserves all the pixel values that satisfy the condition (those at or above -20 dB) and assigns everything else, i.e. the pixels below -20 dB, a value of 1.

Don’t forget, as before, we give our new variable a long name and units and then assign it to the cube for further use in the notebook.

Comparison of flood masks derived from different threshold settings.

From the above, it appears that a threshold of -20 dB is about right: -15 dB produces many more false positives, while -25 dB omits many pixels that look to be flooded.

Visualise a spatial subset of a variable over time

Another advantage of the data cube is that we can analyse pixels or areas of interest over time as well as space. Because we have access to every variable at every time step, we can quickly and easily generate time series to examine phenomena like floods over time.

To do this, we just need to query the correct dimensions and then plot the result:
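A sketch, with an illustrative bounding box for the field (the coordinates in the notebook will differ):

```python
import matplotlib.pyplot as plt

# Spatial subset: a small field next to the River Severn
# (lat is typically ordered north-to-south in these cubes, hence the reversed slice)
field = s1_cube.VV_dB.sel(lat=slice(51.99, 51.98), lon=slice(-2.17, -2.15))

# Average over the spatial dimensions, keeping time, then plot the series
field.mean(dim=['lat', 'lon']).plot()
plt.show()
```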

Time series showing VV dB over a field by the River Severn 1st December 2020 to 28th February 2021.

The results of this are arguably even more informative than the previous maps of flooding for individual time steps. Using the AOI of a field next to the River Severn, we can clearly see the length of each flood event: the first lasted about a week, while the February floods were more prolonged, lasting nearly two weeks before they receded. Without a data cube, the temporal variation between the two flood events would be much less clear.

Create a new variable based upon space and time

Let’s go back to the spatial data visualisations though; after all, satellite imagery is spatial! For our last piece of analysis we are going to collate all our data into a single array showing, for each pixel, the proportion of observations in time in which it was flooded according to the threshold we are using. This new variable analyses data both spatially and temporally, a perfect example of why you would want to use a data cube in your workflow. The following example would take way too long to achieve using traditional methods.

We need to generate two numbers for each pixel: the count of observations (time steps) and the sum of the flood masks across all time steps. If a pixel was flooded in 25 observations and there were 50 observations in total, the proportion of observations in which the pixel was flooded would be 0.5. Let’s see how this is calculated using the data cube:
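A sketch of the calculation, reusing the flood mask from earlier:

```python
# Per-pixel count of observations and per-pixel sum of the binary flood masks
n_obs = s1_cube.flood_mask.count(dim='time')
flood_sum = s1_cube.flood_mask.sum(dim='time')

# Proportion of observations in which each pixel was flagged as flooded
flood_average = flood_sum / n_obs
flood_average.attrs['long_name'] = 'Proportion of observations flooded'
s1_cube['flood_average'] = flood_average
```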

Okay, let’s visualise the flood_average variable. This time we don’t need to select a time step, as this variable doesn’t have a time coordinate associated with it.
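For example:

```python
import matplotlib.pyplot as plt

# No time selection needed: flood_average has already collapsed the time dimension
s1_cube.flood_average.plot.imshow(cmap='Blues', vmin=0, vmax=1)
plt.show()
```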

flood_average between 1st December 2020 and 28th February 2021.

The result of this calculation is really powerful, enabling us to clearly see the flood plain of the River Severn. We can take this even further and calculate the same for previous winters and compare the severity of flooding on an annual basis. Sometimes, it is better to extract variables from your notebook and examine them in GIS applications where you can analyse them further and produce some more advanced visualisations.

Export a Variable

Exporting a variable is very simple, and we will show two examples here: exporting the thresholded flood extent for a single time step, and exporting the flood_average variable we generated in the last section.
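Here’s a sketch of the single time-step export using the rioxarray package (the notebook may use a slightly different helper; the date and file names are illustrative):

```python
import rioxarray  # registers the .rio accessor on xarray objects

# 1. Define the time step we want to export
flood_single = s1_cube.flood_mask.sel(time='2021-01-28', method='nearest')

# 2. Declare the spatial dimensions and write the coordinate system (WGS84)
flood_single = flood_single.rio.set_spatial_dims(x_dim='lon', y_dim='lat')
flood_single = flood_single.rio.write_crs('EPSG:4326')

# 3. Save the georeferenced data to a GeoTiff
flood_single.rio.to_raster('flood_mask_2021-01-28.tif')
```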

Yes, it really is only a few lines of code to extract the variable! Firstly, we define the time step we want to export. Secondly, we write a coordinate system (WGS84) to the dataset, and then, using an inbuilt rasterio method, we save the georeferenced data to a GeoTiff, ready to be imported into the GIS software of your choice! We repeat this below for the flood_average variable.
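And the same pattern for flood_average, which has no time coordinate to select:

```python
flood_avg = s1_cube.flood_average.rio.set_spatial_dims(x_dim='lon', y_dim='lat')
flood_avg = flood_avg.rio.write_crs('EPSG:4326')
flood_avg.rio.to_raster('flood_average_winter_2020-21.tif')
```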

You can then import this into QGIS and perform some further analysis or just make a neat little map like the one below!

Flood average on the River Severn, Tewkesbury between 1st December 2020 and 28th February 2021.

Summary

So what have we learnt through this post and the accompanying notebook? We have learnt how to…

  • Create a data cube filled with satellite imagery.
  • Calculate and create a new variable.
  • Generate time series plots.
  • Calculate advanced variables using time and space.
  • Export our variables from the data cube to GeoTiff.

I hope you have found this post interesting and useful in helping you take your first steps into using data cubes in your own work and workflows! Hopefully, I’ve also been able to make them a little less scary and easier to approach. I look forward to seeing you apply the lessons from here to your own applications!

Further Information

Visit Euro Data Cube to find out about our subscription plans and take advantage of the free trial options. Additionally, the Network of Resources (NoR) initiated by ESA provides sponsored access to some of our services to qualifying researchers and entrepreneurs. Follow these step-by-step instructions to apply for sponsorship.

For more inspiration on the endless capabilities for innovation in EO, the notebooks section of the marketplace has a diverse collection of practical use cases and tutorials. If you run into trouble using any of our services and need support, or have great ideas to share, you are welcome to contact us or post questions on the Euro Data Cube forum. Follow us on Twitter and LinkedIn to stay informed about our latest developments.

One more thing…

If you’ve been inspired to start working on your own data cubes right away, why not check out the latest Sentinel Hub Custom Script Contest and create your own application and solution using the EDC!
