Area Monitoring — Data Handling
How not to get overrun by the vast volume of available data
This post is part of a series of blog posts about our work in Area Monitoring. We have decided to openly share our knowledge on this subject as we believe that discussion and comparison of approaches are required among all the groups involved in it. We would welcome any kind of feedback, ideas and lessons learned. For those willing to share them publicly, we are happy to host them here.
- High-Level Concept
- Data Handling (this post)
- Outlier detection
- Identifying built-up areas
- Similarity Score
- Bare Soil Marker
- Mowing Marker
- Pixel-level Mowing Marker
- Crop Type Marker
- Homogeneity Marker
- Parcel Boundary Detection
- Land Cover Classification (still to come)
- Minimum Agriculture Activity (still to come)
- Combining the Markers into Decisions
- The Challenge of Small Parcels
- The value of super resolution — real world use case
- Traffic Light System
- Expert Judgement Application
- Agricultural Activity Throughout the Year
Typical farming parcels are not large: most contain a few to a few tens of Sentinel-2 pixels, and only very rarely in Western Europe is a parcel so large that it contains a few hundred pixels. However, we believe that there is more information in the temporal than in the spatial domain of Sentinel-2 imagery. We therefore average Sentinel-2 reflectances over each Feature of Interest (FOI) and develop models suitable for time-series data. By aggregating over the FOI’s geometry we lose all spatial detail provided by Sentinel-2 but keep all temporal information about the evolution of the vegetation growing on the FOI.
Spatially aggregated Sentinel time-series data have been used for crop type mapping before, for example in Sen4Cap’s Crop Type Mapping System. Its workflow is illustrated in the figure below (image taken from Sen4Cap’s report), where red circles highlight the step in which aggregation over pixels is performed and time-series are produced.
In Slovenia, farmers’ data from their Geospatial Aid Application (GSAA) dataset are used to retrieve information about parcel boundaries, their attributes (e.g. crop type), and measures, providing a sample of around 800 000 FOIs for each year.
Earth Observation data
Sentinel-2 top-of-atmosphere (TOA) L1C reflectances are calculated only from those Sentinel-2 pixels that lie completely within the FOI’s boundaries (non-border pixels). They are retrieved using Sentinel Hub’s feature info service (FIS) for all Sentinel-2 observations with cloud coverage below 70% (cloud coverage estimated over the Sentinel-2 tile) between January 1 and October 31 of each sample year, along with the AOI-based computed cloud mask. The reflectances are then converted to vegetation indices, which are statistically summarised (mean, standard deviation, minimum and maximum) per FOI. The figure below shows a false-colour visualisation from four different Sentinel-2 observations for a typical FOI with corn. The FOI’s boundaries are shown in green and the boundary of all non-border Sentinel-2 pixels within it is shown in white.
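To make the per-FOI summary more concrete, here is a minimal sketch of the aggregation for a single observation. It is not the FIS implementation: the function names, the flat pixel lists and the `inside_mask` flag (true for non-border pixels) are assumptions made for illustration, and the population standard deviation is an arbitrary choice. The NDVI formula uses the red and near-infrared reflectances (bands B04 and B08 on Sentinel-2).

```python
from statistics import mean, pstdev


def ndvi(red, nir):
    """NDVI = (NIR - red) / (NIR + red); None when the denominator is zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else None


def summarise_foi(red_pixels, nir_pixels, inside_mask):
    """Summarise NDVI over pixels flagged as fully inside the FOI.

    inside_mask is a hypothetical per-pixel flag: True for non-border pixels.
    Returns None when no usable pixel remains.
    """
    values = []
    for red, nir, inside in zip(red_pixels, nir_pixels, inside_mask):
        if not inside:
            continue  # border pixels are excluded from the statistics
        value = ndvi(red, nir)
        if value is not None:
            values.append(value)
    if not values:
        return None
    return {
        "mean": mean(values),
        "std": pstdev(values),  # assumption: population standard deviation
        "min": min(values),
        "max": max(values),
    }


# Four pixels of one observation; the third one lies on the FOI border.
example = summarise_foi(
    red_pixels=[0.10, 0.10, 0.20, 0.05],
    nir_pixels=[0.50, 0.30, 0.20, 0.45],
    inside_mask=[True, True, False, True],
)
```

Repeating this for every observation date gives one row of statistics per date, which is the time-series the markers work on.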
The time-series of mean Normalized Difference Vegetation Index (NDVI) obtained using Sentinel Hub’s FIS request for the same FOI is shown as a blue dash-dotted line in the figure below. Sudden drops of NDVI are due to cloudy observations, which were identified and filtered out with Sentinel Hub’s s2cloudless cloud masking algorithm. The green dots indicate all remaining valid observations. All current markers, including the crop type marker, work on time-series data like the one illustrated here.
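The filtering step can be sketched as follows, assuming the cloud mask has already been computed and reduced to a per-observation cloudy fraction for the FOI. The `Observation` record, its field names and the zero-tolerance default are hypothetical, and the sketch does not call s2cloudless itself.

```python
from datetime import date
from typing import NamedTuple


class Observation(NamedTuple):
    day: date
    mean_ndvi: float
    cloudy_fraction: float  # hypothetical: share of FOI pixels flagged cloudy


def valid_observations(series, max_cloudy_fraction=0.0):
    """Keep observations whose cloudy fraction does not exceed the threshold."""
    return [obs for obs in series if obs.cloudy_fraction <= max_cloudy_fraction]


series = [
    Observation(date(2019, 5, 1), 0.62, 0.0),
    Observation(date(2019, 5, 6), 0.08, 1.0),   # cloudy: sudden NDVI drop
    Observation(date(2019, 5, 11), 0.66, 0.0),
    Observation(date(2019, 5, 16), 0.21, 0.4),  # partly cloudy
]
clean = valid_observations(series)
```

With the default threshold only the first and third observations remain, so the resulting time-series is no longer evenly spaced in time.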
The initial focus is on optical data (Sentinel-2) because of its resolution and simplicity, both in terms of understanding and of technological processing. That being said, it is clear that SAR data (Sentinel-1) are useful as well, in some cases even more so due to their temporal consistency. Past research has shown that SAR’s sensitivity to soil roughness is especially useful for detecting bare soil and mowing events, and future investigations on the use of Sentinel-1 will also be reported here.
As well as being indirect in nature, Earth Observation (EO) data poses other challenges to conventional analytical methods. For example, the sampling frequency is not uniform once cloudy observations are removed, and the number of valid observations can vary drastically between regions and over time. The figure below shows the number of all available observations for the three studied years (see how you can check stats in your area). The most observations were obtained in 2018 and the fewest in 2017, with 2019 somewhere in between. Although much EO literature advises linear temporal interpolation onto a regular temporal grid to unify all FOIs in terms of the number of features (observations), this would not provide the temporal sensitivity required for crop type markers, for example.
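For readers unfamiliar with the resampling mentioned above, the following is a minimal sketch of linear interpolation onto a regular grid, written in plain Python. The function name, the day-of-year inputs and the clamping outside the observed range are assumptions for illustration. The example also hints at why sensitivity can be lost: a short dip that falls between grid points does not survive the resampling.

```python
from bisect import bisect_left


def interpolate_linear(days, values, grid):
    """Linearly interpolate (days, values) onto grid.

    days must be strictly increasing; grid points outside the observed
    range are clamped to the first or last value.
    """
    resampled = []
    for g in grid:
        if g <= days[0]:
            resampled.append(values[0])
        elif g >= days[-1]:
            resampled.append(values[-1])
        else:
            i = bisect_left(days, g)  # days[i - 1] < g <= days[i]
            d0, d1 = days[i - 1], days[i]
            v0, v1 = values[i - 1], values[i]
            resampled.append(v0 + (v1 - v0) * (g - d0) / (d1 - d0))
    return resampled


# Irregular valid observations (day of year) with a short dip at day 115.
days = [100, 110, 115, 120, 140]
ndvi_values = [0.8, 0.8, 0.3, 0.7, 0.8]

# A regular 20-day grid samples days 100, 120 and 140 only.
regular = interpolate_linear(days, ndvi_values, grid=[100, 120, 140])
# regular is approximately [0.8, 0.7, 0.8]: the dip to 0.3 is no longer visible.
```

A denser grid would reduce this effect but still replaces observed values with interpolated ones, so the timing of short events remains blurred.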
As mentioned previously, the main differentiator between crop types is the temporal evolution of their vegetation indices. However, even time-series of the same crop type can differ in space and time, including from one year to another, depending on the dates of crop emergence, maturity, or harvest, which are heavily dependent on local factors such as weather and environmental conditions (e.g. sunlight, warmth, water availability). The figure below shows the average NDVI profiles of corn FOIs from the three studied years.
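One possible way to build such yearly average profiles from irregularly sampled FOIs is to group observations into fixed day-of-year windows and average within each window. This is only a sketch under assumptions: the 10-day window, the function name and the `(year, day_of_year, ndvi)` tuples are hypothetical, and the values are made up to show the mechanics rather than real measurements.

```python
from collections import defaultdict
from statistics import mean


def mean_profiles(observations, bin_days=10):
    """Average NDVI per year and day-of-year window.

    observations: iterable of (year, day_of_year, ndvi) pooled over all FOIs
    of one crop type. Returns {year: {window_start_day: mean_ndvi}}.
    """
    buckets = defaultdict(list)
    for year, day_of_year, ndvi in observations:
        window = (day_of_year - 1) // bin_days  # 0-based window index
        buckets[(year, window)].append(ndvi)

    profiles = defaultdict(dict)
    for (year, window), values in sorted(buckets.items()):
        profiles[year][window * bin_days + 1] = mean(values)
    return dict(profiles)


# Illustrative values only: two years with observations on different days.
pooled = [
    (2017, 121, 0.2),
    (2017, 125, 0.4),
    (2018, 121, 0.6),
    (2018, 135, 0.8),
]
profiles = mean_profiles(pooled)
```

Comparing the resulting dictionaries year by year gives a rough view of how the same crop can follow a shifted or differently shaped NDVI curve in different seasons.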
This intrinsic variability shows the challenging nature of constructing and applying generalized and reliable crop type models.