Detecting the Age of Grassland from the Sky

What did meadows look like 20+ years ago?

Matic Lubej
Sentinel Hub Blog
Jun 6, 2023

The ever-increasing amount of satellite data holds great potential, and many organizations share the objective of seizing the opportunities offered by these new data sources.

For example, systematically integrating geo-referenced data could greatly benefit the production of official statistics. It can produce more detailed results at regional and local levels in areas such as demography, economy, energy, transport, and agriculture.

Statistical offices across Europe are already introducing methods for including information from satellite observations, which notably change the process of obtaining statistical data. These methods offer benefits such as:

  • significantly reducing the amount of time needed to produce estimates of the situation across the country,
  • enabling more comparable/homogeneous environmental variables between countries.

Why Grassland?

Estimating grassland management is an important step in ensuring the health and sustainability of this ecosystem. By taking into account factors such as the age of permanent grassland, mowing intensity, or fertilization, land managers can develop strategies for promoting sustainable land use.

With the help of remote sensing data, it is possible to design sustainable land use practices and identify areas that may be at risk of degradation due to overgrazing or other forms of intensive land use. By gathering and analyzing this information, we can work towards a more sustainable future for grassland ecosystems.

Sketch created in ExcaliDraw.

Why the Age?

The age of a grassland is closely connected to the richness of the species that inhabit it. Grasslands that have been undisturbed for a long time are home to more grassland-specific species and fewer generalists. Many of these specialist species are rare and endangered.

Additionally, grassland age is related to ecological functions: older grasslands can contribute to carbon-rich soils and thus to the immobilization of atmospheric carbon (provided that their vegetation cover is continuous and that their soil has not been disturbed, e.g., by ploughing).

Permanent grasslands are therefore a focus of agricultural subsidy schemes such as the EU Common Agricultural Policy.

Detecting Presence of Grass

Determining the age of a meadow is only possible via an indirect approach, e.g., by detecting grass presence for a given year, and then performing yearly statistics over the same area. We all know what grass looks like; it can be short or tall, wavy and very much affected by the wind, green or yellow-ish, … The list goes on, and it doesn’t make our task any easier the longer it gets.

Grass can be mowed, but it’s still grass. However, changing a meadow to arable land permanently changes its land use, so in such cases a meadow ceases to exist. Or vice-versa, arable land can be abandoned or otherwise converted to a meadow. So, rather than detecting the presence of grass, it is easier to focus on the presence of bare soil.

Bare Soil Detection

In the past, we have already written a blog post on the detection of bare soil using machine learning methods. The goal is to identify all observations where the area is bare — with exposed bare soil as a result of ploughing, or covered with non-photosynthetic vegetation as a consequence of harvest or vegetation drying up on the field.

The event itself, be it ploughing or harvest, cannot be detected in satellite imagery, but its consequences are observable. The bare soil marker thus identifies ploughing events by detecting exposed bare soil. Using the bare soil marker, we can therefore obtain pixel-level information on whether a meadow has remained a meadow or not.

Turning Back the Dial of Time

20+ years is a long time, spanning the operational periods of many satellites. For this specific task, the following instruments were considered:

  • Sentinel-2 (MSI)
  • Landsat-8 (OLI)
  • Landsat-5 (TM)

Here are some of the specifications and operational times of these instruments for a more intuitive overview of what kind of data juggling is expected of us.

Specification of the three instruments used in this study.
Operational time windows of a selected set of satellites.

Downloading Satellite Data Using Sentinel Hub

Sentinel Hub glues satellite images together spatially and temporally, creating a unique and intuitive way to access data for everyone. You only need to specify the area-of-interest (AOI), the time period, and the resolution of the data to download.

Of course, if the area is large, the time interval is long, or the resolution is high, you will definitely run into some data management problems. The most straightforward thing you can do here is to use our Python packages eo-learn and eo-grow, which enable you to use existing processing pipelines (or create your own custom ones) and apply them over the full AOI in a scalable manner. Before long, you’re all set to start working with multi-spectral imagery!

A schematic representation of the scaling-up logic within the eo-grow Python package.
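To get a feeling for why scaling matters, here is a minimal sketch of the general idea of splitting an AOI into tiles and estimating how many pixels each resolution produces. It is not the eo-grow API; the function names `split_bbox` and `pixel_count`, the tile size, and the AOI coordinates are all hypothetical and chosen only for illustration.

```python
import math


def split_bbox(bbox, tile_size_m):
    """Split a (min_x, min_y, max_x, max_y) box in metres into square tiles.

    Tiles on the upper and right edges are clipped to the AOI bounds.
    """
    min_x, min_y, max_x, max_y = bbox
    n_cols = math.ceil((max_x - min_x) / tile_size_m)
    n_rows = math.ceil((max_y - min_y) / tile_size_m)
    tiles = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = min_x + col * tile_size_m
            y0 = min_y + row * tile_size_m
            tiles.append((x0, y0,
                          min(x0 + tile_size_m, max_x),
                          min(y0 + tile_size_m, max_y)))
    return tiles


def pixel_count(bbox, resolution_m):
    """Number of pixels needed to cover the box at the given resolution."""
    min_x, min_y, max_x, max_y = bbox
    width = math.ceil((max_x - min_x) / resolution_m)
    height = math.ceil((max_y - min_y) / resolution_m)
    return width * height


# Hypothetical 25 km x 15 km AOI in a projected CRS (coordinates in metres).
aoi = (500_000, 5_100_000, 525_000, 5_115_000)
tiles = split_bbox(aoi, tile_size_m=10_000)

print(len(tiles), "tiles")
print("pixels per band and timestamp at 10 m:", pixel_count(aoi, 10))
print("pixels per band and timestamp at 30 m:", pixel_count(aoi, 30))
```

Multiplying the pixel count by the number of bands and by the number of observations over 20+ years quickly shows why each tile is best processed independently.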

The downloaded data had to be structured in a sensible way, so that data files could be loaded efficiently and conveniently filtered down to a specific subset.

Difficulties Related to Different Data Sources

What do you do when you have an ML model specific to Sentinel-2 data, and a bucket full of 20+ years of imagery from multiple data sources which the model never observed during the training phase?

… you get creative.

Landsat-8

In the case of Landsat-8 we were lucky: we had ground-truth data for a period when both Sentinel-2 and Landsat-8 were available, so we simply retrained the model on the Landsat-8 input data with the same ground-truth information. The model trained on Landsat-8 performed slightly worse, with the lower resolution taking its toll. Larger pixels mean more mixing, bringing in information we are not interested in (parcel borders and outside-border areas, artifacts, buildings, …). The image below shows the ROC curves of the two models (Sentinel-2 and Landsat-8) for two cases:

  • with negative buffer applied
  • without negative buffer applied

It is evident that evaluating both models on data where the negative buffer was applied yields better performance, since less mixing is present in the observed data.

The ROC curves of the two models (Sentinel-2 and Landsat-8) for the case of parcels with (blue and orange) and without (green and red) the negative buffer applied.
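The effect of a negative buffer can be illustrated on a rasterized parcel. The sketch below is a raster analogue (a simple erosion of a binary mask), not the procedure used in the project, which most likely buffered the parcel polygons themselves; the function name `negative_buffer` and the toy mask are hypothetical.

```python
def negative_buffer(mask, iterations=1):
    """Shrink a binary parcel mask by one pixel per iteration.

    A pixel is kept only if it and its four direct neighbours are inside
    the parcel. Pixels on the array border are treated as outside.
    """
    rows, cols = len(mask), len(mask[0])
    for _ in range(iterations):
        eroded = [[0] * cols for _ in range(rows)]
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                if (mask[r][c] and mask[r - 1][c] and mask[r + 1][c]
                        and mask[r][c - 1] and mask[r][c + 1]):
                    eroded[r][c] = 1
        mask = eroded
    return mask


# Toy parcel: a 4 x 4 block of parcel pixels surrounded by background.
parcel = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
inner = negative_buffer(parcel)

for row in inner:
    print(row)
```

Only the interior pixels survive, which removes exactly the border pixels where mixing with neighbouring land cover is most likely. With 30 m Landsat pixels, the same buffer distance removes a larger share of a small parcel than with 10 m Sentinel-2 pixels.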

Landsat-5

In the case of Landsat-5 we were not so lucky with the availability of reference data. Thanks to the EO community, we were able to find the article "Characterization of Landsat-7 to Landsat-8 reflective wavelength and normalized difference vegetation index continuity" by Roy, David P., et al. (Remote Sensing of Environment 185 (2016): 57–70).

The article describes harmonizing Landsat TM and ETM+ surface reflectance to Landsat OLI surface reflectance, which was perfect for our use case. This means that we could harmonize Landsat-5 data to Landsat-8 data with a series of linear transformations, and then simply reuse the Landsat-8 model from the previous step.

TOA (left) and surface (right) NDVI comparisons derived from the filtered results shown in Figs. 7 and 8 respectively. Top row: sensor NDVI scatterplots considering NDVI values in the range 0 to 1; the blue lines show ordinary least squares (OLS) regression of the OLI against the ETM+ data, the green lines show OLS regression of the ETM+ data against the OLI data, and the dotted lines are 1:1 lines superimposed for reference.
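The harmonization step boils down to one linear transformation per band. The sketch below shows the shape of that computation only: the slope and intercept values are placeholders made up for illustration and are not the coefficients published by Roy et al., and the names `PLACEHOLDER_COEFFS`, `harmonize` and `ndvi` are hypothetical.

```python
# PLACEHOLDER (slope, intercept) pairs per band, for illustration only.
# The real values must be taken from the tables in Roy et al. (2016).
PLACEHOLDER_COEFFS = {
    "red": (0.9, 0.01),
    "nir": (0.85, 0.04),
}


def harmonize(reflectance, band, coeffs=PLACEHOLDER_COEFFS):
    """Map a source-sensor reflectance to the target sensor with a linear fit."""
    slope, intercept = coeffs[band]
    return slope * reflectance + intercept


def ndvi(red, nir):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


# Hypothetical Landsat-5 TM surface reflectances for one vegetated pixel.
tm_red, tm_nir = 0.1, 0.5

oli_like_red = harmonize(tm_red, "red")
oli_like_nir = harmonize(tm_nir, "nir")

print("NDVI before harmonization:", round(ndvi(tm_red, tm_nir), 3))
print("NDVI after harmonization: ", round(ndvi(oli_like_red, oli_like_nir), 3))
```

Once every band has been transformed this way, the harmonized values can be fed to the model trained on Landsat-8 data without retraining.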

Annual Layer Production Process

Annual layers of bare soil presence were produced on the pixel level with the condition that at least two consecutive bare-soil observations were needed to recognize bare soil in an area for a particular year.
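The rule can be sketched for a single pixel as follows. This is a minimal illustration, assuming the per-observation bare-soil predictions for one year are already available as a chronologically ordered list of booleans with invalid (e.g., cloudy) observations removed; the function name `has_bare_soil` is hypothetical.

```python
def has_bare_soil(observations, min_consecutive=2):
    """Return True if the series contains a run of at least
    `min_consecutive` consecutive bare-soil observations."""
    run = 0
    for is_bare in observations:
        run = run + 1 if is_bare else 0
        if run >= min_consecutive:
            return True
    return False


# One pixel, one year: isolated detections do not trigger the marker ...
isolated = [False, True, False, True, False]
# ... but two detections in a row do.
consecutive = [False, True, True, False, False]

print(has_bare_soil(isolated))
print(has_bare_soil(consecutive))
```

Requiring a run of detections rather than a single one makes the annual layer robust against isolated misclassifications, at the cost of possibly missing very short bare-soil periods.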

The following figure shows how often this condition is met in Sentinel-2 data for the years 2017 to 2021. Bare soil presence in permanent grasslands is very rare (at most a few percent of cases), while arable land triggers the condition at a significantly higher rate.

Yearly distributions of bare soil observation counts per parcel for Sentinel-2 data. Meadow parcels are shown in green, while arable land parcels in brown. The y-axis is shown in log scale.

The distribution of the number of years without bare-soil presence is shown below for permanent grassland (green) and arable land (brown) for both Sentinel-2 (top) and Landsat-8 (bottom). Only years between 2017 and 2021 were considered for this plot. We can see a strong agreement over the years for permanent grassland in both sources. Differences are present for arable land, but since this is not the focus of this variable, the condition was fine-tuned for permanent grassland to achieve similar performance between different sources.

A histogram representing the number of consecutive years without detected bare soil for meadow parcels (green) and arable land parcels (brown) for a 5 year period. At least two consecutive observations with detections of bare soil were necessary to trigger the detection. The results are shown for Sentinel-2 data.
A histogram representing the number of consecutive years without detected bare soil for meadow parcels (green) and arable land parcels (brown) for a 5 year period. At least two consecutive observations with detections of bare soil were necessary to trigger the detection. The results are shown for Landsat-8 data.

An example of bare soil presence maps is shown below for Sentinel-2 and Landsat-8. At this level one can see the higher frequency of bare soil detections in Sentinel-2 compared to Landsat-8, which is expected, based on the observations mentioned above.

Comparisons between Landsat-5 and Landsat-8 maps lead to a similar conclusion. A direct comparison could not be made because the two sources do not overlap in time, but the differences between the two maps (spaced 2 years apart) were minimal.

Area around Maribor, Slovenia. Top-left is OSM, top-right is True Color Sentinel-2 L2A imagery, bottom-left are bare soil model results on Landsat-8 data, and bottom-right are the same model results on harmonized Landsat-5 data.

Putting it All Together

In order to aggregate the extracted information into official statistics for a specific year, we needed a mask to define the grassland area-of-interest for that specific year. This was achieved by joining meadow parcels from land use layers and from the land parcel identification system.

Finally, yearly maps were produced for the years 2000 through 2021. From these maps, the age of a grassland is obtained by counting the consecutive years without bare soil presence, starting from the most recent year and going back in time.
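For a single pixel, the counting can be sketched like this. It assumes the annual bare-soil layers are available as a mapping from year to a boolean flag; the function name `grassland_age` is hypothetical, and the exact counting convention (for example, whether the earliest year contributes to the age) is an assumption.

```python
def grassland_age(bare_soil_by_year, last_year):
    """Count consecutive years without bare soil, going back from `last_year`.

    Counting stops at the first year with bare soil or at the first year
    that has no data.
    """
    age = 0
    year = last_year
    while year in bare_soil_by_year and not bare_soil_by_year[year]:
        age += 1
        year -= 1
    return age


# Hypothetical pixel: bare soil detected in 2017 only.
annual_flags = {year: False for year in range(2015, 2022)}
annual_flags[2017] = True

print(grassland_age(annual_flags, last_year=2021))
```

Here the years 2021, 2020, 2019 and 2018 are free of bare soil, so the count stops at the 2017 detection, even though the years before it were also free of bare soil.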

The image below shows the Kočevje region in Slovenia, which is surrounded by thick forest. In the middle of the image there is a grassland region for which we can now determine the age, ranging from young to old. Age values span from 0 (dark blue) to 21 (yellow).

Area around Kočevje, Slovenia. OSM layers of the area are shown in both images, with the age of meadows layer overlaid on the right. The age of meadows layer represents the number of consecutive years without detected bare soil, ranging from 0 (dark blue) to 21 (yellow).

In Slovenia, permanent grasslands are defined as grasslands where there was no activity of ploughing for at least 5 years. Based on this information, we can create a map of permanent grasslands in Slovenia, which shows that the majority of grassland (~98%) is considered to be permanent (shown in green, non-permanent shown in red).

Map of permanent and non-permanent grasslands of Slovenia, based on the 5-year marker of bare soil presence. ~98% of the meadow parcels are classified as permanent meadows (shown in green). Non-permanent meadows (shown in red) are not visible at this scale.

NUTS-level Aggregations

The final results are aggregated at various NUTS (Nomenclature of Territorial Units for Statistics) levels for the purposes of official statistics. NUTS-3 and NUTS-2 levels of aggregation are shown, with pie charts for each NUTS region representing the share attributed to each age class.

Aggregation of the age of meadow parcels in Slovenia from 2000 per NUTS-3 region.
Aggregation of the age of meadow parcels in Slovenia from 2000 per NUTS-2 region.
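The aggregation behind such pie charts can be sketched as a group-by over parcels. Everything specific here is an assumption made for illustration: the age class boundaries, the region labels, the parcel values, and the choice to weight by parcel area rather than by parcel count are not taken from the project.

```python
from collections import Counter, defaultdict

# Hypothetical age classes as (lowest age, highest age, label).
AGE_CLASSES = [(0, 4, "0-4"), (5, 9, "5-9"), (10, 21, "10+")]


def age_class(age):
    """Return the label of the age class that contains `age`."""
    for low, high, label in AGE_CLASSES:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside all age classes")


def class_shares(parcels):
    """Compute the area share of each age class per region.

    `parcels` is an iterable of (region, age_in_years, area_ha) tuples.
    """
    area = defaultdict(Counter)
    for region, age, area_ha in parcels:
        area[region][age_class(age)] += area_ha
    return {
        region: {label: a / sum(per_class.values())
                 for label, a in per_class.items()}
        for region, per_class in area.items()
    }


# Hypothetical parcels in two made-up regions.
parcels = [
    ("region_A", 2, 1.0),
    ("region_A", 12, 3.0),
    ("region_B", 7, 2.0),
]
shares = class_shares(parcels)

for region, per_class in shares.items():
    print(region, per_class)
```

Moving from NUTS-3 to NUTS-2 only changes the region label attached to each parcel; the aggregation itself stays the same.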

Landsat-5 MSS — Gap Year of 2012

Due to a Landsat-7 failure in 2003 there is a gap in 2012 just after Landsat-5 TM was decommissioned and just before Landsat-8 started operating. The only data from that period is from Landsat-5 MSS, where limited acquisitions were made from June 2012 through January 2013. MSS is different from TM, and there don’t seem to be any similar articles on harmonization. Additionally, the resolution of the MSS instrument is 60 m, while for the Landsat-5 TM and Landsat-8 OLI it’s 30 m, as shown above.

In this case it was decided to simply mirror the results of 2011 for the gap year, since it was estimated that the benefit would be too small for the amount of effort one had to put in.
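In code, mirroring the previous year is a one-line fill. The sketch below assumes the annual bare-soil flags of one pixel are stored in a mapping from year to boolean; the function name `fill_gap_year` is hypothetical.

```python
def fill_gap_year(bare_soil_by_year, gap_year=2012):
    """Return a copy of the annual flags with the gap year filled in
    by copying the value of the preceding year."""
    filled = dict(bare_soil_by_year)
    filled[gap_year] = filled[gap_year - 1]
    return filled


# Hypothetical pixel with no usable data for 2012.
annual_flags = {2010: False, 2011: True, 2013: False}
filled = fill_gap_year(annual_flags)

print(sorted(filled.items()))
```

A side effect of this choice is that a ploughing event in 2011 is counted for 2012 as well, and an event that really happened in 2012 is missed unless bare soil is also detected in a neighbouring year.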

Conclusion

As mentioned in the beginning, geospatial statistics are important for understanding and managing land and resources of our planet. From this perspective, Earth observation data proved to be a valuable tool in improving the quality and reliability of such products.

By using data at the country level and by extending our analysis back in time — sometimes up to 20 years — we were able to gain valuable insights about a major part of our ecosystem. Overall, our findings suggest that the use of Earth observation data can provide important information for decision-making and policy development in the agricultural sector via an improved process of obtaining statistics.

Moving forward, it will be important to continue exploring and developing the use of Earth observation data. Such activities would open up potential collaborations and partnerships with other organizations and agencies that could benefit from the proposed solutions and information. This would foster a better understanding of needs and build expertise across government agencies, academic institutions, and non-profit organizations, allowing them to share data and knowledge and to jointly develop new solutions for producing geospatial statistics.

Thanks for reading! Get in touch with us at eoresearch@sinergise.com with any questions or comments about our analysis.

The content of this blog post is a result of the GEOS-2021 project from the Statistical Office of the Republic of Slovenia (SURS) (JN430–44/2021), and funded by Eurostat. We thank the partners ZRC-SAZU (lead) and Faculty of Civil and Geodetic Engineering (University of Ljubljana) for an excellent collaboration on the project.

Matic Lubej is a Data Scientist from Slovenia with a background in particle physics.