Dynamic Time Warping for Satellite Image Time Series Classification

A tale of pastoralism, land degradation and data scarcity

William Ouellette
SoilWatch

--

Spoiler: Data scarcity is remedied, land degradation less so.

Dynamic Time Warping (DTW) is a little known approach in (temporal) image processing, and even less so in Earth Observation. Initially trialed by Francois Petitjean (to whom I wish the best of luck in his new government career!), DTW is amazingly suited to solving (multi-dimensional) time series classification problems. Here is a didactic example of a real-world Earth Observation (EO) application.

Did someone say TIME WARPING?!

My girlfriend often asks me what I am working on at that particular moment, and I very often have no interesting answer to provide to her… Until I started working on… DYNAMIC TIME WARPING!!!!

The need came from constantly working with poor (reference) data, too little data, or a combination of both. I started exploring whether there was a classification algorithm for satellite image time series that did not need a gazillion of training examples to perform reasonably well, in opposition to the state-of-the-art, data hungry approaches, e.g. random forest pixel-based classification or deep neural networks for semantic segmentation. It turns out there was, and that research in the last 10 years was substantial and showed promising results, yet the adoption and operationalization of the method seemed to be slow.

In a nutshell, DTW allows comparing the similarity of pairs of temporal patterns of heterogeneous lengths and/or sampling intervals. More commonly applied in speech/sound recognition or shape matching in video streams, applying it to satellite image time series is a pretty terrific idea, especially if you consider that sequences of satellite images essentially ARE video streams, just very boring ones, unless you bundle and play them at an outrageously high frame rate.

My favorite time-lapse ever made from satellite imagery, drawing out the analemma pattern in the sky. Credits go to: Matic Lubej

The advantages of DTW over more traditional machine learning classification approaches can be summarized in the following:

  • Loose requirements for reference data: We all love those algorithms that can gobble truckloads of examples and classify an image to perfection, but at the end of the day, how often do you have the luxury of quality reference data abundance? Not very often… Instead, you are usually a poor, data(-starved) scientist scrambling to gather bits and pieces of low quality reference data. For those situations, DTW may turn out to be a judicious choice as it requires very few reference samples. Unlike random forest or other machine learning techniques that are likely to overfit on a limited set of (low quality) data points, DTW can produce amazingly accurate classification maps with as little as 1 sample point per class (you heard that right!). This would theoretically work if you are able to ensure inter-class separability through a unique sample point per class, i.e. that the input sensor data used can effectively discriminate between the defined classes and the landscapes you are dealing with are quite homogeneous. In practice, you probably still want to provide 5 or more examples for each class to better characterize intra-class variability, but we are still talking about extremely small data volumes. Moreover, as a result of the low level of data fitting with DTW, using reference data from past years is possible and yields pretty decent results.
  • Matching heterogeneous time series: Time series of varying lengths and sampling intervals are the norm in multi-temporal satellite imagery processing. Temporally aggregating time series of images to fewer composite products corresponding to a given temporal interval (e.g. 15 or 30 days) is common to harmonize time series for methods that require it, which is most machine learning classification techniques. On top of time series harmonization, temporal aggregation wields the benefit of data size reduction, noise reduction, and gap-filling (e.g. cloud gaps in optical data). However, even if you harmonize your image time series, there is no guarantee the available reference data is sampled at corresponding time intervals, unless the reference data pattern is interpolated to match the time series sampling, which would create synthesized values that may not correspond to reality. Instead, DTW can directly handle the mismatch between the pattern length and the image time series length!
  • Handling of pseudo-periodic phenomena: When talking about vegetation, the modulations of phenological cycles result in distortions of their canonical temporal profiles, both in space and time. In time when dealing with inter-year climatic variations of vegetation response, and in space when dealing with the delayed start of the growing season along a north-south gradient, or with varying altitude. Allow me to paraphrase Petitjean et al., 2012 who describes this issue as the intrinsic temporal behavior of the observed object, as opposed to the temporal irregularity resulting from the sensor’s frequency of revisit (and cloud presence/absence for optical payloads), described in the previous bullet.
Illustration of pattern matching using DTW as opposed to the rigid Euclidian distance. Taken from Databricks’ excellent explanation of how DTW works

The Constraint

The advantages of DTW are clear, especially in a data-scarce context where flexibility is required to manage heterogeneity of the data sources available. However, too much flexibility in the pattern matching behavior can lead to poor performance. Indeed, by comparing the full temporal range when finding the best alignment between two time series, DTW excludes contextual knowledge about vegetation seasonality and phenology, whereas it could be integrated as knowledge to the system.

Remember that phenological cycles are pseudo-periodic phenomena, and that across regions, they may experience temporal shifts due to canonical growing periods, e.g. as determined by region-specific crop calendars for cropped vegetation. Moreover, within the same region, a seasonal shift may occur due to changing climatological factors from year to year, further complexifying the spatio-temporal modulations of those temporal patterns.

In order to deal with these modulations, at least two (documented) options are available to us:

  • A “hard” time constraint: This essentially consists of restricting the possible pair-wise dissimilarity calculations to only the points that are within a certain time period of each other (typically expressed in days), defined as w. Technically, this reduces the number of computations, and therefore the processing time. However, with state-of-the-art DTW implementations available (Google Earth Engine implementation, or the fastdtw python package), computational efficiency is no longer considered an issue. Moreover, putting a “hard” time constraint requires a very good knowledge of the temporal behavior of the surfaces observed, e.g. the crop calendar in the case of crop type classification. As previously mentioned, spatio-temporal variations of phenologies in a changing climate are very difficult to accurately characterize, and a hard time constraint assumption may be too restrictive.
The alignment between two sequences is computed only for the yellow cells of the matrix, reducing the number of computations necessary. Taken from Csillik et al., 2018
  • A “soft” time weight: The time weight does not narrow down the computation window in the dissimilarity matrix, but rather assigns a weight to each alignment computation. A logistic distribution of this weight is more akin to the time constraint above, where a small temporal cost is assigned for the initial n days, with a sharp increase in the temporal cost after a certain time period (approx. 70 days in the below example). The linear weight does not mimic well the behavior of natural phenomena like vegetation or crop growth, and is ill-suited for the case of land cover and crop type classification. Therefore, we will prefer a logistic weight for environmental monitoring applications like crop type or land cover classification.
Linear and logistic time-weight. The logistic weight in this case has midpoint β = 100 days and steepness α = 0.1. Taken from Maus et al., 2016
  • The distance calculation method: As alternatives to the simple Euclidean distance calculation between pairs of points applied in the time-constrained and time-weighted cases, there are cases for using alternative distance metrics, such as the the vector distance from Teke et al., 2019, where the angular distance is used instead of the Euclidean distance. This change in distance calculation and time weight/constraint are not mutually exclusive, and any combination of these can be technically implemented.

A Land Cover Classification Use Case for Sudan

To provide a bit of concreteness to this theoretical blabber, we are applying DTW with SoilWatch as part of an effort to design, plan and undertake climate-financed forest and landscape restoration at scale. Bear with me for this context intermezzo, where I try to provide the why and the how.

In the Sudanese context, an institutional barrier effectively prevented restoration efforts. Rainfed cropland and rangelands are intertwined in a complex mosaic of land tenure, where legislations allow the expansion of rainfed cropland over unclaimed land. The unregistered land act of 1970, combined with the abolition of a native administration in 1971 provided the state the right to control communal land. The idea behind the act was to provide the government with a tool to facilitate the acquisition of large tracts of land for mechanized agricultural schemes as the promise of being the food basket of the Middle East. After exhausting all agricultural land deemed agro-climatologically suitable, mechanized agriculture schemes have largely encroached in semi-arid regions, at the expanse of the more climate-adapted rangelands.

Land abandonment and long-term fallowing are common practices after a few years of low yield harvesting due to poor economic viability of growing crops in semi-arid areas, or as the result of pre-emptive land clearing with the goal to grow crops in the future. On top of the soil impoverishment resulting from rangeland to cropland conversion, fields or watering holes may end up being fenced off. Abandoned or fallowed cropland, which could be suitable for rangeland restoration, remain off-limit to pastoralists and their herds, who are squeezed into smaller and smaller areas dedicated for grazing. The increased length of grazing period on a smaller acreage leads to over-grazing, in turn leading to soil erosion and rangeland degradation. Unfortunately, there is little to no spatially-explicit, quantitative information about the extent and distribution of these phenomena. Generating this data, starting with a land use/cover map mapping the extent of rangelands and abandoned cropland, could be the basis for a debate to overturn outdated land tenure legislations and create “fertile ground” for the design and planning of large-scale restoration projects in the most stricken areas of Sudan, like Darfur and Butana.

No longer good for annual crops, no longer good for perennial grasses: Mechanized Agriculture in the Butana Communal Rangelands have turned the area into a desolate wasteland. Courtesy of: SoilWatch

In the case of Darfur, rangeland to cropland conversion is expedited by overgrazing rather by the economic opportunity of mechanized agriculture expansion over (productive) rangelands. When the range is seriously degraded, it no longer produces fodder, and it is then converted to (very low yielding) sorghum production for a few years. In both cases, the outcome is the same, a trail of barren wasteland in areas which once were extensive tree Savannahs (Mistry & Beradi, 2014).

Degraded rangelands in Darfur being ploughed for a few years of opportunistic sorghum farming, destroying the little that is left of the perennial grasses’ root systems holding the soil together, resulting in barren land void of any life forms. Courtesy of: SoilWatch

Now, you may wonder what this has to do with DTW? Sudan has come out of a tumultuous period during which access to the country was not facilitated. As a result, land tenure, amongst other socio-economic and environmental characteristics of rural areas, has been little studied. While this may mean lack of information about pastoralist routes for an anthropologist, this means low availability of land cover/land use maps and reference data for a geospatial data scientist.

Let’s run a quick check:

  • Are we in a situation of data scarcity? ✔️
  • Are we in a situation of low contextual/local knowledge availability? ✔️

A perfect match for DTW, so let’s give it a shot to produce a land cover map for the Sennar District, an area known for its diminishing rangelands at the expense of cropland. Identifying rangeland extent, as well as the rate of active vs abandoned croplands, year after year would be a tremendously valuable piece of evidence to convince policymakers to review public policy in place with the aim of supporting more sustainable land use in semi-arid areas.

Bringing the algorithm to the data

I wanted the approach to be scalable, so decided to bring DTW closer to the data by implementing it in Google Earth Engine (GEE). After some years of the algorithm being requested on the GEE developer forum, GEE users will be delighted to find the code here. It allows multi-dimensional inputs, meaning that multiple spectral bands or indices can be provided across multiple sensor types. In this example, I have used 7 bands: B2, B3, B11, B12, NDVI from Sentinel-2, and VV, VH from Sentinel-1. The former provide critical phenological information, while the latter provide structural information about the surfaces observed. They are combined through a simple Euclidean distance, as in Csillik et al., 2018:

For combining multi-band DTW outputs, the Euclidean distance is used to compare the elements of time series where aif and bjf are elements of the feature f of multi-dimensional sequences A and B, with F number of features.

A total of 41 reference data points were collected through photo-interpretation of recent Sentinel-2 imagery and high-resolution orthophotos, with the guidance of data layers from the Sudanese Geospatial Data Portal. Those reference points were collected across the following classes:

  • Built-up.
  • Water.
  • Trees, combining both natural and planted.
  • Wetland.
  • Bare Soil. This may be rocky outcrops, or degraded land to the extent where nothing grows back (see picture below).
  • Rangelands. They cannot be effectively separated from abandoned or long-term fallowed land based on a single-year DTW output, but can be differentiated by using previous years’ outputs to identify a crop phenology prior to fallowing.
  • Active Cropland in 2020, whether rainfed or irrigated.
  • Abandoned/long-term fallowed cropland in 2020. A field is considered abandoned if fallowed for more than 3 years in a row, using multi-temporal land cover output.
  • Short-term fallow cropland in 2020. Fallowed for less than 3 years in a row, using multi-temporal land cover output.
Serious soil erosion in West Darfur, where opportunistic grasses grow around bare soil patches. Courtesy of: SoilWatch

The time series covered the period of July until December (30 days temporally-aggregated composites, 1 for each month), as the majority of the photo-synthetically active period follows the arrival of the rainy season in July, and using data prior to this period would not greatly improve the discriminative power of the algorithm, while increasing computational time. Moreover, a time-weighted DTW approach was adopted, with β = 50 days and steepness α = 0.1.

A crazy kaleidoscope of phenologies between July and December (one frame for each year from 2017 to 2020) in the Sennar District of Sudan. The false RGB composites was produced with the following Sentinel-2 bands: Red: December NDVI, G: October NDVI, B: August NDVI
The all-class combined dissimilarity score, ranging from 0 (green) to 20000 (red), is helpful to understand areas that are poorly represented in the reference dataset. This can guide the collection of new samples in those given locations, or the filtering of the land cover map using a dissimilarity score threshold to mask areas of low classification certainty.

The dissimilarity score is also produced as an additional layer by the algorithm, a great way to identify areas un(der-)represented in the reference data. The red areas likely correspond to a type of crop which is not captured in the reference data I collected through photo-interpretation. In this particular case of low dissimilarity score, DTW still labels those areas as cropland, as the pattern still best aligns with the crop reference signatures. Using a threshold on the score, one could abstain from classifying areas of low confidence.

I leveraged the availability of Sentinel-1 and -2 data going back to 2016 to produce a land cover map for every year between 2017 and 2020, using the reference land cover patterns extracted for 2020. DTW is surprisingly robust when performing cross-year classification (Teke et al., 2019, Dadi, 2019). In order to dissociate actual rangelands from short-term and long-term fallowed croplands, temporal information is necessary to know how many years in a row an area considered to be cropland has been fallowed. Typically, land fallowed for more than 3 consecutive years is considered “free” by the Sudanese government, meaning it could theoretically be returned to rangelands if no use is made of that land. Farmers are aware of this ruling, and they may artificially clear their field to secure the fields for potential future use, even though they are currently not cultivating them (IUCN, 2006). Our analysis should be able to identify the croplands which are absent of any form of crop phenology during the growing season for more than 3 consecutive years, and label them as “abandoned”.

The land cover classification using the Time-Weighted DTW approach. Most of rangelands and disputed croplands (green: rangelands, orange: short-term fallowed croplands and turquoise: long-term fallowed cropland) are located in the north and east of Sennar District. Our approach provides fine-grained (10m resolution, the native resolution of Sentinel-2) information about the distribution and status of rangelands and cropland across the area.
Land Cover Class Distribution in the Sennar District

Even though we have limited information about the progressive conversion of rangelands to cropland (besides a land cover map of unknown year from the Sennar Information Centre), the first thing we notice is the overwhelming prevalence of active cropland. Large parts of Sennar are suitable for rainfed agriculture; the marginal areas in the north and the east where rangelands are found are areas where the agro-pastoral tensions are high. We also notice an almost total conversion of forest to cropland (another issue central to landscape restoration in Sudan) in the south-east, and a drastic reduction in rangelands extent in the far east of the district.

Historical land use map for Sennar, as provided by Sennar Information Centre. Unfortunately, I was not able to find out the date of map production.

No formal validation was carried out, besides exploring the output visually through photo-interpretation of recent imagery, and cross-referencing the information using ancillary datasets (such as the map above). A field-collected dataset is in the pipeline to carry out formal validation of our analysis, and provide some legitimacy to our results for maximal policy impact.

Conclusion

Dynamic Time Warping applied with a time weight is an incredibly flexible approach to satellite image time series classification, due to its ability to perform in data-scarce conditions — which, let’s face it, is often the case — , as well as its ability to deal with heterogeneous data sources across years.

The low reference data volumes required makes it a highly operational mapping approach, where the reference data, as well as the classification outputs, can easily be moderated, curated and iterated upon to improve the results’ accuracy and fitness-for-purpose.

We were able to provide an up-to-date overview of the state of rangelands and cropland in Sennar with hardly any reference data, and provide spatially-explicit and quantitative information about the documented causes of exacerbated environmental degradation in the semi-arid regions of Sudan.

I encourage you try to out my DTW implementation in GEE and provide feedback/improvements!

You can reproduce and/or visualize the datasets introduced in this post using this link.

Background information on Dynamic Time Warping:

  • Petit-Jean et al. 2012’s first paper on applying DTW to satellite image time series
  • Maus et al. 2016’s paper on time-weighted DTW applied to land cover classification
  • Ovidiu Csillik & Mariana Belgiu’s research on applying dynamic time warping to crop type classification:

Background on the state of land tenure and land grabbing in Sudan:

  • An excellent overview by IUCN of Pastoralism and Conservation in the Sudan
  • Hussein M. Sulieman’s research in Gedaref state:

Other Credits:

  • SoilWatch and team with boots on the ground for providing valuable contextual information and photos.
  • My consultancy work at FAO for TCP/AFG/3706 and lorenzo de simone have contributed a great deal in shaping the implementation of DTW in GEE, where it is applied in the context of crop type mapping.

Like this content? → Follow us on Twitter for more

--

--

William Ouellette
SoilWatch

Regenerating degraded 🌍 and eroded soils 🌱, one pixel at a time🛰️