Soil Erosion Watch — A Bootstrapped Approach to Identify the World’s Degrading Soils

A deep dive into the underbelly of Earth Observation (EO) environmental monitoring applications: Soils. And why they matter in the fight against climate change

Published in

SoilWatch

14 min readMay 17, 2021

The advent of cloud-backed EO data processing has revolutionized how satellite data can be integrated into services and applications. SoilWatch identified a problem in how the establishment of baselines and the carrying out of monitoring, reporting and verification (MRV) activities were done in environmental restoration efforts.

A financial tool to carry out environmental restoration projects that specifically aim at sequestering carbon is the Voluntary Carbon Market (VCM), which is both a problem and a solution in the fight against environmental degradation and climate change. The VCM is riddled with (fixable) issues, as discussed in this post. The one I want to dwell on is the fact that the availability of spatio-temporal, quantitative information on the climate impact and effectiveness of Nature-based Climate Solutions is cruelly lacking. Global and regional climatological and agro-ecological studies profuse, but there is little information to be found to support appropriate baseline assessment or to monitor land management practices at project-level. This requires high-resolution global environmental data, which today only comes in the form of open-source satellite imagery and its derived products.

In particular, I want to rave about the availability of Copernicus and NASA satellite missions such as Sentinel-2 and Landsat (amongst many others!), which offer the opportunity to fix (at least in part) the VCM by making rigorous and continuous monitoring of Nature-based Climate Solutions not only a possibility, but a necessity, in order to ensure positive environmental externality as well as guaranteed additionality for all carbon-sequestering projects.

As a token of our commitment to providing more actionable data for the VCM, we developed an App using the Google Earth Engine (GEE), aimed at providing an estimation of annual soil loss rate for any given sub-administration in the world. You heard that right!

Continue reading to know how we did it, and how to access the data. If your fingers are jittering to try it before reading about the methodology applied, you can directly access the GUI here.

Identifying Bare Soil in Space and Time

First of all, we are interested in land that is bare at least once in a given time period. In the case of agricultural areas, this translates into areas under tilling or bare fallow, and for other land uses, an area where vegetation only grows very sparsely. The assumption is that surfaces that are always covered by living (and to some extent, dead) biomass are much less prone to erosion, so we want to draw attention to those (intermittent) bare surfaces.

The Geospatial Soil Sensing System (GEOS3), a one size-fits-all approach to extract the world’s bare surfaces from Demattê et al., 2020, was used. We used Sentinel-2 imagery as input, with s2cloudless cloud masking, and @jstnbraaten’s suggestion to mask cloud shadows on top of that. This does not completely eradicate the problem of clouds, but is good enough, as long as you don’t venture to equatorial rainforests, where occurrence of bare soil is non-existent anyway!

I lied. Bare soil can be observed occasionally in the middle of the rainforest, in the form of mineral-rich clearings, also referred to as “Bais”, and which, contrary to bare soil in farmland and rangeland, has tremendous ecosystem value (read here why). This one was spotted in the Sangha-Mbaéré district of the Central African Republic. Not even the Google satellite mosaics generated from years of data manage to be fully cloud-free in that part of the World…

Bare soil Synthetic Composite using GEOS3 for the year 2020 in the Nakuru County of Kenya. These are the pixels we are interested in, and no cloudy pixels in sight ☀️😎

Modelling Soil Erosion Hazard

Tackling soil erosion modelling at the global scale in a scientifically rigorous way is challenging to say the least, but with the advent of high temporal frequency, open EO data like Sentinel-2, empirical modelling of such phenomena have become a whole lot more reliable. The widely used Universal Soil Loss Equation, initially introduced by Wischmeier & Smith, 1978, looks like this:

However, the relooking of the equation by Karydas & Panagos, 2018 helps its interpretation:

With the erosion-inducing factors as numerator factors, and the erosion-controlling factors as denominator factors. Here’s what they correspond to (if the maths do not bore you):

R — Rainfall erosivity factor (MJ.mm.ha-1.h-1.yr-1)

Rainfall erosivity accounts for the combined effect of rainfall duration, magnitude and intensity, as well as taking into account the frequency of erosive events over a longer time period. The global rainfall erosivity dataset produced by Panagos et al., 2017 was used for this purpose, and can be downloaded on the ESDAC portal.

K — Soil erodibility factor (t.ha.h.MJ-1.mm−1)

The K-factor expresses the susceptibility of a soil to erode, is related to soil properties such as organic matter content, soil texture, soil structure and permeability. The global equation for the soil erodibility factor was taken from Renard et al., 1997:

With M being:

To derive the various parameters, we expanded on Panagos et al., 2014’s European Soil Erodibility methodology, but using the plethora of recent open global soil datasets. SoilGrids and the Africa Soil and Agronomy Datacube datasets were used for the sand Mvfs, silt Msilt, clay mc and organic matter OM (using a 1.72 factor to convert soil organic carbon to soil organic matter, as predicated by SoilQuality Australia) contents. The permeability classes p were derived from the saturated hydraulic conductivity dataset from HiHydroSoil v2.0. As a result of this exercise, the data is now also referenced on awesome-gee-community-datasets. Finally, the soil structure classes s were equated to varying conditions of packing density and soil texture classes, as predicated by Jones et al. 2003.

LS — Slope length (L) and steepness factor (S) (dimensionless)

The combined LS-factor describes the effect of topography on soil erosion. We applied Panagos et al. 2015's approach to derive it:

θ: gradient of the slope in degrees

β: ratio of rill to inter-rill erosion

Rills represent the first step of water erosion on slopes, and can eventually lead to more severe erosion in the form of gullies

Ai,j−in: Contributing area at the inlet of the grid cell (i,j) in m²

D: Grid cell size in m (30m in our case).

Slope length is far from trivial to retrieve, especially if not working at watershed or catchment level. Luckily, Dai Yamazaki has provided the MERIT Hydro global hydrography datasets through Google Earth Engine, from which we can use the upstream drainage area information to approximate the upslope contributing area Ai,j−in. MERIT Hydro is available at 90m resolution, so we up-sampled it to 30m to match the resolution of the other factors derived from the latest ALOS global DSM dataset, v3.2. This is also necessary because estimating slope length using a 90m resolution DEM would lead to overestimation of the topography effect of soil erosion. A smaller grid cell size means a smaller per-cell contributing area, which is more realistic in the case of soil erosion flow. Indeed, the upslope contributing area in soil loss modelling is typically capped at around 4000 m², which is half that of a 90m grid cell (8100 m²). Up-sampling to 30m brings a single grid cell contribution to 900m², ensuring the slope effect is not over-estimated.

The three above factors R, K and LS are characterized by :

The covariates they are derived from are of relatively low resolution, ranging from 30m (Digital Elevation Model) to 1km (Rainfall Erosivity).
Their temporal variability is limited to long-term trends (multi-year). So unless, through some divine powers, we would be able to control global rainfall patterns in clacks of fingers (R), or flip the full soil column upside down (K), or even move mountains the literal sense (LS), the estimates of these factors should hold for a few years. In fact, the R-factor used is essentially based on rainfall records from the 2000–2010 period, and the K-factor on covariates representative of the year 2017 at best. It isn’t ideal, but is the best available to derive this global baseline estimation.

The mountains that Mr Montague moves are metaphorical, and does not have an impact on the LS-factor estimation

Therefore, soil loss on shorter time scales is mostly governed by the remaining two factors V and L. I don’t know what else could model this at intra-annual scale and at high enough resolution to allow field-scale information retrieval, other than Sentinel-2 time series!

V — Vegetation factor (dimensionless)

The V-factor, which is a spin-off of the C-factor in the original RUSLE equation, is a more logical way of looking at the (positive) impact of cover vegetation in mitigating soil erosion. Indeed, while the previous 3 factors were erosion-inducing, this one is erosion-controlling.

Most of the land management practices talked about under the umbrella of regenerative agriculture aim at retaining vegetation on top of the soil, namely through cover cropping, crop residue, no/reduced-till and agroforestry. Cover vegetation (above ground biomass) and its root system (below ground biomass) protect the soil from water and wind erosion in incomparable ways, and therefore determining whether and how much vegetation is present on top of cropland and grasslands over time is a great way to estimate this factor, which we did through the following equation:

BSf: Bare soil Frequency in the range [0,1], calculated by the number of bare soil satellite observations, divided by the total number of cloud free observations in the monitoring period. 1-BSf is by deduction the vegetation cover frequency.

FCoverm: Monthly composites of fraction of green vegetation cover, calculated based on the SNAP methodology, and kindly contributed by Kristof Van Tricht in the SentinelHub custom scripts repository.

Wischmeier & Smith 1978 established from experimental data that the relationship between vegetation cover and soil erosion is exponential. Since we get an avalanche of data from Sentinel-2 allowing us to generate a monthly composites FCover, we decided to compute the V factor monthly (Vm). The vegetation cover frequency factor (1 - BSf) is computed for the entire monitoring period, and is used as a proxy for the land use class. In the original equation from Karydas & Panagos, 2018, the land cover class for agricultural land uses was estimated to range between 5 (pastures on slopes or poorly managed cropland) to 8 (well-managed meadow or field). Therefore, values below 5 and above 8 were clamped to that range, as we are only targeting agricultural land uses (well-managed forest may exceed 8 for instance). An imperviousness factor is also used in the original equation, but we used the Facebook HSRL and the JRC GHSL datasets to mask out all pixels which may affected by impervious surfaces, which allowed us to drop that factor. To obtain V, the monthly Vm factors are averaged over the monitoring period.

L — Landscape factor (dimensionless)

The L-factor is a factor incorporating the erosion-controlling effect of landscape features, such linear alterations to interrupt rainfall runoff in the form of physical field boundaries or terraces. Karydas & Panagos, 2018 model this effect in the following way:

Where Sf is the result of a convolution using a 3x3 anisotropic edge detection Sobel filter on Sentinel-2’s NIR band (B8). The convolution output is normalized by dividing by the maximum potential reflectance value of the B8 band DNmax, which in the case of Sentinel-2 on GEE is 10000. It is doing a good job at identifying linear (boundaries, terraces) and punctual (trees/bushes in field) features, even on 10m imagery (see screenshot below).

Result of the 3x3 pixell (30m) Sobel filter, highlighting within-field punctual and linear features, which has a positive impact on soil erosion

A — Average annual soil erosion rate in soil mass per unit area per year (t.ha−1.year−1)

The metric we are after. This is a ballpark approximation for the areas where bare soil is exposed, and is the best we can do considering the global approach applied and the open datasets available to do so.

The Google Earth Engine App where this global soil loss equation was implemented, and bearing the bold name of “Global Soil Erosion Watch”, can be found here, and its github repository here.

Find below a quick explanation of the interface components.

Layers

Bare Soil RGB composite: This is the layer resulting from the GEOS3 bare soil synthetic data generator from Demattê et al., 2020. The default data displayed is that of Kenya for the year 2020.

Bare Soil Frequency BSf: The number of times the cloud-free sentinel-2 observations were determined to be bare soil by GEOS3 in the time series, divided by the total number of cloud-free observations. Since bare soil most likely occurs in the drier season when cloud cover is less likely, this frequency may be over-estimated. This is more likely to be the case in sub-tropical regions with a pronounced rainy season.

The sustainability factor S: This factor combines the effect of both the cover vegetation factor V and the landscape factor L, and plots its inverse (S = 1 / (V*L)), so as to be in the range [0–1], with 0 being favorable to sustainably managed land, and 1 being unfavorable. Many of the areas with a high sustainability factor value may be open mining pits or rocky outcrops, which may impact the aggregated statistics on soil loss for a given jurisdiction. To mitigate this effect, values above 0.9 are masked as these correspond to areas of total vegetation absence and imperviousness, and most likely do not correspond to cropland or grassland soils.

Soil Erosion Hazard A: This layer is the result of the revised RUSLE equation presented in this post. It is the best possible approximation we can make of annual soil loss using an empirical model like RUSLE in combination with global datasets made available by the open-source community. The novelty here is the integration of high-resolution multi-temporal Sentinel-2 covariates, namely Sentinel-2 derived Sobel convolution Sf and the temporal, bare soil frequency-weighted (BSf) fraction of green vegetation cover FCoverm.

Buttons

The right panel offers a list of buttons to generate custom output for any world sub-administrative areas.

The first button is used to select the country, the second the first administrative level within that country, and the third the second administrative level. This is necessary to select a small enough area to compute the layers on-the-fly in GEE. By small, we recommend an area no larger than 10000 km² to retrieve information on-the-fly, as larger may incur a very long wait, or worse, no results at all 😱

The fourth button corresponds to the time period for which to compute the RUSLE equation. It is advised to use a full year, so as to incorporate all seasonal landscape conditions into the analysis. Time intervals as short as a month are available in case 12 months takes too long to generate.

The fifth button allows you to select the year, starting in 2016 (first full year of Sentinel-2 data availability globally), and running up to 2021. 2021 will obviously fail if a period in the future is chosen 🔮

The sixth button allows you to select the specific months for which to perform the analysis. This is limited to select the full year (Jan-Dec), a half-year (Jan-Jun or Jul-Dec) or quarters (Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec).

In the example below, we generated the data for the Al-Fasher district in Northern Darfur, where it has long been identified that the phenomenon of desertification, a specific-case of soil erosion in arid and semi-arid areas induced by climate change and anthropogenic activities, is fueling conflicts.

Global Charts

The three global charts to the right illustrate the following:

The proportion of the total area observed bare/non-bare in the selected time period. In the case of the arid area of Al-Fasher, almost every pixel in the image has been observed bare, and the 1.7% exception are likely sparsely located woody perennials.
The bare soil frequency histogram, summarizing the distribution of bare soil pixel’s bare frequency. Unsurprisingly, the distribution is ramping up towards the high frequency values, as the majority of the pixels observed correspond to land degraded beyond arable.
The annual soil loss rate histogram. The distribution was normalized using the natural logarithm function because the majority of the soil erosion rates typically fall between 0 and 10 t.ha-1.year-1, with a runaway effects of a few values peaking at 30+ t.ha-1.year-1, which are invisible in a plot if not normalized. The further to the right the peak of the distribution is, the more significant the soil loss hazard is in the area. The skewness of the distribution towards the right is also an important indicator of soil erosion hazard. All values below 0 represent minor erosion (< 1 t.ha-1.year-1), so if the histogram barely exceeds 0 on the x-axis, the region is generally speaking not prone to erosion.

On-Draw FCover plots

A nice feature of the App is that it allows the user to draw a rectangle, polygon or a point to plot an FCover profile (green curve) calculated from Sentinel-2 at the corresponding location. If a polygon or rectangle is drawn, the mean value corresponding to the polygon is returned. Moreover, the dates (temporally aggregated in 15 days time intervals) flagged as bare soils are plotted on the graph as brown dots, and their values correspond to the spatially and temporally averaged value of those observed bare soil pixels.

A few additional figures are provided like slope steepness, bare soil frequency and the annual soil loss rate of the clicked pixel. The best part is, you can click as many times as you want and keep plotting! 📈

Is the data any good?

The lack of long-term field scale measurements does not allow us to directly validate the generated RUSLE output. Even cross-comparison with other soil erosion assessments makes little sense here as the granularity of the produced output far exceeds what has been produced until now.

Considering this tool brings soil loss estimations to the hands of many for any location in the world, I encourage the community to use it and abuse it, and maybe even cross-reference the information with some field measurements and gauge the usefulness of the tool themselves.

The next step is to look beyond the indiscriminate measurement of soil loss by diving deep into the specific issues of semi-arid landscape restoration efforts and illustrating how SoilWatch’s EO capabilities can assist project developers in designing and implementing rigorous monitoring and reporting protocols. Stay tuned!

Invaluable resources that supported this work:

The Joint Research Centre’s Soil Data Centre (ESDAC), a gold mine for soil research and information, research activities I happily pay taxes for!
The Copernicus programme from the EU, because I wouldn’t have a job if there were no EO missions down-linking free and open data!
@jdbcode’s repositories and contributions to GEE’s community content
@Samapriya’s awesome-gee-community-dataset.
SoilGrids and OpenLandMap for the great soil datasets they provide.
Consult the github repository for more credits and app-related information.

Like this content? → Follow us on Twitter for more