Dynamic soil information at farm scale based on Machine Learning and EO data: building an Open Soil Data Cube for Europe
Prepared by: Tom Hengl (OpenGeoHub / EnvirometriX), Leandro Parente (OpenGeoHub / EnvirometriX), Ichsani Wheeler (OpenGeoHub / EnvirometriX) and Carmelo Bonannella (OpenGeoHub)
Soils symbolize fertility and are a foundation of our civilization. There is an increasing focus on soils due to their significant ecosystem services — from growing crops, to filtering water and providing building material. Soils are also one of the potential carbon pools that could significantly help decrease CO2 in the atmosphere. The current systems in place for monitoring soil properties — physical, chemical, and biological characteristics — along with measures of soil loss and degradation, do not provide an accurate picture of changes in the soil resource over time. To close that gap, OpenGeoHub, EnvirometriX and partners are building Open Soil Data Cube-type solutions utilizing Ensemble Machine Learning and massive Earth Observation data to generate predictions for billions of pixels. Find out how to access and use these data and contribute to this initiative!
Soils need to be monitored because they are, in principle, fragile ecosystems, and losing them can be extremely costly. For example, scientists estimate that it takes about 100 years to produce about 3 cm of topsoil (Stockmann et al., 2014), a time span we can no longer afford. Major drivers of soil degradation are usually unsustainable land use and population pressure (Borrelli et al., 2017). The common threats to soil health include:
- Loss of soil organic matter;
- Loss of biodiversity;
- Loss of soil through erosion;
- Soil compaction and soil pollution by heavy metals and similar contaminants.
In the last 150 years, half of the topsoil on the planet has been degraded due to erosion, compaction, desertification, acidification, and loss of soil organic carbon and primary nutrients; because of changes in global land use and climate, soil erosion might increase by up to 60% in the next 30 years (Borrelli et al., 2017). It is somewhat paradoxical that, on the one hand, soils are one of the solutions for mitigating greenhouse gas emissions (Sha et al., 2022), while on the other hand 60–70% of EU soils are unhealthy, mainly because of unsustainable management practices. A team in the USA has recently estimated that the continental USA may lose 1.8 petagrams of soil organic carbon under climate change by 2100 (Gautam et al., 2022).
Key soil health indicators
A practical way to follow soil dynamics is to measure and monitor concrete soil health indicators, specifically ones that define the state of the soil in relation to its ecosystem services. Currently, eight soil health indicators have been proposed for assessing the objectives of the European Mission Board (MB) for Soil Health and Food: (1) Presence of soil pollutants, excess nutrients and salts; (2) Soil organic carbon stock; (3) Soil structure including soil bulk density and absence of soil sealing and erosion; (4) Soil biodiversity; (5) Soil nutrients and acidity (pH); (6) Vegetation cover; (7) Landscape heterogeneity, and (8) Forest cover. The Open Soil Data Cube described in this article has been produced specifically to allow easy and robust validation of soil health indicators, especially where national reference datasets do not exist or are not publicly available. We will keep adding new key soil health indicators and updating existing ones to allow for trend analysis at near-farm scale in the years to come.
Open Soil Data Cube for Continental Europe
Within the Geo-harmonizer and AgriCaptureCO2.eu projects, we have developed a methodology for mapping and monitoring soil nutrients using cutting-edge machine learning methods and state-of-the-art publicly available Earth Observation data (GLAD Landsat; Sentinel-2). We are particularly interested in producing spatiotemporal predictions of key soil variables such as soil organic carbon, pH, soil nutrients, clay and sand content and related soil hydraulic properties. We typically generate annual predictions of soil variables so that these can be used for time-series analysis and for detecting key drivers of changes in soil. We have previously mapped dynamic land cover and vegetation for the EU at fine spatial resolution (30 m), and this has shown multiple advantages over purely spatial approaches (Witjes et al., 2022). We are currently publishing multiple outputs for land cover, forest tree species, soils and natural hazards (fires, floods) using a consistent spatiotemporal machine learning system, with all outputs published via https://EcoDataCube.eu.
Although a number of pan-EU predictions of soil properties already exist (Tóth et al., 2017; Ballabio et al., 2019), these are still based on relatively coarse resolutions (250 m) and focus on the spatial component of variation only. Numerous studies have shown that soil properties change dynamically, often significantly, primarily due to changes in land use, rainfall and climate in general. For example, soil pH (Huang et al., 2022) and soil organic carbon (Knotters et al., 2022) have changed significantly in the last 40+ years, mainly due to land use intensification, conversion of natural wetlands and similar. Recent initiatives by UNCCD / IPCC list loss of soil carbon as one of the key indicators of Land Degradation Neutrality.
To help provide seamless dynamic soil information across the European continent, we have generated spacetime predictions of key soil properties at 30 m spatial resolution, for four standard depths (0, 30, 60 and 100 cm) and for six periods: (1) 2000–2002, (2) 2002–2006, (3) 2006–2010, (4) 2010–2014, (5) 2014–2018 and (6) 2018–2020, using 3D+T Ensemble Machine Learning and a large stack of EO images. Our initial models for predicting soil variables were fitted using spatiotemporal overlays, and the cross-validation results show that these models are significant. Several originally prepared 30 m resolution covariates (especially Landsat products: Red, NIR and SWIR bands, NDVI, SAVI) correlate significantly with dynamic soil pH and carbon content and hence can be used to provide predictions at unprecedented levels of detail. A scientific publication describing the methodological steps and accuracy results is pending. Subscribe to our channels to receive updates.
Open Soil Data Cube technical details
General characteristics of the Open Soil Data Cube for Europe:
- Consistent input training points (e.g. LUCAS-soil, GEMAS and national soil profile databases, most importantly BZE LW German national soil profile DB) that have been quality controlled and can be used to produce unbiased estimates of soil properties in spacetime;
- Covariate layers (Landsat seasonal images, vegetation indices, terrain, lithological and climatic layers) prepared as complete consistent gap-filled Cloud-Optimized GeoTIFFs;
- Prediction errors are provided at pixel level so that further uses for improved sampling and local modeling are possible;
- All processing steps are fully documented, fully automated and can be used to update predictions as new point data arrive (e.g. LUCAS 2022);
- Data (predictions) are available via zenodo.org and as Cloud-Optimized GeoTIFFs on a Wasabi S3 service, so that users can use them directly as an analysis-ready geospatial database.
Targeted / intended uses of the Open Soil Data Cube for Europe include:
- Land restoration and regenerative agriculture projects;
- An open platform for soil organic carbon monitoring;
- Time-series analysis of trends in soil properties and detection of positive and negative drivers of change;
- Uncertainty guided sampling to help improve predictions at local / regional levels;
Accessing the Open Soil Data Cube
To access the maps mentioned above, visit the https://EcoDataCube.eu data portal or use the STAC. Maps are distributed under the CC BY-SA 4.0 license and are free to download. Input training points, code and instructions used to prepare the maps will be made available in the coming months.
A short description of the currently available soil properties:
- log.oc = log-transformed soil organic carbon content; to back-transform to g/kg use exp(x/10)-1;
- ph.h2o = soil pH in H2O;
- sand.tot = sand content [percent];
- clay.tot = clay content [percent];
- db_od = oven-dry bulk density [10 kg/m³]; multiply by 10 to obtain kg/m³;
Estimation type: m = mean value; md = prediction error;
Soil properties were predicted at fixed depths:
- Surface soil = s0..0cm,
- Subsoil 1 = s30..30cm,
- Subsoil 2 = s60..60cm,
- Subsoil 3 = s100..100cm.
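To produce estimates for depth intervals, e.g. 0–30 cm or 0–100 cm, it is best to use the trapezoidal rule, i.e. to average the predictions at neighboring standard depths and weight each layer by its thickness.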
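As an illustration of the trapezoidal rule mentioned above, here is a minimal Python sketch. The function name and the example pH values are hypothetical and not part of the data cube; the sketch only assumes that predictions are available at the four standard depths (0, 30, 60 and 100 cm) and that the interval limits coincide with these depths.

```python
def depth_interval_mean(depths_cm, values, upper_cm, lower_cm):
    """Mean of a soil property over [upper_cm, lower_cm] by the trapezoidal rule.

    depths_cm: increasing list of standard depths, e.g. [0, 30, 60, 100]
    values:    predictions at those depths, in the same order
    Both interval limits must coincide with entries of depths_cm.
    """
    if len(depths_cm) != len(values):
        raise ValueError("depths_cm and values must have the same length")
    if upper_cm not in depths_cm or lower_cm not in depths_cm:
        raise ValueError("interval limits must be standard depths")
    if upper_cm >= lower_cm:
        raise ValueError("upper_cm must be shallower than lower_cm")

    start = depths_cm.index(upper_cm)
    stop = depths_cm.index(lower_cm)
    area = 0.0
    for i in range(start, stop):
        thickness = depths_cm[i + 1] - depths_cm[i]
        # Area of one trapezoid: mean of the two bounding values times thickness
        area += 0.5 * (values[i] + values[i + 1]) * thickness
    # Dividing the integral by the interval thickness gives the mean value
    return area / (lower_cm - upper_cm)


# Hypothetical pH predictions for a single pixel at 0, 30, 60 and 100 cm
depths = [0, 30, 60, 100]
ph = [6.0, 6.4, 6.8, 7.0]
print(depth_interval_mean(depths, ph, 0, 30))
print(depth_interval_mean(depths, ph, 0, 100))
```

For log-transformed layers such as log.oc it is probably safer to back-transform the values first and only then average over depth; this is an assumption of the sketch, not a documented recommendation.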
Periods: 2000 (2000–2003), 2004 (2004–2007), 2008 (2008–2011), 2012 (2012–2015), 2016 (2016–2019), 2020;
Predictions are based on the 3D Ensemble Machine Learning framework, as implemented in the R environment for statistical computing (Hengl & MacMillan, 2019; Hengl et al., 2021). For each pixel we provide prediction errors as 1 standard deviation in either the log scale or the original variable scale. To back-transform the log.oc maps use the formula exp(x/10)-1, which gives organic carbon in g/kg; dividing by 10 gives percent. These are examples of back-transformed values:
- log.oc = 15 → 0.3% SOC;
- log.oc = 20 → 0.6% SOC;
- log.oc = 25 → 1.1% SOC;
- log.oc = 30 → 1.9% SOC;
- log.oc = 35 → 3.2% SOC;
- log.oc = 40 → 5.4% SOC;
- log.oc = 50 → 14.7% SOC;
The bulk density maps are provided in units of 10 kg/m³ to reduce the total data size; to convert values to kg/m³ multiply by 10, e.g. 120 = 1200 kg/m³ = 1.2 t/m³.
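The two conversions described above can be sketched in a few lines of Python. The function names are hypothetical; the formulas are the ones given in the text (exp(x/10)-1 for log.oc, multiplication by 10 for db_od), and the step from g/kg to percent is a plain division by 10.

```python
import math


def log_oc_to_g_per_kg(x):
    """Back-transform a log.oc map value to organic carbon in g/kg."""
    return math.exp(x / 10.0) - 1.0


def log_oc_to_percent(x):
    """Back-transform a log.oc map value to organic carbon in percent (10 g/kg = 1%)."""
    return log_oc_to_g_per_kg(x) / 10.0


def db_od_to_kg_per_m3(x):
    """Convert a db_od map value (stored in 10 kg/m3) to kg/m3."""
    return x * 10.0


# Print a small lookup table for typical map values
for code in (15, 20, 25, 30, 35):
    print(code, round(log_oc_to_percent(code), 1))

print(db_od_to_kg_per_m3(120))
```

The same functions can be applied element-wise to whole raster arrays once a layer has been read into memory, provided the no-data values are masked first.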
How can you access and view the maps? Because the layers are Cloud-Optimized GeoTIFFs, you can simply open the image URL in QGIS without having to download the large TIFF files.
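The same idea works outside QGIS. The sketch below is a hedged illustration, not an official access recipe: it assumes the rasterio package is installed, uses GDAL's /vsicurl/ prefix to read over HTTP, and the helper names and the example URL are hypothetical placeholders (actual layer URLs are listed on the data portal). The bounding box must be given in the coordinate reference system of the raster.

```python
def vsicurl_path(url):
    """Prefix an HTTP(S) URL with GDAL's /vsicurl/ virtual file system handler."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected an http(s) URL")
    return "/vsicurl/" + url


def read_window(url, left, bottom, right, top):
    """Read band 1 for a bounding box only, without downloading the whole file.

    Requires rasterio; imported here so the path helper works without it.
    """
    import rasterio
    from rasterio.windows import from_bounds

    with rasterio.open(vsicurl_path(url)) as src:
        # Translate map coordinates into a pixel window of the remote raster
        window = from_bounds(left, bottom, right, top, transform=src.transform)
        return src.read(1, window=window)


# Hypothetical placeholder URL; replace with a layer URL from the data portal
example_url = "https://example.com/path/to/layer.tif"
print(vsicurl_path(example_url))
```

Reading only a small window is what makes Cloud-Optimized GeoTIFFs practical at farm scale: only the tiles that overlap the requested area are transferred.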
Contribute to the Open Soil Data Cube for Europe
We plan to continuously update the predictions hence contributions are welcome:
- If you are a soil surveyor, soil researcher or agricultural extension expert and are aware of soil samples / soil profiles that cover the period 2000–2022+, please share your point data with us (we are open to signing a professional Data Sharing Agreement) so we can use your data for the public good, i.e. to support land restoration and regenerative agriculture projects;
- If you discover a bug or an issue i.e. if our predictions significantly differ from your ground-truth measurements, please contact us and/or report a bug;
- If you are producing European-wide data at comparable spatial resolutions, e.g. 10 m to 250 m, please contact us and we can add your layers to the data cube (provided that some minimum conditions are met, e.g. open data license, metadata provided, complete consistent gap-filled layers).
All bugs, updates, requests and improvements can be registered via the official Geo-harmonizer project repository.
Using Soil Data Cube at farm scale
If you are a landowner, agricultural business or food producer, and if you require additional information on soils for soil carbon certification, please contact EnvirometriX and/or the corresponding AgriCapture partners. We are eager to help you improve soil health of your land and optimize your land use systems.
Predictions v0.2 of the Open Soil Data Cube for Europe are the first publicly released predictions of soil properties at high spatial resolution for continental Europe for the period 2000–2020. These predictions are released with no warranty and should be used for testing purposes only. You understand that you download from, or otherwise obtain content or services through, the OpenGeoHub websites at your own discretion and risk (see also: General terms and conditions).
You are free to copy, distribute, transmit and adapt our data, as long as you credit OpenGeoHub and contributors. If you alter, or build upon, our data, you may distribute the result only under the same license. The full legal code explains your rights and responsibilities.
The Open Soil Data Cube for Europe is a joint initiative of OpenGeoHub, EnvirometriX, MultiOne.hr, Thuenen Institute, JRC ESDC and all other partners that contribute point data. We are especially grateful to the European Soil Data Center, Thuenen Institute of Climate-Smart Agriculture and various national institutes for providing access to soil samples and profiles that were used to produce consistent predictions for Europe. Without the LUCAS soil project (Orgiazzi et al., 2018) and various other European Commission-funded projects, production of dynamic soil information for Europe would probably not have been possible.
OpenGeoHub is an independent not-for-profit research foundation promoting Open Source and Open Data solutions. EnvirometriX Ltd. is the commercial branch of the group responsible for preparing soil sampling designs for the AgriCapture and similar soil monitoring projects. AgriCaptureCO2 receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 101004282. The OpenDataScience.eu project is co-financed by the European Union (CEF Telecom project 2018-EU-IA-0095) as a part of the Connecting Europe Facility (CEF) in Telecom programme, a key EU instrument to facilitate cross-border interaction between public administrations, businesses and citizens, by deploying digital service infrastructures (DSIs) and broadband networks.
- Ballabio, C., Lugato, E., Fernández-Ugalde, O., Orgiazzi, A., Jones, A., Borrelli, P., … & Panagos, P. (2019). Mapping LUCAS topsoil chemical properties at European scale using Gaussian process regression. Geoderma, 355, 113912. https://doi.org/10.1016/j.geoderma.2019.113912
- Borrelli, P., Robinson, D. A., Fleischer, L. R., Lugato, E., Ballabio, C., Alewell, C., … & Panagos, P. (2017). An assessment of the global impact of 21st century land use change on soil erosion. Nature Communications, 8(1), 1–13. https://doi.org/10.1038/s41467-017-02142-7
- Gautam, S., Mishra, U., Scown, C. D., Wills, S. A., Adhikari, K., & Drewniak, B. A. (2022). Continental United States may lose 1.8 petagrams of soil organic carbon under climate change by 2100. Global Ecology and Biogeography, 31(6), 1147–1160. https://doi.org/10.1111/geb.13489
- Hengl, T., & MacMillan, R. A. (2019). Predictive soil mapping with R (p. 370). Wageningen: OpenGeoHub Foundation. Retrieved from https://soilmapper.org
- Hengl, T., Miller, M. A. E., Križan, J., Shepherd, K. D., Sila, A., Kilibarda, M., … & Crouch, J. (2021). African soil properties and nutrients mapped at 30 m spatial resolution using two-scale ensemble machine learning. Scientific Reports, 11(1), 1–18. https://doi.org/10.1038/s41598-021-85639-y
- Huang, X., Cui, C., Hou, E., Li, F., Liu, W., Jiang, L., … & Xu, X. (2022). Acidification of soil due to forestation at the global scale. Forest Ecology and Management, 505, 119951. https://doi.org/10.1016/j.foreco.2021.119951
- Knotters, M., Teuling, K., Reijneveld, A., Lesschen, J. P., & Kuikman, P. (2022). Changes in organic matter contents and carbon stocks in Dutch soils, 1998–2018. Geoderma, 414, 115751. https://doi.org/10.1016/j.geoderma.2022.115751
- Orgiazzi, A., Ballabio, C., Panagos, P., Jones, A., & Fernández‐Ugalde, O. (2018). LUCAS Soil, the largest expandable soil dataset for Europe: a review. European Journal of Soil Science, 69(1), 140–153. https://doi.org/10.1111/ejss.12499
- Sha, Z., Bai, Y., Li, R., Lan, H., Zhang, X., Li, J., … & Xie, Y. (2022). The global carbon sink potential of terrestrial vegetation can be increased substantially by optimal land management. Communications Earth & Environment, 3(1), 1–10. https://doi.org/10.1038/s43247-021-00333-1
- Stockmann, U., Minasny, B., & McBratney, A. B. (2014). How fast does soil grow? Geoderma, 216, 48–61. https://doi.org/10.1016/j.geoderma.2013.10.007
- Tóth, B., Weynants, M., Pásztor, L., & Hengl, T. (2017). 3D soil hydraulic database of Europe at 250 m resolution. Hydrological Processes, 31(14), 2662–2666. https://doi.org/10.1002/hyp.11203
- Witjes, M., Parente, L., van Diemen, C. J., Hengl, T., Landa, M., Brodsky, L., … & Glusica, L. (2022). A spatiotemporal ensemble machine learning framework for generating land use/land cover time-series maps for Europe (2000–2019) based on LUCAS, CORINE and GLAD Landsat. PeerJ, in press, https://doi.org/10.21203/rs.3.rs-561383/v3