Soil and Agronomy Data Cube for Africa at 30-m spatial resolution
Prepared by: Tom Hengl (OpenGeoHub) and Leandro Parente (OpenGeoHub)
Earth Observation, soil, terrain, land cover and land use, and climate data are increasingly available for Africa for research and business use. This tutorial explains how to access the iSDAsoil property and nutrient maps for Africa, along with a number of Sentinel-2 cloud-free bands and terrain variables, and how to compute with them without downloading terabytes of data. A complete tutorial written using Rmarkdown is available here. To learn more about Cloud-Optimized GeoTIFFs and geocomputing in Python, please also visit this tutorial. A copy of iSDAsoil is also available from Amazon AWS Open Data and the Google Earth Engine Data Catalog.
iSDAsoil methodology and data
Innovative Solutions for Decision Agriculture Ltd (iSDA) is a social enterprise with the mission to improve smallholder farmer profitability across Africa. In November 2020, iSDA released a fully-fledged Soil Information System of Africa at 30-m spatial resolution (data available under the Creative Commons Attribution license). The main purpose of this data is to help with the implementation of Integrated Soil Fertility Management (ISFM) and other sustainable soil management practices in Africa. Production of this data set is documented in detail in this Medium article, and also in this peer-reviewed publication:
- Hengl, T., Miller, M.A.E., Križan, J. et al. (2021) African soil properties and nutrients mapped at 30 m spatial resolution using two-scale ensemble machine learning. Sci Rep 11, 6130. https://doi.org/10.1038/s41598-021-85639-y
The produced predictions are now available as Cloud-Optimized GeoTIFFs through a number of services (Wasabi, Google Earth Engine, Amazon AWS public datasets) and as such are basically available to developers and users without restrictions.
The repository https://github.com/iSDA-Africa/ also contains examples with iSDAsoil worked out in Python (how to estimate liming requirements for an area in Rwanda and similar).
What is a “Cloud-Optimized GeoTIFF” (COG)?
Cloud-Optimized GeoTIFFs are post-processed images that are optimized for file sharing and can be considered equivalent to geospatial databases, as they can serve spatial queries. They are possibly the best way to distribute spatial layers without restrictions, as users save time accessing data and can load data directly into the majority of GIS software (mainly thanks to the GDAL development team).
How does a COG work? A COG file has a spatial index based on tiles and scales. Imagine you wish to overlay a single point to get the value inside the COG: an HTTP service will first locate the tile (usually very fast), then locate the exact pixel inside that tile, and finally return the value (always numeric). In summary, as long as you only plan to access small portions of the data, a COG typically works very fast and is as efficient as accessing and searching a geospatial database. What can limit COG services is the bandwidth, the number of requests per IP, the size of the data returned in the requests and similar. Also note that, if you use GDAL or any GDAL-compatible GIS software (we recommend using QGIS), any processing of the data is done by your local machine: the COG service only serves the data. It is also a highly portable system, as you only have to copy/upload the GeoTIFFs and make them available (ideally via Amazon S3 or another Amazon S3 compatible service).
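To make the tile-then-pixel lookup concrete, here is a minimal Python sketch of the index arithmetic only; it is not GDAL's actual implementation. The grid origin, pixel size and dimensions are those reported for the soil texture layer later in this tutorial. The 512-pixel tile size and the power-of-two overview factors are assumptions made for illustration (the real values are stored in each file's header), and the function names are hypothetical.

```python
import math

# Grid of the 30-m soil texture layer, as reported by terra later in this tutorial
XMIN, YMAX = -31.46424, 37.64981   # upper-left corner (degrees)
RES = 0.00027                      # pixel size (degrees)
NROW, NCOL = 268670, 327948

TILE = 512  # ASSUMPTION: internal tile size; the real value is in the file header


def locate(lon, lat, tile=TILE):
    """Return the pixel and the internal tile that contain a lon/lat point."""
    col = math.floor((lon - XMIN) / RES)
    row = math.floor((YMAX - lat) / RES)
    if not (0 <= row < NROW and 0 <= col < NCOL):
        raise ValueError("point falls outside the raster extent")
    return {
        "row": row, "col": col,
        "tile_row": row // tile, "tile_col": col // tile,
        "row_in_tile": row % tile, "col_in_tile": col % tile,
    }


def overview_factors(nrow=NROW, ncol=NCOL, tile=TILE):
    """ASSUMPTION: overviews halve the resolution until the image fits in one tile."""
    factors, f = [], 1
    while max(nrow, ncol) / f > tile:
        f *= 2
        factors.append(f)
    return factors


# A point in Rwanda (illustrative coordinates)
print(locate(30.0, -2.0))
print(overview_factors())
```

A client that knows the tile row and column only needs to request the bytes of that one tile, which is why single-point queries stay fast regardless of the total file size.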
How to use COGs?
There are two main recommended ways to use COGs for modeling and visualization:
- Load the data directly into a GIS / Geoserver, then select analysis of interest for the bounding box of interest.
- Use R, Python or similar to program the analysis.
In practice, we recommend using both access paths at the same time, so that you can visually validate the analysis and its results. In the simplest terms: keep a view of the data open in QGIS, then use R / Python to program the analysis. For spatial analysis we recommend accessing the data primarily using the rasterio package in Python and/or the terra / rgdal packages in R.
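This is, for example, the web address of the soil texture classes for Africa at 30-m resolution:
https://s3.eu-central-1.wasabisys.com/africa-soil/layers30m/sol_texture.class_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1.tif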
Important note: please do NOT open this URL in a browser because the total file size is 2.1GiB and your browser will directly try to download the file. Instead, you can add this URL to e.g. QGIS by using:
- Select “Layer” → “Add Layer” → “Add Raster layer”;
- Select “Source Type” → “Protocol HTTP(s)/Cloud”;
- Enter the URL of the layer and leave “No authentication”;
Consequently, you should see the following:
This looks like the whole map is available locally on your machine, but it is NOT: it is only the preview of the data at some aggregated scale that is actually downloaded. Google Earth and many other web-GIS applications basically work the same (download based on the viewing angle and scale).
In the case above, we have also added the legend by downloading the SLD file for soil texture classes. Note that once you have connected to the COG in QGIS, you can run any spatial analysis that is available in the software. Just keep in mind that any time you run an analysis on a larger part of the data, QGIS will have to download ALL data for that bounding box, and this can get time-consuming.
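Before running an analysis over a larger bounding box, it can help to estimate how many pixels would have to be fetched. The sketch below is plain arithmetic in Python. The 0.00027-degree pixel size is the one reported for the 30-m layers later in this tutorial, while the one byte per pixel, the function names and the example bounding box are assumptions. Actual transfer sizes will typically be smaller because the files are compressed.

```python
import math

RES = 0.00027  # pixel size in degrees, as reported for the 30-m layers


def bbox_pixels(xmin, ymin, xmax, ymax, res=RES):
    """Number of rows and columns needed to cover a lon/lat bounding box."""
    if xmax <= xmin or ymax <= ymin:
        raise ValueError("empty bounding box")
    ncol = math.ceil((xmax - xmin) / res)
    nrow = math.ceil((ymax - ymin) / res)
    return nrow, ncol


def uncompressed_mib(nrow, ncol, bytes_per_pixel=1):
    """ASSUMPTION: one byte per pixel (e.g. a class map stored as Byte)."""
    return nrow * ncol * bytes_per_pixel / 2**20


# Hypothetical 1 x 1 degree box
nrow, ncol = bbox_pixels(29.0, -2.0, 30.0, -1.0)
print(nrow, ncol, round(uncompressed_mib(nrow, ncol), 1), "MiB")
```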
To access the same layer from R, we would run:
library(terra)
tif = "/vsicurl/https://s3.eu-central-1.wasabisys.com/africa-soil/layers30m/sol_texture.class_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1.tif"
r = rast(tif)  # connects to the remote file; only header metadata is read
r
class : SpatRaster
dimensions : 268670, 327948, 1 (nrow, ncol, nlyr)
resolution : 0.00027, 0.00027 (x, y)
extent : -31.46424, 57.08172, -34.89109, 37.64981 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs
data source : sol_texture.class_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1.tif
names : sol_texture.class_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1
This shows that this is indeed a large GeoTIFF in WGS84 coordinates with a spatial resolution of about 30 m. Again, R/terra did not download the whole image; it only “connected” to the file and requested some metadata from the file header. Once we have connected to the COG from R, we can continue with all standard spatial analysis, e.g. crop values, do raster calculations etc. Assuming we are only interested in the values per country, this is a very efficient system, as you would only download the minimum data needed for the analysis. The total size of the iSDAsoil layers is about 1.5TiB, so we definitely do not recommend downloading all files covering the whole of Africa.
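For Python users, a comparable windowed read can be sketched with rasterio. This is a minimal sketch, not the authors' code: the helper names are hypothetical, and it assumes that rasterio is installed and that the layer URL is still valid. Only the pixels inside the requested window are fetched.

```python
# Extent of the soil texture layer, copied from the terra printout above
LAYER_EXTENT = (-31.46424, -34.89109, 57.08172, 37.64981)  # xmin, ymin, xmax, ymax
URL = ("https://s3.eu-central-1.wasabisys.com/africa-soil/layers30m/"
       "sol_texture.class_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1.tif")


def clamp_bbox(bbox, extent=LAYER_EXTENT):
    """Clip a (xmin, ymin, xmax, ymax) box to the raster extent."""
    xmin, ymin = max(bbox[0], extent[0]), max(bbox[1], extent[1])
    xmax, ymax = min(bbox[2], extent[2]), min(bbox[3], extent[3])
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("bounding box does not overlap the raster")
    return xmin, ymin, xmax, ymax


def read_bbox(url, bbox):
    """Read band 1 for a bounding box; needs rasterio and network access."""
    import rasterio                          # imported lazily: optional dependency
    from rasterio.windows import from_bounds

    with rasterio.open(url) as src:          # reads only metadata at this point
        xmin, ymin, xmax, ymax = clamp_bbox(bbox, tuple(src.bounds))
        window = from_bounds(xmin, ymin, xmax, ymax, transform=src.transform)
        return src.read(1, window=window)    # fetches just this window
```

Calling, for instance, `read_bbox(URL, (29.9, -2.0, 30.0, -1.9))` for a hypothetical 0.1 x 0.1 degree box would return a two-dimensional array of texture class codes for that box only.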
Modeling cropland distribution as a function of climate, terrain and soils
In the OpenLandMap tutorial listed on GitLab you can also find an example of how the Soil and Agronomy Data Cube for Africa can be used to model the distribution of cropland as a function of climate, terrain and soils. To avoid excessive computing, we limit the analysis to Ethiopia and 1-km spatial resolution. The Rmarkdown tutorial explains how to: (1) download, resample and crop the layers of interest to a polygon map of Ethiopia, (2) load all data into R, (3) fit a Random Forest model that explains the distribution of cropland, and (4) predict cropland in Ethiopia assuming a (hypothetical) linear decrease of rainfall in the future.
In the simplest terms, the distribution of cropland can be modeled as:
Cropland ~ f( rainfall, soil properties, terrain / slope, … )
In the example in the tutorial, we actually use ALL pixels in the 1-km images to fit the model. This is possible because the target and ALL covariate layers are available as images; keep in mind that the resulting training matrix is large! To make computing efficient, we use the C++/ranger implementation of random forest (Wright & Ziegler, 2017). The modeling result shows:
## Ranger result
## ranger(fm.crop, data = et.sp1km@data[sel, ], num.trees = 85, importance = "impurity")
## Type: Regression
## Number of trees: 85
## Sample size: 1084576
## Number of independent variables: 8
## Mtry: 2
## Target node size: 5
## Variable importance mode: impurity
## Splitrule: variance
## OOB prediction error (MSE): 31.36564
## R squared (OOB): 0.913671
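Two of the numbers in this printout can be cross-checked with simple arithmetic. The Python sketch below assumes ranger's defaults, namely that Mtry is the rounded-down square root of the number of independent variables and that the out-of-bag R squared is one minus the MSE divided by the variance of the target. Under those assumptions the printout implies a target variance of roughly 363 and an out-of-bag RMSE of about 5.6, both in the units of the cropland variable.

```python
import math

# Values copied from the ranger printout
n_vars = 8
mse_oob = 31.36564
r2_oob = 0.913671

# ASSUMPTION: default mtry = floor(sqrt(number of independent variables))
mtry = math.floor(math.sqrt(n_vars))

# ASSUMPTION: R2 = 1 - MSE / var(y), hence var(y) = MSE / (1 - R2)
target_variance = mse_oob / (1.0 - r2_oob)
target_sd = math.sqrt(target_variance)
rmse = math.sqrt(mse_oob)

print(mtry, round(target_variance, 1), round(target_sd, 1), round(rmse, 1))
```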
In this case, the results show that the model fits well (out-of-bag R squared of about 0.91), with elevation and rainfall coming up as the most important variables overall. It is good to see that soil pH is also an important covariate, although in this specific case cropland seems to be dominantly controlled by climate. The resulting comparison of actual vs potential cropland shows that one can indeed expect a serious decrease in cropland distribution assuming a decrease in rainfall:
Interested in this type of modeling? Test the iSDAsoil layers / access the data from QGIS and/or R/Python. Run analysis and document your code via github/gitlab; then share the results via Twitter or Medium, and please mention @iSDAAfrica and #SoilData4Africa so we can also follow the progress.
Layers currently available for Africa
Within the iSDAsoil project, we have made a number of layers available as COGs, i.e. for public access and use without restrictions (no registration needed, no access costs):
- iSDAsoil layers representing soil properties and nutrients at two standard depth intervals 0–20 and 20–50 cm;
- Sentinel-2 cloud-free mosaics (prepared for the purpose of iSDAsoil project);
- Digital Terrain Model (DTM) based on ALOS AW3D30 and NASADEM and DTM derivatives (prepared for the purpose of iSDAsoil project);
- OpenLandMap layers (from 250-m to 1-km resolution);
- Population map of Africa for 2018 at 30-m resolution (Facebook Connectivity Lab and Center for International Earth Science Information Network — CIESIN — Columbia University. 2016. High Resolution Settlement Layer HRSL);
The Sentinel mosaics for Africa (prepared by MultiOne.hr) are relatively large in size and might still contain artifacts between scenes, missing values beyond water bodies etc. The population density map at 30-m spatial resolution does NOT include some areas such as Sudan and Somalia.
To list all layers available at 30-m resolution for the whole of Africa please use this table. To list all layers available at 250-m resolution (global land mask) please use this table. Note: the file versions might change hence your code would need to be updated. Please subscribe to this repository or refer to https://isda-africa.com for the most up-to-date information about iSDAsoil.
Important note: We do NOT recommend downloading whole GeoTIFFs of Africa at 30-m resolution over the COG service, as these are usually 10–20GB in size (per file). The total size of the repository at the moment exceeds 1.5TB. Instead, if you need to analyze the whole land mask of Africa, we recommend downloading the files directly from zenodo.org and/or Amazon AWS. Also note that nutrient stocks and aggregated soil properties can be derived using a variety of procedures (see e.g. Hengl & MacMillan (2019)) and the total values might eventually differ.
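As one example of such a procedure, the two standard depth intervals can be combined into a single 0–50 cm value by a thickness-weighted average. This Python sketch is an illustration under that assumption, not the method used by iSDAsoil. The function name and the input values are hypothetical, and a different aggregation (for example one that also weights by bulk density) would give different totals.

```python
def depth_weighted_mean(layers):
    """Thickness-weighted mean of (top_cm, bottom_cm, value) tuples."""
    total_thickness = 0.0
    weighted_sum = 0.0
    for top, bottom, value in layers:
        thickness = bottom - top
        if thickness <= 0:
            raise ValueError("each layer needs bottom > top")
        total_thickness += thickness
        weighted_sum += thickness * value
    if total_thickness == 0:
        raise ValueError("no layers given")
    return weighted_sum / total_thickness


# Hypothetical clay content (%) for the 0-20 and 20-50 cm intervals
print(depth_weighted_mean([(0, 20, 30.0), (20, 50, 40.0)]))
```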
Other data providers of interest
Other data sources (not included in this Data Cube) and data portals for Africa with Earth Observation and similar data sets:
- Planet NICFI (https://www.planet.com/nicfi/): you can download the 5-meter resolution ARD imagery (sub-Saharan Africa only and CC-NC-Alike license only; project financed by the Norwegian Government);
- Digital Earth Africa (https://www.digitalearthafrica.org/): provides access to a map viewer and a sandbox / toolbox (GeoMAD) that can be used to derive various products per farm / polygon (project funded by US-based Leona M. and Harry B. Helmsley Charitable Trust and the Australian Government);
- Africa Knowledge Platform (https://africa-knowledge-platform.ec.europa.eu/): provides access to Africa’s social, economic, territorial and environmental development information.
- JRC’s Forest Resources and Carbon Emissions (IFORCE) Sentinel2 L1C cloud-free composites 2015–2017, 2018, 2019 and 2020 (https://forobs.jrc.ec.europa.eu/recaredd/S2_composite.php): provides access to Sentinel-2 L1C collection B11, B08, B04 (SWIR1, NIR, RED) bands for the period spanning from 2015 to 2020. For more details see: Simonetti et al. (2021).
- Africa Regional Data Cube ARDC (https://www.data4sdgs.org/index.php/initiatives/africa-regional-data-cube): provides various EO-based data products for Ghana, Kenya, Sierra Leone, Senegal, and Tanzania;
- AfriAlliance Africa-EU innovation alliance (https://afrialliance.org/): aims at providing climatic/meteorological and hydrological information;
- Open Buildings (https://sites.research.google/open-buildings/): provides detailed vectors of buildings for Africa;
A more detailed review of the Earth Observation (EO) data services for Africa and trends can also be found in Woldai (2020).
References
- FAO, Global Soil Partnership (GSP), (2016). Boosting Africa’s Soils. FAO Regional Conference for Africa (ARC), http://www.fao.org/3/a-i5532e.pdf
- Hengl, T., & MacMillan, R. A. (2019). Predictive soil mapping with R (p. 370). Lulu.com. Retrieved from https://soilmapper.org
- Hengl, T., Leenaars, J. G., Shepherd, K. D., Walsh, M. G., Heuvelink, G. B., Mamo, T., … others. (2017). Soil nutrient maps of Sub-Saharan Africa: assessment of soil nutrient content at 250 m spatial resolution using machine learning. Nutrient Cycling in Agroecosystems, 109(1), 77–102. doi:10.1007/s10705-017-9870-x
- Hengl, T., Miller, M.A.E., Križan, J. et al. (2021) African soil properties and nutrients mapped at 30 m spatial resolution using two-scale ensemble machine learning. Sci Rep 11, 6130. doi:10.1038/s41598-021-85639-y
- Hijmans, R. J., Bivand, R., Forner, K., Ooms, J., & Pebesma, E. (2020). terra: Spatial Data Analysis. CRAN. Retrieved from https://rspatial.org/terra
- Sarago, V., Barron, K., & Albrecht, J. (2019). Pushing for adoption of Cloud Optimized GeoTIFF: An imagery format for cloud-native geospatial processing. http://cogeo.org
- Simonetti, D., Pimple, U., Langner, A., & Marelli, A. (2021). Pan-Tropical Sentinel-2 Cloud-Free Annual Composite Datasets. Data in Brief, 107488. doi:10.1016/j.dib.2021.107488
- Woldai, T. (2020). The status of Earth Observation (EO) & Geo-Information Sciences in Africa–trends and challenges. Geo-spatial Information Science, 23(1), 107–123. doi:10.1080/10095020.2020.1730711
- Wright, M. N., & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1–17. https://www.jstatsoft.org/article/view/v077i01