Detecting Agricultural Croplands from Sentinel-2 Satellite Imagery

A guide to identifying croplands with reasonable accuracy using a semantic segmentation model.

Radiant Earth Insights · Feb 3, 2022


We developed UNet-Agri, a benchmark machine learning model that classifies croplands using open-access Sentinel-2 imagery at 10m spatial resolution, with ground reference data provided by the Western Cape Department of Agriculture in South Africa. This post is a step-by-step walkthrough of how we developed the model and evaluated its performance. Understanding what UNet-Agri does will help you build your own model and deploy it for a similar application.

Using Sentinel-2 as input for UNet-Agri

Popular field detection approaches employ imagery from sensors with higher spatial resolution than Sentinel-2 as input to convolutional neural network (CNN) models. Although resolutions of 10m or coarser tend to pose challenges for field boundary delineation across smallholder-dominated farms (fields smaller than one acre), we used Sentinel-2 to demonstrate its capability and to identify the minimum field size it can detect. Sentinel-2 is designed for land monitoring, including agriculture among other thematic areas, and has the best spatial resolution among open-access satellites.

Our initial investigation showed that CNN models like EfficientNet and MobileNet produced average results in more straightforward test cases with larger field sizes but failed to deliver good results on images with smaller, less obvious fields. They also produced sparse segmentations in which fields were not clearly delineated. For this reason, we developed UNet-Agri, a semantic segmentation model for Sentinel-2 imagery. Our model can detect croplands with reasonable accuracy at the 10m spatial resolution.

We dive deeper into the data preparation in the following sections, showing how we augmented the data, before looking at the model and the metrics we used to evaluate its accuracy. This tutorial also demonstrates how to use the algorithm effectively.

For background information on using geospatial data with ML, read how Earth observation (EO) and machine learning (ML) enhance solutions for agricultural monitoring and what Radiant Earth is doing to support the global development community in this space.

Data Exploration

The satellite images were collected by Sentinel-2 during the winter season of 2017 and contain 12 spectral bands. We combined the blue, green, and red bands (B02, B03, and B04) into a single 3-band image as input to the model.
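
As a minimal sketch of this step (the file names are hypothetical; the post does not include its data-preparation code), the three 10m bands can be stacked into one image with rasterio:

```python
import numpy as np
import rasterio

# Hypothetical single-band files for one Sentinel-2 scene.
band_paths = ["B04.tif", "B03.tif", "B02.tif"]  # red, green, blue

bands = []
for path in band_paths:
    with rasterio.open(path) as src:
        bands.append(src.read(1))
        profile = src.profile  # reuse georeferencing from the bands

rgb = np.stack(bands)  # shape: (3, height, width)

profile.update(count=3)
with rasterio.open("rgb.tif", "w", **profile) as dst:
    dst.write(rgb)
```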

Figure 1: Map of Western Cape with the location of our training data outlined in the blue bounding box.

Figure 1 shows our study region, where ground reference data on agricultural field boundaries were made available by the Western Cape Department of Agriculture. We divided our area of interest into chips of 256 x 256 pixels. Labels for field boundaries were recorded as vectors in GeoJSON format containing the field ID, crop type, and geometry of the fields within each chip. By converting the vector geometries to rasters at 10m spatial resolution, we derived a crop/no-crop mask for each chip, as seen in Figure 2 below.

Figure 2: A sample chip with its true color image (left), the vector labels (center), and the rasterized labels (right). Yellow in the masks indicates crop, and dark blue indicates no crop.
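
The vector-to-raster conversion can be sketched with geopandas and rasterio; this is a minimal illustration with hypothetical file names, assuming the labels share the chip's coordinate reference system:

```python
import geopandas as gpd
import rasterio
from rasterio.features import rasterize

# Hypothetical file names for one 256 x 256 chip and its labels.
labels = gpd.read_file("chip_labels.geojson")

with rasterio.open("chip.tif") as src:
    transform = src.transform
    shape = (src.height, src.width)

# Burn 1 into every pixel covered by a field polygon; background stays 0.
mask = rasterize(
    [(geom, 1) for geom in labels.geometry],
    out_shape=shape,
    transform=transform,
    fill=0,
    dtype="uint8",
)
```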

We had limited data for this model: about 3,900 image chips. We split the data into training, validation, and test sets at 70%, 20%, and 10%, respectively. To enlarge the training and validation datasets and help the model generalize better, we augmented them using common techniques, including zooming, rotation, flipping, blurring, and brightness adjustment.
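
One common way to apply identical spatial transforms to an image and its mask is the albumentations library; the parameters below are illustrative, not the exact values we used:

```python
import albumentations as A
import numpy as np

# Placeholder chip and label; in practice these come from the dataset.
image = np.zeros((256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)

# Mirror the techniques listed above: zoom, rotation, flips, blur,
# and brightness adjustment. Probabilities and limits are illustrative.
augment = A.Compose([
    A.RandomScale(scale_limit=0.2, p=0.5),  # zoom in/out
    A.Rotate(limit=90, p=0.5),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.Blur(blur_limit=3, p=0.3),
    A.RandomBrightnessContrast(brightness_limit=0.2, p=0.5),
    A.Resize(256, 256),  # restore the chip size after scaling
])

augmented = augment(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]
```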

Model Development

We first tried traditional CNN models like MobileNet and EfficientNet. These models gave decent results for easier image chips with large croplands, but they did not produce reasonably accurate predictions for chips with smaller, less observable croplands, and their segmentation output was noisy. Next, we tried the UNet architecture (Figure 3), an effective model for pixel-wise semantic segmentation.

Figure 3: The UNet model’s architecture

Initial results from the UNet model were promising, so we took the approach further and added a pre-trained VGG-19 encoder; we named the resulting model UNet-Agri. Visual Geometry Group-19 (VGG-19) is a nineteen-layer CNN whose pre-trained weights capture a strong understanding of what defines an image in terms of shape, color, and structure. Compared to other pre-trained backbones, VGG-19 is lightweight, leaving room for a deeper decoder and better output segmentation, and its sequential convolutional blocks mirror UNet's encoder path, making the two easy to combine.
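
For illustration, a UNet with a pre-trained VGG-19 encoder can be built in a few lines with the segmentation_models Keras library; this is one possible implementation, not necessarily our exact setup:

```python
import segmentation_models as sm

# UNet decoder on a VGG-19 encoder pre-trained on ImageNet, with a
# single sigmoid output channel for the crop/no-crop mask.
model = sm.Unet(
    "vgg19",
    encoder_weights="imagenet",
    input_shape=(256, 256, 3),
    classes=1,
    activation="sigmoid",
)
```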

Model Evaluation

The model was trained for 60 epochs, after which there was no noticeable performance gain. Following training, we used the test dataset to evaluate the model's performance.
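
As a sketch of a typical training setup for this kind of binary segmentation (the optimizer, loss, and batch size are standard choices rather than our exact configuration, and the array names are hypothetical):

```python
# Binary cross-entropy with Adam is a standard pairing for binary
# segmentation; the exact configuration used in this work may differ.
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# train_images/train_masks and val_images/val_masks are hypothetical
# arrays holding the (augmented) chips and their crop/no-crop masks.
history = model.fit(
    train_images, train_masks,
    validation_data=(val_images, val_masks),
    batch_size=16,  # illustrative
    epochs=60,
)
```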

The model generates a per-pixel probability of cropland from 0 to 1, where 0 indicates no field exists at a given location and 1 indicates a field definitely exists. A prediction of 0.6 therefore means the model is 60% confident a field exists at that location.

We set a threshold of 0.5 to convert the probabilistic predictions to masks and compare model predictions to true labels. You can see the effect of setting the threshold in Figure 4. Table 1 shows the performance of the model using a threshold of 0.5.
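
The thresholding step and the four metrics in Table 1 can be computed per pixel with NumPy; a minimal sketch:

```python
import numpy as np

def evaluate(probs, truth, threshold=0.5):
    """Binarize per-pixel probabilities and compute the four metrics."""
    pred = (probs >= threshold).astype(np.uint8)
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    # Assumes both classes are present; guard zero denominators in practice.
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"IOU": iou, "Precision": precision, "Recall": recall, "F1": f1}
```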

The model scores somewhat higher across all four metrics on the training dataset than on the validation and test sets. Performance on the held-out data is nonetheless very good, with an IOU of 0.76 and F1, Precision, and Recall all at ~0.86.

Figure 4 — Thresholding model predictions to generate a mask of the cropland areas. Images from left to right show: True color image of Sentinel-2, True cropland mask, Predicted cropland. Yellow in the masks indicates crop, and dark blue indicates no crop.

Table 1 — Performance metrics for Training, Validation, and Test datasets.

Figure 5: Effect of Threshold (0.5:0.95) on the metric scores.

To ensure the 0.50 threshold is a reasonable choice, we evaluated the results at thresholds from 0.50 to 0.95. Figure 5 shows that increasing the threshold increases Precision, as expected, but at a cost: the other three metrics (IOU, F1, and Recall) all decrease. This is a common trade-off in probabilistic modeling, and we concluded that a threshold of 0.50 suits our application well, balancing the model's Precision and Recall.
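
Reusing the evaluate helper sketched above, the sweep itself is a simple loop over the test-set predictions (probs) and true masks (truth):

```python
import numpy as np

# probs: model predictions for the test set; truth: the true masks.
for t in np.arange(0.50, 1.00, 0.05):  # 0.50, 0.55, ..., 0.95
    print(f"threshold={t:.2f}", evaluate(probs, truth, threshold=t))
```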

Segmentation Results

Here we present several example predictions from the model. Figure 6 shows the input Sentinel-2 imagery along with the true and predicted cropland masks. It demonstrates that the model detects croplands with reasonable accuracy at the 10m spatial resolution, although at this resolution adjacent individual fields often cannot be separated, which is evident in the true masks as well. Overall, the model performs very well in detecting croplands.

Figure 6: Example predictions of cropland from Sentinel-2 imagery. Each row represents an example chip. The left column is the True Color Image from Sentinel-2. In the center is the rasterized True Cropland Mask and in the right column is the Predicted Cropland Mask. Yellow in the masks indicates crop, and dark blue indicates no crop.

Model Performance Compared to Vector Labels

In the previous section, we evaluated the model's performance against the rasterized cropland labels. While this shows the model's ability to learn from the training dataset, we were also interested in how it performs against the vector data (i.e., the field polygons). We therefore vectorized the model's predictions and compared them against the vector labels.
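
One common way to vectorize a predicted mask is rasterio.features.shapes, which traces connected regions into polygons; a sketch, assuming the chip's affine transform and CRS are available (variable names are hypothetical):

```python
import geopandas as gpd
from rasterio.features import shapes
from shapely.geometry import shape

# pred_mask: binary uint8 prediction for one chip (hypothetical name);
# transform and chip_crs: the chip's affine transform and CRS.
polygons = [
    shape(geom)
    for geom, value in shapes(pred_mask, transform=transform)
    if value == 1  # keep only predicted-crop regions
]
predicted_fields = gpd.GeoDataFrame(geometry=polygons, crs=chip_crs)
```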

The initial dataset contains 165,805 fields, with an average size of 15 acres. Figure 7 shows the distribution of field sizes in our dataset.

Figure 7: Distribution of field sizes in acres in the labeled dataset.

To evaluate the model’s performance using vector labels and predictions, we use the F1 score as a metric. Overall, the model shows an average F1 score of 0.76.

Figure 8 shows the distribution of F1 scores vs. field sizes, where each box represents the range of the F1 score for a specific range of field sizes. This figure shows how the model's performance improves as field sizes increase.

Figure 8: Distribution of F1 Score for various field sizes. Each box represents the F1 score (y axis) range for the corresponding range of field sizes (x axis). In each box, the lower tick represents the minimum value; the lower edge of the box represents the 25th percentile; the middle line represents the median or 50th percentile; the upper edge of the box represents the 75th percentile; and the upper tick represents the maximum value.
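
A figure like this can be reproduced with pandas and matplotlib, assuming a per-field DataFrame df with hypothetical columns acres and f1; the bin edges below are illustrative:

```python
import matplotlib.pyplot as plt
import pandas as pd

# df has one row per field with columns "acres" and "f1" (hypothetical).
bins = [0, 1, 2, 3, 5, 10, 15, 30, 1000]  # illustrative bin edges
df["size_bin"] = pd.cut(df["acres"], bins=bins)

# whis=(0, 100) extends the whiskers to the min and max, as in Figure 8.
df.boxplot(column="f1", by="size_bin", whis=(0, 100))
plt.xlabel("Field size (acres)")
plt.ylabel("F1 score")
plt.show()
```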

The F1 score percentiles increase with field size, and their range narrows, indicating more consistent and more accurate performance on larger fields. In particular, across the first four bins from the left there is a significant decrease in the range of F1 scores and an increase in the median F1 score: while the median is about 0.4 for fields smaller than 1 acre, it increases to 0.9 for fields between 2 and 3 acres.

The minimum value of the boxplot converges to an F1 score of about 0.9 at larger field sizes, an indication of the model's accurate performance. Every box also has a maximum value of 1.0, meaning that at least one field within each size range was detected perfectly.

Finally, using 15 acres (the average field size) as a cutoff, we calculated that the average F1 score for fields larger than 15 acres is 0.91, while the average for smaller fields is 0.71.

This project is supported by Enabling Crop Analytics at Scale (ECAAS), an initiative managed by Tetra Tech with funding from the Bill & Melinda Gates Foundation.
