Automatic field delineation: New release

EO Research · Published in Sentinel Hub Blog
Sep 25, 2023 · 13 min read

We have released a new version of our automatic field delineation algorithm for Sentinel-2 imagery, with an improved workflow and accuracy. We detail here the research and development voyage that led to the new version.

Written by Devis Peressutti. Work done by Nejc Vesel, Matej Batič, Matic Lubej, Nika Oman Kadunc, Sara Verbič, Žiga Lukšič, and Devis Peressutti.

Time flies! It’s already been three years since we released the first version (V1) of our automatic field delineation (FD) algorithm. A lot has happened since, and we surely have not been idle. In this blog-post we report on the major research and development activities that led to the release of the second version (V2) of the algorithm. First things first.

What is field delineation and why do you need it?

Field delineation, for us, refers to the process of automatically tracing the boundaries of agricultural parcels from satellite or aerial imagery. We consider an agricultural parcel to be a spatially homogeneous land unit used for agricultural purposes, where a single crop is grown. The result of FD is a set of closed vector polygons marking the extent of each agricultural parcel. Such polygons are the input to a multitude of applications, ranging from the management of agricultural resources, such as Area Monitoring for the Common Agricultural Policy, to precision farming, to the estimation of damages to crop yield due to natural (e.g. drought, floods) and human-made disasters (e.g. war). Timely, high-fidelity estimation of parcels therefore makes it possible to characterize how agricultural landscapes change due to anthropogenic activities, agricultural practices, and the consequences of climate change.

The underlying assumption of FD is that boundaries of agricultural parcels are visible and spatially resolvable in remotely sensed imagery. The first version of the algorithm used Sentinel-2 imagery, and we have since also trained and validated the algorithm on PlanetScope imagery and on aerial imagery. While the spatial, spectral, and temporal resolution and the spectral range differ greatly between data sources, thereby affecting the smallest resolvable agricultural parcel, the FD algorithm and workflow do not change.

How does it work?

In a nutshell, the algorithm leverages the similarity of the spatial, spectral, and temporal properties of pixels belonging to the same field. We use a deep neural network based on the popular U-net architecture to estimate each parcel’s extent and boundary, as well as the distance of each segmented pixel to the nearest boundary. We then combine the estimated probability images, either from a single temporal image or from a time series of images, into a single pseudo-probability image representing the extent of the parcels. In post-processing, the image prediction is converted into vector format, where each polygon represents the extent of a homogeneous agricultural parcel.
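To make the multi-task setup concrete, here is a minimal PyTorch sketch of how three prediction heads can share one U-net decoder. This is an illustration, not the production architecture: the layer sizes and names are ours, and the actual V2 network is not published in this form.

```python
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    """Three 1x1 convolution heads on top of shared U-net decoder features."""

    def __init__(self, decoder_channels: int = 32):
        super().__init__()
        self.extent_head = nn.Conv2d(decoder_channels, 1, kernel_size=1)
        self.boundary_head = nn.Conv2d(decoder_channels, 1, kernel_size=1)
        self.distance_head = nn.Conv2d(decoder_channels, 1, kernel_size=1)

    def forward(self, features: torch.Tensor) -> dict:
        # Sigmoid maps each task to a per-pixel pseudo-probability in [0, 1]
        return {
            "extent": torch.sigmoid(self.extent_head(features)),
            "boundary": torch.sigmoid(self.boundary_head(features)),
            "distance": torch.sigmoid(self.distance_head(features)),
        }
```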

What is new?

Many things!

As a reminder, V1 was trained on data from Lithuania. Since its release, we have applied V1 to more than 1 million square kilometers, delineating agricultural parcels in Europe and North America. Some noteworthy examples include estimating parcels for Ukraine (for the past 7 years), as well as for Germany, Hungary, Spain, Lithuania and Canada. We received positive feedback and constructive criticism, which have guided the improvements in V2.

We have improved many steps of the workflow, in both the training and inference phases, with the main aims of improving the accuracy of the results and reducing the noise and uncertainty associated with some steps of the workflow.

Fig. 1. Schematic representation of the improved training phase of FD V2. New or improved steps are shown in red. The blobby area-of-interest is fictional.

Fig. 1 shows the improved training phase. As opposed to V1, we considered imagery and reference data from the whole European continent. All imagery was downloaded through Sentinel Hub APIs. We used the AI4Boundaries and EuroCrops label datasets as sources of ground truth, and sampled locations to obtain stratified samples across geographical locations and crop-types. To reduce the sources of noise present in the training dataset, we applied two new steps, namely co-registration of the image series and automatic curation of the reference data. Finally, we added a super-resolution layer to our U-net architecture to produce estimates of extent and boundaries at a 4x smaller pixel size than the input image. For example, given the four B02, B03, B04, and B08 Sentinel-2 bands at 10 m pixel size, the network estimates parcel boundaries at 2.5 m. The neural network is then trained following hyper-parameter optimization. And ta-da, the weights of the V2 architecture are ready for inference.

Fig. 2. Schematic representation of the inference phase of FD V2. New or improved steps are shown in red.

Fig. 2 shows the scheme of the inference phase. Given the area-of-interest (AOI) and time-of-interest (TOI), we forward a batch processing API request to Sentinel Hub, which scales the download of the desired input imagery. For each cell of the grid used by the service, we optionally co-register the images to reduce misalignments, normalize them to center the reflectance values in the sensitive range of the deep network, and finally feed them to the U-net with the pre-trained weights. The network outputs estimates of the extent, boundary, and distance to boundary of each detected agricultural parcel for each input timestamp of the time series. We then merge these estimates temporally and spatially to obtain a single image, which is contoured to derive the polygons outlining each agricultural parcel. Polygons estimated for each cell of the grid are then harmonized and merged to obtain a single vector map covering the input AOI.
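As a concrete illustration of the merging step, the sketch below combines per-timestamp predictions into a single pseudo-probability image. The post does not state the exact combination rule, so the per-pixel median and the extent-minus-boundary combination here are assumptions made for this example only:

```python
import numpy as np

def merge_timestamps(extent: np.ndarray, boundary: np.ndarray) -> np.ndarray:
    """Merge (T, H, W) per-frame probabilities into one (H, W) image."""
    extent_med = np.median(extent, axis=0)      # robust parcel-interior evidence
    boundary_med = np.median(boundary, axis=0)  # robust boundary evidence
    # High values where a pixel looks like parcel interior and not a boundary;
    # the resulting image is then contoured and vectorized into polygons.
    return extent_med * (1.0 - boundary_med)
```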

The complete list of improvements for V2 over V1 is as follows:

  • Europe-wide normalization of satellite imagery, to improve convergence and generalization of the deep network;
  • Co-registration of input satellite time-series to reduce frame-to-frame misalignments;
  • Use of open data from the AI4Boundaries and EuroCrops label datasets, which provide reference data of agricultural parcels for some European countries;
  • Automatic curation of reference data;
  • Super-resolution of estimated probability maps for the parcels’ extent, boundary and distance to boundary;
  • Optional “rectangularization” of estimated parcels;
  • Automation of execution.

The following paragraphs briefly describe the aim of each of the applied improvements.

Band normalization

Fig. 3. Visual examples of different normalization methods evaluated against the convergence of the deep neural network and the achievable accuracy. Normalization values are computed over Europe.

We have investigated the influence of different normalization approaches on the convergence of the deep network and on the accuracy achieved for the estimated parcels. We found that linear normalization using percentiles of the band distributions (third picture in the top row of Fig. 3) performed best. To evaluate these methods, we derived band distributions for different land covers over Europe, resulting in a set of normalization factors that can be applied to a very large area, therefore improving the generalization of the model. You can find all details about this research in this blog-post.
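A minimal sketch of the percentile-based linear normalization follows. It assumes the per-band percentile values have already been computed over the Europe-wide sample; the exact percentiles used in V2 are not stated here, so treat the inputs as placeholders:

```python
import numpy as np

def normalize_bands(image: np.ndarray, p_low: np.ndarray, p_high: np.ndarray) -> np.ndarray:
    """Linearly rescale each band to [0, 1] using precomputed percentile values.

    image: (H, W, B) reflectances; p_low, p_high: per-band percentiles of shape (B,).
    """
    scaled = (image - p_low) / (p_high - p_low)
    # Clamp reflectances that fall outside the percentile range
    return np.clip(scaled, 0.0, 1.0)
```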

Frame-to-frame co-registration

Geometric errors cause misalignments between consecutive satellite image acquisitions, which produce artefacts in FD results. They affect the accuracy of the training dataset, since different temporal acquisitions are affected differently by geometric errors. For the same reasons, geometric errors affect the accuracy of the temporally merged probability image, which can also be seen when visualizing the results.

Fig. 4. Example of geometric errors for a stack of satellite imagery compared to the reference polygons for agricultural parcels (top row), and the improved alignment after co-registering the image frames (bottom row).

An example of the noise introduced by geometric errors is shown in Fig. 4, where the 3D spatio-temporal stack of the input satellite imagery is cut along one of the spatial dimensions. The vertical axis shows the time progression over different image acquisitions, while the horizontal axis shows the spatial index of the other spatial dimension, in this case northing. The yellow lines represent the boundaries of the reference polygons denoting agricultural parcels. In the unregistered stack shown in the top row, misalignments between the boundaries and the underlying images are clearly visible. Such misalignments are compensated for by co-registering consecutive frames, as shown in the bottom row. More details about the co-registration research can be found in this blog-post.
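As a rough illustration of the idea, the sketch below estimates a per-frame translation against a reference frame using phase cross-correlation (one common technique, not necessarily the one used in V2; see the linked blog-post for the actual research) and resamples each frame accordingly:

```python
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

def coregister_stack(stack: np.ndarray, ref_index: int = 0) -> np.ndarray:
    """Align every frame of a (T, H, W) single-band stack to a reference frame."""
    reference = stack[ref_index]
    aligned = np.empty_like(stack)
    for t, frame in enumerate(stack):
        # Sub-pixel translation between the reference and the current frame
        offset, _, _ = phase_cross_correlation(reference, frame, upsample_factor=10)
        aligned[t] = nd_shift(frame, offset, order=1, mode="nearest")
    return aligned
```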

AI4Boundaries and EuroCrops

Normalization factors that are derived from a large area ensure that the model can operate well on image data coming from a wide range of locations. However, normalization factors are merely one requirement for such generalization. The key component to ensure generalization of the deep network to various locations, and possibly the entire globe, is to create a training dataset of reference polygons and corresponding satellite images which contains as many geographical locations as possible.

In our experience, the GeoSpatial Aid Application (GSAA) reference data offers the best reference labels for agricultural parcels. GSAA datasets are maintained by European Paying Agencies, and contain agricultural parcel polygons which are self-declared by farmers for the purpose of subsidy payments within the Common Agricultural Policy (CAP). These datasets are made openly available by some countries, which has spurred work aiming at unifying and standardizing their format to ease adoption. Two notable efforts in this direction are the AI4Boundaries and the EuroCrops datasets. AI4Boundaries is specifically designed for the development and benchmarking of FD algorithms, and provides stratified samples of GSAA polygons together with Sentinel-2 and high-resolution aerial imagery over Europe. EuroCrops, on the other hand, collects and harmonizes GSAA datasets from several countries in the European Union, with the aim of providing a unified, machine-readable taxonomy of the crop-types present in the datasets. More details about these datasets can be found in this blog-post.

Fig. 5. Countries in green are included in the EuroCrops dataset.

For the development of V2, we created a training dataset using a combination of AI4Boundaries and EuroCrops, stratifying locations to ensure that samples from all available countries and from different crop-types were considered. Since GSAA polygons only represent agricultural land cover, negative samples of other land covers (e.g. water, built-up areas) were added to the training dataset.

Automatic curation of reference labels

A major component of noise that affects the FD results is the accuracy of the reference polygons used to create the training dataset, namely, how well the polygon for a given parcel matches the visible boundaries on the satellite imagery at a given time. The main causes of inaccuracies are:

  • The geometry of the polygon is incorrect.
  • The boundaries are not visible on the satellite imagery at the given acquisition.

The latter is particularly tedious to deal with, and is inherent to the nature of the problem. Agricultural parcels are dynamic entities that change appearance, and possibly shape, throughout the seasons. This means that there are time periods where boundaries are more discernible than in others and, more importantly, time periods where such boundaries are not visible at all. Ideally, we don’t want to include such acquisitions in the training dataset. Fig. 6 shows the dynamic evolution of agricultural parcels in southern France.

Fig. 6. Dynamic evolution of agricultural parcels in southern France. Boundaries between neighbouring parcels are more visible in some phases of the growing period than others.

We therefore set out to build a separate deep neural network to pinpoint inaccurate geometries and to select which time periods would be more suited for FD.

The result of this process is a probability score for each geometry and corresponding image at a given time period. For each image acquisition we can optionally filter out geometries with low probability scores, which likely correspond to wrong geometries or to geometries with boundaries that are not visible. Probability scores from the label curation network are shown in Fig. 7.

Fig. 7. Same dynamic evolution as shown in Fig. 6, with the probability score from the label curation network overlaid for the central parcel. Red corresponds to a score of 0, i.e. boundaries are not discernible, while green corresponds to a score of 1, i.e. boundaries are well visible.

We use the resulting probabilities as a soft score for the filtering of geometries and time periods to use in the training dataset.
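The sketch below shows two ways such curation scores could be used downstream, assuming one score per (geometry, acquisition) pair: a hard filter with a threshold, or the soft weighting described above. The threshold value and function names are illustrative:

```python
import numpy as np

def hard_filter(scores: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Keep only acquisitions whose parcel boundaries appear visible."""
    return np.flatnonzero(scores >= threshold)

def soft_weights(scores: np.ndarray) -> np.ndarray:
    """Turn curation scores into per-sample training weights that sum to 1,
    down-weighting dubious labels instead of discarding them."""
    return scores / scores.sum()
```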

Super-resolution of inferred probabilities

We have some history in trying to squeeze as much information as possible out of satellite imagery with the use of super-resolution. The conclusion we came to is that it is better to let the FD architecture deal with the super-resolution task, rather than super-resolving the input satellite imagery. We therefore tweaked our deep neural network to include super-resolution layers, which allow it to generate probability images for the extent, boundary, and distance to boundary at an enhanced spatial resolution compared to the input image.
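One plausible way to add such layers, sketched below, is a sub-pixel convolution (PixelShuffle) head that expands each decoder pixel into a 4x4 block, turning 10 m features into 2.5 m probability maps. The actual layers used in V2 are not disclosed here, so this is an illustration of the technique rather than our exact implementation:

```python
import torch
import torch.nn as nn

class SuperResolutionHead(nn.Module):
    """Upsample decoder features by `scale` via sub-pixel convolution."""

    def __init__(self, in_channels: int = 32, scale: int = 4):
        super().__init__()
        # Project to scale**2 channels, then rearrange channels into space
        self.expand = nn.Conv2d(in_channels, scale**2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # (N, C, H, W) -> (N, 1, scale*H, scale*W) pseudo-probability map
        return torch.sigmoid(self.shuffle(self.expand(features)))
```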

Fig. 8 shows examples of super-resolved polygons estimated with a 2x and 4x increase in spatial resolution. A direct benefit of this is that V2 estimates larger parcels than V1, and is also able to provide estimates for smaller parcels.

Fig. 8. Example of estimated parcels for V1 (left), and after applying a 2x improvement of spatial resolution (middle) and a 4x improvement (right). Estimated parcels tend to be larger, and smaller parcels are now detected.

Rectangularization of parcels

We have also been working on an experimental algorithm based on the concave hull to simplify the estimated geometries into more rectangular shapes, to produce polygons more similar to the reference polygons. This feature is still under development and validation, and is therefore not executed by default. Some examples of simplified geometries are shown in Fig. 9.

Fig. 9. Example of rectangularization of parcels. Sentinel-2 L1C imagery (left), delineated V2 parcels (middle) and V2 parcels with applied rectangularization (right).
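Our concave-hull-based algorithm is still under development and is not public, but the general idea of simplifying a noisy outline towards a more regular shape can be sketched with shapely (version 2.0 or later); the ratio and tolerance values below are arbitrary:

```python
from shapely import concave_hull
from shapely.geometry import Polygon

def simplify_parcel(parcel: Polygon, ratio: float = 0.4, tolerance: float = 10.0) -> Polygon:
    """Return a smoother, more regular approximation of a parcel polygon.

    Assumes a projected CRS, so `tolerance` is in meters.
    """
    hull = concave_hull(parcel, ratio=ratio)  # remove jagged indentations
    return hull.simplify(tolerance)           # drop near-collinear vertices
```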

Automation of execution

The FD codebase in its original form was mostly a research tool, based on Python scripts and Jupyter notebooks. The current V2 version of the algorithm has largely diverged from V1 in several aspects, the most relevant being:

  • It has been productionalized using eo-grow, which allows us to automate the execution and scaling of entire workflows through the use of configuration files and Ray. In conjunction with Sentinel Hub, we are able to automatically execute FD on very large areas in a matter of hours.
  • We switched to PyTorch as the backend for the deep learning framework.

We have therefore decided to archive the open-source code for V1, though we have ensured that it still works as intended.

What does V2 look like?

Enough chit-chat! Let’s look at some delineated fields! While the steps described above are valid for many types of input satellite imagery, we have for the moment developed and validated FD V2 on Sentinel-2 L1C imagery. To evaluate the performance of V2, we applied it to test locations in Europe, i.e. independent locations not seen during the training or validation phases, and to test locations across the world to assess its generalization. Where reference data was available, we computed geometric errors based on Intersection over Union (IoU), over-segmentation (i.e. whether the estimated parcels tend to be smaller than the reference), and under-segmentation (i.e. whether the estimated parcels tend to be larger than the reference). In Fig. 10, V2b refers to V2 with geometries simplified to make them more rectangular. This process typically results in slightly bigger polygons.
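To make these metrics concrete, the sketch below computes all three for one matched pair of estimated and reference polygons. The exact formulas used in our evaluation are not spelled out in this post, so a common set of definitions is assumed:

```python
from shapely.geometry import Polygon

def parcel_metrics(est: Polygon, ref: Polygon) -> dict:
    """IoU and over-/under-segmentation for a matched parcel pair (projected CRS)."""
    inter = est.intersection(ref).area
    union = est.union(ref).area
    return {
        "iou": inter / union,                        # 1 = perfect alignment, 0 = no overlap
        "over_segmentation": 1 - inter / ref.area,   # high if the estimate is too small
        "under_segmentation": 1 - inter / est.area,  # high if the estimate is too large
    }
```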

Fig. 10. Results of the quantitative analysis on the test data in Europe for V1, V2 and V2b. IoU values are reported on the left, where 1 is perfect alignment and 0 means no overlap between reference and estimated parcels. Note that we didn’t apply any filter based on the size of the reference polygons, meaning that very small parcels which cannot be well detected with Sentinel-2 imagery are included in the plot. The central and right-most plots show over-segmentation and under-segmentation values, where 0 means perfect overlap.

We have also carried out a performance analysis in terms of false positives, i.e. parcels estimated by the model but not present in the reference dataset, and false negatives, i.e. parcels present in the reference but missed by the model. Results are shown in Fig. 11. F1-scores derived from false positives, false negatives, and true positives are V1=0.51, V2=0.61, and V2b=0.63.
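For reference, the F1-score combines these counts as F1 = 2·TP / (2·TP + FP + FN), so an increase can come from fewer spurious parcels, fewer missed parcels, or both.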

Fig. 11. False positive (FP, in orange), true positive (TP, in blue) and false negative (FN, in green) scores for the test dataset in Europe for V1, V2 and V2b.

We have already run V2 in production, and we have been able to get independent feedback about the performance of V2 for different regions in Italy (not included in the training dataset). Results of the validation, carried out by TerraSystem Srl, Viterbo, Italy, are shown in Fig. 12. The increase in IoU is in line with what we saw on our test dataset, although the absolute values are larger, because the reference polygons in this area-of-interest are larger than those in the test dataset.

Fig. 12. Intersection over union results for the independent validation carried out by TerraSystem Srl.

In Fig. 13 you will find some examples of V2 predictions in Europe and outside of Europe, i.e. on geographical areas not included in the training dataset. You can scroll through these results by visiting our updated demo webpage parcelio. Navigate around the world to where red bounding boxes are displayed and zoom in to inspect the V2 delineated results.

Fig. 13. Examples of V2 delineations of agricultural parcels across Europe and the world. Bear in mind that the algorithm was trained on imagery and polygons from the European region, so generalization to other areas might not yet be satisfactory. However, as the results show, the algorithm can adapt well to different soil types, climates, parcel shapes, and time periods, as shown for the two overlapping tiles in Mexico from acquisitions in April and in June.

How to get it?

For the time being, you can get in touch with us at info@sentinel-hub.com or eoresearch@sinergise.com. We plan to update the V1 offering on the Euro Data Cube (EDC) marketplace in the very near future.

What does the future hold?

Here is our tentative roadmap going forward:

  • Continuous improvement of the algorithm, both in its generalization to geographical locations across the world and in the quality of its results, e.g. the estimation of disjoint parcels;
  • Development and validation of FD on PlanetScope imagery, which has a higher spatial and temporal resolution than Sentinel-2.

Thanks for reading. Get in touch with us at eoresearch@sinergise.com with any questions, comments, appreciation, or complaints.
