Area Monitoring — High-Level Concept

Sinergise · Published in Sentinel Hub Blog · Sep 1, 2020 · 15 min read

We are building a generic service for the Common Agricultural Policy market, which has from the very beginning, for decades, been served by individually tailored solutions, usually from local vendors. Note to the readers — it’s a very long read, combining business aspects as well as technical findings.

An example of the analysis of a specific agriculture parcel, delivered in a generic manner. If interested, read on!

This is a multi-part series. Find information about related blog posts at the bottom.

Introduction

The requirement to monitor and control the significant Common Agricultural Policy (CAP) subsidies paid annually for millions of agricultural parcels across all European Member States (MS) has driven the technological advancement and adoption of existing and emerging geospatial technology, including remote sensing, more than any other sector in Europe over the last three decades. Hence, the advancement and free operational availability of the Copernicus Sentinels demanded that their use also be seriously considered for bringing cost savings and operational efficiencies to the controls process.

Two of the most demanding tasks of the CAP are On-The-Spot Checks (OTSC) — which require inspectors to go to the field and check the situation with their own eyes — and Control with Remote Sensing (CWRS), where commercially available, high-spatial-resolution, recent satellite imagery is used to check the fields’ cultivation remotely, going to the field only in the case of uncertainty or ambiguity. Both are costly, requiring the tasking and purchase of expensive satellite imagery, supported by significant trained manpower and travel to distant parts of the country within tight timeframes. Even though these controls are required for “only” 5% of the farms, they still cost millions of EUR every year.

The initial anticipation was that freely available, almost continuous intra-seasonal imagery would easily replace the expensive and cumbersome OTSC and CWRS controls. However, the complexity of the task and the low resolution, noisiness and volume of the Sentinel data require significant research and development of advanced data handling and machine learning (ML) before we will be able to reach an optimal roll-out and the highest value-add of the Copernicus Sentinels in the CAP process. The expectation of perfection has, at least in our belief, delayed the uptake of Sentinel data in at least some parts of the process.

A typical NDVI profile of several crop types (left, source), derived from a real-life situation for four years in Küçükyıldız, Turkey (right) — a challenge for you, match these two!

The Challenges

On-the-Spot Checks are fundamentally different from what one can do with Sentinel and ML. First of all, they are tied to very detailed and accurate observations, often measuring the boundary of a parcel to a few percent (and sub-meter) accuracy. It is therefore difficult to replace them with Sentinel’s 10-meter resolution (or 20-meter in the case of Sentinel-1). The second issue is probably even more important — existing controls are designed to be 100% correct and repeatable (an assumption that has not been thoroughly tested for quite a while). If an external audit found discrepancies in the controls process, these were usually tied to significant financial penalties for the country’s paying agency. For ML techniques, however, it is, almost by definition, impossible to implement a 100% correct and fully repeatable process. They are, after all, based on a statistical approach.

Hence, five years after the operational availability of Sentinel-2, its data are still only barely used in the CAP. However, the European Commission (EC), recognizing these challenges, acknowledged the greater potential of large-scale monitoring instead of risk-based sample checks on compliance.

The statistical categorisation of ML-based classification results over massive data sets is more analogous to the so-called “administrative controls” than to the precise and definitive OTSCs. These are a set of business rules and queries which sift through the data of all submitted subsidy requests to identify outliers. For example: “Does the total area of requested grants fit the total area of parcels under the control of the farmer?”, “Did the farmer apply for a similar subsidy in previous years?”, etc. These controls have to be applied to all claimed parcels of all applications. They were initially performed with elaborate querying of Excel spreadsheets, later enhanced with automated controls and efficient software tools supporting a 100% check by operators, e.g. answering the question “Does the latest aerial imagery still reflect the parcel borders and use?”. This is where blanket coverage with Sentinel-assisted analysis should fit much better, especially with the overall objective of moving from a 5% check sample to monitoring 100% of the area.

The ingenuity of this “Checks by Monitoring” (CBM) system lies in processing the vast amount of full-area, multi-temporal indirect “signals” from Sentinel data and translating them into interpretable “markers” and “scenarios”, derived over “features of interest” [1]. This process fits very well with the overall administrative check methodology, introducing new criteria (“lanes” and “rules”) under which to evaluate whether an application should be flagged as “problematic”. A fully automated process may be tempting, but advanced deep-learning techniques are likely to blur the full complexities and uncertainties, so a hybrid approach, where automated ML-based routines support a manual conclusion, may be optimal at the current stage. In reality, many ML methods will be used and a majority of the work will be automatic, but it is important to bring these in where needed rather than the other way around — introducing fancy deep-learning methods just because one can.

The additional data we gain from Sentinel signals can provide many answers, but not all — there are limitations for small parcels, for parcels in cloudy parts of the world, as well as for measures that simply cannot be observed with such data. Whatever answer one gets, it needs to be taken into account; when there is none, an alternative way to check the specific measure has to be found, or a decision made to leave it out due to its insignificant impact on the budget. For the ones remaining, there is always a “back-up plan” — to perform old-fashioned controls on a 5% sample of the (remaining) cases. The “Checks by Monitoring” process therefore simply reduces the burden by confirming a (significant) part of the measures and allows the agency to focus on things that cannot be done in any other way.

Field structure in France (left, source: Valtzen via Twitter) and Saudi Arabia (right, source: Harel Dan’s script via Twitter)

Another self-inflicted problem that we have all stumbled upon is “crop type classification”, which was more or less the first step of just about everyone in this field. This is probably due to the already existing methodologies and techniques for supervised learning, which fit the problem well. Plus, the end result is an appealing, colourful map, which is perfect for impressing managers. The concept is far from ideal though.

First, typical CAP measures usually ask farmers to declare one of hundreds of crop types, either because of local specifics of the measure (e.g. inter-species varieties) or to obtain good statistics at the country level. The specific crop type varieties cannot be distinguished by remote sensing (some cannot even be discriminated by field control) and provide a mixed picture which confuses the ML. Hence crop groups have to be collated, which is not trivial, as one would want the grouping to respect the monetary specifics of the measures as well. Then there is the case of subsequent crops in the same year, which happens on some parcels but not others, making it more challenging to detect the “specific crop type”.

There is an even more important, fundamental challenge in crop classification — while the typically used ML methods are very good at assigning a specific set of observations (a “signal”) to the “most similar class”, with associated class probabilities, they are not designed to give a “does not belong to any class” answer. This makes it close to impossible to determine ineligible areas or to flag potentially problematic measures.
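To illustrate the point, here is a minimal sketch (not taken from any production system) showing that a standard classifier always returns one of the known classes, even for an outlier parcel; the features, labels and the 0.6 rejection threshold are invented for the example.

```python
# Minimal sketch: a standard classifier assigns one of the known classes,
# even to a parcel that belongs to none of them. A crude, hedged workaround
# is to reject low-confidence results instead of auto-confirming them.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Hypothetical training data: per-parcel feature vectors (e.g. aggregated
# NDVI statistics over the season) and crop-group labels.
X_train = rng.normal(size=(500, 12))
y_train = rng.integers(0, 4, size=500)           # four known crop groups

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# A parcel that looks nothing like the training data (e.g. a built-up area)
# still receives a "most similar" class label.
outlier = rng.normal(loc=8.0, size=(1, 12))
print("predicted class:", model.predict(outlier)[0])

# Illustrative workaround: treat a low maximum class probability as
# "cannot tell" and pass the parcel on to expert judgement.
proba = model.predict_proba(outlier)[0]
if proba.max() < 0.6:                            # threshold is illustrative
    print("no confident class -> flag for expert judgement")
```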

Last but not least, crop type classification is very complex, conflating ML black-box algorithms, vegetation growth cycles, human activities such as seeding, ploughing and mowing, local climate conditions, agriculture practices and many more. This complexity makes it very challenging to understand the results (and their limitations), improve them and, importantly, translate them into actionable decisions.

Proposed Approach for the Markers

Our most encouraging results so far have come from developing simple, automated, hierarchical, interpretable and effective markers:

  • Similarity — this is based on the assumption that all claims for a crop of a certain group in the local neighbourhood (say within 20 km and at a similar altitude) should produce a similar signal (e.g. the behaviour of a vegetation index in a time-series). All non-border pixels in all parcels of that crop type can be extracted and compared, and any deviation from similarity may be attributed to, for example, a wrong claim, a different farming practice or the quality of the soil (water availability), etc. (more info). A minimal sketch of this marker, together with the homogeneity marker below, follows the list.
  • Homogeneity — similar to the above, but intra-field; the marker is based on the assumption that all pixels in a field with one crop should have a similar spatial response, save for small variability due to differing soil quality. All non-border pixels in each parcel of that crop type can be extracted and compared, and any differing spatial behaviour between pixels may indicate different intra-field soil/water conditions, a multi-crop parcel (potentially requiring a split of the parcel to obtain good quality of the other markers) or an ineligible area (e.g. abandoned land, overgrowth, built-up areas).
  • Bare Soil — sudden drops in a vegetation index, or non-vegetation radiometry, may indicate bare soil — an important occurrence in agricultural activity, which can act as a proxy for ploughing. This marker can occur repeatedly over the year. One should also consider the vegetation activity, as well as the time period between two events, to deduce whether they are two separate events or only one (more info).
  • Mowing and Harvest — very similar markers in terms of vegetation activity, mostly differentiated based on the crop type (more info).
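To make the similarity and homogeneity markers more concrete, here is a minimal sketch of the underlying statistics. It assumes that per-parcel NDVI observations have already been extracted as arrays of shape (pixels × dates); the function names and thresholds are illustrative, not our production implementation.

```python
# Minimal sketch of the similarity and homogeneity markers, assuming
# per-parcel NDVI observations are already extracted as arrays of shape
# (n_pixels, n_dates); thresholds are purely illustrative.
import numpy as np

def similarity_marker(parcel_ndvi, neighbourhood_ndvi, max_distance=0.15):
    """Compare a parcel's mean NDVI time-series with the median
    time-series of same-crop parcels in the local neighbourhood."""
    parcel_curve = np.nanmean(parcel_ndvi, axis=0)           # (n_dates,)
    reference_curve = np.nanmedian(
        [np.nanmean(p, axis=0) for p in neighbourhood_ndvi], axis=0)
    deviation = np.nanmean(np.abs(parcel_curve - reference_curve))
    return {"deviation": deviation, "similar": deviation < max_distance}

def homogeneity_marker(parcel_ndvi, max_spread=0.1):
    """Check whether all pixels within the parcel behave alike,
    save for small variability due to soil/water conditions."""
    spread = np.nanmean(np.nanstd(parcel_ndvi, axis=0))      # mean per-date std
    return {"spread": spread, "homogeneous": spread < max_spread}
```

In practice, the thresholds would of course be calibrated per crop group, region and time of season rather than fixed globally.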

After the above markers are obtained, more elaborate ones include:

  • Land Cover — normally only three land cover types are declared in subsidy claims — arable land, grassland and permanent crops. However, if training is done on all possible land cover types, the model can also be used to detect non-agricultural areas, which will, in most cases, prove to be ineligible (a rough sketch of such an object-level classification follows the list).
  • Crop Type — even though it has difficulties (it is less accurate than the declarations themselves, can detect main crops only, and even these only later in the season, when it might be too late to react), this marker is still very useful in combination with the others, especially for more complex measures (more info).
  • Minimum Agricultural Activity — based on a standard supervised learning method using ground truth from on-the-spot controls and LPIS QA exercises. By training the algorithm to detect areas similar to the ones identified by inspectors, another input for filtering out problematic cases can be obtained. A big advantage of this marker is that if actual agriculture activity can be confirmed, a significant part of the base payment can be approved.
  • Field segmentation or “parcel boundary detection” — parcels’ polygons are often taken for granted, even though they represent an essential input into the CBM process, both technically and methodologically. While the coarse resolution of the Sentinels does not permit automatic updating of LPIS and GSAA-derived parcel boundaries (which means that the current three-year update cycle based on orthophotos will have to remain), automatic segmentation methods based on multi-temporal and multi-spectral responses may assist in detecting boundaries, serving as a supplementary layer or an input to the update procedures, especially when combined with the homogeneity and land cover markers.
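As a rough illustration of how the land cover and crop type markers can be built at the object level, the sketch below trains a gradient-boosting classifier on one feature vector per parcel. The synthetic features, the class list and the model choice are assumptions made for the example, not our actual configuration.

```python
# Rough illustration of an object-level land cover / crop type marker:
# one feature vector per parcel (aggregated band statistics through the
# season), one declared class per parcel. Features and classes here are
# synthetic placeholders for the sketch.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)

n_parcels, n_features = 2000, 24      # e.g. monthly NDVI/NDWI means + stds
X = rng.normal(size=(n_parcels, n_features))
y = rng.choice(["arable", "grassland", "permanent crop", "non-agricultural"],
               size=n_parcels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = HistGradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# In a real setting the report would be inspected per crop group; parcels
# predicted as "non-agricultural" would feed the ineligible-area screening.
print(classification_report(y_test, model.predict(X_test)))
```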

An important property of these markers is that they are fairly simple, require little additional input and can easily be generalized across a large region. Some of them address specific rules directly, and some can be used in combination — e.g. to detect ineligible areas, which do not have their own marker. The required input is manageable: apart from the satellite data, most need only polygon boundaries, and some also land cover and/or crop type classifications — data that are readily available within paying agencies and that should, according to EU legislation, also be publicly accessible. Note that it is not essential that each marker is perfect on its own. It is the combination of them that will even out the inherent errors that cannot be avoided.

Mount Raung by Pierre Markuse (source)

Pixel- vs. Object-based Processing

Most markers are based on satellite imagery, where the land is represented by pixels of a specific size. ML methods allow us to detect properties of each individual pixel (sometimes in even more detail, using super-resolution techniques). Apart from being the most straightforward technological approach, this can also be perceived as the preferred method since it produces results with greater detail. However, it also has major disadvantages.

More data also means more noise. Results might be more detailed, but they will be less consistent overall. There is also a technical aspect — a parcel of one hectare consists of 100 pixels at 10-meter resolution. If we replace these 100 pixels with one node of area-averaged data, we reduce the complexity of the processing by orders of magnitude. This does not just reduce the overall costs; more importantly, it significantly shortens the time needed for analysis, making it possible to iterate faster. Statistically aggregated data are usually also cleaner, producing better results.

The preferred approach is therefore to develop models mostly at the object level. In many cases such models are directly usable on a pixel basis as well, so the best of both worlds can be obtained. For example, a pixel-based land cover classification could be performed to enable the detection of non-conformities within selected/prioritised parcels.
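A short sketch of this object-level aggregation step, assuming an NDVI stack and a boolean parcel mask are already available (all names are illustrative):

```python
# Sketch of collapsing the ~100 pixels of a one-hectare parcel into a
# single object-level observation per date. Assumes an NDVI stack of
# shape (n_dates, height, width) and a boolean mask of the parcel's
# interior pixels; variable names are illustrative.
import numpy as np

def aggregate_parcel(ndvi_stack, parcel_mask):
    """Return per-date mean and standard deviation over the parcel."""
    pixels = ndvi_stack[:, parcel_mask]          # shape (n_dates, n_pixels)
    return {
        "mean": np.nanmean(pixels, axis=1),      # one value per date
        "std": np.nanstd(pixels, axis=1),        # intra-parcel spread per date
        "n_pixels": int(parcel_mask.sum()),
    }

# Example: a 1 ha parcel at 10 m resolution is roughly a 10 x 10 pixel block.
ndvi_stack = np.random.rand(30, 50, 50)          # 30 dates, 50 x 50 chip
parcel_mask = np.zeros((50, 50), dtype=bool)
parcel_mask[20:30, 20:30] = True                 # 100 interior pixels
stats = aggregate_parcel(ndvi_stack, parcel_mask)
print(stats["n_pixels"], stats["mean"].shape)    # 100 (30,)
```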

Burning crops (source: Twitter by Valtzen)

From Markers to Decisions

The translation from signals and markers to decisions is captured as “scenarios” and “lanes” in JRC’s methodology. The term more commonly used by paying agencies for the same thing is “business rules”. Markers can be used in exactly the same manner as business rule evaluation, for example when assessing the rule “Did the farm holder have an irregularity identified in the past?”.

The challenge is identifying the markers and the corresponding criteria which can confirm with sufficient accuracy whether or not a specific measure is appropriate for a specific parcel or farm. Due to the inherently uncertain nature of statistical results, as well as the severely negative impact of someone being wrongly accused of irregularities, an approach of “searching for the good” rather than “finding the bad” is preferable. Scenarios are designed which sift through all the applied measures and confirm, with a certain level of probability, that a measure is OK. In this way, the vast majority of the measures can be confirmed as OK and filtered out, and only a small percentage will remain, for which another solution has to be found.
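A toy sketch of such a “searching for the good” scenario, combining a few of the marker outputs described above; the rule, the class names and the thresholds are made up for illustration and would in practice follow the paying agency’s business rules:

```python
# Toy "searching for the good" scenario: combine a few marker outputs to
# confirm a grassland mowing measure. The rule and thresholds are invented
# for illustration only.
from dataclasses import dataclass

@dataclass
class ParcelMarkers:
    declared_crop: str
    crop_type_prediction: str
    crop_type_confidence: float      # 0..1
    mowing_events: int               # detected mowing events in the season
    homogeneous: bool                # homogeneity marker outcome

def confirm_grassland_mowing(m: ParcelMarkers) -> str:
    """Return 'confirmed', 'rejection-candidate' or 'expert-judgement'."""
    if m.declared_crop != "grassland":
        return "expert-judgement"                 # rule does not apply cleanly
    if (m.crop_type_prediction == "grassland"
            and m.crop_type_confidence >= 0.8
            and m.mowing_events >= 1
            and m.homogeneous):
        return "confirmed"                        # measure confirmed OK
    if m.mowing_events == 0 and m.crop_type_confidence >= 0.8:
        return "rejection-candidate"              # still goes to an expert
    return "expert-judgement"                     # cannot tell -> manual check

print(confirm_grassland_mowing(ParcelMarkers(
    "grassland", "grassland", 0.93, mowing_events=2, homogeneous=True)))
```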

There can be several reasons why a measure may not be confirmed, and each of these calls for a different approach:

  • Small Parcels — JRC’s recommendation states that one should establish an internal buffer of 10 meters and then have at least 5 full pixels remaining, to avoid border effects and assure representative statistics. These conditions cause significant problems in countries with small parcels, especially narrow ones. As these recommendations are not set in stone, the buffer can be reduced, even to zero, using the pixels which are fully inside the polygon (working in the satellite’s source coordinate system to prevent reprojection errors), or even taking those that overlap to a significant extent but also cross neighbouring parcels of a similar crop. Also, instead of requiring five remaining pixels, one can settle for just one (a sketch of the inner-buffer step follows this list). All of these decisions will reduce the accuracy of the finding, but the impact will probably go to the “negative side” — due to the mixture of classes or statistical error, there is a significant chance that a marker will not confirm the measure. However, as the preferred approach is to find “positive cases”, this relaxation is still valid, as it will reduce the number of remaining samples. If the parcels are still too small, VHR data such as PlanetScope may have to be used. These come at an additional cost, but if used smartly, e.g. requested only for the remaining parcels where they are actually needed, the cost should be manageable and justifiable, at less than 1 EUR per (small) parcel per year.
Comparison of NDVI time-series of Sentinel-2 (green) and PlanetScope (orange) for a set of small parcels.
  • Cloudy Areas — in our experience, accurate detection of clouds (and cloud shadows) is more important than the sheer number of observations. However, in cases where there are too few cloud-free observations, Sentinel-1 SAR can be used as a substitute source dataset for many markers. The daily revisit of PlanetScope can also be helpful.
  • Low Degree of Accuracy of Results — for specific combinations of crop types and markers, especially in the early stage of technology development, low accuracy is bound to occur. However, this should not be a blocker. Starting with an imperfect system and improving it later will bring immediate benefits. When IACS was starting, Excel spreadsheets were also far from perfect, but they did the job. And then they evolved.
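Below is a minimal sketch of the inner-buffer step for small parcels, using shapely. The buffer distance and the minimum-pixel rule are exactly the adjustable parameters discussed above; the pixel-grid alignment is simplified for the example.

```python
# Minimal sketch of the inner-buffer step for small parcels: shrink the
# parcel polygon inwards and count how many 10 m pixels remain fully
# inside. The grid alignment is simplified; buffer distance and the
# minimum-pixel rule are the adjustable parameters discussed above.
import numpy as np
from shapely.geometry import Polygon, box

def usable_pixels(parcel: Polygon, buffer_m: float = 10.0,
                  pixel_size: float = 10.0) -> int:
    """Count pixels whose full footprint lies inside the buffered parcel."""
    inner = parcel.buffer(-buffer_m)             # negative buffer shrinks
    if inner.is_empty:
        return 0
    minx, miny, maxx, maxy = inner.bounds
    count = 0
    for x in np.arange(np.floor(minx), np.ceil(maxx), pixel_size):
        for y in np.arange(np.floor(miny), np.ceil(maxy), pixel_size):
            if inner.contains(box(x, y, x + pixel_size, y + pixel_size)):
                count += 1
    return count

# A narrow 25 m x 200 m parcel: with a 10 m inner buffer no full pixel
# remains, without the buffer dozens do -- hence the relaxed recommendation.
parcel = Polygon([(0, 0), (25, 0), (25, 200), (0, 200)])
print(usable_pixels(parcel, buffer_m=10.0), usable_pixels(parcel, buffer_m=0.0))
```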

Expert Judgement Process

As already mentioned, an ML process can perform many steps and decisions. It is, however, expected that thousands or tens of thousands of cases will remain to be checked “by hand”, as they simply do not fit the ML model to the appropriate degree of accuracy. With a well-integrated expert judgement application, an operator can process each of these cases in a matter of seconds. The application should be perfectly fine-tuned for the process to ensure high efficiency — all the relevant data (including signals and markers) have to be at the operator’s disposal, with the most relevant already at the forefront. The operator should, in most cases, be able to look at various maps and time-series charts and make a yes/no/can’t-tell decision almost immediately. Only the most complex (unusual) cases should require a deeper visual examination of the available data. Decisions made by operators should then be fed back to the ML process to improve the accuracy of the next iteration.

A perfect solution is not possible right now, but a combination of automated ML-based techniques with an efficient, expert-based manual judgement process, which can feed back into and re-train the ML models, should yield steady, iterative improvements.

Optimization of the IT Process

A well-designed IT system, which integrates the various processing steps and options in the form of micro-services, is an essential tool, so that the complete chain can be fully automatic and steps are not duplicated. There should be no manual step in the processing chain, to prevent delays and to ensure fast iteration by being able to reproduce results immediately.

Many of the steps described above can be performed remotely and generically, and it should be possible to evolve towards ordering marker services in much the same way as one can order VHR data. In most cases, these will not require extensive customization. An indispensable input is the layer of claims (polygons, crops, measures); the rest can often be derived directly from the data. The best approach for using such solutions is “microservices” in the form of APIs, which can be easily integrated into the paying agencies’ existing tools.
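Purely as an illustration of the “marker as a service” idea, the sketch below shows how a paying agency’s system might call such an API. The endpoint, payload fields and response format are entirely hypothetical and do not correspond to any existing service.

```python
# Hypothetical illustration of the "markers as a service" idea: a paying
# agency's system posts its claim polygons and receives marker results.
# The endpoint, payload fields and response format are invented for this
# sketch and do not correspond to an existing API.
import requests

payload = {
    "parcels": [{
        "id": "SI-2020-0001234",
        "geometry": {"type": "Polygon",
                     "coordinates": [[[14.50, 46.05], [14.51, 46.05],
                                      [14.51, 46.06], [14.50, 46.06],
                                      [14.50, 46.05]]]},
        "declared_crop": "grassland",
        "measures": ["base_payment", "mowing"],
    }],
    "markers": ["similarity", "homogeneity", "mowing"],
    "period": {"from": "2020-03-01", "to": "2020-10-31"},
}

response = requests.post("https://example.com/area-monitoring/v1/markers",
                         json=payload, timeout=60)
response.raise_for_status()
for result in response.json().get("results", []):
    print(result["id"], result["marker"], result["outcome"])
```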

Selection of Measures

Last but not least is the selection of measures. These have usually evolved through years and years of policy decisions, often without a relevant link to the “situation on the ground”. When progressing towards an Area Monitoring process, it should be accepted that it will simply not be possible to check all the measures using satellite data. Some will therefore have to be checked differently (ideally through other operational procedures; e.g. permanent vineyards — difficult to observe with 10-meter Sentinel imagery — are usually mandated to report their operation in various registers). For some, a decision to drop them may be needed because they present unpragmatic barriers to a simplified process.

Further reading

This post is one of a series of blog posts related to our work on Area Monitoring. We have decided to openly share our knowledge on this subject, as we believe that discussion and comparison of approaches are required among all the groups involved. We would welcome any kind of feedback, ideas and lessons learned. For those willing to share theirs publicly, we are happy to host them here.

The content:

[1] Second discussion document on the introduction of monitoring to substitute OTSC: rules for processing applications in 2018–2019, by Joint Research Centre

Our research in this field is kindly supported, with grants and know-how, through our cooperation in Horizon 2020 projects (Perceptive Sentinel, NIVA, Dione) and ESA projects (Sen4CAP).
