Detecting Suspicious Farm Patches Using Machine Learning

Yash Sanghvi
Published in Data Kisaan
4 min read · Mar 26, 2021

In a previous blog post, we discussed how, given a farm patch detected by our algorithms, we enhanced it further using its background satellite image. However, we realized that some of the original farm patches were themselves suspicious and should have been eliminated at the generation step itself. They appeared in places where farms shouldn't be, like on a road strip, or on a piece of land where, per the satellite image, construction was underway.

So what was happening? Well, our patch detection algorithm uses a number of factors to determine a patch: the proximity of raw GPS points, learning-based clustering of speed, pattern identification using neural nets, and so on.

Now, say a tractor made a number of back-and-forth trips between a sugarcane farm and a sugar factory that are in close proximity to each other. The density of the accumulated points would then lead to the creation of a small farm patch on the road strip.

Example of a patch detected on a road strip because of repeated back and forth travel on the same road

Similarly, say the tractor is parked somewhere for a long time or is doing some stationary work, like threshing or digging. GPS is prone to noise, and sometimes the noise can become too good to not be heard ;) Over a period of time, the points accumulating in close proximity to the parking location can trick the algorithm into thinking that the tractor did a small farming maneuver there.

Example of a farm patch generated by a stationary tractor

What are the implications? A sub-par user experience is the obvious one: no one wants to see a patch where there is none. There are monetary implications for tractor owners as well. For instance, driver compensation is sometimes linked to the number of acres done. These incorrect small patches can inflate the overall area numbers shown on the Simha Kit app and lead to higher cash outflow for the tractor owners if they don't check each patch individually.

So how did we tackle this problem? We knew the server somehow needed to learn to detect and eliminate the suspicious patches at the source itself, and that phrasing was itself hinting at Machine Learning. We began experimenting with ML, but unfortunately, all our initial trials gave discouraging results. What we realized was that the obvious features, like farm area, concavity of the patch, and number of raw GPS points, were not helping differentiate between the proper and suspicious patches.

It was time to bring some method to the madness. So we took a step back and looked at some individual cases where the patch was incorrect. If you look at the cases described above that led to a false patch, some prominent themes stand out:

  1. These false patches take a much longer time to form than a proper patch of the same area. We hit gold here! We took the difference between the max and min timestamps of the points within the patch, divided the patch area by it, and voila! 60% of the false patches were eliminated in one go!
  2. For patches formed near a stationary tractor, the average speed of the tractor in the duration of the patch would be nearly 0, except for some noise. Thus, average speed as a feature also helped us eliminate about 8–10% more patches.
  3. These patches would have irregular and unconventional shapes. For instance, a patch formed on a road strip would be a much thinner rectangle than a normal farm patch. We, therefore, decided to take the average boundary radius of the patch as another feature. This further helped eliminate about 5–6% more false patches.
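To make the three features concrete, here is a minimal Python sketch of how they can be derived from the raw points. The `GpsPoint` type, the `patch_features` helper, and the flat-earth distance approximation are our illustrative assumptions, not the production code:

```python
import math
from dataclasses import dataclass

@dataclass
class GpsPoint:
    lat: float      # degrees
    lon: float      # degrees
    speed: float    # km/h, as reported by the device
    timestamp: int  # Unix seconds

def patch_features(points, patch_area_m2):
    """Compute the three features described above for one candidate patch.

    `points` are the raw GPS fixes inside the patch boundary;
    `patch_area_m2` is the polygon area already computed upstream.
    """
    # Feature 1: area divided by time taken to form the patch.
    duration = max(p.timestamp for p in points) - min(p.timestamp for p in points)
    area_per_second = patch_area_m2 / max(duration, 1)

    # Feature 2: average speed over the duration of the patch.
    avg_speed = sum(p.speed for p in points) / len(points)

    # Feature 3: average distance from the centroid, a rough "boundary radius".
    # A thin road-strip patch has a large radius relative to its area.
    clat = sum(p.lat for p in points) / len(points)
    clon = sum(p.lon for p in points) / len(points)
    m_per_deg = 111_320  # approximate metres per degree of latitude
    avg_radius = sum(
        math.hypot((p.lat - clat) * m_per_deg,
                   (p.lon - clon) * m_per_deg * math.cos(math.radians(clat)))
        for p in points) / len(points)

    return [area_per_second, avg_speed, avg_radius]
```

A stationary tractor then shows up as near-zero average speed, while a road-strip patch shows up as a low area-per-second value and a large boundary radius relative to its area.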

We tried various ML models with these features and got really promising results with SVM and Random Forests.

F1 Score Comparisons for different ML Algorithms
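A comparison like the one above can be run with a few lines of scikit-learn; this is a sketch, and the `make_classification` data is a synthetic stand-in for our labelled patches, which we obviously can't share here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: in practice X holds the three patch features and
# y the manual suspicious/proper labels.
X, y = make_classification(n_samples=400, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),  # SVMs need scaled inputs
    "Random Forest": RandomForestClassifier(random_state=0),
}

# 5-fold cross-validated F1 score for each model
f1_by_model = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    f1_by_model[name] = scores.mean()
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```

We used F1 rather than plain accuracy because suspicious patches are a minority class, and accuracy alone would reward a model that simply calls everything proper.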

In order to make sure that correct patches do not get eliminated by this ML model, we used probability-based classification and flagged a patch as suspicious only when the model gave a >90% probability of it being suspicious.
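The thresholding step itself is simple. Here is a sketch assuming scikit-learn, with synthetic stand-in training data whose feature ranges are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for [area per second, avg speed, avg boundary radius].
# Real training data would come from manually labelled patches.
good = np.column_stack([rng.uniform(5, 20, 200),     # forms quickly
                        rng.uniform(4, 12, 200),     # tractor is moving
                        rng.uniform(20, 60, 200)])   # compact shape
bad = np.column_stack([rng.uniform(0.01, 0.5, 200),  # takes very long to form
                       rng.uniform(0, 1, 200),       # nearly stationary
                       rng.uniform(80, 200, 200)])   # thin / sprawling
X = np.vstack([good, bad])
y = np.array([0] * 200 + [1] * 200)  # 1 = suspicious

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def flag_suspicious(features, threshold=0.9):
    """Flag a patch only when the model is >90% sure it is suspicious."""
    p_suspicious = clf.predict_proba([features])[0][1]
    return bool(p_suspicious > threshold)
```

The point of the high threshold is to trade recall for precision: a false patch that slips through is an annoyance, but a real farm patch that gets deleted is far worse.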

The results?

Percentage of false patches flagged as suspicious: 76.8%

Percentage of true patches flagged as suspicious: 0.6%

Percentage reduction in the number of patches that needed to be processed further: 48%

With this one extra step between patch generation and patch enhancement, we eliminated more than 70% of these bad experiences and recorded thousands of smiles from customers opening their trips page to see their farming work. We also reduced the processing load on our servers by over 50%. The whole system felt much more efficient.

With the introduction of the Profiler, we took this a step further. We profiled users according to their usage patterns, and for users whose probability of using their tractors only for trolley business (no agriculture) was >90%, we simply disabled patch detection, since any patch detected for such users was most likely going to be false. This has worked well so far and ensures that trolley customers never complain about false patches.

Enjoyed this article? Then be on the lookout for the next one, two weeks from now. Till then, for any data-related discussions, feel free to drop us a line at datalabs@carnot.co.in. And don't forget to follow Data Kisaan on Medium.
