How we created the world’s first autopilot for combine harvesters based on video content analysis

Andrey Chernogorov
Published in CognitivePilot
12 min read · May 5, 2020
The full suite (provided there is a CAN bus).

Just five years ago, it was impossible to use video content analysis to operate agricultural machinery: there were no fully operational neural networks that could detect the edges of a crop strip or spot obstacles in it. The solutions available at the time relied on ‘blind’ GPS methods of limited efficiency, which did nothing to improve the reputation of automation in agriculture. Five years from now, however, we predict that every combine harvester will be equipped with a computer-vision-based autopilot capable of controlling every aspect of harvesting.

We have already developed ready-to-use, cost-efficient technologies, successfully completed a year-long pilot of them, and won the attention of major combine manufacturers. Adoption of this technology will most likely follow the path of car audio units, moving over time from a separate product to a built-in option. For now we retrofit older combines, but we aim to find our place in the ecosystem and have our solution integrated into new combines as they are built.

This project has significant potential for development in countries that have a large agricultural sector, strong domestic developers, inefficient harvesting (and the corresponding pain point of cutting costs), and a pool of new combines. While the market for combines in the U.S. is not nearly as big as the market for cars, for example, it is still significant. And if we make the right decisions now, when there’s no competition, we can hope to capture the entire segment.

The working day of a combine operator using the Agro Pilot.

A combine harvester operator or combine driver arrives at the field at 7 a.m. Unless it is the first day of harvesting, the machine is already out in the field. The driver configures the equipment for the day’s patch of crops and warms up the engine. At around 9 a.m., when the dew has dried off, it’s time to start. The harvesting carries on all day, stopping only at dusk, when humidity spikes again. When it rains, crops can’t be harvested. Farmers have about two weeks to gather all the grain in the fields. If they start too early, the grain isn’t ripe yet; if they wait too long, the grain will be too ripe and fall to the ground. During this peak period, any downtime, mistakes or accidents represent substantial costs to the farmer.

The work shift of a harvester operator or combine driver is up to 14 hours of monotonous work. Here is what it involves:

  • Driving very carefully and precisely, trying to follow a straight line along the harvested strip. This meticulous labor, which includes constantly looking to the side, is a human driver’s main priority.
  • Maintaining the right crop-harvesting mode: depending on crop ripeness, height, and density, an operator adjusts 20–22 settings, including the speed of movement within the harvesting area. This means an operator needs to keep an eye on the rotation of the auger all day long.
  • Controlling the quality of the harvested grain, which means looking back occasionally.
  • Unloading the grain on time, which means arranging transport.
  • Moving in coordination with other combines.

In my previous post, I explained in detail why operators end up focusing on the driving, leave the harvesting settings at their defaults, and, in the tenth hour of a shift, crash into a transmission tower or a tractor standing in plain sight.

In a nutshell, a human operator can either drive or monitor the harvesting quality. Driving is a challenge: with its 40-foot-wide grain platform, a combine often leaves broad stripes of unharvested crops because of imperfect turns.

If an operator can delegate the driving, however, the crop yield increases dramatically. During our pilot project last year, the yield from the same patch increased by 3% simply because the operator was able to more closely monitor what’s going on in front of him. It increased another 3–5% due to the ability of the harvester to maintain the cut width without leaving unharvested areas. Additionally, there were no accidents.

In other words, the solution is extremely useful and would have been put in place long ago had it been technically possible.

And now it is possible.

Here is what the autopilot does:

  • It drives the combine in accordance with the specifics of the crop and in relation to previous trajectories.
  • It allows a mixed machinery pool of automated and manually operated harvesters (they’re all the same to the autopilot).
  • It maintains a precise distance between rows, minimizing the gaps.
  • It looks for obstacles, categorizes them, and decides whether to drive around them, slow down or warn an operator while there is time.
  • It maintains the optimal speed for a given situation. For instance, a particular combine model has to move at around 4 mph to harvest wheat efficiently. Above that limit, the crop flow increases, but the rasp bars no longer thresh all the grain. Grain losses grow exponentially with speed: 0.2% at 4.3 mph, 0.5% at 5 mph, and so on (a small sketch of this trade-off follows the figures below).

Agrotechnical assessment of one of the combines based on the results of lab and field testing

The correlation between speed and harvesting efficiency
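To make that speed/loss trade-off concrete, here is a minimal sketch that interpolates between the two loss figures quoted above and picks the fastest speed within a given loss budget. The linear interpolation and the 0.3% budget are illustrative assumptions, not our control logic.

```python
import numpy as np

# The two threshing-loss figures quoted above for one combine model (illustrative only).
speeds_mph = np.array([4.3, 5.0])  # ground speed
loss_pct = np.array([0.2, 0.5])    # grain losses at that speed

def max_speed_for_loss(budget_pct: float) -> float:
    """Fastest speed whose interpolated loss stays within the budget."""
    candidates = np.linspace(speeds_mph.min(), speeds_mph.max(), 200)
    losses = np.interp(candidates, speeds_mph, loss_pct)
    ok = candidates[losses <= budget_pct]
    return float(ok.max()) if ok.size else float(speeds_mph.min())

print(max_speed_for_loss(0.3))  # ~4.5 mph for a 0.3% loss budget
```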

For the moment, our primary objective is to exclude the human factor from the operator’s work rather than to provide self-driving functionality. Our basic equipment package helps eliminate inefficient use of the header and excessive runs.

What’s under the hood:

  1. A 2 MP camera mounted on the mirror bracket. The camera is our primary sensor and source of information; no other sensors are needed.
  2. A display in the cab that provides the interface for the driver and shows warnings and settings.
  3. A control unit with an NVIDIA Jetson TX2, mounted under the cab. It runs the main stack of algorithms, processes the video feed, and issues commands to the CAN bus.

Unit characteristics: 13.3 × 11.4 × 2.3 in, 40 W.

The command module connects to the CAN bus or another input/output system in the combine. There are a few tricks here: not all models support this kind of connection, and operating the hydraulic units through our interface is not always possible.
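For combines that do expose a CAN interface, issuing a command is conceptually simple. Here is a minimal sketch using the python-can library; the arbitration ID and payload encoding are hypothetical placeholders, since in practice they come from the manufacturer's protocol description or from reverse engineering (more on that below).

```python
# Sketch: sending a steering command over CAN with python-can.
# The arbitration ID and payload encoding below are hypothetical.
import can

bus = can.interface.Bus(channel="can0", bustype="socketcan")

def send_steering_command(angle_deg: float) -> None:
    # Hypothetical encoding: signed 16-bit angle in 0.1-degree steps,
    # padded to an 8-byte frame.
    raw = int(angle_deg * 10)
    payload = raw.to_bytes(2, byteorder="little", signed=True) + bytes(6)
    msg = can.Message(arbitration_id=0x18FF0001, data=payload, is_extended_id=True)
    bus.send(msg)

send_steering_command(2.5)  # nudge the steering 2.5 degrees
```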

If the combine is less than five years old, a package consisting of a control unit, a camera and a display for the cab is sufficient in most cases.

We might need a wheel rotation sensor if the combine is old or its system bus is incompatible with our solution. This data is necessary for odometry (travel speed and wheel rotation angle).
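As a rough illustration of what that sensor data feeds, below is a single step of dead reckoning based on a standard bicycle (Ackermann) model. The wheelbase and time step are placeholder values, not parameters of our system.

```python
import math

def dead_reckon(x, y, heading, speed_mps, steer_angle_rad, dt, wheelbase_m=3.6):
    """One bicycle-model odometry step from wheel speed and steering angle.

    The default wheelbase is a placeholder; a real installation would use the
    measured geometry of the specific combine.
    """
    x += speed_mps * math.cos(heading) * dt
    y += speed_mps * math.sin(heading) * dt
    heading += speed_mps / wheelbase_m * math.tan(steer_angle_rad) * dt
    return x, y, heading

# Example: 2 m/s straight ahead over a 0.1 s step.
print(dead_reckon(0.0, 0.0, 0.0, 2.0, 0.0, 0.1))
```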

A dosing pump in the hydraulic system is necessary for our co-pilot to have direct control of the hydraulic units:

The first difficulty with the CAN bus is that clear documentation is not always available. In theory, there are other options for receiving signals from the bus, but we have come across only a few of them in practice. What we normally get is an unidentified system API and, at best, a cable port. In simple cases, we contact the manufacturer of that interface and ask for a description of their protocol. If we’re lucky, two weeks of negotiations are enough for them to figure out what we want and provide it. Not all manufacturers are forthcoming, but most major players are happy to cooperate because our name rings a bell: last year, we supplied our solutions to several farm units and agricultural complexes and got some buzz in the media.

The manufacturers often ask whether they could integrate our technologies to create a combine of the future. This is another story completely, but this interaction is enough to obtain the protocol and proceed with our work.

In an ideal world, this would be the end of our troubles. Unfortunately, the documentation is sometimes unavailable, or the manufacturer is unable to provide a clear protocol description. In that case, we connect to the bus and reverse engineer it. There is the J1939 protocol, of course, and manufacturers would do well to observe it, but not all of them do. So we plug a debugging tool into the system, capture all the changing packets, and spend the day in the cab with the operator while they press the buttons. The combine starts moving, and a zero turns into a positive number. It speeds up, and the number grows. It slows down, and the number drops. Building the full set of indicators takes a day. Then we need to find the correlations and determine the conversion ratios.

Once we encountered a nasty bug that let the solution work in a test environment but fail in the field. In ‘peaceful’ mode, the combine was sending a normal set of packets, but as soon as we switched on the header and the reel, it activated ‘combat mode’ and started sending encoded ‘combat messages’ in the packets we thought we had already identified. We had to get very creative hunting for data properties. As it turned out, the header was writing its data into the same packets, in the same variables; the bus manufacturer had gone out of its way to optimize the process and avoid defining new variables. The developer who did this must be a 256-byte-intro contest enthusiast in their free time.
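In outline, that press-the-buttons-all-day step boils down to something like the sketch below: log every frame alongside a reference speed, then score which byte of which arbitration ID tracks it. This is a deliberately simplified single-byte version, not our actual tooling; real signals can span several bytes and need scaling.

```python
# Sketch: find which CAN bytes correlate with ground speed.
# frames: list of (arbitration_id, data_bytes); ref_speeds: one reference
# speed sample per frame, recorded during calibration runs.
from collections import defaultdict
import numpy as np

def best_speed_candidates(frames, ref_speeds, top_n=5):
    series = defaultdict(list)
    for (arb_id, data), speed in zip(frames, ref_speeds):
        for offset, value in enumerate(data):
            series[(arb_id, offset)].append((value, speed))
    scores = {}
    for key, pairs in series.items():
        values, speeds = zip(*pairs)
        if len(set(values)) < 2 or len(set(speeds)) < 2:
            continue  # a constant byte cannot encode speed
        scores[key] = abs(np.corrcoef(values, speeds)[0, 1])
    # Highest absolute correlation first: (arbitration_id, byte_offset) pairs.
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]
```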

Hydraulic systems have had a few surprises in store for us as well. Even with Danfoss as a partner — they offered us every possible assistance and provided detailed schematics — we could only test everything by starting the combine. Some of the parts were held up at customs, so in one case the system reached the client only at the beginning of the harvesting season. As a result, we had a single night to install, launch, and test everything. We arrived at 8 p.m. and started working our magic on the hydraulics. By 4 a.m., we had completed the first installation, not knowing whether we’d be able to launch it. If it didn’t work, we’d have to disassemble everything and restore the original configuration. Luckily for us, it worked. The harvester drove just as it was supposed to. It was the coolest debut we’d ever had. My idea of working with neural networks had been rather different, but I was excited nonetheless.

The package can be extended with a cellular modem for telemetry transmission. Note that there is no need for GPS, which is a huge advantage. For a GPS steering system to work, you need to prepare a map in advance, install an RTK base station for corrections or purchase a correction-signal package, and so on. It also requires pressing a lot of buttons in a lot of menus, and combine operators have very little appreciation for user interfaces. What we offer is a camera and a box. As soon as the two are mounted, we’re good to go. There is no need to map the field or divide it into passes. All we need to do is approach the field. The robot says, “Hurray, we’re in the field!” and starts driving.

Why a camera is enough for autopilot

In 2014, we won a grant to create a prototype of an AI-enhanced comprehensive agricultural enterprise management system. We immersed ourselves in the business specifics of agricultural entrepreneurs and identified the most promising opportunities for automation. Farming in Eastern Europe is extremely high-risk. There is only one harvest a year. Cultivating crops is a continuous process: you need to buy expensive crop seeds (the most expensive resource), spray costly chemicals (the second largest item of expenditure), and complete the full cycle of soil and crop preparation work. You start in March and don’t rest until the fall. In the fall, you only have two weeks to harvest the crops. If something goes wrong, every day you miss can cost 10% of the yield.

If a combine breaks down or a driver gets drunk and crashes the machine or does a poor job of harvesting, precious time is lost — hours or even days.

We started with obstacle detection. Lidar was the most obvious solution, but it’s expensive, so we opted for a camera instead. Since we had a single-camera setup, image recognition was imperative, because seeing an object isn’t enough: an autopilot needs to understand what the obstacle is, how big it is, and what behavior to expect. We either stand still or move forward, but we need to know the relative distance between us and the object and keep in mind the width of the header (24.5–29 feet or even 39–42.6 feet on some popular models). When turning a vehicle of such dimensions, you risk ‘harvesting’ an unfortunate tractor or filler operator taking a leak in the field.

But neural networks handle this just fine. With our single-camera setup, we did manage to trick the algorithm a few times with a 1:40 scale model of a combine (even though a single camera can recover geometry on the move), but there are hardly any such scale models out in the fields.
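The parenthetical above is easy to illustrate. With a pinhole camera model, the apparent size of an object of a known class gives a range estimate, and that size cue is exactly what a 1:40 scale model plays on. The focal length and object height below are placeholder numbers, not values from our calibration.

```python
def distance_from_height(focal_px: float, real_height_m: float, pixel_height: float) -> float:
    """Pinhole-camera range estimate: distance = f * H / h.

    If the detector assumes a full-size combine (about 4 m tall) but the object
    is a 1:40 scale model, this estimate comes out roughly 40 times too large,
    which is why geometry is cross-checked over several frames as the camera moves.
    """
    return focal_px * real_height_m / pixel_height

# Placeholder numbers: 1500 px focal length, 4 m tall combine, 120 px image height.
print(distance_from_height(1500.0, 4.0, 120.0))  # ~50 m
```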

In a cornfield, for instance, plants reach 10 feet in height, so it’s crucial to watch for utility poles. Once we had separated crops from obstacles, we could move on to classifying the crops.

This is level-two autopilot:

Crops were a major challenge. During our first iterations, we often had to deal with new crops, or with crops that looked different in the field from what we expected. For instance, if the necessary chemicals hadn’t been applied on time, the crop was shorter than usual and full of weeds, and the neural network couldn’t recognize it. We once came to a farm for testing and saw a field of barley that looked completely different from the picture in our agronomy handbook because of dramatically different climatic conditions. As a result, our segmentation failed to distinguish harvested areas from unharvested ones. We didn’t have enough data in the training set because we were building it from our own images. So we spent a day in the driver’s cab capturing a new set of images. At night, the team gathered in the hotel to label the new data and retrain the network. The following day, the solution was well-equipped for work in the new conditions.
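In code, that overnight retraining is roughly the loop below. This is a hedged sketch assuming a PyTorch segmentation model and a small freshly labelled dataset; the model choice, class list, and hyperparameters are placeholders rather than our actual pipeline.

```python
# Sketch: fine-tuning a segmentation model on a freshly labelled field dataset.
import torch
from torch.utils.data import DataLoader
import torchvision

NUM_CLASSES = 3  # e.g. unharvested crop, harvested stubble, obstacle (illustrative)

model = torchvision.models.segmentation.deeplabv3_resnet50(num_classes=NUM_CLASSES)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def finetune(model, dataset, epochs=10, batch_size=4):
    """dataset yields (image, mask): image (3, H, W) float, mask (H, W) long."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    model.train()
    for _ in range(epochs):
        for images, masks in loader:
            optimizer.zero_grad()
            logits = model(images)["out"]  # (B, NUM_CLASSES, H, W)
            loss = criterion(logits, masks)
            loss.backward()
            optimizer.step()
    return model
```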

Here is how target datasets can be added:

According to our sales team, who interact with agricultural enterprises, the people who work there understand full well that improving the harvest is the main point of the automation. This is the period when a day feeds a year, so eliminating human-factor errors during it is their absolute priority. That gives us an opportunity to demonstrate our solution. Before the harvesting season, our service team or that of our dealers comes to an enterprise, installs the equipment, and completes the necessary checks and calibration. Alternatively, we can calibrate the solution shortly before the harvest, since it only takes a day and a half.

If you’re interested, I’ll be happy to share the particulars of recognition technology for various objects in the field or give some insights about our unique training datasets. Importantly, we are the first team in the world to explore this particular area, so there aren’t any best practices to speak of beyond what we’ve developed.

P. S. If your agronomist doesn’t read Medium but might be interested in our product, you can find our contacts at cognitivepilot.com. There you can also find out what equipment package is required for a specific combine and how much it costs, and arrange a demonstration and testing.
