Data exploration of real John Deere planters and harvesters - Part I

Aug 26, 2020

Today a farm carries tons of sensors, but unfortunately only a small fraction of their data is stored, processed and combined into more elaborate analysis chains. Such chains could help decision makers weigh the many factors that affect the very dynamic and locally intricate characteristics of farming.

In this series we are documenting our first steps into understanding one of the oldest arts in farming: planting and harvesting. What kind of data comes from current machine sensors? Does it need any preparation before being used?

We are writing these articles as we develop the insights, so come aboard!

Seed data

We got some machinery data from John Deere's API, but you can access almost the same kind of data from https://github.com/JohnDeere/SampleData

Seeding data comes in shapefile format:

QGIS visualization of the downloaded shapefile

In this example, the tractor ran back and forth, left to right and then right to left. The group of dots selected in the image above is a single sample from 16 seeders working in parallel, each one creating a plant line.

Each point (a single seeder) has the following attributes:

Here we can see the number of seeds released in the field is very homogeneous:

When simply plotting the AppliedRate column of the seeding, we get this:

We have to investigate what causes the lower density of seeds when the machine apparently slows down.
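One way to start that investigation is to estimate the machine speed from consecutive points and compare the applied rate of slow and fast samples. The sketch below is only an illustration: the tuple layout, the 1 m/s threshold and the synthetic track are our assumptions, and it expects coordinates already projected to metres.

```python
from math import hypot
from statistics import mean

def speeds(samples):
    """Pair the speed (m/s) between consecutive samples with the applied rate.

    Each sample is (t_seconds, x_m, y_m, applied_rate). This layout is an
    assumption for illustration, not the shapefile's actual schema.
    """
    out = []
    for (t0, x0, y0, _), (t1, x1, y1, rate) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        out.append((hypot(x1 - x0, y1 - y0) / dt, rate))
    return out

def mean_rate_by_speed(samples, slow_below):
    """Mean applied rate of slow and fast samples, split at slow_below (m/s)."""
    pairs = speeds(samples)
    slow = [rate for speed, rate in pairs if speed < slow_below]
    fast = [rate for speed, rate in pairs if speed >= slow_below]
    return (mean(slow) if slow else None, mean(fast) if fast else None)

# Synthetic track (not real data): 2 m/s at first, then 0.5 m/s near the border.
track = [
    (0, 0.0, 0.0, 80.0),
    (1, 2.0, 0.0, 80.0),
    (2, 4.0, 0.0, 81.0),
    (3, 4.5, 0.0, 60.0),
    (4, 5.0, 0.0, 58.0),
]
slow_rate, fast_rate = mean_rate_by_speed(track, slow_below=1.0)
```

If the slow group shows a clearly lower mean on the real data as well, the drop is tied to speed and not to one specific place in the field.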

When plotting a histogram of every seeder individually, we can see that no line is seeding more or less than the overall average (no seeder is malfunctioning):

Overlapped image of histograms of each seeder
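The same check can be done numerically instead of by eye. This is a minimal sketch, assuming each point is reduced to a (seeder_id, applied_rate) pair; the 10% tolerance and the sample values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def flag_outlier_seeders(points, tolerance=0.10):
    """Return {seeder_id: mean_rate} for seeders whose mean applied rate
    differs from the overall mean by more than `tolerance` (relative).

    `points` is an iterable of (seeder_id, applied_rate) pairs; both the
    layout and the default tolerance are assumptions for this sketch.
    """
    points = list(points)
    by_seeder = defaultdict(list)
    for seeder_id, rate in points:
        by_seeder[seeder_id].append(rate)
    overall = mean(rate for _, rate in points)
    flagged = {}
    for seeder_id, rates in by_seeder.items():
        seeder_mean = mean(rates)
        if abs(seeder_mean - overall) > tolerance * overall:
            flagged[seeder_id] = seeder_mean
    return flagged

# Synthetic values (not real data): seeder 4 releases fewer seeds.
sample = [
    (1, 80.0), (1, 82.0),
    (2, 81.0), (2, 79.0),
    (3, 80.0), (3, 80.0),
    (4, 60.0), (4, 62.0),
]
flagged = flag_outlier_seeders(sample)
```

An empty result means what the overlapped histograms show: every seeder stays close to the overall average.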

When plotting the same graph for elevation data, things are different:

This may be due to the Seeder ID changing depending on where the machine is working (seeding this field took 2 days to complete).

Things got strange when plotting elevation data here:

The elevation data in each line does not match that of its neighboring lines. It seems there is a lag between data capture and positioning, which causes a zig-zag pattern. Applying a lag correction here will probably fix this by creating a more homogeneous (and realistic) terrain profile.
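To see why a lag produces a zig-zag and how a shift undoes it, here is a small simulation. Everything in it is an assumption for illustration: the terrain function, the lag of 2 samples and the direction of the lag (the elevation is stored later than it was measured). On real data the lag would have to be estimated, probably in seconds and not in samples.

```python
def terrain(x):
    """Hypothetical true elevation (m) at position x (m): a gentle slope."""
    return 100.0 + 0.5 * x

def record_pass(xs, lag):
    """Simulate one pass: the elevation stored with sample i was actually
    measured `lag` samples earlier along the same pass."""
    true = [terrain(x) for x in xs]
    return [true[max(i - lag, 0)] for i in range(len(xs))]

def correct_lag(elevations, lag):
    """Reassign each elevation to the sample recorded `lag` steps earlier.
    The last `lag` samples have no matching measurement and become None."""
    n = len(elevations)
    return [elevations[i + lag] if i + lag < n else None for i in range(n)]

def elevation_at(x, xs, elevations):
    """Elevation stored for position x in a pass that visited positions xs."""
    return elevations[xs.index(x)]

LAG = 2  # assumed lag, in samples
left_to_right = list(range(20))
right_to_left = left_to_right[::-1]

raw_a = record_pass(left_to_right, LAG)
raw_b = record_pass(right_to_left, LAG)
fixed_a = correct_lag(raw_a, LAG)
fixed_b = correct_lag(raw_b, LAG)

# Compare the two neighboring passes at the same position, x = 10.
gap_before = (elevation_at(10, right_to_left, raw_b)
              - elevation_at(10, left_to_right, raw_a))
gap_after = (elevation_at(10, right_to_left, fixed_b)
             - elevation_at(10, left_to_right, fixed_a))
```

Because the two passes run in opposite directions, the lag pushes their readings apart in opposite senses, so neighboring lines disagree by twice the lag. After the shift both passes report the same elevation at the same position.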

Seeding Path

Let's have a little fun here. We took the timestamp of each measured point and made an animated drawing of those features, to see how the seeding path was performed and understand a little more about how the real work goes on board the tractors!
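The core of such an animation is just ordering the points by time and drawing a growing path. A minimal sketch of that step, assuming each point is an (iso_timestamp, x, y) tuple (a layout we made up for this example); the drawing itself is left to whatever plotting tool you prefer.

```python
from datetime import datetime

def animation_frames(points, points_per_frame):
    """Yield cumulative paths of (x, y), ordered by timestamp.

    `points` holds (iso_timestamp, x, y) tuples; each yielded frame
    contains `points_per_frame` more points than the previous one.
    """
    ordered = sorted(points, key=lambda p: datetime.fromisoformat(p[0]))
    path = [(x, y) for _, x, y in ordered]
    stop = len(path) + points_per_frame
    for end in range(points_per_frame, stop, points_per_frame):
        yield path[:end]

# Synthetic points (not real data), deliberately out of order.
points = [
    ("2020-05-01T10:00:02", 2.0, 0.0),
    ("2020-05-01T10:00:00", 0.0, 0.0),
    ("2020-05-01T10:00:04", 4.0, 0.0),
    ("2020-05-01T10:00:01", 1.0, 0.0),
    ("2020-05-01T10:00:03", 3.0, 0.0),
]
frames = list(animation_frames(points, points_per_frame=2))
```

Each frame can then be drawn in sequence to reveal the route the machine followed.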

Conclusions

For more accurate results, we have to perform interpolation based on the swath, distance and heading parameters, just like satellite data is processed for line sensors.
We have to investigate whether the lower seeding density at the borders is real or just noise in the sensor data. This will be done in upcoming work.
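As a first idea for the swath, distance and heading interpolation mentioned above, each point can be expanded into the rectangle of ground it actually covers. This is only a sketch under our own assumptions: coordinates in metres, heading in degrees clockwise from north, and the point sitting at the center of the rectangle. None of this has been checked against how the machine defines these attributes.

```python
from math import cos, radians, sin

def footprint(x, y, heading_deg, swath_m, distance_m):
    """Corners of the ground rectangle covered by one sample.

    Assumes (x, y) is the rectangle center in metres and heading_deg is
    measured clockwise from north. Corners are returned in the order
    rear-left, rear-right, front-right, front-left.
    """
    h = radians(heading_deg)
    along = (sin(h), cos(h))    # unit vector in the travel direction
    right = (cos(h), -sin(h))   # unit vector to the right of travel
    half_d, half_s = distance_m / 2.0, swath_m / 2.0

    def corner(a, r):
        return (x + a * half_d * along[0] + r * half_s * right[0],
                y + a * half_d * along[1] + r * half_s * right[1])

    return [corner(-1, -1), corner(-1, 1), corner(1, 1), corner(1, -1)]

# A sample heading north, covering 4 m of swath over 2 m of travel.
corners = footprint(10.0, 20.0, heading_deg=0.0, swath_m=4.0, distance_m=2.0)
```

With footprints like these, values could be rasterized or interpolated over the covered area instead of being treated as isolated points.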

More info

https://github.com/flaviostutz/agrometrics/tree/master/notebooks/john-deere-seed-harvest
Now let’s take a look at Harvesting data in Part II

Written by Flavio Stutz