Data exploration of real John Deere planters and harvesters — Part II

Flavio Stutz
3 min read · Aug 27, 2020

In the previous article we explored the seeding data. Now we are going to check the harvesting data.

Harvest Data

You can get sample John Deere machinery data for harvesting at https://github.com/JohnDeere/SampleData

Using QGIS we can get a first look at the shapefile:

QGIS visualization

In this case, the harvester also seems to move from left to right. It looks like the data is collected by a row of 12 parallel sensors (just like the seeding lines). All 12 sensors are sampled at once, at 1 sample/s.
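The "12 sensors at once, 1 sample/s" reading above can be checked by grouping rows by timestamp. The sketch below is only an illustration on synthetic rows: the field names `time` and `sensor` are hypothetical, not the actual column names in the John Deere shapefile.

```python
from collections import Counter
from datetime import datetime, timedelta

def sensors_per_timestamp(records):
    """Count how many rows (one per sensor) share each timestamp."""
    return Counter(r["time"] for r in records)

def sample_intervals(records):
    """Seconds between consecutive distinct timestamps."""
    times = sorted({r["time"] for r in records})
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

# Synthetic stand-in for the shapefile rows: 3 sensors instead of 12,
# three acquisitions one second apart. Field names are assumptions.
t0 = datetime(2020, 1, 1, 12, 0, 0)
records = [
    {"time": t0 + timedelta(seconds=s), "sensor": i}
    for s in range(3)
    for i in range(3)
]

counts = sensors_per_timestamp(records)
intervals = sample_intervals(records)
print(set(counts.values()), intervals)
```

On the real data, a single value of 12 in `counts` and intervals of 1.0 everywhere would support the observation; anything else would point to dropped samples or pauses.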

The data for each sensor point is as follows:

Sample yield data

The distribution of yield (productivity) data in this sample is:

Yield histogram
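A histogram like the one above boils down to counting values in fixed-width bins. Here is a minimal sketch using only the standard library; the yield values and the bin width of 50 are made up for illustration and are not taken from the sample data.

```python
def histogram(values, bin_width):
    """Return {bin_start: count} for fixed-width bins, sorted by bin start."""
    counts = {}
    for v in values:
        # Floor division finds the lower edge of the bin that contains v.
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

# Synthetic yield values, for illustration only.
yields = [0.0, 12.5, 48.0, 51.0, 99.9, 100.0]
hist = histogram(yields, 50.0)

# Text rendering: one '#' per sample in each bin.
for start, n in hist.items():
    print(f"{start:6.0f} | {'#' * n}")
```

With a plotting library installed, the same data could be passed to its histogram function instead; the binning logic is the same idea.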

We can also look at the distribution of moisture levels (humidity):

Moisture histogram

Looking at a simple altitude plot:

Altitude map

We can see there is a slight lag between acquisition and position timing, just like in the seeding data, but it seems to be small.
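One simple way to compensate for a lag like this is to shift the measurements by a fixed number of samples before pairing them with positions. The sketch below assumes the lag is constant, is a whole number of samples, and that measurements arrive late relative to positions; none of this is confirmed by the sample data, and the function name is hypothetical.

```python
def shift_measurements(positions, values, lag):
    """Pair each position with the value recorded `lag` samples later.

    Assumes measurements trail positions by a constant number of samples.
    Positions at the end with no matching value are dropped.
    """
    if lag < 0:
        raise ValueError("lag must be zero or positive")
    # zip stops at the shorter sequence, which trims the unmatched tail.
    return list(zip(positions, values[lag:]))

# Synthetic example: positions are sample indices, values trail by one sample.
positions = [0, 1, 2, 3]
values = [10, 11, 12, 13]
aligned = shift_measurements(positions, values, 1)
print(aligned)
```

At 1 sample/s, a lag of one sample corresponds to one second of travel, so even a small lag can offset points by the distance the machine covers in that time.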

In the yield map, the data seems to be very homogeneous:

Yield map

Here we used the "gnuplot" cmap, which has a wide range of colors, and capped the scale at 300 to make slight changes easier to see. We can see there are a lot of "near zero" regions.
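The capped color scale described above can be sketched as follows. The helper names, the `threshold` used to call a value "near zero", and the sample values are assumptions for illustration. The plotting function needs matplotlib, which is imported only when the function is called.

```python
def clip_to_scale(values, vmax=300.0, vmin=0.0):
    """Map values into [0, 1], saturating anything outside [vmin, vmax]."""
    span = vmax - vmin
    return [min(max((v - vmin) / span, 0.0), 1.0) for v in values]

def near_zero_fraction(values, threshold=1.0):
    """Share of samples whose magnitude is at or below `threshold` (an assumed cutoff)."""
    return sum(1 for v in values if abs(v) <= threshold) / len(values)

def plot_yield_map(xs, ys, values, vmax=300.0):
    """Scatter map colored with the "gnuplot" cmap, color scale capped at vmax."""
    import matplotlib.pyplot as plt  # optional dependency, loaded on demand

    fig, ax = plt.subplots()
    points = ax.scatter(xs, ys, c=values, cmap="gnuplot", vmin=0.0, vmax=vmax, s=4)
    fig.colorbar(points, ax=ax, label="yield")
    ax.set_aspect("equal")
    return fig

# Synthetic values, for illustration only.
print(clip_to_scale([0, 150, 300, 900]))
print(near_zero_fraction([0.0, 0.5, 200.0, 250.0]))
```

Capping the scale means every value above 300 gets the same color, so the available colors are spent on the range where most samples fall.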

We did almost the same with the moisture map:

Moisture map

In this case, we can see the center region has lower moisture levels, but the difference is very small (about 1–2%). We don't know why the upper and right "borders" show lower moisture levels. It doesn't look natural, because the pattern follows the machine path rather than the terrain. Maybe we have some noise here.

As for the different yield sensors that run in parallel, their data seems to be almost identical to one another:

Overlapped histogram of all parallel sensors
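Besides overlapping the histograms, the sensors can be compared numerically by summarizing each one and looking at how far apart the summaries are. This is a sketch on synthetic rows; the field names `sensor` and `yield` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

def per_sensor_summary(records):
    """Group yield values by sensor and summarize each group."""
    groups = defaultdict(list)
    for r in records:
        groups[r["sensor"]].append(r["yield"])
    return {
        sensor: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for sensor, vals in sorted(groups.items())
    }

def max_mean_spread(summary):
    """Largest difference between any two per-sensor means."""
    means = [s["mean"] for s in summary.values()]
    return max(means) - min(means)

# Synthetic rows for two sensors, for illustration only.
records = [
    {"sensor": 1, "yield": 100.0}, {"sensor": 1, "yield": 110.0}, {"sensor": 1, "yield": 120.0},
    {"sensor": 2, "yield": 102.0}, {"sensor": 2, "yield": 110.0}, {"sensor": 2, "yield": 121.0},
]
summary = per_sensor_summary(records)
spread = max_mean_spread(summary)
print(summary, spread)
```

A spread that is small compared to the typical yield value would back up the impression that the parallel sensors report nearly the same thing.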

Harvesting path

To continue the fun, let's take a look at how the harvest was done:

Harvesting path

Wow! It was very different from the seed planting pattern.

Conclusions

  • The moisture data seems very homogeneous, but on closer inspection the differences are high. Maybe we have to clean up outliers before doing a more serious analysis.
  • We have to study why there are lots of "black dots" in the yield data.
  • There is also an acquisition/position time lag problem to be solved.
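For the outlier cleanup mentioned in the conclusions, one common option is an interquartile range (IQR) filter. This is not a method used in this exploration, only a minimal sketch of one possible approach; the factor `k=1.5` is the usual convention and the moisture values are synthetic.

```python
from statistics import quantiles

def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Synthetic moisture readings (%), with one obvious outlier at 30.0.
moisture = [14.8, 15.0, 15.1, 15.2, 15.3, 15.4, 15.5, 30.0]
cleaned = iqr_filter(moisture)
print(cleaned)
```

The IQR rule does not assume any particular distribution, which makes it a reasonable first pass before deciding whether the odd readings are noise or real field variation.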

More info


Flavio Stutz

Systems Engineer, Developer and Architect. Avoiding bullshit jobs for 20 years! See more at https://github.com/flaviostutz