Data exploration of real John Deere planters and harvesters — Part II
In the previous article we explored the seed data. Now we are going to check the harvesting data.
Harvest Data
You can get sample John Deere machinery data for harvesting at https://github.com/JohnDeere/SampleData
Using QGIS we can see a bit of how the shapefile looks:
In this case too, the harvester seems to move from left to right. The data appears to be collected by a row of 12 parallel sensors (just like the seeding lines). All 12 sensors are sampled at once, at 1 sample/s.
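One way to check the "12 sensors at once, 1 sample/s" reading of the data is to group the points by acquisition time and count them. The snippet below is only a sketch on made-up records: the `(timestamp, sensor, value)` layout is our assumption for illustration, not the actual schema of the sample shapefile.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical records: (timestamp, sensor index, yield value).
# The field layout is an assumption, not the sample shapefile's real schema.
start = datetime(2020, 1, 1, 12, 0, 0)
records = [
    (start + timedelta(seconds=t), sensor, 100.0 + sensor + t)
    for t in range(3)          # three acquisitions, one per second
    for sensor in range(12)    # twelve parallel sensors
]

# Group the points by acquisition time.
by_time = defaultdict(list)
for timestamp, sensor, value in records:
    by_time[timestamp].append((sensor, value))

# Distinct point counts per acquisition, and distinct gaps (in seconds)
# between consecutive acquisitions.
points_per_sample = {len(points) for points in by_time.values()}
times = sorted(by_time)
intervals = {(b - a).total_seconds() for a, b in zip(times, times[1:])}

print(points_per_sample)  # {12}
print(intervals)          # {1.0}
```

On real data, any extra value in either set would point to dropped sensors or irregular sampling.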
The data for each sensor point is as follows:
The distribution of yield (productivity) data in this sample is:
We can see the distribution of moisture (humidity) levels too:
Looking at a simple altitude plot:
We can see there is a slight lag in acquisition/position timing, just like in seeding data, but it seems to be small.
In the yield map, the data seems to be very homogeneous:
Here we used the "gnuplot" cmap, which spans many colors, and capped the scale at 300 to make slight changes more visible. We can see there are a lot of "near zero" regions.
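For reference, a map like this can be sketched with matplotlib's `scatter`, passing `cmap="gnuplot"` and `vmax=300`. The points below are random stand-ins, not the real sample, and the output file name is just a placeholder.

```python
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

# Synthetic stand-in for the harvest points (not the real sample data).
random.seed(0)
xs = [random.uniform(0, 100) for _ in range(500)]
ys = [random.uniform(0, 100) for _ in range(500)]
yields = [random.uniform(0, 400) for _ in range(500)]

fig, ax = plt.subplots(figsize=(6, 5))
# vmax=300 caps the color scale: every value at or above 300 gets the top color.
points = ax.scatter(xs, ys, c=yields, cmap="gnuplot", vmin=0, vmax=300, s=4)
fig.colorbar(points, ax=ax, label="yield")
ax.set_aspect("equal")
fig.savefig("yield_map.png", dpi=150)
```

With this colormap the lowest values are drawn in black, so near-zero readings stand out as dark points on the map.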
We did almost the same with the moisture map:
In this case, we can see the center region has lower moisture levels, but the difference is very small (about 1–2%). We don't know why the upper and right "borders" show lower moisture levels: the pattern follows the machine path rather than any natural terrain feature, so it doesn't look natural. Maybe we have some noise here.
As for the different yield sensors that run in parallel, their data seems to be almost identical to one another:
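A simple way to put a number on "almost identical" would be the Pearson correlation between each pair of sensors. This is a sketch on made-up readings; the sensor names and values are hypothetical and only illustrate the comparison.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Made-up readings for three sensors over the same six acquisitions.
sensors = {
    "sensor_1": [120.0, 135.0, 150.0, 90.0, 110.0, 140.0],
    "sensor_2": [121.0, 134.0, 151.0, 91.0, 109.0, 141.0],
    "sensor_3": [119.0, 136.0, 149.0, 89.0, 111.0, 139.0],
}

# Compare every pair of sensors once.
names = sorted(sensors)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r = pearson(sensors[first], sensors[second])
        print(f"{first} vs {second}: r = {r:.3f}")
```

Values of r close to 1 for every pair would back up the visual impression that the parallel sensors report nearly the same thing.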
Harvesting path
To continue the fun, let's take a look at how the harvest was done:
Wow! It was very different from the seed planting pattern…
Conclusions
- The moisture data seems very homogeneous, but when looking closely, the differences are high. Maybe we have to clean up outliers before doing more serious analysis.
- We have to study why there are lots of "black dots" in the yield data.
- There is a problem with acquisition/position time lag to be solved too.
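As a possible starting point for the outlier cleanup mentioned above, here is a minimal sketch of an interquartile-range (IQR) filter. It is one common choice, not something we applied in this article, and the moisture readings are made up.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (low, high) fences: k * IQR beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]

# Made-up moisture readings (%), with two implausible values mixed in.
moisture = [14.1, 14.3, 13.9, 14.0, 14.2, 14.4, 13.8, 14.1, 2.0, 35.0]
cleaned = drop_outliers(moisture)
print(len(moisture), len(cleaned))  # 10 8
```

The factor `k=1.5` is the usual rule of thumb; a larger value keeps more of the tails, which may be preferable if the extreme readings turn out to be real.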