Machine Data — The Automated Operations (Missing?) Link

Granular Engineering
7 min read · Aug 22, 2018

By Tom Burnett

We rolled out initial machine data processing and operations integration back in May 2015, but with the Develop with Deere conference in November, it seems like a good time to say how incredibly awesome machine data is from both a farming operations and a big data standpoint. As I’ve dug into the details, I’ve found it fascinating: a history of high-tech big data in a very old industry, complete with its own open source vs. proprietary format wars, and even a connection to the future of self-driving cars.

What is machine data?

In simple terms, machine data is a collection of location-based data points that, when taken together, fully represent operations in the field.

Each point carries any number of attributes, including moisture readings, vehicle, velocity, planting depth, and application or seeding rate.

Millions of these points make up one field operation.
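As a rough illustration (the field names and units here are hypothetical, not any vendor’s actual format), a single point might look something like this:

    # A hypothetical machine data point; field names and units are illustrative only.
    point = {
        "timestamp": "2015-05-12T14:03:27Z",
        "latitude": 41.5868,
        "longitude": -93.6250,
        "vehicle_id": "planter-07",
        "velocity_mph": 4.2,
        "seeding_rate": 34000,       # seeds per acre
        "planting_depth_in": 1.75,
        "moisture_pct": 14.5,
    }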

This tying together of GPS data and agriculture began in the 1980s and is known as Precision Agriculture. Any data scientist should now be jumping out of their chair… “30 years of data?!” Yes, machine data unlocks rich analysis if the low-level information can be put in context, including past weather, irrigation details, and yield information. The future of farming is truly data-driven, and if you ask a farmer, it has been for a long time.

Manual Data Entry

I’ve worked on case management systems, content management systems, ERP systems, and lots of things in between, and one of the recurring problems is data entry in business systems. There are whole subsystems dedicated to forms, validation, and request/response infrastructure for exceptions and validation failures, and, of course, we have them. Then there are the business processes those subsystems support, like user interface training, customer support, and workflow documentation, and, of course, we have those too. What if that wasn’t the future, though? What if machine-to-machine was the future? Just as RFID adoption makes physical inventory tracking more manageable, machine data makes farming operations more manageable. On the input side, machine data takes the form of prescription files, which tell the machine what to apply to the field, at what rate, and at what depth. This closes the loop on manual data entry for field operations, with or without self-driving tractors.
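As a rough counterpart to the machine data point sketched earlier (again with purely illustrative field names and units, not any real prescription format), a single prescription zone might look something like this:

    # A hypothetical prescription entry; field names, units, and geometry are illustrative only.
    prescription_zone = {
        "field_id": "north-40",
        "zone": [[41.5868, -93.6250], [41.5870, -93.6240], [41.5861, -93.6238]],  # boundary vertices
        "product": "corn-hybrid-x",
        "target_rate": 34000,       # seeds per acre
        "target_depth_in": 2.0,
    }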

The Future, Today

Machine data is great and all, but what do I do with it? At Granular, we’re using PostGIS and PostgreSQL Pointcloud (Here is a really good read on the subject). These let us quickly extract geospatial operational values such as acreage covered, product applied, and fuel consumed across the whole dataset. This is highly efficient, with no over- or under-counting and little possibility of data entry error. Once this operational data pass is complete, we can hand the pointcloud data over to our data science team for further analysis, including geospatial longitudinal studies, and keep improving our data extraction and analysis techniques.
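To make that concrete, here is a minimal sketch of one such extraction: summing an applied-rate dimension across the pointcloud patches for a single operation. The operation_patches table, its pa patch column, the AppliedRate dimension, and the connection string are all hypothetical placeholders; PC_Explode and PC_Get come from the Pointcloud extension.

    from sqlalchemy import create_engine, text

    # Hypothetical schema: each row of operation_patches stores a pgpointcloud
    # patch (column "pa") whose points carry X, Y, and AppliedRate dimensions.
    engine = create_engine("postgresql://user:password@localhost/farm")

    sum_applied = text("""
        SELECT sum(PC_Get(pt, 'AppliedRate')) AS total_applied
        FROM (
            SELECT PC_Explode(pa) AS pt   -- unpack each patch into individual points
            FROM operation_patches
            WHERE operation_id = :op_id
        ) AS points
    """)

    with engine.connect() as conn:
        total_applied = conn.execute(sum_applied, {"op_id": 42}).scalar()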

I love machine data! And so do farmers. And so does Granular.


How Granular Uses SQLAlchemy for Composable, Maintainable Business Intelligence Queries

By Warren Henning

TL;DR summary: SQLAlchemy is extremely good.

Here at Granular, we use SQLAlchemy for all our object-relational mapping needs. All our APIs are powered by it.

A common pattern for new features in our web UI (as opposed to our mobile clients) is to have some sort of main table view that displays customer information in tabular format, plus a summary or aggregate view of that same data, along with some data entry mechanism, which varies depending on the feature. We generally implement these features in our backend using plain old SQLAlchemy queries.

SQLAlchemy lets us implement the aggregate query in a nice way by letting us reuse all the filter and join statements from the main table query. That helps ensure correctness by making sure both queries have the same structure, and it lets us avoid duplicating query code. SQLAlchemy queries are composable Python objects, so we can return a query from a function, or swap out the columns we’re selecting for aggregate expressions (such as sums, counts, weighted averages, etc.) without repeating the entire query. The query is lazily executed, so we can just tack on additional query fragments at will, executing only when the data is actually needed.

Let’s consider a highly simplified hypothetical example of this implementation pattern, without getting into all the details and business logic that go into a typical Granular product feature. (The actual features and data model we work with are much richer.) Suppose we want a page in our application that lists a customer’s crop storage locations and all the storage bins at each location, along with the capacity of each bin.

For the main table, we want to list out each storage container and the location it’s at, along with each storage bin’s capacity, which is just a numerical field. As each bin has a crop type associated with it, we can simply group all storage bins by crop type and add up the capacity. The main table, when fully developed as a realistic product feature, could have many filters and other query clauses applied (which we’re ignoring here for simplicity), as well as simply many more columns to display and manipulate, so how could we implement this as two queries without repeating ourselves?

Well, let’s first write the main table query:


# this is generated from request parameters
request_filters = [...]

columns = [
    Location.name,
    StorageContainer.name,
    StorageContainer.capacity
]

def location_bin_query(columns):
    return session.query(
        *columns
    ).select_from(
        StorageContainer
    ).join(
        Location
    ).filter(
        *request_filters
    )

main_table_content = location_bin_query(columns).all()

StorageContainer and Location are our SQLAlchemy models for storage bins and storage locations, respectively. We’re selecting only the specific columns we need for simplicity, wrapping the query in a function that lets us pass in whatever columns we want. session is a special SQLAlchemy object that can execute queries against a live database connection; that’s set up elsewhere in the application. The query is not actually executed until we call the all() function, allowing us to add readability to long queries by breaking things up over multiple statements if we wish.

Notice how I alluded to dynamically generating a filter list based on request parameters; we can then pass this filter list around, return it from a method, or manipulate it like any other regular Python list, as every part of our query is a concrete, reified Python object, as opposed to a string. This means we can get things like code completion for database columns and ORM relationships even if our editor/IDE is not introspecting the actual database schema. Any Python refactoring tools you have will work on your queries, because SQLAlchemy queries are just regular Python code. Just having all your database interaction be symbolic rather than through strings is a huge win.
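For instance, a filter list driven by request parameters might be assembled along these lines (the request object and parameter names here are hypothetical, just to show the shape of the idea):

    request_filters = []
    if request.args.get("location_name"):
        request_filters.append(Location.name == request.args["location_name"])
    if request.args.get("min_capacity"):
        request_filters.append(
            StorageContainer.capacity >= float(request.args["min_capacity"])
        )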

Assuming we have that query working correctly, how could we query the data needed to populate the aggregate table that follows? One way would be to just swap our column list for columns that group by the storage container’s crop type, like so:

aggregate_columns = [
    StorageContainer.crop_type_id,
    func.sum(StorageContainer.capacity)
]

aggregate_query = location_bin_query(aggregate_columns).group_by(
    StorageContainer.crop_type_id
)

Our implementation of this query is quite concise, even though it could contain within it considerable complexity, depending on what location_bin_query returns. When executed, the generated SQL should resemble something like:

SELECT
    storage_container.crop_type_id,
    sum(storage_container.capacity)
FROM storage_container
JOIN location ON ...
WHERE ...
GROUP BY
    storage_container.crop_type_id

And all the joins and filters will be the same as those we specified in our main table query, reusing them without having to recapitulate them in our aggregate query. crop_type_id is a foreign key to our list of crop types we support in our application. Before this query’s results are rendered in HTML, some other code will have to look up crop types by this ID field in order to list the crop type’s name properly, and no doubt there will be other minor details to consider, but that’s all easy enough to accomplish; e.g., we could join to the crop type table and select the crop type name, adding that to the list of grouped expressions, as well.
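That variant stays just as compact. Here’s a sketch, where CropType is a hypothetical model with id and name columns, assumed purely for illustration:

    named_aggregate_columns = [
        CropType.name,
        func.sum(StorageContainer.capacity)
    ]

    named_aggregate_query = location_bin_query(named_aggregate_columns).join(
        CropType, CropType.id == StorageContainer.crop_type_id
    ).group_by(
        CropType.name
    )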

The compositional querying technique demonstrated above is used frequently in our backend, and it’s an important part of how we stay DRY while moving fast and releasing frequently.

If you use relational databases and Python, you owe it to yourself to give SQLAlchemy a try.
