Farm Data as Value Added

I said in my last post that putting good software and data into the hands of farmers can profit a local economy, and I’d like to back up that claim a little. I also hinted that the direction in which such data flows is especially important in an increasingly globalized industry, where in a few years we could reasonably expect to see most of the world’s farming data owned by one or two multinational corporations. In order for farming communities to wrest some control over their own data, I believe it needs to originate with food producers, then travel outwardly along the supply chain to the consumers, who consume that data just as they would the food.

Those are some fairly abstract terms, and I won’t pretend they’re at all unbiased either. But I believe there’s some hard, practical rationale for why farmers can profit from controlling their own data, and I can put it in terms that any farmer with enough business-savvy can understand: data adds value to the product, which means more value for the consumer, which means a better price for the farmer. Let me explain.

The Value of Data

There are a lot of ways farmers already know to add value to the food they grow: washing and bunching vegetables, making a nice display and signage at market, investing in a refrigeration unit for one’s truck, canning sauce from unsold or surplus tomatoes, etc. All these measures earn the farmer a higher price at the point of sale, either by having a fresher, more marketable product when it arrives, or by creating a new market for a product that otherwise might not have sold. Most of this value also ships with the product, meaning the value will be propagated down the supply chain until it reaches its final destination, even if it passes through a few more hands before it does so.

Washing and bunching radishes for market

Reliable farm data can do all these things and more. There are some metrics which the end consumer will obviously value, and farmers can leverage those for a higher price. If a farmer can provide accessible, verifiable data regarding when the product was harvested, how it was grown, what environmental impacts could be measured during its production and the specific seed variety it was grown from, then she will find a large market of customers willing to pay a premium for the food associated with that data. That kind of data can be especially valuable to distributors and other partners further down the supply chain, where normally such information becomes more obscure the further it travels. There’s also the mere ability to verify a product was grown by someone within a small radius of the consumer, assuring them that their dollars are staying in the local economy.

These are all sources of value that traditional methods can capture as well, if not quite with the same granularity and level of persistence; however, I think the real value will be in the predictive power that data can provide, to an extent that other value-added practices can’t really replicate. This comes in especially handy with larger buyers, who have their own downstream markets to be concerned about, and who would truly value knowing a yield within a few percentage points and a harvest window within a matter of a day or two, all from perhaps two weeks out or more. If a farmer has a reasonable expectation that demand will outstrip her yield, and can predict that reliably enough ahead of time, she can offer a guarantee of availability, at a premium, to any interested buyers who order in advance. If instead she anticipates overproduction, she can offer volume discounts to potential early buyers, so that once the harvest date rolls around that abundance will be at a more manageable volume.

Through APIs (Application Programming Interfaces), this data could be propagated to buyers automatically, and prices adjusted the same way. For instance, in the case of overproduction, the sale window could be set to close once a certain number of bushels had been sold. Or if a certain quota isn’t met for premium orders, the price could be lowered after a given time. All this will help guarantee the best price for the highest volume of sales. The availability of this data via APIs also means that, once again, the value can be easily passed down the supply chain. A wholesaler or retailer with an ecommerce business can have the data relayed automatically to their website, and thus to their own customers. A chef can post a new seasonal item on his menu a week or more in advance and start promoting it via social media.
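To make those two rules a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Offer` class, its field names and the numbers are only for illustration, not part of any existing farm software. It shows a sale window that closes once a cap of bushels is sold, and a price that drops if a quota of orders isn't met by a deadline.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Offer:
    """A hypothetical advance-sale offer for one crop (illustrative only)."""
    price_per_bushel: float
    bushels_cap: int                 # close the sale window at this volume
    bushels_sold: int = 0
    quota: int = 0                   # orders hoped for by the deadline
    deadline: Optional[date] = None
    markdown: float = 0.0            # fraction cut from the price if quota is missed

    def is_open(self) -> bool:
        # The window stays open until the cap has been sold.
        return self.bushels_sold < self.bushels_cap

    def current_price(self, today: date) -> float:
        # Lower the price only after the deadline, and only if the quota was missed.
        missed_quota = (
            self.deadline is not None
            and today > self.deadline
            and self.bushels_sold < self.quota
        )
        if missed_quota:
            return round(self.price_per_bushel * (1 - self.markdown), 2)
        return self.price_per_bushel

    def order(self, bushels: int) -> int:
        """Fill as much of an order as the cap allows; return bushels filled."""
        remaining = self.bushels_cap - self.bushels_sold
        filled = max(0, min(bushels, remaining))
        self.bushels_sold += filled
        return filled
```

A buyer's software could call something like `order()` and `current_price()` through an API without a person on either end having to update a price sheet by hand.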

The Protocol vs the Data, Public vs Private

This raises a potentially contentious point regarding what data is made public and what is kept private. This inevitably comes back to who owns the data in the first place, but plain data, technically, cannot be copyrighted or patented, so once someone has access to it, there’s nothing to prevent them from sharing it however they see fit. It’s more a matter of who owns the computer the data is stored on, and what terms of service they’ve negotiated with the user. In that sense, farm data also cannot be made open source in strictly the same way that software or other creative works can.¹

If farm data is made publicly available from one source, and a third party then copies that data, nothing obliges that third party to make the data or its derivatives freely available via their own platform. This means if a farmer publishes the data related to their wholesale prices and volumes via a free API, a distributor who connects to that API can still restrict access to that data via their own website, even excluding the very same farmer who grew the product. So if that farmer wanted to see what kind of markups that distributor was applying to the original wholesale price, she might have to pay a subscription fee, or perhaps wouldn’t be able to view it at all. At the same time, this doesn’t exactly create any guarantees of transparency for the consumer, who might also be restricted. Perhaps the distributor just wouldn’t be incentivized to pass that data along to the end consumer at all. We can’t just assume that opening up the data will automatically create better markets for the farmer and more transparency for the consumer.

So I’d like to clarify the difference between the need for open standards and public APIs, which I believe are absolutely essential for a transparent food shed, versus the caution which should be exercised when pushing for all farm-related data to be made public. This will take a bit of understanding of how APIs are used to store and transmit data, so we can distinguish them from the data itself. For our purposes, I’ll emphasize that APIs are mainly just the processes by which data is transmitted and stored, not the data itself. It’s the way that one computer application communicates with another.

To use an example of a human interface, rather than an Application Programming Interface, consider the process by which you log into your email account and view your inbox. You go to a URL, perhaps https://mail.google.com, then when the login page loads, you click on the field called "Email Address", type your address in, then click on the field called "Password" and type it, and finally press enter or click a button called "Submit". After that you can view your inbox, and access different mail items by a similar series of clicks. The process is the same whether it's your inbox or your friend's or your boss's, even though you lack the credentials to access your boss's Gmail account (presumably). What's most important is the process: the succession of clicks, the names of the fields you enter your credentials into, and the order in which all that happens. You probably don't think about all those steps, but they're all critical to a successful login attempt; if you entered your password into the wrong field, or clicked on the wrong button, you wouldn't get to see your inbox.

APIs have similar protocols for accessing data and authenticating users, but instead of using a series of mouse clicks and keyboard entries, they use a programming language. Like the human interface, the API is the same no matter who is logging in, and no matter the contents of their inbox. The only exceptions are the actual characters that make up your email and password, because they are themselves a type of data. They’re input data, whereas your inbox is output data. The interface needs to be flexible and generic enough to accept different inputs and respond with different outputs. If it just assumed the input data was the same every time, that the email and password were the same for everyone, then everyone would have the same inbox. This is why it’s important for any API to separate the data from the process. This also means that the process can be made public, while the data, including your password or an email from your significant other, can be kept private. Any programmer can access the Gmail API to write their own email app, but that doesn’t mean they can access your password or inbox. The same can be achieved with farm data, separating public APIs from the private data.
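Here is a toy sketch of that separation in Python, under some loud assumptions: the farm names, passwords and harvest logs are invented, the data lives in plain dictionaries rather than a real database, and a real service would use a salted, deliberately slow password hash instead of bare SHA-256. The point is only that the function (the process) could be published for anyone to read, while what it returns depends entirely on the private data behind it.

```python
import hashlib
import hmac

# Private data (hypothetical): stored password hashes and per-farm records.
# A real service would use a salted, slow password hash, not bare SHA-256.
_PASSWORD_HASHES = {
    "greenacres": hashlib.sha256(b"radish-42").hexdigest(),
    "hilltop": hashlib.sha256(b"snap-peas").hexdigest(),
}
_HARVEST_LOGS = {
    "greenacres": [{"crop": "radish", "bushels": 12}],
    "hilltop": [{"crop": "snap pea", "bushels": 7}],
}


def get_harvest_log(account: str, password: str) -> list:
    """The public process: the same steps for every caller.

    Only the inputs (credentials) and the output (that account's log) differ.
    """
    expected = _PASSWORD_HASHES.get(account)
    supplied = hashlib.sha256(password.encode()).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    if expected is None or not hmac.compare_digest(expected, supplied):
        raise PermissionError("invalid credentials")
    return _HARVEST_LOGS[account]
```

Calling `get_harvest_log("greenacres", "radish-42")` returns only that farm's log, and the same call with the wrong password raises `PermissionError`, even though both callers used exactly the same interface.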

None of this means that there aren’t cases where it would be desirable to make certain farm data public, and perhaps expose that data over an API that is free for anyone to use, with or without credentials. But farmers, as both individuals and as businesses, have a reasonable expectation of privacy over some portion of their data, just as a Gmail user expects the contents of their inbox will be kept (reasonably) private. Once a farmer puts a product onto the market, there are more compelling reasons to make some of that data public. For instance, it benefits both the farmer and the consumer at that point to have some level of transparency about the growing practices, freshness of the product, price, etc. Public APIs and open data can provide such transparency. Still, I think there is a strong case for leaving trade secrets, personal info, and other types of pre-market data at the discretion of the farmer to publish, whether freely or for a price, so she can thereby leverage that data for the type of value-added services I mentioned above. There needs to be a delicate balance struck between making our local farmers more competitive in a globalized market and making that market more transparent for the consumer.
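One way to picture that balance is as a filter the farmer controls. The sketch below is only an illustration: the field names and the choice of which ones count as public are assumptions of mine, and in practice that list is exactly the thing a community would have to negotiate.

```python
# Hypothetical field names; which ones are public is a policy choice,
# not something this code can decide.
PUBLIC_FIELDS = {
    "crop",
    "variety",
    "harvest_date",
    "growing_practice",
    "price_per_bushel",
}


def public_view(record: dict, on_market: bool) -> dict:
    """Return only the fields the farmer has agreed to publish.

    Before the product is on the market, nothing is exposed by default.
    """
    if not on_market:
        return {}
    return {key: value for key, value in record.items() if key in PUBLIC_FIELDS}
```

Fields like a soil test result or the farmer's cost per bushel would simply never leave the farm's own system unless she chose to add them to the public list, or to sell access to them.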

Keeping Data in the Community

I think there is a huge need to talk about how we decide what parts of this data should be made public and what kept private, and I don’t expect to put much of a dent into that discussion here. I will assert, however, that this is something that should be decided at the community level, between farmers and the people eating their food. Unfortunately, that does not appear to be the course we are currently taking.

As is well known by now, the trend among Big Data companies, like Facebook and Google, is to provide users with a nominally “free” or inexpensive service in exchange for the data they’re able to collect from those users. A similar trend is already taking over in digital agriculture, especially as big mergers like the one between Bayer and Monsanto aggregate more and more data into fewer and fewer hands. Angela Huffman, an advocate for anti-monopoly reform in agriculture, writes in the Des Moines Register that the newly approved conglomerate “will have more in common with Facebook and Cambridge Analytica than meets the eye.” She goes on to say,

In recent years, large agrochemical companies, including Bayer and Monsanto, have been heavily investing in digital agriculture. This new platform involves collecting data from farms, then building mathematical models and algorithms aimed at giving farmers real-time information on how to grow and manage their crops. […] It stands to reason that if Bayer and Monsanto combine to increase their dominance over digital farming, they will use their near monopoly on farmer data to sell more of their chemicals and seeds to farmers.

This new data system, as it’s evolving, favors a dynamic where farmers get cheap services for analyzing their crop data, in exchange for giving away that data to Big Ag, instead of having the chance to leverage that data themselves. These services usually come bundled together with other products, like seeds, fertilizers and even tractors.

In 2015, John Deere (Deere & Company) told a farmer that he would be breaking the law if he tried to fix his own tractor by accessing the firmware that controlled a faulty sensor. When it failed, that one inconsequential sensor would shut down the entire tractor and halt his farm’s production for two days while he waited for the replacement part to arrive. Cynics and digital rights advocates alike thought this boiled down to Deere’s agreements with licensed repair shops and parts dealers, but Deere’s rationale for withholding the source code turned out to be something a lot more lucrative, as Cory Doctorow points out (video):

The first thing that happens when a Deere tractor runs around your field is that it does centimeter-accurate soil surveys using the torque sensors in the wheels. And that data is not copyrightable, because facts aren’t copyrightable in America. […] But because the only way you can get access to those facts is by jailbreaking the tractor and removing a thing that protects access to copyrighted works, which is the operating system on the tractor itself, […] it’s a felony to access that data unless you’re John Deere. So John Deere pulls that data in over the wireless network connections in these tractors, and then they bundle it all together and they sell it to a seed company. And if you want to use the centimeter accurate soil surveys of your fields to do automated, optimized seed broadcasting you have to buy seed from the one company² that licenses it. […] But it’s actually just the tip of the iceberg because if you are doing centimeter accurate soil surveys of entire regions you have insight into crop yields way ahead of the futures market. And that’s why John Deere committed PR-suicide by telling the Farmers of America that they didn’t own their tractors, that they were tenant farmers.

This case study in bad agricultural data policy perfectly highlights the concerns we should have for how farm data is used and collected, starting quite literally from the ground up. Before the seed even meets the soil, all of a farmer’s most critical data — which brings with it the power to increase yields, decrease waste and command a better price at market — is being siphoned away to large corporate data stores half a world away. Then the derivatives of that data are sold back to the same farmer who generated it with every pass of his tractor. We can’t blame Big Ag for concocting such a clever scheme to profit its shareholders, but we can learn from it. We can learn to take active measures to restore control of that data back to the local communities where it originated, and with it, return all the value it held.

We as consumers should also have a chance to say how this data is used. Do we want this data to boost the sales of the chemicals which run off into our backyards and are responsible for emitting one third of humanity’s annual contribution of CO2 into the atmosphere? Or do we want to leverage the full potential of farm data to eventually render the extensive farming technologies of last century obsolete, replacing them with smarter, cleaner, more efficient technologies that could save us from environmental catastrophe in this century?

An Alternative Model

Instead of being forced to buy seed from the only company that is licensed to provide services like precision planting, under a different model a farmer could shop around for seed companies that provide other data services. These services would include the ability to import seed data into the farmer’s preferred crop planning software at the time of purchase. Soil surveys could be taken with sensor widgets, which could be installed cheaply on even the oldest, non-computerized tractors and would include their own light-weight, off-grid networking capabilities. This data, combined with the integrated seed data, could be sent to publicly funded university extension programs, who could analyze that data and provide services to help farmers calibrate their machinery to optimize planting.

The benefits of this system over the John Deere model would be numerous. The farmer would get higher yield from each seed planted, which is a tremendous value in its own right, but retrofitting such hardware could also present significant cost savings, compared to the price of a new tractor with built-in computer diagnostics, which in some cases reaches seven figures. Such an arrangement would also provide data to public research institutions, who could be trusted to anonymize and aggregate the data from a wider distribution of growers, and could publish the results of their analysis for others to use. Instead of being used to push chemicals, this data could aid research into new intensive growing practices that are better for the environment, and could even help monitor the total soil health of vital growing regions. Plus, if anything ever happened to the sensors, they could be easily repaired by any third party, or by the farmer herself, because they would be built with open source hardware and software.
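As a rough idea of what "anonymize and aggregate" could mean in practice, here is a small Python sketch. The record layout (`farm_id`, `region`, `organic_matter_pct`) and the rule of withholding any region with fewer than three farms are my own assumptions for illustration; a real extension program would need a much more careful approach to privacy than this.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_region(readings, min_farms=3):
    """Summarize soil readings per region without exposing any single farm.

    readings: iterable of dicts with the hypothetical keys
    'farm_id', 'region' and 'organic_matter_pct'.
    """
    by_region = defaultdict(lambda: defaultdict(list))
    for reading in readings:
        by_region[reading["region"]][reading["farm_id"]].append(
            reading["organic_matter_pct"]
        )

    summary = {}
    for region, farms in by_region.items():
        if len(farms) < min_farms:
            continue  # too few farms to publish without pointing at one of them
        values = [value for farm_values in farms.values() for value in farm_values]
        summary[region] = {
            "farms": len(farms),
            "mean_organic_matter_pct": round(mean(values), 2),
        }
    return summary
```

The published summary carries no farm identifiers at all, only a count and an average for each region that has enough participating growers.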

This data would continue to profit the farmer when she brought the product to market. Instead of giving Bayer/Monsanto the trade insights to hedge on commodities markets, the farmer could use this data herself to get the best price possible. Over time, the data from these soil surveys could be used to train programs that optimize prices, just as the extension’s analytical programs were optimized for planting. The planting data itself could be correlated with data from the National Weather Service to calculate the growing degree days necessary for each crop to reach full maturity. This would have a tremendous pricing advantage if the farmer could start offering more reliable delivery dates. Additional sensors on the farm could make these predictions even more precise, and if it was known that favorable weather conditions in her own micro-climate could bring her crop to market even a few days before other nearby farms, that could provide a real competitive edge.
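For anyone unfamiliar with growing degree days, the usual simple calculation averages a day's high and low temperature and subtracts a base temperature below which the crop doesn't develop. Here is a minimal Python sketch of that idea. It assumes the simple averaging method, it takes the temperatures as a plain list rather than fetching them from any weather service, and the base temperature and required total are inputs that would come from crop-specific references, not values I'm claiming for any particular crop.

```python
from datetime import date, timedelta


def daily_gdd(t_max: float, t_min: float, t_base: float) -> float:
    """Growing degree days for one day, using the simple averaging method."""
    return max(0.0, (t_max + t_min) / 2 - t_base)


def maturity_date(planted: date, temps, t_base: float, gdd_required: float):
    """Return the first date on which accumulated GDD reaches the requirement.

    temps: list of (t_max, t_min) pairs, one per day, starting on `planted`.
    Returns None if the requirement isn't reached within the days supplied.
    """
    total = 0.0
    for offset, (t_max, t_min) in enumerate(temps):
        total += daily_gdd(t_max, t_min, t_base)
        if total >= gdd_required:
            return planted + timedelta(days=offset)
    return None
```

Feeding in recorded temperatures up to today, followed by a forecast, would give an estimated harvest date that could be updated every morning as new readings come in.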

Again, as I suggested above, forward-thinking distributors and retailers could receive this data and forward it to end consumers. The consumers would be able to anticipate having the spring’s first snap peas or strawberries weeks in advance, and could count down the days on the calendar. They could know when they were harvested and know just how many hours they spent in transit before reaching their table. That transit time could be reduced by innovative software for food hubs, CSA programs and farmers markets, which could all pull data from such crop planning software and pass along the value.

We shouldn’t be skeptical about the technology itself — that it seems too futuristic or that modern farms wouldn’t have a practical use for it. That technology is certainly coming and will be used. A lot of it has already arrived. We should be skeptical about how that technology will be used, and who it will favor. Ultimately, whoever controls the data will determine where the value of that data flows. Will it all go to a few private interests and controlling shareholders? Or will it benefit the people growing the food, and those who are nourished by that food, as well as the environment that food depends on to grow? That is not a decision the technology will make for us, and it most certainly won’t be an easy one to make or execute. It’s a choice, nevertheless, which we need to make as a community, and it’s one we need to make soon, before others make the decision for us.


¹ There is, of course, a corresponding Open Data movement, which shares a lot in common with Open Source, but in a legal sense it operates entirely differently. Also, there is a myriad of different legal interpretations that I’m glossing over here, but a good primer, if you’re curious, is Feist v. Rural Telephone.

² Doctorow indicates that this seed company is in fact Monsanto, but I have been unable to verify that claim through other sources.


Originally published at jgaehring.com.