Dummy Data

Leanna Mulvihill
Empire State of Food
4 min read · Nov 2, 2019

Quick Recap: Prasenjit and I are students at Cornell Tech building a platform to connect farmers with institutional buyers like hospitals or schools. We want to compare local vs. non-local food on price and food miles. Our goal is to show that by buying directly from local farmers, buyers can save on both. I used to be a farmer in Upstate New York.

So far our adventure has included Qualitative Interviews, exploring the Ag Census, and finding data on the Origin of Every Product. We’re building an algorithm that matches farmers and buyers. Setting prices is tricky.

In order to poke around with our algorithms, we had to create dummy data. It’s not really possible to get enough farmers or buyers to take the time to give us their data without having a real product for them to use. And we can’t really build the product without having data to work with.

Getting good data from any of these farms or institutions is hard, and that makes it difficult to create a basis for meaningful comparisons. No one keeps records with the end goal of having clean data sets; they really just need to keep their accountants happy at tax time. Which is entirely reasonable, but means that we need to get creative.

Our dummy data for matching buyers and farmers looks like this:

  • 5 buyers
  • 15 farms
  • Annual demand for kale from buyers
  • Annual kale production for farmers
  • Locations for both
  • Fake transaction histories for farmers

I picked real buyers and real farms in the Hudson Valley, so we have real geographic locations happening. But the rest of the numbers are based on educated guesses.

For pounds of kale per year available per farm, half of the farms were given a number based on what my friend Ellie said she was producing. The other half were given a number based on the New York State acres of kale from the Ag Census, combined with the yield from the New England Vegetable Management Guide. These numbers aren’t terribly “real” but they do get us to a reasonable ballpark.

For pounds of kale per year ordered per buyer, I made up some more numbers. First I found the USDA school lunch calculator for serving size. Then I estimated the number of people served by each institution. I fudged some numbers. Like, for schools I estimated that the number of people they feed is 1.3 times the number of students. For a retreat center I estimated that their average capacity is 100 people, which is at least within an order of magnitude of the real number. For an assisted living community I estimated the dining hall served 1.3 times the number of apartments at every meal. They’re probably feeding some of their staff, and many of the residents in independent living cook for themselves. I gave everyone 240 kale-meals per year. Which is definitely more kale than most people eat. I know I’m biased.
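
To make the arithmetic concrete, here is a minimal sketch of that demand estimate. The multipliers and the 240 kale-meals come from the paragraph above. The serving size is a placeholder I'm assuming for illustration (the actual value from the USDA calculator isn't recorded here), and the function names are hypothetical.

```python
# Placeholder assumption: pounds of kale per person per kale-meal.
# The real value would come from the USDA school lunch calculator.
SERVING_LB = 0.25

# From the post: everyone gets 240 kale-meals per year.
KALE_MEALS_PER_YEAR = 240


def people_fed(kind, size=0):
    """Estimate people fed per meal using the fudge factors described above."""
    if kind == "school":
        return 1.3 * size  # size = number of students
    if kind == "assisted_living":
        return 1.3 * size  # size = number of apartments
    if kind == "retreat_center":
        return 100  # flat guess at average capacity
    raise ValueError(f"unknown buyer type: {kind}")


def annual_kale_lb(kind, size=0, serving_lb=SERVING_LB):
    """Pounds of kale per year = people fed x serving size x kale-meals."""
    return people_fed(kind, size) * serving_lb * KALE_MEALS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical 500-student school and the retreat center.
    print(annual_kale_lb("school", 500))
    print(annual_kale_lb("retreat_center"))
```

Swapping in the real serving size only changes `SERVING_LB`; the rest of the estimate stays the same.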

In the Gale-Shapley algorithm, the farmers and buyers are matched based on preferences, not price. The farmers’ preferences are for the closest buyers, so that part rests on our real location data. The buyers’ preferences are based on the farmers’ fit scores. The fit scores are a combination of how consistently a farm fulfills orders and whether it can meet the quantity that the buyer needs. All of this data is fake.
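
Here is a rough sketch of how that matching could work, not our actual code. It assumes farmers do the proposing, each buyer holds one farm at a time, distance is a straight line between made-up planar coordinates, and each farm has a single fit score shared by all buyers. The farm names, coordinates, and scores are all invented for illustration.

```python
from math import hypot


def gale_shapley(farmer_prefs, buyer_rank):
    """Farmer-proposing Gale-Shapley.

    farmer_prefs: {farmer: [buyers, most preferred first]}
    buyer_rank:   {buyer: {farmer: rank}}, lower rank is better
    Returns {buyer: farmer}. Farmers who run out of buyers stay unmatched.
    """
    free = list(farmer_prefs)
    next_choice = {f: 0 for f in farmer_prefs}
    match = {}  # buyer -> farmer currently held
    while free:
        farmer = free.pop()
        if next_choice[farmer] >= len(farmer_prefs[farmer]):
            continue  # proposed to every buyer already
        buyer = farmer_prefs[farmer][next_choice[farmer]]
        next_choice[farmer] += 1
        held = match.get(buyer)
        if held is None:
            match[buyer] = farmer
        elif buyer_rank[buyer][farmer] < buyer_rank[buyer][held]:
            match[buyer] = farmer  # buyer trades up
            free.append(held)
        else:
            free.append(farmer)  # rejected, will try the next buyer
    return match


# Invented example data: (x, y) positions and fit scores.
farms = {"farm_a": (0, 0), "farm_b": (10, 0), "farm_c": (4, 5)}
buyers = {"school": (1, 0), "hospital": (9, 0)}
fit = {"farm_a": 0.6, "farm_b": 0.9, "farm_c": 0.8}


def distance(p, q):
    return hypot(p[0] - q[0], p[1] - q[1])


# Farmers prefer the closest buyers.
farmer_prefs = {
    f: sorted(buyers, key=lambda b: distance(farms[f], buyers[b]))
    for f in farms
}
# Buyers prefer the farms with the highest fit score.
by_fit = sorted(farms, key=lambda f: -fit[f])
buyer_rank = {b: {f: i for i, f in enumerate(by_fit)} for b in buyers}

matches = gale_shapley(farmer_prefs, buyer_rank)
```

With 15 farms and 5 buyers, a one-to-one version like this leaves most farms unmatched, so a real setup would likely need buyers to accept several farms up to their demand.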

We’re using Hunt’s Point Terminal data and farmer cost of production data as bounds in our price-setting heuristic.
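
As a sketch of what "bounds" could mean here: one reading is that the farmer's cost of production is the floor and the terminal market price is the ceiling, and any candidate price gets clamped between them. Which bound is which is my assumption for this example, and `bounded_price` is a hypothetical helper, not our actual heuristic.

```python
def bounded_price(candidate, cost_of_production, terminal_price):
    """Clamp a candidate price per pound between two bounds.

    Assumed reading: the farmer should not sell below cost of production,
    and the buyer should not pay more than the terminal market price.
    Returns None when the floor is above the ceiling, meaning no price
    works for both sides.
    """
    if cost_of_production > terminal_price:
        return None
    return min(max(candidate, cost_of_production), terminal_price)


if __name__ == "__main__":
    # Made-up dollar-per-pound numbers, just to exercise the function.
    print(bounded_price(1.00, cost_of_production=1.50, terminal_price=3.00))
    print(bounded_price(4.00, cost_of_production=1.50, terminal_price=3.00))
```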

A step up from the fictional data that I created is Richard Wiswall’s cost of production data from his 2009 book, The Organic Farmer’s Business Handbook. I bought myself a copy, took the CD-ROM included with the book, found a computer with a CD drive and uploaded the Excel files to Google Drive. It was an adventure.

#throwback

His cost of production data is real, but it’s dated and it is only one man’s data. We’ll adjust his numbers for inflation to get started. But ultimately, I fantasize about paying farmers for their records from record-keeping software like Tend. Having cost of production data for a large set of farms would give us a more robust picture of how much it costs to produce food.

The idea is that if the Empire State of Food were being used for real, we would be adjusting our model based on the real data we were getting from farmers and buyers. This is just to get us started.

Leanna Mulvihill
Empire State of Food

Building tech for farmers at Farm Generations Cooperative. Former owner/operator of Four Legs Farm. Cornell Tech alumni. Loves kale chips and chicken stock.