Making Decisions About The Data

Leanna Mulvihill
Empire State of Food
May 15, 2019

Quick recap: Prasenjit and I are students at Cornell Tech researching a model that lets many farmers with diverse products collaborate to serve institutional buyers like hospitals or schools. We want to compare local and non-local food on price and food miles; our goal is to show that by buying directly from local farmers, buyers can save on both. I used to be a farmer in Upstate New York. Farmers and dining services managers: we want your data!

So far our adventure has included Qualitative Interviews, exploring the Ag Census and finding data on the Origin of Every Product. Now we need to make decisions about how we’re putting it all together.

Goals

1. Match buyers and farmers.

For the math part of making matches between farmers and buyers, we are going to use bipartite matching. The buyers have preferences over the farmers based on who has the greatest availability of any given product.
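To make this concrete, here is a minimal sketch of bipartite matching using SciPy's assignment solver. The availability numbers are made up for illustration; in our version the scores come from the farmers' actual product availability.

```python
# Toy bipartite matching between buyers and farmers.
# All availability numbers below are hypothetical.
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows: buyers, columns: farmers. Entry = cases of kale that farmer
# could supply to that buyer this week (made-up data).
availability = np.array([
    [40, 10, 25],   # hospital
    [15, 30, 20],   # school
    [ 5, 20, 35],   # college
])

# linear_sum_assignment minimizes cost, so negate to maximize availability.
buyers, farmers = linear_sum_assignment(-availability)
for b, f in zip(buyers, farmers):
    print(f"buyer {b} -> farmer {f} ({availability[b, f]} cases)")
```

Here each buyer ends up paired with the farmer that maximizes total supplied cases across the whole match, not just their own first choice.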

We’re building a simulator to figure out how many farms, producing how much product, we need for a reliable enough supply for our buyers. “Simulator” makes it sound more exciting than it is: really it’s just playing with the probabilities of different outcomes, which lets us run a lot of different experiments quickly. For example, if 10 farms have a 30% crop loss, do we still have enough kale? If 5 bigger farms have a 15% crop loss, do we still have enough kale? Now try that 10,000 times to find the average.
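A toy version of the idea looks like this. All the numbers (farm count, yields, loss probability, demand) are placeholders, and the loss model here, where each farm independently loses its whole crop with some probability, is a deliberate simplification of what the real simulator does.

```python
# Toy crop-loss simulator: estimate the chance that surviving farms
# still cover a dining hall's demand. Numbers are placeholders.
import random

def chance_demand_met(n_farms, yield_per_farm, loss_prob, demand,
                      trials=10_000, seed=42):
    """Fraction of trials in which supply from surviving farms >= demand.
    Assumes each farm independently loses its whole crop with loss_prob."""
    rng = random.Random(seed)
    met = 0
    for _ in range(trials):
        supply = sum(
            yield_per_farm
            for _ in range(n_farms)
            if rng.random() > loss_prob
        )
        met += supply >= demand
    return met / trials

# 10 small farms with a 30% chance of crop loss vs. 5 bigger farms at 15%,
# where the dining hall needs 60 cases of kale either way.
p_small = chance_demand_met(n_farms=10, yield_per_farm=10, loss_prob=0.30, demand=60)
p_big = chance_demand_met(n_farms=5, yield_per_farm=20, loss_prob=0.15, demand=60)
print(p_small, p_big)
```

Running experiments like this over many parameter combinations is exactly the "try that 10,000 times" step described above.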

We have a rough outline for the simulator written in Python. We need to fine-tune the parameters to make our experiments more specific to our situation. Using the USDA school lunch calculator, we can figure out the quantities a dining hall would need, and the Ag Census data lets us ballpark production by county for each product.

We’re going to need more tomatoes 🍅

2. Compare local to non-local food to show how much money and food miles buyers can save.

To compare local and non-local products, we are scraping data from Hunts Point Terminal. Scraping data means writing a program to collect all of the data from an existing source so that it ends up in a format you can use for other things. I found this guide to building your first scraper, and it seems like it will be really helpful.
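The core of a scraper is just pulling structure out of a page. Here is a minimal sketch of that parsing step; the HTML is a made-up stand-in for a real price-report page (in practice you would fetch the page first, e.g. with `requests`, and a library like BeautifulSoup is more robust than regexes).

```python
# Sketch of the scraping/parsing step: turn an HTML price table
# into usable records. The sample HTML below is made up.
import re

SAMPLE = """
<table>
  <tr><td>Kale, bunched</td><td>24s film bags</td><td>18.00</td></tr>
  <tr><td>Tomatoes, vine ripe</td><td>25 lb cartons</td><td>21.50</td></tr>
</table>
"""

def parse_rows(html):
    rows = []
    for cells in re.findall(r"<tr>(.*?)</tr>", html, re.S):
        product, unit, price = re.findall(r"<td>(.*?)</td>", cells)
        rows.append({"product": product, "unit": unit, "price": float(price)})
    return rows

for row in parse_rows(SAMPLE):
    print(row)
```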

Once we have the data, we need to go through each product and do some preprocessing to make comparisons between products easier. For example, some products are listed by weight, some by volume, and some by piece. Everything needs to be converted into the same units so that price per unit is what we are comparing between local and non-local products. There are thousands of products, and it is not clear whether there is a good way to automate this, so we will probably have to limit the number of products we include, at least initially.
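The normalization step might look something like this: a lookup table of pounds per listed unit, then everything converted to price per pound. The table entries here are placeholders (building the real table is exactly the manual work described above).

```python
# Sketch of unit normalization: convert each listing to price per pound.
# The weights below are rough placeholders, not verified figures.
LBS_PER_UNIT = {
    "25 lb carton": 25.0,
    "bushel (tomatoes)": 53.0,   # common approximation
    "bunch (kale)": 0.75,        # guess; would need actual weighing
}

def price_per_lb(price, unit):
    """Convert a listed price to price per pound, or fail loudly."""
    if unit not in LBS_PER_UNIT:
        raise KeyError(f"no weight on file for {unit!r}; needs manual entry")
    return price / LBS_PER_UNIT[unit]

print(round(price_per_lb(21.50, "25 lb carton"), 2))  # → 0.86
```

Failing loudly on unknown units is deliberate: each missing entry flags a product that still needs a human decision, which is how we would keep the initial product list manageable.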

How much does this kohlrabi weigh?

Testing with Buyers

Karan, our advisor, pointed out that we need to test the platform with buyers and make some real transactions happen. This is totally true.

The uncertainties we need to test are:

- Will the platform make less work (or at least not create more work) for dining service managers when they place their orders?

- Can we save the dining halls money buying directly from farmers?

- Will farmers accept prices that save the dining halls money?

- Can we set standards to make a product that is consistent enough across farms for dining halls to use it without creating more work in the kitchen?

We need to do a contextual inquiry to learn more about the process each dining service manager goes through to procure all of their food and supplies. I’m sure there is big variation based on the size of the dining hall. We need to learn their routines and figure out what their pain points are. I hope we can make ordering a smoother process.


Building tech for farmers at Farm Generations Cooperative. Former owner/operator of Four Legs Farm. Cornell Tech alumni. Loves kale chips and chicken stock.