Avocados Declassified
Not to sound pretentious, but I ate avocados before they were cool. A couple of years ago I had a group of friends over to celebrate Cinco de Mayo. Why do we celebrate Cinco de Mayo in the US? I like margaritas and Mexican food, but here’s the real reason. After a few rounds of drinks, it was time to make my famous guacamole. One by one I cut open a pile of avocados, and to my horror, they were all brown and bitter. Fast forward two years, and I decided it was time to regain my place on top of the avocado game. After completing Andrew Ng’s 11-week machine learning course, I began applying what I learned to…you guessed it, avocados!

Experiment Setup
What exactly is machine learning, and how does it apply to avocados?
As Tom Mitchell (1997) put it: “A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.”
Let’s break that down a little.
Experience (E) = Watching me rate the quality of avocados by eating them
Task (T) = The computer predicting the quality of avocados (without eating them)
Performance (P) = How accurately the computer predicts the quality of avocados before I eat them
The result of this data collection and training is a function that looks similar to y = mx + b. It is the computer’s job to “learn” the best values of m and b.
Data Collection/Assumptions
I measured 17 features of the avocados, shown in features.md. A few noteworthy features are the approximation of Volume and Density, and Average Zip Code Income. The formula for calculating the volume of a spheroid requires measuring both radii, so to avoid doing surgery on avocados, I created my own equation to approximate the volume. Using this approximate volume I was able to calculate the approximate density as well. Lastly, I found the Average Zip Code Income to be important because it provided the possibility of identifying food deserts.
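To make the volume and density idea concrete, here is a minimal sketch of one way to estimate them from tape-measure readings. This is not the equation used in this project (that one lives in the repo); it is a hypothetical stand-in that assumes the avocado is roughly a prolate spheroid and treats each circumference as if it were a circle. The function and parameter names are made up for illustration.

```python
import math


def approx_volume_cm3(long_circumference_cm, short_circumference_cm):
    """Rough volume estimate, treating the avocado as a prolate spheroid.

    Assumption: each semi-axis is taken as circumference / (2 * pi), as if
    both measurements were circles. The long way around is really an
    ellipse, so this underestimates the long semi-axis somewhat.
    """
    a = short_circumference_cm / (2 * math.pi)  # semi-axis around the "waist"
    c = long_circumference_cm / (2 * math.pi)   # semi-axis stem to base (crude)
    return (4.0 / 3.0) * math.pi * a * a * c


def approx_density_g_per_cm3(mass_g, volume_cm3):
    """Density is mass divided by the approximate volume."""
    return mass_g / volume_cm3
```

With a kitchen scale and a tape measure, these two functions turn three measurements (mass and two circumferences) into two extra features without cutting the avocado open.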
The bulk of this project was collecting data, so most of my assumptions concern the relevance, quality, and quantity of the data collected. Having no idea which features would be relevant, I simply chose every aspect of an avocado I could measure as a feature. Additionally, I included the ratio of circumferences because it represents the shape, and I included volume and density in hopes that these could predict the size of the avocado’s pit. The downside to choosing many features is that I needed to collect a large amount of data to avoid overfitting. The quantity of data needed became a restricting factor because avocados are expensive, and collecting data before meals is inconvenient. So far I’m at a few dozen avocados and counting.

Finally, in order to train my model, I needed a Y value or label. My girlfriend and I scored each avocado on a scale of 1–10 based on taste and appearance of the inside.

The Code
I used basic linear regression to train my model, and this required surprisingly little code. We are trying to form an equation like y = θ0 + θ1·x1 + θ2·x2 + … + θn·xn, where y represents the avocado’s score, each x represents one of our features, and each θ (theta) is a weight the computer learns.
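In code, the prediction is just a weighted sum. The snippet below is a small sketch, not the code from the repo; the name `predict` and the convention that `theta[0]` holds the intercept are assumptions made for illustration.

```python
def predict(theta, features):
    """Predict an avocado score from one avocado's feature values.

    Assumption: theta[0] is the intercept (the "b" in y = mx + b) and
    theta[1:] lines up one-to-one with the features.
    """
    return theta[0] + sum(t * x for t, x in zip(theta[1:], features))
```

For example, `predict([1.0, 2.0, 3.0], [4.0, 5.0])` computes 1 + 2·4 + 3·5.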

The role of the cost function is to calculate the sum of squared errors between the predicted avocado ratings (artificial intelligence) and my actual ratings (expert human taste buds). The smaller the cost, the better the predictions match my scores.
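Here is a minimal sketch of such a cost function in plain Python. It is an illustration rather than the project's actual code: the names `cost`, `rows`, and `scores` are hypothetical, and dividing the sum by 2m (m being the number of avocados) is an assumed convention that rescales the sum of squared errors without changing which thetas are best.

```python
def cost(theta, rows, scores):
    """Squared-error cost for linear regression.

    rows   -- list of feature lists, one per avocado
    scores -- list of human ratings, one per avocado
    theta  -- weights, with theta[0] as the intercept (assumed convention)
    """
    m = len(scores)
    total = 0.0
    for features, score in zip(rows, scores):
        predicted = theta[0] + sum(t * x for t, x in zip(theta[1:], features))
        total += (predicted - score) ** 2  # squared error for this avocado
    return total / (2 * m)  # assumed 1/(2m) scaling of the sum
```

A cost of zero means every prediction matched the rating exactly, which with real avocados would be suspicious.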
After calculating the sum of squared errors with the cost function, we use gradient descent to adjust our values of theta (or in the case of y = mx + b, we would adjust m and b). For those who know calculus, we can accomplish this by calculating the partial derivative of the cost J with respect to each theta. Each partial derivative is multiplied by our learning rate alpha, and we adjust each theta by subtracting that product: θj := θj − α · ∂J/∂θj. For those who don’t know calculus, don’t worry, you can steal this equation to get great avocados!
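The update described above can be sketched in a few lines of plain Python. This is a hedged illustration, not the repo's implementation: the function name, the default `alpha` and `iterations`, and the starting point of all-zero thetas are assumptions, and the gradient shown is the one for a squared-error cost scaled by 1/(2m).

```python
def gradient_descent(rows, scores, alpha=0.1, iterations=1000):
    """Batch gradient descent for linear regression.

    Returns theta, with theta[0] as the intercept (assumed convention).
    """
    m = len(scores)
    n = len(rows[0])
    theta = [0.0] * (n + 1)  # assumed starting point: all zeros
    for _ in range(iterations):
        # Prediction error for every avocado under the current theta.
        errors = [
            theta[0] + sum(t * x for t, x in zip(theta[1:], features)) - score
            for features, score in zip(rows, scores)
        ]
        # Partial derivatives of the cost with respect to each theta.
        gradient = [sum(errors) / m]
        for j in range(n):
            gradient.append(sum(e * row[j] for e, row in zip(errors, rows)) / m)
        # Update all thetas at once by subtracting alpha times each partial.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta
```

If alpha is too large the thetas can blow up instead of settling down, so a small value plus normalized features is a reasonable place to start.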
The remaining code is for loading data and feature normalization. Not too exciting, but feel free to check out the GitHub repo.
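For the curious, feature normalization can look something like the sketch below. It assumes the common "subtract the mean, divide by the standard deviation" approach; the repo may do it differently, and the function name and return shape are made up for illustration.

```python
import statistics


def normalize_features(rows):
    """Rescale each feature column to mean 0 and standard deviation 1.

    Returns the normalized rows plus the means and standard deviations,
    so a new avocado can be rescaled the same way before predicting.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    normalized = [
        [(x - mu) / sigma for x, mu, sigma in zip(row, means, stds)]
        for row in rows
    ]
    return normalized, means, stds
```

Putting features like weight in grams and zip code income in dollars on the same scale helps gradient descent converge with a single learning rate.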
Results/Interpretations
Data. Data. Data. The limiting factor during this project was buying and consuming enough avocados to train the algorithm. I will update this post with a more mathematical analysis after eating more avocados, but in the meantime here’s my advice.
- Buy avocados where rich people live!
- Buy big avocados!
- Buy individual, not bagged, avocados!
Practical application
As much fun as it’s been carefully measuring all my avocados, I prefer to enjoy guacamole without a computer science lesson. Clearly this project didn’t produce many practical applications, so was it all a waste of time? No! While I don’t expect any of my readers to pull out this algorithm the next time they go to the grocery store, similar techniques can be implemented by suppliers, grocery stores, and farmers to ensure they are buying and selling high-quality produce. These organizations with huge amounts of produce (and therefore data) are in a perfect position to use machine learning to identify quality produce and save money.
Slogans for My (pretend) Company
“We are making the world a better place by revolutionizing the way people eat avocados.”
“Disrupting the produce section.”
