Why Machine Learning Is Like Squeezing Olives

Butch Landingin
Aug 4, 2015

Machine Learning is like squeezing olives to extract the oil:

If you squeeze them too little, you leave some of the virgin oil behind, but if you squeeze them too much, mush gets pressed into the oil and lowers its quality. Squeeze them just right, though, and you get 100 percent extra virgin olive oil.

In machine learning, over-squeezing is called over-fitting: your machine learning program starts seeing patterns that aren’t really there. Under-squeezing, on the other hand, is called under-fitting: your program doesn’t have enough power to capture all the useful patterns in the data.
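
To see the trade-off in action, here is a minimal sketch (Python with NumPy only; the curve and the noise level are invented for illustration) that fits polynomials of different degrees to noisy data and checks each fit on points the model never saw:

```python
# A minimal sketch of under- and over-fitting, using only NumPy.
# The curve and noise level are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

# The "olives": noisy samples of a smooth underlying pattern.
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + rng.normal(0, 0.2, 40)

# Hold out half the points to test each fit on data it never saw.
x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

for degree in (1, 4, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train error {train_err:.3f}, "
          f"test error {test_err:.3f}")

# Typically: degree 1 under-fits (both errors high), degree 15
# over-fits (tiny train error, larger test error), and something
# in between squeezes just right.
```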

What is Machine Learning?

A lot of machine learning is basically making programs that can extract patterns from a subset of data where the outcomes are already known. This pattern-extraction step is sometimes called “training the model”. The extracted patterns can then be used on new data to make predictions.

This kind of machine learning is called supervised learning and is more popular than the other kind of machine learning, which, predictably enough, is called unsupervised learning ;).
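
Here is what that looks like in code, as a minimal sketch assuming the scikit-learn library; the toy data and its meaning are invented for illustration:

```python
# A minimal supervised-learning sketch, assuming scikit-learn.
# The toy data and its meaning are invented for illustration.
from sklearn.linear_model import LogisticRegression

# Training data: examples whose outcomes we already know.
# Each row is [hours_studied, classes_attended]; each label is
# 1 if that student passed the exam, 0 if they failed.
X_train = [[1, 2], [2, 1], [8, 9], [9, 7], [3, 2], [7, 8]]
y_train = [0, 0, 1, 1, 0, 1]

# "Training the model": extract the pattern from the labeled data.
model = LogisticRegression().fit(X_train, y_train)

# Use the extracted pattern to make predictions on new data.
print(model.predict([[2, 3], [8, 8]]))  # expect something like [0 1]
```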

Turning Clicks Into Dollars

For example, a machine learning program meant to predict whether web viewers will click on an ad is trained on a subset of data that already contains the outcome (whether the ad was clicked or not).

The items of data used to make the prediction are called features: basically, any piece of information that can be useful in predicting whether the web viewer will click.

Features can include the text and placement of the ad, the text and layout of the page where the ad is to be displayed, the identity of the web viewer (including their browsing history), and everything else that might help the program make a good prediction (time of day, network latency, etc.).
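
As a sketch of how such features might be fed to a model, here is one way to do it, assuming scikit-learn; the feature names, rows, and outcomes below are all invented:

```python
# A sketch of turning ad-impression data into features, assuming
# scikit-learn. The feature names, rows, and outcomes are invented.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each row describes one ad impression; "clicked" is the known outcome.
impressions = [
    {"ad_topic": "shoes",  "placement": "sidebar", "hour": 9,  "prior_clicks": 2},
    {"ad_topic": "travel", "placement": "banner",  "hour": 22, "prior_clicks": 0},
    {"ad_topic": "shoes",  "placement": "banner",  "hour": 14, "prior_clicks": 5},
    {"ad_topic": "games",  "placement": "sidebar", "hour": 23, "prior_clicks": 1},
]
clicked = [1, 0, 1, 0]

# Turn mixed text/number features into the numeric matrix that models
# need; text-valued features become one-hot columns automatically.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(impressions)

model = LogisticRegression().fit(X, clicked)

# Estimated click probability for a brand-new impression.
new_impression = {"ad_topic": "shoes", "placement": "sidebar",
                  "hour": 10, "prior_clicks": 3}
print(model.predict_proba(vec.transform([new_impression]))[0, 1])
```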

Once a machine learning program gets good at predicting whether an ad will be clicked, it can then be used to choose which ad to show on which website, when, and to whom.

As an aside, Google’s massive billion-dollar revenues are almost all based on AdWords, which has at its core a machine learning program that is very good at predicting which ads you will click on as you view Google’s search results.

For a revenue model that is based on getting paid a few cents every time an ad is clicked, getting good at predicting which ads have a better chance of being clicked on has been crucial in turning Google into what it is today.

The Art and Science Of Making Good Predictions

The goal of the machine learning scientist or engineer is to tune the machine learning program so that it squeezes the patterns out of the training data just right, extracting all the good-quality olive oil while leaving the mush out of it. In this case, that means extracting the patterns that are useful in predicting results from data the program hasn’t seen yet.
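
One common way to find “just right” is to hold some data out and score the model against it. Here is a minimal sketch, assuming scikit-learn and synthetic data; the alpha knob in Ridge regression controls how hard the model is allowed to squeeze:

```python
# A minimal sketch of tuning "how hard to squeeze", assuming
# scikit-learn. Ridge regression's alpha restrains the model: a
# small alpha lets it squeeze hard (risking over-fitting), a large
# alpha holds it back (risking under-fitting). Synthetic data.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=20, noise=10.0,
                       random_state=0)

# Score each setting on held-out folds the model never trained on.
for alpha in (0.01, 1.0, 100.0):
    score = cross_val_score(Ridge(alpha=alpha), X, y, cv=5).mean()
    print(f"alpha={alpha:>6}: mean validation score {score:.3f}")

# Pick the alpha with the best validation score: patterns that hold
# up on unseen data, not just on the training set.
```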

In the end, though, however good the tools you use to extract the oil from the olives, the quality of the oil is ultimately limited by the quality of the harvested olives themselves.

So it is in machine learning: how good you get at predicting is limited by the quality of the data used to build the model that your program uses for prediction.

Also, having more olives allows you to extract more of the premium olive oil, so having more data allows you to make better predictions, given the same set of tools.

Improving our predictions is not just about getting enough data; it’s also about finding the right kind of data. It seems that a lot of the work machine learning scientists and engineers do is searching for the kinds of data that are useful in making good predictions. In machine learning parlance, these tasks are known as feature generation, feature engineering, and feature selection.

While there are some tools to help scientists and engineers search for useful kinds of data, it’s still pretty hard work, involving a lot of iteration between guessing which features might be useful and trying them out. That leaves the science of making good predictions still somewhat of an art.
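
As a rough sketch of that loop, here is what feature generation and selection can look like in code, assuming scikit-learn; the raw fields, the derived features, and the rule behind the outcome are all invented:

```python
# A sketch of feature generation and selection, assuming scikit-learn.
# The raw fields, derived features, and outcome rule are all invented.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)

# Raw data: an hour-of-day and a price for 200 events.
hour = rng.integers(0, 24, 200)
price = rng.uniform(5, 500, 200)
# Made-up rule: the outcome depends on evening hours and low prices.
y = ((hour > 18) & (price < 100)).astype(int)

# Feature generation/engineering: derive candidates from the raw data.
features = np.column_stack([
    hour,                     # raw hour
    (hour > 18).astype(int),  # engineered: "is it evening?"
    price,                    # raw price
    np.log(price),            # engineered: log-scaled price
    rng.normal(size=200),     # pure noise, for contrast
])

# Feature selection: keep the k features that best predict the outcome.
selector = SelectKBest(f_classif, k=3).fit(features, y)
print(selector.get_support())  # True for the features judged useful
```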

The Data Feeding Frenzy

On the other hand, recent developments in machine learning tools and technologies are making it easier to collect massive amounts of different kinds of data, which in turn makes it easier to figure out which kinds are useful. It also turns out that simply collecting more data, given the right tools to process it, can lead to better and better predictions.

[Image: Why identifying cats is hard…]

Learning To Find Cats

Until around 2011, identifying cats in pictures and understanding spoken words in noisy environments (for example, over the phone) were hard problems for even the best machine learning systems.

But with the advent of new techniques such as deep learning, and equipped with massive amounts of computing power, even these tasks have become tractable for companies like Google, given enough pictures of cats and voice samples.

This is one reason why companies like Facebook and Google are getting nosier about who we are and what we do — they have begun using these new and advanced machine learning tools to better predict how we act and respond as we interact with their websites and apps.

Being able to predict our behavior better can then be translated into products and services that bring in bigger revenue.

What Simple Machine Learning Can Do

It still astonishes me what even simple machine learning models can do.

In an in-class prediction competition held as part of an online machine learning course offered by MIT on edX, I was able to build a machine learning program that could correctly predict around 88 percent of the time whether an eBay listing for an iPad would be successful.

I’m not sure if this rate would hold up once it encounters the “real world”, but even if its success rate drops to 75 percent, that still means it could correctly predict three times out of four if an eBay listing for an iPad would sell or not.

Even looking manually at the data, I could not predict that well, but a program I wrote could predict better than I could. Not only that, it told me which factors seemed to be important in determining whether a listing would sell.
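
As a sketch of how a model can surface important factors, here is one way to do it, assuming scikit-learn; the listing features and rows below are invented and are not the actual competition data:

```python
# A sketch of how a model can report which factors matter, assuming
# scikit-learn. These listing features and rows are invented; they
# are not the actual competition data.
from sklearn.tree import DecisionTreeClassifier

# Each row: [start_price, storage_gb, condition (1 = new), has_photo]
listings = [
    [350, 16, 1, 1], [600, 64, 0, 0], [200, 16, 0, 1],
    [550, 32, 1, 0], [180, 16, 0, 1], [700, 64, 1, 0],
]
sold = [1, 0, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(listings, sold)

# Importance scores hint at which factors drive the prediction
# (in this made-up data, start_price dominates).
print(dict(zip(["start_price", "storage_gb", "condition", "has_photo"],
               tree.feature_importances_)))
```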

For those who have been doing machine learning a long time, these results might seem ordinary or even trite, but I would bet an average salesman’s eyes would widen at the idea that a program can guess three times out of four whether an item will sell or not.

Machine Learning For The Masses

Machine learning still has a long way to go in terms of adoption by ordinary businesses, but it’s pretty clear that it provides a massive market advantage for those companies already investing in it.

Further behind still is our awareness of how much machine learning will affect us, the ordinary consumers of these “free” services, as their providers learn more about us and get better at predicting how we will act and respond to what they offer.

I hope this short introduction to machine learning helps illuminate the opportunities and the pitfalls that lie before us as machine learning applications grow in the years to come.

If you enjoyed this article, it would mean so much to me if you could click on the recommend button. Thanks!


Butch Landingin

Software Developer, Life-long Learner. Now, a Deep Learning enthusiast.