How to Cut through the Nonsense when Choosing Machine Learning Solutions

Neil du Toit
Published in QDivision
Jun 6, 2017 · 5 min read

As intelligent data usage establishes itself as a differentiating factor of successful organisations, more and more people are looking to join in on the gold rush. ‘Specialists’ are popping up everywhere, and differentiating substance from nonsense is becoming increasingly difficult for businesses. Fortunately, machine learning has a built-in anti-bullshit defense mechanism.

[This is a more detailed look at a previous article published by Dan Corder]

The number of data scientists on LinkedIn has doubled in the past 4 years. Data scientists are seemingly coming from many unrelated disciplines. Computer science majors usually lead the recruiting numbers, but statisticians, mathematicians, neuroscientists, linguists, and postgraduates from pretty much every field are all influencing data science.

Traditional statisticians are also rushing to catch the boat. Anecdotes about academics relabelling ordinary least squares regression as ‘machine learning’ are common.

Buzzwords are also rapidly proliferating. For example, the Wikipedia page on data mining points out that the term ‘data mining’ is a misnomer, because the goal is not the extraction of data itself but the extraction of patterns from it. The page further cites the book Data Mining: Practical Machine Learning Tools and Techniques with Java as an example of buzzword abuse. The book was originally just named ‘Practical Machine Learning’; ‘data mining’ was added to the title for purely marketing reasons.

As the hot air under machine learning continues to rise, how does your company evaluate concrete solutions? To evaluate them properly, it is important to appreciate the conceptual distinction between traditional statistical methods and machine learning.

But first, a very quick journey through the history of scientific thought…

Falsifiability

In the 19th century, philosophers of science had a problem. The scientific method that they had developed was producing revolutionary results. But the philosophers began to realise that, at its core, there was something deeply flawed with their conception of truth.

The way the inductive method worked at that time was that scientists would make repeated observations of the world, and if particular rules seemed to hold true in all of those observations, then the rule was treated as a scientific fact.

Now imagine a 19th-century turkey that fancies itself a scientist. After almost a year of observing a farmer coming into the barnyard, followed by the turkey getting fed, the turkey concludes that farmers bring food as a rule of nature. Satisfied with itself, the turkey rushes over the next time the farmer comes in, and is promptly slaughtered.

This is why the 20th-century philosopher Karl Popper proposed a rethinking of the system. Science, he said, is not about proving things true, but about proving things false. Whenever we have a theory, we should test it rigorously to try and disprove it. If we succeed, then we have learnt something. We have learnt that the theory wasn’t true. If we repeatedly fail, then we still can’t conclude that the theory holds. But we should accept it until further evidence demonstrates otherwise.

As a direct corollary of this system, the only theories that Karl Popper considered capable of being ‘scientific truths’ are theories that can be proven wrong. Theories that are incapable of being proven false are also incapable of being scientifically true. This is called ‘falsifiability’.

Importantly, at the heart of falsifiability is prediction. For a theory to be testable through observation alone, it must make predictions. It is only then that those predictions can be empirically tested, and by extension, the theory itself.

Inference versus prediction

There are hundreds of discussions on tech Q&A forums about the actual difference between machine learning and traditional statistics. One of the more humorous answers defined the term “a large research grant”: apparently about $1,000 in statistics, and $100,000 in machine learning.

One of the key differences, though, is the outcome that each approach emphasises. In statistics, the goal is usually inference. In machine learning, it’s prediction.

Roughly, inference is about estimating population statistics from a sample, such as from a survey. You interview a few people, ask them how they feel about, say, deep learning, and then try to infer how the greater population feels about deep learning.

Statisticians can then go on to make predictions. From their estimates about the population, they can say things like “if you selected someone in this area, I predict that they will believe deep learning is scary”. But this predictive step always follows the inference.
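
To make the two steps concrete, here is a minimal sketch in Python with made-up survey numbers: first the inference (estimating the population proportion, with a simple Wald-style confidence interval), and only then the prediction for a newly selected individual.

```python
import math

# Hypothetical survey (made-up numbers): 100 respondents, 62 say deep learning is scary
n, scared = 100, 62

# Inference: estimate the population proportion, with a 95% Wald interval
p_hat = scared / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"Estimated proportion: {p_hat:.2f} (95% CI: {lower:.2f} to {upper:.2f})")

# Prediction: only after the inference do we predict for a newly selected person
prediction = "believes deep learning is scary" if p_hat > 0.5 else "does not"
print(f"Prediction for a randomly selected person: {prediction}")
```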

One obvious drawback with inferential methods is that it’s incredibly difficult to validate whether or not someone has run the numbers properly. If you say that 30% of the population believes X, how do I empirically test that?

I could validate the methods. I could check over their methodology and calculations. But to empirically check their results, I would have to run my own survey. In my opinion, this is a core reason why the quality of statistics in academic journals is currently so abysmal, and why that often only comes to light when replication studies are done.

Machine learning, by contrast, has very little interest in inferring anything about the world. Inferential results are sometimes produced as a by-product. But as a point of departure, machine learning jumps straight to prediction. It makes predictions and just keeps learning from its errors until its predictions are within acceptable bounds.
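
As a rough sketch of that loop, here is a toy example in Python: a linear model trained by gradient descent on synthetic data (both the data and the stopping threshold are assumptions for illustration). The model starts out arbitrary, measures its prediction error, and keeps adjusting until the error falls within acceptable bounds.

```python
import numpy as np

# Synthetic toy data (an assumption for illustration): y is roughly 3x + 2 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=200)

# Start from an arbitrary model and keep correcting it based on its prediction errors
w, b = 0.0, 0.0
learning_rate = 0.01
for step in range(10_000):
    predictions = w * x + b
    errors = predictions - y
    mse = float(np.mean(errors ** 2))
    if mse < 0.3:  # stop once predictions are within (arbitrarily chosen) acceptable bounds
        break
    # Gradient descent: nudge the parameters in the direction that reduces the error
    w -= learning_rate * np.mean(errors * x)
    b -= learning_rate * np.mean(errors)

print(f"Stopped after {step} steps with mean squared error {mse:.2f}")
```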

So how do you evaluate a machine learning proposal for your company?

Ask them to prove it works.

The reason I got into machine learning was a video I watched of a helicopter flying itself. For a long time, I had believed that machine learning was just hocus pocus nonsense produced by amateur statisticians. But it’s really difficult to argue about the appropriate usage of the Wald interval with a helicopter that’s busy flying itself. However it’s made, the stuff clearly works.

And that’s exactly what you should demand from machine learning providers. Some sort of demonstration of success. You would never be able to get a consumer research house to prove the accuracy of their results in this way. But with machine learning, you can.

Of course, this is a bit of an oversimplification. Depending on the business problem, the format of this demonstration might require tweaking. There is a shortage of easily generalisable machine learning algorithms, and success is often difficult to determine in advance.

But be it a minimum viable product, a test case, contingency fee structure, or even just examples of previous work, machine learning solutions should make predictions of some form. And at some stage, those predictions can and should be tested.
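
One simple format for that demonstration, assuming you have labelled historical data, is to hold some of it back, let the provider train on the rest, and score their predictions on the data they never saw. Here is a minimal sketch with scikit-learn, where a bundled dataset stands in for your own:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A bundled dataset stands in for whatever labelled history your business has
X, y = load_breast_cancer(return_X_y=True)

# Hold back a portion of the data that the model never gets to see during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Let the provider build whatever model they like on the training portion only
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Then judge the proposal purely on its predictions for the held-out data
print(f"Held-out accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```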

Neil is a data strategist at Q Division. In his off-time, he writes more interesting blog posts for his personal blog.
