Machine Learning for the Perplexed, Part 1

With the increasingly vast volumes of data generated by enterprises, relying on static rule-based decision systems is no longer competitive; instead, there is an unprecedented opportunity to optimize decisions, and adapt to changing conditions, by leveraging patterns in real-time and historical data.

The very size of the data, however, makes it impossible for humans to find these patterns, and this has led to an explosion of industry interest in the field of Machine Learning, which is the science and practice of designing computer algorithms that, broadly speaking, find patterns in large volumes of data. ML is particularly important in digital marketing: understanding how to leverage vast amounts of data about digital audiences and the media they consume can be the difference between success and failure for the world’s largest brands. At MediaMath, where I am the SVP of Data Science, our vision is for every addressable interaction between a marketer and a consumer to be driven by ML optimization against all available, relevant data at that moment, to maximize long-term marketer business outcomes.

In this series of blog posts we will present a very basic, non-technical introduction to Machine Learning. In today’s post we start with a definition of ML in the form of a dialog between you and an ML expert. When we say “you”, we have in mind someone who is not an ML expert or practitioner, but someone who has heard about Machine Learning and is curious to know more.

Can we start at the beginning? What is Machine Learning?

Machine learning is the process by which a computer program improves its performance at a certain task with experience, without being given explicit instructions or rules on what to do.

I see, so you’re saying the program is “learning” to improve its performance.

Yes, and this is why ML is a branch of Artificial Intelligence, since learning is one of the fundamental aspects of intelligence.

When you say “with experience,” what do you mean?

As the program gains “practice” with the task, it gets better over time, much as we humans get better at tasks with experience. For example, an ML program can learn to recognize pictures of cats when shown a sufficiently large number of example pictures labeled “cat” and “not cat”. Or an autonomous driving system can learn to navigate roads after being trained by a human on a variety of types of roads. Or a Real-Time-Bidding system can learn to predict users’ propensity to convert (i.e., make a purchase) when exposed to an ad, after observing a large number of historical examples of situations (i.e., combinations of user, contextual, geo, time, and site attributes) where users converted or not.

You said “without being given explicit instructions.” Can you expand on that a bit?

Yes, that is a very important distinction between an ML program and a program with human-coded rules. As you can see from the above examples, an ML system in general needs to respond to a huge variety of possible situations: e.g., respond “cat” when shown a picture of a cat, or turn the steering wheel in the right direction in response to the visual input of the road, or compute a probability of conversion when given a combination of features of an ad impression. The sheer variety and number of possible input pictures, road conditions, or impression features is enormous. If we did not have an ML algorithm for these tasks, we would need to anticipate all possible inputs and program explicit rules that we hope will be appropriate responses to those inputs.

I still don’t understand why it’s hard to write explicit rules for these tasks. Humans are very good at recognizing cats, so why can’t humans write the rules to recognize a cat?

That’s a great question. It’s true that humans excel at learning certain tasks, for example recognizing cats, or recognizing handwriting, or driving a car. But here’s the paradoxical thing: while we are great at these tasks, the process by which we accomplish them cannot be boiled down to a set of rules, even if we’re allowed to write a huge number of rules. So these are examples of tasks where explicit rules are impossible to write.

On the other hand, there are tasks that humans are not even good at: for example, trying to predict which types of users in what contexts will convert when exposed to ads. Marketing folks might have intuitions about what conditions lead to more conversions, such as “users visiting my site on Sundays when it’s raining are 10% likely to buy my product”. The problem, though, is that these intuition-guided rules can be wrong and incomplete (i.e., they do not cover all possible scenarios). The only way to come up with the right rules is to pore through millions of examples of users converting or not, and extract patterns from these, which is precisely what an ML system can do. Such pattern extraction is beyond the capabilities of humans, even though they are great at certain other types of pattern extraction (such as visual or auditory).
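
To make "extracting patterns from examples" a little more concrete, here is a minimal Python sketch. The handful of records and the function name learn_conversion_rates are invented for illustration, and plain counting stands in for a real ML algorithm, which would also have to generalize to situations it has never seen.

```python
from collections import defaultdict

# Hypothetical historical examples: (day, weather, did the user convert?)
examples = [
    ("Sunday", "rain", True),
    ("Sunday", "rain", False),
    ("Sunday", "rain", False),
    ("Sunday", "rain", False),
    ("Monday", "sun", False),
    ("Monday", "sun", False),
]


def learn_conversion_rates(examples):
    """Estimate a conversion rate for each (day, weather) situation by counting."""
    shown = defaultdict(int)      # how often each situation was observed
    converted = defaultdict(int)  # how often it ended in a conversion
    for day, weather, did_convert in examples:
        shown[(day, weather)] += 1
        if did_convert:
            converted[(day, weather)] += 1
    return {situation: converted[situation] / shown[situation] for situation in shown}


rates = learn_conversion_rates(examples)
print(rates[("Sunday", "rain")])  # 0.25
```

Nobody wrote the "rainy Sunday" rate into the program; it came from the examples. With millions of examples and many more attributes, the same idea is what lets an ML system replace guesswork with observed patterns.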

I see, so ML is useful in tasks where (a) a response is needed on a huge number of possible inputs, and (b) it’s impossible or impractical to hard-code rules that would perform reasonably well on most inputs. Are there examples where the number of possible inputs is huge, but it’s easy to write hard-coded rules?

Sure. If I give you a number, can you tell whether it’s even or odd? You wouldn’t need an ML program for that!
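
As a small illustration, the even-or-odd task really does fit in a single hand-written rule. The Python sketch below uses an illustrative function name, is_even, and assumes whole-number inputs.

```python
def is_even(number: int) -> bool:
    """One explicit rule that covers every possible whole number."""
    return number % 2 == 0


print(is_even(10))  # True
print(is_even(7))   # False
```

The set of possible inputs is unbounded, yet one rule handles all of them correctly, so there is nothing left to learn from examples.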


This post originally appeared on the MediaMath blog. In the next post we will take a first step towards understanding how ML algorithms actually work.