This Is Machine Learning, Part 2: Supervised Learning

Paolo Perrotta
Published in The Startup
May 4, 2020


In the first post of this two-part series, I told you the basic idea behind machine learning. Now let’s get a bit more concrete and talk about a specific approach to machine learning — one that has reaped impressive results. It’s called supervised learning. Let’s see how supervised learning solves hairy problems like recognizing images.

To do supervised learning, we need to start from a set of examples, each carrying a label that the computer can learn from. For instance, the examples could be temperature readings, each labeled with the quantity of lemonade sold at that temperature, or pictures of dogs, each labeled with the dog’s breed.

As these cases show, examples can be a lot of different things: data, text, sound, video, and so on. Also, labels can be just numbers, as in the case of the temperature-to-lemonade converter, or categories in a predefined set, as in the case of the dog breed detector. With some imagination, you can come up with many other examples of predicting something from something else.

So, let’s assume that we already put together a collection of labeled examples. Now we can dive into the two phases of supervised learning:

Phase 1: Training. During this phase, we feed the labeled examples to an algorithm that’s designed to spot patterns. For example, let’s say that we’re building a system that recognizes animals. Each input is a picture of an animal, and each label is the species. During the training phase, we show labeled images to the algorithm, and the algorithm might notice that all the animals of a certain species share certain common characteristics. This is called the training phase because the algorithm looks at the examples over and over, learning to recognize those patterns.

Phase 2: Prediction. Now that the algorithm knows what each species of animal looks like, we switch to the prediction phase, where we reap the benefits of our work. We show an unlabeled image to the trained algorithm, and the algorithm identifies the animal in it.
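Here is a minimal sketch of those two phases in code, using scikit-learn. The feature values and labels are made up for illustration — in a real animal recognizer, the features would be extracted from pictures:

```python
from sklearn.tree import DecisionTreeClassifier

# Labeled examples: each input is a list of numerical features
# (hypothetical values standing in for a picture), and each label
# names the species.
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = ["cat", "cat", "dog", "dog"]

# Phase 1: training. The algorithm looks for patterns that link
# the features to the labels.
model = DecisionTreeClassifier()
model.fit(features, labels)

# Phase 2: prediction. We show the trained model an unlabeled
# example, and it identifies the animal.
print(model.predict([[0.85, 0.15]]))  # => ['cat']
```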

As a second example, consider an algorithm that identifies pneumonia in X-ray scans. During the training phase, the algorithm would look at X-ray scans, and notice that all pneumonia scans have certain common characteristics — maybe certain opaque areas — that are missing from non-pneumonia scans. During the prediction phase, it would look at an unlabeled X-ray scan, and tell us whether it contains signs of pneumonia or not.

In the previous post, I told you that machine learning is about a computer “figuring out” data. Supervised learning is an example of that process: in traditional programming, you code a computer to go from the input to the output; in supervised learning, you give examples of the input and the output to the computer, and it gets the hang of how to go from one to the other on its own.

Now that you’ve read this high-level explanation of supervised learning, you might have more questions than you started out with. We said that a supervised learning program “notices common characteristics” in the data and “spots patterns” — but how? Let’s step down one level of abstraction, and see how that magic happens in practice.

To understand the relation between a piece of data and its label, a supervised learning system exploits a mathematical concept — the idea of approximating a function. Let’s see how that idea works, with a concrete example.

Imagine that you have a solar panel on your roof. You’d like a supervised learning system that learns how the solar panel generates energy, and predicts the amount of energy generated at some time in the future.

There are a few variables that impact the solar panel’s output: the time of day, the weather, and so on. The time of day looks like an important variable, so you decide to focus on that one. In true supervised learning fashion, you start by collecting examples of the power generated at different times of the day. After a few weeks of random sampling, you get a spreadsheet with two columns and a few dozen rows: each row is an example, pairing an input variable (the time of day) with a label (the generated power). It’s just like the system that recognizes animals, where the picture is the input and the name of the animal is the label.
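To make the format concrete, a few rows of that spreadsheet might look like this as Python data. The values are invented for illustration:

```python
# Each example pairs an input (the hour of the day) with a label
# (the power generated at that hour, in kilowatts).
examples = [
    (6.0, 0.1),   # early morning: barely any power
    (12.0, 3.1),  # around noon: peak power
    (19.0, 0.3),  # evening: the sun is setting
]
```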

If you plot the examples on a chart, with the time of day on one axis and the generated power on the other, you can visualize how the time of day relates to the energy produced.

At a glance, our human brains can see that the solar panel doesn’t generate power during the night, and that its power peaks around noon. Lacking the luxury of a brain, a supervised learning system can understand the data by approximating them with a function — a curve that follows the plotted examples as closely as possible.

Finding the function that approximates the examples is the hard part of the job — what I called the “training phase”. The prediction phase that follows is easier: the system forgets all about the examples, and uses the function to predict the power generated by the solar panel — for example, the power generated on any day at noon.

That’s what I meant when I said that supervised learning works by approximating functions. The system receives real-world data that’s generally messy and incomplete. During the training phase, it approximates that complicated data with a relatively simple function. During the prediction phase, it uses that function to predict unknown data.
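Here is a minimal sketch of that whole process, using NumPy to fit a simple polynomial to the examples. The measurements are made up, and a real system would use a far more flexible family of functions than a polynomial:

```python
import numpy as np

# Hypothetical measurements: the hour of the day, and the power
# generated at that hour, in kilowatts.
hours = np.array([6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
power = np.array([0.1, 1.2, 2.5, 3.1, 2.7, 1.5, 0.4, 0.0])

# Training phase: find a simple function (here, a degree-4
# polynomial) that approximates the examples.
coefficients = np.polyfit(hours, power, deg=4)
model = np.poly1d(coefficients)

# Prediction phase: forget the examples, and use the function.
print(model(12.0))  # predicted power at noon, in kilowatts
```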

If you think about it for a minute, you’ll find many possible ways to complicate our example. For one, the output of a solar panel is influenced by other variables besides the time of day, like cloud cover or the time of year. If we collected all those variables, we’d end up with a multidimensional cloud of points that we couldn’t visualize on a chart. Also, in the case of the solar panel, we’re predicting a numerical label. If we want to predict non-numerical labels like the names of animals, then we need to convert those labels into numbers first.
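That conversion can be as simple as mapping each category to an integer. Here is a sketch using scikit-learn’s LabelEncoder — one common way to do it, not the only one:

```python
from sklearn.preprocessing import LabelEncoder

# Categorical labels, like the names of dog breeds...
labels = ["labrador", "poodle", "labrador", "beagle"]

# ...mapped to integers, one per category.
encoder = LabelEncoder()
numeric_labels = encoder.fit_transform(labels)
print(numeric_labels)                  # [1 2 1 0]
print(encoder.inverse_transform([2]))  # ['poodle']
```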

Those additional problems can make supervised learning more complicated, but its basic idea stays the same: take a bunch of examples, and find a function that approximates them.

Modern supervised learning systems are very good at this approximation job. They can approximate complicated relations, like the one between an X-ray scan and a diagnosis. A function that approximates that relation would look maddeningly complicated to us humans, but it’s par for the course for those systems.

And that, in a nutshell, is supervised learning. If you want to delve deeper, there are plenty of resources that you can turn to — including my book and my training on Pluralsight. Cheers!

This post was adapted from the first chapter of Programming Machine Learning, a zero-to-hero introduction for programmers, from the basics to deep learning. Go here for the eBook, here for the paper book, or come to the forum if you have questions and comments!
