Machine Learning, When Deconstructed, is Less of An Innovation Than You Think

Tianzhi Li
College Essays


An article that explains machine learning to math-phobes.

It does not matter whether you know any computer science at all. Over the past several years, you must have seen the buzzwords “machine learning (ML),” “artificial intelligence (AI),” and “deep learning” rampaging across the internet. You may not comprehend exactly how those technologies work, but as long as you understand that they mimic rational human behavior and do some jobs even better than humans, you have the right idea.

If you happen to know the crazy smart things those technologies can do, and if you are awed or scared, you have reason to be. In 2017, the computer program AlphaGo beat the world’s best player of Go. Go is arguably the most profound and complex board game in the world. Top human players were long considered unbeatable by computers, yet humans have lost that edge. If you think winning a board game is still no match for other complex human behaviors, you may be in for a shock.

Recently, research scientists at Carnegie Mellon University and Facebook AI Research wrote a program that plays Texas Hold ’em and beat the world’s top professionals. This means that a computer program can bluff! And it bluffs better than you! Even better, machine learning has become omnipresent in our lives: face recognition, finance, medicine, economics, and more. No wonder all the world’s leading powers are competing to develop artificial intelligence and related technologies.

It seems that I have just provided the best arguments against the title of this article. Like everybody who is amazed by AI and machine learning technologies, I would once have regarded this title as nonsense. I used to believe that if machine learning is that powerful, then every single problem in the world can be approached and solved from a machine learning perspective. It was a wonderful belief, but the more I learned about machine learning, the more I realized that it is naive and oversimplified.

What is machine learning, then? Why might it not be as advanced an innovation as most people deem it to be? Let us start demystifying it with the story of an alien scientist.

An alien scientist that observes us

A friendly alien was paying a visit to our Earth and observing our daily activities. He observed an amazing activity that humans do every day: predicting the weather. To gain a better understanding, he collected daily temperature data from a weather station right on the US-Canadian border. To his surprise, he found that the station was transmitting a different dataset to each side of the border. The data is shown below:

Even though he was not aware of the concepts of “Celsius” and “Fahrenheit,” he immediately realized that the two countries were using two different measurement systems for temperature. He wished to understand the relationship between the two systems.

A middle school student may be told exactly how to convert between the two systems: “Celsius = (5/9) * (Fahrenheit - 32).” The alien scientist did not know this exact relation. Yet, without googling or asking anybody, he tried to figure it out by himself.

To analyze the available data between Sept. 16 and Sept. 19, he used a technique called linear regression. The details of linear regression will not be elaborated here. The idea is to find the parameters of a straight-line formula so that its output is as close to the observed data as possible. All you need to know is that linear regression is among the basics that a high-school student would learn in an intro-level statistics course. After his calculation, he derived

“Celsius = 0.5595 * Fahrenheit - 18.036.”

This formula is not as precise as the previous one. Yet it does the conversion from Fahrenheit to Celsius accurately enough.
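For readers curious about what such a calculation might look like, here is a minimal Python sketch of least-squares linear regression. The four readings are hypothetical stand-ins invented for illustration (they are not the station's actual data), and the helper name fit_line is likewise an assumption of this sketch. With these made-up readings, the fitted slope and intercept should land close to 5/9 and -17.78, though not on the alien scientist's exact numbers.

```python
# Hypothetical readings (NOT the article's data): four days of Fahrenheit
# values, with Celsius rounded to one decimal as a station might report it.
fahrenheit = [59.0, 64.0, 70.0, 75.0]
celsius = [15.0, 17.8, 21.1, 23.9]


def fit_line(xs, ys):
    """Ordinary least squares for the straight line y = k * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    k = sxy / sxx
    # Intercept: chosen so the line passes through the point of averages.
    b = mean_y - k * mean_x
    return k, b


k, b = fit_line(fahrenheit, celsius)
print(f"Celsius = {k:.4f} * Fahrenheit + ({b:.3f})")
```

The fit never sees the textbook formula. It only sees pairs of numbers, which is exactly the alien scientist's situation.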

What machine learning can do

What the alien scientist has done is the simplest form of machine learning. Solving any machine learning problem involves (i) observing and analyzing past experiences or data, and (ii) making an inference based on step (i).

For example, the Celsius-Fahrenheit problem can be formulated as follows:

  • Given: temperature in Fahrenheit
  • Need to find: temperature in Celsius
  • Past data: temperatures in both Fahrenheit and Celsius from Sept. 16 to Sept. 19.
  • Exact correlation: Celsius = (5/9) * (Fahrenheit - 32)
  • Inferred prediction: Celsius = 0.5595 * Fahrenheit - 18.036

Simple, right? Is it not overkill to use something as fancy as machine learning to solve problems whose answers we already know?

What if we encounter really hard questions whose answers we cannot possibly know for sure, but we still wish to have some insight into what the answer might be? For example:

Prediction of stock price

  • Given: stock price of all airline companies on Jan. 1st (today).
  • Need to find: stock price of United Airlines on Jan. 2nd (tomorrow).
  • Past data: Any financial information you can find online.
  • Exact correlation (between “given” and “need to find”): nobody has ever found out.
  • Inferred prediction: ???

Cancer diagnosis

  • Given: an MRI scan containing 200,000 pixels. Each pixel contains numbers that represent the color of that pixel.
  • Need to find: whether the patient has cancer or not.
  • Past data: the patient records for the past 5 years.
  • Exact correlation: not even the best doctor can describe this.
  • Inferred prediction: ???

Texas Hold’em

  • Given: the cards that other players have played, and their sequences of decisions (fold or not)
  • Need to find: what card is the best to play / whether it is better to fold or not
  • Past data: Records of games played by professional players for the past 5 years.
  • Exact correlation: not even a professional player can describe this.
  • Inferred prediction: ???

Here is what the examples listed above have in common: it is practically impossible to make a prediction or decision that matches reality 100% of the time based on the knowledge we have. However, by analyzing the past data we already have, it is possible to infer a formula that makes accurate enough predictions.

You may have no idea what the inferred formula means or why it works. For example, the numbers 0.5595 and 18.036 that the alien scientist derived may not help him understand that human physicists have already defined “Celsius = (5/9) * (Fahrenheit - 32).” Yet the inferred formula does give a decent result.

In other words, machine learning is a method that implements human inferential thinking with mathematical and statistical techniques. It does not omnipotently tell us the ground truth, but it does help us make inferences. Inferential thinking is a paradigm of thinking that makes us humans special in the first place: it allows us to make decisions based on experience without inquiring into all the truths of the universe. You definitely know that rain is coming if there are dark clouds and strong winds, and you would not bother to think about the mechanism of how rain forms to draw the conclusion “it will rain.”

What is a machine learning model? Well, I call it choices of tricks.

Notice that I wrote question marks for the “inferred prediction” in all 3 previous examples. What do I mean by the question marks? They mean that the inferred prediction depends on which machine learning model you end up choosing.

I remember my experience of reading articles in different disciplines and getting bombarded by the word “model,” such as “population model,” “business model,” “financial model,” “machine learning model…” Hardly any of them made sense to me. Therefore, if you turn out not to understand the rest of this section, try not to be frustrated.

To understand what a machine learning model is, let’s think about Lego first. Suppose you are instructing a group of kids to build whatever they want with Lego blocks. John, the first kid, is not interested at all. He just gives you a piece of Arch 1x6x2 and says, “Look, I built a bridge.” Lily, the second kid, builds a pyramid with only 3 types of bricks. Alice, the third kid, is a genius. She uses all types of bricks to build a fighter jet.

Arch 1x6x2, a pyramid, and a fighter jet

All 3 kids have created a Lego model (yes, including John). Yet they have made different choices about two aspects:

  1. What types of building blocks are included.
  2. How to assemble them.

Similar to a Lego model, the architecture of a machine learning model is determined by the types of building blocks and the ways of assembling them. (And yes again, a machine learning model has an architecture.) The building blocks of a machine learning model are mathematical functions, such as linear, polynomial, sigmoid, and sine functions. After selecting the building blocks, we need to consider how to connect the functions together. This process feels like building a circuit with math.
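To make the "blocks plus assembly" idea concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the helper names linear, sigmoid, and chain are invented for this sketch, and the parameter values 2.0 and -1.0 are arbitrary placeholders rather than anything learned from data.

```python
import math


# Building blocks: ordinary mathematical functions.
def linear(k, b):
    """Return the function x -> k * x + b."""
    return lambda x: k * x + b


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def chain(*blocks):
    """Assemble blocks in series: the output of one feeds the next."""
    def model(x):
        for block in blocks:
            x = block(x)
        return x
    return model


# A model made of one block (placeholder parameters, not trained values).
one_block = chain(linear(2.0, -1.0))
# A model made of two blocks: a linear block wired into a sigmoid block.
two_blocks = chain(linear(2.0, -1.0), sigmoid)

print(one_block(0.5))   # 2.0 * 0.5 - 1.0 = 0.0
print(two_blocks(0.5))  # sigmoid(0.0) = 0.5
```

Choosing which blocks to pass to chain, and in what order, is the "architecture" decision; choosing the numbers inside each block is a separate step.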

In the example of the alien scientist, he uses only a single linear-function block:

Celsius = k * Fahrenheit + b

And he calculates the values k = 0.5595 and b = -18.036 using linear regression. His machine learning model is illustrated in the following figure.

Machine learning model with a single building block of linear function

Just like building with Lego, the process of building a good machine learning model is an art, because the problem solver has to guess which model may work best for the problem. If the model’s predictions fail to match reality, the problem solver must modify the model. For example, suppose the alien scientist chooses a single quadratic-function block instead of a linear one:

Machine learning model with a single building block of quadratic function

Then he will get the following formula for inference:

Celsius = -0.017 * (Fahrenheit)^2 + 3.053 * (Fahrenheit) - 108.85

If you test this on the given temperature data between Sept. 16 and Sept. 19, the results seem right. But for temperatures outside that range, the prediction will be wrong. And we know that this formula for Celsius-Fahrenheit conversion could not be further from the truth. As we see, the model with a linear function as its building block is better than the model with a quadratic function.
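You can check this yourself with a few lines of Python. The sketch below simply evaluates the three formulas quoted in this article: the exact conversion, the linear fit, and the quadratic fit. The test temperatures are my own choice: 70 °F is assumed to lie near the observed September range (the data table is not reproduced here), while 32 °F and 212 °F are the freezing and boiling points of water, far outside it.

```python
def exact(f):
    """The textbook conversion."""
    return (5 / 9) * (f - 32)


def linear(f):
    """The alien scientist's linear fit, as quoted in the article."""
    return 0.5595 * f - 18.036


def quadratic(f):
    """The quadratic fit, as quoted in the article."""
    return -0.017 * f ** 2 + 3.053 * f - 108.85


# 70 F is assumed to be near the observed data; 32 F and 212 F are far away.
for f in (70, 32, 212):
    print(f"{f:>3} F: exact {exact(f):7.2f} | "
          f"linear {linear(f):7.2f} | quadratic {quadratic(f):8.2f}")
```

Near 70 °F all three formulas roughly agree. At the freezing and boiling points the linear fit stays within about a degree of the truth, while the quadratic fit lands far below zero in both cases.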

Let’s step things up a bit with a slightly more complex model.

A simple neural network

This is a simple neural network model from deep learning, a subarea of machine learning that uses more complex models and is more powerful at analyzing large datasets. In a programming assignment for a Deep Learning course on coursera.org taught by Andrew Ng, I tested this model on telling whether a picture contains a cat. The accuracy of this model is up to 80%. Not superb in practice, but good enough for a beginner.
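To give a feel for what happens inside such a network, here is a toy Python sketch of a forward pass. It is not the course assignment's model: the layer sizes, the weights, the 4-pixel "image," and the function names dense and predict_cat_score are all hypothetical, and the weights are untrained, so the answer it prints carries no meaning. The point is only that the whole network is linear blocks and sigmoid blocks wired together.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One layer: each unit is a linear block followed by a sigmoid block."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


# Hypothetical, untrained parameters: 4 inputs -> 3 hidden units -> 1 output.
hidden_weights = [[0.2, -0.4, 0.1, 0.5],
                  [-0.3, 0.8, -0.6, 0.1],
                  [0.7, 0.1, -0.2, -0.5]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[1.2, -0.7, 0.4]]
output_biases = [0.05]


def predict_cat_score(pixels):
    """Feed pixel values through the hidden layer, then the output unit."""
    hidden = dense(pixels, hidden_weights, hidden_biases)
    return dense(hidden, output_weights, output_biases)[0]


pixels = [0.9, 0.1, 0.4, 0.7]  # a made-up 4-pixel image, brightness in 0..1
score = predict_cat_score(pixels)
print(f"score = {score:.3f} ->", "cat" if score > 0.5 else "not a cat")
```

A real image classifier follows the same pattern with many more pixels, many more units, and weights found from data instead of typed in by hand.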

Two additional remarks:

  1. After choosing the architecture of a machine learning model, one needs to “train” the model using available data. When the alien scientist calculated k = 0.5595 and b = -18.036, he was training his model. This article will not elaborate on the training process.
  2. In reality, applications such as face recognition and cancer diagnosis require much more complex machine learning models, such as convolutional neural networks (CNNs) and their derivatives. The models introduced in this article are like the Arch 1x6x2 built by John and the pyramid built by Lily. But more advanced models like Alice’s fighter jet are required for more realistic applications.

Innovation or not?

By now the validity of the title of this article is self-evident. When a machine learning model is deconstructed, it is nothing but a bunch of mathematical functions. The properties of the functions used as building blocks have already been well studied by mathematicians over the past few centuries. So, the building blocks themselves are not innovations.

However, if you compare John’s Arch 1x6x2 and Alice’s fighter jet, you will certainly claim that Alice’s work is innovative while John’s work is boring. Just as Alice builds a Lego model with much more complexity, it requires a significant amount of creativity from researchers to build complex machine learning models to beat world champions, diagnose cancers, and solve some of the hardest problems in the world.

In addition to building models, machine learning is innovative in the sense that it has greatly transformed our paradigm of thinking. Data is valued by society as never before because the more data we have, the more accurately and soundly we can tell stories, predict the future, and make decisions. Just as older people are generally wiser because they have had a lifetime of experience, we are now capable of making valuable inferences by utilizing the intelligence of models that are able to learn from a vast store of data.
