Machine Learning, the A.I. revolution, explained

(For a summary, just read the titles and quotes).

Extending human intelligence

When Steve Jobs was a kid, he read a study that measured the efficiency of locomotion for various species. That is, the researchers wanted to find out which animal uses the least energy to travel one kilometer. The condor was the winner of the analysis, whereas humans’ performance was rather unimpressive, appearing about a third of the way down the list. But then someone had the idea of testing the efficiency of locomotion of a man on a bicycle. And it turned out that the man on a bicycle blew away the condor, with an efficiency completely off the top of the charts. Jobs realized that what makes humans different from other animals is that we are able to build tools that make us better.

Years later, when he started working with computers, Jobs delivered one of his most famous lines: “Computers are like a bicycle for the mind”. Today, forty years after Apple was founded, computers are indeed dramatically extending the capacity of our brains.

Let me very naively classify the functions of the brain into two groups: memory (i.e., storing information) and intelligence (i.e., doing / computing stuff) [1]. Regarding memory, computers greatly extend our brains by allowing us to store an essentially unlimited amount of data. The remaining challenge is to develop faster methods of storing information in, and retrieving it from, this external memory. Computer science has also achieved a great extension of our intelligence. For example, we have mathematical software that allows us to rapidly perform complex calculations.

However, traditional computer science [2] faces a major limitation when it comes to extending human intelligence: we first need to explain to the computer how to perform the task we want to accomplish. For example, to create mathematical software, we first must write a program that explains to the computer how to do each mathematical operation. Once the program is finished and installed on a computer, the computer can perform those operations much faster than we can. Therefore, computers can only accomplish tasks that we already know how to perform and can describe step by step.

Programming people and computers

You can think about programming a computer the same way you would think about teaching a task to a human. For example, you could tell a friend something like, “If the oven’s timer goes off, then switch off the oven by pressing this button”. By doing that, you have “programmed” this person to do something you needed. Programming a computer to do this task would be pretty similar to programming a human. The code could be something like:

if oven.timer_alarm == "on":
    button.status = "pressed"

It is just a matter of language: you program people in English, whereas you program computers in some programming language.
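To make the analogy concrete, the snippet above can be turned into a runnable sketch. The Oven and Button classes below are hypothetical, invented purely for illustration:

```python
# Minimal runnable version of the oven example.
# The Oven and Button classes are hypothetical, invented for illustration.

class Button:
    def __init__(self):
        self.status = "released"

    def press(self):
        self.status = "pressed"

class Oven:
    def __init__(self):
        self.timer_alarm = "off"
        self.power = "on"
        self.off_button = Button()

    def switch_off(self):
        self.off_button.press()
        self.power = "off"

oven = Oven()
oven.timer_alarm = "on"  # the timer goes off

# The "program" we gave our friend, expressed in Python:
if oven.timer_alarm == "on":
    oven.switch_off()

print(oven.power)  # -> off
```

The instruction we gave the person and the `if` statement we gave the computer encode exactly the same rule; only the language differs.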

However, what if the task we want to program is a more complex one? Or what if we need to program a task but we don’t even know how such a task is done?

Let’s think about how we teach a kid to identify different kinds of animals. Unlike the example of the oven, you cannot start describing the characteristics of every animal like: “If the animal is within this range of colors and has black vertical stripes with a slightly elliptical shape and has a nose like… then it is a tiger”. Can you imagine using that teaching strategy with kids? It would be impossible and would take forever. In most cases you wouldn’t even be sure which features of each animal you are using to identify it. Instead, what we do is show children pictures of animals together with some specific tips, and this way they unconsciously learn which features identify each animal.

Well, it turns out that with traditional programming, the only way to teach a computer to identify animals is to manually describe each animal in code so that the computer can tell one animal from another. For the same reasons that you cannot teach children to identify animals by exhaustively describing them, this endeavor is obviously doomed to failure.

The need to make a program that explains to computers how to perform each task is the great limitation faced by traditional computer science programming. It has prevented computers from further extending our intelligence to solve more complex tasks. To truly extend our intelligence, we need computers to accomplish tasks that we don’t even know how to do.

The Artificial Intelligence revolution

Here is where Machine Learning comes to the rescue. Machine Learning is the field that studies how to make computers learn. In other words, a Machine Learning algorithm is a computer program that teaches computers how to program themselves so that we don’t have to explicitly describe how to perform the task we want to achieve. The information that a Machine Learning algorithm needs in order to write its own program to solve a particular task is a set of known examples.

For example, for the task of teaching a computer to identify animals, we will show the computer a bunch of labeled pictures (e.g., this picture is a tiger, this picture is a cat, etc.), the same way we do when we teach children. The Machine Learning algorithm will use these samples to identify the features that differentiate one animal from another, and with this information it will write its own program to perform the task of identifying animals [3]. Now you can see how enabling computers to learn and enabling computers to write their own code are the same thing (if you want to read a brief explanation of how Machine Learning algorithms work, take a look at [4]).
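To make this concrete, here is a toy sketch of learning from labeled examples, using a 1-nearest-neighbor rule over two invented numeric features (the features and labels are made up for illustration; real systems learn their features from raw pixels rather than being handed them):

```python
# Toy supervised learning: 1-nearest-neighbor classification.
# Each "picture" is reduced to two invented numeric features
# (say, stripe contrast and body size); real systems learn such
# features from raw pixels instead of being given them directly.

labeled_examples = [
    ((0.9, 0.8), "tiger"),
    ((0.8, 0.9), "tiger"),
    ((0.1, 0.2), "cat"),
    ((0.2, 0.1), "cat"),
]

def distance(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(features):
    # Label a new example with the label of its closest known example.
    _, label = min(labeled_examples, key=lambda ex: distance(ex[0], features))
    return label

print(predict((0.85, 0.75)))  # -> tiger
print(predict((0.15, 0.15)))  # -> cat
```

No rule for "tiger" or "cat" was ever written down; the behavior comes entirely from the labeled examples, which is the essence of learning by example.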

Therefore, Machine Learning is the way to make computers learn how to perform complex tasks whose processes cannot be easily described by humans, or even tasks that we don’t know how to accomplish (e.g. “I want to calculate how many customers would buy this product” or “I want to make this photo look like a Picasso painting”).

Turning photos into paintings.

We usually call “prediction” the act of computing the most likely outcome of a process too complex for humans to compute themselves, and that is why we usually say that Machine Learning models are used to make predictions.

Many useful Machine Learning algorithms were already known twenty years ago, but only recently have we had enough computing power, combined with large amounts of data, to make them work well. Computers are not yet very good at learning very human-specific tasks, such as writing and reading texts or identifying objects. However, thanks to their high computing power, computers are much better than we are at identifying patterns in large amounts of data.

Imagine the following series:

1: [0, 0]
2: [2, 3]
3: [4, 6]
4: [6, 9]
...
100: [198, 297]

For us it is pretty easy to see the pattern in that series and to predict the next row. However, imagine a series in which each row is composed of thousands of numbers whose values are calculated by combining multiple values from the previous rows. It would be practically impossible for us to find the patterns and predict the next row. Of course, it would also be impossible for us to program someone or something to do it, simply because we don’t know how it is done! However, a computer running a Machine Learning algorithm could learn to do it within minutes.
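The pattern-finding step for the simple series above can be sketched in code. This is a deliberately naive "model" that assumes each column grows by a constant step, estimates that step from the known rows, and extrapolates; Machine Learning models do the same kind of thing, but with far more flexible assumptions:

```python
# Learn the pattern of the series from examples alone, then extrapolate.
# The "model" assumes each column grows by a constant per-row step.

rows = {1: [0, 0], 2: [2, 3], 3: [4, 6], 4: [6, 9]}

# Estimate the per-column step from two consecutive known rows.
steps = [rows[2][c] - rows[1][c] for c in range(len(rows[1]))]

def predict_row(n):
    # Row n = first row + (n - 1) steps.
    return [rows[1][c] + (n - 1) * steps[c] for c in range(len(steps))]

print(predict_row(5))    # -> [8, 12]
print(predict_row(100))  # -> [198, 297]
```

Nobody told the program that the columns follow 2(n-1) and 3(n-1); it recovered the rule from the examples, which is exactly what a Machine Learning model does at a much larger scale.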

This is the reason why Machine Learning is already extremely useful in helping humans perform complex tasks, such as predicting diseases, forecasting stock market movements, or driving cars, along with countless other applications.

In fact, anything that can be recorded is something that can be predicted. Therefore, it is now your turn to think about how many of the databases you deal with are not yet being used to make predictions, and how such a window into the future could provide your business with an enormous competitive advantage.

[1] This is, of course, a simplistic classification. Defining the functions of the brain and especially the concept of intelligence is a very interesting topic that I will try to tackle in future posts.

[2] Of course Machine Learning has been a part of computer science for decades. By “traditional computer science” I mean traditional computer programming, which has been and is still the most common way of making a computer perform a task for us.

[3] This technique of learning by example is the most common one in Machine Learning. It is called supervised learning. Other popular learning techniques are unsupervised learning and reinforcement learning. In unsupervised learning we train computers using unlabeled data (e.g. learn to group similar patients) whereas in reinforcement learning computers learn by trial and error (e.g. learn to play a game). We will talk about those in future posts.
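As a toy illustration of the unsupervised case, here is a minimal sketch that groups unlabeled values into two clusters in the spirit of k-means (the patient ages are invented for illustration):

```python
# Toy unsupervised learning: group unlabeled values into two clusters
# (a minimal 1-D k-means; the patient "ages" are invented for illustration).
ages = [22, 25, 27, 61, 64, 68]

# Start with two guessed cluster centres.
centres = [min(ages), max(ages)]

for _ in range(10):  # repeat the assign/update steps a few times
    groups = [[], []]
    for a in ages:
        # Assign each value to its nearest centre.
        nearest = min(range(2), key=lambda i: abs(a - centres[i]))
        groups[nearest].append(a)
    # Move each centre to the mean of its group.
    centres = [sum(g) / len(g) for g in groups]

print(groups)  # -> [[22, 25, 27], [61, 64, 68]]
```

No labels were provided, yet the program discovers that the values naturally split into a "young" group and an "old" group.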

[4] As you already know, instead of making programs that explain to computers how to perform specific tasks, in Machine Learning we make programs that explain to computers how to learn to perform tasks by themselves. There are many Machine Learning algorithms designed to achieve this purpose, and probably the most popular one these days is an algorithm called Neural Networks. Neural Networks are a rough simulation of the brain. They contain thousands of simulated neurons that form millions of synaptic connections among them. Each synaptic connection is associated with a number that represents how strong that connection is.

The computer will introduce the input data (e.g., images of animals) into the input of the network and fire all the corresponding neurons, producing a prediction at the output of the network. Then it will compare the predicted result with the true data (e.g., “I predicted a lion but I see that the label of this image says it is a tiger”) and follow a set of rules given by the Machine Learning scientists to modify the weight of each synaptic connection so that the error of the prediction is reduced (e.g., “If I make these synaptic connections stronger, and these other synaptic connections weaker, then the next time I see this image I will correctly predict that it represents a tiger”). The computer will repeat this process, going through the whole set of training data multiple times, until the error cannot be reduced any further. Note that we do not have to provide the computer with an update rule for each individual synaptic connection; instead, we provide it with some general rules that it applies millions of times (for those of you who still remember some high school calculus, what we do is compute the derivative of the error with respect to each synaptic connection and then move the value of each connection in the direction that reduces the prediction error).
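The update rule described in this footnote can be sketched for a single "synaptic connection": a one-weight model trained by repeatedly nudging the weight against the derivative of the squared error (a toy example with made-up numbers, not a real neural network):

```python
# The footnote's update rule, sketched for one "synaptic connection".
# A one-weight model y = w * x is trained to fit the target rule y = 3x
# by repeatedly nudging w against the derivative of the squared error.
# (Toy example; a real network applies this to millions of weights.)

examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # (input, true output)

w = 0.0              # initial connection strength
learning_rate = 0.01

for _ in range(200):                   # several passes over the training data
    for x, y_true in examples:
        y_pred = w * x                 # fire the network: make a prediction
        error = y_pred - y_true        # compare with the true label
        gradient = 2 * error * x       # derivative of error^2 with respect to w
        w -= learning_rate * gradient  # move w in the direction that reduces error

print(round(w, 3))  # -> 3.0
```

The same loop, applied to millions of connections at once, is essentially how Neural Networks are trained.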