The learning machines

Andrea Lonza
6 min readSep 1, 2019

There are two recurrent themes when speaking about artificial intelligence: one predicts a world conquered by malicious robots, the other a utopian world where robots serve us and do all the work for us. Which one will come true?

Well, probably nobody knows, and neither do I. The intent behind this article isn’t to paint some fictional world of the future. The purpose is to illustrate the learning principles behind most current artificial intelligence systems and the learning mechanism that will dominate in the future.

The disruption of artificial intelligence has just started.

Learning from supervision

The most common and practical way to make a computer learn is to give it a huge amount of data with the corresponding correct responses. Think of it as a long sequence of pre-defined questions and answers. The computer’s job is to learn from this sequence so that, given a new question, it can predict the correct answer.

The best-known example, which I’m sure you have already heard of, is the classification of an image as a cat or a dog. How can a computer learn to discriminate between these two animals? By simply giving it thousands of images with the corresponding labels (i.e. cat or dog).

Image Reference: https://goo.gl/BPtB4f

Then, through algorithms known as supervised learning, the machine improves itself and learns the difference. Given a new picture it has never seen before, the machine will be able to recognize whether the subject is a cat or a dog.
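To make the idea concrete, here is a minimal sketch of the question-and-answer paradigm using a nearest-centroid classifier, one of the simplest supervised learning algorithms. Everything in it is hypothetical: instead of raw images, each animal is reduced to two made-up feature numbers, standing in for measurements a real system would extract.

```python
# A minimal supervised-learning sketch: nearest-centroid classification.
# The features below are invented for illustration (think "ear length"
# and "snout length" rather than actual pixels).

def train(examples):
    """The 'learning' step: average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in examples:
        sums.setdefault(label, [0.0] * len(features))
        counts[label] = counts.get(label, 0) + 1
        sums[label] = [s + f for s, f in zip(sums[label], features)]
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, features):
    """Answer a new 'question' by picking the closest class centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Labeled data: (features, correct answer) pairs provided by humans.
training_data = [
    ([3.0, 7.0], "cat"), ([3.5, 6.5], "cat"),
    ([6.0, 12.0], "dog"), ([6.5, 11.0], "dog"),
]
model = train(training_data)
print(predict(model, [3.2, 6.8]))  # a new, never-seen example
```

The point is not the particular algorithm but the shape of the process: humans supply every question together with its answer, and the machine only interpolates between them.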

With this paradigm, you can do much more than classify images. For instance, you can detect objects, recognize faces, and detect facial expressions. Furthermore, the same algorithms can also be used to work with other types of data like text and speech.

On these kinds of tasks, this learning technique performs very well. This may sound quite compelling; however, from a feasibility and learning perspective, there are two big limitations. Can you spot them?

To make the machine learn, humans provide all the data. That can become a nightmare when a human has to annotate (i.e. give the answer for) each example manually. We cannot manually label thousands or millions of examples for each new task a machine has to learn. Furthermore, in a certain way, we are already showing the machine how to do the job. What about the tasks that we don’t know how to solve ourselves?

Wouldn’t you prefer to instruct a robot on what it should do and let it figure out how to do it?

How humans learn

Humans learn mostly from experience. We learn a job, a sport, and everything else by trial and error. We repeat a task over and over in order to master it. We gather feedback from the environment and figure out what worked and what didn’t. We then use this information to update our beliefs and behavior so that we perform better the next time. Each time, there is also a tradeoff between the exploitation of actions that we know work best and the exploration of new actions that may lead to further improvements.
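That explore/exploit tradeoff can itself be sketched in a few lines. The classic toy setting is a two-armed bandit; everything here is hypothetical (the payout probabilities are invented), and epsilon-greedy is just one simple strategy for balancing the two sides of the tradeoff.

```python
import random

def epsilon_greedy(estimates, epsilon=0.1):
    """Explore a random action with probability epsilon,
    otherwise exploit the action currently believed best."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])

true_payout = [0.3, 0.7]        # hidden from the learner
estimates = [0.0, 0.0]          # the learner's current beliefs
counts = [0, 0]

random.seed(0)
for _ in range(2000):
    a = epsilon_greedy(estimates)
    reward = 1.0 if random.random() < true_payout[a] else 0.0
    counts[a] += 1
    # Incremental average: refine the belief using the new feedback.
    estimates[a] += (reward - estimates[a]) / counts[a]

print(estimates)   # beliefs move toward the true payout probabilities
```

Pure exploitation would lock onto whichever action looked good first; pure exploration would never cash in on what was learned. The small epsilon keeps a foot in both camps, much like we do.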

Do you see the difference between our learning process and that of machines? We learn from limited feedback, and we are able to experiment by ourselves with complete freedom. We are constrained only by ourselves and by the laws of physics. If we think that going left will accomplish our goal sooner, we go left. Machines, instead, have a closed-loop system constrained by a bunch of data with the corresponding correct answers assigned by humans. They can learn only from those.

How can machines learn to accomplish a goal with only minimal human supervision? How can machines figure out by themselves how to do a task? Going back to our cat/dog classification: can machines move freely around the world to learn the differences between a dog and a cat?

Machines that are free to interact

Yes, machines can learn as humans do. At the foundation of human-like learning, there is the freedom of movement in the environment. This can be either the real world or a simulator.

Interaction with the environment is a key learning component.

The paradigm is quite simple. You give the machine the freedom to experiment in the environment and at least two kinds of feedback: a positive one whenever it accomplishes a goal, and a negative (or neutral) one in all other cases. That’s everything it needs from the world. Then, using reinforcement learning algorithms, as time progresses and the machine accumulates experience from the environment, it figures out by itself the best action to take in each and every situation to accomplish the goal.

For example, to train a robotic arm to pick up dice, you just give it a positive reward whenever it manages to lift a die. In all other cases, the reward is absent. The reinforcement learning algorithm will then do its magic and, after a few hours or days, the robot will pick up all the dice you want.
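The loop described above can be sketched with tabular Q-learning, one classic reinforcement learning algorithm, on a deliberately tiny hypothetical environment: a five-cell corridor where only reaching the last cell yields a positive reward, mirroring the sparse feedback the robot arm receives.

```python
import random

N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = [-1, +1]                         # move left / move right
q = [[0.0, 0.0] for _ in range(N_STATES)]  # value estimate per (state, action)
alpha, gamma = 0.5, 0.9                    # learning rate, discount factor

random.seed(1)
for _ in range(300):                       # episodes of trial and error
    state = 0
    while state != GOAL:
        a = random.randrange(2)            # experiment by acting randomly
        nxt = min(max(state + ACTIONS[a], 0), GOAL)
        reward = 1.0 if nxt == GOAL else 0.0   # sparse feedback from the world
        # Q-learning update: blend the reward with estimated future value.
        q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
        state = nxt

# The learned greedy policy: the best action in every non-goal state.
policy = [max((0, 1), key=lambda i: q[s][i]) for s in range(GOAL)]
print(policy)
```

Nobody tells the agent to go right; the delayed reward propagates backward through the value estimates until the best action in every cell emerges on its own, which is exactly the point of the paradigm.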

By reducing human feedback, machines are free to learn their own strategies and achieve superhuman capabilities.

The training of a robot with reinforcement learning can be done either in the real world or in a simulation. For example, the same learning technique has been applied successfully to play video games like Dota 2 and StarCraft II. In both games, the AI systems were able to beat professional world-class players for the first time ever.

The machines learned the games by playing a huge number of matches against themselves and showed superhuman long-term planning, strategy, and perception capabilities.

Visualization of DeepMind’s AI system AlphaStar playing StarCraft II against MaNa, a professional player. Image Reference: https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii

The near future

Needless to say, there are many open problems around this learning technique. However, the method is already here and the results are astonishing. It’s only a matter of time and technological development before we see more and more machines using this approach.

Once some efficiency-related issues are solved, you’ll be able to teach a robot yourself to do things you never thought possible. And as you give feedback to the robot, you’ll see it improve by itself every day.

Do you feel inspired by this future?
