Learning to Play

Ofer Egozi
Published in Sears Israel
Sep 17, 2016 · 8 min read

Ever since I took my first course in Artificial Intelligence, I have been fascinated by the idea of AI in its classical meaning — teaching machines to perform tasks that we humans deem to require intelligence.

Last week I gave a talk at my company on some of the intriguing instances of one of these tasks — learning to play (and win!) games. I often found the human stories behind the scenes even more fascinating than the algorithms themselves, and those stories were the focus of the talk. It was really fun both to assemble and to deliver, so I wanted to capture these stories in this blog post, to accompany the presentation slides.

So let’s get started!

A HUMBLE START

Game playing is a fantastic AI task, one that researchers were always excited about. Just like a toddler being taught to swing a baseball bat by an excited parent, the algorithm gets clear rules, a measurable goal, and training input. But above all, testing the result involves the fun act of playing against the opponent that you have created yourself, just like a proud parent. What a great way to do AI research!

As we go way back in the AI time machine, the first known implementation of a game-playing AI dates as far back as 1950. Josef Kates was a young Jewish Austrian engineer whose family fled the Nazis' rise to power and ended up in Canada. Kates worked on radar and vacuum tube design at a company named Rogers Majestic, and later developed his own patented tube, which he called the “Additron”. While waiting for the patent to be registered, he wanted to demonstrate the power of his invention at a local technology fair, so he built a machine that could play Tic-Tac-Toe, which he named “Bertie the Brain”.

Comedian Danny Kaye, pleased after “beating” Bertie the Brain at the fair

The machine was a huge success at the fair. Kates made sure to adjust its level of difficulty so that players could occasionally beat it, and visitors lined up to play. Nevertheless, at the end of the fair it was dismantled and forgotten. Unfortunately for Kates, the Additron took a very long time to go through patenting, and by the time it was approved, technology had already moved on to transistors.

The algorithms pioneered in those early days were based on the Minimax method — constructing a tree of the possible moves by the player and its opponent, and evaluating how close each resulting position is to a win. At each move, the algorithm assumes best play on both sides: the computer picks the move with the MAXimal value for itself, while the opponent is assumed to pick the move that is best for it, which is the computer's MINimal value. In this way the algorithm could look as far into the future as time permitted.
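To make the idea concrete, here is a minimal Python sketch of minimax for Tic-Tac-Toe. It is an illustration only; the board representation and function names are my own, not anything from the historical programs.

```python
# Minimal minimax sketch for Tic-Tac-Toe (illustrative only).
# The board is a list of 9 cells holding 'X', 'O', or None; 'X' is the maximizing player.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (best score, best move) for the side about to move."""
    win = winner(board)
    if win == 'X':
        return 1, None
    if win == 'O':
        return -1, None
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0, None                          # board full: a draw
    best_score, best_move = (-2, None) if player == 'X' else (2, None)
    for move in moves:
        board[move] = player
        score, _ = minimax(board, 'O' if player == 'X' else 'X')
        board[move] = None
        if (player == 'X' and score > best_score) or (player == 'O' and score < best_score):
            best_score, best_move = score, move
    return best_score, best_move

# From an empty board, perfect play by both sides always ends in a draw (score 0).
print(minimax([None] * 9, 'X'))                 # -> (0, 0)
```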

With only 765 unique board positions in Tic-Tac-Toe, the game was small enough that all positions and moves could be calculated in advance, making Bertie unbeatable. AI researchers call this a “solved” game. In fact, perfect play always ends in a draw, and if you watched the movie “WarGames” with Matthew Broderick, you'll recall how this fact saved the world from nuclear annihilation…

ADVANCE TO WORLD-CLASS WINS

So if Tic-Tac-Toe is too simple, how about a more complex game, such as Checkers?

Checkers has, well, slightly more board positions; at roughly 5 × 10^20 of them, it was a much more challenging AI task. The most well-known checkers program, even if not the first, was the one written by Arthur Samuel at IBM. Samuel's checkers program became a true classic, and for several decades it was considered the best that could be achieved. It still used Minimax, but it expanded its repository of board positions from actual games played, often against itself, thus becoming a true learning algorithm. However, it never reached the level of beating master human players.

In 1989, a group of researchers led by Jonathan Schaeffer from the University of Alberta set out to use advances in computing to break that glass ceiling, with a new program called Chinook. I had the privilege of attending a fascinating talk by Schaeffer at the Technion 10 years ago, and the blog post I wrote as a result summarizes the full story. That story has fascinating twists and touching human tributes in it, but it ends with machines being the clear winners, and AI researchers declaring the game of checkers solved as well.

The obvious next challenge in our journey would be what's considered the ultimate game of intelligence — Chess. Played on the same board as checkers but with far more complex moves, chess has a game tree of roughly 10^120 possible games; the number of legal board positions alone is on the order of the number of checkers positions, squared. A well-known chess-playing machine was “The Turk”, designed and constructed in Austria by Wolfgang von Kempelen as early as 1770. The Turk was a wonder in its time, beating experienced chess players and, on one occasion, Napoleon Bonaparte. It was a hoax, of course, cleverly hiding a human operator inside, but the huge interest it created was a symbol of the great intelligence attributed to playing the game.

However, the huge search space over which Minimax had to be applied for chess made early programs extremely weak against humans. Even with the introduction of tree-pruning methods such as Alpha-Beta pruning, it seemed that no algorithmic tuning would bring a breakthrough. As the decades passed, more powerful computers enabled faster computation and larger memory to hold billions of possible board positions. This culminated in the famous duel in 1996 between IBM's Deep Blue chess computer and the world champion at the time, Garry Kasparov. Despite losing the first game to Deep Blue, Kasparov won the match comfortably, 4–2. IBM went on to further improve Deep Blue (the 1997 version could evaluate some 200 million positions per second) and invited Kasparov to a re-match in 1997. Kasparov won the first game easily, and became so self-confident as a result that he lost the next one, a loss he blamed on cheating by IBM. The match ended 3.5–2.5 for Deep Blue, a sensational first win for a machine over a reigning world champion.
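For the curious, here is what alpha-beta pruning adds on top of plain minimax, as a small Python sketch. The `state` interface (is_terminal, evaluate, legal_moves, apply, undo) is a hypothetical stand-in for a real engine's board representation and heuristic, not Deep Blue's actual code.

```python
# Minimal alpha-beta sketch (illustrative; not Deep Blue's actual search).
# `state` is a hypothetical game-state object with is_terminal/evaluate/legal_moves/apply/undo.

def alphabeta(state, depth, alpha, beta, maximizing):
    if depth == 0 or state.is_terminal():
        return state.evaluate()                  # heuristic score from the maximizer's point of view
    if maximizing:
        value = float("-inf")
        for move in state.legal_moves():
            state.apply(move)
            value = max(value, alphabeta(state, depth - 1, alpha, beta, False))
            state.undo(move)
            alpha = max(alpha, value)
            if alpha >= beta:                    # the opponent would never allow this line: prune
                break
        return value
    else:
        value = float("inf")
        for move in state.legal_moves():
            state.apply(move)
            value = min(value, alphabeta(state, depth - 1, alpha, beta, True))
            state.undo(move)
            beta = min(beta, value)
            if beta <= alpha:                    # we already have a better option elsewhere: prune
                break
        return value
```

The pruning never changes the result of the search; it only skips branches that provably cannot affect the final choice, which is what let chess programs search deeper within the same time budget.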

FROM BRUTE FORCE TO TRUE LEARNING

The shared practice connecting all the work we have seen so far, from Bertie the Brain to Deep Blue, was to feed very large amounts of knowledge to the software, so that it could outdo the human player through sheer computing power and the board positions stored in its huge memory. This enabled algorithms such as Minimax to process enormous numbers of positions, apply human-defined heuristics to them, and find the winning moves.

Let’s recall the toddler from the start of our journey. Is this how humans learn? Would we truly consider this artificial intelligence?

If we want to emulate true intelligence, what we’d really like to build are algorithms that learn by themselves. They will watch examples and learn from them; they will build their own heuristics; they will infer the domain knowledge rather than have it stuffed into them.

In 2014, a small London start-up named DeepMind Technologies, founded less than three years earlier, was acquired by Google for a staggering $600M, before it had released even one product to the market. In fact, reporters struggled to explain what DeepMind did at all.

The hint to what attracted Google to DeepMind lies in a paper its team published in December 2013. The paper, presented at NIPS 2013, was titled “Playing Atari with Deep Reinforcement Learning”. It was about playing games, but unlike anything before: this was a generic system, learning to play games without being given any knowledge about them, nothing but the screen and the part of it that keeps the score. You could equate it to a human who never played Pac-Man taking the controls, hitting them in all directions, watching the score, and gradually figuring out how to play like a pro; and then doing the same for many other games, all with the same method. Sounds human? This was the technology Google was after.

Watching DeepMind play Atari Breakout (see this video) is like magic. The algorithm starts out moving at random, barely hitting the ball once in many attempts. After an hour of training, it is playing at an impressive pro level. Then it even learns the classic trick that any Breakout player eventually masters — tunneling the ball to the top so that it knocks bricks off with little effort. The beauty of it was that the exact same system mastered several other games with no custom optimizations — only the raw screen input and an indication of where the score is, nothing else. There was no Minimax running, no feeding of grandmaster move books or human-crafted heuristic functions. It was a generic deep neural network, trained with reinforcement learning to look at series of moves and their score outcomes and uncover the winning patterns all by itself. Pure magic.
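To give a rough sense of the reinforcement-learning core of that approach, here is a heavily simplified, tabular Q-learning sketch in Python. DeepMind's DQN replaced the table with a deep convolutional network reading raw screen pixels (plus tricks such as experience replay); the `env` interface below is my own assumption for illustration, loosely in the style of later RL toolkits, not DeepMind's code.

```python
# Heavily simplified tabular Q-learning sketch (illustrative only).
# DeepMind's DQN replaces the table with a deep network over raw screen pixels,
# but the core idea below (learning action values from observed rewards) is the same.
import random
from collections import defaultdict

def q_learning(env, episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(float)                        # Q maps (state, action) -> expected future score
    for _ in range(episodes):
        state = env.reset()                       # assumed interface: reset() / step() / actions
        done = False
        while not done:
            if random.random() < epsilon:         # explore: try a random action
                action = random.choice(env.actions)
            else:                                 # exploit: pick the best action seen so far
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Nudge the value estimate toward reward + discounted best future value.
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```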

AI BUILDING GAMES

For the last part of the talk, I deviated to a related, though different, topic. Here I walked through a wonderful series of blog posts I had stumbled upon, “Machine Learning is Fun!”, in which Adam Geitgey walks through basic concepts in Machine Learning. In Part 2, he describes how Recurrent Neural Networks can be trained to learn and generate patterns. The simplest example we all know and appreciate (or sometimes maybe not…) is the predictive typing feature of mobile keyboards, where the system attempts to predict which word we are trying to type — the great cause of so many texting gaffes.

Moving to a more elaborate example, Geitgey fed an RNN implementation with a Hemingway book (“The Sun Also Rises”), trained it on the book's text, and then had it generate text of its own in the book's style. The output starts as incomprehensible strings of characters, but gradually takes the form of words and sentences, to the point that the sentences almost make sense and carry Hemingway's typically curt dialogue style.
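As a rough sketch of how such a character-level experiment can be set up, here is a minimal Keras-style version. The file path, hyperparameters, and network size are my own placeholder choices, not Geitgey's actual code.

```python
# Minimal character-level text-generation sketch (illustrative; hyperparameters are arbitrary).
import numpy as np
import tensorflow as tf

text = open("the_sun_also_rises.txt").read()     # placeholder path to the training text
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}

seq_len = 40
# Slice the book into overlapping (input sequence, next character) training pairs.
X = np.array([[char_to_idx[c] for c in text[i:i + seq_len]]
              for i in range(0, len(text) - seq_len, 3)])
y = np.array([char_to_idx[text[i + seq_len]]
              for i in range(0, len(text) - seq_len, 3)])

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(len(chars), 32),
    tf.keras.layers.LSTM(128),                   # the recurrent layer that learns the patterns
    tf.keras.layers.Dense(len(chars), activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, batch_size=128, epochs=20)

# Generate new text one character at a time, feeding each prediction back in.
generated = text[:seq_len]
for _ in range(400):
    x = np.array([[char_to_idx[c] for c in generated[-seq_len:]]])
    probs = model.predict(x, verbose=0)[0].astype("float64")
    probs /= probs.sum()
    generated += chars[np.random.choice(len(chars), p=probs)]
print(generated)
```

Early in training the sampled characters are gibberish; as the loss drops, word-like and then sentence-like structure emerges, which is exactly the progression Geitgey describes.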

Geitgey then takes this system and applies it to none other than Super Mario Maker, a version of Super Mario that lets players build levels of their own. He transforms game levels into text streams and feeds these into the learning system. Here too, the system at first spits out nonsense; it gradually learns the basic rules and eventually generates actual playable levels. I'm no expert on Super Mario so I couldn't tell, but I showed the result to my son, and he said it was a great level that he would be happy to play. That's intelligent enough for me!

(originally posted on: https://alteregozi.com/2016/09/17/learning-to-play/)
