Deep Blue and AlphaZero: Comparing Giants of Artificial Intelligence

Hein de Haan
Jun 17, 2019 · 6 min read

In 1997, Deep Blue, a Chess computer developed by IBM as the next stage of Carnegie Mellon University’s Deep Thought project, defeated then-reigning World Chess champion Garry Kasparov with a score of 3.5–2.5. With this feat, Deep Blue became the first computer ever to defeat a reigning World Chess champion under standard time controls.

Artificial Intelligence company DeepMind’s AlphaZero is the successor to AlphaGo Zero, which itself succeeded AlphaGo Master, the successor of AlphaGo. In March 2016, AlphaGo beat Lee Sedol in a five-game match of the board game Go, becoming the first computer Go program to defeat a human 9-dan professional without handicap.

A year later, in May 2017, AlphaGo Master beat Ke Jie, then the number 1 player in the world. Currently, AlphaZero is considered the strongest Go player in existence (human or artificial): it beat AlphaGo Zero with a score of 60–40, and AlphaGo Zero had itself beaten the Lee Sedol-defeating AlphaGo with a score of 100–0.

Comparing the Giants

It may seem unfair to compare Deep Blue and AlphaZero because of the difference in time periods in which they were invented. However, the comparison I will make in this post is not based on performance, but on the technologies used to achieve that performance and their advantages and disadvantages. Performance-wise, AlphaZero wins outright, since it is both a better Chess player AND a better Go player than Deep Blue (yes, Deep Blue does not play Go at all). This is not a big surprise, since AlphaZero was developed about two decades after Deep Blue’s victory.

After a brief dive into what makes Deep Blue and AlphaZero intelligent, I’ll compare the two systems on both transparency and applicability.

The Intelligence of Deep Blue

Search Procedure

Deep Blue’s Chess intelligence is one of the great examples of GOFAI (Good Old-Fashioned Artificial Intelligence): it used a human-designed search procedure to find good Chess moves, and everything it did it was explicitly programmed to do. Deep Blue searched moves in a minimax way, meaning it based the value of a move on the best counter-move the opponent could make. This ‘best’ counter-move was found in the same way: by considering what counter-move the opponent’s opponent (i.e., Deep Blue) could make, which again was decided by which counter-moves the opponent could make, and so on. This search procedure must stop at some point: it does so when a terminal position (win/lose/draw) is reached, or when a certain search depth (e.g. 10 moves ahead) is reached. In the case of a terminal position, the value of the position is easy to determine: 1 point for a win, -1 for a loss and 0 for a draw (for example). When the maximum search depth is reached and the position is not terminal, a heuristic value has to be calculated, based on the Chess pieces still available to each player and their (relative) positions.
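To make the idea concrete, here is a minimal sketch of depth-limited minimax in Python. This is a hypothetical illustration, not IBM’s actual code (Deep Blue also used alpha-beta pruning and ran much of its search on dedicated Chess chips), and the Position interface used here (is_terminal, terminal_value, heuristic_value, legal_moves, play) is assumed purely for illustration:

```python
def minimax(position, depth, maximizing):
    # Value of a position, assuming both sides keep playing their best counter-moves.
    if position.is_terminal():
        return position.terminal_value()   # e.g. +1 for a win, -1 for a loss, 0 for a draw
    if depth == 0:
        return position.heuristic_value()  # material and positional evaluation
    child_values = [minimax(position.play(move), depth - 1, not maximizing)
                    for move in position.legal_moves()]
    return max(child_values) if maximizing else min(child_values)


def best_move(position, depth=10):
    # The move whose resulting position gets the highest minimax value,
    # assuming the opponent then replies with their own best counter-moves.
    return max(position.legal_moves(),
               key=lambda move: minimax(position.play(move), depth - 1, False))
```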

Apart from the minimax search procedure, Deep Blue also used a database of known Chess games.

Was Deep Blue Really Intelligent?

While I have no doubt whatsoever that Deep Blue displayed true intelligence, others (including one of Deep Blue’s developers) have disagreed. One of the complaints one often hears is that Deep Blue used brute force: it ‘just’ searched through a lot of positions, which is (supposedly) unlike human Chess intelligence. What’s important in deciding whether Deep Blue was truly intelligent is having a clear definition of intelligence. I like Legg and Hutter’s informal definition:

Intelligence measures an agent’s ability to achieve goals in a wide range of environments.

Although this definition applies to universal intelligence (in more than one environment), it works for one environment (in this case Chess) as well. If the goal was to win at Chess against a World champion (and it certainly was), Deep Blue was definitely intelligent: it had a great ability to achieve its goal.

The Intelligence of AlphaZero

While the extreme level of Go-intelligence of AlphaZero is truly amazing, what might be more interesting is the fact that it can do more than Go. You see, AlphaZero is, unlike Deep Blue, a learning program, and its design allows it to learn the board games Chess and Shogi as well. As you might have guessed, it plays both these games at the highest current level, too.

How does AlphaZero accomplish its amazing feats? Well, it uses a learning technique called Reinforcement Learning. Basically, in the case of Go, it starts off as a very bad Go player and begins to play games against itself. At each step, it chooses its next move based on the value of each available next game position. This value is given to it by a Deep Neural Network (Deep Learning). For the purposes of this post, the important thing about Neural Networks is that they can be trained: they change whatever value they give to a position based on feedback. So, at the start, this Neural Network is randomly initialized and is probably terrible at giving values to Go positions. As a result, AlphaZero plays a terrible game of Go against itself. It does, however, see which moves lead to a win or a loss at the end of the game! Based on this information, the Neural Network gets feedback: a winning position is good, and the position before the winning position was (probably) good too (since it led to the winning position), although of course not exactly as good as the winning position. The position before that position was probably also quite good, etc. The same can be done with losing positions, which are, of course, bad. By playing a huge number of games, AlphaZero trains itself to become better and better, eventually reaching alien levels of Go intelligence.
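The feedback loop described above can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical illustration, not DeepMind’s actual algorithm: the real AlphaZero uses a deep policy/value network together with Monte Carlo Tree Search, whereas here a plain dictionary stands in for the network and moves are chosen greedily by the estimated value of the next position:

```python
values = {}          # position -> estimated value (stand-in for the randomly initialized network)
LEARNING_RATE = 0.1
DISCOUNT = 0.95      # positions closer to the end of the game get the strongest credit or blame

def value(position):
    return values.get(position, 0.0)  # unseen positions start out neutral ('random' initialization)

def choose_move(position, legal_moves, play):
    # Pick the move leading to the position the current value estimate likes best.
    return max(legal_moves, key=lambda move: value(play(position, move)))

def learn_from_game(positions, result):
    # result: +1 for a win, -1 for a loss, from the learning player's perspective.
    target = result
    for position in reversed(positions):  # walk backwards from the final position
        values[position] = value(position) + LEARNING_RATE * (target - value(position))
        target *= DISCOUNT                # earlier positions get a slightly weaker signal
```

Repeating learn_from_game over a huge number of self-play games is what gradually turns the random value estimates into something resembling intuition.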

Transparency

In my opinion, the great advantage of the intelligence of Deep Blue is its great transparency. The way Deep Blue decided its next move was completely understood, since it was explicitly programmed into it. As you might have guessed, AlphaZero has a way lower transparency than Deep Blue: it taught itself to play Go, and its intuition (yes, I call it intuition) is not completely understood. In the case of Deep Blue, for each move it played, it could be worked out exactly why it played that move; in the case of AlphaZero, this is extremely hard if not impossible. It’s a bit like humans having trouble explaining exactly why they play a certain move: it’s based on a feeling, an intuition. This intuition has a mathematical basis (both in humans and in AlphaZero), but a very complex one. As a junior AI programmer myself, I could imagine a way to modify Deep Blue to be able to respond to questions about why it did a certain move, by printing something like “To avoid a checkmate”, or “To guarantee a win 2 moves later”. I can’t even imagine how to start doing this for AlphaZero.

Now, one might argue that it’s all just games, and the transparency of the intelligence doesn’t matter much. I would strongly disagree. The technology behind AlphaZero was not developed just for board games and has (almost) universal applicability. Since it is not domain-specific and can teach itself, it can be used to perform real-world tasks like detecting cancer in medical images or driving a car in real traffic. In these areas, where human lives may be in danger, transparency of the intelligence is of vital importance. If something goes wrong, it is important to know whether it was the artificial intelligence’s fault and, if so, how the same mistake can be avoided in the future.

Applicability

Where Deep Blue’s defeat of Garry Kasparov was mostly ‘just’ that, AlphaZero is about more than ‘just’ its amazing Go, Chess and Shogi performance. As said before, AlphaZero’s technology can be used to perform real-world tasks, whereas Deep Blue was designed very specifically for Chess and Chess alone. It is true that Deep Blue inspired computer scientists to make computers handle complex calculations, but the system itself could do nothing but play Chess.

Conclusion

It seems to be an unfortunate fact of Artificial Intelligence that Deep Learning (AlphaZero, for example) produces such extremely good results compared to GOFAI (Deep Blue, for example), even though GOFAI is much more understandable.

Don’t get me wrong: I am extremely excited about AlphaZero and Deep Learning in general! However, like many people, I am also a bit worried that these techniques will someday lead to unforeseen accidents.
