Is AlphaZero really a scientific breakthrough in AI?
As you probably know, DeepMind has recently published a paper on AlphaZero, a system that learns by itself and is able to master games like chess or Shogi.
Before getting into details, let me introduce myself. I am a researcher in the broad field of Artificial Intelligence (AI), specializing in Natural Language Processing. I am also a chess International Master, currently the top player in South Korea, although practically inactive for the last few years due to my full-time research position. Given my background, I have tried to build a reasoned and constructive opinion on the subject. For obvious reasons I have focused on chess, although some arguments are general and may be extrapolated to Shogi or Go as well. This post represents solely my view, and I apologize in advance if I have misinterpreted any details on which I am not an expert.
Chess has arguably been the most widely studied game in the context of “human vs machine” and AI in general. One of the first breakthroughs in this area was the victory of IBM Deep Blue in 1997 over the world champion at the time, Garry Kasparov. Until then machines were considered inferior to humans in the game of chess, but from that point onwards the “battle” has been clearly won by machines.
On a related note, a couple of years ago DeepMind released AlphaGo, a Go engine which was able to beat some of the best human Go players. Note that the complexity of Go is significantly larger than that of chess. This has been one of the main reasons why, even with the more advanced computing power available nowadays, Go was still a game in which humans were stronger than machines. Therefore, this may be considered a breakthrough in itself. This initially impressive result was improved upon with AlphaGo Zero which, as claimed by the authors, learnt to master Go entirely by self-play. More recently came AlphaZero, a similar model that trains a neural network architecture with a generic reinforcement learning algorithm and which has beaten some of the best engines in Shogi and chess.
This feat has been extensively covered by mass media [5,6] and chess-specialized media [7,8], with bombastic notes about the importance of the breakthrough. However, there are reasonable doubts about the validity of the overarching claims that arise from a careful reading of AlphaZero’s paper. Some of these concerns may not be considered as important by themselves and may be explained by the authors. Nevertheless, all the concerns added together cast reasonable doubts about the current scientific validity of the main claims. In what follows I enumerate some general concerns:
- Availability/Reproducibility. None of the AlphaZero systems developed by DeepMind are accessible to the public: the code is not publicly available and there is not even a commercial version for users to test. This is an important impediment, as from the scientific point of view these approaches can be neither validated nor built upon by other experts. This lack of transparency also makes it almost impossible for their experiments to be reproduced.
- 4-hour training. The amount of training AlphaZero required has been one of the points most confusingly reported by the general media. According to the paper, after 4 hours of training on 5000 TPUs the level of AlphaZero was already superior to the open-source chess engine Stockfish (the fully-trained AlphaZero took a few more hours to train). This means that the equivalent training time on a single TPU would be roughly two years (4 hours × 5000 TPUs = 20,000 TPU-hours), a time which would be considerably longer on a normal CPU. So, even though the 4-hour figure may seem impressive (and it is indeed impressive), this is mainly due to the large amount of computing power available nowadays compared with some years ago, especially for a company like DeepMind investing heavily in it. For example, by 2012 all chess positions with seven pieces or fewer had been mathematically solved, using significantly less computing power. This improvement in computing power paves the way for the development of newer algorithms, and probably in a few years a game like chess could be almost solved by heavily relying on brute force.
- Experimental setting versus Stockfish. In order to prove the superiority of AlphaZero over previous chess engines, a 100-game match against Stockfish was played (AlphaZero beat Stockfish 64–36). The selection of Stockfish as the rival chess engine seems reasonable, as it is open-source and one of the strongest chess engines nowadays. Stockfish finished 3rd (behind Komodo and Houdini) in the most recent TCEC (Top Chess Engine Championship), which is considered the world championship of chess engines. However, the experimental setting does not seem fair. The version of Stockfish used was not the latest one but, more importantly, it was run in its released version on a PC, while AlphaZero was run using considerably higher processing power. By contrast, in the TCEC engines play against each other using the same processor. Additionally, the choice of time control seems odd. Each engine was given one minute per move. However, in the vast majority of human and engine competitions each player is given a fixed amount of time for the whole game, and each player decides how to distribute it among the moves. As Tord Romstad, one of the original developers of Stockfish, declared, this was another questionable decision to the detriment of Stockfish, as “lot of effort has been put into making Stockfish identify critical points in the game and decide when to spend some extra time on a move”. Tord Romstad also pointed to the fact that Stockfish “was playing with far more search threads than has ever received any significant amount of testing”. Generally, the large percentage of victories of AlphaZero against Stockfish has come as a huge surprise for some top chess players, as it challenges the common belief that chess engines had already achieved an almost unbeatable strength (e.g. Hikaru Nakamura, #9 chess player in the world, showed some scepticism about the low draw-rate in the AlphaZero-Stockfish match).
- 10 games against Stockfish. Along with the paper only 10 sample games were shared, all of them victories of AlphaZero. These games have been praised by the chess community in general, due to the seemingly deep understanding displayed by AlphaZero in them: Peter Heine Nielsen, chess Grandmaster and coach of the world champion Magnus Carlsen, and Maxime Vachier-Lagrave, #5 chess player in the world, are two examples of the many positive reactions to the performance of AlphaZero against Stockfish in these games. However, the decision to release only ten victories of AlphaZero raises other questions. It is customary in scientific papers to show examples in which the proposed system displays some weaknesses or does not behave as well, in order to give a more global understanding and to allow other researchers to build upon it. Another point which is not clear from the paper is whether the games started from a particular opening or from scratch. Given the variety of openings displayed in these ten games, it seems that some initial positions were predetermined.
- Self-play. Does AlphaZero learn completely from self-play? This seems to be true according to the details provided in the paper, but with two important nuances: the rules and the typical number of moves have to be taught to the system before it starts playing against itself. The first nuance, although it looks obvious, is not as trivial as it seems. A lot of work has to be dedicated to finding a suitable neural network architecture in which these rules are encoded, as also explained in the AlphaZero paper. The initial architecture based on convolutional neural networks used in AlphaGo was suitable for Go, but not for other games. For instance, unlike Go, chess and Shogi are asymmetric and some pieces behave differently depending on their position. In the newest AlphaZero, a more generic version of the AlphaGo algorithm was introduced, encompassing games like chess and Shogi. The second nuance (i.e. the typical number of moves was given to AlphaZero to “scale the exploration noise”) also requires some prior knowledge of the game. Games that exceeded a maximum number of steps were terminated with a draw outcome (this maximum number of steps is not provided), and it is not clear whether this heuristic was also used in the games against Stockfish or only during training.
- Generalization. The use of a general-purpose reinforcement learning algorithm that can succeed in many domains is one of the main claims of AlphaZero. However, following the previous point on self-play, there has been a lot of debate regarding the capability of the AlphaGo and AlphaZero systems to generalize to other domains. It seems unrealistic to think that many real-life situations can be simplified to a fixed, predefined set of rules, as is the case for chess, Go or Shogi. Additionally, not only are these games provided with a fixed set of rules but, although with different degrees of complexity, they are also finite, i.e. the number of possible configurations is bounded. This differs from other games which are also given a fixed set of rules. For instance, in tennis the variables that have to be taken into account are difficult to enumerate and therefore to model: speed and direction of the wind, speed of the ball, angle between the ball and the surface, surface type, material of the racket, imperfections on the court, etc.
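As a rough check of the 4-hour training point in the list above, the conversion from parallel training time to single-device time can be sketched as follows. This is only back-of-the-envelope arithmetic using the two figures quoted in this post (4 hours, 5000 TPUs); it assumes perfect linear scaling across TPUs, which is a simplification, and the function and constant names are illustrative rather than taken from the paper.

```python
# Figures quoted in this post: 4 hours of training on 5000 TPUs.
TRAINING_HOURS = 4
NUM_TPUS = 5000

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.25  # assumption: average length of a calendar year


def single_device_years(hours: float, devices: int) -> float:
    """Years one device would need for the same work.

    Assumes perfect linear scaling, i.e. that N devices running for
    H hours do exactly the work of one device running for N * H hours.
    """
    device_hours = hours * devices
    return device_hours / (HOURS_PER_DAY * DAYS_PER_YEAR)


tpu_hours = TRAINING_HOURS * NUM_TPUS
years = single_device_years(TRAINING_HOURS, NUM_TPUS)
print(f"{tpu_hours} TPU-hours, about {years:.1f} years on a single TPU")
```

Under these assumptions the 4-hour figure corresponds to 20,000 TPU-hours, a little over two years of continuous computation on one TPU.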
We should scrutinize alleged breakthroughs carefully and scientifically, especially in the current period of AI hype. It is actually the responsibility of researchers in this area to accurately describe and advertise our achievements, and to try not to contribute to the growing (often self-interested) misinformation and mystification of the field. In fact, in early December at NIPS, arguably the most prestigious AI conference, some researchers raised important concerns about the lack of rigour of this scientific community in recent years.
In this case, given the relevance of the claims, I hope these concerns will be clarified and resolved so that the actual scientific contribution of this feat can be accurately judged, a judgement that is not possible to make right now. Probably with a better experimental design as well as an effort on reproducibility the conclusions would be a bit weaker than originally claimed. Or probably not, but it is hard to assess unless DeepMind puts some effort in this direction. I personally have a lot of hope in the potential of DeepMind to achieve relevant discoveries in AI, but I hope these achievements will be developed in a way that can be easily judged by peers and contribute to society.
— — — — — -
 Silver et al. “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.” arXiv preprint arXiv:1712.01815 (2017). https://arxiv.org/pdf/1712.01815.pdf
 Silver et al. “Mastering the game of go without human knowledge.” Nature 550.7676 (2017): 354–359. https://www.gwern.net/docs/rl/2017-silver.pdf
 Link to reproduce the 10 games of AlphaZero against Stockfish: https://chess24.com/en/watch/live-tournaments/alphazero-vs-stockfish/1/1/1
 Ali Rahimi compared current Machine Learning practices with “alchemy” in his talk at NIPS 2017 following the reception of his test of time award: https://www.youtube.com/watch?v=ORHFOnaEzPc