Machines That Play series has been broken into 7 parts. This is Part 3 of the series.
This series covers the history of Artificial Intelligence and games (until Deep Blue) and focuses on machines that played chess, checkers, and backgammon. The following topics are covered: how to build chess machines, Shannon’s work on chess, Turing’s work on chess, The Turk, El Ajedrecista, MANIAC, Bernstein chess program, Samuel’s checkers, Mac Hack VI, Cray Blitz, BKG, HiTech, Chinook, Deep Thought, TD-Gammon, and Deep Blue.
Part 1: Machines That Play (Overview)
Part 2: Machines That Play (Building Chess Machines) — this one
Part 4: Machines That Play (Deep Blue)
If you want a summary of the first 5 parts, focusing on the human elements, go here [link coming soon].
Part 6: Machines That Play (Checkers)
Part 7: Machines That Play (Backgammon)
It will cover a little bit of the history of computer chess, focusing on: Turk, El Ajedrecista, Shannon and Turing’s approaches to build chess programs, MANIAC, Bersnstein’s Chess program, Mac Hack VI, Cray Blitz, HiTech, ChipTest, and Deep Thought — most major attempts until Deep Blue.
Computer chess is a difficult problem. It was considered the “drosophila” of AI.
Why would anyone want to teach a machine how to follow some arbitrary man-made rules to move a bunch of wooden pieces on a checkerboard, with the sole aim of cornering one special wooden piece? It’s so human to attack and capture pawns, knights, bishops and the queen to finally corner the king into an inescapable position. Those are actions and our goals and we balance strategy and tactics to suit those. hen why teach machines to play chess?
Chess has long been regarded as the game of intellect. And many people argued that a machine that could successfully play chess would prove that thinking can be modeled/understood or that machines can be built to think. And that is exactly why people wanted to build a machine that could follow some arbitrary rules of our game and become so good at it that it could one day beat us it.
Creation of electronic computers began in the 1930s. By the late 1940s computers were used as research and military tools in US, England, Germany, and the former USSR. Early AI researchers wanted to use computers for other engineering challenges that could both attract public interest to AI and allow them to solve complex problems. Computer chess represented this exact engineering challenge.
The initial optimism led to a prediction in 1958 that machines would defeat all humans within 10 years (this won’t happen until late 1990s). In the 1960s, AI pioneers Herbert Simon and John McCarthy referred to chess as ‘the Drosophila of AI’, which meant that chess, like the common fruit fly, represented a relatively simple system that could also be used to explore larger, more complex real-world phenomena.
“Chess is mental torture.” Garry Kasparov
“Chess is the touchstone of intellect.” Johann Wolfgang von Goethe
“Hundreds of millions of people around the world play chess. It’s known as a game that requires strategy, foresight, logic — all sorts of qualities that make up human intelligence. So it makes sense to use chess as a measuring stick for the development of artificial intelligence.” Murray Campbell (link)
Computer chess was the perfect test-bed for AI research. Let’s start.
The Turk (1770). Game: Chess
Let’s start in 1769.
François Pelletier was a reputed French illusionist, famous for acts involving magnetism. He was invited to perform at the court of Maria Theresa of Austria at Schönbrunn Palace. Hungarian inventor Wolfgang von Kempelen, who was present in the court, saw the magic act and decided to create a much more compelling spectacle.
And he delivered. In the spring of 1770, Wolfgang von Kempelen created a sensation; he presented the world’s first ever chess-playing automaton, which he called the Automaton Chess-player, known in modern times as The Turk.
Automaton Chess Player
The Turk was a man-made machine that could play chess against any human opponent as well as perform the knight’s tour, a puzzle that requires the player to move a knight to occupy every square of a chessboard exactly once. The Turk was capable of completing the tour without any difficulty.
The Turk was mechanical figure of a bearded man dressed in Turkish clothing. He was seated above a cabinet with a chessboard on top. Von Kempelen would begin his demonstration of the Turk by opening the doors and drawers of the cabinet and and showing the inside, which was made of cogs, gears, and all things mechanical. Von Kempelen would then close the cabinet and invite a volunteer to play against the Turk.
The Turk became a hit and Von Kempelen was invited to tour across Europe. The Turk began its European tour in 1783 and it operated for nearly 84 years (when it was destroyed in a fire). During the tour, it interacted with a range of historical figures, including Benjamin Franklin, Catherine the Great, Napoleon Bonaparte, Charles Babbage, and Edgar Allan Poe.
Many people insisted it was a trick, but no one figured out the exact trick. Edgar Allan Poe claimed that a human mind was at work because if the machine was truly a machine, then it would always win — it would play chess perfectly because machines would not make any computational errors. And since the Turk did not play perfect chess it was not a machine.
Edgar Allan Poe claimed that a human mind was at work because a real machine would always win — it would play chess perfectly. It would never lose.
So how did the Turk do it?
It didn’t. As it turned out, the Turk was a hoax. Poe was right, but his reasons were wrong. We would later see machines are flawed and they do not play perfect chess.
The secret of this early automaton is “artificial artificial intelligence”: a human is doing the automaton’s job. Inside the machine hid a man who was small in size and who could play chess well. He hid in a small compartment that could slide left and right. When von Kempelen opened the left cabinet, the man would slide to the right. When he opened the right cabinet, the man would slide left.
In our modern era it took a supercomputer to beat the world chess champion, so, to us, it seems fairly obvious that the Turk must have been a hoax. But at that time no one knew what it would take for a machine to win. It makes sense that the Turk became a spectacular attraction. And it has since become one of the most famous automatons in history. It has thrilled us and it has terrified us — and, in some sense, the fraud of it all has left us in awe.
A more honest attempt
El Ajedrecista (The Chessplayer) (1914). Game: Chess.
In early 1910s, a more honest attempt to build a chess-playing machine was made in by Torres y Quevedo. Torres y Quevedo was a Spanish civil engineer and mathematician. He built an automaton called El Ajedrecista (The Chessplayer), which made its debut, at the University of Paris in 1914. It is considered to be one the world’s first computer games.
El Ajedrecista: A machine that plays chess (not the full game)
El Ajedrecista created great excitement when it was first publicly demonstrated in Paris in 1914. It was later demonstrated at the Congress on Cybernetics in France of 1951 (see video clip and related original blog).
The machine could not play a full game of chess, but it played an endgame with three pieces, a king and a rook controlled by the machine against a single king controlled by a human player. In this case, checkmate for the human is inevitable and the machine was able to beat its opponent every time.
The machine, however, did not deliver checkmate in the minimum number of moves (i.e. it did not perform optimization), nor necessarily within 50 moves (see fifty-move rule). Torres’ idea was quite sophisticated for its time. The machine could detect and signal an illegal move. It also signaled game states (both of “check” and “checkmate”) to the player.
About Torres: “He would substitute machinery for the human mind” — Scientific American (1915)
Torres’ attitude of the first chess automaton was summarized in the Scientific American account (1915):
“There is no claim that [the chess player] will think or accomplish things where thought is necessary, but its inventor claims that the limits within which thought is really necessary need to be better defined, and that an automaton can do many things that are popularly classed with thought. It will do certain things which depend upon certain conditions, and these according to arbitrary rules selected in advance.”
Perfect games (1928–1944)
John von Neumann founded the field of game theory. In 1928 he proved the minimax theorem. This theorem states that in zero-sum games* (i.e. if one player wins, then the other player loses) with perfect information (i.e. in which players know, at any time, all moves that have taken place so far), there is a pair of strategies for both players that allows each to minimize his maximum losses, hence the name minimax. This means one assumes one’s opponent will move in such a way as to maximize his gain, one then makes one’s own move in such a way as to minimize losses caused by the opponent.
Minimax is a recursive algorithm and it chooses an optimal move based on the assumption that both players play optimally. Simply put, it’s trying to model what we do when we play: we think “If I make this move, then my opponent can only make only these moves, then I can make this move,…”
John von Neumann was quoted as saying “As far as I can see, there could be no theory of games … without that theorem … I thought there was nothing worth publishing until the Minimax Theorem was proved”.
*Chess, poker, or any other two-player game with one winner and one loser, can be considered a zero-sum game.
Improving the minimax procedure
Von Neumann improved Zermelo’s minimax theorem to include games involving imperfect information and games with more than two players. In 1944, John Von Neumann and Oskar Morgenstern published Theory of Games and Economic Behavior. This is the groundbreaking book that created the research field of game theory. Originally, game theory addressed zero-sum games, but since then it has been applied to a wide range of behavioral relations, and has become a framework for modeling scenarios in which conflicts of interest exist among the participants.
Some side notes:
- The minimax algorithm is traced to a 1913 paper by Ernst Zermelo, the father of modern set theory. The paper contained several errors and did not describe minimax correctly. He did, however, propose (but did not prove) what became known as Zermelo’s Theorem: in any finite deterministic (i.e. in which chance does not affect the decision making process) perfect information two-player game (such as chess) there are three possibilities: either the first-player can force a win, or the second-player can force a win, or both players can force a draw. When applied to chess, Zermelo’s Theorem states “either White can force a win, or Black can force a win, or both sides can force at least a draw”. We don’t (yet) know which it is for chess. (Checkers, on the other hand, has been solved: either player can force a draw.)
- What does it mean to say a game is perfect? Each Player has complete knowledge of the game state. For two players games, they take alternate turns. Examples: Chess, Checkers, Connect-Four, Go, Othello.
- What does it mean to say a game is imperfect? Some of the game state is hidden. Examples: Poker, Bridge.
- What does it mean to say a game is not deterministic? Game involves an element of chance. Example: Backgammon (involves dice rolls).
How future machines would play chess (1950)
Starting in mid 1940s, scientists from different fields (mathematics, psychology, engineering, etc.) had started discussing the possibility of creating a machine that could think, a machine that could compete with or even surpass humans in pattern recognition, computation, problem solving and even language. And a games, specifically chess, was a good starting point.
Chess is rich and complex with massive search space (more on this later). Chess mastery was considered to be an indicator of more general human intelligence. So it turned to be a useful testing ground.
In 1950 there were no programs that played chess. It was then Claude Shannon wrote the very first article ever published on programming a computer to play chess. He published a paper in Philosophical Magazine entitled Programming a computer to play chess. He starts saying,“This paper is concerned with the problem of constructing a computing routine or “program” for a modern general purpose computer which will enable it to play chess.” So why chess? Shannon believed that chess was the ideal experimental project because “The chess machine is an ideal one to start with, since: (1) the problem is sharply defined both in allowed operations (the moves) and in the ultimate goal (checkmate); (2) it is neither so simple as to be trivial nor too difficult for satisfactory solution; (3) chess is generally considered to require “thinking” for skillful play; a solution of this problem will force us either to admit the possibility of a mechanized thinking or to further restrict our concept of “thinking”; (4) the discrete structure of chess fits well into the digital nature of modern computers.”
Shannon identified chess as an ideal problem for machines because it could be represented in a machine, it had a very clear goal (deliver checkmate), it involved no randomness, both players had access to all the information, it had relatively simple rules, yet mind-bending complexity — these rules could produce a massively large number of states (~10¹²⁰ possible chess games). This directly influenced and inspired all the chess programs of the following decades.
[Note: The principles of the modern computer were proposed by Alan Turing in his seminal 1936 paper, On Computable Numbers. The world’s first stored-program computer, Manchester Baby, was built in 1948. As soon as it demonstrated the feasibility of its design, Ferranti Mark 1, the world’s first commercially available general-purpose computer was built in 1951.]
In his paper, Shannon discussed many of the basic problems involved with programming a computer to play chess. He specified a framework consisting of three parts: a symbolic board description, a successor function that defines the set of legal moves from any position, and a position evaluation that gives an outcome for the game. This framework is found in nearly all adversarial computer games today (see adversarial search).
Shannon showed that, in theory, it would be possible to play a perfect game of chess, or to construct a machine to do so. For example the computer could try every possible move for white, followed by all of black’s response moves, and then all of white’s responses to those moves, and so on.
Chess is a finite game, it must have a result (win, draw, or loss), and it has a finite number of positions (and each position allows only a finite number of moves). Therefore, all possible combinations of moves and responses could be drawn out as a decision tree up front. And by working backward, the optimal move can be determined. So far so good.
Oh no — insanely immense search trees
The problem is, however, that even though it is possible to play a perfect game in theory, it is not possible in practice.
He said, “In typical chess positions there will be of the order of 30 legal moves. The number holds fairly constant until the game is nearly finished…a move for White and then one for Black gives about 103 possibilities. A typical game lasts about 40 moves to resignation of one party. This is conservative for our calculation since the machine would calculate out to checkmate, not resignation. However, even at this figure there will be 10¹²⁰ variations (known as Shannon’s Number) to be calculated from the initial position. A machine operating at the rate of one variation per micro-second would require over 10⁹⁰ years to calculate the first move!”
Perspective: What does this mean? That there are an estimated 10¹²⁰ different chess games. In other words, there are approximately 10¹²⁰ different paths through the tree. This is a 1 followed by 120 zeroes, a very very very big number. How big? It is estimated that there are between 10⁷⁸ to 10⁸² atoms in the known, observable universe. Let’s assume there are 10⁸⁰ atoms. Then there are more possible chess games than the number of observable atoms in the universe. But not all chess games are sensible (or legal) — the number of legal chess games closer to around 10⁴⁰ games. Even if these numbers are crude estimates, it still serves to show how complex chess an be. Another way to see this is the following analysis:
Wait…then what next?
Adriaan de Groot was a chess master and psychologist, who studied the thought processes of human chess players. He found that they all players used what we call the “lookahead search heuristic”. In 1946 he wrote his thesis Het denken van den schaker, which was translated into English in 1965 as Thought and Choice in Chess.
De Groot’s findings had a large influence on Shannon’s work. Shannon thought of the lookahead method as a practical way for machines to tackle complex problems that require “general principles, something of the nature of judgement, and considerable trial and error, rather than a strict, unalterable computing process”. Shannon then discussed, in detail, how the lookahead method could be applied by a computer to play chess.
Search innovation: minimax search (Shannon (1950) and Turing (1951))
How to search a game tree was independently invented by Shannon (1950) and Turing (1951).
Shannon’s paper forth the idea of an evaluation function, which is a function that ranks nodes in the search tree according to some criteria, i.e. evaluates the efficacy of a particular move.
Now, given an evaluation, a player/computer has to actually choose which move to make. An obvious move is looking ahead one step and simply choosing the move which has the highest evaluation score. But game theory said the player could be smarter, i.e. make a better choice, and take into account the moves the opponent could take once he’s made a move. This is the minimax algorithm. And this minimax algorithm uses the evaluation function and gives the efficacy of future moves available.
Any game can be thought of as a tree of possible future game states. In this tree representation, the current state of the game is the root of the tree, i.e. drawn at the top. This node will usually have several children nodes, together representing the set of all possible moves that a player could make.
Let’s consider games with two players zero-sum game. The minimax algorithm, in this case, is a specialized search algorithm that explores the entire game tree using a depth-first search. It works as follows: At each node in the tree where Player A has to move, A would play the move that maximizes A’s score, i.e. A will assign the maximum score amongst the children to the node. Then Player B will minimize the payoff to A. The maximum and minimum scores are taken at alternating levels of the tree, since A and B alternate turns. The minimax algorithm computes the minimax decision for the nodes of the game tree then backs up through the tree to give the final value to the current state.
We will later see that it is not possible to expand a game to end-game status so the programmer will have to choose a ply-depth that is achievable with reasonable time and resources (this is why computational power and speed will matter). The absolute “win-lose” values become heuristic scores and these heuristics will be designed or devised according to (knowledge of) the game. [Side note: In two-player sequential games, a ply refers to one turn taken by one of the players.]
Below are two examples of minimax algorithm.
Shannon’s work provided a framework for all future research in computer chess playing.
To imitate or not to imitate human thought
Shannon said because chess was considered to require “thinking”, solving this problem would force us (humanity) “either to admit the possibility of a mechanized thinking or to further restrict our concept of “thinking”. He then described how a machine that could play a “tolerably good” game of chess would need to have a search function that would identify possible moves at any given point and an evaluation function that would rank those move according to how they could influence the game.
Shannon proposed two very different approaches to do this: A) reduce the total number of moves (i.e. limit the depth of the tree) and do a brute-force search of all possible variations of that tree (known as type A strategy), or B) only search through “important branches” by using human-like heuristics to trim the decision tree (i.e. by prioritizing certain branches over others) until it is clear which branch has the advantage (known as type B strategy).
Type A: Brute force strategy
In this strategy, the machine would do a brute-force search of all possible variations of the game tree, up to a given depth. At the time, it seemed impossible that a machine could be fast enough to look ahead more than a few moves. Shannon said that a machine that implemented this strategy would, at best, be “both slow and a weak player”.
Type A strategy seemed like a computer chess dead end in the early days of AI and chess.
Type B: Intelligence strategy
In this strategy, the machine would use “intelligence” instead of just raw computing power. It would only search through “important branches” and prune the decision tree until it became clear which branch had the advantage. It would prioritize certain branches over others and thus demonstrate something similar to human intuition.
Type B strategy seemed far more feasible and the earliest programs used this strategy
This distinction would turn out to be a deep debate in the field:
Is it necessary for a machine to simulate human intelligence (i.e. think like humans) or is it sufficient if a machine can match our intelligent behavior (i.e. match our functionality)?
“It is not being suggested that we should design the strategy in our own image. Rather it should be matched to the capacities and weakness of the computer. The computer is strong in speed and accuracy and weak in analytical abilities and recognition. Hence, it should make more use of brutal calculation than humans, but with possible variations increasing by a factor of 1⁰³ every move, a little selection goes a long way forward improving blind trial and error.” — Shannon
Unlike many researchers of his time, Shannon didn’t think computer chess should be modeled on human thought — humans and machines had different strengths and limitations.
Machines would calculate really fast, they won’t make errors (unless they had programming errors), they won’t become lazy in their analysis, they won’t get emotional — they won’t have to worry about overconfidence or defeatism.
These would be balanced against strengths of the human mind: flexibility, imagination, inductive and learning capacities of the human mind.
Human vs. machine seemed like a fair fight.
A twist in the story
In 1997, a computer would beat one of the greatest players of all time: Garry Kasparov. And the twist is that it would mostly employ Type A strategy. It would beat Kasparov’s insight and brilliance by sheer calculation (well, and some other things. See Machines That Play (Deep Blue)).
Most chess programs pursued Type B strategy until mid 1970s. Around that time brute-force programs began to consistently win over programs created using human “knowledge and intuition”.
Type B strategy is the strategy that would hit a dead end.
“I see no limit to the capabilities of machines. As microchips get smaller and faster, I can see them getting better than we are. I can visualize a time in the future when we will be to robots as dogs are to humans.” — Shannon (Wikiquote (See Omni Magazine interview))
Side note — A bit of history: In between 1950, when Shannon’s paper was first published, and 1966 only three chess programs were developed. By 1970 six programs (none from the initial three) participated in the first US Computer Chess Championship. The first World Championship in 1974 had 13 participants.
Turochamp: The first program that played chess
Turochamp (1948–1953). Game: Chess
The program (and the human) that played chess
In 1953, Alan Turing published an article on his chess program (Digital Computers Applied to Games) in the book Faster than Thought by B. Bowden. Shannon had not spoken about any particular program in his paper. It was Turing who wrote the first chess program. And he wrote it before computers even existed! He knew computers were coming and once they were powerful enough, they would be able to play chess.
In 1948, Turing and David Champernowne, began writing the first chess program, i.e. a set of instructions that would enable a future computer to play chess. By 1950, the program was completed, later called Turochamp. Since there were no computers at the time, Turing simulated the program by hand, using paper and pencil. Calculating each move the program would make by hand took Turing about half an hour per move: Turing would go through the instructions in his program and carry out the moves on a chessboard.
In 1952, he tried to implement it on a Ferranti Mark 1, but the machine lacked sufficient power to execute the program. Turing never got to see his program executed by a computer. He died in 1954 — at the age of 41. [Side note: The term “Artificial Intelligence wasn’t coined until 1956, two years after Turing’s untimely death.]
How did Turochamp perform?
Turing’s program was crude and it was only able to “think” two moves in advance. When asked how many moves ahead he can think, Garry Kasparov had said, “Normally, I would calculate three to five moves…You don’t need more…, but I can go much deeper if it is required…[as many as 12 or 14 moves].”
Turing’s program won a game against Champernowne’s wife, who was a beginner at chess. Then he played another game against a colleague, Alick Glennie, which he lost. This second game was recorded (watch videos here or here).
Jack Copeland in The Essential Turing said, “Turing says that the system of rules set out in ‘Chess’ is based on an ‘introspective analysis’ of his own thought process when playing (but with ‘considerable simplifications’). His system anticipates much that has become standard in chess programming: the use of heuristics to guide the search through the tree of possible moves and counter-moves; the use of evaluation rules which assign numerical values, indicative of strength or weakness, to board configurations; the minimax strategy; and variable look-ahead whereby, instead of the consequences of every possible move being followed equally far, the ‘more profitable moves are considered in greater detail than the less’. Turing also realized the necessity of using ‘an entirely different system for the end-game…
The learning procedure that Turing proposed in ‘Chess’ involves the machine trying out variations in its method of play — e.g. varying the numerical values that are assigned to the various pieces. The machine adopts any variation that leads to more satisfactory results. This procedure is an early example of a genetic algorithm.”
Search innovation: minimax search (Shannon (1950) and Turing (1951))
How to search a game tree was independently invented by Shannon (1950) and Turing (1951).
As we’ve seen already, heuristics are necessary prune the search tree of a given game. Turing thought about two heuristics, minimax and best-first. The minimax heuristic is discussed above. The best-first heuristic involves ranking the moves available based on some kind of specified rule-of-thumb system and then exploring the options of the highest-scoring move first. The A* search algorithm is an example of best-first search, as is B* (more on these in later blogs). Turochamp seems to be the first heuristic chess program.
Champernowne’s said (on Turochamp from The Essential Turing), “Our general conclusion was that a computer should be fairly easy to programme to play a game of chess against a beginner and stand a fair chance of winning or least reaching a winning position.”
In 2012, Garry Kasparov played against Turochamp and defeated it in just 16 moves. Kasparov said (video), “I suppose you might call it primitive, but I would compare it to an early car — you might laugh at them but it is still an incredible achievement…
…[Turing] wrote algorithms without having a computer — many young scientists would never believe that was possible. It was an outstanding accomplishment.”
Profile: Alan Turing
“It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers… They would be able to converse with each other to sharpen their wits. At some stage therefore, we should have to expect the machines to take control.” — Alan Turing
Alan Turing was born on 23 June 1912 (and died on 7 June 1954). He was an English computer scientist, mathematician, logician, philosopher, and theoretical biologist — a rockstar.
Turing machines and computation
In 1935, at the age of 22, Turing invented Turing machines, abstract computing machines on which stored-program digital computers are modeled. Turing and Alonzo Church argued that every “effective” mathematical method can be carried out by the universal Turing machine.
In 1939 he joined the Government Code and Cypher School (GC&CS) at Bletchley Park, Buckinghamshire. During the Second World War, he was a leading participant in the breaking of the German cipher Enigma. According to Wikipedia, “The historian and wartime codebreaker Asa Briggs has said, ‘You needed exceptional talent, you needed genius at Bletchley and Turing’s was that genius.’” Turing extended existing work there and built a powerful device, capable of breaking Enigma messages.
“…You needed exceptional talent, you needed genius at Bletchley and Turing’s was that genius.”
Even though “artificial intelligence” was formally coined in 1956, researchers in England had been exploring “machine intelligence” for up to ten years prior to this. It was a common topic among the members of the Ratio Club (a group of young psychiatrists, psychologists, physiologists, mathematicians and engineers who met to discuss issues in cybernetics. Alan Turing was a member of this club.)
A machine that can learn from experience
In London in 1947 Turing gave a public lecture in which he said, “What we want is a machine that can learn from experience…[the] possibility of letting the machine alter its own instructions provides the mechanism for this”. This may be the earliest lecture to mention the idea of machine intelligence.
In 1948 he wrote (but did not publish) the first manifesto of AI; he called it “Intelligent Machinery’. In it he introduced many concepts that would later become important in Artificial Intelligence. One such concept was around “training” a network of artificial neurons to perform specific tasks. According to Jack Copeland, Turing’s director, Sir Charles Darwin, thought of it “as a school boy’s essay” and Turing never published his ideas!
In 1954 homosexuality was illegal. Turing was charged and given a choice between imprisonment or to undergo hormonal treatment. He started hormonal treatment. On 8 June 1954, Turing’s housekeeper found him dead. He had died the previous day. The coroner’s verdict was suicide.
Here are some interesting facts about Turing.
MANIAC: Running the first chess program
While taking a break from producing the first hydrogen bomb…
In 1946, mathematician John von Neumann took on the task of designing a powerful calculation machine. He came up with a computer architecture we now call von Neumann architecture. In 1951, based on von Neumann’s architecture, MANIAC (Mathematical and Numerical Integrator and Calculator) was developed under the direction of Nicholas Metropolis at Los Alamos Scientific Laboratory. Its first job was to perform the calculations for the hydrogen bomb. (In its free time, play chess.)
The MANIAC I chess program was written in 1956. The team that programmed MANIAC was led by Stanislaw Ulam (who invented nuclear pulse propulsion and designed the H-bomb with Edward Teller), Paul Stein, Mark Wells, James Kister, William Walden and John Pasta. Due to MANIAC’s limited memory, the program used a 6 × 6 chessboard and no bishops.
According to Chessprogramming, MANIAC I performed a brute-force Shannon Type A strategy. It performed 11,000 operations per second, and had 2,400 vacuum tubes. It took 12 minutes to search a four moves depth (adding the two bishops would take three hours to search at the same depth). The program was written in 600 words of machine code.
Here is a YouTube video clip on MANIAC (by AtomicHeritage).
Bernstein Chess Program: First complete program
In 1957, Alex Bernstein, an IBM employee, created the first program that could play a full game of chess. He created it with his colleagues Michael Roberts, Thomas Arbucky and Martin Belsky, Bernstein at the Massachusetts Institute of Technology.
The program ran on an IBM 704 and could perform 42,000 instructions per second. This was one of the last vacuum tube computers. It took about 8 minutes to make a move.
The Bernstein Chess Program used Shannon Type B (selective search) strategy. In order to select a move, it would look at the 30 possible moves available. It would then ask 8 preliminary questions about each of those moves. Next it would select 7 of the 30 possible moves for further analysis. Finally, it searched four plies (moves), considering it’s moves and the possible responses from the opponent for each of the 7 possible moves. It examined 2800 positions in 8 minutes.
International Master Edward Lasker played the program and easily defeated it, but he said that it played a “passable amateur game.”
Mac Hack VI: First program to compete (1967)
Meet Mac Hack (or Robert Q)
In the same year, Mac Hack VI became the first computer to play against humans under human tournament conditions; it entered the tournament as “Robert Q”. By the end of the 1967, it had participated in four tournaments.
The MacHack program was the “first widely distributed chess program”, running on many PDP machines. It became the first computer to reach the standard of average tournament players. Mac Hack VI defeated a human in a tournament and received a rating (Mac Hack VI played at the high Class C to low Class B level*). It was demonstrably superior to all previous chess programs as well as most casual players.
The number VI (six) refers to the PDP-6 (and here) (Programmed Data Processor-6) machine for which it was written. The program required 16K words of memory. PDP-6 was built by DEC (Digital Equipment Corporation), who gave the first prototype to Project MAC (MIT). The PDP-6 had a speed of about 225,000 instructions per second. It evaluated about 10 positions per second.
Search innovation: Transposition tables
By this time, the combination of the minimax algorithm and alpha-beta pruning significantly reduced the total number of branches of the tree that needed to be considered, which made it feasible to play chess on almost any computer, including the earliest microcomputers. Mac Hack VI, however, was a Shannon type B program that used forward pruning extensively along with the alpha-beta algorithm. Mac Hack VI was knowledge-based program, i.e. it used about 50 heuristics in establishing the plausibility of a given move. Being programmed with this considerable chess knowledge is one reason for the program’s quality of play.
In a famous match at MIT in 1967, Greenblatt’s program beat Hubert Dreyfus, who in 1965 had said “no chess program could play even amateur chess”. In 1967, Greenblatt’s Mac Hack VI was playing at the amateur level. According to Richard Greenblatt, “And so then as word got around- Well, there was a guy a MIT in those days named Hubert Dreyfus, who was a prominent critic of artificial intelligent, and made some statements of the form, you know, computers will never be any good for chess, and so forth. And, of course, he was, again, very romanticized. He was not a strong chess player. However, he thought he was, or I guess he knew was wasn’t world class, but he thought he was a lot better than he was. So anyway, I had this chess program and basically Jerry Sussman, who’s a professor at MIT now, and who was also a member of our group, had played. It was around and it was available on the machine. People played it, and so forth. And basically Sussman brought over Dreyfus and said, well, how would you like to have a friendly game or something. Dreyfus said, oh, sure. And sure enough, Dreyfus sat down and got beat. So this immediately got quite a bit of publicity.”
Dreyfus (1965): “No chess program could play even amateur chess.”
1967: Mac Hack VI beats Dreyfus.
Betting on machines
In 1968, a young Scottish International Master, David Levy, met AI researcher John McCarthy at a party hosted by Donald Michie. McCarthy challenged Levy to a game of chess and Levy beat him. Knowing Mac Hack VI’s quality of play, McCarthy said that a computer would be able to beat him within 10 years. Donald Michie and John McCarthy made a bet of £250 each with David Levy (who wagered £500 — the equivalent of more than £8,000 ($12,000) today).
In an interview Levy said, “You know, I was earning £895 a year…But I was so confident.” The following year Seymour Papert joined in, and in 1971 Ed Kozdrowicki joined in. In 1974, Donald Michie raised the total to £1250.
In 1978, he was playing a computer opponent in a match. He played five games: the first was a draw, he won the second and third, he lost the fourth, and he won the fifth. In 1978, Levy collected on his bet. He will go unbeaten until 1989 (by Deep Thought).
Are chess programs intelligent?
In his 1978 article David Levy said [referring to Alan Turing’s Turing Test as applied to chess], “If a chess master were to play a game against an opponent who was invisible to him, and could not tell from the game whether his opponent was a human or a program, then if the opponent was a computer program that program could be said to be intelligent. I have never faced with this situation but I doubt very much whether I, or any other chess master, could correctly identify an opponent that was a strong computer program. Although there are stylistic differences, these differences become less and less noticeable as the best programs become stronger….The best chess programs must now be considered “intelligent” in the Turing sense, though the brute force way that they play chess is anything but intelligent in human terms.” His last few lines:
“If it is possible to produce an artificially intelligent chess player it is surely possible to produce and artificial intellect in other spheres. My only qualm is that perhaps it is wrong to regard the prospect with awe. Maybe fear would be more appropriate.”
Cray Blitz: First super(chess) computer
In the early 1970s the first powerful programs were introduced
Cray Blitz, developed by Robert Hyatt, Harry L. Nelson, and Albert Gower, entered ACM’s North American Computer Chess Championship in 1976. It became a serious competitor in 1980 when it was moved to a Cray-1 supercomputer, becoming the first chess program to use such a powerful machine. This made it possible for Cray Blitz to adapt a mostly algorithmic and computational approach while still retaining most of its (extensive) chess knowledge. It participated in computer chess events from 1980 through 1994 and won two consecutive World Computer Chess Champions (1983 and 1986). This was the last general-purpose computer to hold the title of “world computer chess champion”.
Throughout the 1970s and 1980s the computer chess community was focused on beating David Levy (see Mac Hack and David Levy above). Levy was a very strong chess player and he was even better against computers. He knew how to exploit weaknesses of most chess programs of the time. His strategy was known as “do nothing, but do it well” and he would create positions which in which computers had no hope of finding good moves.
By 1984, Cray Blitz had beaten several strong masters and had achieved a master-level score. So Cray Blitz challenged David Levy — he beat it 4–0! After that he wasn’t so sure about his earlier optimism about computer play. In May 1984, he told Los Angeles Times, “During the last few years I had come to believe more and more that it was possible for programs, within a decade, to play a very strong grandmaster chess. But having played the thing now, my feeling is that a human world chess champion losing to a computer program in a serious match is a lot further away than I thought…”
“…Most people working on computer chess are working on the wrong lines. If more chess programmers studied the way human chess masters think and tried to emulate that to some extent then I think they might get further.”
They weren’t necessarily working on the wrong lines. Machines will begin to improve very quickly.
HiTech: First international master
“Hitech is inexorable — like Bobby Fischer.” — Hans Berliner
HiTech was a chess machine with special purpose hardware and software. It was built by Hans Berliner and others at CMU in the 1980s. The move generation and pattern recognition parts of its evaluation function were done in hardware — with 64 chips in parallel. Its custom hardware could analyze ~175,000 moves per second and it could execute a full-width depth-first search. It was one powerful machine.
The move generation and pattern recognition parts of its evaluation function were done in hardware — with 64 chips in parallel. Its search algorithm involved both alpha-beta as well as B*. Its custom hardware could analyze ~175,000 moves per second and it could execute a full-width depth-first alpha-beta search using a transposition table. It was one powerful machine.
In a 1985 LA Times article, Berliner said, “What sets it apart from other computers is we’re able to evaluate a position with a high degree of sophistication very, very quickly. I think we have a real chance to penetrate the very top levels. We’ll be in the top 50 players in the country by the end of next year.”
In 1985 it achieved a rating of 2530, becoming the first (and at the time only) machine to have a rating over 2400. It later tied with Cray Blitz at the 1986 Computer Chess Championship.
In 1988 HiTech defeated Grandmaster Arnold Denker in a four-game match (3.5–0.5), winning the last game in 23 moves. It was the first time a chess program had beaten a grandmaster. Arthur Denker said “This is a machine with a sharp game. It plays the openings beautifully. Unfortunately I played badly.” In 1988 HiTech became the first chess computer to rated Grandmaster strength.
According to Time, when asked if a computer would ever threaten the likes of Gary Kasparov, world champions (and others) said “No, I don’t think a computer will ever get that good.” Berliner acknowledged that Hitech was not that good — yet.
He said with conviction, “Ever is far too long a time.”
He was right, a computer would beat Gary Kasparov — soon enough.
Profile: Hans Berliner
“I shall always remember his face, dark and trembling, his aquiline nose and deep-set, bright eyes with their great sadness, the sensitivity of his hands…” Carlos Fuentes (link)
Father of computer chess.
Hans Jack Berliner was born on January 27, 1929 in Berlin, Germany (and died on January 13, 2017). His family was Jewish. In 1937, his family immigrated to the Washington, D.C. area to escape Nazi persecution. He learned to play chess at 13 at a summer camp and became a chess master at 20. He was a champion in correspondence chess, in which moves are exchanged via postcard. He said, “Chess was a wonderful way to discipline the mind. You’re forced to deal with a certain level of reality.”
In the early 1960s while employed at IBM, he was inspired by the work of pioneers in artificial intelligence and began writing a program to play chess. He then joined Carnegie Mellon University (CMU) at the age of 40 to earn a Ph.D. in computer science and later became a professor at CMU. While there, he pioneered hardware programming and built HiTech, the first machine that exceeded 2400 point rating. HiTech was weak in board evaluation so Berliner decided to explore the problem by working on an evaluation function for another game: backgammon. The result was BKG 9.8 (See Machines That Play (Backgammon)).
Some of Berliner’s PhD students went on to become important researchers in computer chess themselves. One of his students, Murray Campbell, was part of the Deep Blue team that would defeat Garry Kasparov in 1997.
Murray Campbell said “It would have been nice to say that computer chess led to a huge breakthrough that allowed us to better understand human language or translation, or led to a general model of human intelligence, but it didn’t. It did lead to a change of mind, a change in attitude, about how we approached a large number of problems in computer science. And in this field — and this is an important point — Hans Berliner produced something that had lasting value.”
Jonathan Schaeffer said, “Most scientists aren’t willing to take the kinds of risks that Hans would take. But that’s why his papers are still around, while the papers of his contemporaries are long gone and forgotten.”
Predecessor of Deep Thought, which would evolve into Deep Blue.
ChipTest was a 1985 chess computer built by Feng-hsiung Hsu, Thomas Anantharaman and Murray Campbell at Carnegie Mellon University (CMU). It is the predecessor of Deep Thought (and hence Deep Blue). ChipTest’s custom technology used “very large-scale integration” (VLSI) to combine thousands of transistors onto a single move generator chip.
Six weeks before the 1986 North American Computer Chess Championship, Hsu and team decided to prepare their machine to compete. According to Hsu, he had already tested and designed a custom chip that could process up to two million moves per second, 10 times faster than Hitech’s 64-chip array (HiTech was also a CMU program). Anantharaman had already written a toy chess program. He substituted Hsu’s chip tester for the software package he used in his program, and it improved the program’s speed by 500 percent, to a total of 50,000 positions per second! Hsu and Anantharaman recruited Campbell and Nowatzyk; Campbell worked on developing a more sophisticated evaluation function and all of them worked on augmenting the chip tester so that it could act as a simple search engine. The machine was buggy in the competition, but it was still a great achievement: on very little budget and in six weeks, they had built ChipTest.
In August 1987 ChipTest was improved and renamed ChipTest-M. It was ten times faster, searching ~500,000 moves per second.
Search Innovation: singular extension algorithm
In 1987, ChipTest won the North American Computer Chess Championship in a clean 4–0 sweep. For this, Anantharaman, who was the only person who understood the code he had written for the ChipTest, programmed the singular extension algorithm on his own. This was considered a major reason for a series of exceptional performances by Deep Thought.
Both HiTech and ChipTest were developed at CMU and shared some code so the two teams had become rivals. HiTech team had a head start, but Hsu and team would soon build ChipTest’s successor, Deep Thought, which would overtake them.
Deep Thought (1989)
“All right,” said Deep Thought.
“The Answer to the Great Question…”
“Of Life, the Universe and Everything…” said Deep Thought.
“Is…” said Deep Thought and paused.
“Forty-two,” said Deep Thought, with infinite majesty and calm.
“Forty-two!” yelled Loonquawl. “Is that all you’ve got to show for seven and a half million years’ work?”
“I checked it very thoroughly,” said the computer, “and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you’ve never actually known what the question is.”
“But it was the Great Question! The Ultimate Question of Life, the Universe and Everything,” howled Loonquawl.
“Yes,” said Deep Thought with the air of one who suffers fools gladly, “but what actually is it?”
A slow stupefied silence crept over the men as they stared at the computer and then at each other.”Well, you know, it’s just Everything … Everything …” offered Phouchg weakly.
“Exactly!” said Deep Thought.
“So once you know what the question actually is, you’ll know what the answer means.”
Deep Thought was a “supernatural-computer programmed to calculate the answer the Ultimate Question of Life, the Universe, and Everything. After seven and a half million years, it came up with the result of 42, but didn’t know the question itself, so it commissioned the Earth to calculate that inquiry. Sadly, the Earth was destroyed five minutes before it was going to give the question.”
Oh wait, that was the other Deep Thought. This Deep Thought is also a computer and if it had carried out brute force only method, it could have also taken just as long to think.
Deep Thought was chess- playing machine initially started 1985 as ChipTest by Feng-hsiung Hsu, Thomas Anantharaman, Mike Browne, Murray Campbell and Andreas Nowatzyk. (It would culminate in Deep Blue).
Among the top 30 players in the US
In January of 1988, at a press conference in Paris, world chess champion Gary K. Kasparov was asked whether a computer would be able to defeat a grandmaster before the year 2000. “No way,” he replied, ”and if any grandmaster has difficulties playing computers, I would be happy to provide my advice.”
“No way…and if any grandmaster has difficulties playing computers, I would be happy to provide my advice.”
Ten months later, in November 1988, in the Software Toolworks Open Championship tournament in Long Beach, California, Grandmaster Bent Larsen was defeated by Deep Thought.
Larsen became the first international grandmaster to lose a tournament game to a computer under regular tournament conditions. Deep Thought won five other games, drew one and lost one, tying Grandmaster Anthony Miles for first place. The tournament money ($10,000) was won by Tony because machines were disqualified from winning money in tournaments. Even though Deep Thought played like a grandmaster, it could not be granted the grandmaster title because FIDE (French acronym for World Chess Federation) rules excluded “all games in which a computer participates”.
Deep Thought ended up winning the North American Computer Chess Championship in 1988 and the World Computer Chess Championship in the year 1989, and its rating, according to U.S. Chess Federation was 2551, among the top 30 players in the United States. An average tournament player then was rated around 1500. Deep Thought’s rating meant it was playing at the level of the bottom half of the grandmaster range.
AI and custom hardware
Deep Thought was a combination of software and customized hardware. It had a pair of custom built processors, each of which included a VLSI chip to generate moves. Each processor could evaluate 450,000 positions per second and the two processors working together reached 700,000 positions per second. These dual-processors were mounted on a single printed circuit board along with an additional 250 integrated circuit chips that controlled the search and evaluation positions. The evaluation function could weigh around 120 board features. These figures made it the fastest chess machine till then. In the games played after August of 1988, Deep Thought’s performance rating exceeded 2600.
In 1989 Deep Thought won the 6th World Computer Chess Championship with a perfect 5–0 score. It beat both the defending champion, Cray Blitz, and Hitech to become the new World Computer Champion. Crazy Blitz, the machine that had won two consecutive World Computer Chess Championships in 1983 and 1986, had eight processors capable of around 1.3 giga-instructions per second. It was able to conduct a nine-ply (nine turns), full-width search at tournament speed, but it still couldn’t match up to Deep Thought’s deeper search .
Challenging the world champions
Where does human chess and computer chess intersect? This question could be asked seriously in October 22, 1989 because Deep Thought was going to face Garry Kasparov in an exhibition two-game match.
Kasparov had beaten Anatoly Karpov in the fall of 1985. Kasparov was only twenty-two at the time, the youngest person ever to have become the world champion. Before Deep Thought, there had been no machine that might have challenged a human world champion.
Additionally a newer and more powerful version of Deep Thought was going to play that match; it had six-processors and was capable of searching more than two million positions per second (in 1986, the fastest computer was looking at roughly 150,000 different chess positions per second). Still, Campbell had set the odds of Deep Thought winning at 1 in 10.
Deep Thought’s rating was approximated to be around 2500 and Kasparov’s rating was around 2800, the highest rating in the history of ratings so far. Kasparaov won the first game, after which he briefly addressed the crowd and said,
“I can’t visualize living with the knowledge that a computer is stronger than the human mind…(to challenge the machine is to) protect the human race.”
Kasparov also won the second game, after which he said the machine still had “a lot to learn”. He was right. The result was not unexpected, but Deep Thought’s play was still considered to be disappointing.
At the end, Kasparov addressed the crowd and expressed a very graceful human sentiment:
“When playing versus a human being there’s energy going between us. Today I was puzzled because I felt no opponent, no energy — kind of like a black hole, into which my energy could disappear. But I discovered a new source of energy, from the audience to me, and I thank you very much for this enormous energy supply.”
On December 12 and 13, 1989, Deep Thought faced David Levy in a four-game match. Deep Thought defeated the master chess player David Levy. Until then, Levy had beaten all other previous computers since 1968. Deep Thought was represented and operated by Peter Jansen, who later told David 30 new evaluation terms had been added to Deep Thought’s program prior to its match with Levy. And Levy hadn’t competed seriously in chess for more than a decade so he played like a “rusty” man. Recall, in 1968, David Levy had met John McCarthy and Donald Michie. Based on the success of Mac Hack VI (and similar programs), McCarthy said challenged Levy that a computer would be able to beat him within 10 years.
Levy became the first master chess player to be defeated by a computer, but it didn’t happen until twenty-one years after the bet. And when it did finally happen, some called it the end of an era.
Evolution: ChipTest → Deep Blue (even more powerful hardware)
In 1989 IBM started working on Deep Blue (ChipTest → Deep Thought → Deep Thought II → Deep Blue). Speed was the key they focused on.
Regarding the next-generation machine, they said, “It should outcalculate its predecessor by a factor of at least 1,000. The machine we have in mind will therefore examine more than a billion positions per second, enough to search 14 or 15 plies deep in most cases and from 30 to 60 plies in forcing lines…To achieve this speed, Hsu is designing a chess-specific processor chip that is projected to search at least three million moves per second — more than three times faster than the current Deep Thought. He is also designing a highly parallel computing system that will combine the power of 1,000 such chips, for a further gain of at least 300-fold. Anantharaman and Campbell are improving various aspects of the current version of Deep Thought, so that these improvements can be incorporated into the next machine as well.”
So how good did they expect the next-generation machine to be? They said,
“If the observed relation between processing speed and playing strength holds, the next-generation machine will play at a 3400 level, about 800 points above today’s Deep Thought and 500 points above Kasparov’s rating record.”
They were building a machine to take down the best chess player ever — they were building Deep Blue.
In 1996, Kasparov would meet Deep Blue and say, “I could feel — — I could smell — — a new kind of intelligence across the table.”
Deep Blue (1996–1997)
Review: The road to Deep Blue
The idea of creating a chess-playing machine dates to the 18th century — much earlier than 1997. Around 1769, von Kempelen built the chess-playing automaton called The Turk, which became famous before it was exposed as a hoax. In 1912, Leonardo Torres y Quevedo built a machine called El Ajedrecista, which played the endgame, but it was too limited to be useful.
Creation of electronic computers began in the 1930s. ENIAC, which is considered to be the first general-purpose electronic computer, became operational in 1946. By the late 1940s computers were used as research and military tools in US, England, Germany, and the former USSR. Early AI researchers wanted to use computers for other engineering challenges that could both attract public interest to AI and allow them to solve complex problems.
Computer chess represented this exact engineering challenge. Programming a machine to play chess was not that tricky, but programming it to become a strong chess player turned out to be very hard.
1940s — 1960s
Computer chess presented a fascinating and challenging engineering problem. Chess is rich and complex with massive search space (more on this later). Chess mastery was considered to be an indicator of more general human intelligence. So it turned to be a useful testing ground. Researchers started making progress in the 1940s and in 1950 Shannon published “A Chess-Playing Machine,” (Scientific American, February, 1950), which set the direction for future AI researchers. In it he stated his case for why a chess-playing machine should be built, “The investigation of the chess-playing problem is intended to develop techniques that can be used for more practical applications. The chess machine is an ideal one to start with for several reasons.” We look at these reasons in a section below.
Early pioneers of the 1940s and 1950s focused on building machines that would play chess much like humans did, so early chess progress relied heavily on chess heuristics (rules of thumb) to choose the best moves. Researchers emphasized emulation of human chess thought process because they believed teaching a machine how to mimic human thought would produce the best chess machines. Computing power was limited in the 1950s, so machines could only play at a very basic level. This is the period when researchers developed the fundamental techniques for evaluating chess positions and for searching possible moves (and opponent’s counter-moves). These ideas are still in use today.
In the 1960s, AI pioneers Herbert Simon and John McCarthy referred to chess as ‘the Drosophila of AI’, which meant that chess, like the common fruit fly, represented a relatively simple system that could also be used to explore larger, more complex real-world phenomena. Computer chess was the perfect test-bed for AI research.
By the end of the 1960s, computer chess programs were good enough to occasionally beat against club-level or amateur players.
1970s — 1980s
In 1970–1980, the emphasis was on hardware speed. In the 1950s and 1960s, early pioneers focused on chess heuristics (rules of thumb) to choose the best next moves. Even though programs in 1970s and 1980s also used heuristics, there was a much stronger focus was on software improvements as well as use of faster and more specialized hardware. Customized hardware and software allowed programs to conduct much deeper searches of game trees (involving millions of chess positions), something humans did not (because they could not) do.
The 1980s brought an era of low cost chess computers. First microprocessor-based chess programs started becoming available. Because of the availability of home computers and these programs, anyone could now play chess (and improve their game) against a machine. By the mid-1980s, the sophistication of microprocessor-based chess software had improved so much they began winning tournaments — both against supercomputer based chess programs and some top-ranked human players.
In 1990s, chess programs began challenging International Chess masters and later Grandmasters. Some dramatic moments in computer chess occurred in 1989 — two widely-respected Grandmasters were defeated by CMU’s Hitech and Deep Thought. Researchers felt machines could finally beat a World Chess Champion. This got IBM interested so they began working on this challenge in 1989 and built a specialized chess machine, named Deep Blue.