AI: Monte Carlo Tree Search (MCTS)
Monte Carlo Tree Search is an Artificial Intelligence algorithm used especially in games. AlphaGo reportedly used this algorithm in combination with neural networks, but MCTS had been used in many other applications before that.
Here I explain what the algorithm is and how it works.
What is Monte Carlo Tree Search
MCTS, as the name says, is a way of searching a tree. Within this tree, nodes represent game states, and the arcs between nodes represent the choices that take you from one state to another.
The image below shows the game of Tic-Tac-Toe and how we can represent its possible states as a tree.
The Monte Carlo part of the name is what distinguishes it from other kinds of tree search. Monte Carlo algorithms are a broader class of probabilistic algorithms. That means the algorithm normally returns only an approximation of the actual result, but in several cases it is proven to converge to the true result as the number of samples goes to infinity.
The classic example of a Monte Carlo algorithm is the one used to approximate the number pi. Draw a square with sides of length 1 and inscribe a circle in it. The area of the square is 1, while the area of the inscribed circle is pi*(1/2)², that is, pi/4. Therefore the ratio between the area of the square and the area of the circle is 4/pi. To estimate pi, we place points at random within the square following a uniform distribution. The fraction of points that fall inside the circle approximates the ratio of the two areas, pi/4, and multiplying that fraction by 4 gives an approximation of pi.
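The estimation above can be sketched in a few lines of Python. This sketch samples the quarter of the unit square where x, y ≥ 0, so a point lies inside the quarter circle when x² + y² ≤ 1:

```python
import random

def estimate_pi(num_points: int) -> float:
    """Estimate pi by sampling uniform points in the unit square and
    counting how many land inside the inscribed quarter circle."""
    inside = 0
    for _ in range(num_points):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    # inside / num_points approximates pi/4, so multiply by 4
    return 4.0 * inside / num_points

print(estimate_pi(1_000_000))
```

The estimate fluctuates from run to run, but the error shrinks as the number of sampled points grows, which is the convergence behavior described above.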
The figure below is a representation of the algorithm to estimate pi, using only a quarter of the circle.
How MCTS works
The algorithm can be described in 3 steps:
- Selection: From the current state of your game, go down your tree of already explored states. There are several ways of choosing which of the already explored states to descend into, such as the Upper Confidence Bound applied to Trees (UCT), but they will not be discussed here. When we reach a state that we have not visited before, we add it to the tree and go into the rollout phase.
- Rollout: From this previously unseen state, simulate successive states until a Win/Draw/Loss is reached. In a pure Monte Carlo method, the next move can be picked at random, so we keep choosing random moves until the game reaches an end. After reaching the end we perform the backpropagation of the result.
- Backpropagation: This is performed after a rollout finishes, that is, after we reach a terminal Win/Draw/Loss state. We update the Win/Draw/Loss statistics of every state we passed through on the way to this terminal state.
We can execute the above steps for as long as we have time. Then, when it is time to choose a move, we pick the one with the highest win ratio.
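The steps above can be sketched in Python. This is a minimal illustration, not a production implementation: the game here is a toy version of Nim (players alternately take 1 or 2 stones, and whoever takes the last stone wins), chosen only to keep the example self-contained, and UCT is used for the selection step even though it is not detailed in this article.

```python
import math
import random

# Toy game: Nim. A state is (stones_left, player_to_move), players 0 and 1.
def legal_moves(state):
    stones, _ = state
    return [m for m in (1, 2) if m <= stones]

def play(state, move):
    stones, player = state
    return (stones - move, 1 - player)

def winner(state):
    stones, player = state
    # When no stones remain, the player who just moved took the last one.
    return None if stones > 0 else 1 - player

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move                    # move that led to this node
        self.children = []
        self.untried = legal_moves(state)   # moves not yet expanded
        self.wins = 0.0                     # wins for the player who moved here
        self.visits = 0

def uct_select(node, c=1.4):
    # UCT balances exploitation (win ratio) against exploration
    # (children that have been visited only rarely).
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCT.
        while not node.untried and node.children:
            node = uct_select(node)
        # Expansion: add one previously unexplored child to the tree.
        if node.untried:
            move = node.untried.pop()
            child = Node(play(node.state, move), parent=node, move=move)
            node.children.append(child)
            node = child
        # 2. Rollout: play uniformly random moves until the game ends.
        state = node.state
        while winner(state) is None:
            state = play(state, random.choice(legal_moves(state)))
        result = winner(state)
        # 3. Backpropagation: update statistics on the path to the root.
        while node is not None:
            node.visits += 1
            if result == 1 - node.state[1]:  # player who moved here won
                node.wins += 1
            node = node.parent
    # Choose the move with the highest win ratio.
    return max(root.children, key=lambda ch: ch.wins / ch.visits).move
```

In this game the winning strategy is to leave the opponent a multiple of three stones, and the search finds it: `mcts((4, 0))` returns `1` (take one stone, leaving three). Note that each node's win count is kept from the perspective of the player who moved into that node, so the UCT selection at each level automatically optimizes for the player whose turn it is.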
The picture below shows an example of one iteration of the steps above.
In the picture, white nodes represent states where it is player 1's turn, and black nodes represent states where it is player 2's turn. The number in each node represents the number of wins over the total number of rollouts that have passed through that node.
Implementations
If you would like to try it out, I recommend checking one of these libraries:
Python: https://github.com/int8/monte-carlo-tree-search
JavaScript: https://github.com/dbravender/mcts