AI: Monte Carlo Tree Search (MCTS)

Pedro Borges
Aug 22, 2019


Monte Carlo Tree Search is a fancy name for an Artificial Intelligence algorithm used especially in games. AlphaGo famously combined this algorithm with neural networks, but MCTS had been used in many other applications before that.

Here I explain what the algorithm is and how it works.

What is Monte Carlo Tree Search

MCTS, as the name says, is a way of searching a tree. Within this tree, nodes represent states, and the arcs between nodes represent the choices that would take the game from one state to another.

The image below shows the game of Tic-Tac-toe and how we can represent its possible states as a tree.

Example of a state tree for tic-tac-toe.
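To make the idea concrete, here is a minimal sketch of such a state tree in Python. The board encoding (a 9-character string) and the class name are illustrative assumptions, not part of any standard representation:

```python
class StateNode:
    """One node of a game-state tree: a board plus the move that produced it."""

    def __init__(self, board, parent=None, move=None):
        self.board = board      # e.g. "X.O......" for a 3x3 tic-tac-toe board
        self.parent = parent    # the state we came from (None for the root)
        self.move = move        # board index (0-8) played to reach this state
        self.children = []      # states reachable in one move

    def expand(self, player):
        """Create one child node per legal move for `player`."""
        for i, cell in enumerate(self.board):
            if cell == ".":
                child_board = self.board[:i] + player + self.board[i + 1:]
                self.children.append(StateNode(child_board, self, i))

root = StateNode("." * 9)   # empty board
root.expand("X")
print(len(root.children))   # 9 possible opening moves
```

Each arc in the figure corresponds here to the `move` stored on a child node.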

The Monte Carlo part of the name is what distinguishes it from other kinds of tree search. Monte Carlo algorithms are a broader class of probabilistic algorithms: they normally return only an approximation of the actual result, but in many cases they are proven to converge to the true result as the number of samples goes to infinity.

The classical example of a Monte Carlo algorithm is the one used to approximate the number pi. Draw a square with sides of length 1 and inscribe a circle in it. The area of the square is 1, while the area of the inscribed circle is pi*(1/2)² = pi/4, so the ratio of the circle's area to the square's area is pi/4. To estimate pi, place points uniformly at random within the square. The fraction of points that fall inside the circle approximates that ratio of areas, and multiplying it by 4 gives an approximation of pi.

The figure below illustrates this algorithm for estimating pi, using only a quarter of the circle.

Estimation of pi using the ratio of the areas of the square and the circle, computed with a Monte Carlo method.
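The estimation described above can be written in a few lines of Python. This sketch uses the quarter-circle variant from the figure; the function name and sample count are my own choices:

```python
import random

def estimate_pi(num_points=100_000):
    """Estimate pi by sampling uniform points in the unit square.

    The fraction of points inside the quarter circle of radius 1
    approximates the ratio of areas, pi/4.
    """
    inside = 0
    for _ in range(num_points):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:   # point lies inside the quarter circle
            inside += 1
    return 4.0 * inside / num_points

print(estimate_pi())  # close to 3.14; accuracy improves with more samples
```

Like MCTS itself, the answer is only approximate, but it converges as the number of samples grows.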

How MCTS works

The algorithm can be described in three steps:

  1. Selection: From the current state of your game, descend the tree of already explored states. There are several strategies for choosing which explored child to follow, such as the Upper Confidence Bound applied to Trees (UCT), which will not be discussed here. When we reach a state with unexplored moves, we expand the tree with one new node and move to the rollout phase.
  2. Rollout: From this newly expanded state, simulate successive states until a Win/Draw/Loss is reached. In a pure Monte Carlo method, each successive move can be picked at random, so we keep choosing random moves until the game ends. After reaching the end, we backpropagate the result.
  3. Backpropagation: This is performed after we finish one rollout, i.e., after we reach a terminal Win/Draw/Loss state. We update the Win/Draw/Loss statistics of every state we passed through on the way to this terminal state.

We can execute the above steps for as long as we have time. Then, when it is time to choose a move, we can pick the one with the highest win ratio.

The picture below shows an example of one iteration of the steps above.

One iteration of MCTS.

In the picture, white nodes represent states in which it is player 1’s turn, and black nodes represent states in which it is player 2’s turn. The numbers in each node show the number of wins over the total number of rollouts that passed through that node.
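The three steps above can be sketched end to end in Python. To keep the example short, it plays a simple Nim-like game (players alternately take 1 or 2 stones from a pile; whoever takes the last stone wins) rather than tic-tac-toe. The game, the node layout, and the exploration constant are illustrative assumptions, not the implementation of any particular library:

```python
import math
import random

class Node:
    def __init__(self, state, to_move, parent=None):
        self.state = state        # stones left in the pile
        self.to_move = to_move    # player about to move: +1 or -1
        self.parent = parent
        self.children = {}        # move -> child Node
        self.wins = 0             # wins for the player who moved INTO this node
        self.visits = 0

def legal_moves(state):
    return [m for m in (1, 2) if m <= state]

def uct_child(node, c=1.4):
    # Selection policy: Upper Confidence Bound applied to Trees (UCT)
    return max(node.children.values(),
               key=lambda ch: ch.wins / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(state, to_move):
    # Rollout: play uniformly random moves until the pile is empty
    while True:
        state -= random.choice(legal_moves(state))
        if state == 0:
            return to_move        # this player took the last stone and wins
        to_move = -to_move

def mcts(root_state, to_move, iterations=5000):
    root = Node(root_state, to_move)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while every move of the node is explored
        while node.state > 0 and len(node.children) == len(legal_moves(node.state)):
            node = uct_child(node)
        if node.state > 0:
            # Expansion: add one unexplored child, then roll out from it
            move = random.choice([m for m in legal_moves(node.state)
                                  if m not in node.children])
            node.children[move] = Node(node.state - move, -node.to_move, node)
            node = node.children[move]
        # 2. Rollout (a terminal node already has a known winner)
        winner = rollout(node.state, node.to_move) if node.state > 0 else -node.to_move
        # 3. Backpropagation: update statistics along the path to the root
        while node is not None:
            node.visits += 1
            if winner == -node.to_move:   # a win for the player who moved here
                node.wins += 1
            node = node.parent
    # Play the most-visited move at the root
    return max(root.children, key=lambda m: root.children[m].visits)

print(mcts(4, +1))  # with 4 stones, taking 1 leaves the opponent losing
```

With enough iterations, the win statistics concentrate on the strong moves, which is exactly the convergence behavior described above.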

Implementations

If you would like to try it out, I recommend checking out one of these libraries:

Python: https://github.com/int8/monte-carlo-tree-search

Javascript: https://github.com/dbravender/mcts
