Optimal placement of boxes in a container: Reinforcement Learning with Markov Chain Monte Carlo Search Tree in Python, Part 2

judopro
3 min read · Aug 15, 2019


In Part 1 of this series we talked about Monte Carlo Tree Search (MCTS) and how it relates to our problem at hand. Now we will implement it in python.

The most important steps of MCTS are Selection, Expansion, Simulation, and Backpropagation. Let’s see how we can implement them…

The full code can be found on my GitHub.

  • At each iteration, we start by selecting a node from the root.
  • Then we roll out the regular policy and get the total reward via the getCurrentCBM method of the container class.
  • Then we register that reward on our tree node.
  • At selection, if the node is not a leaf, we check whether all of the options under it have already been added as child nodes (isFullyExpanded).
  • If so, we return the best child (see Part 1 about the explorationConstant).
  • If it is not fully expanded, we expand this node.
  • At the expansion step, we pick a random box out of all the remaining ones
  • and add it as a new child node under the parent.
  • While getting the best leaf, we first determine which child has accumulated the most reward so far and return that.
  • But as explained in Part 1, we need a little exploration vs. exploitation. So, using the exploration constant, we decide whether to return the most rewarding child or a random one, which lets the tree build more paths to consider and compare (see the sketch after this list).
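
Here is a minimal sketch of what those pieces can look like. The names isFullyExpanded and explorationConstant come from the steps above; the class layout and method bodies are illustrative and may differ from the code in the repository.

```python
import random

class TreeNode:
    """Search-tree node. Attribute and method names mirror the post;
    the bodies below are an illustrative sketch."""

    def __init__(self, box=None, parent=None, remaining_boxes=None):
        self.box = box                                      # box placed at this node (None for the root)
        self.parent = parent
        self.children = []
        self.remaining_boxes = list(remaining_boxes or [])  # boxes not yet placed along this path
        self.total_reward = 0.0                             # accumulated CBM from rollouts through this node
        self.visits = 0

    def isFullyExpanded(self):
        # Every remaining box already has a child node under this one.
        return len(self.children) == len(self.remaining_boxes)

    def select(self, explorationConstant):
        # Walk down the tree; expand the first node that still has untried boxes.
        node = self
        while node.remaining_boxes:
            if node.isFullyExpanded():
                node = node.bestChild(explorationConstant)
            else:
                return node.expand()
        return node

    def expand(self):
        # Pick a random box that does not have a child node yet and add it under this node.
        tried = [child.box for child in self.children]
        untried = [b for b in self.remaining_boxes if b not in tried]
        box = random.choice(untried)
        remaining = list(self.remaining_boxes)
        remaining.remove(box)
        child = TreeNode(box=box, parent=self, remaining_boxes=remaining)
        self.children.append(child)
        return child

    def bestChild(self, explorationConstant):
        # Exploration vs. exploitation: with probability explorationConstant return a
        # random child, otherwise the child with the highest accumulated reward.
        if random.random() < explorationConstant:
            return random.choice(self.children)
        return max(self.children, key=lambda c: c.total_reward)

    def backpropagate(self, reward):
        # Register the rollout reward on every node along the path back to the root.
        node = self
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
```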

Putting it together:
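
Below is a minimal sketch of the overall search loop, tying the pieces above together. It assumes a Container class with reset and placeBox methods in addition to the getCurrentCBM method mentioned above; those first two names are placeholders for illustration and the real class in the repository may differ.

```python
import random

def run_mcts(container, boxes, iterations=25000, explorationConstant=0.25):
    """Illustrative MCTS loop: select, expand, roll out, backpropagate."""
    root = TreeNode(remaining_boxes=boxes)
    best_reward, best_sequence = 0.0, []

    for _ in range(iterations):
        # 1. Selection + expansion: walk down the tree and add a new child node.
        node = root.select(explorationConstant)

        # 2. Simulation: replay the boxes chosen on this path, then roll out the
        #    remaining boxes with the regular (random) policy.
        container.reset()                       # assumed helper: empty the container
        sequence = path_from_root(node)
        for box in sequence:
            container.placeBox(box)             # assumed helper: place one box if it fits
        rollout = list(node.remaining_boxes)
        random.shuffle(rollout)
        for box in rollout:
            container.placeBox(box)

        # 3. Reward: total volume fitted, in CBM.
        reward = container.getCurrentCBM()

        # 4. Backpropagation: register the reward along the selected path.
        node.backpropagate(reward)

        if reward > best_reward:
            best_reward, best_sequence = reward, sequence + rollout

    return best_reward, best_sequence


def path_from_root(node):
    # Boxes placed along the path from the root down to this node.
    boxes = []
    while node is not None and node.box is not None:
        boxes.append(node.box)
        node = node.parent
    return list(reversed(boxes))
```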

Visualizing the Results:

We ran this model for 25,000 iterations with an initial 300 boxes and an exploration constant of 0.25.

The model was able to fit 14.33 CBM with 207 boxes.

The sequential business logic fitted only 12.48 CBM with 177 boxes total.

The random business logic fitted only 12.27 CBM with 174 boxes total.

That’s about a 15% improvement (14.33 CBM vs. 12.48 CBM) over the current business policy with around a hundred lines of code, not bad at all!

Numbers are all good, but can we visualize this? The answer is yes. I used Plotly to render the container and boxes from the placements the model saved in the JSON file. Each box gets its own color to make it easy to differentiate.
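
For the rendering, here is a minimal sketch of drawing each placed box as a colored cuboid with Plotly. It assumes the JSON file holds a list of boxes with position (x, y, z) and size (dx, dy, dz) fields; the actual schema saved by the model may differ.

```python
import json
import random
import plotly.graph_objects as go

def box_mesh(x0, y0, z0, dx, dy, dz, color):
    # Build a Mesh3d cuboid from its corner position and dimensions.
    x = [x0, x0, x0 + dx, x0 + dx, x0, x0, x0 + dx, x0 + dx]
    y = [y0, y0 + dy, y0 + dy, y0, y0, y0 + dy, y0 + dy, y0]
    z = [z0, z0, z0, z0, z0 + dz, z0 + dz, z0 + dz, z0 + dz]
    return go.Mesh3d(
        x=x, y=y, z=z,
        # Triangle indices for the 12 faces of a cuboid (from the Plotly Mesh3d docs).
        i=[7, 0, 0, 0, 4, 4, 6, 6, 4, 0, 3, 2],
        j=[3, 4, 1, 2, 5, 6, 5, 2, 0, 1, 6, 3],
        k=[0, 7, 2, 3, 6, 7, 1, 1, 5, 5, 7, 6],
        color=color, opacity=0.9,
    )

def render_placements(json_path):
    # Assumed schema: a list of placed boxes with position and size fields.
    with open(json_path) as f:
        placements = json.load(f)

    meshes = []
    for box in placements:
        color = "rgb({},{},{})".format(*[random.randint(0, 255) for _ in range(3)])
        meshes.append(box_mesh(box["x"], box["y"], box["z"],
                               box["dx"], box["dy"], box["dz"], color))
    go.Figure(data=meshes).show()
```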

Once we render both the optimized and non-optimized placements, we can see the difference as well…

Conclusion:

We implemented Monte Carlo Tree Search to address a real-world problem, and even with this simplest of implementations we were able to provide a roughly 15% improvement over the current business policy. There are more optimizations that can be made to further improve the results, which we will be working on next with our clients.

I enjoyed writing this article as it gave me a much better understanding of MCTS than just reading about it would have. As you know, the best way to learn something is to do it yourself! I hope you enjoyed this article as much as I did. Comments and questions are welcome!
