
Learning Machine Learning: Roulette with Monte Carlo Policy

Gary Butler
Oct 24, 2018 · 4 min read

Every week or so I push myself with a new deep learning or machine learning problem. I’ve been coding ML daily for 124 days now. This week’s challenge was to practice what I’ve been learning in Move 37, the deep reinforcement learning course offered for free by Siraj Raval. We’ve been covering Monte Carlo methods and have seen an example using OpenAI gym’s blackjack environment. So let’s get right down to it and code roulette with the Monte Carlo technique.

Imports: gym for roulette, numpy for the math, and matplotlib to graph the results. Make our roulette environment. Set epsilon to 0.05, so 5% of the time we choose a random action. Set gamma to 1, so rewards are not discounted; since every spin of the wheel is independent, there is no future to plan for anyway. Initialize everything else to 0 or empty, sized according to the OpenAI gym roulette documentation.
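The original snippets aren’t reproduced here, so below is a minimal sketch of that setup. The names (EPS, GAMMA, Q, returns) are my own, and it assumes the Roulette-v0 environment and the classic gym API from 2018.

```python
import gym
import numpy as np
import matplotlib.pyplot as plt   # for graphing results later

env = gym.make('Roulette-v0')     # roulette this time, not blackjack

EPS = 0.05      # explore: 5% of the time pick a random action
GAMMA = 1.0     # rewards are not discounted

n_states = env.observation_space.n   # Roulette-v0 has a single state
n_actions = env.action_space.n       # 37 bets plus one "walk away" action

Q = np.zeros((n_states, n_actions))  # action-value estimates, all start at zero
returns = {(s, a): [] for s in range(n_states) for a in range(n_actions)}  # sampled returns
```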

Randomly initialize our policy.
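Continuing the sketch, the starting policy just maps each state to an arbitrary action:

```python
# one random action per state to begin with
policy = {s: np.random.randint(n_actions) for s in range(n_states)}
```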

A million training episodes should be plenty to learn a good policy. Initialize some variables to use each episode, check our progress every hundred thousand episodes, and reset the environment.
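A sketch of that loop skeleton, continuing with the names above:

```python
n_episodes = 1000000

for episode in range(n_episodes):
    if episode % 100000 == 0:
        print('starting episode', episode)   # progress check
    memory = []          # (state, action, reward) tuples for this episode
    state = env.reset()
    done = False
```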

Until the game is done, take an action based on our Monte Carlo policy and record the results. An action is either placing a bet or getting up from the table.
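Inside that loop, an epsilon-greedy step might look like this (in Roulette-v0 the last action, index 37, is getting up from the table, which ends the episode):

```python
    while not done:
        if np.random.random() < EPS:
            action = env.action_space.sample()   # occasionally explore
        else:
            action = policy[state]               # otherwise follow the current policy
        next_state, reward, done, info = env.step(action)
        memory.append((state, action, reward))
        state = next_state
```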

Step back through the memory to record the rewards based on the previous state/action pairs.
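Still inside the episode loop, walk the memory backwards and accumulate the return for each visited state/action pair (undiscounted, since gamma is 1):

```python
    # accumulate the return G backwards through the episode
    G = 0
    pairs_with_returns = []
    for state_t, action_t, reward_t in reversed(memory):
        G = GAMMA * G + reward_t
        pairs_with_returns.append((state_t, action_t, G))
    pairs_with_returns.reverse()   # back to chronological order
```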

This next part is where the Monte Carlo decision process happens. It can look intimidating, but I’ll try to explain. We go through every state/action pair and the return earned after taking that action in that state. For the first visit to a state/action pair in an episode, we update its average return and compare it against the other actions from that state to determine the best action, choosing randomly in case of a tie. In the beginning we are more likely to try new actions, but over time epsilon diminishes and we take only the best actions we already know.
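Here is one way that first-visit Monte Carlo control step can look in this sketch: only the first visit to each state/action pair in the episode counts, the average of its sampled returns becomes the new Q value, the policy picks the best known action (breaking ties at random), and epsilon shrinks a little each episode.

```python
    # first-visit Monte Carlo update followed by greedy policy improvement
    seen = set()
    for state_t, action_t, G in pairs_with_returns:
        if (state_t, action_t) in seen:
            continue
        seen.add((state_t, action_t))
        returns[(state_t, action_t)].append(G)
        Q[state_t, action_t] = np.mean(returns[(state_t, action_t)])

        best_value = np.max(Q[state_t])
        best_actions = np.where(Q[state_t] == best_value)[0]
        policy[state_t] = np.random.choice(best_actions)   # random tie-break

    EPS = max(EPS - 1e-7, 0.0)   # explore less and less over time
```

Keeping every sampled return in a list gets memory-hungry over a million episodes; an incremental running mean would work just as well.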

Test our trained Monte Carlo policy.
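For testing, play a batch of games greedily, with no exploration, and tally the outcomes:

```python
n_test_games = 1000
wins = losses = walked_away = 0

for _ in range(n_test_games):
    state = env.reset()
    done = False
    total_reward = 0
    while not done:
        state, reward, done, info = env.step(policy[state])
        total_reward += reward
    if total_reward > 0:
        wins += 1
    elif total_reward < 0:
        losses += 1
    else:
        walked_away += 1
```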

Print the results.
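And the summary:

```python
print('wins:        %.1f%%' % (100.0 * wins / n_test_games))
print('losses:      %.1f%%' % (100.0 * losses / n_test_games))
print('walked away: %.1f%%' % (100.0 * walked_away / n_test_games))
```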

The results are clear: zero wins and zero losses. The best way to win at roulette is not to play at all. The Monte Carlo policy makes the cheeky decision to leave the table every game, and it is correct. The house always wins; it’s a losing game. Just for fun, let’s see what happens if we force our Monte Carlo policy to play by changing one line of code.

Remove the option of getting up from the table.
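I don’t have the original line, but in this sketch the equivalent change is in the action selection inside the episode loop: pick only from the 37 betting actions (0–36) and never from the final "walk away" action, in both the random and the greedy branch.

```python
    # consider only the betting actions (0-36), never "walk away" (index 37)
    betting_actions = n_actions - 1
    if np.random.random() < EPS:
        action = np.random.randint(betting_actions)
    else:
        action = int(np.argmax(Q[state, :betting_actions]))
```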

Now let's see what happens.

It wins 2.5% of the time and loses 97.5% of the time. The first run was clearly on to something. This week I’ve learned not to play roulette, but more importantly, I’ve learned how to use a Monte Carlo policy to solve problems with unknown variables.

Thanks to Siraj Raval for the free deep RL course, and thanks to the YouTube channel Machine Learning with Phil for helping me understand Monte Carlo policy in OpenAI gym.
