Monty Hall Done Right

The Monty Hall problem is solved with an arbitrary number of cars and goats using conditional probabilities. As a bonus, I include a colab notebook that solves this problem by a multi-armed bandit strategy.

Borun Chowdhury Ph.D.
Analytics Vidhya
5 min readDec 8, 2021

--

Theoretical Derivation

There is no dearth of Monty Hall problem descriptions. However, I think that looking at the same problem of 1 car and 2 goats repeatedly one gets bogged down by the details of that particular example and am reminded of Bruce Lee quoting a Zen parable in the Enter the Dragon saying

It is like a finger pointing away to the moon, don’t concentrate on the finger or you’ll miss all the heavenly glory

So here I solve the Monty Hall problem with an arbitrary number of cars and goats and use conditional probabilities math. Furthermore, I leave the host behavior most general till the last step so that the difference between an ignorant and an all knowing host can be highlighted.

https://unsplash.com/photos/3c_EjT6uAFo

This is not particularly hard to follow if you know conditional probabilities but otherwise will likely cause unnecessary ulcers. You have been warned!

Now lets take a situation where there are nc cars and ng goats and the setup is that you choose a door at random. The host throws open one of the other doors revealing its contents. Should you switch?

So we want to know

the probability of winning a car if we switch.

Using conditional probabilities this is

where the notation P(C|S,C,G), for instance, means the probability to get a Car (C ) if you happened to have chosen a door with a goat (G) in the beginning and then the host threw open a door with a car (C ) and then you switched (S). The conditional part of the probability should be read right to left for notation (the actual chronology does not matter for the computation). The other probability P(C,G) is that of you having chosen a door with a goat (G) behind it and the host throwing open a door with a car (C ) behind it.

The above expression can be further simplified by use of relations like

which says that the probability of you choosing a goat (G) and the host throwing open a door with a car (C ) behind it is the product of probabilities of you choosing a goat P(G) and the host choosing a car (C ) post facto you choosing a goat. I use the symbol π(C|G) for it as its an action by the host but the math doesn’t care about that.

With this we get

Finally using the fact that we have nc cars and ng goats we have

where we have defined n = nc+ng.

Up to here we have the most general situation with the only constraint being that the host opens only one door. This is where the Monty Hall problem comes in. If the host doesn’t know what’s behind any of the doors then the probabilities of opening any of the doors you have not picked is equal giving π(G|G) = (ng-1)/(n-1), π(C|G) = nc/(n-1), π(G|C) = ng/(n-1) and π(C|C) =(nc-1)/(n-1) and if you do the math you will get P(C|S) = n_c/n as expected.

However, if the host knows what’s behind each door and purposefully opens a door with a goat behind it then we have π(C|C) = π(C|G) =0 and π(G|G)= π(G|C) =1. With these we get

which for nc=1 and n=3 gives the famous result 2/3. Note that n≥3 and nc≤n-1 (because the host picks a goat necessarily) and with these the above expression P(C|S)<1 as should be. In fact we can write

where the first part on the right hand side is the probability of P(C|S) when the host doesn’t know what’s behind each door. Since n-1 > n-2 we see that it is always profitable to switch if the host selects a door with a goat behind it on purpose.

Multi Armed Bandit

If we have an agent who does not understand conditional probabilities or simply does not like math and would just like to learn to switch or not based on explore and exploit, she could play this game repeatedly while exploring by sometimes switching and at other times not to understand the reward distribution. At the same time, not only interested in the academic knowledge of the distributions, she wants to exploit this knowledge to get many BMWs, Ferraris and Rolce Royces. She has to balance exploration (switching or not) with exploitation (performing the action that wins her more cars). This is the classic problem of multi arm bandit.

The notebook for the analysis can be found here.

Below is the plot for average reward (goat=0 and car=1) by the agent that uses an epsilon greedy approach to switch or not. The dashed lines are the theoretical values I derived above.

Multi Armed Bandit solution to the Monty Hall Problem

Conclusion

To highlight the actual crux of the Monty Hall problem, I solved it for arbitrary number of cars and goats. The result reduces to the famous value of 2/3 when the number of cars and goats is 1 and 2 respectively. Furthermore, by writing the expression with the role of the host explicit, I showed how the ignorant host leads to the answer most people naively expect and how the true answer is related to the naive ignorant Monty solution. Finally, I provided a link to a colab notebook that solves the problem using a Multi Armed Bandit strategy and confirm the theoretical results match the results from the simulation.

--

--

Borun Chowdhury Ph.D.
Analytics Vidhya

Formless and shapeless pure consciousness masquerading as a machine learning researcher, a theoretical physicist and a quant.