ByteDance’s Monte Carlo interview question - revisited

Barber Alec
6 min read · Jun 5, 2020

Author Homo Sapiens recently posted an interesting story on ByteDance’s interview question. The question asks the applicant to find the probability of an unlikely event, and implies the use of Monte Carlo methods with the added challenge of a limit on the number of iterations allowed. Before reading this story, I encourage you to read Homo Sapiens’ article first here.

This article builds on Homo Sapiens’ story, contributing another method of solving the problem. A further ‘hierarchical’ Monte Carlo simulation is then performed to compare the two methods. All code used in this article is found here.

Now that that’s all cleared up, let’s define the problem.

10 small balls are randomly divided into 12 boxes. You need to find the probability that exactly 10 boxes are empty with a program. The program is required to simulate 100,000 times in order to calculate the probability with “brute-force”.

The solution to this problem can be determined via traditional probabilistic methods, as shown below, and comes to ~1.0894 × 10⁻⁶. It is a good idea to do this first so we can later benchmark our Monte Carlo simulations against the “correct answer”.

The probability that exactly 10 boxes are empty is ~1.0894 × 10⁻⁶.
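For reference, the counting argument behind this figure: choose which 2 of the 12 boxes are occupied, count the ways 10 distinguishable balls can fill those 2 boxes with neither left empty, and divide by the 12¹⁰ equally likely assignments:

$$P(A) = \frac{\binom{12}{2}\,(2^{10}-2)}{12^{10}} = \frac{66 \times 1022}{61{,}917{,}364{,}224} \approx 1.0894 \times 10^{-6}$$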

Of course, the question posed to the prospective applicant is to arrive at this result via a computational simulation. Naturally, the first approach will be to adopt a naive Monte Carlo simulation without any intelligent hacks or adaptations. Unfortunately, it will quickly be found that almost every attempt returns a probability of 0.

Naive implementation. Note randint takes the range input as inclusive (i.e. [0,11])
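A minimal sketch of this naive approach might look like the following (Python’s random.randint is inclusive on both bounds, per the note above):

```python
import random

def naive_simulation(n_iters=100_000, n_balls=10, n_boxes=12):
    """Estimate P(exactly 10 of 12 boxes are empty) by brute force."""
    successes = 0
    for _ in range(n_iters):
        boxes = [0] * n_boxes
        for _ in range(n_balls):
            boxes[random.randint(0, n_boxes - 1)] += 1  # inclusive range [0, 11]
        if boxes.count(0) == n_boxes - 2:  # exactly 10 of the 12 boxes empty
            successes += 1
    return successes / n_iters

print(naive_simulation())  # almost always prints 0.0
```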

To explain why this 0 probability result occurs when the answer is known to be non-zero, we must consider the expected number of successes of the event over the 100,000 iterations. Firstly, let us define A as the event of exactly 10 of 12 boxes being empty, and define N as the random variable describing how many successes of A occur in 100,000 iterations of the experiment.

The expected number of occurrences of A in 100,000 iterations is much less than one.
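Since N is binomial with n = 100,000 independent trials and success probability P(A):

$$\mathbb{E}[N] = n\,P(A) \approx 100{,}000 \times 1.0894 \times 10^{-6} \approx 0.109$$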

The number of times we would expect to see the event A occur is much less than one. This is a problem, as even if the event does occur, it will not result in an accurate probability. Consider the scenario where the event A occurs exactly once. In this case, the resulting probability from our simulation would be 1/100,000, or 10⁻⁵, which is roughly a factor of 10 larger than the true value.

The obvious solution here is to increase the number of Monte Carlo iterations to allow more opportunities for the event to occur. Unfortunately, we are bound by the question and cannot have more than the allocated 100,000 iterations.

Author Homo Sapiens proposes an interesting approach to solving this problem via the introduction of a new event B. Both A and B are (re)defined as follows:

A: 10 small balls are randomly divided into 12 boxes such that exactly 10 boxes are empty.

B: 5 of the 10 small balls are randomly divided into 12 boxes such that at least 10 boxes are empty.

Homo Sapiens outlines that P(B|A) = 1 and follows with the corollary P(A) = P(AB) = P(A|B)P(B). This is a smart method of breaking the problem into two separate tasks (i.e. finding P(A|B) and P(B)), where each event has a higher probability than A and so does not suffer from the aforementioned problem of low expected event occurrences.
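Spelling the corollary out: P(B|A) = 1 means A can only occur together with B, so

$$P(A \cap B) = P(B \mid A)\,P(A) = P(A) \quad\Longrightarrow\quad P(A) = P(A \cap B) = P(A \mid B)\,P(B)$$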

Unfortunately, conducting Monte Carlo experiments for the A|B simulation is non-trivial. This event can be described in words as follows:

Given that 1 or 2 of the 12 boxes already contain small balls, the remaining 5 small balls, randomly divided into the 12 boxes, will result in exactly 2 boxes containing balls.

Sampling this conditional event is ambiguous. A workaround is to use the record of end states produced by the P(B) Monte Carlo simulation. Unfortunately, there are typically very few usable state realisations available, and potentially not enough to accurately model the system conditions for the conditional Monte Carlo simulation. Besides this, because two separate simulations are required here, the iteration budget must be split between the two tasks.

Instead, let us define two new events as follows:

C: 5 of the 10 small balls are randomly divided into 12 boxes such that exactly 10 boxes are empty.

D: 5 of the 10 small balls are randomly divided into 12 boxes such that exactly 11 boxes are empty.

We can now adopt the theorem of total probability (TOTP) to calculate the probability of A via the conditional probabilities with C and D. Notice that we now have 4 distinct probabilities to calculate instead of 2. The main difference is that the simulation of the conditional probabilities, P(A|C) and P(A|D), is far easier and, more importantly, can be performed in parallel.

P(A) can be described via the theorem of total probability. Note the term conditioned on neither C nor D occurring is excluded, as it has a 0 probability: if the first 5 balls occupy 3 or more boxes, A can no longer occur.
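Written out:

$$P(A) = P(A \mid C)\,P(C) + P(A \mid D)\,P(D) + \underbrace{P\big(A \mid (C \cup D)'\big)\,P\big((C \cup D)'\big)}_{=\,0}$$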

Computing in parallel permits us to use all 100,000 iterations to generate each of the component probabilities, instead of ‘splitting’ the runs to generate the probabilities in turn. Furthermore, performance is also increased, as only 5 balls are ever thrown in our simulation! There is also no need to store any historical states in memory. This is ideal, as memory fetching is slow in Python!
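The full implementation is in the linked code; a minimal sketch of the idea (function names here are my own, and numpy is assumed for the random throws) could look like this:

```python
import numpy as np

def totp_simulation(n_iters=100_000, n_boxes=12, seed=None):
    """Estimate P(A) = P(A|C)P(C) + P(A|D)P(D) from one stream of throws."""
    rng = np.random.default_rng(seed)
    c = d = a_given_c = a_given_d = 0
    for _ in range(n_iters):
        # One throw of 5 balls, reused to score all four probabilities.
        occupied = set(rng.integers(0, n_boxes, size=5))
        if len(occupied) == 2:    # C: exactly 10 of 12 boxes empty
            c += 1
        elif len(occupied) == 1:  # D: exactly 11 of 12 boxes empty
            d += 1
        # A|C: boxes {0, 1} are deemed already occupied; A occurs iff
        # all 5 balls land in those same two boxes.
        if occupied <= {0, 1}:
            a_given_c += 1
        # A|D: box {0} is deemed already occupied; A occurs iff exactly
        # one additional box becomes occupied.
        if len(occupied | {0}) == 2:
            a_given_d += 1
    n = n_iters
    return (a_given_c / n) * (c / n) + (a_given_d / n) * (d / n)

print(totp_simulation())  # ~1.09e-6 on average
```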

In the code above, note how we only need to work from one array of throws when simulating the ball throwing. We can hack out the result for each probability of interest by making either 0, 1 or 2 boxes certainly not empty. In this implementation, we always make the same boxes not empty; this does not affect the result.

After completing the ‘TOTP’ implementation, I was curious to see how it stacked up against Homo Sapiens’ original solution (henceforth referred to as ‘H-Sapien’). Interestingly, a hierarchical Monte Carlo experiment can be conducted to test the relative performance. This can be mind-bending to think about. Ideally, every Monte Carlo simulation we run returns the same result, given that sufficient iterations are undertaken. In our scenario, only 100,000 iterations are provided. Given that this is indeed not enough iterations, the result of each Monte Carlo simulation will be variable. We wish to find the implementation that is more likely to give an answer close to our analytical solution.

We define two new random variables, T and H, describing the outputs of the TOTP and H-Sapien implementations respectively. The realisations of T and H are presumed Gaussian and are modeled below. For the experiment, 3000 iterations of each implementation are used. The implementation of H-Sapien is taken directly from the story as found here.
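As a sketch of this experiment (assuming the totp_simulation from above, plus an analogous h_sapien_simulation for the original method):

```python
import numpy as np

TRUE_P = 1.0894e-6  # analytical solution from earlier

# 3000 realisations of the TOTP estimator; the H-Sapien estimator is
# sampled the same way with its own simulation function.
t = np.array([totp_simulation() for _ in range(3000)])

print("mean:    ", t.mean())                    # should sit near TRUE_P
print("variance:", t.var())
print("MSE:     ", np.mean((t - TRUE_P) ** 2))  # error vs the analytical solution
```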

PDF of T and H. The expected value of both implementations of the MC simulations are the same, but the variance is lower for TOTP implementation.
Mean squared error between the MC estimate of P(A) and the analytical solution.

Interestingly, we observe that both Monte Carlo implementations achieve the same, correct expected outcome. The TOTP implementation results in a lower variance, meaning that one is less likely to get an outlier from it. Unsurprisingly, then, the mean squared error (MSE) of the TOTP implementation is also lower.

Thank you for reading; if you have any feedback, I would love to hear it! Thanks to Rob for looking over the code for me, and thanks to author Homo Sapiens.
