Another Approach to Solve Urn of Mystery

Jeremy Song
Published in Stochastic Stories
Dec 19, 2018

The Urn of Mystery is an example from the book How to Measure Anything: Finding the Value of Intangibles in Business. The author, Douglas Hubbard, uses it to illustrate a rule he calls the Single Sample Majority Rule.

The Single Sample Majority Rule

Given maximum uncertainty about a population proportion (such that you believe the proportion could be anything between 0% and 100%, with all values equally likely), there is a 75% chance that a single randomly selected sample is from the majority of the population.

The Urn of Mystery is a contrived example that illustrates this rule. Suppose we have a warehouse full of large urns. Each urn is filled with marbles, and each marble is either green or red. The percentage of marbles that are green can be anything between 0% and 100%, with all percentages equally likely; the rest of the marbles in the urn are red. We pick one urn at random from the entire warehouse. If we draw one marble from that urn and it is green, what is the probability that the majority of the marbles in this urn are green?

The answer is 75%, and the author arrives at it with simple math based on Bayes' theorem.
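
As an independent sanity check (not the book's calculation), a quick Monte Carlo simulation reproduces the 75% figure: fill many urns with a uniformly random green proportion, draw one marble from each, keep only the urns whose marble came out green, and count how often green is the majority. This is a minimal sketch assuming NumPy; the function name simulate_urn_of_mystery, the number of urns, and the seed are illustrative choices.

```python
import numpy as np

def simulate_urn_of_mystery(n_urns: int = 1_000_000, seed: int = 0) -> float:
    """Estimate P(majority green | one green marble drawn) by simulation."""
    rng = np.random.default_rng(seed)
    # Each urn's green proportion is uniform on [0, 1] (all percentages equally likely).
    theta = rng.uniform(0.0, 1.0, size=n_urns)
    # Draw one marble per urn: it is green with probability theta.
    first_is_green = rng.uniform(0.0, 1.0, size=n_urns) < theta
    # Condition on a green draw, then check whether green is the majority.
    return float(np.mean(theta[first_is_green] > 0.5))

print(simulate_urn_of_mystery())  # close to 0.75; exact value depends on the seed
```

With a million urns, roughly half survive the conditioning step, so the estimate typically lands within about a tenth of a percentage point of 0.75.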

Another Approach

Here I would like to propose a different way to solve this problem: a pure Bayesian approach. This solution involves a little more math, since it uses the Beta distribution and conjugate priors, but it is also more general.

The question "What is the probability that the majority of the marbles in this urn are green?" can be decomposed into two questions:

  1. What is the distribution of the percentage of green marbles in the urn, given that we drew one green marble from it?
  2. What is the probability that this percentage is greater than 0.5 (that is, green is the majority)?

The setup:

  • Let θ be the proportion of marbles in the urn that are green (a number between 0 and 1).
  • Let X be the color of the marble we draw.
  • Since every percentage is equally likely and we pick the urn at random from the entire warehouse, the prior distribution of θ is the uniform distribution U(0, 1).
  • We draw one marble from the urn. Given θ, the probability that it is green is θ, so X follows a Bernoulli distribution, Bern(θ).
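
The two modeling pieces above can be written directly with scipy.stats. This sketch codes Green as 1 in the Bernoulli draw (an encoding assumption) and spot-checks that the uniform prior density matches the Beta(1,1) density, a fact the derivation relies on, and that the likelihood of a green draw is θ itself. The helper name likelihood_green is hypothetical.

```python
from scipy.stats import bernoulli, beta, uniform

# Prior on theta (the proportion of green marbles): uniform on [0, 1].
prior = uniform(loc=0, scale=1)

def likelihood_green(theta: float) -> float:
    """P(X = Green | theta), coding Green as 1 in a Bernoulli(theta) draw."""
    return float(bernoulli.pmf(1, theta))

# Spot-check both pieces at a few values of theta.
for t in (0.1, 0.5, 0.9):
    # The uniform density is 1 on [0, 1], the same as the Beta(1, 1) density.
    assert abs(prior.pdf(t) - beta.pdf(t, 1, 1)) < 1e-12
    # The chance of drawing a green marble is theta itself.
    assert abs(likelihood_green(t) - t) < 1e-12
```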

According to Bayes' theorem:

P(θ|X=Green) ∝ P(X=Green|θ)P(θ)

U(0,1) is the same distribution as Beta(1,1). Thus,

P(θ|X=Green) ∝ Bern(θ)·Beta(1,1)
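
In explicit form, the Beta density with parameters α and β (where B is the Beta function) reduces to the constant 1 when α = β = 1, and the Bernoulli likelihood of a green draw is just θ. So the right-hand side of the proportionality above is simply θ:

```latex
\text{Beta}(\theta;\alpha,\beta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)},
\qquad
\text{Beta}(\theta;1,1) = \frac{\theta^{0}(1-\theta)^{0}}{B(1,1)} = 1 \quad (0 \le \theta \le 1)

P(X=\text{Green} \mid \theta) = \theta
\quad\Rightarrow\quad
P(\theta \mid X=\text{Green}) \propto \theta \cdot 1 = \theta
```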

The Beta distribution is the conjugate prior for the Bernoulli likelihood. This means that the posterior P(θ|X=Green) is also a Beta distribution. With a little algebra, we get P(θ|X=Green) ~ Beta(2,1).
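
The algebra for a general Beta(α, β) prior is short: a green draw multiplies the prior density by θ, and a red draw multiplies it by (1 - θ). Each observation therefore just increments one of the two parameters, and with the flat Beta(1,1) prior a single green draw gives Beta(2,1), whose density is 2θ:

```latex
\text{Green:}\quad p(\theta \mid X=\text{Green}) \propto \theta \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1}
= \theta^{(\alpha+1)-1}(1-\theta)^{\beta-1}
\;\Rightarrow\; \text{Beta}(\alpha+1,\ \beta)

\text{Red:}\quad p(\theta \mid X=\text{Red}) \propto (1-\theta) \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1}
= \theta^{\alpha-1}(1-\theta)^{(\beta+1)-1}
\;\Rightarrow\; \text{Beta}(\alpha,\ \beta+1)

\alpha=\beta=1,\ \text{one green draw:}\quad
p(\theta \mid X=\text{Green}) = \frac{\theta}{B(2,1)} = 2\theta \quad (0 \le \theta \le 1)
```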

This answers question 1: we now know the distribution of the percentage of green marbles in the urn, given that we drew one green marble.
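
If the conjugacy argument feels too slick, the posterior can also be checked numerically with a grid approximation: evaluate prior × likelihood on a fine grid over [0, 1], normalize it to integrate to 1, and compare the result with the Beta(2,1) density. This is a minimal sketch assuming NumPy and SciPy; the grid size is an arbitrary choice. It uses cell midpoints so it never evaluates the density exactly at the endpoints, and the same grid also gives the tail mass above 0.5 needed for question 2.

```python
import numpy as np
from scipy.stats import beta

n = 10_000
d_theta = 1.0 / n
# Midpoints of n equal-width cells covering [0, 1].
theta = (np.arange(n) + 0.5) * d_theta

prior = np.ones_like(theta)   # U(0, 1), i.e. Beta(1, 1), density
likelihood = theta            # P(X = Green | theta)
unnormalized = prior * likelihood

# Normalize with the midpoint rule so the posterior integrates to 1.
posterior = unnormalized / (unnormalized.sum() * d_theta)

# The grid posterior should match the analytic Beta(2, 1) density.
max_error = np.max(np.abs(posterior - beta.pdf(theta, 2, 1)))

# Posterior mass above 0.5 (green is the majority).
p_majority = posterior[theta > 0.5].sum() * d_theta

print(f"max |grid - Beta(2,1)| = {max_error:.2e}")
print(f"P(theta > 0.5 | Green) on the grid = {p_majority:.4f}")
```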

To answer question 2, all we need is P(0.5 < θ ≤ 1 | X = Green) = 1 - P(0 ≤ θ ≤ 0.5 | X = Green), which is easy to compute in Python:

>>> from scipy.stats import beta
>>> a = 2
>>> b = 1
>>> 1 - beta.cdf(0.5, a, b)
0.75
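
The same number also follows by hand. The Beta(2,1) density is 2θ on [0, 1], so its CDF has a simple closed form:

```latex
F(x) = \int_0^x 2t\,dt = x^2,
\qquad
P(0.5 < \theta \le 1 \mid X=\text{Green}) = 1 - F(0.5) = 1 - 0.25 = 0.75
```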

The author also calculates the probability that green is the majority after more marbles are drawn. These cases are easy to handle with the same approach: each additional green marble adds 1 to the first Beta parameter, and each red marble adds 1 to the second. For example:

  • If the second marble is also green: 1 - beta.cdf(0.5, 3, 1), which is 0.875.
  • If the second marble is red: 1 - beta.cdf(0.5, 2, 2), which is 0.5.
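
The bookkeeping in these examples can be wrapped in a small helper that applies the conjugate update once per draw. This is a sketch, not the book's code: the function name prob_green_majority and the "green"/"red" string encoding are hypothetical choices. It uses scipy's beta.sf, the survival function, which equals 1 - beta.cdf.

```python
from scipy.stats import beta

def prob_green_majority(draws, prior_a=1.0, prior_b=1.0):
    """P(theta > 0.5 | draws) under a Beta(prior_a, prior_b) prior.

    `draws` is a sequence of "green"/"red" strings (a hypothetical encoding).
    Each green draw adds 1 to a; each red draw adds 1 to b.
    """
    a, b = prior_a, prior_b
    for color in draws:
        if color == "green":
            a += 1
        elif color == "red":
            b += 1
        else:
            raise ValueError(f"unknown color: {color!r}")
    # beta.sf(x, a, b) is the survival function, i.e. 1 - beta.cdf(x, a, b).
    return float(beta.sf(0.5, a, b))

print(round(prob_green_majority(["green"]), 4))           # 0.75
print(round(prob_green_majority(["green", "green"]), 4))  # 0.875
print(round(prob_green_majority(["green", "red"]), 4))    # 0.5
```

Because the prior parameters are arguments, the same helper also covers priors other than Beta(1,1).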

The Prior Is Important

In this problem, the prior is the uniform distribution Beta(1,1), which carries very little information. That is why a single data point can move the probability from 50% to 75%.

But consider a similar problem: instead of an urn, we have a coin picked at random from a mint. You flip the coin once and get heads. Would you say there is a 75% probability that the coin is biased toward heads? Probably not.

When flipping coins, we usually don't assume a Beta(1,1) prior. Instead, we would probably choose Beta(100,100) or Beta(1000,1000). Choosing Beta(1000,1000) as the prior amounts to assuming that someone has already flipped coins from that mint 1998 times and got 999 heads and 999 tails, starting from a Beta(1,1) prior of their own. If we then flip a coin and get heads, the probability that the coin is biased toward heads is 1 - beta.cdf(0.5, 1001, 1000), or about 0.5089, which is hardly biased at all.
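
To see how the strength of the prior controls the update, the same one-observation calculation can be run for each prior mentioned above. In this sketch a head plays the role of a green marble, so it adds 1 to the first parameter. The dictionary name results is an illustrative choice, and beta.sf is scipy's survival function (1 - beta.cdf). The flat prior gives 0.75 and Beta(1000,1000) gives about 0.5089, matching the numbers in the text.

```python
from scipy.stats import beta

# Priors discussed in the text: the flat Beta(1, 1) and two stronger coin priors.
priors = [(1, 1), (100, 100), (1000, 1000)]

results = {}
for a, b in priors:
    # One observed head adds 1 to the first Beta parameter.
    results[(a, b)] = float(beta.sf(0.5, a + 1, b))
    print(f"Beta({a}, {b}) prior -> P(theta > 0.5 | one head) = {results[(a, b)]:.4f}")
```

The stronger the prior (the larger α + β), the less a single flip can move the probability away from 50%.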


I am currently a Principal Software Development Engineer at Amazon. All opinions are my own.