Classic Urn Color Change Problem

--

Anyone who has taken probability class in their college or school is familiar with probability problems involving the classic urn (basically a bag) setup which contains balls of different colors and the professor asks you to find the probability of occurrence of some stupid pattern. This is one of those problems.

The problem goes like this…
You are given an urn with 100 balls (50 black and 50 white). You pick balls from urn one by one without replacements until all the balls are out. A black followed by a white or a white followed by a black is “a color change”. Calculate the expected number of color changes if the balls are being picked randomly from the urn.

Color Change explained for balls taken out of the urn one by one

If you have visited my solution for The Chessboard High Five Problem, you will know the 4 key points to approach almost any expectation problem on probability, except continuous r.v.s (about which I will discuss some other day). We will try to align our problem with respect to those 4 key points.

Let X be a r.v. which is the count of the number of color changes after all the balls have been picked out. Now, we will attempt to write X in terms of indicator r.v.s.

X is the sum of 99 indicator r.v.s where the ith indicator refers to the color change indicator between (i-1)th and ith ball chosen from the urn

Taking expectation of X, we get

Using Linearity of expectation and observing that only at 99 places color change can happen

How did I write the last step? Aren’t the indicator r.v. for the ith step influenced by the values of indicator r.v.s from the previous steps, as it is a situation ‘without replacement’? Hmm.. I had these doubts, indeed the probability of the ith indicator will change with i and will depend on the values taken by the previous indicators i.e. how many black balls and white balls are already taken out, but that is only when you are calculating the probability of the ith indicator being 1 ‘given the previous indicators’. The unconditional distribution of the ith indicator is same for all. Take some time to let this fact sink in. The following diagram might help you convince as well as help you calculate the value of the probability at the final step in the above expression.

Favorable cases for I_4 = 1

There are total 100! ways in which the balls can be picked, but for 1st indicator or 2nd indicator or ith indicator to be 1 the favorable cases for each i is same, can easily be seen from the diagram above it doesn’t matter that our choice was 4th indicator.

The solution hence, can be found out by simply plugging in the probability value resulting in the following

Expected number of color changes after all balls are drawn from the urn

Simulated solution will build confidence about whether the above solution is correct or not? As well as reveal some interesting cases, if any. The python code is as follows

import random
def countColorChange():
balls = [1 for _ in range(50)] + [0 for _ in range(50)]
random.shuffle(balls) # Shuffle the 50 black and 50 white balls
prev = -1
color_change = 0
for i in range(100): # Keep on picking the balls from the urn
pick_idx = random.randint(0, len(balls)-1)
curr = balls.pop(pick_idx)
if prev != curr and i > 0: # If there is a color change count it except for the first ball pick
color_change += 1
prev = curr
return color_change
N = 10000
print(f"Average Number of color changes in {N} rounds of ball picking: {sum([countColorChange() for _ in range(N)]) / N}")

The output of the above simulation gives,

Average Number of color changes in 10000 rounds of ball picking: 49.9753

Hope, you learnt something new even from the classic urn problem. Keep visiting for more interesting questions, puzzles and data science. If you have interesting puzzles on probability and statistics, you want me to write a detailed solution on, reach me out in the comments or any social media platform.

--

--

Rishi Dey Chowdhury (RishiDarkDevil)

Aspiring ML & Quant Researcher. I have keen interest in solving complex real life data-driven problems. I enjoy traveling, drawing, cooking and music.