The birthday problem — An excursion into probability
Suppose you are hosting a party. How many people would you have to invite so that the probability of at any two people sharing a birthday is at least 50%?The birthday problem, which is also sometimes called a “paradox” is a simple illustration of our intuitions about probability.
First of all, what does it mean when one says “the probability of at any two people sharing a birthday is at least 50%”? Let’s say the solution to the problem is 25 (which it isn’t). This would imply that if I invited 25 people to my party, there is a 50% chance that any 2 people share their birthday. But this is not really intuitive. A better explanation could be that if I were enough of a socialite to conduct 100 such parties with different sets of people every year over a period of 50 years, on average, in 25 of those years, I’d have met my target. Note that the operative phrase in the preceding statement is “on average”. It is even possible (although highly improbable) that I conduct 100 parties and none of those parties resulted in any birthdays being shared. How improbable is this?
In our 100 trials, let us represent the positive case with P and the negative case with N. The outcome of each trial T can only be P or N. Some valid sequences are —
PNPNPN…..
NPNPNP…..
NNNPPP…..
Since at every position we have two choices which are equally likely, there are 2¹⁰⁰ of such sequences, which is astonishingly large number (a number with 30 digits). Out of 2¹⁰⁰ sequences, exactly one sequence consists of only N’s. The likelihood of this happening is 1 / 2¹⁰⁰ which is a vanishingly small number. The symmetrically opposite case of conducting 100 parties and on all occasions having at least two people share a birthday gives a probability of 1 / 2¹⁰⁰. These are two extreme cases in a distribution which is known as the normal distribution, recognizable by the bell curve.
Before I discuss the solution , this might be a good time to try and solve it on your own.
.
.
.
.
.
How does one start solving this?
Imagine you have a row of 365 chairs which are numbered. Also assume that you somehow “randomly” generate a token between 1 and 365 and that the person who gets a token number needs to go to the corresponding chair. The first person to get a token can get one of 365 chairs. For the second person, there are two cases — either they get the same token as the first person or they get one of the 364 empty chairs. The third person gets the same token as person one or person two, or they get one of the 363 empty chairs. If you extrapolate this, you can see how if you have 366 people coming in, the 366th person will always get a chair which is already occupied. Stated in another way, with 366 people, the probability of two people getting the same token number is 100%. Please note that 366 is the minimum number of people that need to be considered to guarantee the condition that two people get the same token. You can of course have the second person sharing the same seat as the first, but this is not guaranteed!
The above is simply a restatement of the birthday problem. We assume that birthdays are distributed evenly over the year and that there are 365 days in a year. So if you have a group of 366 people, the probability that any two of them will share a birthday is 1. Our job is to start with two people and gradually increase this number till the probability climbs to 0.5. Let’s start the computation. If you have just two people, the probability that they share a birthday, given by P(n=2) is
P(n=2) = 1 — (364/365) which is equal to 0.0027 which is about 3 in 1000.
With three people,
P(n=3) = 1 — [ (364/365) * (363/365) ] which is about 8 in 1000
With four people,
P(n=4) = 1 — [(364/365) * (363/365) * (362/365)] is about 16 in 1000
If you extrapolate this way (you can see the results in the little program I wrote), by the time you reach 23 people, the probability is 1 in 2 or 50%. By 57 people, the probability is 99%! Here is a graph that I plotted using the output of the program —
As you can see, with around 60 people, you reach a very high degree of certainty that any two people in the group will share a birthday. This was a revelation to me because I was not expecting the likelihood to grow so quickly! It is clear from the graph that the likelihood does not grow linearly. There is an explosive growth initially, followed by a linear phase, finally ending with an explosive decline in the end. For me, this completely counter intuitive and very revealing.