How Many Flips of a Coin does it Take to get Nine Heads or Tails in a Row?

With a bonus surprise ending!

Steven Denenberg
The Startup
14 min read · Sep 3, 2019


My son was studying computer programming, and he got the idea to use the computer to see how many random flips of a coin it took to get a certain number of heads or tails in a row. For example, how many random flips would it take the computer to get 7 heads or tails in a row, or 10 heads or tails in a row?

(For my non-U.S. readers: U.S. coins generally have a portrait of a head on one side (“heads”), and some other image on the opposite side (“tails”)!)

Since my son’s math class was also studying standard deviation, I thought it would be fun to apply a standard deviation analysis to the coin-flipping results.

I wrote a program that started flipping a coin, and counted the number of flips it took until it got a sequence of nine heads or nine tails in a row. (The theoretical number of flips expected to get nine heads or tails in a row is 511, which is 2⁹ – 1; I won’t go through that proof in this paper.)
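My program isn’t reproduced here, but here is a minimal sketch of the idea in Python (the names and structure are just an illustration, not my actual code):

```python
import random

def flips_until_run(run_length=9):
    """Flip a fair coin until we see run_length identical results in a row.
    Return the total number of flips it took."""
    flips = 0
    current_run = 0
    previous = None
    while current_run < run_length:
        flip = random.choice("HT")
        flips += 1
        if flip == previous:
            current_run += 1
        else:
            previous = flip
            current_run = 1
    return flips

# One trial: how many flips did it take this time?
print(flips_until_run(9))
```

Each call to flips_until_run produces one number like the entries shown below.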

This program took the number of flips it made before getting nine in a row, wrote that number to a disk file, and then started over with another set of flips. Here is some data from the beginning of the file, ten entries:

735
55
217
184
164
1632
176
389
1307
353

Thus, the first time it started flipping, the program took 735 flips before it saw nine in a row. The second time it took only 55 flips, and so on. I let the program run until it had about 133 million entries!

Now for the standard deviation stuff.

Looking through the file of 133 million numbers, I initially thought the numbers would generally center on the theoretical result of 511, and in fact the average of the 133 million numbers in the main output file was, to four decimal places, 511.0030.

However, the short list of ten numbers above varies all over the map. So, I thought I’d create another file where the numbers vary less, with each number closer to the theoretical 511. I did that by grabbing a bunch of numbers at a time out of the 133-million-entry file and averaging them.

I went through the big file of 133 million numbers. I grabbed the first 50 results (the ten shown above and the next forty), averaged them, and wrote that average as the first number in a new file. Then I took the next 50 results, averaged them, and wrote that average as the second entry in the new file. With 133 million “raw” entries, I was able to pluck out 50 numbers to average just over 2.6 million times. So now we have 2.6 million entries, each of which is the average of 50 of the results from the big 133-million-entry file.
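The averaging step is simple. A sketch of it in Python (again illustrative, with made-up placeholder data standing in for the real 133-million-entry file) might look like this:

```python
import random
import statistics

def average_in_chunks(raw_counts, chunk_size=50):
    """Average consecutive, non-overlapping chunks of raw flip counts.
    Any leftover entries that don't fill a final chunk are dropped."""
    averages = []
    for start in range(0, len(raw_counts) - chunk_size + 1, chunk_size):
        chunk = raw_counts[start:start + chunk_size]
        averages.append(sum(chunk) / chunk_size)
    return averages

# Tiny demonstration with made-up raw counts; the real file has ~133 million.
raw_counts = [random.randint(9, 2000) for _ in range(1000)]
averages = average_in_chunks(raw_counts, 50)
print(len(averages), statistics.pstdev(averages))
```

The pstdev call computes the standard deviation of the chunk averages, which is the number quoted for each chart below.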

This chart plots those 2.6 million data points:

Across the abscissa (the horizontal axis), we see the average of 50 numbers taken from the big data file. The ordinate (the vertical axis) shows how frequently that horizontal-axis number showed up in the list of 2.6 million averages. The curve does seem to peak around the theoretical result of 511. Expected, right? The apex of the curve is just above the horizontal “14,000” line. Thus, out of our new collection of 2.6 million averaged results, a bit more than 14,000 times we saw averages of about 511 flips to get nine in a row.

The standard deviation for this curve is 71.17. It doesn’t look like a classic bell curve (it’s not a “normal” distribution), but the standard deviation gives us a feel for how tightly the averaged flip counts cluster.

Next, I started over with the huge raw file, but this time, I took 500 results to average for each of our new data points. We expect these averages to be much more uniform, closer to the theoretical 511, because we are averaging ten times as many raw results for each data point. We expect the drawn curve to crowd closer to the theoretical average than when we only took 50 numbers to average for each data point. And that’s just what we find:

As expected, the peak is sharper. The standard deviation is now 22.6, less than half of what it was in the previous chart, indicating that the curve is tighter, with fewer outliers. For this run, I was able to plot
133 million / 500 = about 266,000 data points, or “events” as I call them in the chart.

The curve seems to be tightening up nicely.

Note something else: the apex of this chart is at 509 flips to get nine in a row. I didn’t print it on the previous chart, but when we made that previous chart, averaging 50 results for each data point, the peak was at 501, even farther from the theoretical mean of 511. Why is that? Is it a clue? Something we’ll find out about later? Maybe.

Then I again started over with the huge file, but this time I took 5000 results at a time to average. Five thousand raw numbers averaged for each data point. As expected, the peak tightens further, around the theoretical expected result of 511, and the standard deviation drops again, to 7.18:

And finally, I took 50,000 entries for each average. Here’s the resulting chart:

The standard deviation is now only 2.30. You can see there are fewer total events, because I had to pluck 50,000 raw data points out of the original 133-million-entry file to make each data point in the above chart. And the peak of the chart is now right at the theoretical mean of 511.

Okay, so as I said, I thought it would be fun to see how these charts tighten up as we average larger and larger numbers of results for our data points, and we find that’s exactly what happened. The results above were totally expected.

For some reason, though, a few days after I made these charts, I decided to make another chart, where I just plotted the raw data from the huge 133 million data-point file, without pulling out bunches of data points and averaging them. Just make a chart like those above, but for the raw data file.

I figured it would be very much like the first chart — a wider, flatter curve — but even much flatter, just an unimpressive mound, maybe something like this:

… with a correspondingly much larger standard deviation.

I ran the program to create this final chart. But then I got a big surprise. Maybe it won’t be a surprise to you, but it was a big surprise to me. The chart didn’t at all resemble the other charts, but just with a flatter curve. Here’s the new chart:

What? Where’s my nice flat curve centered on 511? This curve — visually at least — seems to ignore the theoretical (and experimental) mean of 511.

To begin our explanation and analysis, we’ll take an expanded look at the extreme left side of this chart — one through twelve flips.

On the extreme left side of the chart, the number of data points (the y-axis) is zero for 1, 2, 3, 4, 5, 6, 7, and 8 flips. Why? Because you can’t get nine heads or tails in a row if you have fewer than nine flips!

Then, for exactly nine flips, there were 518,215 entries in the big data file. Out of 133 million trials, the computer hit nine in a row in the first nine flips 518 thousand times.

For exactly ten flips: 259,305. Curiously, that number is almost exactly half the number of entries with nine flips. The number of data points for eleven and twelve flips seems to be the same as for ten flips: the line just heads out horizontally to the right.

However, going back up to the non-expanded chart, as the number of flips to get nine in a row increases (the abscissa), the frequency of success with that number of flips decreases steadily and gradually (the ordinate).

I didn’t believe it. I checked the program a hundred times, thinking there was an error:

  1. To me, it was totally unintuitive that nine flips to get nine in a row is by far the most common result.
  2. Why is this chart so different from the nicely peaked charts above?
  3. How did this chart turn into those earlier charts, with their nice peaks around the theoretical average of 511, when the results were grouped and averaged?

Need answers. It was time to put away the computer and get out pencil and paper.

What are the chances of flipping nine heads or tails in a row, in only nine flips? The first flip can be either heads or tails. Then, the second flip must match the first — heads if the first was heads, and tails if the first was tails. The chances that the second flip will match the first are 50%, or 1/2. Similarly, the third through ninth flips — those seven flips — must all match the first flip, and the chances of each one of those seven flips matching the first is 1/2. So the chances of getting nine identical results in nine flips is one over two to the eighth power: 1 / 2⁸ .

If the chances of getting nine in a row in exactly nine flips is 1/2⁸, then how many times would we expect that to happen in the large raw data file? There were 132,843,101 entries in the big file. That number divided by 2⁸ equals a theoretical result of 518,918, very close to the experimental 518,215 result we see in the chart.

Here’s another way to think of it. Basic probability tells us that the chances of an event happening are the number of ways that success can happen, divided by the total number of possible outcomes.

There are two ways to have success in getting nine-in-a-row in nine flips:

  1. T T T T T T T T T
  2. H H H H H H H H H

There are 2⁹ total possible permutations of nine flips — that many total possibilities. So the chances of success in nine flips are:
(number of ways to have success) / (total number of possibilities),
which is, in this case:
(the two ways to have success above) / (2⁹ total possibilities)
equals 1 / 2⁸.
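If you like, you can check that count by brute force. This little Python sketch (illustrative) enumerates all 2⁹ possible sequences of nine flips and counts the ones that are nine in a row:

```python
from itertools import product

sequences = list(product("HT", repeat=9))               # all 2^9 = 512 possibilities
successes = [s for s in sequences if len(set(s)) == 1]  # all heads or all tails
print(len(successes), len(sequences))                   # 2 out of 512, i.e. 1 / 2^8
```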

Next, what are the chances of flipping nine heads or tails in a row, in exactly ten flips? Again, the first flip can be heads or tails. The second flip, though, must be opposite to the first. The ten flips must look like one of these two:

  1. H T T T T T T T T T
  2. T H H H H H H H H H

Those are the only ways we can achieve nine in a row in exactly ten flips, but NOT in nine flips. (What? It’s because, for example, TTTTTTTTTH gives us nine in a row in nine flips, so this is not a successful solution to getting nine in a row in ten flips but NOT in nine flips.) Remember, the nine heads (or tails) must all come “in a row,” consecutively.

There are two successful ways to get nine in a row in exactly ten flips, shown above. The total number of permutations of heads and tails in ten flips is 2¹⁰. So the chances of getting nine in a row in ten flips is:
2 / 2¹⁰ = 1 / 2⁹.

1 / 2⁹ is only half as likely as 1 / 2⁸ .

Multiplying 132,843,101 total entries by 1 / 2⁹ equals a theoretical result of 259,459, very close to the experimental 259,305 ten-flip results that actually occurred in the big file. So it’s true: getting nine heads or tails in a row in the first nine flips is a much more likely occurrence than getting nine in a row in ten flips.

What are the chances of flipping nine heads or tails in a row, in exactly eleven flips, but not in nine or ten flips? Again, the first flip can be heads or tails. The second flip can also be either heads or tails. The third flip, however, must be opposite to the second flip. In order for nine in a row to happen in exactly eleven flips, and NOT happen in the first nine or ten of the eleven flips, the string of nine in a row must be the last nine of the eleven flips. The eleven flips must look like one of these possibilities:

  1. H H T T T T T T T T T
  2. T H T T T T T T T T T
  3. T T H H H H H H H H H
  4. H T H H H H H H H H H

There are four successful possibilities, shown above. The total number of permutations of heads and tails in eleven flips is 2¹¹. So the chances of getting nine in a row in eleven flips is
4 / 2¹¹ = 1 / 2⁹.

Look! 1 / 2⁹ . Indeed, as we saw in the chart above that expanded the first twelve data points, the chances of getting nine in a row in eleven flips is the same as the chances of getting nine in a row in ten flips.

We can now generalize. What are the chances of flipping nine heads or tails in a row in exactly n flips, and not sooner? The last nine flips must all be heads or all be tails, and there are 2 choices for that face. The flip just before the nine-in-a-row must be the opposite face, so it is forced. The first n–10 flips can be anything at all, so there are 2ⁿ⁻¹⁰ possibilities for them. That gives 2 × 2ⁿ⁻¹⁰ = 2ⁿ⁻⁹ successful sequences, out of 2ⁿ total possible sequences of n flips. So the chances are 2ⁿ⁻⁹ / 2ⁿ = 1 / 2⁹.

The chances are always 1 / 2⁹ when the number of flips is greater than 9!
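We can brute-force this claim, too. The sketch below (illustrative) checks, for each sequence length n, what fraction of all 2ⁿ sequences have their first nine-in-a-row completed exactly at flip n:

```python
from itertools import product

def first_run_completed_at(seq, run_length=9):
    """Return the 1-based flip number at which the first run of
    run_length identical results is completed, or None if there is none."""
    run = 1
    for i in range(1, len(seq)):
        run = run + 1 if seq[i] == seq[i - 1] else 1
        if run == run_length:
            return i + 1
    return None

for n in range(9, 14):
    hits = sum(1 for seq in product("HT", repeat=n)
               if first_run_completed_at(seq) == n)
    print(n, hits / 2 ** n)   # 1/2^8 for n = 9, then 1/2^9 for n = 10, 11, 12, 13
```

(Run it with much larger n and the fraction dips slightly below 1 / 2⁹ once n passes 18, because the earlier flips can themselves contain a run of nine. That attrition is exactly what we tackle next.)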

But our strange chart of the big file with no averaging (seen again just below) doesn’t show the same number of data points for every number of flips along the abscissa. The height of the curve decreases for increasing number of flips:

The reason:

Let’s say you’re trying to compute theoretically how many times nine in a row will show up in exactly 100 flips, out of the big total of 132,843,101 tries. We demonstrated above that, taken in isolation, the chances of getting nine heads or tails in a row in exactly 100 flips is 1 / 2⁹. But if you want to know how many times that will happen in 132,843,101 tries, you can’t just multiply 132,843,101 by 1 / 2⁹, because as you’re flipping the coin, before you get to 100 flips, you might find nine in a row after only 23 flips, or after only 85 flips, or after 44 flips. So there will be some attrition from the pool of 132,843,101 total tries before you get to analyze the higher numbers of flips.

Thus: you don’t multiply [1 / 2⁹] by [132,843,101]. Rather, you multiply [1 / 2⁹] by [(132,843,101) minus (the number of times in 132,843,101 tries that nine in a row happens for fewer than 100 flips)].

So let’s compute the theoretical version of the experimental results that we’ve been examining in the chart above. We’ll say that T equals the total number of tries recorded in the big file: T = 132,843,101. We’ll also say that N(n) is the theoretical number of those tries that take exactly n flips to get nine in a row. Since the chances of finding nine in a row in exactly nine flips are 1 / 2⁸, we have:

(1) N(9) = T × (1 / 2⁸)

Previously, we said that the chances of finding nine in a row in exactly ten flips are 1 / 2⁹, but only the tries that didn’t already end at nine flips are still in play. So:

(2) N(10) = [T − N(9)] × (1 / 2⁹)

and, in general, for any number of flips n greater than nine:

(3) N(n) = [T − N(9) − N(10) − ⋯ − N(n−1)] × (1 / 2⁹)
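Here is a small Python sketch of that bookkeeping (illustrative; the function and variable names are just for this example), computing N(n) for the first few values of n:

```python
T = 132_843_101          # total number of tries recorded in the big file

def theoretical_counts(max_flips, run_length=9):
    """Theoretical number of tries ending at exactly n flips, following
    equations (1) through (3): N(9) = T / 2^8, then each later N(n) is
    1/2^9 of the tries that haven't ended yet."""
    counts = {run_length: T / 2 ** (run_length - 1)}
    remaining = T - counts[run_length]
    for n in range(run_length + 1, max_flips + 1):
        counts[n] = remaining / 2 ** run_length
        remaining -= counts[n]
    return counts

counts = theoretical_counts(12)
# The attrition barely matters this early, so these land close to T / 2⁸ and T / 2⁹.
print(round(counts[9]), round(counts[10]), round(counts[11]))
```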

Now, let’s put equations (1), (2), and (3) above together as our theoretical results, and chart it. Below we see the resulting chart.

Our chart still contains the experimental results in a thick line of dark blue, the same line we saw in the previous chart. The chart now, however, also contains the theoretical results described in the three equations above: the theoretical curve is a thinner white line:

See the white line inside the blue line? Our theoretical results follow the experimental results exactly!

Now, before we quit, one more thing. So the above chart applies when we plot the raw data file. When we average many entries from the raw data file, we get a more standard-deviation-y chart, curves with peaks at the average of 511, curves getting tighter and tighter as we average more entries. But at some point, if we average fewer and fewer entries, shouldn’t the resulting chart approach the appearance of this chart above? Let’s explore that.

Previously, we looked at charts where 50, 500, 5000, and 50 thousand entries were averaged to get each data point. Below, we see the chart when ten entries are averaged for each data point:

Now we understand better what’s happening with this chart. The peak is spreading out — standard deviation is up to 159 — but also, the location of the peak is shifting to the left. The peak of this curve is not at 511, but rather at 467. The raw data file chart doesn’t really have a peak, but most of the results crowd the left side of the chart.

We saw a hint of this earlier: the peak of the average-500 chart was 509, pretty close to 511, but the peak of the average-50 chart was 501, noticeably less than the average of 511.

Think this finding of more-spread-out curves, with peaks shifting farther to the left, will continue? Sure it will!

Below we’ll see the charts when we average five, four, three, and two raw data entries for each data point. You can follow how the peaks spread out and shift to the left, with the standard deviations increasing and the peak points shifting to lower numbers:

A final fun fact. In the 133 million tries, the largest number of flips that the computer had to make to get nine in a row was 9,532.

Hope you had fun!


Steve Denenberg is interested in computers and math. His degree is in Engineering and Applied Physics from Harvard University. He is also a facial plastic surgeon.