The Statistics of Spirit Boxes

Nov 4, 2017

My boyfriend, Alex, and I have been really into the popular BuzzFeed show “Unsolved”. It stars two young guys, Ryan (a believer) and Shane (a skeptic), who travel to supposedly haunted locations in search of paranormal activity. It’s really well edited and extremely entertaining, even for non-believers like us (aka “shaniacs”).

Recently, they introduced a new toy into their investigations: the so-called spirit box. This is basically just a radio scanner that sweeps through about five radio stations per second. People listen to the output and hope to hear a ghost communicate by manipulating the radio signal to create words. (Yeah, it sounds ridiculous.)

Ryan and Shane actually do hear many phrases that sound like a human voice. These phrases last longer than 1/5th of a second, so, in theory, multiple radio stations need to conspire together to create them. One of the most telling examples is when Ryan asks a “ghost” what color jacket he is wearing and the spirit box seems to say: “brown and white”. See the clip below:

This got us wondering...how likely is it for a spirit box to randomly output a phrase like “brown and white”?

This can be answered with ~~statistics~~

So we made a model of a spirit box using Python (a programming language) and a really awesome phonetic dictionary put together by CMU. Basically, this dictionary takes about 130,000 words and lists the pronunciation of each as a string of “phonemes” (distinct sounds which are shorter than syllables). For example, “brown and white” would be broken into “B R AW N . AH N D . W AY T .” So this phrase is actually 10 phonemes long.
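To make the phoneme breakdown concrete, here is a minimal sketch in Python. It assumes the dictionary's plain-text layout of one word followed by its phonemes per line, with stress digits attached to the vowels. The three sample entries and the helper names `parse_entries` and `phrase_to_phonemes` are ours for illustration, not the code from our repository.

```python
import re

# A tiny hand-copied excerpt in the dictionary's "WORD  PH1 PH2 ..." layout.
# The real file has roughly 130,000 entries; these three are enough here.
SAMPLE_ENTRIES = """\
BROWN  B R AW1 N
AND  AH0 N D
WHITE  W AY1 T
"""


def parse_entries(text):
    """Map each lowercase word to its phonemes, with stress digits removed."""
    pronunciations = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        word, *phonemes = line.split()
        pronunciations[word.lower()] = [re.sub(r"\d", "", p) for p in phonemes]
    return pronunciations


def phrase_to_phonemes(phrase, pronunciations):
    """Concatenate the phonemes of every word in a phrase."""
    return [p for word in phrase.lower().split() for p in pronunciations[word]]


PRONUNCIATIONS = parse_entries(SAMPLE_ENTRIES)
BROWN_AND_WHITE = phrase_to_phonemes("brown and white", PRONUNCIATIONS)

print(" ".join(BROWN_AND_WHITE))  # B R AW N AH N D W AY T
print(len(BROWN_AND_WHITE))       # 10
```

Note that the count is of phonemes in sequence: “N” appears twice, once in “brown” and once in “and”.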

With the phonetic dictionary in hand, we could model the spirit box using some code. We figured out that a phoneme takes roughly a twelfth of a second to pronounce, so if Ryan and Shane’s scanner stays on each station for 1/5th of a second, they will hear about two phonemes (12/5 ≈ 2.4) from any one radio station. We’re assuming here that every frequency the spirit box scans through is actually a radio station and that every radio station has a talking head. This is obviously not true, but these assumptions will just make it more likely that we will hear a phrase from our simulated spirit box. We cut all of the words up into two-phoneme chunks and randomly draw from these chunks (we’re going to call these “diphonemes” to be fancy).
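The chunk-and-draw idea can be sketched roughly like this. Everything here is a simplified illustration rather than our actual code: the four-word `WORDS` table stands in for the full dictionary, the function names are made up, and dropping a leftover single phoneme from odd-length words is an assumption of this sketch.

```python
import random

# Hypothetical mini-dictionary standing in for the full phonetic dictionary.
WORDS = {
    "brown": ["B", "R", "AW", "N"],
    "and": ["AH", "N", "D"],
    "white": ["W", "AY", "T"],
    "ghost": ["G", "OW", "S", "T"],
}


def diphonemes(phonemes):
    """Cut a pronunciation into consecutive two-phoneme chunks.

    Assumption of this sketch: a leftover single phoneme at the end of an
    odd-length word is simply dropped.
    """
    return [tuple(phonemes[i:i + 2]) for i in range(0, len(phonemes) - 1, 2)]


def simulate_spirit_box(words, seconds, hops_per_second=5, seed=None):
    """Draw one random diphoneme per station hop and return the phoneme stream."""
    rng = random.Random(seed)
    pool = [chunk for phonemes in words.values() for chunk in diphonemes(phonemes)]
    stream = []
    for _ in range(seconds * hops_per_second):
        stream.extend(rng.choice(pool))
    return stream


SAMPLE_STREAM = simulate_spirit_box(WORDS, seconds=30, seed=0)
print(len(SAMPLE_STREAM))  # 300: 150 hops of two phonemes each
print(" ".join(SAMPLE_STREAM[:12]))
```

Drawing whole two-phoneme chunks, rather than single phonemes, mimics the scanner catching a short snippet of real speech on each station before hopping to the next.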

We strung all of the diphonemes together to replicate a spirit box. We actually paired this with a computer voice found natively in Mac OS that sounds kind of spooky and static-y. You can hear an example of our spirit box below.

This doesn’t sound exactly like Ryan and Shane’s spirit box. Why is that? Again, most radio frequencies don’t have radio stations, so they will sound like static noise. Additionally, many radio stations will be playing music, so you won’t hear a human voice continuously. But we were pretty happy with this simulation. Here is what “brown and white” sounds like:

Back to statistics. We searched this simulated spirit box output for words from our CMU dictionary, and tracked how many words we found in an approximately 30-second spirit box sample. (This is about the maximum time it takes the “ghost” to respond to Ryan/Shane.)
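The search itself boils down to looking for each word's phonemes as a contiguous run inside the stream. Here is a minimal sketch of that idea; the `find_words` function, its `min_phonemes` parameter, and the hand-built stream are all illustrative assumptions, not our actual code or real simulator output.

```python
def find_words(stream, pronunciations, min_phonemes=3):
    """Return sorted (start_index, word) pairs for every dictionary word whose
    phonemes occur as a contiguous run in the phoneme stream."""
    found = []
    for word, phonemes in pronunciations.items():
        n = len(phonemes)
        if n < min_phonemes:
            continue  # skip very short "words" that would match constantly
        for start in range(len(stream) - n + 1):
            if stream[start:start + n] == phonemes:
                found.append((start, word))
    return sorted(found)


# Hypothetical mini-dictionary and a hand-built phoneme stream.
DICTIONARY = {
    "soon": ["S", "UW", "N"],
    "mass": ["M", "AE", "S"],
    "ask": ["AE", "S", "K"],
    "fly": ["F", "L", "AY"],
    "a": ["AH"],
}
STREAM = ["S", "UW", "N", "M", "AE", "S", "K", "F", "L", "AY"]

MATCHES = find_words(STREAM, DICTIONARY)
print(MATCHES)  # [(0, 'soon'), (3, 'mass'), (4, 'ask'), (7, 'fly')]
```

In this sketch, matches are allowed to overlap (“mass” and “ask” share two phonemes), which is one of several choices that would change the word count.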

We found that our code detected a ton of words which we don’t typically use (like “delfs”). So we cross-referenced this list with a list of the 1000 most common words in the English language. (Note that this list does include the words “brown”, “and”, and “white”.) After cross-referencing with this list of common words, we found that on average only 3 words were found in 30 seconds! Here is an example of words found in a 30-second clip:

  • soon
  • mass
  • ask
  • fly

The average phoneme length of the words found is 3 (we didn’t let the simulator find “words” shorter than this, like “a” for example). So, on average they are pretty short words, and you are likely to hear at least 1 word in 30 seconds. But what about hearing multiple words together, like “brown and white”?

For this, we can assume that most words discovered will be 3 phonemes long, and that 12 phonemes fit into 1 second. A 30-second sample then holds 360 phonemes, which means that we can hear up to 120 words in 30 seconds. We only hear 3 words on average, so the probability of hearing a word in any one of the 120 available slots is 3/120, or 2.5%.

From this, we estimated the probability of hearing two words together (like “goat man”, heard in the most recent episode): there is a 0.38% chance of hearing two words in a row. From a statistical point of view, this isn’t quite a “significant” anomaly.
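The slot arithmetic behind that estimate can be checked in a few lines. This sketch uses only the numbers quoted in the text (12 phonemes per second, 30-second samples, 3-phoneme words, 3 words found per sample); the variable names are ours, and it stops at the per-slot probability.

```python
from fractions import Fraction

PHONEMES_PER_SECOND = 12    # a phoneme takes roughly 1/12 of a second
WINDOW_SECONDS = 30         # length of one spirit box sample
PHONEMES_PER_WORD = 3       # typical length of the words found
WORDS_FOUND_PER_WINDOW = 3  # average number of common words per sample

total_phonemes = PHONEMES_PER_SECOND * WINDOW_SECONDS
word_slots = total_phonemes // PHONEMES_PER_WORD
p_word_in_slot = Fraction(WORDS_FOUND_PER_WINDOW, word_slots)

print(total_phonemes)         # 360
print(word_slots)             # 120
print(p_word_in_slot)         # 1/40
print(float(p_word_in_slot))  # 0.025
```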

What about three words (e.g., “brown and white”) in a row? By the same reasoning, there is a 0.01% chance of hearing three words in a row. This is statistically significant, and is not likely to occur randomly!

Apparently, if you have a spirit box, you would need to listen to it continuously for 200ish days to randomly hear three common words in a row! Also, many of the assumptions we’ve made make hearing this phrase more likely in our simulation than in real life, so you may need to listen much longer in reality. That being said, the human mind can often hear what it wants to hear.

If you’re interested in the (honestly pretty badly documented) code, it’s available on my GitHub.

So are there ghosts? What do you think?

Ashley Villar & Alex McCarthy

We apparently only write about ghosts and aliens…statistically.