What the First Statistical Analysis of the Missing 411 Data Reveals About the Phenomenon

Martin Rezny
Published in Words of Tomorrow
12 min read · Mar 15


Or how we may actually start having some answers


Some time ago, I reached out to the primary investigator of strange disappearances, David Paulides, who put together the profile and list of cases that qualify as conventionally unexplainable. I asked him whether he had put the Missing 411 data into some kind of spreadsheet, so that it could be properly scientifically analyzed. Unfortunately, I didn’t get a response.

It is unfortunate because statistical analysis is the only method that we have through which we can begin to determine which of the profile points look only like coincidental correlations, and which profile points look like genuine anomalies pointing to something truly odd happening. I resigned myself to having to do the work of assembling the sample myself, assuming I would find time for it, but then something fun happened.

As my most-read article on Medium so far, with over 60,000 views, is about this very topic, I get someone writing to me about it every once in a while. Usually, it’s people sharing their personal accounts or theories with me. Which is nice, and I hope people will keep doing it, but more accounts and theories can only get us so far. Then I was contacted by Gary Koehler.

As it turns out, Gary had already done the work. He assembled a sample of 1,127 victims from the first seven of Paulides’s Missing 411 books. He also did a preliminary descriptive analysis of the data, looking for basic patterns. He shared both with me and gave me permission to use the data and his analysis as I see fit. I decided to start by sharing my first impressions here.

Apparently, Dave was wary of catching a computer virus, so he didn’t want to receive the attachments. Which doesn’t make a lot of sense to me, but I guess Dave has many more reasons than I do to be suspicious of people who reach out to him online. In any case, I believe this kind of analysis should have been attempted a long time ago, and the results are interesting.

I won’t post the whole thing here, either the data set or Gary’s analysis in full, only the highlights with some basic peer review comments to get the discussion started. But if you’re interested in getting your hands on them, let me know through a direct message on any of my social media and I’ll share them with you. Well, unless you’re afraid of computer viruses. I will eventually post them on my website, once I start doing my own analysis.

The Information That’s No Longer Missing

Considering that the sample covers about half, or even a majority, of all conceivable Missing 411 cases, in other words a substantial portion of the whole statistical population, it is highly likely to be representative. It’s also so large that even most of its demographic subgroups are large enough to serve as samples in their own right by the usual standards of statistical research.

There is some potential for concern regarding the mixing of different types of cases, for example between the rural and urban settings, or between distant time periods (from the 19th century until the present), which could make the sample theoretically somewhat incongruent. In the future, creating separate samples for at least the two different types of location and for different generations of people could be worthwhile.

For this sample, just keep in mind that urban cases from the Missing 411: A Sobering Coincidence book account for only 117 cases in total, and that most Missing 411 victims are hikers, who apparently only started disappearing after WW2, or at least that’s when their disappearances started being recorded.

Then again, there’s every expectation that Missing 411 cases are a mix of multiple scenarios that all share the attribute of not being conventionally explainable. Let’s therefore consider this to be an exploratory analysis of what all unexplained disappearances have in common.

With all the caveats out of the way, let’s look at the data. Of the 1,127 missing people in total, there are:

  • 853 males (75.7%) and 274 females (24.3%).
  • 479 (42.5%) are 0–17 years old, 625 (55.5%) are 18+, and 23 (2%) are of unknown age.
  • Of the confirmed adult victims, 199 (17.7%) are 18–24, 113 (10%) are 25–34, 73 (6.5%) are 35–44, 69 (6.1%) are 45–54, 62 (5.5%) are 55–64, and 106 (9.4%) are 65+.
  • 269 (23.8%) were found alive, 355 (31.5%) were found dead, and 501 (44.5%) were never found.

On the most basic level, we can therefore conclude that men disappear mysteriously about 25 percentage points above expectation, or about 50% more often than their share of the population would predict, as men and women are roughly equal in number; and that, based on American census data, children and adolescents go missing about 20 percentage points above expectation, or roughly twice as often as their share of the population would predict.
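For readers who want to check that arithmetic, here is a minimal Python sketch that separates the two ways of expressing each gap: the absolute difference in shares and the relative ratio. The counts come from the list above. The baseline shares are assumptions: 50% for men, as stated in the text, and roughly 22% for under-18s, which is only my reading of the census comparison and is not a figure given in Gary's analysis.

```python
# Counts come from the article; the baseline shares are assumptions.
TOTAL = 1127
MALES = 853
MINORS = 479


def excess(observed_count, total, expected_share):
    """Return (gap in percentage points, ratio of observed to expected share)."""
    observed_share = observed_count / total
    gap_pp = (observed_share - expected_share) * 100
    ratio = observed_share / expected_share
    return gap_pp, ratio


# Assumption: men make up about 50% of the reference population.
male_gap, male_ratio = excess(MALES, TOTAL, 0.50)
# Assumption: under-18s make up about 22% of the reference population.
minor_gap, minor_ratio = excess(MINORS, TOTAL, 0.22)

print(f"Males:  {male_gap:+.1f} points, {male_ratio:.2f}x the expected share")
print(f"Minors: {minor_gap:+.1f} points, {minor_ratio:.2f}x the expected share")
```

Swapping in a different baseline, such as the actual share of children among national park visitors if it ever becomes available, only requires changing the third argument.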

These discrepancies look significant and confirm these profile points, but are they actually significant? It depends. A basic demographic comparison may not be valid, if the demographic composition of national park visitors differs substantially from the demographics of the general population.

After a cursory search on the internet, it appears that the available demographic research in this area sadly isn’t great. For example, I wasn’t able to find any percentage of children and adolescents visiting the parks anywhere, only that baby boomers should be overrepresented today.

It would make sense for children to be overrepresented too, but I suspect it shouldn’t be twice as many children compared to the normal ratio in the general population. One book included in the sample is about hunters alone and one is almost exclusively about college students, so that should artificially lower the ratio of children in the sample, if anything.

But regarding the ratio of male to female national park visitors, there is some data. According to a study in the Journal of Leisure Research, the ratio between male and female visitors to national parks should be close to equal, much like it is in the general population, with women being only slightly underrepresented. I found no suggestions to the contrary.

In terms of prosaic explanations, the reasons may be psychological: maybe more men go missing because they’re that much more reckless. Hunters specifically should be about 75% male, but the sample doesn’t consist mostly of hunters, so that doesn’t explain a discrepancy this large. Given the substantial sample size, sampling error isn’t going to account for 25 percentage points either.
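To put a rough number on that last point, the sketch below computes a textbook 95% margin of error for the male share. It assumes the 1,127 cases behave like a simple random sample, which they don't (they were selected by Paulides's criteria), so it only captures random sampling noise and says nothing about selection bias.

```python
import math

TOTAL = 1127
MALES = 853


def margin_of_error(count, total, z=1.96):
    """Normal-approximation 95% margin of error for a sample proportion."""
    p = count / total
    return z * math.sqrt(p * (1 - p) / total)


moe_pp = margin_of_error(MALES, TOTAL) * 100
print(f"Male share: {MALES / TOTAL:.1%} +/- {moe_pp:.1f} percentage points")
```

Under those assumptions the margin comes out at around two and a half percentage points, an order of magnitude smaller than the gap being discussed.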

Gary also did some demographic comparisons of his own, and determined that national park visitors are above average in terms of education and income and that they’re overwhelmingly white. These profile points (high intelligence, Caucasian complexion, and perhaps German heritage) are therefore likely coincidental and not targeted or causative.

Overall, what this absence of good comparison data means is that what should be done next, ideally by skeptical researchers, is to assemble a comparably large control sample of people who disappeared in North America in the last century or so under explained circumstances. Without that, it isn’t possible to definitively say what role gender or age normally plays in determining which groups should go missing how often. Still, these discrepancies do look substantial and not like a fluke or error.

Venturing into the Unexplained

There are more details of the disappearances or of the people who went missing listed for each case in the data set, but judgment call-dependent features can get somewhat subjective, so let’s stick to the demographic basics for this initial review. These are the most striking comparisons that can be drawn between various demographic sections of this sample:

  • The vast majority of those found alive were minors, 247 out of 268.
  • Young men between 18–24 years of age were found dead much more often than the rest of the adults (64% versus 13–40% for all other adult gender+age groups).
  • The elderly have the highest rate of not being found (81%).
  • Women, both minors and adults, are found alive more often than their male counterparts by roughly 10–15%.
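As a rough check on the first of these points, here is a sketch of a 2x2 chi-square test comparing how often minors were found alive with how often everyone else was. It uses only counts quoted in this article, and two choices in it are my own assumptions: "everyone else" lumps adults together with the 23 victims of unknown age, and "not found alive" lumps those found dead together with those never found.

```python
import math

TOTAL = 1127
MINORS = 479
FOUND_ALIVE = 268           # total used in the "247 out of 268" bullet
MINORS_FOUND_ALIVE = 247


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


minors_alive = MINORS_FOUND_ALIVE
minors_other = MINORS - MINORS_FOUND_ALIVE           # found dead or never found
others_alive = FOUND_ALIVE - MINORS_FOUND_ALIVE      # adults and unknown age
others_other = (TOTAL - MINORS) - others_alive

chi2, p_value = chi2_2x2(minors_alive, minors_other, others_alive, others_other)
print(f"Minors found alive:        {minors_alive / MINORS:.1%}")
print(f"Everyone else found alive: {others_alive / (TOTAL - MINORS):.1%}")
print(f"chi-square = {chi2:.1f}, p = {p_value:.3g}")
```

A very small p-value here only says the difference within this sample is not random noise. It cannot tell us whether the difference comes from how cases were selected for the books.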

Now, these statistics are quite interesting. Let’s address minors first. I’m really not sure why minors should be found alive so much more often. I guess they’re not likely to wander off as far as adults, but there were a number of cases where children were found many kilometers away. The same should apply to the elderly, but they are the group found least often.

Like the elderly, children should have the lowest chances of surviving in the wilderness (due to low stamina and a lack of survival knowledge or skills), and the highest likelihood of behaving irrationally (like the elderly with dementia), including going in an illogical direction, hiding from searchers, or actively endangering themselves. It just doesn’t make sense.

By comparison, young men should be exceptionally able to survive, and yet they almost never do. The anomaly of so many young men being found dead is probably due to almost all urban cases fitting that pattern. But even just within the urban cases, why would it apply only to men aged 18–24, and not even to men aged 25–34? Men go drinking with buddies in bars in cities with rivers all the time. Even more strangely, the earliest partier case is from 1988.

I guess there could be some prosaic explanation, but it’s not at all obvious. Maybe the differences between the minors and adults are because when adults disappear and survive, we are bound to almost always know what happened and the case will be explained. Then again, there are explained cases of adult disappearances where the adult was found dead or was never found. This is where the non-demographic profile points come in.

Gary also covers them in his analysis, to the extent that they can be objectively determined. Some profile points are presently too open to interpretation, like what constitutes high elevation or a boulder field, or how near the water has to be to where a disappearance occurred. Conclusions of autopsies are also highly questionable on the grounds of suspected general incompetence of U.S. coroners. The statistics for profile points that are fairly objective and not open to interpretation are as follows:

  • Dogs failed to find scent and track the victim in 81% of cases.
  • 86 victims (7.6%) were found alive and 198 (17.6%) dead in or near water.
  • 61 victims (5.4%) were found alive and 124 (11%) dead in a place previously searched (59 of those found alive were minors).
  • 417 victims (37%) disappeared in the afternoon between noon and 8:00 p.m. and 205 (18.2%) in the remaining 16 hours of the day (the rest of the reports presumably don’t include precise time of disappearance).
  • Bad weather adversely affected 381 search and rescue missions (33.8%), but the search outcomes were almost identical with cases where bad weather wasn’t involved.
  • 220 cases (19.5%) involve reports of missing clothing, 176 case descriptions (15.6%) mention shoes or socks specifically.
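The time-of-day figures in the list above are easier to compare once they are converted to disappearances per hour, since the afternoon window is only half as long as the rest of the day. A minimal sketch using only the counts quoted above:

```python
TOTAL = 1127
AFTERNOON_CASES, AFTERNOON_HOURS = 417, 8   # noon to 8:00 p.m.
OTHER_CASES, OTHER_HOURS = 205, 16          # remaining hours of the day

afternoon_rate = AFTERNOON_CASES / AFTERNOON_HOURS
other_rate = OTHER_CASES / OTHER_HOURS
rate_ratio = afternoon_rate / other_rate
unknown_time = TOTAL - AFTERNOON_CASES - OTHER_CASES

print(f"Afternoon:   {afternoon_rate:.1f} cases per hour")
print(f"Rest of day: {other_rate:.1f} cases per hour")
print(f"Ratio:       {rate_ratio:.2f}")
print(f"No recorded time: {unknown_time} cases")
```

The per-hour rate in the afternoon works out to roughly four times the rate for the rest of the day. Whether that is unusual depends on how many people are actually outdoors at each hour, which this data set doesn't record.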

Let me do my best to offer explanations that are as prosaic as possible. Dogs not being able to track in these cases may simply be a big contributing factor to us not being able to explain a disappearance. This argument would be more convincing, though, if it correlated with the victims not being found, or only being found dead, but this isn’t the case. Then again, if most victims found alive in these cases are minors, that may explain why finding them still wouldn’t offer enough explanation.

The water issue is somewhat fuzzy, particularly in determining how close to a disappearance the body of water or river needs to be to qualify as near, but the victims being found in or near water offers a lot of hard evidence that’s perplexing both logistically and medically. Still, this more solid part of the water connection only accounts for a fraction of all cases.

As for the striking fact that children are found alive in a previously searched place much more often than adults, that is odd. As far as I can imagine, it could only be prosaically explained away if the reason almost no adults are found alive in Missing 411 cases is that adults who are found alive almost always explain the case.

A comparison with a control sample of normal disappearances is necessary to help determine what’s possibly going on here. If adults in explained cases are found alive versus dead at the same rate as minors in Missing 411 cases (roughly 50:50), and if minors are found alive versus dead at the same rate between Missing 411 cases and explained cases, that would perhaps suggest such sample selection bias exists.

If, on the other hand, the missing adults in explained cases still usually aren’t found alive, or if children in explained cases die much more often, the data would still support something odd going on, in my opinion. Of course, I’m not an expert statistician, so I may be wrong in my assessment, but a control sample comparison should definitely help clarify this.
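Once a control sample exists, the comparison described above could be run as a simple two-proportion test on the found-alive share of each group. The sketch below shows the mechanics only. The function name is hypothetical, and the counts passed to it are round placeholder numbers invented for illustration, not figures from Gary's data or from any real control sample.

```python
import math


def two_proportion_z(alive_a, dead_a, alive_b, dead_b):
    """Pooled two-proportion z-test on the found-alive share of two groups."""
    n_a, n_b = alive_a + dead_a, alive_b + dead_b
    p_a, p_b = alive_a / n_a, alive_b / n_b
    pooled = (alive_a + alive_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return z, p_value


# PLACEHOLDER counts, invented for illustration only; not real case data.
# Scenario 1: both groups are found alive at the same 50:50 rate.
z_same, p_same = two_proportion_z(50, 50, 50, 50)
# Scenario 2: the second group is found alive far more often (90:10).
z_diff, p_diff = two_proportion_z(50, 50, 90, 10)

print(f"Same rates:      z = {z_same:.2f}, p = {p_same:.3g}")
print(f"Different rates: z = {z_diff:.2f}, p = {p_diff:.3g}")
```

With real counts, a z value near zero would be consistent with the sample selection bias explanation, while a large one would mean the two groups really are found alive at different rates.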

The most common time of disappearance during the afternoon may just be coincidental, assuming that’s simply the time when most people are out and about in the wilderness. Only a comparison with a control sample of normal disappearances could tell us if the proportion is not just high, but unreasonably high. Gary generally argues that most people who go missing simply go missing where most people and national parks happen to be.

It’s interesting to me that bad weather basically has no effect on the success of searches, and that it occurs in only about a third of all cases. This strongly indicates to me that the storms are coincidental and not causative, merely sometimes happening and sometimes not. Only a comparison with a control sample of normal disappearances could prove otherwise, if bad weather turned out to accompany those cases much more or much less often.

Finally, as for the missing articles of clothing, particularly shoes, the only prosaic explanation that I’m aware of is paradoxical undressing due to hypothermia. But based on my specific knowledge of many Missing 411 cases, this doesn’t cover anywhere near all of the cases due to the temperatures often not getting low enough even during the night.

Perhaps another mundane, but less documented mental condition plays a role here. Getting rid of shoes in rough terrain is quite irrational, so whatever is going on, some kind of dementia, confusion, or mental episode would have to be a minor, but significant contributing factor to why people go missing in a way that will end up being unexplained. This certainly leaves room for the possibility of drugging, poisoning, or kidnapping.

Conclusions, Conjectures, and Cautioning

What Gary seems to think about all this is that this data supports the hypothesis that there are intelligent perpetrators behind at least a portion of these disappearances. Personally, I’m not sure about that, but I would concede that some of the discrepancies certainly don’t rule that out. Without a control sample comparison, however, we can’t be sure.

It’s also possible there are some mistakes or errors in the data set. I haven’t gone over it yet with a fine-tooth comb, but there don’t appear to be any glaring omissions or major inaccuracies, perhaps at most a slight miscounting of 1–3 victims in total. That is nowhere near enough to explain deviations of 20–25 percentage points from the average. The differences between men and women and between minors and adults need to be accounted for.
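The size of that possible miscount can be checked directly against the figures quoted earlier in this article. This small sketch sums the search outcomes and the adult age brackets and compares them with the stated totals. It covers only the headline numbers, not the full data set.

```python
TOTAL = 1127
ADULTS = 625

# Counts as quoted in the summary lists earlier in the article.
outcomes = {"found alive": 269, "found dead": 355, "never found": 501}
adult_brackets = {"18-24": 199, "25-34": 113, "35-44": 73,
                  "45-54": 69, "55-64": 62, "oldest bracket": 106}

outcome_gap = TOTAL - sum(outcomes.values())
bracket_gap = ADULTS - sum(adult_brackets.values())

print(f"Outcomes add up to {sum(outcomes.values())}, {outcome_gap} short of {TOTAL}")
print(f"Adult brackets add up to {sum(adult_brackets.values())}, {bracket_gap} short of {ADULTS}")
```

Both gaps are a handful of cases at most, which is in line with a small counting slip and far too little to move any of the percentages by more than a fraction of a point.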

The best prosaic hypotheses that I can come up with are that there must be a big difference in male and female psychology that plays a part in getting lost and/or making the disappearance unexplained, and that minors who are found alive are vastly overrepresented here because adults who are found alive almost always explain the disappearance. Why so many minors are involved to begin with could be due to youth excursions being popular.

I wouldn’t call these null hypotheses, however, as they aren’t extremely obvious or simple. I don’t think any hypothesis in particular deserves to be treated as the default null hypothesis for the Missing 411 cases. I think we need to approach them with an open (scientific) mind, without any assumptions. Using statistics, we may then be able to figure this out.

Ultimately though, we should also be careful not to ignore other forms of investigation. While statistics may eventually prove that there is no huge inexplicable thing going on here overall, that doesn’t invalidate or explain any individual case. Using the detective approach like David Paulides does is necessary to determine whether there are any truly anomalous cases that need to be singled out for a detailed inquiry.

Assuming for a moment that the perpetrator hypothesis is correct, Gary likes to use a fishing analogy. If you think of the missing people as fish, then there are a number of different scenarios, differently motivated, in which a fish would disappear in a way that other fish wouldn’t understand.

A fish could be eaten by another fish, but it can also be picked out of its environment by a being that exists outside of water and then either returned back into water or not. What the data would suggest in regard to this type of scenario is that the “beings” go fishing where the fish are, don’t specifically target them, and probably have a range of different motives (hunting for food or mates, getting rid of competing hunters, study, etc.).

Of course, this is just one of the competing hypotheses, one that skeptics don’t like because assuming the existence of as-yet-undiscovered intelligent non-human beings in our wilderness is simply an Occam’s bridge too far. What I’d argue is that this data doesn’t rule it out, so it can’t be justifiably thrown out until a control sample comparison is done.

Until then, it’s back to speculation, or possibly investigation in the field, for all of us. Do let me know what you think about this analysis and definitely try to suggest alternative explanations that are compatible with this data. The more people get involved in the peer review, the better. Like I said before, if you want to see the full data set and report, message me.