Missing Persons and UFO Sightings

A modern digital journey with Wild Big Data


The connection between UFOs and Missing Persons has been a hot topic of conversation recently. Both data sets span nearly a century, are rich in detail, and are large enough for statistical work despite their error margins. This leaves exciting possibilities for statistical analysis and correlation with other data sets of interest, such as ancient sites, birth records and geomagnetic lines.

This digital exploration starts with cleaning the data into a usable form. This very challenging and time-consuming task was motivated by wild claims. Both data sets are wild, meaning they contain a century's worth of data in raw form, as entered directly by the public. What kind of story could mysterious disappearances and UFO sightings contain? Mass abductions? What was the real story on data access? I have heard some wild claims, yet would there be proof?

To start, a simple test was run on a single year, 2013, of both Missing Person and UFO sighting records. There was one correlation, involving one person. A closer look at that single result suggested a reasonable and probable abduction from a purely data point of view. So the next question was: what if the whole century of data was used? Would the results be different? How hard could that be?

For perspective, it is important to compare the UFO data set to the population of our planet. With close to 7 billion people across the globe, the total data set is small by comparison: about the size of a small city of 80,000 people.

At first glance, the data needed two enhancements: geocodes added to the locations and the sighting durations standardized to seconds. This would allow the data to be mapped for location cross-referencing and allow a deeper exploration of the sighting durations. The many international words and time representations used for the sighting duration made the conversion complex. Also, some of the UFO locations were non-standard and contained mile markers and other random detail. The difficulty of the conversion was amplified again by the large number of diverse records. The lesson after several attempts was a gentle, consistent staging of the results, with small incremental successes, until the error rate was small enough to be usable. It was like chiseling a sculpture from stone one little piece at a time with no mistakes.
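To give a feel for the duration conversion described above, here is a minimal sketch in Python. It is not the actual conversion code used for this project; the unit table, the function name `duration_to_seconds` and the sample strings are all assumptions made for illustration, and the real data needed many more spellings and languages than this handles.

```python
import re

# Hypothetical unit table (an assumption, English spellings only).
UNIT_SECONDS = {
    "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
}

# A number (optionally with a decimal part) followed by a word.
_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)")


def duration_to_seconds(text):
    """Return the duration in whole seconds, or None if nothing parses."""
    total = 0.0
    matched = False
    for number, unit in _PATTERN.findall(text.lower()):
        if unit in UNIT_SECONDS:
            total += float(number) * UNIT_SECONDS[unit]
            matched = True
    # Unparsed rows return None so they can be staged for a later pass.
    return int(round(total)) if matched else None


# Made-up sample strings in the style of free-text public entries.
for raw in ["5 minutes", "1 hour 30 min", "about 10 sec.", "a few moments"]:
    print(repr(raw), "->", duration_to_seconds(raw))
```

Returning None for anything that does not parse fits the staged approach: each pass converts what it can, and the leftovers show which words or formats to teach the converter next.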

The Missing Person data weighed in at 10,000 records recovered out of a supposed 17,000 available. The full 17,000 were not available as claimed, which may need more research. All data sets were acquired by conventional means; no server hacking or similar techniques were used. If I could not get the data legitimately, then I was going to use what I could get and describe the results. The task of converting the Missing Person locations to geocodes seemed trivial after the UFO sighting exercise. There was a crossover effect: techniques used on the large data set were directly transferable to the smaller one.

Considerable time and computing power were used to get both wild data sets into a state to work with one another and this was just the first step. In the next step the data had to be loaded into a database and then queried and matched. The results had to be analyzed for completeness and had to be ready to be publicly reviewed. Like any tough puzzle surely there had to be a picture at the end or maybe some geometry in the wheat. At this point it was getting exciting and all the variables were beginning to reveal themselves.

The raw, unfiltered result set contained 56 records out of 80,000 UFO sighting records merged with 10,000 Missing Person records. This was confusing at first. Each of the 56 hits pairs one or more UFO sightings with one or more missing persons: one name could be correlated with one or more sightings, and one sighting with one or more names. In the end there were 23 valid people to explore in detail.
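The many-to-many nature of the hits is easier to see in a small sketch. The rows below are invented placeholders, not records from the result set, and `group_hits` is a hypothetical helper. It only shows how a list of hits collapses into a smaller number of distinct people; it does not reproduce whatever validity checks reduced the real 56 hits to 23 people.

```python
from collections import defaultdict

# Hypothetical hit rows: (missing person id, sighting id). Illustrative only.
hits = [
    ("person_a", "sighting_1"),
    ("person_a", "sighting_2"),
    ("person_b", "sighting_2"),
    ("person_c", "sighting_3"),
]


def group_hits(pairs):
    """Index the hit pairs both ways: by person and by sighting."""
    by_person = defaultdict(set)
    by_sighting = defaultdict(set)
    for person, sighting in pairs:
        by_person[person].add(sighting)
        by_sighting[sighting].add(person)
    return dict(by_person), dict(by_sighting)


by_person, by_sighting = group_hits(hits)
print(len(hits), "hits,", len(by_person), "distinct people")
```

Here four hits reduce to three distinct people, because one person matched two sightings and one sighting matched two people.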

The fact that this was just a data set and not a true real-world context stood clear after looking deeper into each found record individually. To answer the original question of a valid correlation between these two data sets, more detail was needed. Digging deeper into each result record, I carefully studied all the supporting data for both the UFO sighting and the Missing Person. The results stunned me personally. Each missing person record left a shocking absence of a living, conceived and typically loved person or child. Although written in a typical dry, factual tone, the detail in the missing person records was interesting and sometimes very disturbing. The detail on each UFO sighting started with separate lookups through MUFON and NUFORC. During this exercise a key contextual detail related to time was discovered. A typical day is defined in data as midnight to midnight, whereas the relevant real-world context was typically a 24-hour window before and after the Missing Person last seen date. For example, a person may go missing before midnight and have a potential incident or UFO sighting after midnight, on the next calendar day. This seemingly simple detail was discovered after drilling down through the results and investigating each one on the standard day range.
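The midnight-to-midnight issue can be illustrated with a short Python sketch. The function names and timestamps are made up for the example and are not taken from any record; the sketch simply contrasts a same-calendar-day match with a match inside a 24-hour window on either side of the last seen time.

```python
from datetime import datetime, timedelta


def same_calendar_day(last_seen, sighting):
    """Midnight-to-midnight match: both events fall on the same date."""
    return last_seen.date() == sighting.date()


def within_window(last_seen, sighting, hours=24):
    """Match if the sighting is within `hours` before or after last seen."""
    return abs(sighting - last_seen) <= timedelta(hours=hours)


# Hypothetical timestamps: last seen late at night, sighting just after midnight.
last_seen = datetime(2013, 1, 1, 23, 30)
sighting = datetime(2013, 1, 2, 0, 45)

print(same_calendar_day(last_seen, sighting))  # False: the dates differ
print(within_window(last_seen, sighting))      # True: only 75 minutes apart
```

The two events are 75 minutes apart, yet a date-only comparison misses them because they fall on either side of midnight.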

The result set is a small percentage of the overall records, less than one tenth of a percent of the total sample size (0.06%), leading some people to dismiss the overall possibility of abduction by a UFO as less than probable. However, the research on the result set left a general feeling of wonder, and the lack of detail was most unsettling.
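As a quick check of the arithmetic, the 0.06% figure appears to use the combined 90,000 records (80,000 sightings plus 10,000 missing persons) as the sample size. That choice of denominator is my assumption; the counts themselves are the ones quoted above.

```python
ufo_records = 80_000      # UFO sighting records
missing_records = 10_000  # Missing Person records
hits = 56                 # raw, unfiltered result set

# Assumption: "total sample size" means both data sets combined.
share = hits / (ufo_records + missing_records) * 100
print(f"{share:.2f}%")  # 0.06%
```

Measured against the 80,000 sighting records alone, the share would be 0.07%, so either way the result stays well under one tenth of a percent.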

During the investigation of the results, one person previously listed as missing was declared found after approximately two years (Abrihet P Wallace). What this says to a UFO researcher is beyond words, and it opens a special case with abductions. This special case shows that when someone is found, more steps are necessary to establish whether they are living or dead. There may also be law enforcement involvement in revealing the story behind the finding.

Heather Elvis, 20, Myrtle Beach, SC, 12–18–2013: the investigation revealed a potential abduction. Using a 24-hour margin from the time she was last seen revealed a rash of sightings on the MUFON site during and after her last seen time. The NUFORC database reported a bright light (300 sec) at 9:00 pm the same day she went missing. Her disappearance affected a whole community and ended with a couple charged with her murder. Her body was never found. A quick search of her name turned up a TV special dedicated to her case and a $25,000.00 reward for information. There was a large degree of community involvement and a high degree of interest in her sudden disappearance, leaving me with the feeling that this young woman was loved and cherished by her community.

Stacy A Peterson, 23, Bolingbrook, IL, 10–28–2007: the investigation shocked me with its lack of data, which I took as evidence of a possible abduction. There was absolutely no data in the Missing Person detail, and it appears she just vanished. NUFORC reported a light (900 sec), and MUFON reported a mysterious black triangle showing up the next day in the same location (24-hour margin from time last seen). There was no family follow-up, only one friend who reported the detail in the Missing Person record. The friend, who felt powerless, was seemingly traumatized by the sudden disappearance of her good friend. There were no suspicious circumstances; she just vanished! This case was covered in the national news and is still an open Missing Person case.

The mother and daughter from Yuma, AZ, Claudia B. and Claudia J. Guillen, 21 and 2, 11–24–2004, correlated with a NUFORC report of a chevron (360 sec) at the same time and place as a reported encounter. The missing person report revealed that both were last seen at a nearby store. This was a surprise find and one of the most striking discoveries on this data journey, as I personally heard about this case on the news many years ago. Like Stacy Peterson, this mother and daughter just vanished with no suspicious circumstances.

An interesting note to keep in mind is that the current thinking on UFO abductions is that they average between 2 and 5 hours and the people are always returned (D. Jacobs). Also, recent well-known evidence indicates that many UFO abductees may not recall the event for many years, if ever. On a more curious note, there were more matches with UFO sightings in 2014, while the Missing Person data was still fresh, than with the older data. This suggests there may be value for UFO researchers in checking the missing person and UFO data feeds for correlations. In the Travis Walton UFO abduction case, for example, he was found after five days and has contributed greatly to our collective understanding of abductions.

To conclude, the effort to scrub and match the data from the UFO databases was very difficult and took an absurd number of hours poring over details, trying to reduce the error margins on locations and duration times. The Missing Persons database was good for individual records but fell short on bulk loading limits. The Missing Persons people have to be commended, as my experience with that data left me changed. It showed me that each of us is precious, from our conception on through our life's journey, and that there really is a great need for greater community and awareness of our individual struggles, talents and values.

The intersection data set can be considered insignificant by some because there is no hard, conclusive evidence. To the non-skeptic, and to someone who may have had an encounter or UFO experience, the consideration may be different.

The lack of a usable working relationship with the people responsible for the UFO data saddened me and left an impression of information greed and hoarding. We all would have benefited from working together. The bow finger is the symbol that comes to mind; big and bold. A cold shoulder from Peter Davenport at NUFORC and a dead-end response from MUFON left a less than favorable impression. There is hope that the future will be different; it seems that a little resistance is required for change and can be considered validation of doing something meaningful. On another note, the sloppiness in the way the data was stored and collected by NUFORC was disappointing. Also, MUFON would not give a large data dump even after many requests. This could improve, as this is public data and people have a choice of what to share and where.

This was a fun exercise with, in my opinion, some interesting results. The unexpected tour of horrors of the unknown and the missing shocked me and left me feeling a greater need for more community support for everyone. The data sets are on www.github.com/planetsig, complete with a database dump, spreadsheets and more detail on the actual conversion process. A spreadsheet of the result set is there too. This whole project left me with a feeling that there is no excuse not to work together, especially if working together could be as simple as setting parameters, following commitments and notifying each other of issues. Cutting off the feedback on what I have learned here about improving data collection and distribution will only make it take longer to improve our awareness through data.