NUFORC Data Analysis

A Preliminary Analysis using Pandas and Plotly.

Noah Hradek
7 min read · Sep 10, 2022
Adam B Kashlak

Introduction

I conducted a quick analysis of NUFORC data, courtesy of Peter Davenport, because I was curious about recent sightings and the nature of the phenomenon I was experiencing. My analysis is hosted on GitHub and can be viewed by anyone. The dataset is large, but data with longer timeframes and global scope would be helpful as well, because my analysis was confined to the United States and a small number of decades.

There are other reports on NUFORC data, including this one on the number of reports and this textual analysis of report text. However, I felt my focus should be on geography, date, and time. If there was a pattern in when and where sightings occurred, I was interested in finding it. Therefore, I ignored the textual data and used only the date and geographical data.

I aggregated the dataset across all states and focused only on data from the United States, although the dataset includes international reports as well. I didn’t use any outside data sources, which might have introduced bias but would also have provided more information. Any further datasets would be appreciated, although I found the NUFORC dataset to be the most complete. In terms of the toolset, I decided to use Pandas and Plotly, which allowed me to quickly compute statistics about the dataset and plot histograms and maps. I hope you find this analysis insightful and interesting.

Shape

Light        27909
Circle       14610
Triangle     12387
Fireball      9580
Other         9461
Unknown       9386
Sphere        8985
Disk          8188
Oval          5981
Formation     4590
Changing      3447
Cigar         3419
Rectangle     2382
Flash         2375
Cylinder      2186
Diamond       1985
Chevron       1588
Egg           1192
Teardrop      1182
Cone           553

The three most common shapes were Light, Circle, and Triangle. Light is unsurprising, since any bright object seen from a distance will appear as a light, including stars and other astronomical objects. The well-known disk shape, first attributed to Kenneth Arnold, accounted for only about 6% of sightings. Triangular and circular objects were more common. However, disk-shaped objects are unlikely to be mistaken for planes or other winged aircraft.
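A tally like the one above, along with each shape's share of the total, takes only a couple of lines in Pandas. The sketch below is illustrative only: the column name `shape` and the tiny sample frame are assumptions, not the actual NUFORC export.

```python
import pandas as pd

# Hypothetical sample; the real dataset has one row per report.
reports = pd.DataFrame(
    {"shape": ["Light", "Light", "Circle", "Disk", "Light", "Triangle", "Circle", None]}
)

# Count reports per shape, most common first (missing shapes are dropped by default).
shape_counts = reports["shape"].value_counts()

# Share of each shape among the reports that have a shape recorded.
shape_share = reports["shape"].value_counts(normalize=True)

print(shape_counts)
print(shape_share.round(3))
```

With `normalize=True` the counts become fractions, which is the kind of figure the 6% for disks refers to.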

City

I retrieved the ten metropolitan areas with the most sightings and listed them here. Population is correlated with sightings, yet the cities with the most sightings weren’t the most populated. Rather, they are all in the western half of the United States, an area pockmarked with secretive military bases and steeped in UFO mythology. Could this account for the increased number of sightings as well?

When I computed the correlation between sightings and population, I got a Pearson coefficient of 0.334, which indicates a weak-to-moderate positive correlation. Urban locations will have more sightings simply because they are more populated and closer to airports, where more planes can be seen. This doesn’t account for all sightings, but it can account for some of them.
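For readers who want to reproduce this kind of figure, here is a minimal sketch of a Pearson correlation in Pandas. The per-city numbers and the column names `population` and `sightings` are invented for illustration; they are not the values behind the 0.334 result.

```python
import pandas as pd

# Hypothetical per-city figures, invented for illustration only.
cities = pd.DataFrame(
    {
        "population": [100_000, 250_000, 500_000, 1_000_000, 2_000_000],
        "sightings": [40, 35, 90, 60, 150],
    }
)

# Series.corr computes the Pearson coefficient by default.
r = cities["population"].corr(cities["sightings"])

# r squared is the share of variance a straight-line fit would explain.
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

For scale, a coefficient of 0.334 gives an r squared of about 0.11, so a linear relationship with population would explain roughly a tenth of the variation in sightings.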

State

The top state is California, followed by Florida, Washington, and Texas. Most of the top states are among the most populated, and many are also western states. California has nearly double the number of sightings of Florida, which makes sense since California is highly populated. Observing patterns in less populated states might give us more insight as well.

Country

The United States has the vast majority of sightings. However, rather than a lack of sightings elsewhere, I attribute this to NUFORC being an American, English-speaking organization that isn’t well known outside the United States. The anglophone countries tend to have the most reports, which supports this conclusion. There are claims that UFOs are an American phenomenon; however, many UFO reports exist internationally as well.

Duration

5 minutes     8449
2 minutes     6153
10 minutes    5952
1 minute      5278
3 minutes     4522
30 seconds    3745
15 minutes    3591

The duration data above indicates that the most common durations fall between 1 and 10 minutes. This is probably because people either look away or the object disappears after a few minutes. A sighting that lasts too long would likely lose an observer’s interest. Comparing this to the duration of UFO videos would be interesting and give us more insight.
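Because the durations are recorded as text, comparing them numerically requires converting them first. The following is a rough sketch of one way to do that, assuming entries shaped like "5 minutes" or "30 seconds". The helper name `duration_to_seconds` and the sample values are hypothetical, and real free-text entries would need more careful handling.

```python
import re

import pandas as pd

# Seconds per unit; only a few common units are handled in this sketch.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def duration_to_seconds(text):
    """Convert strings like '5 minutes' to seconds; return NaN if unparseable."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*(second|minute|hour)s?\s*", str(text).lower()
    )
    if match is None:
        return float("nan")
    value, unit = match.groups()
    return float(value) * UNIT_SECONDS[unit]


# Hypothetical entries, including one that cannot be parsed.
durations = pd.Series(["5 minutes", "30 seconds", "1 minute", "a while"])
seconds = durations.map(duration_to_seconds)
print(seconds)
```

Entries that don't match the pattern become NaN, so they can be counted or dropped explicitly rather than silently skewing a summary.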

Date

The sighting dates span from 1961 to 2022; however, the dates on which sightings were posted only go back to 1998. We see a large discrepancy between when a sighting occurs and when it is posted. This could be for various reasons, but it means the posting date doesn’t give accurate information about the phenomenon.
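The gap between the two dates can be measured directly once both columns are parsed as datetimes. This is a sketch under assumptions: the column names `occurred` and `posted` and the three sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; `occurred` and `posted` are assumed column names.
reports = pd.DataFrame(
    {
        "occurred": ["1961-07-04", "1999-03-15", "2020-04-10"],
        "posted": ["1998-11-01", "1999-04-02", "2020-05-01"],
    }
)

reports["occurred"] = pd.to_datetime(reports["occurred"])
reports["posted"] = pd.to_datetime(reports["posted"])

# Lag between the event and its publication, in whole days.
reports["lag_days"] = (reports["posted"] - reports["occurred"]).dt.days
print(reports)
```

A sighting from decades before the posting records begin shows up with a lag of many thousands of days, which is why the sighting date is the better column to analyze.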

Some spikes correspond to holidays like the 4th of July, when fireworks are displayed. However, that doesn’t account for all the increases. The sightings show periodic behavior, with an increase and then a dip during certain years and months. These peaks don’t always fall in months that typically have many sightings; for example, there is a peak in April 2020, which doesn’t correspond with a holiday where fireworks would be used. This partly supports Vallee’s assertion of a reinforcement system, discussed in his books like Messengers of Deception.

We can view reports by the year the sighting occurred. There is a peak around 2014 and then a decline, followed by an increase around 2020. The internet seems to have sparked a large increase in reports, and there are many more reports after the turn of the millennium.

We can also group by month and get the number of sightings each month. Doing this, we see that the spring months have the fewest sightings and the summer months the most. July, the month of the 4th of July, had the highest number of sightings. However, I suspect the high number of sightings in summer can be accounted for by the number of outdoor activities that occur during these months. More people outside means more people looking at the sky. Winter seems to have a similar number of sightings to spring, which is strange considering spring is a more amenable season.

Plotting by hour on a 24-hour cycle gives us a sense of when sightings most commonly occur. Most often this is at night, around 10 pm, when lighted objects are most visible. Daytime sightings are less common, and ordinary objects like planes are easier to recognize during the day. Fewer conventional aircraft and balloons fly at night as well. Stars and satellites might be misidentified, so determining whether they account for some of the reports might be useful.
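The groupings by year, month, and hour described in this section all follow the same pattern in Pandas. The sketch below uses a handful of invented timestamps rather than the real data, and the variable names are my own.

```python
import pandas as pd

# Hypothetical timestamps, for illustration only.
occurred = pd.to_datetime(
    pd.Series(
        [
            "2014-07-04 22:15",
            "2014-07-04 21:40",
            "2015-07-05 22:05",
            "2016-01-20 03:30",
            "2020-04-02 14:00",
        ]
    )
)

# Reports per year, in chronological order.
by_year = occurred.dt.year.value_counts().sort_index()

# Reports per calendar month (1-12) and per hour (0-23); empty bins are set to zero.
by_month = occurred.dt.month.value_counts().reindex(range(1, 13), fill_value=0)
by_hour = occurred.dt.hour.value_counts().reindex(range(24), fill_value=0)

print(by_year, by_month, by_hour, sep="\n\n")
```

Reindexing over the full range keeps months or hours with no reports in the result, so a histogram built from these counts doesn't silently skip empty bins.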

Geographic Density

Sightings occurred throughout the lower 48 states, with the largest concentrations in the western half. However, the eastern half had more reports overall, likely due to its larger population and higher population density. The western half is more sparsely populated, with large stretches of uninhabited desert. If we compare the sighting map to a population density map, we see that the two correlate.

Wikimedia Commons

Conclusion

I am biased because I think some UFOs are interdimensional or extraterrestrial, and some are secretive military projects. That being said, the data isn’t conclusive for any one hypothesis, or even that any of the reports are unexplained. It indicates a periodic pattern to sightings, certain seasons when they are seen more often, and geographic concentrations, but not much more. Periodic phenomena could be natural in origin; however, with quite a few reports describing shaped objects and a higher concentration in western American cities, I’m not sure that applies. Even if we assume the Blue Book unidentified classification rate of 5%, out of 138,219 reports we still have about 6,911 sightings left unidentified.

Video analysis of channels with reasonably well-recorded UFO footage, like Steve Barone’s, would be fascinating since he’s recorded so many videos. Data analysis of the contactee literature would also be interesting; although it would be a strenuous project, I would be interested in doing it in the future. In that case, it would likely be useful to use an n-gram model and then some clustering algorithm to find similar reports. Overall, the dataset is fascinating but not very useful beyond basic analysis. A dataset with further historical data, with reports stretching back at least a few centuries, might be more insightful.
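To give a rough idea of what grouping similar reports by n-grams could look like, here is a deliberately simple sketch. It is not a full n-gram language model or a standard clustering algorithm: it compares sets of word bigrams with Jaccard similarity and groups texts greedily against a threshold. The function names, the threshold of 0.3, and the sample snippets are all invented for illustration.

```python
def word_ngrams(text, n=2):
    """Return the set of word n-grams in a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i : i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_clusters(texts, threshold=0.3):
    """Assign each text to the first cluster whose seed text is similar enough."""
    clusters = []  # each cluster is a list of indices; the first index is the seed
    grams = [word_ngrams(t) for t in texts]
    for i, g in enumerate(grams):
        for cluster in clusters:
            if jaccard(g, grams[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([i])
    return clusters


# Invented report snippets, not taken from any real source.
texts = [
    "a bright light hovered over the lake",
    "a bright light hovered over the field",
    "three triangles flew in formation",
    "three triangles flew in a line",
]
print(greedy_clusters(texts))  # [[0, 1], [2, 3]]
```

A real project would more likely vectorize the n-grams and use an established clustering method, but the idea of measuring overlap between reports is the same.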
