More Than Meets The Eye: Highly Reported UFO Events — Part II

Mar 13, 2018

Using natural language processing on a UFO report dataset to investigate the most interesting highly reported events in the United States.

This is Part II of a two-part series analysis. In Part I, I walked through the text vectorization and clustering algorithm I used to reveal latent topics in the UFO reports from the National UFO Reporting Center (NUFORC). This helped me identify five dates to investigate further.

Part II takes a deeper look at the events on those dates through word clouds, news reports and sentiment analysis of report text. I found that four of the five dates pointed to specific, localized events that were attributable to natural or manmade phenomena.

Taking a Closer Look

In Part I, I identified the 10 most highly reported events in reports collected by the National UFO Reporting Center and ran text clustering on the data to understand the content of the reports. That analysis focused my attention on five specific dates: three that had significantly higher popularity scores in my analysis of the most reported events, and two that were flagged through clustering. See the charts below.

The Five Dates

13th March 1997
4th July 1997
16th November 1999
4th July 2014
7th November 2015

I wanted to understand whether the reports pointed to specific, localized events that might be explainable by natural or manmade phenomena, and whether reports on some dates were more similar to one another (i.e. more coherent) than reports on other dates. To do that, I generated word clouds from the text of the reports using one- and two-word terms (see Part I for an explanation of text vectorization and TF-IDF). The more frequently a term appears, the larger the lettering; the larger the lettering, the more coherent the reports are with each other. (Note that only a selection of words from the analysis appears in the word clouds.)
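As a rough illustration of how term weights for such a word cloud could be derived, here is a minimal standard-library sketch. It is not the code used for this project: the function names `ngrams` and `cloud_weights` are hypothetical, the smoothed TF-IDF formula is one common variant, and stop-word removal is left out for brevity.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Return lowercase one- and two-word terms from a single report."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    terms = []
    for n in range(1, n_max + 1):
        terms += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return terms


def cloud_weights(reports, top=10):
    """Sum a smoothed TF-IDF score per term across all reports.

    A larger total would translate into larger lettering in the cloud.
    """
    docs = [Counter(ngrams(r)) for r in reports]
    n_docs = len(docs)
    # Number of reports in which each term appears at least once.
    doc_freq = Counter(term for d in docs for term in d)
    totals = Counter()
    for d in docs:
        length = sum(d.values())
        for term, count in d.items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            totals[term] += (count / length) * idf
    return totals.most_common(top)


# Made-up example reports, purely for illustration.
sample = ["bright orange light", "bright orange fireball", "green light"]
weights = dict(cloud_weights(sample, top=50))
```

The resulting term-to-weight mapping is the kind of input a word cloud renderer typically expects.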

Phoenix Lights — March 13th, 1997

Of the reports submitted for March 13, 1997, 84% originated from Arizona. A lengthy Wikipedia page entitled Phoenix Lights discusses patterns of lights moving in apparent formation in two separate events on the same evening over Phoenix and northern Arizona. The page attributes the first event to aircraft flying in formation at high altitude and the second to a military exercise in which the National Guard dropped slow-falling flares from four planes over nearby Luke Air Force Base. Having lived for several years in Prescott, Arizona, I met several people who witnessed the events first-hand. So, while I find the explanations compelling, I understand that many locals are skeptical, feeling that these explanations do not match what they saw.

Midwest Fireball — November 16th, 1999

In searching this date, I found an astronomical event: an unusually bright fireball with a low-altitude trajectory passed over the midwestern states that evening. The fireball preceded the Leonid meteor shower, which occurs annually, and was reported across many states, with 76% of reports originating from Iowa, Illinois, Indiana, Kentucky, Missouri, Ohio, Pennsylvania and Wisconsin. Article: Huge Fireball Dazzles Midwest!
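The geographic shares quoted for each date come down to a simple tally of the state recorded on each report. A minimal sketch, assuming each report carries a two-letter state code (the function name `state_share` and the sample data are hypothetical):

```python
from collections import Counter


def state_share(states, focus):
    """Percentage of reports whose state code is in the `focus` set."""
    if not states:
        return 0.0
    counts = Counter(states)
    return 100.0 * sum(counts[s] for s in focus) / len(states)


# Made-up state codes for four reports on a single date.
share = state_share(["IA", "IL", "CA", "OH"], {"IA", "IL", "OH"})
```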

Strange Sightings in Los Angeles — November 7th, 2015

NUFORC had manually annotated 219 reports with the phrases ‘US Navy’ and ‘missile launch’, after hearing about the event. This sighting was indeed a US Navy test missile, launched off the Californian coast near Los Angeles. An overwhelming 80% of reports received by NUFORC for this date originated from California and Arizona. Further Reading: news report.

This reminds me of the more recent SpaceX rocket launch in December 2017. I started this project before the SpaceX event occurred; otherwise, it would have been interesting to compare the two dates.

Independence Day — 1997 and 2014

It’s easy to assume that any reports received on July 4th relate to fireworks displays, and that’s probably a good assumption in most cases; however, comparing the word clouds of the two 4th of July years, I was surprised at how distinctly different they were.

July 4th, 1997

Although terms such as ‘fireworks’, ‘4th’ and ‘July’ occur in reports on both dates, it’s interesting that the 1997 reports mentioned ‘St Louis’ and ‘St Charles’ along with references to blue and green light. Of the reports submitted for this date, 74% were from Missouri and Illinois. While I found a local news article mentioning the event, the article offered no concrete explanations. However, comparing the descriptions to the midwestern fireball of 1999, it seems reasonable to assume that the reports reflected a meteor or a similar phenomenon.

July 4th, 2014

In contrast, reports for the 4th of July, 2014 event reference dominant colors of red and orange and sightings of multiple objects (e.g. ‘formation’, ‘objects appeared’, ‘one another’, ‘three lights’). These reports were made from 38 states and had no obvious geographic pattern.

Map of July 4th, 2014 Report Locations

**Map Key:** Red (smallest): single sighting, Peach: 2–5 sightings, Yellow: 6–8 sightings, Bright Yellow (largest): 9+ sightings in city. Check out the code for this map on my Github.
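For readers who want to reproduce the key, the bucketing it describes can be expressed as a small lookup. This is only a sketch, not the code from the repository: the function name `marker_style` is hypothetical, and the two intermediate sizes are assumed to grow in step with the counts.

```python
def marker_style(sightings):
    """Map a city's sighting count to a (color, relative size) pair."""
    if sightings < 1:
        raise ValueError("a plotted city needs at least one sighting")
    if sightings == 1:
        return ("red", 1)            # smallest marker
    if sightings <= 5:
        return ("peach", 2)          # size assumed
    if sightings <= 8:
        return ("yellow", 3)         # size assumed
    return ("bright yellow", 4)      # largest marker
```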

While I did not find any news reports of the sightings on Independence Day 2014, a search returned this interesting YouTube video that appears to show lights moving in formation.

Further digging brought my attention to the rising popularity of personal drones around this time.

This leaves the obvious question of whether soaring popularity in hobbyist drones and unfamiliarity with those drones among the general public led to reports of drone sightings. As for multiple objects flying in formation, researchers submitted their work on autonomous coordinated flocks of drones earlier that year. Perhaps some enthusiasts were experimenting with coordinated flight that night?

Photo by Intel: Drone light display of the U.S. flag during the Super Bowl 2017 — Houston, Texas

Measuring Descriptive Coherence

In addition to taking a closer look at the contents of the reports, I wanted to get a better understanding of how similar the reports were to each other. This gets a little statistical, so bear with me. Beyond using the word clouds to figure out whether the reports were localized, I used the most popular terms to objectively measure the coherence of language for each date. I normalized each output and measured the statistical variance of the top 50 bi-grams (the 50 most popular word pairs) to identify which of the events had the most coherent description. The larger the variance, the greater the coherence; the maximum value is 1, which would mean every report contains exactly the same text. From most to least coherent, they are:

Independence Day 2014: 0.045
Strange Sightings in LA: 0.030
Midwestern Fireball: 0.024
Independence Day 1997: 0.024
Phoenix Lights: 0.020
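One possible reading of the coherence score described above can be sketched as follows. The normalization step is not spelled out here, so this sketch assumes each bi-gram is scored by the share of reports that contain it, and then takes the population variance of the 50 highest shares. The function names are hypothetical, and the absolute values it produces are on a different scale and not comparable with the scores listed above.

```python
import re
from collections import Counter
from statistics import pvariance


def bigrams(text):
    """Return the set of distinct two-word pairs in a single report."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(pair) for pair in zip(words, words[1:])}


def coherence(reports, top=50):
    """Variance of the report shares of the `top` most common bi-grams."""
    n = len(reports)
    # Count, for each bi-gram, how many reports mention it at least once.
    doc_freq = Counter(bg for r in reports for bg in bigrams(r))
    # Assumed normalization: divide by the number of reports.
    shares = [count / n for _, count in doc_freq.most_common(top)]
    return pvariance(shares) if len(shares) > 1 else 0.0


# Made-up reports: two share a phrase, one does not.
score = coherence(["red light", "red light", "green orb"])
```

Under this reading, a date where a handful of phrases recur across most reports while the rest are rare yields a higher variance than a date where no phrase stands out.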

While this is not a large enough sample to infer patterns, it’s interesting to see that the most coherent reports are from the 4th of July 2014, the date that had the least news coverage and, other than my own theories, no consensus on what these objects were.

Sentiment Analysis on the US Dataset

Lastly, after generating the word clouds and comparing coherence, I wanted to better understand what type of language people were using in making these reports, so I did a sentiment analysis on the text of all US reports from years 1947–2017, to try to understand how emotive and subjective the language was. For this task, I used a lightweight, simple natural language processing (NLP) library, TextBlob.

The majority of sentiment was non-extreme or near neutral for both emotion and subjectivity. Of the 93,000+ reports, only 31 had highly positive emotion, only 4 had extremely negative emotion and 20% had scores over 0.5 in subjectivity. This indicates that the corpus of reports is quite objective in its descriptions.
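A tally like this can be produced from TextBlob's polarity (-1 to 1) and subjectivity (0 to 1) scores. The sketch below is an illustration rather than the project code: the function names are hypothetical, and the 0.75 cut-off for "highly" positive or negative emotion is an assumption, since only the 0.5 subjectivity threshold is stated above.

```python
def score_reports(texts):
    """Return a (polarity, subjectivity) pair for each report text."""
    # Imported lazily so the summary helper works without TextBlob installed.
    from textblob import TextBlob
    return [tuple(TextBlob(t).sentiment) for t in texts]


def summarize(scores, extreme=0.75):
    """Count extreme polarity scores and the share of subjective reports.

    `extreme` is an assumed cut-off; 0.5 for subjectivity is from the text.
    """
    n = len(scores)
    return {
        "highly_positive": sum(p >= extreme for p, _ in scores),
        "highly_negative": sum(p <= -extreme for p, _ in scores),
        "pct_subjective": 100.0 * sum(s > 0.5 for _, s in scores) / n if n else 0.0,
    }


# Made-up scores standing in for the output of score_reports(...).
summary = summarize([(0.9, 0.6), (-0.8, 0.2), (0.0, 0.5), (0.1, 0.7)])
```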

Conclusions

A number of conclusions leap from this project. On the technical end, k-means clustering is a great tool for exploring latent sub-groups of data. It highlighted events, such as the November 2015 US Navy missile launch and Independence Day 2014 event, that I wouldn’t have considered investigating otherwise. The analyses helped me understand the nature of the reports and how relatively coherent they were.

On the more topical side, the biggest thing that caught my attention was that all but one of the events were localized to a geographic region, and that the one exception was also the only event for which I couldn’t find a clearly convincing explanation. However, I am tempted by the idea that rising drone popularity, public unfamiliarity with drones and people wanting to film fireworks from a higher vantage point led to a surge in reports that year. It sure makes for some interesting footage on YouTube!

Additionally, I came away feeling that the reports are, as a collective, fairly reliable. Not only is the language used to describe the events coherent across highly reported events, but the sentiment analysis showed that the words used were largely objective and non-emotive.

Lastly, it is easier to trust the dataset and the organization collecting the reports when you consider how few reports mention abductions and that NUFORC investigates only a relative few reports each year (and appears to do so with some skepticism, as in their annotations attributing the November 7th, 2015 reports to the Navy missile launch).

ufo by Nook Fulloption from the Noun Project

Hi, I’m Katie Lazell-Fairman. I’m a data scientist based in New York City. Check out my other data projects on Github. Have questions about this post, or are you curious about this project? Comment below or feel free to contact me!