Is Bigfoot a Republican?

Published in The Startup · 10 min read · Jun 12, 2019

I recently came across a dataset containing all the Bigfoot sightings from 1867–2017. Naturally, I began to think about the political implications such data must contain. After all, the FBI did just reveal that they have been investigating Bigfoot since the 1970s. So if the government is interested, there must be a political reason. Such as Bigfoot being a dark money donor to the Republican Party!

Okay, jokes and conspiracy theories aside, I did think it would be fun to see whether Bigfoot sightings lined up with the results of the last presidential election. This led me to the tongue-in-cheek question laid out in the title of this article. Please allow me to walk you through this brief comparison of the results of the 2016 presidential election and historical Bigfoot sightings and see if we can’t answer the question we’ve always secretly had: Is Bigfoot a Republican?

Let’s get started!

1.0 Importing the Data

# Loading in our packages
 lapply(c("tidyverse", "openintro", "usmap"),
 library,
 character.only = TRUE)
 
# Loading in our datasets
 elec_results <- read_csv("/Users/auggieheschmeyer/Documents/Miscellaneous/R Practice/Bigfoot:Election Results/pres16results.csv")
 bigfoot <- read_csv("/Users/auggieheschmeyer/Documents/Miscellaneous/R Practice/Bigfoot:Election Results/bfro_reports_geocoded.csv")
 
# Setting up some extra elements
 auggie_pink <- "#ffd1dc"
 auggie_blue <- "#5fa1e1"
 usa <- map_data("usa")
 states <- map_data("state")

To summarize the above code, I’m loading in some packages that will eventually help to plot Bigfoot sightings on a map, the aforementioned datasets, and some extra elements (two colors and the map outlines) that will help tie things together.

2.0 Exploring the Data

glimpse(elec_results)
 ## Observations: 18,475
 ## Variables: 9
 ## $ county <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
 ## $ fips <chr> "US", "US", "US", "US", "US", "US", "US", "US", "US"…
 ## $ cand <chr> "Donald Trump", "Hillary Clinton", "Gary Johnson", "…
 ## $ st <chr> "US", "US", "US", "US", "US", "US", "US", "US", "US"…
 ## $ pct_report <dbl> 0.9951, 0.9951, 0.9951, 0.9951, 0.9951, 0.9951, 0.99…
 ## $ votes <dbl> 60350241, 60981118, 4164589, 1255968, 451636, 180877…
 ## $ total_votes <dbl> 127592176, 127592176, 127592176, 127592176, 12759217…
 ## $ pct <dbl> 4.729933e-01, 4.779378e-01, 3.263985e-02, 9.843613e-…
 ## $ lead <chr> "Donald Trump", "Donald Trump", "Donald Trump", "Don…

 unique(elec_results$cand)
 ## [1] "Donald Trump" "Hillary Clinton"
 ## [3] "Gary Johnson" "Jill Stein"
 ## [5] "Evan McMullin" "Darrell Castle"
 ## [7] "Gloria La Riva" "Rocky De La Fuente"
 ## [9] "None of these candidates" "Richard Duncan"
 ## [11] "Dan Vacek" "Alyson Kennedy"
 ## [13] "Mike Smith" "Chris Keniston"
 ## [15] "Lynn Kahn" "Jim Hedges"
 ## [17] "Monica Moorehead" "Peter Skewes"
 ## [19] "Emidio Soltysik" "Scott Copeland"
 ## [21] "Tom Hoefling" "Rocky Giordani"
 ## [23] "Laurence Kotlikoff" "Kyle Kopitke"
 ## [25] "Joseph Maldonado" "Michael Maturen"
 ## [27] "Princess Jacob" "Ryan Scott"
 ## [29] "Rod Silva" "Jerry White"
 ## [31] "Bradford Lyttle" "Frank Atwood"
 ## [33] NA

 glimpse(bigfoot)
 ## Observations: 4,586
 ## Variables: 27
 ## $ observed <chr> "Ed L. was salmon fishing with a companion in…
 ## $ location_details <chr> "East side of Prince William Sound", "I would…
 ## $ county <chr> "Valdez-Chitina-Whittier County", "York Count…
 ## $ state <chr> "Alaska", "Pennsylvania", "Oregon", "Oklahoma…
 ## $ title <chr> NA, NA, NA, "Report 9765: Motorist and childr…
 ## $ latitude <dbl> NA, NA, NA, 35.30110, 39.38745, 43.27314, 39.…
 ## $ longitude <dbl> NA, NA, NA, -99.17020, -81.67339, -76.89331, …
 ## $ date <date> NA, NA, NA, 1973-09-28, 1971-08-01, 2003-09-…
 ## $ number <dbl> 1261, 8000, 703, 9765, 4983, 26566, 5692, 438…
 ## $ classification <chr> "Class A", "Class B", "Class B", "Class A", "…
 ## $ geohash <chr> NA, NA, NA, "9y32z667yc", "dpjbj6r280", "dr9q…
 ## $ temperature_high <dbl> NA, NA, NA, 72.55, 76.32, 67.62, 88.56, NA, 7…
 ## $ temperature_mid <dbl> NA, NA, NA, 63.225, 70.440, 58.160, 70.220, N…
 ## $ temperature_low <dbl> NA, NA, NA, 53.90, 64.56, 48.70, 51.88, NA, 5…
 ## $ dew_point <dbl> NA, NA, NA, 50.86, 62.45, 54.06, 43.89, NA, 5…
 ## $ humidity <dbl> NA, NA, NA, 0.73, 0.82, 0.75, 0.42, NA, 0.73,…
 ## $ cloud_cover <dbl> NA, NA, NA, 0.16, 0.86, 0.48, 0.00, NA, 0.22,…
 ## $ moon_phase <dbl> NA, NA, NA, 0.07, 0.32, 0.81, 0.02, NA, 0.10,…
 ## $ precip_intensity <dbl> NA, NA, NA, 0.0000, 0.0006, 0.0006, 0.0000, N…
 ## $ precip_probability <dbl> NA, NA, NA, 0.00, 0.21, 0.21, 0.00, NA, 0.30,…
 ## $ precip_type <chr> NA, NA, NA, NA, "rain", "rain", NA, NA, "rain…
 ## $ pressure <dbl> NA, NA, NA, 1017.29, 1022.74, 1020.75, 1011.9…
 ## $ summary <chr> NA, NA, NA, "Partly cloudy starting in the af…
 ## $ uv_index <dbl> NA, NA, NA, 6, 6, 4, 9, NA, 8, 6, 5, 6, 1, 2,…
 ## $ visibility <dbl> NA, NA, NA, 10.00, 4.97, 9.53, 9.76, NA, 9.47…
 ## $ wind_bearing <dbl> NA, NA, NA, 263, 156, 253, 197, NA, 234, 63, …
 ## $ wind_speed <dbl> NA, NA, NA, 8.15, 3.02, 8.73, 1.96, NA, 2.47,…

It’s always a good idea to take a look at the shape of the data so that you can understand what it looks like and what you can do with it. Above, we can see that the election results are broken down by county and state and show the percentage won by each major and minor candidate, as well as who the ultimate victor was in that county. There are quite a few candidates listed, but we can safely say that we’re only looking at Trump and Clinton here. Sorry, Gary ¯\_(ツ)_/¯

As for the Bigfoot data, god bless whoever was managing this dataset. Whereas the election results featured a mere nine variables, this Bigfoot data gives us a whopping 27 bits of information about each and every Bigfoot sighting. They even included the phase of the moon! They didn’t have to go THAT hard for this dataset, but they did that. They did that for us. Sadly, I’m only going to use location data this time around. Maybe I’ll have to revisit this dataset one day and use it to plan my own Bigfoot sighting.

3.0 Preparing the Data

# Preparing the data to be merged
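 elec_results <- elec_results %>%
   # Keep only the two major candidates and drop the state- and national-level rows
   filter(cand == "Donald Trump" | cand == "Hillary Clinton", !is.na(county)) %>%
   # Keep the highest-percentage row for each county name
   group_by(county) %>%
   arrange(county, desc(pct)) %>%
   filter(pct == max(pct)) %>%
   # Convert state abbreviations to full names so they match the Bigfoot data
   mutate(state = abbr2state(st)) %>%
   select(county, state, lead, pct)
 
 bigfoot <- bigfoot %>% select(date, county, state, latitude, longitude)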

Now comes the part of this comparison where I get the data ready to answer the questions that we want answered. To make things simpler, I have filtered the election data above down to just Trump and Clinton and kept only the variables showing who won and with what percentage.
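One thing worth double-checking in this step: county names are not unique across the country, so `group_by(county)` pools same-named counties from different states before the `max(pct)` filter runs. Here is a minimal sketch of the difference, using made-up numbers rather than the real election file (the `toy_results` tibble and its values are assumptions for illustration only).

```r
library(dplyr)

# Made-up results: two different states each have a "Washington County"
toy_results <- tibble(
  county = rep("Washington County", 4),
  st     = c("OR", "OR", "UT", "UT"),
  cand   = c("Donald Trump", "Hillary Clinton", "Donald Trump", "Hillary Clinton"),
  pct    = c(0.40, 0.55, 0.70, 0.20)
)

# Grouping by county name alone keeps only the single highest row overall
by_name <- toy_results %>%
  group_by(county) %>%
  filter(pct == max(pct)) %>%
  ungroup()

# Grouping by state and county keeps one winner per actual county
by_state_county <- toy_results %>%
  group_by(st, county) %>%
  filter(pct == max(pct)) %>%
  ungroup()

nrow(by_name)          # one row survives for the shared county name
nrow(by_state_county)  # one row per state/county pair
```

If the real file behaves like this toy example, grouping by both `st` and `county` would be the safer choice, at the cost of changing which counties make it into the merge.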

Below, I join the two datasets together by county and by state. Take a look at the final product.

# Merging the two datasets
 combined <- bigfoot %>% inner_join(elec_results, by = c("county", "state"))
 
 head(combined, 10)
 ## # A tibble: 10 x 7
 ##    date       county        state      latitude longitude lead          pct
 ##    <date>     <chr>         <chr>         <dbl>     <dbl> <chr>       <dbl>
 ##  1 NA         Yamhill Coun… Oregon         NA        NA   Donald Tru… 0.501
 ##  2 1973-09-28 Washita Coun… Oklahoma       35.3     -99.2 Donald Tru… 0.832
 ##  3 1970-09-01 Washoe County Nevada         39.6    -120.  Hillary Cl… 0.464
 ##  4 1979-07-04 Saunders Cou… Nebraska       41.2     -96.4 Donald Tru… 0.706
 ##  5 1988-03-15 Yancey County North Car…     35.7     -82.3 Donald Tru… 0.649
 ##  6 1988-12-15 Silver Bow C… Montana        46.1    -113.  Hillary Cl… 0.527
 ##  7 2006-01-05 Tishomingo C… Mississip…     34.6     -88.2 Donald Tru… 0.856
 ##  8 2013-02-16 Tishomingo C… Mississip…     34.7     -88.3 Donald Tru… 0.856
 ##  9 2007-08-15 Silver Bow C… Montana        46.0    -112.  Hillary Cl… 0.527
 ## 10 2011-08-21 Yancey County North Car…     35.8     -82.2 Donald Tru… 0.649
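Because `inner_join()` silently drops any sighting whose county and state have no match in the election data, it can be worth counting what got left behind. A minimal sketch using `anti_join()` on invented rows (the toy tibbles below are assumptions, not the real datasets):

```r
library(dplyr)

# Invented sightings: two in a county we have results for, one in a county we don't
toy_sightings <- tibble(
  county = c("Yancey County", "Yancey County", "Nowhere County"),
  state  = c("North Carolina", "North Carolina", "Oregon")
)

# Invented county winners
toy_winners <- tibble(
  county = "Yancey County",
  state  = "North Carolina",
  lead   = "Donald Trump"
)

# Rows that survive the merge, and rows that would be dropped by it
matched   <- inner_join(toy_sightings, toy_winners, by = c("county", "state"))
unmatched <- anti_join(toy_sightings, toy_winners, by = c("county", "state"))

nrow(matched)    # sightings that found a county winner
nrow(unmatched)  # sightings the inner join would discard
```

Running the same `anti_join()` on the real `bigfoot` and `elec_results` tables would show how many of the 4,586 reports never make it into `combined`.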

Finally, I am putting together one more dataset that I’ll use to map all of these sightings shortly.

# Prepping one more dataset that we’ll need for our map
 states <- states %>%
   # Capitalize the first letter of the region so it can be joined on state
   mutate(state = paste(toupper(substring(region, 1, 1)), substring(region, 2), sep = "")) %>%
   # Joining county-level results by state repeats each outline point once per matching county
   left_join(elec_results, by = c("state")) %>%
   select(long, lat, group, order, state, lead)
 
 head(states)
 ##        long      lat group order   state            lead
 ## 1 -87.46201 30.38968     1     1 Alabama    Donald Trump
 ## 2 -87.46201 30.38968     1     1 Alabama    Donald Trump
 ## 3 -87.46201 30.38968     1     1 Alabama    Donald Trump
 ## 4 -87.46201 30.38968     1     1 Alabama    Donald Trump
 ## 5 -87.46201 30.38968     1     1 Alabama Hillary Clinton
 ## 6 -87.46201 30.38968     1     1 Alabama    Donald Trump
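A caveat on the capitalization step: `toupper(substring(region, 1, 1))` only upper-cases the first letter of the whole string, so a multi-word region such as "new york" comes out as "New york" and will not match "New York" in the election data. Here is a small base R sketch of one alternative that upper-cases the first letter of every word (the helper name `cap_words` is hypothetical, not something from the packages loaded earlier).

```r
# Hypothetical helper: upper-case the first letter of each word
cap_words <- function(x) {
  gsub("\\b([a-z])", "\\U\\1", x, perl = TRUE)
}

regions <- c("alabama", "new york", "north carolina")

# The first-letter-only approach, for comparison
first_only <- paste(toupper(substring(regions, 1, 1)), substring(regions, 2), sep = "")

first_only          # "Alabama" "New york" "North carolina"
cap_words(regions)  # "Alabama" "New York" "North Carolina"
```

Note that this sketch also capitalizes small words like "of", so "district of columbia" would still need special handling.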

4.0 Answering the Big Question

Now that the data is set up to my liking, I can finally start using it to get some answers. The first question I’m going to ask is whether there is any difference in the number of sightings between Trump and Clinton counties. To do this, I’m going to use a two-sided t-test. The t-test asks whether the difference in average sightings between Trump counties and Clinton counties could plausibly be due to chance. It starts from the hypothesis “there is no difference between the groups” and then checks how surprising the observed data would be if that were true. By convention in the statistics community, if results like the ones observed (or more extreme) would show up 5% of the time or less under that hypothesis, we reject it and say that there is a “significant” difference.

Let’s see what the test says.

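# Counting sightings per county, keeping track of who won each county
 t_test <- combined %>% group_by(county, lead) %>% summarize(sightings = n())
 
# Comparing mean sightings per county between the two groups
 t.test(sightings ~ lead, data = t_test)
 ##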
 ## Welch Two Sample t-test
 ## 
 ## data: sightings by lead
 ## t = -2.3485, df = 158.23, p-value = 0.02008
 ## alternative hypothesis: true difference in means not equal to 0
 ## 95 percent confidence interval:
 ## -2.4356485 -0.2103888
 ## sample estimates:
 ## mean in group Donald Trump    mean in group Hillary Clinton 
 ## 3.059334                      4.382353

What the code did was count the number of sightings per county (only counties with at least one sighting are in the merged data) and then take the average number of sightings per county for those that voted for Trump and those that voted for Clinton. Those averages can be seen in the “sample estimates” part of the output. Clinton’s mean is larger, but again, this could be due to chance. If we look at the p-value, though, we see .02. This number estimates how likely we would be to see a difference at least this extreme if there were truly no difference between the groups. Since 2% is less than the 5% threshold that we saw earlier, we can reject the hypothesis that there is no difference between the number of sightings in Trump and Clinton counties.

This means that counties that voted for Donald Trump had, on average, about 1.3 fewer Bigfoot sightings than those that voted for Clinton (3.06 versus 4.38). So Bigfoot has a much larger presence in Democratic counties, huh? Seems like I might be wrong about a Republican Bigfoot…
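To make the t statistic a little less of a black box, here is a sketch that computes Welch’s statistic and degrees of freedom by hand and checks them against `t.test()`. The two vectors are invented for illustration and are not the Bigfoot data.

```r
# Made-up sightings per county for two groups
group_a <- c(1, 2, 2, 3, 4, 1, 2)
group_b <- c(3, 5, 4, 6, 2)

# Variance of each group's mean (sample variance divided by group size)
v_a <- var(group_a) / length(group_a)
v_b <- var(group_b) / length(group_b)

# Welch's t: difference in means over the unpooled standard error
t_by_hand <- (mean(group_a) - mean(group_b)) / sqrt(v_a + v_b)

# Welch-Satterthwaite approximation for the degrees of freedom
df_by_hand <- (v_a + v_b)^2 /
  (v_a^2 / (length(group_a) - 1) + v_b^2 / (length(group_b) - 1))

# Two-sided p-value from the t distribution
p_by_hand <- 2 * pt(-abs(t_by_hand), df_by_hand)

# t.test() runs the Welch version by default (var.equal = FALSE)
result <- t.test(group_a, group_b)

all.equal(unname(result$statistic), t_by_hand)
all.equal(unname(result$parameter), df_by_hand)
all.equal(result$p.value, p_by_hand)
```

The fractional degrees of freedom in the output above (158.23) come from this same approximation, which is why they are not a whole number.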

5.0 Visual Representations

Okay, so Clinton counties may have had a higher average number of sightings, but maybe that’s because she had a smaller number of counties with a large number of sightings. Maybe Trump counties’ average was lower because there are a lot more of them. Let’s make a graphic to compare the proportions of Bigfoot sightings between the two sets of counties.

# Plotting the proportion of counties with Bigfoot sightings (1869–2017)
 combined %>%
   distinct(county, .keep_all = TRUE) %>%
   select(lead) %>%
   group_by(lead) %>%
   summarize(n = n()) %>%
   mutate(prop = paste(round(n / sum(n), 4) * 100, "%", sep = "")) %>%
   ggplot(aes(x = lead, y = n, fill = lead, label = prop)) +
   geom_col() +
   geom_text(family = "Futura Medium", vjust = -0.25) +
   scale_fill_manual(breaks = c("Donald Trump", "Hillary Clinton"), values = c(auggie_pink, auggie_blue)) +
   labs(title = "2016 election results by counties with a bigfoot sighting",
        subtitle = "sightings from 1869–2017", x = "", y = "number of counties",
        caption = "an auggie heschmeyer visual") +
   theme_classic() +
   theme(text = element_text(family = "Futura Medium"), legend.position = "none",
         plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))

Whoa. It looks like my suspicion was right. There are a lot more Trump counties with Bigfoot sightings. What this chart tells me is that Bigfoots (Bigfeet?) have, historically, lived in counties that voted Republican in the last election. But how do I reconcile this with the results of the t-test, which showed Clinton counties with more sightings on average? Has Bigfoot been going on vacation to the same Democratic counties and been spotted there over and over again? Maybe a map showing the sightings will help me sort things out.

# Plotting all Bigfoot sightings on a map
 combined %>%
   # Keep points east of longitude -135; rows with missing coordinates are dropped too
   filter(longitude > -135) %>%
   ggplot(aes(x = longitude, y = latitude, color = lead)) +
   geom_polygon(data = states, aes(x = long, y = lat, group = group),
                fill = NA, color = "grey", show.legend = FALSE) +
   geom_point(alpha = 0.5) +
   scale_color_manual(breaks = c("Donald Trump", "Hillary Clinton"), values = c(auggie_pink, auggie_blue)) +
   coord_quickmap() +
   labs(title = "bigfoot sightings (1869–2017)",
        subtitle = "colored by sighting county's 2016 presidential candidate", color = "",
        caption = "an auggie heschmeyer visual") +
   theme_void() +
   theme(text = element_text(family = "Futura Medium"), legend.position = "bottom",
         plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))

This definitely clears things up. Trump won a lot of sparsely populated, rural counties throughout the US. “Sparsely populated and rural” sounds a lot like the kind of place Bigfoot may want to live. All of those pink dots represent Bigfoot sightings in Trump counties and most of them aren’t near any major metropolis. The blue dots, however, do seem to be centered around those metropolises. This supports my theory that perhaps, like the rest of us, Bigfoot enjoys visiting the city and, given how many people there are in the city, the chances of him/her being spotted by more people go way up.
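The bar chart and the t-test were never really in conflict: one counts how many counties have any sighting at all, while the other averages sightings within those counties. A toy summary with invented numbers (not the real data) puts both quantities side by side.

```r
library(dplyr)

# Invented per-county sighting counts: many quiet counties vs. a few busy ones
toy_counties <- tibble(
  lead      = c(rep("Donald Trump", 5), rep("Hillary Clinton", 2)),
  sightings = c(1, 2, 1, 3, 2, 6, 8)
)

toy_summary <- toy_counties %>%
  group_by(lead) %>%
  summarize(
    n_counties      = n(),             # what the bar chart compares
    total_sightings = sum(sightings),  # all sightings in those counties
    mean_sightings  = mean(sightings)  # what the t-test compares
  )

toy_summary
```

In this made-up example the first group has more counties but a lower mean, which is the same pattern the real results seem to show.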

6.0 In Conclusion

So, is Bigfoot a Republican? While I may have had a lot of fun playing around with this data, I don’t think I can definitively make that call. Given that he/she lives in rural, Trump-leaning counties, it seems like a safe assumption, though. Or maybe Bigfoot is a Democrat, but just can’t afford rent in the city. Perhaps we’ll see Bigfeet of the US unite and vote for Elizabeth Warren in 2020 as we can only assume that her wealth redistribution plan includes hairy, upright-walking, ape-like creatures.

Thank you for reading along. I hope to see you in the next case study.