University of North Carolina vs. Duke Football Attendance

An Analysis of One of the Biggest Rivalries in College Sports

Matt Dawidowicz
CodeX
4 min read · Aug 16, 2021

Photo by Drew Perales on Unsplash

Football (that is, American football) is a huge phenomenon across the American South, especially among college teams. There are many college teams throughout North Carolina, but the two most prominent teams are those of the University of North Carolina Tar Heels and the Duke Blue Devils.

For our analysis, we will use SafeGraph hybrid POI-Patterns data covering the state of North Carolina. SafeGraph is a data provider that offers point-of-interest and foot traffic data for thousands of businesses and categories. The data is free for anyone who signs up as an academic. The schema for the Patterns data can be found here.

We are focusing on North Carolina for two reasons:

  • It is a highly populated urban state in the United States.
  • As a state in the South, there is a large amount of data relating to college football attendance.

For the Colab notebook used in this analysis, click here and then go to the North Carolina section: Link

Setup

The process of importing packages and reading the data is covered in the notebook linked above. Refer to the comments and descriptions there for information on that topic.

Below is a sample of UNC’s attendance data from SafeGraph:

And Duke’s attendance data is similar.

Column descriptions:

  • placekey: The unique SafeGraph Placekey identifying the location where the stadium sits. Each row is a month.
  • safegraph_place_id: The unique key for the location within the Placekey.
  • location_name: The name of the location.
  • popularity_by_hour: The number of visits to the location in each hour of the day, counted over the month.
  • poi_cbg: The Census Block Group of the location.
  • visitor_home_cbgs: A dictionary mapping each home Census Block Group of the stadium's visitors to the number of visitors from it.
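To make the visitor_home_cbgs column concrete, here is a minimal sketch of how it could be unpacked and summed across monthly rows. It assumes the column arrives as a JSON string, as it commonly does in CSV exports. The helper parse_home_cbgs and the sample values are hypothetical and are not taken from the notebook.

```python
import json
from collections import Counter


def parse_home_cbgs(raw):
    """Turn a visitor_home_cbgs JSON string into a {cbg: visitors} dict."""
    if not raw:
        return {}
    return {str(cbg): int(count) for cbg, count in json.loads(raw).items()}


# Hypothetical values for two monthly rows of a single stadium.
monthly_rows = [
    '{"371350107051": 12, "370630020171": 5}',
    '{"371350107051": 4}',
]

# Sum visitors per home Census Block Group across all months.
totals = Counter()
for raw in monthly_rows:
    totals.update(parse_home_cbgs(raw))

print(dict(totals))
```

Because each row is one month, summing across rows gives a yearly count per home Census Block Group.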

Analysis

Unneeded columns are dropped from the Patterns data, and all rows not related to Kenan (University of North Carolina’s stadium) or Wallace Wade (Duke’s stadium) are filtered out.
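A minimal pandas sketch of this filtering step might look like the following. The table contents, the exact location_name strings, and the list of kept columns are assumptions for illustration; the notebook may match the stadiums differently (for example by placekey).

```python
import pandas as pd

# Hypothetical stand-in for the Patterns table; all values are made up.
patterns = pd.DataFrame({
    "placekey": ["aaa", "bbb", "ccc"],
    "location_name": [
        "Kenan Memorial Stadium",
        "Wallace Wade Stadium",
        "Some Coffee Shop",
    ],
    "popularity_by_hour": ["[0, 1]", "[2, 3]", "[4, 5]"],
    "poi_cbg": ["371350116011", "370630015031", "371830501001"],
    "visitor_home_cbgs": ['{"371350107051": 12}', '{"370630020171": 5}', "{}"],
})

# Assumed subset of columns needed for the county comparison.
KEEP_COLUMNS = ["placekey", "location_name", "poi_cbg", "visitor_home_cbgs"]

# Assumed name pattern for the two stadiums.
STADIUM_PATTERN = "Kenan|Wallace Wade"

# Keep only rows whose name matches either stadium, then drop extra columns.
is_stadium = patterns["location_name"].str.contains(
    STADIUM_PATTERN, case=False, na=False
)
stadiums = patterns.loc[is_stadium, KEEP_COLUMNS].reset_index(drop=True)

print(stadiums)
```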

The FIPS county codes are then extracted, and that column is merged with the list of all county codes to create a list of all the counties that a game attendee can call home, at least by SafeGraph's definition.
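The extraction works because a Census Block Group code is 12 digits long, and its first five digits are the state (2) and county (3) FIPS codes. The sketch below shows one way to do the extraction and merge in pandas. The visitor counts and the three-row county lookup are hypothetical, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical per-CBG visitor counts for one stadium.
visitors = pd.DataFrame({
    "home_cbg": ["371350107051", "370630020171", "370630006001"],
    "visitors": [16, 5, 7],
})

# Hypothetical county lookup: 5-digit FIPS code -> county name.
counties = pd.DataFrame({
    "county_fips": ["37135", "37063", "37183"],
    "county": ["Orange", "Durham", "Wake"],
})

# The first 5 characters of a 12-digit CBG are the state + county FIPS code.
visitors["county_fips"] = (
    visitors["home_cbg"].astype(str).str.zfill(12).str[:5]
)

# Total visitors per county.
by_county = visitors.groupby("county_fips", as_index=False)["visitors"].sum()

# Merge from the full county list so counties with no visitors show up as 0.
merged = counties.merge(by_county, on="county_fips", how="left")
merged["visitors"] = merged["visitors"].fillna(0).astype(int)

print(merged)
```

Merging from the county list (rather than from the visitor table) keeps counties with zero attendees in the result.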

From there, we compare the two attendance numbers by county and show which school has more attendance in each one. This shows directly which school draws more attendees from each county, rather than leaving the reader to compare the colors of two maps.
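As a rough sketch of the comparison, the two per-county tables can be joined and the difference taken. The counts below are made up for illustration and are not the article's results; the column names and the leader helper are hypothetical.

```python
import pandas as pd

# Hypothetical per-county visitor counts for each school.
unc = pd.DataFrame({"county": ["Orange", "Durham", "Wake"], "visitors": [16, 12, 9]})
duke = pd.DataFrame({"county": ["Durham", "Wake", "Person"], "visitors": [20, 3, 2]})

# Outer join so a county seen for only one school still appears.
compare = unc.merge(duke, on="county", how="outer", suffixes=("_unc", "_duke"))
count_cols = ["visitors_unc", "visitors_duke"]
compare[count_cols] = compare[count_cols].fillna(0).astype(int)

# Positive margin means more UNC attendees; negative means more Duke attendees.
compare["margin"] = compare["visitors_unc"] - compare["visitors_duke"]


def leader(margin):
    """Label which school leads a county for a given margin."""
    if margin > 0:
        return "UNC"
    if margin < 0:
        return "Duke"
    return "Tie"


compare["leader"] = compare["margin"].apply(leader)

print(compare)
```

A single margin column like this can be mapped with one color scale, which avoids the side-by-side comparison of two maps.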

In breadth across the state, UNC absolutely dominates Duke.

You would not expect regional patterns to be that distinct, because the two colleges are in the same county, at least partially: Chapel Hill, the home of UNC, straddles Orange and Durham counties, and Duke is entirely within Durham County.

The only counties where Duke has an attendance advantage are Durham County (its home county); the neighboring counties of Person, Granville, and Vance (but not Orange or Wake, which UNC dominates); and the small, distant counties of Hoke, Polk, and Cherokee.

In terms of absolute attendance across the state within the dataset, UNC has a massive lead: UNC had 12,262 visitors within the year, while Duke had only 3,147.

This general trend likely has to do with educational and economic cleavages. UNC is a public university for the entire state, while Duke is a private university that attracts students from the local area and from the world over. Anyone from outside North Carolina is not included in this dataset.

(Note: There were no attendees for either school in Greene, Northampton, Hertford, Gates, Chowan, Perquimans, Washington, Tyrrell, or Hyde counties.)

These conclusions make sense, as certain schools have wider but less dense attendance. Public, less prominent universities are likely to have broader, more diffuse support within their state, while private universities (or more prominent public universities) may draw more of their attendance from across state lines. However, this project could use patterns from far more schools, and it could eventually be expanded to include other schools, states, and sports.

Conclusion

For more information on the data being used, visit SafeGraph. Anyone is free to use the above data to expand on or verify these conclusions.

Questions?

I invite you to ask them in the #safegraphdata channel of the SafeGraph Community, a free Slack community for data enthusiasts. Receive support, share your work, or connect with others in the GIS community. Through the SafeGraph Community, academics have free access to data on over 7 million businesses in the USA, UK, and Canada.

Matt Dawidowicz
CodeX

I am an aspiring data scientist and comedian, who loves analyzing data and everything it can tell us.