Super Bowl 2021

The attendance at Super Bowl 2021 will be the smallest in the history of the game.

Kantida Nanon
Web Mining [IS688, Spring 2021]
6 min read · Feb 3, 2021


Image from James Laird

The 2021 Super Bowl is right around the corner. Super Bowl 55 will be played on Sunday, February 7, 2021, at Raymond James Stadium in Tampa, Florida. This has been a rough year, largely because of the coronavirus pandemic, and many people avoid crowded areas out of concern for their health and safety. The National Football League (NFL) therefore announced that only 22,000 seats, roughly 30 percent of capacity, will be filled at Raymond James Stadium this year. All fans will be given masks and hand sanitizer. The attendance at Super Bowl 2021 will be the smallest in the history of the game.

Throughout the pandemic season, attendance varied with local guidelines. According to Ken Belson, some cities admitted more fans than others: Dallas averaged 28,187 fans at its eight home games, followed by Jacksonville and Tampa Bay. However, 13 of the 32 teams did not allow fans at any games.

With fewer fans attending the Super Bowl this year, several questions come up. Will this change anything? What have football fans been talking about recently? Are they concerned, surprised, upset, or glad? What have they said about the upcoming Super Bowl? What is the most used keyword in all of r/superbowl? Was the keyword ‘tickets’ used more often than ‘home’, or vice versa?

Where do football fans talk/discuss/gossip?

Reddit is one of the biggest online communities, so it should have a significant number of football fans.

Data collection

Reddit, one of the biggest online communities, was selected as the data source: it has over 330 million users, called Redditors, and over 138,000 active communities, called subreddits. It is therefore likely to include many football fans who share their recent opinions and comments about the Super Bowl, which might give some insight into how they feel about fewer fans attending this year. I collected the 500 most recent posts (as of February 1, 2021) on the Super Bowl subreddit (r/superbowl) with the Python Reddit API Wrapper (PRAW).

Creating a Reddit application

To collect data from Reddit, we need to register for API access by creating a Reddit application at https://www.reddit.com/prefs/apps

  1. Click on “are you a developer? create an application”
  2. Select “web app”
  3. Fill out your application’s details, including name, description, URL (optional), and redirect URI.
  4. Click on “create app”
Creating a Reddit application

Once the app is created, Reddit shows the web app key and the secret key, which the Python script in the next step uses to read data from Reddit.

Collecting Reddit data with PRAW

  1. Download and install PRAW (the Python Reddit API Wrapper); see https://praw.readthedocs.io/en/latest
  2. Open a Python notebook and import PRAW.
  3. Copy the web app key and secret key that we got earlier and paste them into client_id and client_secret.
  4. For a given subreddit, we can now collect its newest posts.
Collecting Reddit data with PRAW

Since the goal is to study football fans’ recent opinions and posts about fewer fans attending the Super Bowl this year, I collected the 500 most recent post submissions on the Super Bowl subreddit (r/superbowl) using the script in the figure below.

Collecting the most recent posts on the “r/superbowl” subreddit
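
Because the script appears only as an image, here is a minimal sketch of what such a collection step could look like. It is not the exact script used for this study. It assumes PRAW is installed; the function names `make_reddit_client` and `collect_newest_posts`, the `user_agent` string, and the choice of fields kept for each post are illustrative assumptions.

```python
def make_reddit_client(client_id, client_secret,
                       user_agent="superbowl-study (hypothetical user agent)"):
    """Build a read-only PRAW client from the web app key and secret key."""
    import praw  # third-party package: pip install praw
    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )


def collect_newest_posts(reddit, subreddit_name="superbowl", limit=500):
    """Return the newest submissions of a subreddit as a list of dicts."""
    posts = []
    # .new() yields submissions from newest to oldest, up to `limit` items.
    for submission in reddit.subreddit(subreddit_name).new(limit=limit):
        posts.append({
            "id": submission.id,
            "title": submission.title,
            "selftext": submission.selftext,
            "created_utc": submission.created_utc,
        })
    return posts


# Example use (needs real credentials and network access):
# reddit = make_reddit_client("YOUR_WEB_APP_KEY", "YOUR_SECRET_KEY")
# posts = collect_newest_posts(reddit, "superbowl", limit=500)
# print(len(posts), "posts collected")
```

Keeping the client creation separate from the collection loop makes it easy to try the loop on another subreddit or with a different limit.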

Data manipulation

After collecting the 500 newest posts related to the Super Bowl, the next step is data cleaning. This step removes data that is not useful from the data set, then formats and prepares the rest for further analysis. The picture below shows some examples of bugs and data that I removed.

The newest posts on the “r/superbowl” subreddit
The data removed as not useful
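
A cleaning pass of this kind can be sketched roughly as follows. This illustrates the idea under assumptions and is not the cleaning code used in this study: the helper name `clean_titles` and its rules (skip missing or N/A values, strip everything except ASCII letters, digits, and spaces, then drop rows left empty) are hypothetical stand-ins for the removals shown in the pictures.

```python
import re

# Anything that is not an ASCII letter, digit, or whitespace (emojis, symbols).
_NON_TEXT = re.compile(r"[^A-Za-z0-9\s]+")
_MISSING = {"", "n/a", "nan", "none"}


def clean_titles(titles):
    """Return cleaned post titles, dropping rows with nothing usable left."""
    cleaned = []
    for title in titles:
        # Skip unknown or N/A values.
        if title is None or str(title).strip().lower() in _MISSING:
            continue
        # Replace symbols and emojis with spaces, then collapse whitespace.
        text = " ".join(_NON_TEXT.sub(" ", str(title)).split())
        if text:  # rows made up only of symbols or emojis end up empty
            cleaned.append(text)
    return cleaned
```

For example, `clean_titles(["Great horned owl!! 🦉", None, "N/A", "🦉🦉"])` keeps only `"Great horned owl"`.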

Data mining

I used content mining, particularly word tokenization, to find the common words people post and the most used word in all of r/superbowl. Below is the step-by-step process. I used Python in a Jupyter Notebook and imported the data as a text file.

  1. Import the pandas, collections, and matplotlib.pyplot libraries.
  2. Open and read the text file, then loop over the tokens to make each one lowercase.
  3. Remove stop words, punctuation, and other symbols by using the Python script below.
  4. Plot the 30 most common words and summarize how many times each one occurred.
Content mining with Python script
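
Since the script is shown only as an image, the steps above can be sketched as follows. This is a minimal illustration, not the original notebook. It uses only the standard library for counting; the small `STOP_WORDS` set and the function name `top_words` are assumptions (a real analysis would use a fuller stop-word list), and plotting is left as a commented hint.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the study).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "on",
              "for", "is", "this", "my", "i"}


def top_words(text, n=30, stop_words=STOP_WORDS):
    """Lowercase the text, tokenize it, drop stop words, count the rest."""
    # Keeping only runs of letters removes punctuation, digits, and symbols.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(token for token in tokens if token not in stop_words)
    return counts.most_common(n)


sample = "Snowy owl on the roof. Great horned owl! Tickets for the Super Bowl?"
print(top_words(sample, n=3))

# Plotting hint (requires matplotlib):
# import matplotlib.pyplot as plt
# words, counts = zip(*top_words(sample))
# plt.bar(words, counts)
# plt.xticks(rotation=90)
# plt.show()
```

In the sample sentence, ‘owl’ is counted twice and every other remaining token once, so ‘owl’ comes first in the result.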

Data analysis

The goal of this study is to find the most used word in all of r/superbowl, and whether ‘tickets’ was used more often than ‘home’ or vice versa. The table below shows that, across the 500 newest posts related to the Super Bowl on Reddit (February 1, 2021), people mentioned ‘owl’ 236 times, followed by ‘super’ (104 times) and ‘hangout’ (45 times). Other common tokens include ‘great’ (41 times), ‘owls’ (35 times), ‘snowy’ (33 times), ‘home’ and ‘barred’ (32 times each), ‘tickets’, ‘horned’, and ‘little’ (27 times each), ‘FL’ and ‘one’ (26 times each), ‘guy’ and ‘out’ (22 times each), and ‘superbowl’ (21 times). Further down the list are ‘northern’ and ‘screech’ (15 times each).

The table of the most common words from 500 posts on the super bowl subreddit

The graph below visualizes the 30 most common words/tokens from 500 posts in the super bowl subreddit (r/superbowl). The most common token was ‘owl’ (236 times), followed by ‘super’ (104 times) and ‘hangout’ (45 times). Other common tokens include ‘great’ (41 times), ‘owls’ (35 times), ‘snowy’ (33 times), ‘barred’ (32 times), ‘home’ (32 times), ‘horned’ (27 times), ‘tickets’ (27 times), ‘FL’ (26 times), ‘superbowl’ (21 times), ‘northern’ (15 times), and ‘screech’ (15 times).

The bar graph of the 30 most common words from 500 posts in the super bowl subreddit

Some interesting words in this data set are ‘hangout’, ‘home’, and ‘tickets’, which could hint at how football fans plan to watch the 55th Super Bowl. The graph below compares these words.

The comparison of the common keywords in r/superbowl subreddit

Although the frequencies of ‘home’ (32 times) and ‘tickets’ (27 times) are similar, as the graph below shows, it is safe to say that Redditors mentioned ‘home’ more often than ‘tickets’. This could be because most people plan to watch the Super Bowl from home this year, where they can be around close family and friends. ‘Tickets’ could refer to a party to watch the Super Bowl with friends or to other game day events.

The chart of Redditors’ common words distribution
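
The shares behind a distribution chart like this are simple arithmetic on the reported counts. The sketch below is illustrative: it assumes the chart covers only the three keywords ‘hangout’, ‘home’, and ‘tickets’, and the function name `keyword_shares` is hypothetical.

```python
def keyword_shares(counts):
    """Convert raw keyword counts into percentage shares of their total."""
    total = sum(counts.values())
    return {word: round(100 * n / total, 1) for word, n in counts.items()}


# Counts reported in the data analysis section.
counts = {"hangout": 45, "home": 32, "tickets": 27}
shares = keyword_shares(counts)
print(shares)  # {'hangout': 43.3, 'home': 30.8, 'tickets': 26.0}
```

By the same counts, ‘home’ appears 32/27 ≈ 1.19 times as often as ‘tickets’, a real but small gap for a sample of this size.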

Discussion and Limitation

This study has limitations, chiefly the amount of data. It collected 500 posts from the r/superbowl subreddit, and the Python Reddit API Wrapper (PRAW) limits a collection to 1,000 rows of data. In addition, during data cleaning I lost many rows because they had unknown or N/A values or contained symbols or emojis. The result is a very small sample, and the outcomes reflect only what people posted around the day of the experiment.

References

[1] https://www.nytimes.com/2021/02/02/sports/football/super-bowl-2021-covid-coronavirus.html

[2] https://sports.nbcsports.com/2021/02/03/when-is-super-bowl-2021/
