Analyzing ESPN’s YouTube Data: How Stephen A. Smith’s LeBron Commentary Drives Massive Views
As an NBA fan (and particularly a LeBron fan), I spend endless hours watching games, then tune in to First Take every weekday at 10 a.m. ET to watch Stephen A. Smith’s commentary on basketball and LeBron James. My suspicion was that anytime LeBron is discussed, the conversation usually takes a negative slant, but I wanted to dig into the data to prove it.
I decided to scrape ESPN’s YouTube channel data to analyze their LeBron-related First Take titles and compare the views.
Objectives and Aims of the Analysis
The plan for the analysis is as follows:
- Scrape ESPN’s YouTube channel data
- Load the data into Jupyter
- Use ChatGPT for a sentiment analysis of the titles
- Compare the views of LeBron videos vs non-LeBron videos
Prerequisites
To complete this analysis you must have the following:
- An Apify account
- Access to Jupyter Notebook
You’ll also need the following Python packages:
- pandas
- textblob
- dateparser
- matplotlib
Step 1. Scrape YouTube Channel Data
First, the good news is you can start on Apify for free: they give you a $5.00 credit to start, and they have a pre-made YouTube channel scraper.
Once you’re signed up and you’ve opened this “actor,” as Apify calls it, you can fill in the information to pull ESPN data. I filled in the following inputs:
- Direct URLs: https://www.youtube.com/@espn
- Maximum Videos: 10,000 (Put 1000 if you want to make sure you don’t go over the free $5.00 credit)
- Videos from last (e.g. 2) days: 365
Then click start in the upper right corner and it will open a page that shows live results as the actor is running.
Then it will finish and show you a success message so you can export your results. The following screenshot shows 1 result as I re-did the export with a limit of 1 video as an example.
I chose the CSV option, and once exported, the file contained 36 columns of data, such as view count, thumbnail, title, duration, and date.
Step 2. Load Data into Jupyter
Start by importing the needed libraries, then pull the dataset into a DataFrame. I always like to take a look at the data after loading it, using df.head().
import pandas as pd
from textblob import TextBlob
import matplotlib.pyplot as plt
import dateparser
# Replace with the path to your CSV file
file_path = 'dataset_youtube-channel-scraper.csv'
# Read the CSV file into a DataFrame
df = pd.read_csv(file_path)
# Display the first few rows of the DataFrame
print(df.head())
Step 2.1 Prep the Data
Make a subset dataframe of videos with titles containing ‘First Take’.
# Set the option to display the rows fully
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)
# Make a subset dataframe for videos with a title containing 'First Take'
subset_df = df[df['title'].str.contains('First Take', case=False, na=False)]
# View those titles
subset_df['title']
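As a quick illustration of how this filter behaves, here is the same str.contains call on a toy frame (the titles below are invented, not real ESPN data):

```python
import pandas as pd

toy = pd.DataFrame({'title': [
    'Stephen A. reacts to the Lakers loss | First Take',
    'Top 10 plays of the night',
    'FIRST TAKE: Is LeBron still the best?',
]})

# case=False matches any capitalization; na=False treats missing titles as non-matches
matches = toy[toy['title'].str.contains('First Take', case=False, na=False)]
print(len(matches))  # -> 2
```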
Then make another subset of First Take videos containing LeBron.
lebron_df = subset_df[subset_df['title'].str.contains('LeBron', case=False, na=False)]
Here’s the last few First Take titles that include ‘LeBron’:
- Stephen A. ADDRESSES THE REALITY of LeBron’s role in the Lakers’ next head coach 👀 | First Take
- Evaluating Bronny James’ draft stock, LeBron James’ disservice by amplifying attention | First Take
- I CAN MAKE ONE SHOT! 🗣️ — Stephen A. adamant he could score 1 basket on LeBron | First Take
- ☝ WHAT’S GOING ON WITH LEBRON? 👆 Windy analyzes LeBron’s Cavs publicity stunt | First Take
- Perk: I WISH LEBRON WOULD RETIRE 🚨 + Stephen A. RANTS on LeBron’s role in Ham’s ousting | First Take
- Stephen A. HATES LeBron James DOWNPLAYING the REMATCH vs. the Denver Nuggets 😡 | First Take
Step 3. ChatGPT Sentiment Analysis
I exported that list of titles and then ran them through ChatGPT for a quick sentiment analysis.
lebron_df[['title']].to_csv('titles.csv', index=False, encoding='utf-8-sig')
Here’s a summary of ChatGPT’s findings:
Interestingly enough, I think several titles that were considered neutral or positive could easily have been marked as negative.
For example, take the title ‘Windy analyzes LeBron’s Cavs publicity stunt’: as humans, we understand that ‘publicity stunt’ carries a negative connotation, and ChatGPT says this as well, yet the title containing that term was marked ‘Neutral’.
Overall, ChatGPT judges the LeBron-related First Take video titles to be mainly neutral and, of the rest, still more positive than negative. But when you dig in, you could argue that ChatGPT’s sentiment analysis needs to be tuned for this specific case.
Step 4. Compare the views of LeBron videos vs non-LeBron videos
To do a fair comparison of views, we have to make sure the videos were uploaded to YouTube around the same time and are similar in duration, as both factors can influence a video’s views.
Step 4.1 Add a Category for the Duration of the Video
The duration column takes the format mm:ss but categorizing durations allows us to group similar videos together, making it easier to analyze trends and patterns.
Below we use a function to categorize each video based on the duration.
# Work on a copy to avoid pandas' SettingWithCopyWarning on the filtered frame
subset_df = subset_df.copy()
# Durations are 'mm:ss'; prefix '00:' so pandas parses them as hh:mm:ss
subset_df['duration'] = pd.to_timedelta('00:' + subset_df['duration'])
# Function to categorize duration
def categorize_duration(duration):
seconds = duration.total_seconds()
if seconds <= 30:
return '0 to 30 secs'
elif seconds <= 60:
return '30 secs to 1 min'
elif seconds <= 120:
return '1 min to 2 mins'
elif seconds <= 180:
return '2 mins to 3 mins'
elif seconds <= 240:
return '3 mins to 4 mins'
elif seconds <= 300:
return '4 mins to 5 mins'
elif seconds <= 600:
return '5 mins to 10 mins'
elif seconds <= 900:
return '10 mins to 15 mins'
else:
return 'greater than 15 mins'
# Apply the function to create a new column 'category'
subset_df['duration_category'] = subset_df['duration'].apply(categorize_duration)
# Display the result
print(subset_df['duration_category'])
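As a quick sanity check, the bucketing function can be exercised on a few synthetic durations (the sample values below are made up):

```python
import pandas as pd

def categorize_duration(duration):
    # Same bucketing logic as above, applied to synthetic data
    seconds = duration.total_seconds()
    if seconds <= 30:
        return '0 to 30 secs'
    elif seconds <= 60:
        return '30 secs to 1 min'
    elif seconds <= 120:
        return '1 min to 2 mins'
    elif seconds <= 180:
        return '2 mins to 3 mins'
    elif seconds <= 240:
        return '3 mins to 4 mins'
    elif seconds <= 300:
        return '4 mins to 5 mins'
    elif seconds <= 600:
        return '5 mins to 10 mins'
    elif seconds <= 900:
        return '10 mins to 15 mins'
    else:
        return 'greater than 15 mins'

samples = pd.to_timedelta(['00:00:25', '00:07:30', '00:12:00', '00:20:00'])
print([categorize_duration(d) for d in samples])
# -> ['0 to 30 secs', '5 mins to 10 mins', '10 mins to 15 mins', 'greater than 15 mins']
```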
Step 4.2 Add a Category for LeBron and Non-LeBron Titles
Now we do the same for the titles, categorizing each as a LeBron title or a non-LeBron title.
# Create the 'title_category' column based on the presence of 'LeBron' in the title
subset_df['title_category'] = subset_df['title'].str.contains('LeBron', case=False)
# Convert boolean values to strings 'LeBron' and 'Not LeBron'
subset_df['title_category'] = subset_df['title_category'].map({True: 'LeBron', False: 'Not LeBron'})
subset_df[['title','title_category']]
Now let’s look at the counts of videos by our title_category, duration_category and date columns.
# Group by 'title_category', 'duration_category', and 'date' and get counts
counts = subset_df.groupby(['title_category', 'duration_category', 'date']).size().reset_index(name='count')
print(counts)
By the looks of it, the only combinations with a large enough sample size to compare are the 10-to-15-minute videos from 3 months ago, the 10-to-15-minute videos from 5 months ago, and the 5-to-10-minute videos from 3 months ago. (Note that the scraped date column holds relative strings such as ‘3 months ago’ rather than exact dates.)
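To make that “enough sample size” judgment less of an eyeball call, one could keep only the combinations with a minimum number of videos. The cutoff of 3 below is an arbitrary assumption, and the counts frame is a made-up stand-in for the real groupby output:

```python
import pandas as pd

# Toy counts frame mirroring the groupby output (values are invented)
counts = pd.DataFrame({
    'title_category': ['LeBron', 'Not LeBron', 'LeBron', 'Not LeBron'],
    'duration_category': ['10 mins to 15 mins'] * 2 + ['5 mins to 10 mins'] * 2,
    'date': ['3 months ago'] * 4,
    'count': [5, 7, 2, 9],
})

MIN_VIDEOS = 3  # arbitrary cutoff
usable = counts[counts['count'] >= MIN_VIDEOS]
print(usable)
```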
Step 4.3 Graph the Average Views of the LeBron and Non-LeBron Videos
Let’s first look at 10 to 15 minute videos from 3 months ago.
# Filter the DataFrame based on the conditions
filtered_df = subset_df[(subset_df['duration_category'] == '10 mins to 15 mins') & (subset_df['date'] == '3 months ago')]
# Group by 'title_category' and calculate the average 'viewCount'
average_viewCount = filtered_df.groupby('title_category')['viewCount'].mean()
# Plotting the bar graph
average_viewCount.plot(kind='bar', color=['blue', 'green'])
# Adding labels and title
plt.xlabel('Title Category')
plt.ylabel('Average View Count')
plt.title('Average View Count by Title Category: 10 to 15 mins from 3 Months Ago')
# Show plot
plt.show()
The Non-LeBron videos have a slightly higher average view count.
Now we do the same for the 10 to 15 minute videos from 5 months ago.
# Filter the DataFrame based on the conditions
filtered_df = subset_df[(subset_df['duration_category'] == '10 mins to 15 mins') & (subset_df['date'] == '5 months ago')]
# Group by 'title_category' and calculate the average 'viewCount'
average_viewCount = filtered_df.groupby('title_category')['viewCount'].mean()
# Plotting the bar graph
average_viewCount.plot(kind='bar', color=['blue', 'green'])
# Adding labels and title
plt.xlabel('Title Category')
plt.ylabel('Average View Count')
plt.title('Average View Count by Title Category: 10 to 15 mins from 5 Months Ago')
# Show plot
plt.show()
The LeBron videos have a higher average view count.
Now we do the same for the 5 to 10 minute videos from 3 months ago.
# Filter the DataFrame based on the conditions
filtered_df = subset_df[(subset_df['duration_category'] == '5 mins to 10 mins') & (subset_df['date'] == '3 months ago')]
# Group by 'title_category' and calculate the average 'viewCount'
average_viewCount = filtered_df.groupby('title_category')['viewCount'].mean()
# Plotting the bar graph
average_viewCount.plot(kind='bar', color=['blue', 'green'])
# Adding labels and title
plt.xlabel('Title Category')
plt.ylabel('Average View Count')
plt.title('Average View Count by Title Category: 5 to 10 mins from 3 Months Ago')
# Show plot
plt.show()
Once again, the LeBron videos have a higher average view count.
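Rather than copy-pasting the filter-and-plot block three times, the comparisons above could be driven by a small helper. The toy frame below stands in for subset_df, with invented view counts:

```python
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this also runs headless
import matplotlib.pyplot as plt

def plot_average_views(df, duration_category, date):
    """Bar chart of mean viewCount for one duration/date combination."""
    mask = (df['duration_category'] == duration_category) & (df['date'] == date)
    averages = df.loc[mask].groupby('title_category')['viewCount'].mean()
    ax = averages.plot(kind='bar', color=['blue', 'green'])
    ax.set_xlabel('Title Category')
    ax.set_ylabel('Average View Count')
    ax.set_title(f'Average View Count: {duration_category} from {date}')
    return averages

# Toy frame standing in for subset_df (view counts are invented)
toy = pd.DataFrame({
    'title_category': ['LeBron', 'Not LeBron', 'LeBron', 'Not LeBron'],
    'duration_category': ['10 mins to 15 mins'] * 4,
    'date': ['3 months ago'] * 4,
    'viewCount': [120_000, 90_000, 80_000, 100_000],
})

print(plot_average_views(toy, '10 mins to 15 mins', '3 months ago'))
```

With the real data, one would loop over the three duration/date pairs identified in Step 4.2 and call the helper for each.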
Conclusion
Wrapping up the analysis of LeBron videos versus non-LeBron videos, it’s clear we’re onto something interesting. While ChatGPT marked most LeBron-titled videos as neutral (clearly, the model’s got some room for improvement), our comparison of average view counts tells a compelling story. Out of the three categories we looked at, two showed higher views for videos with ‘LeBron’ in the title. Now, does this mean LeBron’s name is a guaranteed ticket to views? Not necessarily. We’ve still got to dive deeper, run some stats, and widen our sample size to be sure. But hey, it’s a pretty intriguing start. As online content keeps evolving, figuring out what makes viewers tick is a never-ending adventure. Here’s to more digging, more insights, and maybe even a few surprises along the way!
Disclaimer: Please note that I may receive a commission for purchases made through these links. This comes at no additional cost to you and helps support my channel. Thank you for your support!