Analyzing ESPN’s YouTube Data: How Stephen A. Smith’s LeBron Commentary Drives Massive Views

Jen Brown
8 min read · Jun 10, 2024

As an NBA fan (and a LeBron fan in particular), I spend countless hours watching games, then tune in to First Take every weekday at 10 a.m. ET to watch Stephen A. Smith’s commentary on basketball and LeBron James. My suspicion was that whenever LeBron is discussed, the coverage takes a negative slant, so I wanted to dig into the data to test it.

I decided to scrape ESPN’s YouTube channel data to analyze their LeBron-related First Take titles and compare the views.

Objectives and Aims of the Analysis

The plan for the analysis is as follows:

  • Scrape ESPN’s YouTube channel data
  • Load the data into Jupyter
  • Use ChatGPT for a sentiment analysis of the titles
  • Compare the views of LeBron videos vs non-LeBron videos

Prerequisites

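To complete this analysis you’ll need the tools used in the steps below:

  • An Apify account (the free credit is enough to start)
  • A Jupyter notebook environment
  • Access to ChatGPT

You’ll also need the following Python packages: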

  • pandas
  • textblob
  • dateparser
  • matplotlib

Step 1. Scrape YouTube Channel Data

The good news is that you can start on Apify for free: they give you a $5.00 credit to begin with, and they have a pre-made YouTube channel scraper.

Apify YouTube Scraper

Once you’re signed up and you’ve opened this “actor”, as Apify calls it, you can fill in the information to pull ESPN data. I filled in the following inputs:

  • Direct URLs: https://www.youtube.com/@espn
  • Maximum Videos: 10,000 (enter 1,000 if you want to make sure you don’t go over the free $5.00 credit)
  • Videos from last (e.g 2) days: 365

Fast Channel Youtube Scraper Inputs

Then click start in the upper right corner and it will open a page that shows live results as the actor is running.

Apify Actor Running Page

When the run finishes, it shows a success message so you can export your results. The screenshot below shows a single result because I reran the export with a limit of one video as an example.

Apify YouTube Scraper Successful Page

I chose the CSV option. The exported file contained 36 columns of data, such as view count, thumbnail, title, duration, and date.

Results from Apify Export

Step 2. Load Data into Jupyter

Start with importing the needed libraries and then pull the dataset into a dataframe. I always like to take a look at the data after pulling it into a dataframe using df.head().

import pandas as pd
from textblob import TextBlob
import matplotlib.pyplot as plt
import dateparser

# Replace with the path to your CSV file
file_path = 'dataset_youtube-channel-scraper.csv'

# Read the CSV file into a DataFrame
df = pd.read_csv(file_path)

# Display the first few rows of the DataFrame
print(df.head())
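
Before going further, it can help to confirm that the columns used later in this article (title, viewCount, duration and date) are present and that the view counts are numeric. The sketch below is only an illustration: it uses a tiny made-up DataFrame in place of the real export, and the "not available" value is an assumed example of a bad cell, not something observed in the ESPN data.

```python
import pandas as pd

# Toy stand-in for the exported CSV; the real file has 36 columns
df = pd.DataFrame({
    "title": ["Clip A | First Take", None, "Clip C"],
    "viewCount": ["1200", "not available", "350"],
    "duration": ["10:05", "4:30", "0:45"],
    "date": ["3 months ago", "5 months ago", "3 months ago"],
})

# Columns the rest of the analysis relies on
needed = ["title", "viewCount", "duration", "date"]
missing = [col for col in needed if col not in df.columns]
if missing:
    raise KeyError(f"Export is missing expected columns: {missing}")

# Coerce view counts to numbers; anything unparseable becomes NaN
df["viewCount"] = pd.to_numeric(df["viewCount"], errors="coerce")

# Count missing values per column
print(df[needed].isna().sum())
```

If the counts show many missing values in a column, it is worth looking at those rows before filtering or averaging.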

Step 2.1 Prep the Data

Make a subset dataframe of videos with titles containing ‘First Take’:

# Set the option to display the rows fully
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)

# Make a subset dataframe for videos with a title containing 'First Take'
# (.copy() avoids SettingWithCopyWarning when columns are added later)
subset_df = df[df['title'].str.contains('First Take', case=False, na=False)].copy()

# View those titles
subset_df['title']

Then make another subset of First Take videos containing LeBron.

# na=False treats missing titles as non-matches
lebron_df = subset_df[subset_df['title'].str.contains('LeBron', case=False, na=False)]

Here are the last few First Take titles that include ‘LeBron’:

  • Stephen A. ADDRESSES THE REALITY of LeBron’s role in the Lakers’ next head coach 👀 | First Take
  • Evaluating Bronny James’ draft stock, LeBron James’ disservice by amplifying attention | First Take
  • I CAN MAKE ONE SHOT! 🗣️ — Stephen A. adamant he could score 1 basket on LeBron | First Take
  • ☝ WHAT’S GOING ON WITH LEBRON? 👆 Windy analyzes LeBron’s Cavs publicity stunt | First Take
  • Perk: I WISH LEBRON WOULD RETIRE 🚨 + Stephen A. RANTS on LeBron’s role in Ham’s ousting | First Take
  • Stephen A. HATES LeBron James DOWNPLAYING the REMATCH vs. the Denver Nuggets 😡 | First Take

Step 3. ChatGPT Sentiment Analysis

I exported that list of titles and then ran them through ChatGPT for a quick sentiment analysis.

lebron_df[['title']].to_csv('titles.csv', index=False, encoding='utf-8-sig')

Here’s a summary of ChatGPT’s findings:

ChatGPT Sentiment Analysis Summary

Interestingly, I think several titles that were labeled neutral or positive could easily have been marked negative.

For example, take the title ‘Windy analyzes LeBron’s Cavs publicity stunt’. As humans, we understand that ‘publicity stunt’ has a negative connotation, and ChatGPT agrees when asked about the phrase directly, yet the title containing it was marked ‘Neutral’.

ChatGPT sentiment analysis of ‘publicity stunt’

Overall, ChatGPT rated the LeBron-related First Take titles as mainly neutral and, among the rest, more positive than negative. When you dig in, though, you could argue that ChatGPT’s sentiment analysis needs to be tuned for this specific case.

Step 4. Compare the views of LeBron videos vs non-LeBron videos

For a fair comparison of views, we need to make sure the videos were posted to YouTube at a similar time and are similar in duration, as both factors can influence how many views a video gets.
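
Upload time in this dataset comes from the date column, which holds relative labels such as ‘3 months ago’ (the values used in the filters later in this step). If you want to sort or compare those labels numerically, one option is to convert them to an approximate age in days. The helper below is a hypothetical sketch: the function name, the 30-day month and the 365-day year are simplifying assumptions.

```python
import re

# Approximate day counts per unit; months and years are rough averages
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}

def approx_age_in_days(label):
    """Convert a label like '3 months ago' to an approximate age in days."""
    match = re.fullmatch(r"\s*(\d+)\s+(day|week|month|year)s?\s+ago\s*", label.lower())
    if match is None:
        return None  # unrecognized format, e.g. '2 hours ago' or an absolute date
    amount, unit = match.groups()
    return int(amount) * DAYS_PER_UNIT[unit]

print(approx_age_in_days("3 months ago"))  # 90
print(approx_age_in_days("5 months ago"))  # 150
```

Applied to the whole column with `.apply(approx_age_in_days)`, this would give a numeric column that sorts chronologically.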

Step 4.1 Add a Category for the Duration of the Video

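The duration column uses the format mm:ss, so the code below prefixes ‘00:’ to turn each value into hh:mm:ss before converting it to a timedelta. Categorizing the durations then lets us group similar videos together, making it easier to analyze trends and patterns.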

Below we use a function to categorize each video based on the duration.

# Convert 'duration' column to timedelta type
subset_df['duration'] = pd.to_timedelta('00:' + subset_df['duration'])

# Function to categorize duration
def categorize_duration(duration):
    seconds = duration.total_seconds()
    if seconds <= 30:
        return '0 to 30 secs'
    elif seconds <= 60:
        return '30 secs to 1 min'
    elif seconds <= 120:
        return '1 min to 2 mins'
    elif seconds <= 180:
        return '2 mins to 3 mins'
    elif seconds <= 240:
        return '3 mins to 4 mins'
    elif seconds <= 300:
        return '4 mins to 5 mins'
    elif seconds <= 600:
        return '5 mins to 10 mins'
    elif seconds <= 900:
        return '10 mins to 15 mins'
    else:
        return 'greater than 15 mins'

# Apply the function to create a new column 'duration_category'
subset_df['duration_category'] = subset_df['duration'].apply(categorize_duration)

# Display the result
print(subset_df['duration_category'])

Duration Category Output
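
The same binning can also be written with pandas’ `pd.cut`, which keeps the thresholds and labels in two lists instead of a chain of conditions. The sketch below assumes the same cut-offs as the function above and runs on four made-up durations rather than the real data.

```python
import numpy as np
import pandas as pd

# Bin edges in seconds, matching the thresholds in categorize_duration
bins = [0, 30, 60, 120, 180, 240, 300, 600, 900, np.inf]
labels = [
    "0 to 30 secs", "30 secs to 1 min", "1 min to 2 mins",
    "2 mins to 3 mins", "3 mins to 4 mins", "4 mins to 5 mins",
    "5 mins to 10 mins", "10 mins to 15 mins", "greater than 15 mins",
]

# Made-up durations for illustration
durations = pd.to_timedelta(["00:00:25", "00:05:00", "00:12:40", "00:16:01"])
seconds = pd.Series(durations.total_seconds())

# right=True makes each bin include its upper edge, like the `<=` checks;
# include_lowest=True also places a 0-second value in the first bin
categories = pd.cut(seconds, bins=bins, labels=labels, right=True, include_lowest=True)
print(categories.tolist())
# ['0 to 30 secs', '4 mins to 5 mins', '10 mins to 15 mins', 'greater than 15 mins']
```

A side benefit is that `pd.cut` returns an ordered categorical, so the categories sort from shortest to longest rather than alphabetically.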

Step 4.2 Add a Category for LeBron Titles and Non-LeBron Titles

Now we do the same thing for the titles, labeling each one as either a LeBron title or a non-LeBron title.

# Create the 'title_category' column based on the presence of 'LeBron' in the title
# (na=False treats missing titles as non-matches)
subset_df['title_category'] = subset_df['title'].str.contains('LeBron', case=False, na=False)

# Convert boolean values to strings 'LeBron' and 'Not LeBron'
subset_df['title_category'] = subset_df['title_category'].map({True: 'LeBron', False: 'Not LeBron'})

subset_df[['title','title_category']]

Output of title_category

Now let’s look at the counts of videos by our title_category, duration_category and date columns.

# Group by 'title_category', 'duration_category', and 'date' and get counts
counts = subset_df.groupby(['title_category', 'duration_category', 'date']).size().reset_index(name='count')

print(counts)

By the looks of it, the sample size is only large enough to compare views in three groups: the 10 to 15 minute videos from 3 months ago, the 10 to 15 minute videos from 5 months ago, and the 5 to 10 minute videos from 3 months ago.

Counts of the videos by the groupings
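
Rather than reading the counts by eye, the comparable groups can be picked out by putting the two title categories side by side and keeping only the duration and date buckets where both have enough videos. This is a sketch on made-up rows, and the `min_videos` threshold is a hypothetical value, not the cut-off used in this article.

```python
import pandas as pd

# Toy rows standing in for subset_df; values are illustrative only
toy = pd.DataFrame({
    "title_category": ["LeBron", "LeBron", "Not LeBron", "Not LeBron", "Not LeBron", "LeBron"],
    "duration_category": ["10 mins to 15 mins"] * 5 + ["5 mins to 10 mins"],
    "date": ["3 months ago"] * 5 + ["5 months ago"],
})

# One row per (duration, date) bucket, one column per title category
wide = (
    toy.groupby(["duration_category", "date", "title_category"])
    .size()
    .unstack("title_category", fill_value=0)
)

min_videos = 2  # hypothetical threshold, pick what suits your data
# Keep buckets where every title category meets the threshold
comparable = wide[(wide >= min_videos).all(axis=1)]
print(comparable)
```

With the toy rows, only the 10 to 15 minute bucket from 3 months ago survives, because the other bucket has no non-LeBron videos.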

Step 4.3 Graph the Average Views of the LeBron and Non-LeBron Videos

Let’s first look at 10 to 15 minute videos from 3 months ago.

# Filter the DataFrame based on the conditions
filtered_df = subset_df[((subset_df['duration_category'] == '10 mins to 15 mins') & (subset_df['date'] == '3 months ago')) ]

# Group by 'title_category' and calculate the average 'viewCount'
average_viewCount = filtered_df.groupby('title_category')['viewCount'].mean()

# Plotting the bar graph
average_viewCount.plot(kind='bar', color=['blue', 'green'])

# Adding labels and title
plt.xlabel('Title Category')
plt.ylabel('Average View Count')
plt.title('Average View Count by Title Category: 10 to 15 mins from 3 Months Ago')

# Show plot
plt.show()

The Non-LeBron videos have a slightly higher average view count.

Now we do the same for the 10 to 15 minute videos from 5 months ago.

# Filter the DataFrame based on the conditions
filtered_df = subset_df[((subset_df['duration_category'] == '10 mins to 15 mins') & (subset_df['date'] == '5 months ago'))]

# Group by 'title_category' and calculate the average 'viewCount'
average_viewCount = filtered_df.groupby('title_category')['viewCount'].mean()

# Plotting the bar graph
average_viewCount.plot(kind='bar', color=['blue', 'green'])

# Adding labels and title
plt.xlabel('Title Category')
plt.ylabel('Average View Count')
plt.title('Average View Count by Title Category: 10 to 15 mins from 5 Months Ago')

# Show plot
plt.show()

The LeBron videos have a higher average view count.

Now we do the same for the 5 to 10 minute videos from 3 months ago.

# Filter the DataFrame based on the conditions
filtered_df = subset_df[((subset_df['duration_category'] == '5 mins to 10 mins') & (subset_df['date'] == '3 months ago'))]

# Group by 'title_category' and calculate the average 'viewCount'
average_viewCount = filtered_df.groupby('title_category')['viewCount'].mean()

# Plotting the bar graph
average_viewCount.plot(kind='bar', color=['blue', 'green'])

# Adding labels and title
plt.xlabel('Title Category')
plt.ylabel('Average View Count')
plt.title('Average View Count by Title Category: 5 to 10 mins from 3 Months Ago')

# Show plot
plt.show()

Once again, the LeBron videos have a higher average view count.
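
Since the three comparisons differ only in the duration category and the date label, the filter and group-by steps could be wrapped in one small function. The `average_views` helper below is a hypothetical refactor, shown on made-up view counts rather than the ESPN data.

```python
import pandas as pd

def average_views(df, duration_category, date_label):
    """Mean viewCount per title_category for one duration/date bucket."""
    mask = (df["duration_category"] == duration_category) & (df["date"] == date_label)
    return df.loc[mask].groupby("title_category")["viewCount"].mean()

# Toy data with made-up view counts, only to show the call
toy = pd.DataFrame({
    "title_category": ["LeBron", "LeBron", "Not LeBron", "Not LeBron"],
    "duration_category": ["10 mins to 15 mins"] * 4,
    "date": ["3 months ago"] * 4,
    "viewCount": [100, 300, 150, 250],
})

result = average_views(toy, "10 mins to 15 mins", "3 months ago")
print(result)
```

The returned Series can be plotted the same way as before, for example with `result.plot(kind='bar')` followed by `plt.show()`.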

Conclusion

Wrapping up the analysis of LeBron videos versus non-LeBron videos, it’s clear we’re onto something interesting. While ChatGPT marked most LeBron-titled videos as neutral (clearly, the model’s got some room for improvement), our comparison of average view counts tells a compelling story. Out of the three categories we looked at, two showed higher views for videos with ‘LeBron’ in the title. Now, does this mean LeBron’s name is a guaranteed ticket to views? Not necessarily. We’ve still got to dive deeper, run some stats, and widen our sample size to be sure. But hey, it’s a pretty intriguing start. As online content keeps evolving, figuring out what makes viewers tick is a never-ending adventure. Here’s to more digging, more insights, and maybe even a few surprises along the way!
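
As one possible starting point for the “run some stats” step, a permutation test can estimate how often a gap in average views at least as large as the observed one would appear if the LeBron label made no difference. This is a minimal sketch using only the standard library. The view counts are made up for illustration and the function name and defaults are assumptions.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in mean view counts."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Shuffle the pooled values and split them into two new groups
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction avoids reporting a p-value of exactly 0
    return (extreme + 1) / (n_resamples + 1)

# Made-up view counts for illustration only, not values from the ESPN export
lebron_views = [520_000, 610_000, 480_000, 700_000]
other_views = [450_000, 500_000, 530_000, 470_000, 490_000]

p = permutation_p_value(lebron_views, other_views)
print(f"p-value: {p:.3f}")
```

A small p-value would suggest the gap is unlikely to come from chance alone, though with only a handful of videos per group any result should be read cautiously.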

Disclaimer: Please note that I may receive a commission for purchases made through these links. This comes at no additional cost to you and helps support my channel. Thank you for your support!

Jen Brown

Data Analyst with an Interest in sports, entrepreneurship and much more!