Streaming Wars with Sentiment Analysis using Roberta model: HBO Max

Shrunali Suresh Salian
11 min readJun 6, 2023

--

Nearly three years after HBO Max’s launch, AT&T announced plans to sell parent unit WarnerMedia to Discovery. The deal effectively pulls AT&T out of the content waters. AT&T paid $85 billion for Time Warner Inc. three years ago. Its deal with Discovery values the WarnerMedia assets at $43 billion worth of cash and debt.

HBO Max is a popular streaming platform that offers a vast array of content from the renowned HBO network, movies, TV shows, and exclusive original programming. Launched in May 2020, HBO Max quickly established itself as a prominent player in the streaming industry.

HBO Max saw the biggest rise in market share from 10% to 15% in 2021/22. It has 14% market share size in the US, a 1% less than Disney+. It had the biggest market share in Q1 2022. At 21%, it had a 4% lead over Disney+ and a 6% lead over Netflix’s app.

The platform includes a diverse range of exclusive original programming. HBO Max’s original content has received critical acclaim and has helped to establish the platform as a home for high-quality and compelling storytelling.

While the platform may not offer the same volume of content as some of its competitors, it prioritizes delivering a curated selection of top-notch programming like “Game of Thrones,” “The Sopranos,” “Westworld,” and “Succession.”

The article is based on the Kaggle dataset available for HBO Max . It’s important to note that the analysis and conclusions based on the dataset for HBO Max may not reflect the current trends and offerings of HBO Max. The dataset provides insights based on a specific time period and may not capture the most up-to-date information or changes in the streaming platform.

Are you a movie person or a tv show person?

For every 4 movies on HBO Max there’s one TV Show

This distribution showcases HBO Max’s commitment to offering a diverse range of content, catering to the preferences of both movie enthusiasts and TV series aficionados.

count_data = showtime['type'].value_counts()

# Create a horizontal bar chart
fig = go.Figure(data=go.Bar(
y=count_data.index,
x=count_data.values,
orientation='h',
marker=dict(color=['#7A6BE2', '#7BA7F2'])
))

# Set title and axis labels
fig.update_layout(
title='Content on HBO Max',
xaxis_title='Count',
yaxis_title='Type of Content', title_x = 0.5
)
fig.show()

HBO Max doing great on home ground

Top 4 countries of HBO Max audience

HBO Max’s market share is dominated by the United States, which holds a significant 83% share of the platform’s user base. This indicates a strong presence and popularity within its home country, where HBO Max has likely gained traction through its established brand recognition and content offerings.

Following the United States, the United Kingdom holds the second-largest market share for HBO Max, with 8.3%. In addition to the UK, France (4.46%) and Japan (4.15%) also contribute to HBO Max’s market share, albeit to a lesser extent.

top_countries = pd.DataFrame(showtime['production_country'].value_counts()[:10])
top_countries = top_countries.reset_index()
top_countries = top_countries[top_countries['index'] != 'No Data']
top_countries = top_countries[:4]
fig = px.pie(top_countries, values='content_produced', names='production_country',
title='Contribution of Content Produced by Top 4 Countries on HBO Max')

# Change the color palette
fig.update_traces(marker=dict(colors=['#000000', '#941DE8','#7A6BE2','#7BA7F2']))

# Set the text position and information to be displayed
fig.update_traces(textposition='inside', textinfo='percent+label')
# fig.update_traces(title_x = 0.5)

fig.show()

Does your country determine what you’d be watching tonight?

Looks like the Italians and the French love watching HBO Max movies

In Italy (IT) and France (FR), a significant majority of viewers, accounting for 97% of the audience, prefer watching movies on HBO Max. Similarly, in Japan (JP), Australia (AU), and Germany (DE), over 90% of viewers lean towards movies on HBO Max.

On the other hand, in Spain (ES), a notable proportion of the audience, representing 47%, gravitates towards shows on HBO Max. This indicates a higher demand for TV series and suggests that episodic storytelling and long-form narratives are favored by viewers in Spain.

country_order = showtime['production_country'].value_counts()[:11].index
data = showtime[['type', 'production_country']].groupby('production_country')['type'].value_counts().unstack().loc[country_order]
data['sum'] = data.sum(axis=1)
data_ratio = (data.T / data['sum']).T[['MOVIE', 'SHOW']].sort_values(by='MOVIE',ascending=False)[::-1]
data_ratio = data_ratio.reset_index()
data_ratio = data_ratio[data_ratio['index'] != 'No Data']
data_ratio.rename(columns = {'index':'country_code'}, inplace = True)
data_ratio['MOVIE'] = round(data_ratio['MOVIE'], 2)
data_ratio['SHOW'] = round(data_ratio['SHOW'],2)
fig = go.Figure()

fig.add_trace(go.Bar(
y=data_ratio.country_code,
x=data_ratio['MOVIE'],
name='MOVIE',
orientation='h',
marker=dict(color='#941DE8'),
text=(data_ratio['MOVIE'] * 100).astype(str) + '%', # Add text as percentages
textposition='inside', # Set text position inside the bars
textfont=dict(color='white') # Set text color
))

fig.add_trace(go.Bar(
y=data_ratio.country_code,
x=data_ratio['SHOW'],
name='SHOW',
orientation='h',
marker=dict(color='#7BA7F2'),
text=(data_ratio['SHOW'] * 100).astype(str) + '%', # Add text as percentages
textposition='inside', # Set text position inside the bars
textfont=dict(color='white') # Set text color
))

# Set the layout
fig.update_layout(
title='Content Distribution by Country on HBO Max',
barmode='stack',
yaxis_title='Top 10 Countries',
xaxis=dict(showticklabels=False), title_x = 0.5 # Hide the x-axis tick labels
)

fig.show()

Drama reigns supreme once more!

Drama movies hold the top position in terms of quantity, followed by comedy and documentaries.

This distribution of content genres reflects the platform’s emphasis on offering a diverse range of cinematic experiences to cater to the varied tastes and preferences of its audience.

genre_distribution = pd.DataFrame(showtime.groupby('primary_genre')['type'].value_counts())
genre_distribution = genre_distribution.unstack().reset_index().fillna(0).drop(0)
genre_distribution['SUM'] = genre_distribution.sum(axis = 1)
genre_distribution.columns = ['primary_genre', 'MOVIE', 'SHOW', 'total']
genre_distribution = genre_distribution.sort_values('total', ascending = False)
fig1 = go.Figure()
fig1.add_trace(go.Bar(
x=genre_distribution['primary_genre'],
y=genre_distribution['MOVIE'],
name='MOVIE',
marker=dict(color='#000000'),
))

fig1.add_trace(go.Bar(
x=genre_distribution['primary_genre'],
y=genre_distribution['SHOW'],
name='SHOW',
marker=dict(color='#7BA7F2'),
))

fig1.update_layout(
title='Content Distribution by Genre on HBO Max',
xaxis_title='Genre',
yaxis_title='Content on HBO Max',
barmode='stack'
)
fig1.show()

For mature audiences only!

R-rated movies have the highest representation in terms of quantity, followed by movies with a PG-13 rating

This distribution indicates a significant presence of mature content on the platform, catering to viewers who enjoy films with more explicit content, intense themes, and adult-oriented storytelling.

rating_distribution = pd.DataFrame(showtime.groupby('age_certification')['type'].value_counts())
# rating_distribution = genre_distribution.unstack().reset_index().fillna(0).drop(0)
rating_distribution = rating_distribution.unstack().reset_index().fillna(0)
rating_distribution['SUM'] = rating_distribution.sum(axis = 1)
rating_distribution.columns = ['age_certification','MOVIE','SHOW','Total']
rating_distribution = rating_distribution.sort_values('Total', ascending = False)
fig2 = go.Figure()

fig2.add_trace(go.Bar(
x=rating_distribution['age_certification'],
y=rating_distribution['MOVIE'],
name='MOVIE',
marker=dict(color='#000000'),

))

fig2.add_trace(go.Bar(
x=rating_distribution['age_certification'],
y=rating_distribution['SHOW'],
name='SHOW',
marker=dict(color='#7BA7F2'),

))

fig2.update_layout(
title='Content Distribution by Age Rating Certification',
xaxis_title='Genre',
yaxis_title='Content on HBO Max',
barmode='stack', legend_title = 'Type of Content', title_x = 0.5
)

fig2.show()

How old is the content on HBO Max?

More recent content is being uploaded on HBO Max

HBO Max has been consistently uploading more recent content to its platform, ensuring that subscribers have access to a fresh and up-to-date library of movies and shows.

history = pd.DataFrame(showtime.groupby('release_year')['type'].value_counts())
history = history.unstack().reset_index().fillna(0)
# history['total'] = history.sum(axis = 1)
history.columns = ['release_year','MOVIE','SHOW']
history = history[(history['release_year'] >= 2000) & (history['release_year'] <= 2021)]
fig3 = go.Figure()
fig3.add_trace(go.Scatter(
x=history['release_year'],
y=history['MOVIE'],
mode='lines',
name='MOVIE',
fill='tozeroy',
line=dict(color='#000000')
))

fig3.add_trace(go.Scatter(
x=history['release_year'],
y=history['SHOW'],
mode='lines',
name='SHOW',
fill='tozeroy',
line=dict(color='#7BA7F2')
))

# Set the layout
fig3.update_layout(
title='Content Trend on HBO Max over the Years',
xaxis_title='Release Year',
yaxis_title='Content on HBO Max', showlegend = True, title_x =0.5
)
fig3.show()

An adult in US, good chance you’d be streaming content on HBO Max

HBO Max has more of an adult target audience

HBO Max is known for its emphasis on catering to an adult target audience. HBO Max’s commitment to producing original programming with edgier and more adult-oriented content further solidifies its appeal to an adult target audience. The platform is known for pushing creative boundaries and exploring complex subject matter, offering a sophisticated viewing experience that caters to the discerning tastes of adult viewers.

total_count = demographic_data['count'].sum()
demographic_data['percentage'] = (demographic_data['count'] / total_count) * 100

fig = px.treemap(demographic_data, path=['production_country', 'target_ages'], values='percentage',
color='target_ages', color_discrete_sequence= ['#000000', '#941DE8','#7A6BE2','#7BA7F2'])

fig.update_layout(title= "HBO Max's Country-Level Target Audience",
margin=dict(l=20, r=20, t=40, b=20), title_x = 0.5) # Adjust the margins as needed

fig.show()

Comedy content on HBO Max, count me in!

HBO Max offers a diverse range of content tailored to the preferences of different regions.

In the United States, the platform’s library is rich in comedy content, making it a go-to destination for viewers seeking laughter and light-hearted entertainment. On the other hand, in the United Kingdom (UK), France (FR), and Japan (JP), drama content takes center stage on HBO Max.

You may watch more movies but you like shows better 😏

HBO Max shows are rated higher than movies

On HBO Max, shows tend to receive higher ratings compared to movies, indicating that the platform’s original and curated series are highly regarded by viewers. This suggests that HBO Max has been successful in producing and acquiring high-quality shows that resonate with audiences and receive critical acclaim.

fig = px.scatter(showtime, x='imdb_score', y='tmdb_score', color='type',
color_discrete_map={'MOVIE': '#000000', 'SHOW': '#7BA7F2'},
hover_data=['title'])

fig.update_layout(title='IMDb Score vs TMDB Score',
xaxis_title='IMDb Score',
yaxis_title='TMDB Score',
legend_title='Type', title_x =0.5)

fig.show()

Negative or Positive Content?

Using Roberta model by Hugging Face to categorize movies as Positive, Neutral and Negative based on the description

Hugging Face’s Roberta model helps in gauging the sentiment of the content based on the description provided. Roberta excels in understanding context and language patterns and is better at sarcastic sentences, while VADER focuses on sentiment quantification.

The chart provides a quality assessment for each of the movies using the Roberta model. The model helps us in understanding what type of content is produced by HBO Max.

53% of content on HBO Max is neutral, 26% is negative and 20% is positive

According to data, approximately 53% of the content available on HBO Max is categorized as neutral. This indicates that a significant portion of the content falls within a balanced range, neither strongly positive nor negative in sentiment.

Around 26% of the content is classified as negative, suggesting that HBO Max also features programming that explores darker or more challenging subject matter. This could include content with intense themes, conflict, or narratives that evoke a range of emotions. Such content provides viewers with thought-provoking and emotionally impactful experiences.

Approximately 20% of the content on HBO Max is categorized as positive. This indicates a significant presence of content that offers uplifting, inspirational, or feel-good elements.

Is there any relationship between the sentiment of the content with the IMDb & TMDb scores?

It appears that there is no discernible relationship between the sentiment of the content on HBO Max and the IMDb and TMDb scores. Regardless of whether the content is categorized as positive, negative, or neutral, the ratings assigned by viewers on these platforms remain relatively consistent.

This finding suggests that viewers’ opinions and evaluations of the content are not strongly influenced by its sentiment. Instead, other factors such as the overall quality, storytelling, production values, performances, and personal preferences seem to play a more significant role in determining the IMDb and TMDb scores.

It is worth noting that IMDb and TMDb scores are subjective and represent the collective ratings provided by users. While sentiment can contribute to viewers’ perceptions of the content, it is just one aspect among many that shape their opinions and evaluations.

The lack of a significant relationship between sentiment and ratings underscores the importance of providing diverse content on HBO Max that caters to various tastes and preferences. By offering a wide range of genres, themes, and storytelling approaches, HBO Max aims to appeal to a broad audience, ensuring that viewers can find content that aligns with their individual interests, regardless of its sentiment.

Majority of drama content aims to provide a balanced portrayal of emotions and storytelling, with a notable presence of more intense or challenging narratives.

Comedy: Comedy content on HBO Max is characterized by a higher proportion of neutral offerings, followed by positive, and then negative which suggests a light-hearted and entertaining experiences while maintaining a balanced tone overall.

Documentaries: Within the documentary genre, the content is primarily classified as neutral, followed by positive, and then negative. This indicates that HBO Max offers a significant number of informative and educational documentaries that provide viewers with a balanced perspective on various subjects.

Thriller: In the thriller genre, the content is predominantly categorized as negative, followed by neutral. This suggests that thrillers on HBO Max often delve into darker and more suspenseful themes, aiming to evoke a sense of tension and unease.

genre_sentiment = showtime.groupby(['primary_genre', 'sentiment']).size().reset_index(name='count')
genre_sentiment = genre_sentiment[genre_sentiment['primary_genre']!= 'No Data']
genre_sentiment = genre_sentiment.sort_values('count', ascending = False)
colors = ['#000000', '#941DE8','#7A6BE2','#7BA7F2']

fig = px.sunburst(genre_sentiment, path=['primary_genre', 'sentiment'], values='count',
color_discrete_sequence=colors)

fig.update_layout(title='Sentiments based Genre Strategy on HBO Max', title_x = 0.5)

fig.show()
It is important to note that the sentiment analysis is based on an interpretation of the content and may not capture all nuances.

R-rated movies cater to a diverse range of emotions and themes, providing a balanced portrayal of storytelling that can evoke different responses from viewers.

Within the PG-13 category, the sentiment distribution follows a similar pattern with a higher presence of neutral content, followed by negative and then positive. This indicates that PG-13 movies on HBO Max aim to strike a balance between engaging storytelling and age-appropriate content, encompassing a variety of themes and emotional experiences.

What are your favorite movies/shows on HBO Max?

Word cloud of titles of content on HBO Max

The presence of “Batman” and “Scooby Doo” as prominent words in the word cloud suggests that HBO Max offers content related to these well-known franchises, catering to fans of these iconic characters and their respective universes.

# Define a function to specify the text color
def hbo_color(word, font_size, position, orientation, random_state=None, **kwargs):
return "#7BA7F2"

custom_mask = np.array(Image.open('max-logo-1.jpg.webp'))
wc = WordCloud(
stopwords = stopwords,
mask = custom_mask,height = 2000, width = 4000, color_func = hbo_color)
#background_color = 'white',
wc.generate(title_corpus)

plt.figure(figsize=(16,8))
plt.imshow(wc, interpolation = 'bilinear')
plt.axis('off')
plt.show()

For how long would you be watching TV?

Runtime of content on HBO Max

It is evident that the distribution does not follow a smooth bell curve. Instead, it indicates the presence of a diverse range of content with varying durations. The data suggests that HBO Max offers a significant number of both short-format and long-format content. The highest number of content falls within the average runtime of 90 minutes.

colors = ['#000000', '#941DE8','#7A6BE2','#7BA7F2']

fig = px.histogram(showtime, x='runtime', color_discrete_sequence=colors)

# Customize the layout if needed
fig.update_layout(
title='Content Runtime Distribution on HBO Max',
xaxis_title='Runtime',
yaxis_title='Count of Content', title_x = 0.5, bargap=0.2
)

fig.show()

HBO Max has emerged as a formidable player in the competitive field of OTT streaming, thanks to its renowned and exclusive original programming. This early success has solidified their position as a strong contender in the industry. While their primary focus remains on the US market, HBO Max’s exceptional content sets a high standard, challenging both Netflix and Amazon to elevate their game.

Despite a significant portion of HBO Max’s content catering to mature audiences, leading to a slightly higher percentage of negative content, they have not overlooked the younger crowd. Shows like Batman and Scooby Doo have garnered high ratings, demonstrating their ability to captivate diverse age groups.

The project code is available on my Github: https://github.com/shrunalisalian/Streaming-Wars

HBO Max Dataset on Kaggle: https://www.kaggle.com/datasets/dgoenrique/hbo-max-movies-and-tv-shows

In case you enjoyed reading this article, feel free to check out articles on Amazon Prime Video, Netflix, Disney+ , Paramount+ and Apple TV+

Feel free to let me know if you have any suggestions. Thank You for reading!

--

--