Streaming Wars with Sentiment Analysis using the RoBERTa Model: Netflix

Shrunali Suresh Salian
Jun 6, 2023

For an entire generation, the phrase “Netflix and Chill” became more than just a catchy slogan; it became a cultural phenomenon. The rise of Netflix as the go-to platform for entertainment marked a pivotal shift in the way we consume movies and TV shows.

But what made Netflix the biggest online streaming platform in the world? Let’s try to understand how Netflix went from a DVD rental service in California to a global online streaming service. Was it just the recommendation system? Or did Netflix strategically carve out a niche for its target audience?

This article is based on an analysis of a Netflix dataset from Kaggle. The dataset covers a specific time period, so the analysis and conclusions may not reflect Netflix’s current catalog, trends, or the latest changes to the platform.

Hope you enjoy reading through the article :)

What do you like watching on Netflix? Movies or TV Shows?

Netflix maintains a decent movies-to-TV-shows ratio

Compared to Amazon Prime Video, which has 6 movies for every TV show on its platform, Netflix appears to strike a better balance between long- and short-format content.

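import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# `showtime` is the Netflix titles DataFrame built from the Kaggle dataset (its loading code is not shown here)
count_data = showtime['type'].value_counts()

# Create a horizontal bar chart
fig = go.Figure(data=go.Bar(
    y=count_data.index,
    x=count_data.values,
    orientation='h',
    marker=dict(color=['#b20710', '#f5f5f1'])
))

# Set title and axis labels
fig.update_layout(
    title='Content on Netflix',
    xaxis_title='Count',
    yaxis_title='Type of Content',
    title_x=0.5
)

# Show the plot
fig.show()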
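The comparison with Prime Video above is phrased as a ratio (6 movies per show), while the chart shows raw counts. A minimal sketch that turns the `type` column into the same movies-per-show measure; the helper name `movies_per_show` is hypothetical and the toy input is illustrative only:

```python
import pandas as pd


def movies_per_show(types: pd.Series) -> float:
    """Return the number of MOVIE titles for every SHOW title."""
    counts = types.value_counts()
    shows = counts.get('SHOW', 0)
    if shows == 0:
        return float('inf')  # no shows at all: the ratio is unbounded
    return float(counts.get('MOVIE', 0) / shows)


# Toy input for illustration: 6 movies and 2 shows
print(movies_per_show(pd.Series(['MOVIE'] * 6 + ['SHOW'] * 2)))  # 3.0
```

On the real data this would be called as `movies_per_show(showtime['type'])`, giving a number directly comparable to the 6:1 figure quoted for Prime Video.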

Where is Netflix driving most of its revenue from?

Netflix now ranks second after Amazon Prime Video in Q1 2023 market share in the United States

Although Netflix may appear to be losing American subscribers, a closer look at the data shows how effectively it targets diverse audiences worldwide. While Amazon Prime Video remains focused on the American, European, and, more recently, Indian markets, Netflix has set its sights on the global stage, making waves with immensely popular K-drama series that have garnered international attention. Both dominant players are now in a tough fight for the Indian audience, and it’s only a matter of time before we see who leads the race in the world’s largest democracy.

# Count titles per production country, skip the 'No Data' placeholder, keep the top 5.
# rename_axis + reset_index(name=...) gives the same column names on pandas 1.x and 2.x.
top_countries = (showtime['production_country'].value_counts()
                 .rename_axis('production_country')
                 .reset_index(name='content_produced'))
top_countries = top_countries[top_countries['production_country'] != 'No Data'].head(5)
fig = px.pie(top_countries, values='content_produced', names='production_country',
             title='Contribution of Content Produced by Top 5 Countries on Netflix')

# Change the color palette
fig.update_traces(marker=dict(colors=['#221f1f', '#b20710', '#e50914', '#f5f5f1', '#ffffff']))

# Set the text position and information to be displayed
fig.update_traces(textposition='inside', textinfo='percent+label')
# Center the title (title_x is a layout property, not a trace property)
fig.update_layout(title_x=0.5)
fig.show()
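Because a pie chart normalizes the values it receives, the percentages in this chart are shares of the top-five total, not of Netflix’s whole catalog. A small sketch comparing the two, assuming (as in the code above) that 'No Data' marks titles with an unknown country; the helper `country_shares` is hypothetical:

```python
import pandas as pd


def country_shares(countries: pd.Series, top_n: int = 5) -> pd.DataFrame:
    """Each top country's share of the top-N slice vs. its share of all titles."""
    known = countries[countries != 'No Data']
    top = known.value_counts().head(top_n)
    return pd.DataFrame({
        'share_of_top_n': top / top.sum(),         # what the pie chart displays
        'share_of_catalog': top / len(countries),  # includes 'No Data' titles in the denominator
    })


# Toy input for illustration only
toy = pd.Series(['US'] * 4 + ['IN'] * 2 + ['KR', 'GB'] + ['No Data'] * 2)
print(country_shares(toy, top_n=2))
```

Applied to `showtime['production_country']`, the second column shows how much of the full library the five slices actually represent.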

Movies or TV shows? It depends on where you’re watching from

A larger audience comes on board to watch TV shows on Netflix than on Amazon Prime Video, which streams a greater number of movies

While countries like Canada, Brazil, the United States, France, Mexico, and Spain show a relatively balanced interest in both movies and shows, the audience landscape in Asia shows more extreme tendencies. In India, short-format content (movies) holds significant appeal. Korean and Japanese audiences, on the other hand, show a particularly strong affinity for series, well above the global average.

country_order = showtime['production_country'].value_counts()[:11].index
# Movie/show counts per country, restricted to the 11 most common countries
data = (showtime.groupby('production_country')['type']
        .value_counts()
        .unstack(fill_value=0)
        .loc[country_order])
data['sum'] = data.sum(axis=1)
# Share of movies and shows within each country (each row sums to 1)
data_ratio = data[['MOVIE', 'SHOW']].div(data['sum'], axis=0).sort_values(by='MOVIE')
# Name the index explicitly so the column is 'country_code' on pandas 1.x and 2.x
data_ratio = data_ratio.rename_axis('country_code').reset_index()
data_ratio = data_ratio[data_ratio['country_code'] != 'No Data']
data_ratio[['MOVIE', 'SHOW']] = data_ratio[['MOVIE', 'SHOW']].round(2)
fig = go.Figure()
# Netflix palette: ['#221f1f', '#b20710', '#e50914', '#f5f5f1', '#ffffff']
# Add horizontal bar traces for MOVIE and SHOW
fig.add_trace(go.Bar(
    y=data_ratio['country_code'],
    x=data_ratio['MOVIE'],
    name='MOVIE',
    orientation='h',
    marker=dict(color='#221f1f'),
    # Whole percentages; multiplying floats by 100 can otherwise print digits like 56.99999999999999
    text=(data_ratio['MOVIE'] * 100).round().astype(int).astype(str) + '%',
    textposition='inside',  # Set text position inside the bars
    textfont=dict(color='white')  # Set text color
))

fig.add_trace(go.Bar(
    y=data_ratio['country_code'],
    x=data_ratio['SHOW'],
    name='SHOW',
    orientation='h',
    marker=dict(color='#b20710'),
    text=(data_ratio['SHOW'] * 100).round().astype(int).astype(str) + '%',
    textposition='inside',  # Set text position inside the bars
    textfont=dict(color='white')  # Set text color
))

# Set the layout
fig.update_layout(
    title='NETFLIX Content Distribution by Country',
    barmode='stack',
    yaxis_title='Top 10 Countries',
    xaxis=dict(showticklabels=False),  # Hide the x-axis tick labels
    title_x=0.5
)

fig.show()
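The same per-country movie/show shares can be computed in a single call with `pd.crosstab(..., normalize='index')`, which is a convenient way to check the numbers behind the stacked bars. A sketch under the same assumptions as the chart code ('No Data' marks an unknown country); the helper name is hypothetical and the toy rows are illustrative:

```python
import pandas as pd


def type_share_by_country(df: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Share of MOVIE vs. SHOW titles for the top_n countries with known origin."""
    known = df[df['production_country'] != 'No Data']
    top = known['production_country'].value_counts().head(top_n).index
    subset = known[known['production_country'].isin(top)]
    # normalize='index' divides each row by its total, so each country's row sums to 1
    shares = pd.crosstab(subset['production_country'], subset['type'], normalize='index')
    return shares.round(2)


toy = pd.DataFrame({
    'production_country': ['US', 'US', 'US', 'KR', 'KR', 'KR', 'KR', 'No Data'],
    'type': ['MOVIE', 'MOVIE', 'SHOW', 'SHOW', 'SHOW', 'MOVIE', 'SHOW', 'MOVIE'],
})
print(type_share_by_country(toy))
```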

Comedy or Drama?

Drama dominates again

Netflix sets itself apart from other streaming platforms with an intriguing balance between drama and comedy movies, offering viewers a diverse range of genres to choose from. For horror, however, Amazon Prime Video appears to have a more extensive selection than Netflix.

# Movie/show counts per primary genre, without the 'No Data' placeholder row
genre_distribution = (showtime.groupby('primary_genre')['type']
                      .value_counts()
                      .unstack(fill_value=0)
                      .drop(index='No Data', errors='ignore')
                      .reset_index())
# Sum only the numeric columns (summing across the genre names fails on pandas 2.x)
genre_distribution['total'] = genre_distribution[['MOVIE', 'SHOW']].sum(axis=1)
genre_distribution = genre_distribution.sort_values('total', ascending=False)
fig1 = go.Figure()
fig1.add_trace(go.Bar(
    x=genre_distribution['primary_genre'],
    y=genre_distribution['MOVIE'],
    name='MOVIE',
    marker=dict(color='#221f1f'),
))

fig1.add_trace(go.Bar(
    x=genre_distribution['primary_genre'],
    y=genre_distribution['SHOW'],
    name='SHOW',
    marker=dict(color='#b20710'),
))

fig1.update_layout(
    title='Netflix Content Distribution by Genre',
    xaxis_title='Genre',
    yaxis_title='Content on Netflix',
    barmode='stack'
)

fig1.show()

Mature Audience?

Netflix has more shows for mature audiences

There is a distinct difference between Amazon and Netflix when it comes to mature audiences. While Amazon hosts more R-rated movies, Netflix has strategically focused on building a robust collection of TV shows for mature audiences. This emphasis on TV shows lets Netflix offer a diverse range of content for viewers seeking more mature and nuanced storytelling.

# Movie/show counts per age certification, without the 'No Data' placeholder row
rating_distribution = (showtime.groupby('age_certification')['type']
                       .value_counts()
                       .unstack(fill_value=0)
                       .drop(index='No Data', errors='ignore')
                       .reset_index())
rating_distribution['Total'] = rating_distribution[['MOVIE', 'SHOW']].sum(axis=1)
rating_distribution = rating_distribution.sort_values('Total', ascending=False)
fig2 = go.Figure()

fig2.add_trace(go.Bar(
    x=rating_distribution['age_certification'],
    y=rating_distribution['MOVIE'],
    name='MOVIE',
    marker=dict(color='#221f1f'),
))

fig2.add_trace(go.Bar(
    x=rating_distribution['age_certification'],
    y=rating_distribution['SHOW'],
    name='SHOW',
    marker=dict(color='#b20710'),
))

fig2.update_layout(
    title='Content Distribution by Age Rating Certification on Netflix',
    xaxis_title='Age Certification',
    yaxis_title='Content on Netflix',
    barmode='stack', legend_title='Type of Content', title_x=0.5
)

fig2.show()

Content evolution on Netflix…

Why did all hell break loose in 2015?

2015 seems to be the turning point for most online streaming platforms. The number of movies and shows per release year rises sharply from 2015 onward compared to the years before.

history = pd.DataFrame(showtime.groupby('release_year')['type'].value_counts())
history = history.unstack().reset_index().fillna(0)
history.columns = ['release_year','MOVIE','SHOW']
history = history[(history['release_year'] >= 2000) & (history['release_year'] <= 2021)]
fig3 = go.Figure()
fig3.add_trace(go.Scatter(
x=history['release_year'],
y=history['MOVIE'],
mode='lines',
name='MOVIE',
fill='tozeroy',
line=dict(color='#221f1f')
))

fig3.add_trace(go.Scatter(
x=history['release_year'],
y=history['SHOW'],
mode='lines',
name='SHOW',
fill='tozeroy',
line=dict(color='#b20710')
))

# Set the layout
fig3.update_layout(
title='Content Trend on Netflix over the Years',
xaxis_title='Release Year',
yaxis_title='Content on Netflix', showlegend = False, title_x =0.5
)
fig3.show()
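To put a number on the 2015 jump instead of reading it off the area chart, one option is to compare the average number of titles per release year before and after 2015. A minimal sketch; the helper name is hypothetical, and the 2000 to 2021 window mirrors the filter in the chart code:

```python
import pandas as pd


def releases_before_after(years: pd.Series, pivot: int = 2015,
                          start: int = 2000, end: int = 2021) -> pd.Series:
    """Average titles released per year before `pivot` and from `pivot` through `end`."""
    in_range = years[(years >= start) & (years <= end)]
    # Titles per release year; years with no releases count as 0
    per_year = in_range.value_counts().reindex(range(start, end + 1), fill_value=0)
    return pd.Series({
        f'{start}-{pivot - 1}': per_year.loc[:pivot - 1].mean(),
        f'{pivot}-{end}': per_year.loc[pivot:].mean(),
    })


# Toy release years for illustration only
toy = pd.Series([2013, 2014, 2015, 2015, 2016, 2016, 2016])
print(releases_before_after(toy, pivot=2015, start=2013, end=2016))
```

On the dataset this would be `releases_before_after(showtime['release_year'])`.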

Who is Netflix’s target audience in your country?

Netflix’s country level target audience

Netflix’s content targeting strategy varies across regions. In countries such as the United States, the United Kingdom, South Korea, France, and Spain, Netflix focuses on adult audiences. This indicates that the platform tailors its content selection and original productions to the preferences and interests of mature viewers in these regions.

However, when it comes to Indian and Japanese content, Netflix shifts its focus towards targeting teens. This approach reflects an understanding of the demographic trends and cultural nuances in these specific markets. Recognizing the popularity of teen-oriented content in India and Japan, Netflix curates a selection of shows and movies that resonate with the younger audience, offering them compelling and engaging entertainment options.
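The target-audience chart depends on a `ratings_ages` dictionary that the post does not show. A minimal sketch of what such a mapping might look like follows; the four buckets, and which certification falls into each, are assumptions rather than the author’s definitions, and `target_audience_counts` is a hypothetical helper:

```python
import pandas as pd

# Hypothetical grouping of US age certifications into audience buckets
ratings_ages = {
    'TV-Y': 'Kids', 'TV-Y7': 'Kids', 'TV-G': 'Kids', 'G': 'Kids',
    'TV-PG': 'Older Kids', 'PG': 'Older Kids',
    'TV-14': 'Teens', 'PG-13': 'Teens',
    'TV-MA': 'Adults', 'R': 'Adults', 'NC-17': 'Adults',
}


def target_audience_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Count titles per (production_country, target_ages), dropping unmapped ratings."""
    out = df.assign(target_ages=df['age_certification'].map(ratings_ages))
    out = out.dropna(subset=['target_ages'])  # e.g. 'No Data' has no bucket
    return (out.groupby(['production_country', 'target_ages'])
               .size()
               .reset_index(name='count')
               .sort_values('count', ascending=False, ignore_index=True))


# Toy rows for illustration only
toy = pd.DataFrame({
    'production_country': ['US', 'US', 'IN', 'IN', 'JP'],
    'age_certification': ['TV-MA', 'R', 'TV-14', 'No Data', 'PG-13'],
})
print(target_audience_counts(toy))
```

Where a country lands in the treemap depends entirely on these bucket boundaries, so moving TV-14 from Teens to Adults, for example, would shift the picture for markets that lean on that rating.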

import plotly.express as px

# Not shown in this post: `demographic_data` (apparently a copy of `showtime`) and
# `ratings_ages`, a dict mapping each age certification to a target audience group
demographic_data['target_ages'] = demographic_data['age_certification'].map(ratings_ages)
demographic_data = demographic_data.dropna()
demographic_data = (demographic_data.groupby(['production_country', 'target_ages'])
                    .size()
                    .reset_index(name='count')
                    .sort_values('count', ascending=False)[:20])

# Percentages are relative to the 20 largest (country, audience) groups, not the whole catalog
total_count = demographic_data['count'].sum()
demographic_data['percentage'] = (demographic_data['count'] / total_count) * 100

fig = px.treemap(demographic_data, path=['production_country', 'target_ages'], values='percentage',
                 color='target_ages', color_discrete_sequence=['#221f1f', '#b20710', '#e50914', '#f5f5f1'])

fig.update_layout(title="Netflix's Country-Level Target Audience",
                  margin=dict(l=20, r=20, t=40, b=20), title_x=0.5)  # Adjust the margins as needed

fig.show()

What are you watching on Netflix?

Genre specific country level Netflix audiences

Netflix adopts a nuanced content strategy that takes into account the specific preferences of audiences in different regions. In South Korea, Netflix prioritizes the drama genre, recognizing the immense popularity and cultural significance of Korean dramas both domestically and internationally. This targeted approach allows Netflix to offer a rich selection of captivating dramas to cater to the preferences of Korean viewers.

For audiences in the United States, Netflix adopts a fun-loving and series-oriented approach, placing a particular focus on comedy and documentaries. This aligns with the tastes of American viewers who enjoy a mix of entertaining and informative content.

In India, where dramatic storytelling has a strong appeal, Netflix caters to the audience’s preference for compelling narratives and emotional depth by offering a wide range of dramatic content.

In the United Kingdom, Netflix recognizes the British audience’s penchant for documentaries and provides a diverse collection of captivating non-fiction content that appeals to their interest in real-life stories and informative programming.

Lastly, in Japan, Netflix acknowledges the audience’s affinity for action-oriented content and ensures a selection of thrilling and adrenaline-pumping shows and movies that cater to their preferences.

demo_genre = showtime.groupby(['production_country', 'primary_genre']).size().reset_index(name='count').sort_values('count', ascending = False)[:20]
demo_genre = demo_genre[demo_genre['production_country']!='No Data']

fig = px.treemap(demo_genre, path=['production_country', 'primary_genre'], values='count',
                 color_discrete_sequence=['#221f1f', '#b20710', '#e50914', '#f5f5f1'])

fig.update_layout(title='Netflix Content by Genre and Country',
margin=dict(l=20, r=20, t=40, b=20), title_x = 0.5) # Adjust the margins as needed

fig.show()

TV shows or movies? When it comes to Netflix, it’s tough to decide

Relationship between IMDb score and TMDb score on Netflix

When comparing Netflix to Amazon Prime Video, it is observed that Netflix tends to fare better in terms of overall ratings for both shows and movies. While Amazon Prime Video may have higher scores for its shows, Netflix generally maintains a stronger performance across its entire content library.

fig = px.scatter(showtime, x='imdb_score', y='tmdb_score', color='type',
color_discrete_map={'MOVIE': '#221f1f', 'SHOW': '#b20710'},
hover_data=['title'])

fig.update_layout(title='IMDb Score vs TMDb Score',
                  xaxis_title='IMDb Score',
                  yaxis_title='TMDb Score',
                  legend_title='Type', title_x=0.5)

fig.show()
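The scatter plot can be summarized with one number per content type: the Pearson correlation between IMDb and TMDb scores. A minimal sketch, assuming the `imdb_score`, `tmdb_score`, and `type` columns used in the plot; the helper name and toy rows are illustrative:

```python
import pandas as pd


def score_correlation(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation between IMDb and TMDb scores within each content type."""
    scored = df.dropna(subset=['imdb_score', 'tmdb_score'])
    return pd.Series({content_type: group['imdb_score'].corr(group['tmdb_score'])
                      for content_type, group in scored.groupby('type')})


# Toy rows for illustration only
toy = pd.DataFrame({
    'type': ['MOVIE'] * 3 + ['SHOW'] * 3,
    'imdb_score': [5.0, 6.0, 7.0, 8.0, 7.0, 6.0],
    'tmdb_score': [5.5, 6.5, 7.5, 6.0, 7.0, 8.0],
})
print(score_correlation(toy))
```

A value near 1 means the two sites broadly agree on which titles are good; running `score_correlation(showtime)` would show whether that holds equally for movies and shows.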

What type of content are you watching on Netflix? Positive, Neutral or Negative?

Using the RoBERTa model from Hugging Face to categorize titles as Positive, Neutral, or Negative based on their content description

The RoBERTa model helps us understand what type of content Netflix produces for its audience.
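The post does not show how the `roberta_neg`, `roberta_neu`, and `roberta_pos` columns were produced. The sketch below assumes the Hugging Face checkpoint `cardiffnlp/twitter-roberta-base-sentiment` (a common choice for this kind of analysis, not confirmed by the post), whose three outputs are ordered negative, neutral, positive. The model returns raw logits, so a softmax turns them into probabilities that sum to 1 per title:

```python
import numpy as np
import pandas as pd

# Assumed checkpoint; labels are ordered negative, neutral, positive
MODEL_NAME = 'cardiffnlp/twitter-roberta-base-sentiment'


def scores_from_logits(logits: np.ndarray) -> pd.DataFrame:
    """Convert (n, 3) classifier logits into per-row probabilities."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return pd.DataFrame(probs, columns=['roberta_neg', 'roberta_neu', 'roberta_pos'])


def roberta_scores(descriptions: list, batch_size: int = 32) -> pd.DataFrame:
    """Score descriptions in batches (downloads the model weights on first use)."""
    import torch  # imported here so scores_from_logits works without torch/transformers
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
    model.eval()
    batches = []
    for i in range(0, len(descriptions), batch_size):
        encoded = tokenizer(descriptions[i:i + batch_size], return_tensors='pt',
                            padding=True, truncation=True)
        with torch.no_grad():
            batches.append(model(**encoded).logits.numpy())
    return scores_from_logits(np.concatenate(batches))
```

Assuming the dataset’s `description` column and a default integer index, usage would look like `showtime = showtime.join(roberta_scores(showtime['description'].fillna('').tolist()))`.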

Netflix does relatively better than Amazon Prime Video, with ~24% positive content

Based on the categorization performed with the RoBERTa model, Netflix’s content is predominantly neutral, accounting for 51.9% of its total content. Positive content makes up 23.5% of the library, while negative content accounts for 24.6%.

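# Negative if the negative score is above 0.5, else Neutral if the neutral score is above 0.5;
# everything else, including titles where no score exceeds 0.5, falls through to Positive
def label_sentiment(row):
    if row['roberta_neg'] > 0.5:
        return 'Negative'
    if row['roberta_neu'] > 0.5:
        return 'Neutral'
    return 'Positive'

showtime['sentiment'] = showtime.apply(label_sentiment, axis=1)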
sentiment_counts = showtime['sentiment'].value_counts()

# Create the donut chart trace
fig4 = go.Figure(data=[go.Pie(
labels=sentiment_counts.index,
values=sentiment_counts.values,
hole=0.5, # Set the hole parameter to create a donut chart
marker=dict(colors=['#221f1f', '#b20710', '#e50914']), # Set custom colors for the slices
textinfo='label+percent', # Display labels and percentages
textposition='inside', # Set the position of the labels inside the slice
)])

# Set the layout
fig4.update_layout(
title='Sentiment Distribution of Content on Netflix',
showlegend=True, title_x = 0.5,
# Add annotations in the center of the donut pies.
annotations=[dict(text='Netflix', x=0.50, y=0.5, font_size=15, showarrow=False)]
)
fig4.show()
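The labeling rule checks the negative score first, then the neutral one, and assigns Positive to everything else, so a title scored 0.45 negative / 0.40 neutral / 0.15 positive is counted as Positive. The sketch below vectorizes the same rule with `np.where` and sets it next to an argmax alternative that picks the highest of the three scores; the argmax variant is a hypothetical comparison, not the method used in the post:

```python
import numpy as np
import pandas as pd


def threshold_labels(scores: pd.DataFrame) -> pd.Series:
    """Same rule as above: Negative if neg > 0.5, else Neutral if neu > 0.5, else Positive."""
    labels = np.where(scores['roberta_neg'] > 0.5, 'Negative',
                      np.where(scores['roberta_neu'] > 0.5, 'Neutral', 'Positive'))
    return pd.Series(labels, index=scores.index)


def argmax_labels(scores: pd.DataFrame) -> pd.Series:
    """Alternative: label each title by whichever probability is largest."""
    names = {'roberta_neg': 'Negative', 'roberta_neu': 'Neutral', 'roberta_pos': 'Positive'}
    return scores[list(names)].idxmax(axis=1).map(names)


# Toy scores for illustration only
toy = pd.DataFrame({'roberta_neg': [0.45, 0.10, 0.05],
                    'roberta_neu': [0.40, 0.80, 0.15],
                    'roberta_pos': [0.15, 0.10, 0.80]})
print(pd.DataFrame({'threshold': threshold_labels(toy), 'argmax': argmax_labels(toy)}))
```

Running both functions on `showtime` and counting the rows where they disagree would show how much of the positive share comes from the fall-through case.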
Sentiment Analysis based on content description on Netflix

Drama and comedy together make up a significant portion of Netflix’s content, accounting for nearly 50% of its library. This suggests that Netflix places considerable emphasis on dramatic and comedic content to cater to the preferences of its audience.

genre_sentiment = showtime.groupby(['primary_genre', 'sentiment']).size().reset_index(name='count')
genre_sentiment = genre_sentiment[genre_sentiment['primary_genre']!= 'No Data']
genre_sentiment = genre_sentiment.sort_values('count', ascending = False)
colors = ['#221f1f', '#b20710', '#e50914','#f5f5f1']
fig = px.sunburst(genre_sentiment, path=['primary_genre', 'sentiment'], values='count',
color_discrete_sequence=colors)
fig.update_layout(title='Genre vs Sentiments on Netflix', title_x = 0.5)
fig.show()
Sentiments classified by Age certification ratings

Within each of the main age certifications on Netflix, such as TV-MA, R, TV-14, PG-13, and PG, the largest share of content is classified as neutral. In other words, a significant portion of the content at every major age rating carries a balanced or neutral sentiment.
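The rating-by-sentiment breakdown can be checked as a table of row-normalized shares with `pd.crosstab`. A minimal sketch, assuming the `age_certification` and `sentiment` columns from the earlier code; the helper name and toy rows are illustrative:

```python
import pandas as pd


def sentiment_share_by_rating(df: pd.DataFrame,
                              ratings=('TV-MA', 'R', 'TV-14', 'PG-13', 'PG')) -> pd.DataFrame:
    """Share of each sentiment within each selected age certification (rows sum to 1)."""
    subset = df[df['age_certification'].isin(ratings)]
    return pd.crosstab(subset['age_certification'], subset['sentiment'],
                       normalize='index').round(3)


# Toy rows for illustration only
toy = pd.DataFrame({
    'age_certification': ['TV-MA', 'TV-MA', 'TV-MA', 'R', 'R', 'G'],
    'sentiment': ['Neutral', 'Neutral', 'Negative', 'Positive', 'Neutral', 'Positive'],
})
print(sentiment_share_by_rating(toy))
```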

Netflix reached a peak in 2018 in the quantity of content on its streaming platform

From 2014 to 2018, the amount of content streamed on Netflix grew at an unprecedented rate. During this period, Netflix also put more focus on producing original content. Critically acclaimed original series such as “House of Cards” (launched in 2013), “Narcos” (2015), and “Stranger Things” (2016) garnered widespread popularity and contributed to the growth of the platform’s content library.

filter_showtime = showtime[(showtime['release_year'] >= 2010) & (showtime['release_year'] <= 2020)]

filter_showtime = filter_showtime.groupby(['release_year', 'sentiment']).size().reset_index(name='count')

colors = ['#221f1f', '#b20710', '#e50914','#f5f5f1']

fig = px.area(filter_showtime, x='release_year', y='count', color='sentiment',
color_discrete_sequence=colors,
title='Trend of Content Released on Netflix Over Time')
fig.update_layout(
xaxis_title='Release Year',
yaxis_title='Content Produced',
legend_title='Sentiment', title_x = 0.5
)
fig.show()

How long is content on Netflix?

Netflix appears to cater to all types of audiences, with runtimes ranging from 20 to 130 minutes

Netflix strives to cater to diverse audiences by offering a wide range of content with varying runtime durations. From shorter episodes or films that can be consumed in around 20 minutes to longer movies or series with runtimes up to 130 minutes, Netflix recognizes the importance of providing options that accommodate different viewer preferences and viewing habits.
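The runtime chart itself is not accompanied by code. One way to quantify the spread is the median and central 90% runtime range for each content type. A minimal sketch, assuming a `runtime` column in minutes (an assumption about the dataset); the helper name and toy values are illustrative:

```python
import pandas as pd


def runtime_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Median and 5th/95th percentile runtime (minutes) per content type."""
    grouped = df.dropna(subset=['runtime']).groupby('type')['runtime']
    return pd.DataFrame({
        'p5': grouped.quantile(0.05),
        'median': grouped.median(),
        'p95': grouped.quantile(0.95),
    })


# Toy runtimes for illustration only
toy = pd.DataFrame({'type': ['MOVIE'] * 3 + ['SHOW'] * 5,
                    'runtime': [90, 100, 110, 20, 30, 40, 50, 60]})
print(runtime_summary(toy))
```

Using percentiles instead of the raw minimum and maximum keeps a handful of unusually short or long titles from defining the range.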

IMDb and TMDb ratings for content on Netflix

The IMDb scores for content on Netflix form a bell-shaped curve: ratings cluster around a central range, giving a smooth and consistent distribution.

In contrast, the TMDb score distribution is spikier, with sharp alternating highs and lows between neighboring score values. This suggests that TMDb ratings for the same content are distributed more irregularly, with larger jumps in frequency from one score to the next.
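How spiky a histogram looks depends on its bins, so a fair comparison counts both scores in identical bins. A minimal sketch with numpy, assuming both scores are on a 0 to 10 scale; the 0.5-point bin width is an arbitrary choice and the toy scores are illustrative:

```python
import numpy as np
import pandas as pd


def score_histograms(df: pd.DataFrame, bin_width: float = 0.5) -> pd.DataFrame:
    """Count IMDb and TMDb scores in the same bins; the index is each bin's left edge."""
    edges = np.arange(0, 10 + bin_width, bin_width)  # 0.0, 0.5, ..., 10.0
    counts = {}
    for col in ['imdb_score', 'tmdb_score']:
        counts[col], _ = np.histogram(df[col].dropna().to_numpy(), bins=edges)
    return pd.DataFrame(counts, index=edges[:-1])


# Toy scores for illustration only
toy = pd.DataFrame({'imdb_score': [6.1, 6.2, 6.4, 7.0],
                    'tmdb_score': [6.0, 6.0, 8.0, 8.0]})
print(score_histograms(toy).loc[5.5:8.5])
```

Re-running `score_histograms(showtime)` with a few different bin widths is a quick check on whether the TMDb spikes persist or are a side effect of the binning.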

Words most likely to make it to the title …

Word cloud of Netflix content titles

Netflix has curated a wide range of content that encompasses various themes. Love, life, and Christmas are among the themes that have garnered considerable attention and have become notable aspects of Netflix’s content offerings.
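A word cloud sizes words by how often they occur, so the themes it highlights can be checked directly with a word count. A rough sketch using `collections.Counter`; the small stopword list and the simple lowercase tokenization are assumptions, so the counts will only approximate what a word cloud library computes:

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption)
TITLE_STOPWORDS = {'the', 'a', 'an', 'of', 'and', 'in', 'to', 'my', 'for', 'on', 'with'}


def top_title_words(titles, n=10):
    """Most common lowercase words across titles, skipping stopwords."""
    words = (w for t in titles for w in re.findall(r"[a-z']+", str(t).lower()))
    return Counter(w for w in words if w not in TITLE_STOPWORDS).most_common(n)


# Toy titles for illustration only
toy = ['A Christmas Prince', 'Christmas Inheritance', 'Love Actually', 'The Life of Love', 'Life']
print(top_title_words(toy, 3))
```

On the dataset, `top_title_words(showtime['title'])` would list the words behind the cloud along with their counts.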

Most popular actors on Netflix

Word cloud of actors on Netflix
# Most frequently credited actors on Netflix
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image  # to load the mask image
from wordcloud import WordCloud

# Count credits per full name so each actor is one entry in the cloud; joining all names
# into one string would make WordCloud count single words, so a common first name
# could outrank any individual actor
name_counts = netflix_credits['name'].astype(str).value_counts().to_dict()

# Define a function to specify the text color
def netflix_color(word, font_size, position, orientation, random_state=None, **kwargs):
    return "#b20710"

custom_mask = np.array(Image.open('net2.jpg.webp'))
wc = WordCloud(mask=custom_mask, height=2000, width=4000, color_func=netflix_color)

wc.generate_from_frequencies(name_counts)

plt.figure(figsize=(16, 8))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()

Netflix’s dominance of online streaming appears to be largely attributable to its strategic approach to catering to its audience. It also appears that Netflix closely observed Amazon’s practices and learned from Amazon’s mistakes, avoiding similar pitfalls and enhancing its own growth.

In conclusion, Netflix and Amazon Prime Video are two dominant players in the streaming industry, but they have different approaches and strengths. Netflix has a global focus and offers a diverse range of long and short format content, including popular Korean dramas. While it may be experiencing a decline in its American subscriber base, Netflix effectively targets diverse audiences worldwide.

The audience preferences for movies and shows vary across regions. Countries like Canada, Brazil, the United States, France, Mexico, and Spain show balanced interest in both formats. However, Asian countries like India, Korea, and Japan exhibit more extreme tendencies, with short format content capturing significant attention in India and series being highly popular in Korea and Japan.

Netflix distinguishes itself with a wide range of genres, including a balanced mix of dramatic and comedy movies. Amazon Prime Video has a more extensive selection of horror content. In terms of mature audiences, Amazon focuses on hosting a higher number of R-rated movies, while Netflix develops a robust collection of TV shows catering to mature viewers.

The year 2015 marked a turning point for online streaming platforms, with a significant increase in the number of movies and shows available since then. Netflix’s content targeting strategy varies across regions, tailoring its selection and original productions to cater to the preferences of adult or teen audiences.

Netflix’s content library is predominantly classified as neutral, with drama and comedy genres making up a significant portion. Positive and negative content are relatively balanced. The growth of Netflix’s content library from 2014 to 2018 was fueled by the success of its original series, such as “Stranger Things” and “House of Cards.”

Netflix offers a wide range of content with varying runtimes, recognizing the importance of catering to different viewer preferences. The IMDb scores for Netflix content form a bell-shaped curve, with ratings concentrated around a central range, while the TMDb scores show sharper highs and lows, suggesting a more irregular distribution of ratings.

In summary, Netflix’s global focus, diverse content offerings, and targeted approach to audience preferences contribute to its success in the streaming industry. Despite challenges in specific markets, Netflix remains a strong contender and continues to adapt its content strategy to capture and engage viewers worldwide.

The project code is available on my GitHub: https://github.com/shrunalisalian/Streaming-Wars

Netflix Dataset on Kaggle: https://www.kaggle.com/datasets/dgoenrique/netflix-movies-and-tv-shows

If you enjoyed reading this article, feel free to check out my articles on Amazon Prime Video, HBO Max, Disney+, Paramount+, and Apple TV+.

Feel free to let me know if you have any suggestions. Thank you for reading!
