Streaming Wars with Sentiment Analysis Using the RoBERTa Model: Paramount+

Shrunali Suresh Salian
10 min read · Jun 6, 2023

Paramount+ is a prominent streaming platform that offers a diverse range of content to its subscribers. Launched by ViacomCBS, Paramount+ provides access to a vast library of movies, TV shows, and original programming from various networks and studios under the ViacomCBS umbrella.

One of the standout features of Paramount+ is its extensive collection of beloved franchises and iconic properties. Subscribers can enjoy a wide range of content from popular networks such as CBS, MTV, Nickelodeon, Comedy Central, and BET, among others. This includes a rich selection of classic TV shows, current series, and exclusive content that spans multiple genres and interests.

This article is based on the Paramount+ dataset available on Kaggle. Note that the dataset covers a specific time period, so the analysis and conclusions drawn from it may not reflect the current trends, offerings, or latest changes on Paramount+.

What do you like watching? Movies or TV Shows?

For every four movies, there is one TV show on Paramount+

This distribution strategy allows subscribers to enjoy a diverse mix of both feature-length films and episodic series, catering to different viewing preferences on Paramount+.
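The four-to-one ratio can be checked directly from the `type` column used in the chart below. A minimal sketch, assuming the column holds the labels `MOVIE` and `SHOW` as in the plotting code; `movie_show_ratio` is a hypothetical helper name, not part of the original project:

```python
import pandas as pd

def movie_show_ratio(types: pd.Series) -> float:
    """Number of movies per TV show in a column of 'MOVIE'/'SHOW' labels."""
    counts = types.value_counts()
    # Raises KeyError if either label is missing instead of silently dividing by zero
    return float(counts['MOVIE'] / counts['SHOW'])
```

Calling `movie_show_ratio(showtime['type'])` on the article's DataFrame should return a value close to 4 if the chart's reading holds.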

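import pandas as pd
import plotly.graph_objects as go

# `showtime` holds the Paramount+ titles loaded from the Kaggle dataset
count_data = showtime['type'].value_counts()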

# Create a horizontal bar chart
fig = go.Figure(data=go.Bar(
y=count_data.index,
x=count_data.values,
orientation='h',
marker=dict(color=['#0c0c0c', '#11518a'])
))

# Set title and axis labels
fig.update_layout(
title='Content on Paramount+ ',
xaxis_title='Count',
yaxis_title='Type of Content', title_x = 0.5
)
fig.show()

Who dominates Paramount+'s content production?

The US accounts for 88% of the content produced by the top 5 countries on Paramount+

The dominance of the US is not unexpected, considering the origin of Paramount+ and its affiliation with ViacomCBS, a prominent American media conglomerate. However, the contributions of other countries, such as the UK and Canada, show that the catalog also draws on content produced beyond its home market.

import plotly.express as px

# Titles per production country, excluding the 'No Data' placeholder;
# naming the columns explicitly keeps this working across pandas versions
top_countries = (showtime['production_country'].value_counts()
                 .rename_axis('production_country')
                 .reset_index(name='content_produced'))
top_countries = top_countries[top_countries['production_country'] != 'No Data'].head(5)

fig = px.pie(top_countries, values='content_produced', names='production_country',
             title='Contribution of Content Produced by Top 5 Countries on Paramount+')

# Change the color palette
fig.update_traces(marker=dict(colors=['#0c0c0c','#11518a', '#226ca1','#9fa8ab','#f1f1f1']))

# Set the text position and information to be displayed
fig.update_traces(textposition='inside', textinfo='percent+label')

fig.show()
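Because the pie chart only includes the top five countries, its percentages are relative to those five countries rather than to the whole catalog. A minimal sketch of the same calculation, assuming `production_country` holds one country code per title with 'No Data' as the placeholder, as in the code above; `top_country_shares` is a hypothetical helper:

```python
import pandas as pd

def top_country_shares(countries: pd.Series, n: int = 5) -> pd.Series:
    """Percentage of titles contributed by each of the top-n production countries.

    Percentages are relative to the top-n total (what the pie chart shows),
    not to the full catalog.
    """
    counts = countries[countries != 'No Data'].value_counts().head(n)
    return (counts / counts.sum() * 100).round(1)
```

Comparing `top_country_shares(showtime['production_country'])` with `value_counts(normalize=True)` over all countries shows how much the top-five framing inflates each share.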

Do we really like movies that much, or is it just a myth?

When it comes to the split between movies and shows across production countries on Paramount+, there are some notable trends. For content from Italy (IT), approximately 95% of the titles are movies, a strong tilt toward feature films.

Similarly, for Germany (DE), South Africa (ZA), France (FR), Australia (AU), Spain (ES), the United Kingdom (GB), and the United States (US), movies make up more than 80% of the titles. Film clearly dominates the content from all of these regions.
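The plotting code for this chart uses a `data_ratio` table whose construction the article does not show. A minimal sketch of one way to build it, assuming the `production_country` and `type` columns used elsewhere in the article and a top-10 cut to match the axis title; `build_data_ratio` is a hypothetical helper, not the author's code:

```python
import pandas as pd

def build_data_ratio(df: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Per-country share of MOVIE and SHOW titles; each row sums to 1."""
    known = df[df['production_country'] != 'No Data']
    top = known['production_country'].value_counts().head(top_n).index
    subset = known[known['production_country'].isin(top)]
    # normalize='index' divides each row by its total, giving proportions
    ratio = pd.crosstab(subset['production_country'], subset['type'], normalize='index')
    ratio.index.name = 'country_code'
    return ratio.reset_index()
```

The result has `country_code`, `MOVIE`, and `SHOW` columns, which is the shape the stacked bar chart expects.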

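# data_ratio: one row per country_code with the MOVIE and SHOW shares (0 to 1); its construction is not shown in the article
fig = go.Figure()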

fig.add_trace(go.Bar(
y=data_ratio.country_code,
x=data_ratio['MOVIE'],
name='MOVIE',
orientation='h',
marker=dict(color='#11518a'),
text=(data_ratio['MOVIE'] * 100).round(1).astype(str) + '%', # Add text as percentages, rounded to one decimal
textposition='inside', # Set text position inside the bars
textfont=dict(color='white') # Set text color
))

fig.add_trace(go.Bar(
y=data_ratio.country_code,
x=data_ratio['SHOW'],
name='SHOW',
orientation='h',
marker=dict(color='#9fa8ab'),
text=(data_ratio['SHOW'] * 100).round(1).astype(str) + '%', # Add text as percentages, rounded to one decimal
textposition='inside', # Set text position inside the bars
textfont=dict(color='white') # Set text color
))

# Set the layout
fig.update_layout(
title='Content Distribution by Country on Paramount+',
barmode='stack',
yaxis_title='Top 10 Countries',
xaxis=dict(showticklabels=False), # Hide the x-axis tick labels
title_x = 0.5
)

fig.show()

What kind of person are you: Drama or Comedy?

Most common on Paramount+: Drama, Comedy, and Documentary

In terms of the number of titles, three genres stand out on Paramount+: Drama, Comedy, and Documentary. Together they make up the largest part of the catalog, which may reflect the preferences of Paramount+ subscribers.

# Count movies and shows per primary genre
genre_distribution = pd.DataFrame(showtime.groupby('primary_genre')['type'].value_counts())
genre_distribution = genre_distribution.unstack().reset_index().fillna(0).drop(0)
genre_distribution.columns = ['primary_genre', 'MOVIE', 'SHOW']
# Sum only the numeric columns; summing across the genre-name column fails in recent pandas
genre_distribution['total'] = genre_distribution['MOVIE'] + genre_distribution['SHOW']
genre_distribution = genre_distribution.sort_values('total', ascending = False)
fig1 = go.Figure()
fig1.add_trace(go.Bar(
x=genre_distribution['primary_genre'],
y=genre_distribution['MOVIE'],
name='MOVIE',
marker=dict(color='#11518a'),
))

fig1.add_trace(go.Bar(
x=genre_distribution['primary_genre'],
y=genre_distribution['SHOW'],
name='SHOW',
marker=dict(color='#9fa8ab'),
))

fig1.update_layout(
title='Content Distribution by Genre on Paramount+',
xaxis_title='Genre',
yaxis_title='Content on Paramount+',
barmode='stack'
)
fig1.show()
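The genre charts group by a `primary_genre` column, which is not one of the raw Kaggle fields. A minimal sketch of how it could be derived, assuming the dataset's `genres` column stores each title's genres as a Python-style list string such as `"['drama', 'crime']"` and that the first entry is treated as the primary genre; `first_item` is a hypothetical helper, and the 'No Data' placeholder matches the one filtered elsewhere in the article:

```python
import ast

def first_item(value, missing='No Data'):
    """Return the first entry of a list stored as a string, e.g. "['drama', 'crime']" -> 'drama'."""
    try:
        # literal_eval safely parses list literals without executing code
        items = ast.literal_eval(value) if isinstance(value, str) else value
    except (ValueError, SyntaxError):
        return missing
    if isinstance(items, (list, tuple)) and len(items) > 0:
        return items[0]
    # Empty lists, NaN, and anything unparseable fall back to the placeholder
    return missing
```

Usage would be `showtime['primary_genre'] = showtime['genres'].apply(first_item)`. If a `production_countries` column uses the same format, the same helper could produce `production_country`.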

For mature audiences only …

R-rated movies are the most common on Paramount+

By offering a substantial selection of R-rated movies, Paramount+ caters to the preferences of adult viewers who enjoy edgier, more intense, and provocative storytelling. These movies often explore complex narratives, delve into gritty realism, or push boundaries in terms of content and subject matter.

# Count movies and shows per age certification
rating_distribution = pd.DataFrame(showtime.groupby('age_certification')['type'].value_counts())
rating_distribution = rating_distribution.unstack().reset_index().fillna(0)
rating_distribution.columns = ['age_certification','MOVIE','SHOW']
# Sum only the numeric columns; summing across the certification column fails in recent pandas
rating_distribution['Total'] = rating_distribution['MOVIE'] + rating_distribution['SHOW']
rating_distribution = rating_distribution.sort_values('Total', ascending = False)
# Drop the '0' rows, which stand in for titles without a certification
rating_distribution = rating_distribution[rating_distribution['age_certification']!='0']
fig2 = go.Figure()

fig2.add_trace(go.Bar(
x=rating_distribution['age_certification'],
y=rating_distribution['MOVIE'],
name='MOVIE',
marker=dict(color='#11518a'),

))

fig2.add_trace(go.Bar(
x=rating_distribution['age_certification'],
y=rating_distribution['SHOW'],
name='SHOW',
marker=dict(color='#9fa8ab'),

))

fig2.update_layout(
title='Content Distribution by Age Rating Certification on Paramount+',
xaxis_title='Age Certification',
yaxis_title='Content on Paramount+',
barmode='stack', legend_title = 'Type of Content', title_x = 0.5
)

fig2.show()

How old is the content that you are watching?

2018 is the release year with the highest number of movies on the streaming platform

Movies released in 2018 form the largest single-year group in the Paramount+ movie catalog. Note that the chart counts titles by release year, not by the date they were added to the platform, so it shows how old the library is rather than when Paramount+ acquired each title.

# Count movies and shows per release year, limited to 2000-2021
history = pd.DataFrame(showtime.groupby('release_year')['type'].value_counts())
history = history.unstack().reset_index().fillna(0)
history.columns = ['release_year','MOVIE','SHOW']
history = history[(history['release_year'] >= 2000) & (history['release_year'] <= 2021)]
fig3 = go.Figure()
fig3.add_trace(go.Scatter(
x=history['release_year'],
y=history['MOVIE'],
mode='lines',
name='MOVIE',
fill='tozeroy',
line=dict(color='#11518a')
))

fig3.add_trace(go.Scatter(
x=history['release_year'],
y=history['SHOW'],
mode='lines',
name='SHOW',
fill='tozeroy',
line=dict(color='#9fa8ab')
))

# Set the layout
fig3.update_layout(
title="How old is Paramount+'s Content?",
xaxis_title='Release Year',
yaxis_title='Content on Paramount+', showlegend = True, title_x =0.5
)
fig3.show()

Who is Paramount+ targeting in your country?

Paramount+’s content specifically targets adults in the US

Most of the US content on Paramount+ is aimed at adult audiences, which suggests that the platform positions itself primarily for adult viewers in the United States, with programming tailored to the preferences of this demographic.

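# demographic_data: title counts per production_country and target_ages (its construction is not shown in the article)
total_count = demographic_data['count'].sum()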
demographic_data['percentage'] = (demographic_data['count'] / total_count) * 100

fig = px.treemap(demographic_data, path=['production_country', 'target_ages'], values='percentage',
color='target_ages', color_discrete_sequence= ['#0c0c0c','#11518a', '#226ca1','#9fa8ab','#f1f1f1'])

fig.update_layout(title= "Paramount+'s Country-Level Target Audience",
margin=dict(l=20, r=20, t=40, b=20), title_x = 0.5) # Adjust the margins as needed

fig.show()
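The treemap relies on a `demographic_data` table with `production_country`, `target_ages`, and `count` columns that the article does not build on the page. The Kaggle dataset stores an `age_certification` per title, so target ages presumably come from a rating-to-audience mapping. A minimal sketch using a hypothetical mapping (`TARGET_AGES` and `build_demographic_data` are illustrative, not the author's code):

```python
import pandas as pd

# Hypothetical mapping from age certification to audience group
TARGET_AGES = {
    'G': 'Kids', 'TV-Y': 'Kids', 'TV-G': 'Kids',
    'PG': 'Older Kids', 'TV-Y7': 'Older Kids', 'TV-PG': 'Older Kids',
    'PG-13': 'Teens', 'TV-14': 'Teens',
    'R': 'Adults', 'NC-17': 'Adults', 'TV-MA': 'Adults',
}

def build_demographic_data(df: pd.DataFrame) -> pd.DataFrame:
    """Count titles per (production_country, target_ages) pair."""
    labelled = df.assign(target_ages=df['age_certification'].map(TARGET_AGES))
    # Certifications outside the mapping (e.g. the '0' placeholder) are dropped
    labelled = labelled.dropna(subset=['target_ages'])
    return (labelled.groupby(['production_country', 'target_ages'])
                    .size()
                    .reset_index(name='count'))
```

Swapping in a different mapping changes the treemap directly, so the audience groups are worth stating explicitly whichever scheme is used.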

What are you watching on Paramount+?

It’s Drama vs Comedy in the US

The prominence of drama and comedy in the US catalog suggests that Paramount+ relies on these genres to resonate with its audience, positioning itself as a destination for drama and comedy enthusiasts in the United States.

We might watch more movies but we like shows better…

Relationship between IMDb and TMDb scores for content on Paramount+

Paramount+’s TV shows are rated higher than its movies on both IMDb and TMDb. This suggests that the platform’s TV series resonate more with audiences and receive more positive feedback.
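To put numbers on this comparison, the scores can be averaged per content type. A minimal sketch, assuming the `imdb_score` and `tmdb_score` columns used in the scatter plot; `mean_scores_by` is a hypothetical helper:

```python
import pandas as pd

def mean_scores_by(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Mean IMDb and TMDB score per group; titles with missing scores are skipped."""
    return df.groupby(group_col)[['imdb_score', 'tmdb_score']].mean().round(2)
```

`mean_scores_by(showtime, 'type')` gives one row for `MOVIE` and one for `SHOW`, which is easier to compare than overlapping scatter points.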

fig = px.scatter(showtime, x='imdb_score', y='tmdb_score', color='type',
color_discrete_map={'MOVIE': '#11518a', 'SHOW': '#9fa8ab'},
hover_data=['title'])

fig.update_layout(title='IMDb Score vs TMDB Score',
xaxis_title='IMDb Score',
yaxis_title='TMDB Score',
legend_title='Type', title_x =0.5)

fig.show()

Is it all a mind game 🤨 ?

Using a RoBERTa model from Hugging Face to categorize movies as Positive, Neutral, or Negative based on the content description

A RoBERTa sentiment model, available through Hugging Face, gauges the sentiment of each title from its description. RoBERTa is a transformer model that reads each word in context, so it tends to handle language patterns such as sarcasm better than VADER, a lexicon- and rule-based tool that scores sentiment from the polarity of individual words.

The chart shows the sentiment the RoBERTa model assigns to each movie, which helps characterize the type of content Paramount+ offers.
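The article does not show its sentiment code. A minimal sketch, assuming the `cardiffnlp/twitter-roberta-base-sentiment-latest` checkpoint (a RoBERTa model fine-tuned for three-way sentiment and published on the Hugging Face Hub) and a `description` text column as in the Kaggle dataset; the author may have used a different checkpoint or preprocessing, and both function names are hypothetical:

```python
import numpy as np

def logits_to_labels(logits, id2label):
    """Turn raw classifier logits of shape (n_texts, n_classes) into label strings."""
    # Softmax is monotonic, so the argmax of the logits equals the argmax of the probabilities
    return [id2label[int(i)] for i in np.asarray(logits).argmax(axis=1)]

def classify_descriptions(texts, model_name="cardiffnlp/twitter-roberta-base-sentiment-latest",
                          batch_size=32):
    """Label each description as negative, neutral or positive with a RoBERTa model."""
    # Heavy dependencies are imported here so logits_to_labels works without them
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()
    labels = []
    for start in range(0, len(texts), batch_size):
        # Truncate long descriptions to the model's 512-token limit
        batch = tokenizer(texts[start:start + batch_size], padding=True, truncation=True,
                          max_length=512, return_tensors="pt")
        with torch.no_grad():
            logits = model(**batch).logits.numpy()
        # The checkpoint's own config maps class indices to label names
        labels.extend(logits_to_labels(logits, model.config.id2label))
    return labels
```

Usage would be `showtime['sentiment'] = classify_descriptions(showtime['description'].fillna('').astype(str).tolist())`. Batching keeps memory bounded on a catalog of a few thousand titles, and the first call downloads the model weights.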

What’s with all the fuss: Positive, Negative and Neutral

48% of content on Paramount+ is neutral, 29% is negative and 22% is positive

This distribution suggests a mix of emotions and tones across the content on the platform. The large share of neutral content means that many descriptions do not lean strongly in either direction, which can reflect a balance of positive and negative elements or simply matter-of-fact synopses.

The presence of negative content indicates that Paramount+ offers a range of stories that explore challenging themes, conflicts, and character arcs that may evoke emotional responses. These negative elements can contribute to the overall depth and complexity of the storytelling, providing viewers with thought-provoking and compelling narratives.

On the other hand, the positive content signifies moments of inspiration, joy, and uplifting storytelling that resonate with audiences. These elements can bring entertainment, optimism, and a sense of satisfaction to viewers, creating a well-rounded viewing experience.

sentiment_counts = showtime['sentiment'].value_counts()

# Create the donut chart trace
fig4 = go.Figure(data=[go.Pie(
labels=sentiment_counts.index,
values=sentiment_counts.values,
hole=0.5, # Set the hole parameter to create a donut chart
marker=dict(colors=['#0c0c0c','#11518a', '#226ca1','#9fa8ab','#f1f1f1']), # Set custom colors for the slices
textinfo='label+percent', # Display labels and percentages
textposition='inside', # Set the position of the labels inside the slice
)])

# Set the layout
fig4.update_layout(
title='Sentiment Distribution of Content on Paramount+',
showlegend=True, title_x = 0.5,
# Add annotations in the center of the donut pies.
annotations=[dict(text='Paramount+', x=0.50, y=0.5, font_size=15, showarrow=False)]
)
fig4.show()

Relationship between IMDb and TMDb scores of content on Paramount+ categorized according to the content sentiment

There is no apparent relationship between the sentiment of the content and its IMDb or TMDb scores

The lack of a relationship between content sentiment and IMDb/TMDb scores suggests that the audience’s perception of a piece of content goes beyond its sentiment alone. Numerous factors, such as the quality of acting, storytelling, production value, and personal preferences, can influence viewers’ ratings and reviews.
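A scatter plot can hide small differences, so a rank-based test is one way to check the "no relationship" reading. A minimal sketch using the Kruskal-Wallis test from SciPy, assuming a `sentiment` column produced by the RoBERTa step and the `imdb_score` column from the dataset; `score_differs_by_sentiment` is a hypothetical helper:

```python
import pandas as pd
from scipy.stats import kruskal

def score_differs_by_sentiment(df: pd.DataFrame, score_col: str = 'imdb_score'):
    """Kruskal-Wallis H-test: do score distributions differ across sentiment groups?"""
    groups = [g[score_col].dropna() for _, g in df.groupby('sentiment')]
    statistic, p_value = kruskal(*groups)
    return statistic, p_value
```

A large p-value is consistent with no relationship but does not prove its absence; a small one would suggest that at least one sentiment group scores differently. Running it for both `imdb_score` and `tmdb_score` covers the two platforms in the chart.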

What type of content are you watching?

Drama, comedy, documentaries, and thrillers collectively make up almost 60% of Paramount+’s content inventory. Among these genres, neutral sentiment tends to dominate, indicating a balanced portrayal of themes and storytelling elements that evoke a range of emotions.

However, it’s noteworthy that within the drama genre, there is a relatively higher proportion of content classified with negative sentiment. This suggests that Paramount+ offers a diverse range of dramatic narratives that delve into complex and challenging themes, potentially exploring conflicts, hardships, and intense character arcs. This darker or more intense content may resonate with viewers who appreciate thought-provoking and emotionally engaging storytelling.

On the other hand, comedy, documentaries, and other genres within the 60% inventory are more inclined towards neutral sentiment. This suggests a focus on content that appeals to a broader audience, offering entertainment, informative storytelling, and relatable narratives that may not lean heavily towards positive or negative emotions.
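The sunburst shows counts, so proportions make the "drama skews negative" comparison easier to read. A minimal sketch, assuming the `primary_genre` and `sentiment` columns used in the chart code; `sentiment_share_by` is a hypothetical helper:

```python
import pandas as pd

def sentiment_share_by(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Share of each sentiment within every group (each row sums to 1)."""
    known = df[df[group_col] != 'No Data']
    # normalize='index' turns counts into within-group proportions
    return pd.crosstab(known[group_col], known['sentiment'], normalize='index')
```

`sentiment_share_by(showtime, 'primary_genre')` puts every genre on the same 0 to 1 scale, so a genre with few titles can be compared fairly with drama.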

genre_sentiment = showtime.groupby(['primary_genre', 'sentiment']).size().reset_index(name='count')
genre_sentiment = genre_sentiment[genre_sentiment['primary_genre']!= 'No Data']
genre_sentiment = genre_sentiment.sort_values('count', ascending = False)
colors = ['#0c0c0c','#11518a', '#226ca1','#9fa8ab','#f1f1f1']
fig = px.sunburst(genre_sentiment, path=['primary_genre', 'sentiment'], values='count',
color_discrete_sequence=colors)

fig.update_layout(title='Genre vs Sentiments on Paramount+', title_x = 0.5)

fig.show()

Keep your ratings and sentiments in check…

On Paramount+, R-rated content holds the highest proportion, and it is associated with the highest negative sentiment among the different content ratings. This indicates that Paramount+ offers a selection of content that may contain mature themes, explicit language, or graphic content, which can evoke more intense and potentially negative emotions in viewers.
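The claim combines two numbers per rating: how many titles it has and how many of them are negative. A minimal sketch that computes both, assuming the `age_certification` and `sentiment` columns from earlier, the '0' placeholder for missing certifications used in the rating chart, and lowercase `negative` labels (adjust `negative_label` if the labels are capitalized); `negative_share_by_rating` is a hypothetical helper:

```python
import pandas as pd

def negative_share_by_rating(df: pd.DataFrame, negative_label: str = 'negative') -> pd.DataFrame:
    """Title count and share of negative-sentiment titles per age certification."""
    rated = df[df['age_certification'] != '0']
    summary = rated.groupby('age_certification').agg(
        titles=('sentiment', 'count'),  # titles with a sentiment label
        negative_share=('sentiment', lambda s: (s == negative_label).mean()),
    )
    return summary.sort_values('titles', ascending=False)
```

Sorting by `titles` shows whether the most common rating is also the one with the highest negative share.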

Words most likely to make it into a title on Paramount+

Word cloud of content titles on Paramount+

When analyzing the titles of content on Paramount+, some of the most common words that emerge are “Love,” “Wild,” and “Story.” These words suggest that Paramount+ emphasizes content that revolves around themes of romance, adventure, and compelling narratives.
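A word cloud sizes words by frequency but does not print the counts. A minimal sketch that lists the most frequent title words, using `collections.Counter` and a small illustrative stopword set in place of wordcloud's `STOPWORDS`; `top_title_words` and `STOP` are hypothetical names:

```python
import re
from collections import Counter

# Small illustrative stopword list; the article's code uses wordcloud's STOPWORDS instead
STOP = {'the', 'a', 'an', 'of', 'and', 'in', 'to', 'on', 'for', 'with', 'my', 'is'}

def top_title_words(titles, n=10, stopwords=STOP):
    """Most frequent words across titles, case-insensitive, with stopwords removed."""
    # Tokens are runs of ASCII letters and apostrophes after lowercasing
    words = (w for t in titles for w in re.findall(r"[a-z']+", str(t).lower()))
    return Counter(w for w in words if w not in stopwords).most_common(n)
```

`top_title_words(showtime['title'])` returns `(word, count)` pairs, which makes it easy to confirm whether "love", "wild", and "story" really lead the list.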

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud, STOPWORDS

# Combine every title into one text corpus
showtime['title'] = showtime['title'].astype(str)
title_corpus = ' '.join(showtime['title'])
stopwords = set(STOPWORDS)

# Define a function to specify the text color (every word in light grey)
def paramount_color(word, font_size, position, orientation, random_state=None, **kwargs):
    return "#f1f1f1"

# Words are drawn only on the non-white pixels of the logo image;
# when a mask is given, WordCloud uses the mask's shape instead of width/height
custom_mask = np.array(Image.open('paramount-plus.png'))
wc = WordCloud(
    stopwords=stopwords,
    mask=custom_mask, height=2000, width=4000, color_func=paramount_color)
wc.generate(title_corpus)

plt.figure(figsize=(16,8))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()

Paramount+ stands as a competitive player in the streaming industry, offering a diverse range of content to cater to various audience preferences. With a focus on drama, comedy, documentaries, and thrillers, Paramount+ aims to provide engaging and entertaining experiences for its viewers.

The platform’s emphasis on adult-oriented content, particularly in the United States, showcases its intent to captivate a mature audience. However, Paramount+ also recognizes the importance of catering to younger viewers, with offerings that appeal to teens and older children.

The abundance of R-rated movies on Paramount+ highlights its commitment to providing edgier and more mature content, catering to viewers seeking a heightened level of intensity and storytelling.

The project code is available on my GitHub: https://github.com/shrunalisalian/Movie-Recommendation-System

Paramount+ Dataset on Kaggle: https://www.kaggle.com/datasets/dgoenrique/paramount-movies-and-tv-shows

Feel free to let me know if you have any suggestions. Thank you for reading!
