Streaming Wars with Sentiment Analysis Using the RoBERTa Model: Disney+

Shrunali Suresh Salian
10 min read · Jun 6, 2023


Disney+ is a popular streaming service that offers a vast collection of content from various franchises and properties owned by The Walt Disney Company. Launched in November 2019, Disney+ quickly gained traction and became a significant player in the streaming industry.

One of the key highlights of Disney+ is its extensive library of classic Disney films, including timeless animated classics like “The Lion King,” “Beauty and the Beast,” and “Cinderella.” The platform also features a wide range of content from other well-known brands and franchises, such as Marvel, Star Wars, Pixar, and National Geographic.

This article is based on the Kaggle dataset available for Disney+. Note that the analysis and conclusions drawn from this dataset may not reflect the current trends and offerings of Disney+. The dataset covers a specific time period and may not capture the most recent changes to the streaming platform.

Disney movies or shows? Hahaha…It’s a tough choice …

For approximately every 2 movies there’s one show on Disney+

With a rich catalog of classic Disney films, animated features, and live-action movies, Disney+ offers a treasure trove of cinematic experiences.

In addition to movies, Disney+ also boasts an impressive collection of original series that are exclusive to the platform. These shows span various genres, including animated series, live-action dramas, docuseries, and reality shows. They provide viewers with immersive storytelling and the opportunity to delve deeper into the Disney, Marvel, Star Wars, and National Geographic universes.

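import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# `showtime` is the Disney+ titles DataFrame prepared from the Kaggle dataset
count_data = showtime['type'].value_counts()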

# Create a horizontal bar chart
fig = go.Figure(data=go.Bar(
y=count_data.index,
x=count_data.values,
orientation='h',
marker=dict(color=['#01147C', '#06B2FF'])
))

# Set title and axis labels
fig.update_layout(
title='Content on Disney+ ',
xaxis_title='Count',
yaxis_title='Type of Content', title_x = 0.5
)
fig.show()
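
The roughly two-to-one ratio quoted above can be checked directly from the counts. The sketch below uses a small made-up frame in place of `showtime` (the toy rows are an assumption for illustration only); with the real data, the same two lines would apply to `showtime['type']`.

```python
import pandas as pd

# Toy stand-in for the article's `showtime` DataFrame (illustrative rows only).
toy = pd.DataFrame({"type": ["MOVIE", "MOVIE", "SHOW", "MOVIE", "MOVIE", "SHOW"]})

# Count titles per type, then divide movies by shows.
counts = toy["type"].value_counts()
movies_per_show = counts["MOVIE"] / counts["SHOW"]
print(f"{movies_per_show:.1f} movies per show")
```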

Disney+ focussed on the American audience

The United States dominates the Disney+ catalog: approximately 91% of the titles from the top five production countries were produced there. This highlights how heavily the platform’s library leans on American content.

The United Kingdom (GB) follows with approximately 4.42% of those titles, and Canada (CA) with around 2.27%.

# Count titles per production country; naming the axis and the counts
# explicitly keeps the column names stable across pandas versions
top_countries = (
showtime['production_country']
.value_counts()[:10]
.rename_axis('production_country')
.reset_index(name='content_produced')
)
top_countries = top_countries[top_countries['production_country'] != 'No Data']

top_countries = top_countries[:5]
fig = px.pie(top_countries, values='content_produced', names='production_country',
title='Contribution of Content Produced by Top 5 Countries on Disney+')

# Change the color palette
fig.update_traces(marker=dict(colors=['#01147C','#0000FD', '#06B2FF','#b0ffff','#ffffff']))

# Set the text position and information to be displayed
fig.update_traces(textposition='inside', textinfo='percent+label')

fig.show()
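
The percentages in the pie chart are computed by Plotly over the slices it is given, so they describe shares within the top five countries rather than the whole catalog. A minimal sketch of computing shares directly in pandas is shown below; the toy rows are made up for illustration and stand in for `showtime`.

```python
import pandas as pd

# Toy stand-in for `showtime` (illustrative rows only).
toy = pd.DataFrame({"production_country": ["US"] * 8 + ["GB", "CA"]})

# normalize=True returns fractions instead of raw counts.
share = (
    toy["production_country"]
    .value_counts(normalize=True)
    .mul(100)
    .round(2)
)
print(share)
```

Restricting the frame to the top five countries before calling `value_counts(normalize=True)` would give shares comparable to those in the pie chart.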

Where on earth are you watching Disney’s content?

Among titles produced in India, approximately 86% are movies. Similarly, movies account for around 82% of Canadian (CA) titles and 72% of US titles, and about 68% of Australian (AU) titles.

For South Korea (KR) and Japan (JP), however, the proportion of movies is lower, at around 40% of titles. This suggests that the Disney+ catalog from these countries leans more heavily on shows.

country_order = showtime['production_country'].value_counts()[:11].index
data = showtime[['type', 'production_country']].groupby('production_country')['type'].value_counts().unstack().loc[country_order]
data['sum'] = data.sum(axis=1)
data_ratio = (data.T / data['sum']).T[['MOVIE', 'SHOW']].sort_values(by='MOVIE',ascending=False)[::-1]
# Name the index before resetting it so the column name does not depend on the pandas version
data_ratio = data_ratio.rename_axis('country_code').reset_index()
data_ratio = data_ratio[data_ratio['country_code'] != 'No Data']
data_ratio['MOVIE'] = round(data_ratio['MOVIE'], 2)
data_ratio['SHOW'] = round(data_ratio['SHOW'],2)

fig = go.Figure()

fig.add_trace(go.Bar(
y=data_ratio.country_code,
x=data_ratio['MOVIE'],
name='MOVIE',
orientation='h',
marker=dict(color='#01147C'),
text=(data_ratio['MOVIE'] * 100).astype(str) + '%', # Add text as percentages
textposition='inside', # Set text position inside the bars
textfont=dict(color='white') # Set text color
))

fig.add_trace(go.Bar(
y=data_ratio.country_code,
x=data_ratio['SHOW'],
name='SHOW',
orientation='h',
marker=dict(color='#06B2FF'),
text=(data_ratio['SHOW'] * 100).astype(str) + '%', # Add text as percentages
textposition='inside', # Set text position inside the bars
textfont=dict(color='white') # Set text color
))

# Set the layout
fig.update_layout(
title='Content Distribution by Country on Disney+',
barmode='stack',
yaxis_title='Top 10 Countries',
xaxis=dict(showticklabels=False), title_x = 0.5 # Hide the x-axis tick labels
)

fig.show()
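
The movie and show ratios behind this chart can also be obtained in one step with `pd.crosstab`. The following is a sketch of that alternative, not the code used for the chart above; the toy rows are invented for illustration.

```python
import pandas as pd

# Toy stand-in for `showtime` (illustrative rows only).
toy = pd.DataFrame({
    "production_country": ["US", "US", "US", "US", "KR", "KR", "IN", "IN"],
    "type": ["MOVIE", "MOVIE", "MOVIE", "SHOW", "MOVIE", "SHOW", "MOVIE", "MOVIE"],
})

# normalize="index" makes each country's row sum to 1.
ratio = pd.crosstab(
    toy["production_country"], toy["type"], normalize="index"
).round(2)
ratio = ratio.sort_values("MOVIE", ascending=False)
print(ratio)
```

A country with no shows gets a 0 rather than a missing value, which avoids the `NaN` entries that `unstack()` can produce.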

Documentaries! Damn Disney+!

On Disney+, documentaries are the largest category of content on the platform. With its vast collection of informative and educational documentaries, Disney+ offers viewers a wide range of captivating non-fiction content to explore.

Following documentaries, comedy content holds a prominent position on Disney+. Another genre that Disney+ excels in is animation. With its rich history and expertise in animation, Disney has produced numerous beloved animated movies and series that have become a hallmark of the brand.

genre_distribution = pd.DataFrame(showtime.groupby('primary_genre')['type'].value_counts())
genre_distribution = genre_distribution.unstack().reset_index().fillna(0).drop(0)
# numeric_only=True skips the genre label column when summing across each row
genre_distribution['SUM'] = genre_distribution.sum(axis = 1, numeric_only = True)
genre_distribution.columns = ['primary_genre', 'MOVIE', 'SHOW', 'total']
genre_distribution = genre_distribution.sort_values('total', ascending = False)
fig1 = go.Figure()
fig1.add_trace(go.Bar(
x=genre_distribution['primary_genre'],
y=genre_distribution['MOVIE'],
name='MOVIE',
marker=dict(color='#01147C'),
))

fig1.add_trace(go.Bar(
x=genre_distribution['primary_genre'],
y=genre_distribution['SHOW'],
name='SHOW',
marker=dict(color='#06B2FF'),
))

fig1.update_layout(
title='Content Distribution by Genre on Disney+',
xaxis_title='Genre',
yaxis_title='Content on Disney+',
barmode='stack'
)
fig1.show()

Disney focussed on children… What was your favorite Disney show/movie growing up?

Disney+ is renowned for being a streaming platform that specially caters to children and families. One of the notable characteristics of Disney+ is its emphasis on providing a safe and family-friendly viewing experience. As a result, the platform boasts the highest proportion of G and PG rated content among all streaming platforms.

rating_distribution = pd.DataFrame(showtime.groupby('age_certification')['type'].value_counts())
rating_distribution = rating_distribution.unstack().reset_index().fillna(0)
# numeric_only=True skips the certification label column when summing across each row
rating_distribution['SUM'] = rating_distribution.sum(axis = 1, numeric_only = True)
rating_distribution.columns = ['age_certification','MOVIE','SHOW','Total']
rating_distribution = rating_distribution.sort_values('Total', ascending = False)
rating_distribution = rating_distribution[rating_distribution['age_certification']!='0']
fig2 = go.Figure()

fig2.add_trace(go.Bar(
x=rating_distribution['age_certification'],
y=rating_distribution['MOVIE'],
name='MOVIE',
marker=dict(color='#01147C'),

))

fig2.add_trace(go.Bar(
x=rating_distribution['age_certification'],
y=rating_distribution['SHOW'],
name='SHOW',
marker=dict(color='#06B2FF'),

))

fig2.update_layout(
title='Content Distribution by Age Rating Certification on Disney+',
xaxis_title='Age Certification',
yaxis_title='Content on Disney+',
barmode='stack', legend_title = 'Type of Content', title_x = 0.5
)

fig2.show()

How old is the content on Disney+?

Disney+ offers a rich collection of content that spans across different time periods, including a wide range of titles from the 2000s. The platform allows subscribers to access beloved classics, recent releases, and exclusive content from the Disney, Pixar, Marvel, Star Wars, and National Geographic brands.

In recent years, Disney+ has significantly increased its online streaming library, providing subscribers with an ever-expanding selection of movies and shows to enjoy. The year 2020, in particular, witnessed a notable surge in the number of movies available on the platform.

history = pd.DataFrame(showtime.groupby('release_year')['type'].value_counts())
history = history.unstack().reset_index().fillna(0)
# history['total'] = history.sum(axis = 1)
history.columns = ['release_year','MOVIE','SHOW']
history = history[(history['release_year'] >= 2000) & (history['release_year'] <= 2021)]
fig3 = go.Figure()
fig3.add_trace(go.Scatter(
x=history['release_year'],
y=history['MOVIE'],
mode='lines',
name='MOVIE',
fill='tozeroy',
line=dict(color='#01147C')
))

fig3.add_trace(go.Scatter(
x=history['release_year'],
y=history['SHOW'],
mode='lines',
name='SHOW',
fill='tozeroy',
line=dict(color='#06B2FF')
))

# Set the layout
fig3.update_layout(
title='Age of Content on Disney+',
xaxis_title='Release Year',
yaxis_title='Content on Disney+', showlegend = True, title_x =0.5
)
fig3.show()

Kids are Disney’s favorite

Disney+ has positioned itself as a streaming platform with a primary focus on catering to kids worldwide. With its vast collection of family-friendly content, Disney+ offers a safe and entertaining environment for children to explore and enjoy a wide variety of shows and movies.

demographic_data['target_ages'] = demographic_data['age_certification'].map(ratings_ages)
demographic_data = demographic_data.dropna()
demographic_data = demographic_data.groupby(['production_country', 'target_ages']).size().reset_index(name='count').sort_values('count', ascending = False)[:20]

total_count = demographic_data['count'].sum()
demographic_data['percentage'] = (demographic_data['count'] / total_count) * 100

fig = px.treemap(demographic_data, path=['production_country', 'target_ages'], values='percentage',
color='target_ages', color_discrete_sequence= ['#01147C','#0000FD', '#06B2FF','#b0ffff'])

fig.update_layout(title= "Disney+'s Country-Level Target Audience",
margin=dict(l=20, r=20, t=40, b=20), title_x = 0.5) # Adjust the margins as needed

fig.show()
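
The snippet above relies on two names that are not defined in this article: `demographic_data` and `ratings_ages`. A plausible setup is sketched below, assuming `demographic_data` starts as a copy of the titles frame and `ratings_ages` is a dictionary from age certification to audience group. The group names and the exact mapping are hypothetical, not taken from the project code.

```python
import pandas as pd

# Hypothetical mapping: the article's real `ratings_ages` is not shown.
ratings_ages = {
    "G": "Kids", "TV-Y": "Kids", "TV-G": "Kids",
    "PG": "Older Kids", "TV-Y7": "Older Kids", "TV-PG": "Older Kids",
    "PG-13": "Teens", "TV-14": "Teens",
    "R": "Adults", "TV-MA": "Adults",
}

# Toy stand-in for `showtime` (illustrative rows only).
toy = pd.DataFrame({
    "production_country": ["US", "US", "US", "GB"],
    "age_certification": ["G", "PG", "TV-14", None],
})

# Assumed starting point: a copy of the titles frame.
demographic_data = toy.copy()
demographic_data["target_ages"] = demographic_data["age_certification"].map(ratings_ages)

# Titles without a certification map to NaN and are dropped here.
demographic_data = demographic_data.dropna()
print(demographic_data)
```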

It’s a tough fight between documentaries and comedy on Disney+!

Among US-produced titles on Disney+, comedy and documentary are the dominant genres, and the two run close to each other in title count.

demo_genre = showtime.groupby(['production_country', 'primary_genre']).size().reset_index(name='count').sort_values('count', ascending = False)[:20]
demo_genre = demo_genre[demo_genre['production_country']!='No Data']
fig = px.treemap(demo_genre, path=['production_country', 'primary_genre'], values='count',
color_discrete_sequence= ['#01147C','#0000FD', '#06B2FF','#b0ffff'])

fig.update_layout(title='Disney+ Content by Genre and Country',
margin=dict(l=20, r=20, t=40, b=20), title_x = 0.5) # Adjust the margins as needed

fig.show()

Relationship between IMDb and TMDb scores on Disney+

Despite having a larger collection of movies than shows, it is interesting to note that shows on Disney+ tend to receive higher ratings compared to movies. This observation suggests that Disney+ has been successful in curating and producing high-quality shows that resonate with viewers and captivate their interest.

fig = px.scatter(showtime, x='imdb_score', y='tmdb_score', color='type',
color_discrete_map={'MOVIE': '#01147C', 'SHOW': '#b0ffff'},
hover_data=['title'])

fig.update_layout(title='IMDb Score vs TMDB Score',
xaxis_title='IMDb Score',
yaxis_title='TMDB Score',
legend_title='Type', title_x =0.5)

fig.show()
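
The claim that shows score higher than movies is easier to verify with numbers than from a scatter plot. A minimal sketch is shown below, using invented scores in place of the real `imdb_score` and `tmdb_score` columns.

```python
import pandas as pd

# Toy stand-in for `showtime` (scores are invented for illustration).
toy = pd.DataFrame({
    "type": ["MOVIE", "MOVIE", "SHOW", "SHOW"],
    "imdb_score": [6.0, 7.0, 7.5, 8.5],
    "tmdb_score": [6.2, 6.8, 7.9, 8.1],
})

# Average score per content type on each site.
mean_scores = toy.groupby("type")[["imdb_score", "tmdb_score"]].mean()

# Pearson correlation between the two scoring systems.
score_corr = toy["imdb_score"].corr(toy["tmdb_score"])

print(mean_scores)
print(f"IMDb vs TMDB correlation: {score_corr:.2f}")
```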

Sentiment of Content on Disney+

Using the RoBERTa model from Hugging Face to categorize titles as Positive, Neutral, or Negative based on their descriptions

A RoBERTa model from Hugging Face gauges the sentiment of each title from its description. As a transformer model, RoBERTa takes context and language patterns into account and tends to handle sarcastic sentences better than VADER, which scores sentiment from a fixed lexicon and rules.

The chart shows the sentiment class that the RoBERTa model assigns to each title’s description. This helps us understand what type of content Disney+ offers.

53% of content on Disney+ is neutral, 11% is negative and 35% is positive

On Disney+, a significant portion of the content, approximately 35%, is classified as positive, which is the highest percentage among streaming platforms. This suggests that Disney+ is dedicated to providing a wide array of uplifting and feel-good content that resonates with its audience.

showtime['sentiment'] = showtime.apply(lambda row: 'Negative' if row['roberta_neg'] > 0.5 else ('Neutral' if row['roberta_neu'] > 0.5 else 'Positive'), axis=1)
sentiment_counts = showtime['sentiment'].value_counts()

# Create the donut chart trace
fig4 = go.Figure(data=[go.Pie(
labels=sentiment_counts.index,
values=sentiment_counts.values,
hole=0.5, # Set the hole parameter to create a donut chart
marker=dict(colors=['#01147C','#0000FD', '#06B2FF','#b0ffff']), # Set custom colors for the slices
textinfo='label+percent', # Display labels and percentages
textposition='inside', # Set the position of the labels inside the slice
)])

# Set the layout
fig4.update_layout(
title='Sentiment Distribution of Content on Disney+',
showlegend=True, title_x = 0.5,
# Add annotations in the center of the donut pies.
annotations=[dict(text='Disney+', x=0.50, y=0.5, font_size=15, showarrow=False)]
)
fig4.show()
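
One detail of the labeling rule above is worth noting: a title is labeled Positive whenever neither the negative nor the neutral score exceeds 0.5, even if the positive score is not the largest of the three. An alternative is to pick the class with the highest score. The sketch below compares the two rules on invented scores; the `roberta_pos` column name is an assumption, since only `roberta_neg` and `roberta_neu` appear in the article.

```python
import pandas as pd

# Toy scores (invented); `roberta_pos` is an assumed column name.
toy = pd.DataFrame({
    "roberta_neg": [0.70, 0.45, 0.10],
    "roberta_neu": [0.20, 0.40, 0.30],
    "roberta_pos": [0.10, 0.15, 0.60],
})

labels = {
    "roberta_neg": "Negative",
    "roberta_neu": "Neutral",
    "roberta_pos": "Positive",
}

# Alternative rule: take the class with the highest score.
toy["sentiment_argmax"] = toy[list(labels)].idxmax(axis=1).map(labels)

# The article's threshold rule, reproduced for comparison.
toy["sentiment_threshold"] = toy.apply(
    lambda row: "Negative" if row["roberta_neg"] > 0.5
    else ("Neutral" if row["roberta_neu"] > 0.5 else "Positive"),
    axis=1,
)
print(toy[["sentiment_argmax", "sentiment_threshold"]])
```

The second row shows where the two rules disagree: no score exceeds 0.5, so the threshold rule falls through to Positive, while the highest score is the negative one.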

There is no relationship between the sentiment of the content and its IMDb and TMDb scores

The lack of a relationship between content sentiment and IMDb/TMDb scores suggests that the audience’s perception of a piece of content goes beyond its sentiment alone. Numerous factors, such as the quality of acting, storytelling, production value, and personal preferences, can influence viewers’ ratings and reviews.
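
One simple way to probe this claim is to compare the average scores within each sentiment class: if the means are close, sentiment tells us little about ratings. The sketch below is an illustration on invented numbers, chosen so that the means coincide; it is not the analysis behind the statement above.

```python
import pandas as pd

# Toy stand-in for `showtime` (scores are invented for illustration).
toy = pd.DataFrame({
    "sentiment": ["Positive", "Positive", "Neutral", "Neutral", "Negative", "Negative"],
    "imdb_score": [6.0, 8.0, 6.5, 7.5, 5.0, 9.0],
    "tmdb_score": [6.5, 7.5, 6.0, 8.0, 6.0, 8.0],
})

# Mean score per sentiment class on each site.
by_sentiment = toy.groupby("sentiment")[["imdb_score", "tmdb_score"]].mean()
print(by_sentiment)
```

On the real data, a box plot per sentiment class would show the spread as well as the mean.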

On Disney+, documentaries take the lead as the most abundant genre, followed closely by comedy and animation. These genres offer a diverse range of content that caters to various interests and age groups.

While the neutral sentiment dominates, there is also a significant presence of positive sentiment in these genres. This indicates that Disney+ offers uplifting and enjoyable documentaries, comedic series, and animated shows that resonate with viewers and leave them with a positive experience.

Additionally, though to a lesser extent, there is some content within these genres that elicits a negative sentiment. This could include documentaries that explore challenging topics or comedy shows that employ dark humor.

genre_sentiment = showtime.groupby(['primary_genre', 'sentiment']).size().reset_index(name='count')
genre_sentiment = genre_sentiment[genre_sentiment['primary_genre']!= 'No Data']
genre_sentiment = genre_sentiment.sort_values('count', ascending = False)

colors = ['#01147C','#0000FD', '#06B2FF','#b0ffff','#ffffff']

fig = px.sunburst(genre_sentiment, path=['primary_genre', 'sentiment'], values='count',
color_discrete_sequence=colors)

fig.update_layout(title='Genre vs Sentiments on Disney+', title_x = 0.5)

fig.show()
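
The sunburst shows raw counts, so large genres dominate visually. To compare the sentiment mix across genres regardless of size, the counts can be turned into row percentages. A minimal sketch on invented rows follows; the genre names are placeholders.

```python
import pandas as pd

# Toy stand-in for `showtime` (illustrative rows only).
toy = pd.DataFrame({
    "primary_genre": ["documentary"] * 4 + ["comedy"] * 2,
    "sentiment": ["Neutral", "Neutral", "Positive", "Negative", "Positive", "Neutral"],
})

# Percentage of each sentiment class within a genre (rows sum to 100).
mix = (
    pd.crosstab(toy["primary_genre"], toy["sentiment"], normalize="index")
    .mul(100)
    .round(1)
)
print(mix)
```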

On Disney+, the highest proportion of content is categorized as G (General Audience) rated, followed by PG (Parental Guidance) and TV-PG (Parental Guidance suggested).

When examining the sentiment distribution within these rating categories, it is observed that the majority of the content falls into the neutral sentiment category. This suggests that Disney+ strives to provide content that maintains a balanced and unbiased tone, appealing to a broad audience without leaning heavily towards any particular emotional tone.

Following the neutral sentiment, there is a significant presence of positive sentiment in these rating categories. This indicates that Disney+ offers uplifting and enjoyable content that leaves viewers with a positive experience.

Any guesses for three words that are most common in Disney titles?

The inclusion of “Marvel” in the word cloud signifies the extensive collection of Marvel superhero movies and TV shows available on Disney+. From iconic characters like Iron Man, Spider-Man, and the Avengers to the Guardians of the Galaxy and X-Men, Disney+ offers a treasure trove of Marvel content for fans to enjoy.

Similarly, the prominence of “Star Wars” in the word cloud highlights the substantial catalog of Star Wars movies, TV series, and animated shows available on Disney+. Fans can delve into the epic space opera saga, exploring the adventures of characters like Luke Skywalker, Darth Vader, and Rey, among many others.

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud, STOPWORDS

# Join all titles into a single corpus for the word cloud
showtime['title'] = showtime['title'].astype(str)
title_corpus = ' '.join(showtime['title'])
stopwords = set(STOPWORDS)

# Define a function to specify the text color
def disney_color(word, font_size, position, orientation, random_state=None, **kwargs):
    return "#ffffff"

custom_mask = np.array(Image.open('logo2.jpg.webp'))
wc = WordCloud(
stopwords = stopwords,
mask = custom_mask,height = 2000, width = 4000, color_func = disney_color)
#background_color = 'white',
wc.generate(title_corpus)

plt.figure(figsize=(16,8))
plt.imshow(wc, interpolation = 'bilinear')
plt.axis('off')
plt.show()
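
A word cloud is hard to read precisely, so a plain frequency count is a useful cross-check. The sketch below counts words in a handful of made-up titles with a tiny illustrative stopword list; on the real data, `showtime['title']` and the `STOPWORDS` set would take their place.

```python
import re
from collections import Counter

# Made-up titles standing in for `showtime['title']`.
titles = [
    "Marvel Studios: Legends",
    "Star Wars: The Clone Wars",
    "Marvel's Hero Project",
    "Star Wars Rebels",
]

# Tiny illustrative stopword list (the article uses wordcloud's STOPWORDS).
stopwords = {"the", "s"}

# Lowercase, split into alphabetic words, drop stopwords, and count.
words = re.findall(r"[a-z]+", " ".join(titles).lower())
top_words = Counter(w for w in words if w not in stopwords).most_common(3)
print(top_words)
```

Note that `WordCloud` groups frequent word pairs by default (its `collocations` option), so "Star Wars" can appear there as a single phrase, whereas this count treats the two words separately.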

Disney+ stands out for its emphasis on high-quality original programming, featuring exclusive content that cannot be found elsewhere. Additionally, Disney+’s dedication to providing a safe and inclusive environment for viewers of all ages is reflected in its predominantly G and PG rated content, ensuring that families can enjoy the platform together.

Furthermore, the sentiment analysis reveals that Disney+ content leans towards a neutral and positive tone, promoting uplifting and enjoyable experiences for viewers. The platform’s commitment to storytelling, memorable characters, and immersive universes has made it a hub for Marvel and Star Wars enthusiasts, as indicated by their prominent presence in the word cloud of content titles.

The project code is available on my GitHub: https://github.com/shrunalisalian/Movie-Recommendation-System

Disney Dataset on Kaggle: https://www.kaggle.com/datasets/dgoenrique/disney-movies-and-tv-shows

Feel free to let me know if you have any suggestions. Thank you for reading!
