Analyzing user reviews of leading U.S. streaming services

TechLabs
Published in Inside.TechLabs · 7 min read · Nov 28, 2020

This project was carried out as part of the TechLabs “Digital Shaper Program” in cooperation with the Marketing Center Münster (Term 2020/01).

Application (app) marketplaces contain a large amount of feedback in the form of star ratings and user reviews. In text reviews, users can share their opinion about apps, for example by commenting on specific app attributes or demanding new features. Nevertheless, these insights remain largely unused because unstructured consumer feedback is challenging to analyze. Besides the sheer number of reviews, users are not restricted to preselected items or rating scales as in a survey; they are completely free in what they express. Moreover, understanding what the words mean is difficult, since the same word can take on a different meaning when paired with another word. Simply summarizing word counts is therefore not sufficient: an analysis has to relate each word to the other words it occurs with, rather than assume that words are chosen independently of each other.

In our project, we applied a natural language processing approach to systematically analyze the content of user reviews with the help of the programming language R. Specifically, we combined LDA topic modeling with a sentiment analysis. This approach helped us gain insights into the requirements, complaints, and sentiments of users towards certain apps and their features.

With our analysis, we aim to

i) identify the topics and attributes that are discussed most among users of streaming platforms and

ii) analyze how individual streaming platforms perform compared to their competition regarding their product attributes, based on their user reviews.

The findings of these analyses could serve as a resource for managers of those streaming services: they help determine the potential strengths and weaknesses of a product, identify user needs, and compare the respective service with the competition.

The dataset

We decided to acquire the data ourselves from the product pages of the respective streaming platforms in the US Google Play store. We used the Python package google-play-scraper to scrape the reviews. It offers an API to retrieve all information on user reviews in the Google Play store through its reviews function. For each streaming provider, we acquired the 1,000 most recent as well as the 1,000 best-rated reviews. Our final dataset covers the five major streaming companies “Netflix”, “Amazon Prime”, “Disney PLUS”, “HBO now”, and “HULU”. After removing duplicates, it comprised 17,419 individual user reviews and star ratings.
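To illustrate the scraping step, the following is a minimal sketch of how the reviews function can be called. Since the rest of our analysis runs in R, the sketch accesses the Python package through reticulate; the app ID and the field handling are illustrative assumptions rather than the exact script used in the project.

```r
library(reticulate)

# google-play-scraper is a Python package; reticulate lets us call it from R
gps <- import("google_play_scraper")

# Hypothetical example: the Play store ID of the Netflix app; every
# provider has its own ID, visible in the URL of its product page
res <- gps$reviews(
  "com.netflix.mediaclient",
  lang    = "en",
  country = "us",
  sort    = gps$Sort$NEWEST,  # most recent reviews; a second call with a
  count   = 1000L             # different sort order collects the best-rated ones
)

# reviews() returns a tuple: a list of review dicts plus a continuation token
texts   <- vapply(res[[1]], function(r) r$content, character(1))
ratings <- vapply(res[[1]], function(r) r$score, numeric(1))

netflix_reviews <- data.frame(app = "Netflix", review = texts, rating = ratings)
```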

LDA topic modeling

After cleaning the data, we performed the core method of our analysis, LDA topic modeling. The LDA algorithm assumes that a defined number of latent topics appears across the documents of a text corpus. Each document in that corpus consists of a mixture of topics, and each topic is a discrete probability distribution over words. This means that the probability of a word occurring in a document depends on the presence of a latent topic in that document. As a result, an LDA model groups words that have a high probability of representing the same topic, making the co-occurrence of words in a document an indication of a latent topic's presence.
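In standard LDA notation (our symbols, added here for clarity), this assumption can be written as a mixture: the probability of observing word $w$ in document $d$ marginalizes over the $K$ latent topics,

$$p(w \mid d) = \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d),$$

where $p(w \mid z = k)$ is the word distribution of topic $k$ and $p(z = k \mid d)$ is the topic mixture of document $d$.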

To perform the LDA in R, we use the FitLdaModel function from the textmineR package.

As input for the function, we need to create a document term matrix, which records how often each term appears in each document (in our case, each review). Our terms are bigrams, meaning we look at the frequency of two words appearing next to each other in the reviews.
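For readers who want to follow along, here is a minimal sketch of this step with textmineR, assuming the cleaned reviews sit in a data frame reviews_df with columns id and review (our names, not those of the original code):

```r
library(textmineR)

# Bigram document term matrix: ngram_window = c(2, 2) counts pairs of
# adjacent words instead of single terms
dtm <- CreateDtm(
  doc_vec            = reviews_df$review,
  doc_names          = reviews_df$id,
  ngram_window       = c(2, 2),
  stopword_vec       = stopwords::stopwords("en"),
  lower              = TRUE,
  remove_punctuation = TRUE,
  remove_numbers     = TRUE
)
```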

Furthermore, we need to define how many topics the model should create. To make an objective decision on the number of topics, we calculated the coherence score for every number of topics k between 1 and 25 (a sketch of this selection step follows below the figure). The coherence score assesses the quality of the learned topics and therefore indicates the goodness of a model. As a model with 15 topics provided a high coherence score as well as interpretable results, we opted for that model. A summary of all topics resulting from the LDA algorithm is presented in the following figure:

Summary of Retrieved Topics
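As referenced above, this is a sketch of how the model selection can be done with textmineR; the iteration and burn-in counts are our guesses, not necessarily the values used in the project:

```r
# Fit one model per candidate k and record the mean probabilistic
# coherence that FitLdaModel computes for each topic
k_candidates <- 1:25
coherence_by_k <- sapply(k_candidates, function(k) {
  m <- FitLdaModel(dtm = dtm, k = k, iterations = 200, burnin = 100,
                   calc_coherence = TRUE)
  mean(m$coherence)
})
plot(k_candidates, coherence_by_k, type = "b",
     xlab = "Number of topics k", ylab = "Mean coherence")

# Final model with the chosen k = 15
lda_model <- FitLdaModel(dtm = dtm, k = 15, iterations = 500,
                         burnin = 200, calc_coherence = TRUE)

# Inspect the ten highest-probability bigrams per topic
GetTopTerms(phi = lda_model$phi, M = 10)
```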

Evaluation of topics

Having extracted reasonably distinct topics, we aim to evaluate them, also with respect to each individual app.

In the next step, we conduct a sentiment analysis to capture the sentiment users express in their reviews.

With the help of the SentimentAnalysis package in R, we assign each review a sentiment score between -1 (most negative) and 1 (most positive). As a second metric to evaluate the topics, we used the star ratings (between 1 and 5) provided with each review. Since sentiment and rating are only available per review, we needed to link individual reviews to the topics. We did so via the topic probabilities of the LDA model, which indicate how likely it is that a review belongs to each topic. We assigned a review to a topic whenever this probability exceeded 0.44. Because we also know which app each review was written for, we could furthermore determine how many reviews of a topic belong to which streaming provider.
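A minimal sketch of both steps; which of the package's sentiment dictionaries we report here (QDAP) and the variable names are our assumptions:

```r
library(SentimentAnalysis)

# Sentiment score per review, bounded between -1 and 1
sent <- analyzeSentiment(reviews_df$review)
reviews_df$sentiment <- sent$SentimentQDAP  # one of several dictionary-based scores

# theta: documents x topics matrix of topic probabilities from the LDA
theta <- lda_model$theta

# Link a review to every topic it matches with probability > 0.44
hits <- which(theta > 0.44, arr.ind = TRUE)
assignments <- data.frame(
  doc   = rownames(theta)[hits[, "row"]],
  topic = colnames(theta)[hits[, "col"]]
)
```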

Based on these three variables, we were able to identify differences between topics and across streaming apps. With the help of statistical tests, we further analyzed whether those differences are significant. To compare sentiment across topics and apps, we conducted ANOVAs; for the comparison of the ratings, we applied the more robust Kruskal-Wallis test with post-hoc pairwise Wilcoxon tests, since the ratings are not normally distributed.
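In R, these tests are essentially one-liners. A sketch, assuming a long-format data frame topic_df with one row per review-topic assignment and (hypothetical) columns sentiment, rating, topic, and app, the latter two coded as factors:

```r
# ANOVA: does mean sentiment differ across topics?
summary(aov(sentiment ~ topic, data = topic_df))

# Kruskal-Wallis: rank-based comparison of the star ratings across apps
kruskal.test(rating ~ app, data = topic_df)

# Post-hoc pairwise Wilcoxon tests with a multiple-testing correction
pairwise.wilcox.test(topic_df$rating, topic_df$app, p.adjust.method = "holm")
```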

Summary of the results

To stay within the limits and scope of this post, we present the results of our analysis only in aggregated form, together with the managerial implications of those findings.

The most discussed topics among users cover a variety of subjects, ranging from specific problems with the streaming providers’ apps to content, streaming quality, user interface, payment, and app functionality.

Comparing the streaming providers, it stands out that HBO customers are dissatisfied due to login and account issues following the migration from HBO now to HBO max. In addition, the new platform did not work on every device. An implication for HBO would be to make the newly branded platform available on the same devices as before as soon as possible, or to improve its communication of changes and limitations regarding the new app in advance.

This leads to the next issue for HBO, which relates to the customer service experience. Apparently, users sometimes feel left alone when they face problems with the app. From HBO’s perspective, it is strongly recommended to focus on customer support to avoid unnecessary dissatisfaction and negative reviews, which might also deter potential new customers.

Moreover, HBO as well as Netflix seem to have problems with the apps themselves, since users report that the apps crash or do not work on their devices. As a short-term measure, these providers should offer better customer service or instructions on how to fix such issues. In the long run, these bugs should be fixed by developers before releasing a new update.

Among the other problems and errors discussed, users of all providers often mention issues with video and audio playback. Even though no single company stands out, streaming providers should monitor these types of issues regularly.

Concerning content topics, all providers achieve acceptable results. There is still room for improvement in the content offering, but a clear recommendation for specific titles is not possible, as content preferences are highly individual. However, since HBO scored significantly worse than the majority of its competitors in terms of its specific content offering, it is most worthwhile for this provider to analyze customer wishes more closely. Amazon Prime’s greatest potential for improvement lies in the content search, a topic in which the provider appeared unexpectedly often and in which one third of its users expressed dissatisfaction.

With regard to the average rating in the episode-navigation topic, all of the companies should revise their user interfaces to make it easier to find individual episodes and switch between them.

HULU is the only provider among the competitors offering ad-supported subscription plans. Even though ad-free plans are offered as well, users complain about the advertising. Management might therefore consider rearranging HULU’s subscription packages. Another solution is to focus on improved communication regarding the different subscription plans and their individual benefits and conditions.

Overall, it can be observed that the largest streaming providers in the U.S. market (Netflix, Amazon Prime, Disney PLUS) are rated better than the services of HBO and HULU. Given the topic and evaluation outcomes, the latter two providers might use these insights to improve their offerings and fix issues faster than the competition to increase user satisfaction.

Conclusion

With the identification and evaluation of latent topics, we were able to identify the strengths and weaknesses of five leading streaming providers as indicated by the reviews of Google Play users. This project was conducted within the course “Data Science” offered by the Marketing Center Münster in cooperation with TechLabs. Besides working on questions related to streaming platforms, we can conclude that learning and applying the programming language R was challenging, yet a lot of fun. We would like to thank the whole TechLabs team for enabling this experience and for their guidance throughout the project, as well as the faculty of the Marketing Center Münster and especially Lisa Richter for her great support during the whole track.

The Team:

Daniel Dirksen (LinkedIn)

Philipp Gilgen (LinkedIn)

Nele Kollenberg (LinkedIn)

Marcel Kraft (LinkedIn)

For further questions feel free to get in touch with us!
