Image by Damian Zaleski (unsplash.com).

So you created the best app ever?

Here is what your users think about it!

Our workshop paper “On the Emotion of Users in App Reviews” was recently accepted for the SEmotion ’17 workshop (Second International Workshop on Emotion Awareness in Software Engineering) at the ICSE conference in Buenos Aires, Argentina. In this post I would like to report some key findings presented in the paper.

You can read the preprint of the full paper here. Among other works, the paper builds on “User Feedback in the AppStore: An Empirical Study” by Pagano & Maalej. A survey of app store analysis can be found here.

Hotels, cars, insurance policies, and apps all have one thing in common: they are subject to the ratings and reviews of their users. Who would have thought that an app such as Instagram could attract reviews from more than 1.3 million individual people? Taking these reviews into account can have a huge impact on a product’s success.

Recently, I was looking for a mail application. Searching for the keyword “mail”, the AppStore returned over 500 results. The AppStore has become a highly competitive marketplace that offers users several alternative apps per use case. I decided to try my top five choices by installing them. Unfortunately, each app lacked at least one feature I was looking for.

Mail apps in the AppStore.

Browsing the apps’ reviews in the AppStore, I realized that other users had requested those features as well, in some cases several months earlier. Even though app vendors publish updates on a regular basis, they seem to neglect features that users have long requested. As a result, users quit using those applications and browse the AppStore for alternatives.

This caught my attention, so I asked a colleague to co-author a workshop paper on the topic. We aimed to analyze how the emotion within app reviews evolves when app vendors, for example, ignore feature requests submitted by users. We therefore scraped the reviews of the top 5 free and top 5 paid apps in each of the 25 categories of the AppStore and provided them as input to our approach, which assigns each review a sentiment value between -5 (extremely negative) and +5 (extremely positive). In total, we processed more than 7 million app reviews.
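To make the -5 to +5 scale concrete, here is a minimal sketch of a lexicon-based scorer. It is not the approach used in the paper: the word list and weights below are hypothetical, there is no stemming or negation handling, and the summed score is simply clamped to the range [-5, +5].

```python
# Hypothetical word weights for illustration only; NOT the paper's lexicon.
LEXICON = {
    "love": 3, "great": 3, "excellent": 4, "good": 2, "useful": 2,
    "bad": -2, "hate": -4, "terrible": -4, "crash": -3, "missing": -2,
}


def sentiment_score(review: str) -> int:
    """Sum the lexicon weights of a review's words and clamp to [-5, +5]."""
    # Strip surrounding punctuation and lowercase each whitespace-separated token.
    words = (w.strip(".,!?;:\"'()").lower() for w in review.split())
    total = sum(LEXICON.get(w, 0) for w in words)
    # Clamp so that very long or emphatic reviews stay within the scale.
    return max(-5, min(5, total))
```

For example, `sentiment_score("Terrible update, AirPrint is missing.")` sums -4 and -2 and is clamped to -5. Production tools handle negation, intensifiers, and word forms, which this toy version ignores.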

Plotting the sentiment in app reviews over time, we found several recurring patterns. Some apps, for example, are discussed controversially in their reviews. Such discussions require at least two groups of users with contrary positions: while the first group is satisfied with the application, the second provides negative feedback, as in the reviews of the Bank of America app for iOS below.

Bank of America iOS app reviews.

Overall, we found four recurring patterns. First, the Consistent Emotion pattern, where the sentiment of users only slightly varies around a specific value; this value can be consistently negative, neutral, or positive. Second, the Inconsistent Emotion pattern, as depicted in the Bank of America app reviews above. Third, the Emotion Drop/Jump pattern, where the sentiment of users suddenly drops, for example, due to bugs introduced or features removed in app updates. Vice versa, an emotion jump can be caused by fixes or requested features implemented in app updates. Last, the Steady Decrease/Increase pattern, where the overall satisfaction of users slowly decreases or increases, possibly due to changes introduced in app updates.
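The four patterns could be told apart with simple statistics over time. The following is a hypothetical heuristic, not the classification used in the paper: reviews are grouped into periods (for example, weeks), a large step between consecutive period means signals a drop or jump, a large least-squares trend signals a steady change, and a wide spread of individual scores within periods signals inconsistent emotion. All three thresholds are illustrative assumptions.

```python
from statistics import fmean, pstdev


def classify_pattern(periods, jump=2.0, trend=2.0, spread=2.5):
    """Label a series of review-sentiment periods with one of four patterns.

    periods: list of non-empty lists; each inner list holds the sentiment
    scores (-5..+5) of the reviews submitted in one period (e.g. one week).
    The thresholds are illustrative assumptions, not values from the paper.
    """
    if len(periods) < 2 or any(len(p) == 0 for p in periods):
        raise ValueError("need at least two non-empty periods")
    means = [fmean(p) for p in periods]

    # Sudden change: largest step of the mean between consecutive periods.
    steps = [b - a for a, b in zip(means, means[1:])]
    biggest = max(steps, key=abs)
    if abs(biggest) >= jump:
        return "Emotion Drop" if biggest < 0 else "Emotion Jump"

    # Gradual change: least-squares slope, scaled to the length of the series.
    n = len(means)
    xs = range(n)
    x_bar, y_bar = fmean(xs), fmean(means)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, means)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    if abs(slope * (n - 1)) >= trend:
        return "Steady Increase" if slope > 0 else "Steady Decrease"

    # Stable mean: distinguish by how far individual reviews scatter.
    if fmean(pstdev(p) for p in periods) >= spread:
        return "Inconsistent Emotion"
    return "Consistent Emotion"
```

Checking sudden steps before trends matters: a single large drop would otherwise also register as a strong downward slope.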

Recurring patterns within app reviews (View on plot.ly).

Amongst others, we took a detailed look at the Gmail app for iOS.

Release notes of the redesigned Gmail app for iOS.

On November 7, 2016, a redesigned version of the app introducing Material Design was published on the AppStore. Besides the redesign, several features were added. Other features were removed or, due to the design changes, moved to different positions.

Gmail for iOS, before and after (Screenshot: macworld).

The screenshot on the left shows the old version of the Gmail app, while the screenshot on the right shows its redesigned version. At first glance, one can see colour changes within the application and the repositioning of the “write email” button from the top right to the bottom right. At second glance, one might notice that the number of unread emails next to the title has disappeared. Overall, these changes triggered a massive Emotion Drop within the reviews.
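One simple way to quantify such an Emotion Drop is to compare the mean sentiment of reviews in a window before a release with the mean in a window after it. The sketch below assumes reviews are available as `(date, sentiment)` pairs; the function name `release_shift`, the 14-day window, and the sample scores are hypothetical, and only the release date comes from the text above.

```python
from datetime import date, timedelta
from statistics import fmean


def release_shift(reviews, release_day, window_days=14):
    """Mean sentiment after a release minus mean sentiment before it.

    reviews: iterable of (date, sentiment) pairs. The window size is an
    illustrative assumption; a clearly negative result hints at a drop.
    """
    reviews = list(reviews)  # allow generators, which we iterate twice
    start = release_day - timedelta(days=window_days)
    end = release_day + timedelta(days=window_days)
    before = [s for d, s in reviews if start <= d < release_day]
    after = [s for d, s in reviews if release_day <= d < end]
    if not before or not after:
        raise ValueError("no reviews on one side of the release")
    return fmean(after) - fmean(before)


# Hypothetical sentiment scores around the November 7, 2016 release.
sample = [
    (date(2016, 11, 1), 3), (date(2016, 11, 5), 4),
    (date(2016, 11, 8), -2), (date(2016, 11, 10), -3),
]
print(release_shift(sample, date(2016, 11, 7)))  # -6.0
```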

While analyzing the sentiment, we focused especially on the periods shortly before and after app updates were released. We confirmed that the star rating and the sentiment of reviews are correlated. The figure below shows rating statistics for the Gmail app shortly before and after the introduction of the redesign. The y-axis shows the number of reviews per day, and the ratings from 1 to 5 stars are stacked in a bar chart (1 star = red, 5 stars = green, see legend).
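A correlation between star ratings and sentiment scores can be checked with a standard coefficient. The post does not say which coefficient was used, so the sketch below assumes Pearson's r over paired `(stars, sentiment)` values per review; because star ratings are ordinal, a rank correlation such as Spearman's would be a reasonable alternative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value close to +1 means reviews with more stars also tend to carry more positive sentiment.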

Rating statistic for Gmail app before and after introducing the redesign (from AppAnnie).

While the number of reviews per day is relatively low before the app update was released, it multiplies afterwards, reaching more than 600 reviews per day. In 2016, the Gmail app received 11,676 reviews in total, 15% of which were submitted within the relatively short timeframe depicted in the chart. The ratings turn from a positive average into mainly negative ratings, with an average of 1.8 stars.
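The average rating for a day (or a period) follows directly from the stacked bar counts as a weighted mean of the star values. The counts below are made up for illustration and are not taken from the chart; for scale, 15% of 11,676 reviews corresponds to roughly 1,750 reviews.

```python
def average_rating(counts):
    """Weighted mean star rating from a {stars: number_of_reviews} mapping."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reviews")
    return sum(stars * n for stars, n in counts.items()) / total


# Hypothetical counts for one day with 600 reviews, dominated by 1-star ratings.
day = {1: 400, 2: 60, 3: 40, 4: 30, 5: 70}
print(average_rating(day))  # 1.85
```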

Gmail app: Reviews after major app update.

In general, users complain about features that have been changed or removed in the app update. Examining the content of the reviews reveals clusters of feature and change requests. Several users, for example, request support for wireless printing (AirPrint). Some users resubmit their requests with every subsequent app update. This can cause not only an Emotion Drop but also an enduring, low Consistent Emotion.
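Tracking how often a request recurs across releases can be approximated with keyword matching. This is a deliberately crude sketch, not the paper's method: the topic names and keyword lists are hypothetical, matching is plain substring search (so "print" would also hit unrelated words such as "blueprint"), and reviews are assumed to be `(version, text)` pairs.

```python
from collections import Counter, defaultdict

# Hypothetical keyword groups per requested feature.
REQUEST_TOPICS = {
    "AirPrint": ("airprint", "print"),
    "Unread count": ("unread count", "unread number", "badge"),
}


def count_requests(reviews):
    """Count, per app version, the reviews mentioning each requested feature.

    reviews: iterable of (version, text) pairs. Each review counts at most
    once per topic, even if it matches several keywords.
    """
    counts = defaultdict(Counter)
    for version, text in reviews:
        lowered = text.lower()
        for topic, keywords in REQUEST_TOPICS.items():
            if any(k in lowered for k in keywords):
                counts[version][topic] += 1
    return counts
```

A topic whose count stays high across several versions points to a long-ignored request of the kind described above.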

Gmail app: Users requesting AirPrint feature.

Sentiment over time of the Gmail app.

After a few updates of the Gmail app, the developers reacted to the requests submitted by users. With each update, they added requested features, and users stopped complaining about those issues. This caused the pattern to change into a Steady Increase. Today, the average rating has recovered to 3.6 stars, slightly below its previous average of 3.7 stars.

What app vendors can learn from this research is that systematically analyzing historical data of similar apps, such as ratings and reviews, can yield guidelines on how app updates should be released and how many changes each release should introduce.

If you are interested in reading all of our findings, read the preprint of the full paper here (arXiv:1703.02256).

For updates on our research visit our research group website and follow @mrtnsd (Website) and @TimoJay (Website) on Twitter.