Online Reviews Under Shock
How do the reviews for a product change (in volume and in valence) when that product goes on sale? How about when that product launches a large new marketing campaign?
Here’s a simple example. Season 3 of Netflix’s hit show House of Cards premiered on February 27th, 2015. In the 5th episode, the main character (Frank Underwood) is prominently shown playing Monument Valley on his iPad. Now, Apple doesn’t release per-app sales numbers, so the best we can do is use the number of reviews to approximate an app’s sales (note that only users who purchased an app can write a review). Here’s what happened to the volume of reviews for Monument Valley in the first 5 days following the House of Cards season 3 release, compared to the 5 days immediately before.

From an average of 8.4 reviews per day (already very high compared to the typical app in the store), the volume jumped to an average of 46 per day, an increase of nearly 450%.
But here’s what happened to the ratings accompanying the reviews.

That’s a 12% decrease (from a 4.1 average to 3.6) in just 5 days. Something similar, on a much larger scale, has already been observed: work by researchers at Boston University and Harvard University has shown that establishments that offer a Groupon see their Yelp ratings suddenly decline.
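To make these numbers concrete, here is a minimal back-of-the-envelope sketch of the percent-change calculations. The before/after averages (8.4 vs. 46 reviews per day, 4.1 vs. 3.6 stars) are the figures quoted above; the helper function itself is purely illustrative.

```python
# Back-of-the-envelope check of the percent changes quoted above.
# Only the before/after averages come from the post; the rest is illustrative.

def pct_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`."""
    return (after - before) / before * 100

volume_change = pct_change(before=8.4, after=46)   # ~ +448% in daily review volume
rating_change = pct_change(before=4.1, after=3.6)  # ~ -12% in average star rating

print(f"volume: {volume_change:+.0f}%, rating: {rating_change:+.0f}%")
```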
What happened is that the population that uses the app, and hence the population that writes reviews, suddenly changed.
As part of my PhD work, I studied 4 large-scale promotions of iOS apps to understand how each affected the sales and ratings of the promoted apps; in other words, how each type of shock affected the product’s reviews. I’m going to briefly discuss two of them here: Apple’s weekly ‘Free App of the Week’ and Starbucks’ weekly ‘Pick of the Week’.
The biggest shock is caused by Apple’s ‘Free App of the Week’ promotion. Each week this promotion features one app and offers it for free to all iOS users, with a prominent banner on the front page of the AppStore. Hence users have little incentive not to download the app. This is evident in the fact that during the week of the promotion the number of reviews increased by a staggering 3700% on average. Let’s think about this new population for a second and the benefits and risks it may carry. On one hand, making it so easy for a user to get a product (literally putting it one tap away) can attract users outside the app’s target demographic: imagine giving away ‘Angry Birds’ to a user who doesn’t particularly like casual games. This inherently carries the risk of lower ratings from the new population. On the other hand, users who are exposed to the promotion banner are already browsing the AppStore and hence are most likely on the lookout for a new app. Seeing that the app is endorsed by Apple doesn’t hurt either. Indeed, we see this positive effect in the ratings of the promoted apps as well, with the average rating increasing by 13%.
The second promotion I study is the same in spirit as the above but different in distribution. It’s run by the coffee chain Starbucks and is a weekly offering of a featured app for free, exactly like Apple’s ‘Free App of the Week’; it even has a similar name, ‘Pick of the Week’, and essentially the same selection process for the featured apps (according to one of the people running the promotion for Starbucks). The crucial difference is that Starbucks distributes a printed coupon in its physical stores, which users must pick up and redeem through an easy but not-as-trivial-as-one-tap procedure. This is reflected in the effects of the shock on the reviews: there is a large increase in the volume of reviews, by 700%, but virtually no change in the ratings. Why is that? One hypothesis is filtering. Since redemption is not trivial, which users are most likely to go through with it? Users who are already positively predisposed towards the app. And since the featured apps were not free to begin with, there was filtering before the promotion as well (who is more likely to pay $1, $2, or $3 for a specific app? Users who are positively predisposed to it), hence the high and stable ratings before and after the promotion.
Here are the plots for the volume and ratings for each of the two promotions.


Another problem I’ve been working on for my PhD thesis involves understanding how a review platform can solicit and provide reviews that are as credible and representative as possible. Through my involvement with the Spiegel Research Center and its partnership with PowerReviews (a company that provides review platforms for online retailers), I got access to a unique dataset: the entire review history of 4 large online retailers, which between them sell a wide variety of electronics, home, kitchen, jewelry, fashion, health, and personal care products. A fascinating characteristic of the platform PowerReviews provides is that it allows users to submit self-motivated reviews (i.e., much like on Amazon and Yelp, a user can go to the retailer’s website and complete all the steps needed to submit a review for a product) and it also sends email prompts asking purchasers to review the products they bought (much like Airbnb and Uber do; of course, in Uber’s case it’s not an email but a prompt within the app, but it’s the same idea). This creates a natural classification of the reviews: self-motivated reviews (which we will also refer to as web reviews) and prompted reviews (which we will also refer to as email reviews). Part of my work with the dataset was to understand the differences between the two sets of reviews and, very roughly, two of the main findings are:
- Email reviews carry substantially higher ratings on average, and their rating distribution is less bimodal.
- Email reviews are very stable over time, whereas web reviews exhibit a downward trend (a rough sketch of both comparisons follows this list).
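As a rough illustration of the kind of comparison behind these two findings, here is a sketch assuming a pandas DataFrame with hypothetical columns `source` (‘web’ or ‘email’), `rating` (1–5 stars), and `date`; the real PowerReviews data and field names differ, and the ‘bimodality’ proxy here is just the share of ratings at the extremes.

```python
import pandas as pd

def summarize(reviews: pd.DataFrame) -> pd.DataFrame:
    """Average rating, a crude bimodality proxy, and volume per review source."""
    grouped = reviews.groupby("source")["rating"]
    return pd.DataFrame({
        "avg_rating": grouped.mean(),
        # crude "bimodality" proxy: share of mass at the 1- and 5-star extremes
        "share_extreme": grouped.agg(lambda r: ((r == 1) | (r == 5)).mean()),
        "n_reviews": grouped.size(),
    })

def monthly_trend(reviews: pd.DataFrame) -> pd.DataFrame:
    """Average rating per calendar month, split by source, to eyeball the
    stable-vs-declining pattern mentioned in the second finding."""
    monthly = (reviews
               .assign(month=pd.to_datetime(reviews["date"]).dt.to_period("M"))
               .groupby(["source", "month"])["rating"].mean())
    return monthly.unstack("source")
```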
I’m not going to expand on these differences in this post; instead, I want to focus on the shock. For some retailers, prompting reviews is a recent phenomenon, i.e., for some years they had only self-motivated reviews. We already established that the two sets of reviews (and hence the two populations that write them) are substantially different. Hence, the question I was interested in is this: what was the effect of the introduction of the review prompts on the retailer’s entire review ecosystem, as well as on the population that writes self-motivated reviews? More specifically:
- Was the prompting email simply a redirection for users who would have written a review anyway? In other words, were these positive ratings already in the system, and did we just redirect them?
- Did the self-motivated reviews change with the introduction of this new set of higher ratings? One can imagine dissatisfied users exaggerating their dissatisfaction even more to balance out the higher ratings.
To tackle these questions, we found the date that email reviews started arriving in the system and followed a natural-experiment approach, comparing things for a period of time (3 or 6 months) before and after the ‘treatment’. To control for natural temporal trends, we used data from the same periods in previous years, when no ‘treatment’ was applied.
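For concreteness, here is a minimal sketch of that before/after comparison, assuming a DataFrame of self-motivated (web) reviews with hypothetical `date` and `rating` columns and a known treatment date when the email prompts were switched on. The window length and the previous-year control mirror the setup described above; everything else (names, structure) is illustrative.

```python
import pandas as pd

def before_after(web_reviews: pd.DataFrame, treatment_date: str,
                 months: int = 3) -> dict:
    """Average rating and review volume in windows around the treatment date,
    plus the same calendar windows one year earlier as a no-treatment control."""
    dates = pd.to_datetime(web_reviews["date"])
    t = pd.Timestamp(treatment_date)
    window = pd.DateOffset(months=months)
    year = pd.DateOffset(years=1)

    def stats(start, end):
        mask = (dates >= start) & (dates < end)
        sel = web_reviews.loc[mask, "rating"]
        return {"avg_rating": sel.mean(), "volume": int(mask.sum())}

    return {
        "before": stats(t - window, t),
        "after":  stats(t, t + window),
        # same windows one year earlier, when no treatment was applied,
        # to control for seasonal/temporal trends
        "control_before": stats(t - year - window, t - year),
        "control_after":  stats(t - year, t - year + window),
    }
```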
We found that the introduction of the email prompts had no effect on the ratings or the volume of self-motivated reviews, i.e., the prompts tapped into an entirely new segment of the purchasing population without affecting at all the one that was already submitting reviews. This is important because it means the reviews became more representative overall, and since the new reviews come only from verified purchasers, they also became more credible as a whole. A positive side effect for the retailers is that, since email-prompted reviews are on average more positive than self-motivated ones, the reviews overall became more positive. So really, a win-win situation.
The common theme in both lines of research I just described is that they deal with the effects of a shock to the population of users that submits reviews. Sometimes that shock has positive effects (as we saw with the increase in star rating for the apps featured in Apple’s promotion), sometimes neutral (as with the ratings for the apps in Starbucks’ promotion), and sometimes negative (as with the ratings for Monument Valley after it was featured in House of Cards).
Such shocks can potentially be found in a variety of settings: from books featured on the reading lists of prominent celebrities to albums by musicians who appear on SNL. They can also come from policy changes: what would happen if Uber decided to make the app unusable until you submit a review for your last ride? How about until you submit a review for the Uber app itself? Or if Amazon sent you an email prompt for every product you purchased? With the introduction of iOS 4 back in 2010, Apple removed a feature that prompted users to review an app when they deleted it. Developers complained that users who enjoy an app (and hence don’t delete it) are less likely to write a review. It’s hard to imagine this policy change didn’t cause a shock to the store’s entire review ecosystem.
Marketing professionals designing promotions for a product, and product managers making decisions about the design and policies of a review platform, should consider what types of shocks their actions will cause.
To close, and perhaps to make the situation even more complicated, the shock can be something external. Consider the ‘customers also bought’ recommendation section that exists on the pages of many apps in Apple’s AppStore (or the similar sections on product pages on Amazon and other retailers). Now imagine that an app, let’s say Angry Birds, is featured in a large promotion. With more spotlight (i.e., visits) on the page of Angry Birds comes more spotlight on the pages of the recommended apps. In other words, products can experience shocks during a promotion even if they are not the ones being featured. But more on this in a later post.
Yorgos Askalidis is a PhD student at Northwestern University graduating Summer ’16. Please don’t tell his thesis committee members that he is writing blog posts instead of his thesis.