Reputation in Online Markets

Reputation systems have become an integral part of nearly all online markets, including eBay, Airbnb, TaskRabbit, Lyft, Uber, and many others. Users rely on these ratings to choose which products to buy and how much to pay, and platforms use these ratings to identify bad actors. In most online markets, these systems are the only way users can predict the quality of an experience and distinguish between sellers, and some markets prioritize highly rated sellers in search results. A high reputation translates to more sales, and a well-tuned system incentivizes good behavior. Recent work, however, has shown that this standard is not always met. Reputation systems are plagued with inflated reputations, with the problem worsening as a platform matures, and online markets have suffered as a result. Platforms have tackled this problem through various mechanisms with limited success, and the industry is still waiting on a comprehensive solution.

Problems in Reputation

That reputation systems can inflate is well-documented across online markets. On eBay, for example, more than 90% of sellers studied between 2011 and 2014 had a rating of at least 98%, and more transactions resulted in a dispute than in a negative rating. On one large labor market, average ratings rose by one star over seven years, and on Airbnb, buyers often rate a seller highly in public but then privately indicate they would not recommend the same seller. These distortions can significantly affect the market, reducing the reputation system’s benefit. New buyers may not know what constitutes a high rating and may leave the platform after a bad experience, and sellers may conclude that providing good service is not worth it. Fixing the problem, however, first requires understanding how it forms.

Ratings on online platforms inflate due to a simple cause: buyers are less likely to leave a negative review than a positive review and may even disingenuously give a positive review after a negative experience. There are two overarching reasons that buyers behave in such a manner.

First, they may fear retaliation from a vindictive seller, either through an expensive lawsuit or more subtly through the platform itself. Early versions of both Airbnb’s and eBay’s review systems allowed each party in a transaction to rate the other, with reviews published immediately. As a result, one party would often reciprocate a negative review with a negative review of their own. Waiting to publish reviews until both parties submit one helps, but does not solve the problem; sellers may still harass buyers into changing their negative reviews after submission. Such retaliation, or even the threat of it, is often enough to prevent negative public reviews.

Unfortunately, bad-faith retaliation is not the only type of pressure buyers face. Buyers may simply not want to criticize others publicly. This pressure increases if the two parties had communicated extensively during the transaction, forming a social bond. Researchers have also found that when sellers incentivize reviews (of any kind) by offering discounts, they create an implicit social obligation for reviewers to reciprocate with a positive review. This type of pressure is harder to fix, as there may be no indication that it is even occurring.

Existing Solutions

Platforms are aware of the inflation problem and have invested in fixing it. Most existing solutions try to decrease retaliatory pressure from sellers. In 2007, eBay implemented Detailed Seller Ratings, in which buyers rate sellers and ratings are presented only in the aggregate so as to prevent pressure. The platform later eliminated a seller’s ability to negatively rate buyers. Other markets have also opted for anonymizing and aggregating ratings, resulting in a system in which sellers cannot trace a bad rating back to any individual buyer. One online labor market experimented with such private ratings and found that they better predict both future private feedback and future public feedback. These solutions have reduced but not eliminated reputation inflation.

One concern with these solutions is that they may not be robust: sellers can find ways other than retaliation to incentivize positive reviews, and private measures are effective only as long as sellers don’t know they are being used; once sellers know, they can optimize for that measure through pressure. Furthermore, these solutions do not address the second cause of bias: people may not want to criticize others online, especially if they see no personal benefit.

A Look Ahead

This challenge requires more research into new reputation systems. One promising approach is to change the incentives of those submitting reviews. Researchers on the academic crowdsourcing platform Daemo implemented Boomerang, in which an employer’s positively reviewed workers are more likely to work on the employer’s future tasks, and thus the employer internalizes the cost of an inaccurate review. It’s not clear how this solution scales with the size of the market, or how other platforms can integrate similar ideas. However, their solution does tackle the problem that reviewers may not want to criticize others, by making it expensive for them to not label poor-performing workers as such.

One potential solution we’re studying is a pairwise comparative system, in which reviewers are asked to compare an experience with one of their previous experiences. To estimate a global ranking from the pairwise comparisons, we’ll model how two different experiences’ ratings would translate to the probability of a buyer saying one experience is better than the other, taking into account some of the discussed biases. Pairwise comparisons can then be used as samples from which global ratings can be estimated. Similar systems are used to rank chess players (Elo) and Xbox players (Microsoft’s TrueSkill). We believe that a pairwise comparative system would allow people to overcome their reluctance to criticize. After all, they wouldn’t be saying an experience was bad, just that a different one was better.
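To make the idea concrete, here is a minimal sketch of how global scores could be estimated from pairwise comparisons. It uses the classic Bradley-Terry model, which is our stand-in for illustration and not necessarily the model we will end up with; in particular, it does not account for any of the biases discussed above. The function names, the `prior` smoothing term, and the seller labels are all hypothetical.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iters=200, prior=0.1):
    """Estimate a score per item from (winner, loser) pairs.

    Assumes the Bradley-Terry model: P(i beats j) = s_i / (s_i + s_j).
    `prior` is an illustrative smoothing term (an assumption, not part of
    the classic model) that keeps items with zero wins at a positive score.
    """
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    # Start from uniform scores and apply the standard iterative update.
    scores = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iters):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0)
                if n_ij:
                    denom += n_ij / (scores[i] + scores[j])
            updated[i] = (wins[i] + prior) / denom
        # Normalize so that scores sum to 1.
        total = sum(updated.values())
        scores = {i: s / total for i, s in updated.items()}
    return scores


def win_probability(scores, a, b):
    """Model's probability that a buyer says experience `a` beat `b`."""
    return scores[a] / (scores[a] + scores[b])


# Hypothetical data: each pair means "the first experience was better".
comparisons = (
    [("seller_a", "seller_b")] * 3 + [("seller_b", "seller_a")]
    + [("seller_b", "seller_c")] * 3 + [("seller_c", "seller_b")]
    + [("seller_a", "seller_c")] * 2
)
scores = bradley_terry(comparisons)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here `ranking` orders the sellers from best to worst, and `win_probability` gives the model’s estimate that one experience would be preferred over another. Note that no reviewer ever had to call an experience bad; the ordering emerges entirely from "this one was better" judgments.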

An ideal reputation system would seamlessly transfer across markets and grow with the platform, without inflating or becoming biased over time as the market composition changes. Whether a pairwise comparative system would meet that bar remains to be seen. What is clear, however, is that no existing solution does. So, until told otherwise, be careful relying on reputation online.