4.6. That’s my Uber rating. About average in the U.K. (slightly poor in the U.S.). How does this make drivers react when they see me pop up requesting a ride? How should it make me feel when I check my app and my rating has crept up… or down? What does it mean for Uber when I rate one of their drivers after a trip?
Star ratings are a central design feature of many digital products. They are relied on to ensure quality, drive decision-making, provide feedback and much more. They also directly affect how we interact with each other, particularly with the phenomenal rise of platform and peer-to-peer business models, where the power to rate shifts from a small number of experts (e.g., movie critics) to the masses. They influence the behavior of people at both the giving and receiving ends of the process. They form an integral data set upon which organizations, from lone-wolf startups to trillion-dollar companies, base critical business and design decisions. And, in the monumental quest for user convenience, they are determined within a split second and sealed with a single click.
Apart from Uber, whose rating system seems to have somehow achieved a sort of cult status, we encounter five-star rating systems every day across many walks of life: retail (e.g., Amazon), leisure (e.g., Vivino), hospitality (e.g., Airbnb), technology (e.g., app stores), etc. In addition to this, the five-star rating has become an industry in itself. TripAdvisor relies solely on this feature, with peer reviews and recommendations at the heart of its model. Trustpilot offers users a measure of authenticity and aims to build trust between an organization and its potential customers.
What happens to our judgment faculty when we’re conditioned almost daily to base multiple snap decisions on a scale of one to five? Is this an optimal way to provide our feedback without interrupting our lives, or is it just an opportunity for organizations to exploit this feature for their own benefit? Who should govern this and how?
Five-star ratings have many benefits. They’re convenient, they’re quantitative, and their simplicity and visual identity mean user expectations can be met instantly during interactions that are often hugely time-constrained. But these interactions are usually a mere footnote in a user’s journey, and users have little incentive to complete the task. It is, therefore, the business rules and design decisions surrounding the feature that need to be designed properly. Organizations should self-govern how they use a five-star rating feature, and for this, the designer needs to play a key role. Understanding the impact of ratings, when and why they should be used, and how to act on the data they produce is important. It’s too easy to just stick one on your homepage and pat yourself on the back whenever you get a good review.
But let’s take a step back for now. I chose Uber as the opening example not just because it is widely used, but also because I find its recent development of the five-star system particularly interesting. Uber digs deeper than most companies into the feedback that passengers give by forcing anyone giving fewer than five stars to explain why. A fair attempt to understand its users and therefore deliver a better service, some may say.
But in making this design decision, Uber is implicitly encouraging more people to give five stars out of convenience (remember that the rating prompt usually appears at the point in the user journey where there is the least incentive to respond?). We seem to be heading toward a world where five stars is the norm, and as attractive as this sounds, in reality, I don’t believe it’s because every experience we have truly warrants five stars. A five-star hotel is expected to exceed expectations, not just provide the baseline experience below which we must justify our distaste.
Because Uber is pushing five stars to become the norm, they’ve had to design another way to reward truly standout trips. So, if a passenger gives five stars, they can now also add a “compliment” (e.g., “clean car” or “great route choice”) and a monetary tip. A digital version of the good ol’ days, if you like. But where does that leave the star rating? What’s the point of it then? I can see the value of providing feedback for improvement, especially if the trip was unsatisfactory. But if—like most of my own Uber trips—it was just average, then either you have to conjure up an explanation for why you only gave a meager three or four stars or else submit your stellar five-star review.
A paper published earlier this year by Filippas, Horton, and Golden titled “Reputation Inflation” examined the trend for ratings systems to result in increasing average ratings over time, with the claim that this phenomenon “erodes the comparability of feedback scores over time and reduces the informativeness of a reputation system—potentially making it completely uninformative.” That’s a scary finding when we consider how much we subconsciously rely on these systems.
Back in early 2015, information supposedly leaked from Uber showed that drivers who scored below 4.6 stars were at risk of being deactivated, with 4.6–4.8 described as “below average, need to improve” and 4.8–5 being “above average, keep up the good work!” No surprise, then, that with this level of expectation Uber is pushing passengers to rate five stars or else justify why they haven’t. While it may be admirable that the company is finding ways to get as close as possible to consistently providing an actual five-star service, I wonder about the wider impact of this manipulation of the star-rating feature. Apologies for the Black Mirror analogy, but will it result in a dystopian future akin to that episode (“Nosedive”) where everything in everyone’s lives is based on an omnipotent, artificially intelligent ratings system? Probably not (even though China appears to be trying to build one), but I believe it does affect our sense of judgment and influence our expectations, with more far-reaching consequences than people realize.
Our reliance on five-star ratings and our tendency to accept them at face value was cleverly demonstrated in a story involving the “Shed at Dulwich.” Essentially, a bloke in South London managed to turn his garden shed into TripAdvisor’s top-rated London restaurant without serving a single dish, simply by exploiting TripAdvisor’s rating system. The best part, and the part most relevant here, is that people bought it and even traveled from abroad to experience it without knowing it wasn’t actually a restaurant. It was appointment-only, and the creator, Oobah Butler, routinely turned down booking requests to give the impression that it was fully booked. I think this is a good example of two things: first, that the rating feature is open to exploitation and doesn’t necessarily tell the whole story, and second, that people often don’t bother to find out the actual story behind the rating.
Another thing I find interesting about feedback is the geographical variation it needs to accommodate. Expectations differ across cultures, and organizations operating in diverse societies should consider this, along with other ethnographic influences, especially when interpreting the data from star ratings. It’s all very well looking at the data and proudly stating that customer satisfaction is higher in the U.S. than the U.K., but is it really? Or is it just that expectations and outlook are slightly different culturally? Amplified by scale, a small difference can have a big impact. It’s this next level of insight that has the real value rather than the star value itself, yet I’ve worked with some (quite big) companies who often seem to forget this.
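To make the point concrete, here is a minimal Python sketch of one way to read a rating relative to its own market rather than at face value. The numbers are invented for illustration (they are not Uber data), and `ratings_by_market` and `relative_score` are hypothetical names. Standardizing against the local mean is only one possible approach, not something any of the companies mentioned is known to do.

```python
from statistics import mean, pstdev

# Hypothetical per-market samples of rider ratings (made-up numbers, not real data).
ratings_by_market = {
    "UK": [4.4, 4.5, 4.6, 4.7, 4.8],
    "US": [4.6, 4.7, 4.8, 4.9, 5.0],
}


def relative_score(rating, market_ratings):
    """Return how many standard deviations `rating` sits from its market's mean."""
    mu = mean(market_ratings)
    sigma = pstdev(market_ratings)
    if sigma == 0:
        # Every rating in the sample is identical, so there is no spread to compare against.
        return 0.0
    return (rating - mu) / sigma


# The same raw rating, read against two different local baselines.
for market, sample in ratings_by_market.items():
    print(market, round(relative_score(4.6, sample), 2))
```

With these made-up samples, a 4.6 sits right on the U.K. average but well below the U.S. one, which is the kind of context a raw star value hides.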
The five-star rating is likely here to stay, and it would be interesting to see what happens if we continue in the current fashion and reach a point where everything everywhere is rated the “best” and therefore nothing is. Perhaps then we’ll see the rise of the six-star rating (Just Eat already does this. Why?!). Beyond that, anything is possible...
But really it would be more sensible to ensure that the business rules and design decisions behind any star-rating feature are thought through to account for the wider potential implications of building this feature, including how this might impact both the organization and society. Because again, amplified by scale, a small difference can have a big impact.