Five Stars or Failure: How Ratings Mislead
In the 5/24/2015 Sunday New York Times, Maureen Dowd laments her Uber passenger rating of 4.2, which is apparently low enough that she sometimes has trouble getting a car. This suggests that Uber’s 1-through-5 star system doesn’t really have five possible ratings. It has two: 5, and everything else.
I first came across a system like this when taking my car to the dealership for service. When I picked it up, the technician mentioned that I would likely get a phone call asking me to rate the service I had received on a scale of 1–5 stars, and, as he cheerfully pointed out, “Anything less than five stars is considered failure, so I hope I’ve given you five-star service today.”
No Room for Nuance
My wife doesn’t like to give every Uber driver a 5. She considers a 4 to be indicative of Uber’s generally very good service, and likes to reserve a 5 for drivers who are really excellent (because they have water bottles in the back, or get the door for her, or are in other ways outstanding). But because of the way Uber’s system works, it’s really not fair for her to give a very-good-but-not-excellent driver a 4 — that’s failure in Uber’s book. In a system with only one “good” rating, there’s no place for “great.”
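A little arithmetic shows how quickly those well-meant 4s add up. The sketch below is only an illustration: the 4.6 cutoff is a number I made up for the example, not a published Uber threshold, and `mean_rating` is a hypothetical helper.

```python
from fractions import Fraction

def mean_rating(fives: int, fours: int) -> Fraction:
    """Average star rating for a driver who receives only 5s and 4s."""
    total = fives + fours
    return Fraction(5 * fives + 4 * fours, total)

# Assumption for illustration only: a made-up cutoff, not a real Uber figure.
HYPOTHETICAL_CUTOFF = Fraction(46, 10)

# Half of this driver's riders rate "very good" as 4, the other half give 5.
avg = mean_rating(fives=50, fours=50)
print(float(avg))                  # 4.5
print(avg >= HYPOTHETICAL_CUTOFF)  # False
```

Under that assumed cutoff, a driver whom every rider considered at least very good would still be counted as failing.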
Noisy Results
At least with a 5-or-nothing system, it’s pretty clear what everyone is expecting. If Uber declared that 3 is “acceptable,” 4 is “above average,” and 5 is “amazing,” there would still be some users who give a 5 when they are happy, and 4 or less when they are unhappy, and that would mess up the scale. The attempt to collect a more nuanced rating runs the risk of confusing everyone and giving a noisy, less informative result.
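Here is a rough sketch of that mixing problem. It assumes two kinds of riders: "graded" riders who follow the declared 3/4/5 scale, and "binary" riders who give a 5 whenever they are happy. The names and the simple weighted average are my own illustration, not anything Uber has described.

```python
# Graded riders follow the declared scale; binary riders give 5 whenever happy.
GRADED = {"acceptable": 3, "above average": 4, "amazing": 5}
BINARY = {"acceptable": 5, "above average": 5, "amazing": 5}

def expected_rating(quality: str, binary_share: float) -> float:
    """Mean rating when `binary_share` of riders use the 5-or-nothing habit."""
    return binary_share * BINARY[quality] + (1 - binary_share) * GRADED[quality]

# The same "above average" driver, seen by three different mixes of riders.
for share in (0.0, 0.5, 1.0):
    print(share, expected_rating("above average", share))
```

The same driver lands anywhere between 4.0 and 5.0 depending on who happens to ride, so the average says as much about the raters as about the service.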
Lack of Clarity
Even where a 5-star system works more or less as intended, the scale can be unclear. Netflix’s 5-star system is an example: it’s easy to find films rated near 5 stars, and near 1 star. But that doesn’t mean the scale itself is clear. Does 3 stars mean average quality, or is the scale centered closer to 4 stars? Does the star rating assess the film on an absolute scale, or within its genre? Roger Ebert said that he awarded stars based on how well a film fulfilled its role within its genre: The Blair Witch Project clearly isn’t the same sort of film as Apocalypse Now, but he gave them both 4 stars.
Solution: use color or other visual aspects to indicate the middle of the scale
This Happy or Not feedback collector I found at Keflavik airport in Iceland uses a range of colors and happy/sad faces to cue the user to what the scale is. (Yes, I take photos of user interfaces I find interesting.)
Solution: break the review into two steps
This works much better on the web or in an app. First, just ask if the user is satisfied, yes or no. Then follow up with a rating of how happy or unhappy they are, or with a request for clarification. This could clarify the unclear scale of Netflix’s star ratings.
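As a rough sketch of how the two answers might be stored and combined, assuming a follow-up question with three levels of intensity (the class name, field names, and the 1-to-3 range are all hypothetical):

```python
from dataclasses import dataclass

@dataclass
class TwoStepReview:
    satisfied: bool  # step 1: a plain yes or no
    intensity: int   # step 2: 1 (slightly) to 3 (very), an assumed range

    def score(self) -> int:
        """Signed score: positive when satisfied, negative when not."""
        if not 1 <= self.intensity <= 3:
            raise ValueError("intensity must be between 1 and 3")
        return self.intensity if self.satisfied else -self.intensity

print(TwoStepReview(satisfied=True, intensity=3).score())   # 3
print(TwoStepReview(satisfied=False, intensity=1).score())  # -1
```

Because the first answer fixes the sign, there is no ambiguous midpoint: every response is clearly on the happy or the unhappy side before any nuance is collected.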
Solution: text descriptions
On a five-step scale, something like:
- Very unhappy
- Unhappy
- Neutral
- Happy
- Very happy
This barely works with five steps (is “neutral” the default, or is “happy”?), and adding more steps makes it worse. The often-used ten-step scale where only the very bottom and top are given descriptions is a complete failure: what’s the difference between a 6 and a 7 on a ten-step scale from “Completely Dissatisfied” to “Completely Satisfied”?
Solution: a minus-plus system
This is a simple alternative to the star system, similar to the Happy or Not system pictured above. Present the user with a simple — — + + like so:
This would have to be tested to see how well it works and whether people understand it, but it seems likely to parse even more quickly than the smiley faces used by Happy or Not.
Conclusion: don’t use stars
It’s clear that a simple five-star scale is problematic in multiple ways. Part of that is simply because the task itself is challenging: getting uniform responses from a variety of people is tough. But there are fundamental issues with a star system that can likely be improved upon with a better user interface.
PS: I asked an Uber driver what my rating is. It’s 4.9, which means I can generally get a car, but also that somewhere along the way I came up short as a passenger.