Rating scales: the good, the bad, and the ugly!

Natalie Herd
Mind Tap Research
Mar 21, 2018

Providing feedback using rating scales has become a part of our everyday lives. In a single week, I was asked to provide feedback using rating scales on five separate occasions. The first set of ratings I was asked to provide was in an online survey assessing pricing thresholds for an online newspaper subscription. This was an excellent survey with clearly labelled, easy-to-use rating scales.

The second opportunity for feedback was for Netflix, using their clear, but not particularly informative, thumbs up/thumbs down rating.

The third opportunity was in a confidential online survey, with possibly one of the worst rating scales I’ve ever come across (more on this later).

The fourth was for my son’s homework app, using this somewhat odd rating scale, where points four and five appear to be presented in the wrong order:

And the final rating I was asked to provide during the week was for an Uber driver. Yes, I know Uber is far from being a bastion of ethical behaviour, and that I shouldn’t be using them. In fact, there have been some interesting articles recently (here and here) about their unethical use of “nudge” tactics to influence driver behaviour. But what I’d like to discuss here is their very odd rating scale and the consequences of using a scale without labels.

Without reading Uber’s documentation on their five-star rating scale, it would be fair to assume that most people would interpret the scale as they would a traditional five-point satisfaction rating scale. For example, a satisfaction rating scale conforming to Jon Krosnick’s criteria for a good unipolar rating scale (accessed here) might look like this: Not at all satisfied, Slightly satisfied, Moderately satisfied, Very satisfied, Completely satisfied. As you can see, a rating of four equates to being “Very satisfied”.

According to Uber, however, drivers are “encouraged” to maintain an average rating of at least 4.6 out of five, and drivers with an average lower than this may be at risk of their profiles being deactivated. Clearly Uber has a very different perspective on what a rating of four out of five means. At university, a grade equivalent to four out of five (i.e., 80%) would see you being awarded the top mark. Uber drivers, on the other hand, with an average below 4.3 after their first 25 trips, are deactivated.
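To make the arithmetic concrete, here is a minimal sketch in Python. It assumes the 4.6 threshold quoted above and a rider who reads four stars as “Very satisfied”; the function names are hypothetical and are not taken from any Uber documentation.

```python
from fractions import Fraction


def mean_rating(ratings):
    """Return the arithmetic mean of a list of star ratings."""
    return sum(ratings) / len(ratings)


def min_share_of_fives(threshold, other_rating=4):
    """Smallest share of five-star ratings needed to reach `threshold`
    when every remaining rating equals `other_rating`.

    Solves 5*p + other_rating*(1 - p) = threshold for p.
    """
    return (Fraction(str(threshold)) - other_rating) / (5 - other_rating)


# A driver whose first 25 riders were all "Very satisfied" (four stars).
all_very_satisfied = [4] * 25
print(mean_rating(all_very_satisfied))   # 4.0

# Share of five-star ratings needed to average 4.6 if the rest are fours.
print(float(min_share_of_fives(4.6)))    # 0.6
```

Under these assumptions, a driver whose riders were all “Very satisfied” would average 4.0, and at least 60% of ratings would need to be five stars (with the rest fours) just to reach 4.6.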

This discrepancy highlights the issues that arise when people don’t include labels on each point of a rating scale. I would like to think that Uber has simply misunderstood that many riders, at least initially, probably interpret their five-point scale in linear terms, rather than treating five as good or acceptable and one to four as varying degrees of terrible. Something tells me, however, that this is another of Uber’s deliberate attempts to “nudge” their drivers into going to great lengths to please their customers.

Although longer rating scales provide more detailed information, they also make it more difficult for people to know where they fall on that scale, particularly if the scale is unclear, as is the case here. If Uber is genuinely interested in simply identifying which of their drivers are providing an unacceptable level of customer service, then they would be better off switching to a Netflix-style thumbs up/thumbs down rating, where thumbs down is clearly labelled “Unacceptable” and thumbs up, “Acceptable”. If Uber requires feedback with greater granularity, then they would need to alter their five-point rating scale, so that each point is labelled with uniformly spaced meaning, from one end of the continuum to the other. Accordingly, Uber would also need to change the average rating threshold at which drivers are at risk of being deactivated, to align with the updated rating scale.
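As a rough illustration of what a fully labelled five-point scale with an aligned threshold could look like, here is a short Python sketch. The labels are the unipolar satisfaction labels quoted earlier; the review threshold of 3.0 (“Moderately satisfied”) and all names are assumptions made for this example, not anything Uber uses.

```python
# Every point on the scale carries an explicit label.
SATISFACTION_SCALE = {
    1: "Not at all satisfied",
    2: "Slightly satisfied",
    3: "Moderately satisfied",
    4: "Very satisfied",
    5: "Completely satisfied",
}

# Hypothetical threshold: an average below "Moderately satisfied".
REVIEW_THRESHOLD = 3.0


def label_for(rating):
    """Return the label shown to the rider for a given scale point."""
    return SATISFACTION_SCALE[rating]


def needs_review(ratings, threshold=REVIEW_THRESHOLD):
    """True when the average rating falls below the threshold."""
    return sum(ratings) / len(ratings) < threshold


print(label_for(4))                      # Very satisfied
print(needs_review([4, 4, 5, 4, 3]))     # False
print(needs_review([2, 3, 2, 1, 3]))     # True
```

With labels on every point, a driver who consistently leaves riders “Very satisfied” sits comfortably above the threshold, which is what most riders would expect a four to mean.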

Earlier, I mentioned an online survey with one of the worst rating scales I have ever come across. Thankfully, I took a screenshot of it at the time to discuss here:

These two rating scales came with no instructions or labels, just a set of statements to “rate”. The most obvious problem with these scales is that they have no labels to indicate what the different numbers stand for. Is 0 meant to be indicative of strong disagreement? Or strong agreement? Is the scale unipolar or bipolar? Does the scale overall even represent levels of agreement or is it some other construct the researchers are seeking to measure? Without any additional instructions, any data collected from scales such as these is entirely meaningless and unusable.

So, before sending out a survey to your valuable customer list or online panel, make sure you use rating scales that are labelled, intuitive, and easy for respondents to use. Poor rating scales not only result in unreliable feedback, but also give negative impressions of your brand.

Originally posted on: https://www.mindtapresearch.com/news/2017/10/11/rating-scale

Natalie Herd
Mind Tap Research

Founder of Mind Tap Research. PhD in behavioural science. Online market and social researcher.