Higher or Lower?

Making sense of usability findings with criteria-based severity ratings

Chris Callaghan
Oct 30, 2013

Having completed a number of broad usability labs over the last couple of months, I've had the opportunity to refine many of my processes, one of which is the logging and analysis of lots of usability issues.

Back in the day when I first started out in usability testing, it quickly dawned on me that simply spotting a user having difficulty, logging it as an issue, and then later rating it as high, medium or low just wasn't good enough.

From the outset, the rating mechanism was completely subjective: the difference between high and low wasn't particularly debatable, but why an issue was classed as medium rather than high or low certainly was.

Having received my usability lab training from the Nielsen Norman Group, I turned to Nielsen for his approach to severity ratings for usability problems.

Nielsen’s severity rating scale

“In order to facilitate prioritizing and decision making”, Nielsen suggests the following rating scale to classify usability findings:

  1. Not a usability issue
  2. Cosmetic issue: the fix being a nice to have
  3. Minor usability issue: low priority fix
  4. Major usability issue: high priority fix
  5. Usability catastrophe: imperative to fix before system release

On reflection, this scale didn't offer anything beyond the original high, medium and low scale I was trying to replace. Points 1 and 2 were essentially a "don't bother" option, catastrophe became my new high, and major became my new medium.

I wanted to confidently report why I had classified an issue as medium and not as high or low, so I dug a little deeper.

Nielsen’s definition of a usability issue

Turning back to Nielsen, I knew that he defined the severity of a usability issue as a combination of three factors.

  1. The frequency with which the problem occurs
  2. The impact of the problem if it occurs
  3. The persistence of the problem

I was already recording frequency and impact in my Heuristic Evaluations, so adding a persistence column and updating my final score formula wasn't going to be any trouble.
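
Nielsen doesn't prescribe a single formula for combining the three factors, but to illustrate the kind of spreadsheet calculation I mean, here's a minimal sketch in Python. The 1–3 ratings and the simple average are my own assumptions, not Nielsen's:

```python
# Illustrative only: Nielsen combines frequency, impact and persistence,
# but the 1-3 scale and the simple average below are my own assumptions.

def severity_score(frequency: int, impact: int, persistence: int) -> float:
    """Combine the three factors (each rated 1=low, 2=medium, 3=high)
    into a single overall severity score."""
    for name, value in (("frequency", frequency),
                        ("impact", impact),
                        ("persistence", persistence)):
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2 or 3, got {value}")
    return (frequency + impact + persistence) / 3

# Example: a problem most users hit (3), with medium impact (2),
# that they learn to work around on repeat encounters (1).
print(severity_score(frequency=3, impact=2, persistence=1))  # 2.0
```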

The problem was that I would still be using a low, medium and high rating for each of the three components. I was facing the same subjectivity as before, just with more moving parts.

Discovering a criteria-based approach

After some further research, I discovered an article by @baekdal, who shared the same concerns but proposed an alternative method of usability severity rating, built on the rating scales used by software testers.

What I liked about Baekdal's severity rating scale was its clear criteria-based approach, which significantly reduced the amount of subjectivity (and time) involved in rating an issue. A third party could review the usability log and understand why an issue was rated in a particular way.

  1. Not an issue
  2. Minor: Cosmetic issues, spelling issues, non-critical workflow issues
  3. Serious: Normal status for an issue
  4. Major: Loss of functionality, problematic impact on a person’s workflow
  5. Critical: System crashes, workflow breakdown, complete loss of focus for a specific task, loss of information
  6. Fatal: Blocker, issue prevents further use
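
If you log issues in code rather than a spreadsheet, the scale maps naturally onto an ordered enumeration. This is purely an illustrative sketch; the names and numbers simply mirror the list above:

```python
from enum import IntEnum

class Severity(IntEnum):
    """Baekdal-style six-point severity scale; a higher value is more severe."""
    NOT_AN_ISSUE = 1   # Not an issue
    MINOR = 2          # Cosmetic, spelling, non-critical workflow issues
    SERIOUS = 3        # Normal status for an issue
    MAJOR = 4          # Loss of functionality, problematic impact on workflow
    CRITICAL = 5       # Crashes, workflow breakdown, loss of information
    FATAL = 6          # Blocker: issue prevents further use

# Because IntEnum is ordered, issues can be compared and sorted directly.
assert Severity.FATAL > Severity.MAJOR
```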

My approach to severity rating

Whilst I can't take any credit for Baekdal's approach, which I adopted, I can share a number of enhancements I've made based on years of using it in the field.

Customise the criteria to your system

Firstly, ensure you build on the criteria, personalising them to the domain in which the system under test operates.

For example, when testing a medical software application, it would make sense to add the criterion “incorrect information is presented” to the fatal rating if it has the potential to impact human life. However, incorrect information in a financial application might be critical, and on a hobbyist supplier's website it might only be major.
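
To make that concrete, here's a sketch of the same criterion agreed at different severities per domain, expressed as a simple lookup table. The domains and ratings just restate the examples above; build your own table with your client:

```python
# Illustration of domain-specific criteria: the same finding agreed at
# different severities depending on the domain. Ratings follow the
# examples in the text and are not a recommendation.

CRITERIA_BY_DOMAIN = {
    "medical software":          {"incorrect information is presented": "fatal"},
    "financial application":     {"incorrect information is presented": "critical"},
    "hobbyist supplier website": {"incorrect information is presented": "major"},
}

def agreed_severity(domain: str, criterion: str, default: str = "serious") -> str:
    """Return the pre-agreed severity for a criterion, or the normal
    status ('serious') if no specific criterion was matched."""
    return CRITERIA_BY_DOMAIN.get(domain, {}).get(criterion, default)

print(agreed_severity("medical software", "incorrect information is presented"))  # fatal
```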

In some instances, you may want to further remove subjectivity. For example, if the system crashes and prevents further use, is that critical or fatal?

If you’re testing a website which crashes, yet the user can recover by reloading the site, will you ever be able to reach a fatal classification?

However, for a kiosk software application where the user is unable to perform a software or hardware reboot, fatal is pretty much fatal.

You may even want to tie in moderator assistance to some of the criteria to help with your classification. Major might include “requires some assistance from the moderator” and critical might include “cannot continue without assistance from the moderator.”

The key is to define and agree your criteria before starting the severity rating process and make it work for you.

Aim to arrive at a clear checklist where there’s no debate… “if this happened, then the issue has to be classed as that.”
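
As a sketch of what that checklist might look like once written down, here's one possible encoding as a set of ordered rules, checked most severe first and falling back to serious as the normal status (more on that below). The observation flags and the criteria themselves are illustrative assumptions, not a definitive list:

```python
# A sketch of a criteria checklist encoded as ordered rules, checked most
# severe first. The observation flags and the criteria are illustrative;
# substitute the ones you agree with your client before the lab.

def classify(observation: dict) -> str:
    """Apply the agreed checklist; anything not caught by a specific
    criterion stays 'serious', the normal status for an issue."""
    if observation.get("prevents_further_use"):
        return "fatal"        # blocker: the participant cannot go on at all
    if (observation.get("system_crashed")
            or observation.get("information_lost")
            or observation.get("needed_moderator_to_continue")):
        return "critical"
    if (observation.get("functionality_lost")
            or observation.get("needed_some_moderator_assistance")):
        return "major"
    if observation.get("cosmetic_or_spelling_only"):
        return "minor"
    return "serious"

# Example: the participant could not continue without the moderator stepping in.
print(classify({"needed_moderator_to_continue": True}))  # critical
```

The important part isn't the code, it's that the flags and the order of checks are agreed up front, so two analysts would classify the same observation in the same way.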

Be client friendly

If you decide to define and agree criteria without your client, it would be advisable to consider the terminology used for the five severity labels (minor, serious, major, critical and fatal) and whether they are suitable.

For example, reporting back that you encountered two “fatal” issues could be alarming for some clients or a bit melodramatic for others.

Make everything serious first

One micro-process I use is to first classify everything as serious, this being the normal status for an issue. I then check each issue against the criteria, looking for reasons to classify it as something else.

Record frequency

When recording the severity rating, ensure you record the frequency with which issues occur across the lab.

For example, if I'm analysing a lab of, say, five participants, I will create a spreadsheet for each participant and log the individual issues chronologically as line items. I time-stamp when each issue occurred in the session and assign it a severity rating based on my domain-specific criteria.

When done, I create a master spreadsheet which pulls in all the findings from the individual participant spreadsheets. I remove duplicates, each time adding a +1 to the frequency column.

This way, I arrive at a single log of all usability issues from the day, with visibility of their frequency, and an appropriate severity rating for each.

Master log combining issues from multiple participant sessions
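
I do all of this in spreadsheets, but the same merge can be sketched in a few lines of code. The participant logs and issue descriptions below are made up purely for illustration:

```python
from collections import Counter

# Illustration of merging per-participant logs into a master log. In
# practice this is spreadsheet work; the issues below are invented.

participant_logs = {
    "P1": ["could not find the search box", "misread the delivery date"],
    "P2": ["could not find the search box"],
    "P3": ["could not find the search box", "misread the delivery date"],
    "P4": ["checkout button label unclear"],
    "P5": ["could not find the search box"],
}

# Count how many participants hit each issue (the dedupe step).
frequency = Counter(
    issue
    for issues in participant_logs.values()
    for issue in set(issues)   # dedupe within a single session
)

for issue, count in frequency.most_common():
    print(f"{count}x  {issue}")
```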

Quick analysis

Finally, from the master spreadsheet, I order the data by the frequency column from highest to lowest. I then secondary sort the data by the severity rating, again, highest to lowest.

Master log ordered by frequency of most severe issues. The top half of the list shows recurring issues, while the bottom half shows edge cases by severity
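
In code terms, that's a two-key sort: frequency first, severity second, both descending. Again, the issues and ratings below are made up for the example:

```python
# Illustration of the two-key sort: frequency, then severity, both descending.

SEVERITY_RANK = {"minor": 2, "serious": 3, "major": 4, "critical": 5, "fatal": 6}

master_log = [
    {"issue": "could not find the search box", "frequency": 4, "severity": "major"},
    {"issue": "misread the delivery date",     "frequency": 2, "severity": "critical"},
    {"issue": "checkout button label unclear", "frequency": 1, "severity": "fatal"},
    {"issue": "typo on confirmation page",     "frequency": 1, "severity": "minor"},
]

master_log.sort(
    key=lambda row: (row["frequency"], SEVERITY_RANK[row["severity"]]),
    reverse=True,
)

for row in master_log:
    print(f'{row["frequency"]}x  {row["severity"]:<8}  {row["issue"]}')
```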

At this point you will be presented with line items in order of the most frequently occurring, highest-rated issues; these are the ones you'll want to look at first. The single-instance but high-severity issues may be your edge cases and should be investigated further.

That’s all folks

This concludes my tips and experiences of severity rating in a bid to stop playing “higher or lower” — a guessing game which shouldn't really have a place in usability test analysis.

If there’s one take-away…

Define specific and mutually exclusive domain-related criteria to arrive at an unambiguous and quick-to-use rating system.

Found this useful?

Please ❤ Recommend for others to see.

Thank you : )


Chris Callaghan

@CallaghanDesign - UX & Optimisation Director at McCann Manchester. NN/g UX MASTER Certified | HFI Usability Certified | Contentsquare Certified.