Usability Testing: Methods and Metrics

Mathew.S
14 min read · Aug 25, 2021

--

Hi folks. I have researched several times how modern usability testing looks. What are the metrics, and which method is most useful for which case? And you know what? There is no comprehensive article about how to test the usability of a product. So I decided to come up with one. I combined something from here and there and created a full guide for selecting and applying usability tests to a product. Hope you will like it and it will become your handbook on usability testing.

What is it?

Usability testing is hunting for usability problems and trying to identify the design flaws that need to be improved.

Why do it?

Usability testing is an essential part of developing an app. With the help of this technique, you can understand what users like and dislike while they’re interacting with the app. This helps to improve the app using an iterative approach and achieve business goals more efficiently.

How can I do it?

Good news, folks. It’s easy and productive with a small number of users.

With some methods of usability testing, just 5 users can help you find up to 85% of issues. So you need a simple prototype and at least a couple of potential users to get started. And there are even some usability improvement methods where we don’t need users at all.
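The "5 users" figure comes from Nielsen and Landauer's problem-discovery model, which can be sketched in a few lines of Python. The 31% per-user discovery rate is the average they reported; it varies from project to project, so treat this as a rule of thumb, not a guarantee:

```python
# Nielsen/Landauer problem-discovery model: the share of usability
# problems that n test users uncover, assuming each user independently
# finds a fixed fraction lam of all problems (~0.31 on average).
def problems_found(n: int, lam: float = 0.31) -> float:
    return 1 - (1 - lam) ** n

for n in (1, 3, 5, 15):
    print(f"{n:2d} users -> {problems_found(n):.0%} of problems found")
```

With five users the model predicts roughly 84–85% of problems found, which is why iterating with small groups pays off.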

Depending on the reasons behind your usability testing, you should first choose an appropriate type of testing that covers your needs, then select a method and run the test. And don’t forget to Analyse, Improve, Iterate.

Usability testing types

Before you pick a usability testing method, you must make several decisions about the type of testing you need based on your resources, target audience, and research objectives (the questions you want to get an answer to).

The three overall usability testing types include:

  • Moderated vs. Unmoderated
  • Remote vs. in-person
  • Explorative vs. Assessment vs. Comparative

1. Moderated vs. Unmoderated

  • A moderated testing session is administered in person or remotely by a trained researcher who introduces the test to participants, answers their queries, and asks follow-up questions.
  • An unmoderated test is done without direct supervision; participants might be in a lab, but it’s more likely they are in their own homes and/or using their own devices to browse the interface that is being tested.

Moderated testing usually produces in-depth results thanks to the direct interaction between researchers and test participants, but can be expensive to organise and run (e.g., securing a lab, hiring a trained researcher, and/or providing compensation for the participants).

The cost of unmoderated testing is lower, though participant answers can remain superficial and follow-up questions are impossible.

As a general rule of thumb, use moderated testing to investigate the reasoning behind user behavior, and unmoderated testing to test a very specific question or observe and measure behavior patterns.

2. Remote vs. in-person

  • Remote usability tests are done over the internet or by phone.
  • In-person testing, as the name suggests, requires the test to be completed in the physical presence of a UX researcher/moderator.

Remote testing doesn’t go as deep into a participant’s reasoning, but it allows you to test large numbers of people in different geographical areas using fewer resources.

In-person tests provide extra data points, since researchers can observe and analyse body language and facial expressions. However, in-person testing is usually expensive and time-consuming: you have to find a suitable space, block out a specific date, and recruit (and pay) participants.

3. Explorative vs. assessment vs. comparative testing

These three testing types generate different types of information:

  • Explorative tests are open-ended. Participants are asked to brainstorm, give opinions, and express emotional impressions about ideas and concepts. The information is typically collected in the early stages of product development and helps researchers pinpoint gaps in the market, identify potential new features, and workshop new ideas.
  • Assessment research is used to test a user’s satisfaction with a product and how well they are able to use it. It’s used to evaluate the product’s general functionality.
  • Comparative research methods involve asking users to choose which of two solutions they prefer, and they are used to compare a website with its primary competitors.

Usability testing can be either qualitative or quantitative.

Qualitative usability testing focuses on collecting insights, findings, and anecdotes about how people use the product or service. Qualitative usability testing is best for discovering problems in the user experience. This form of usability testing is more common than quantitative usability testing.

Quantitative usability testing focuses on collecting metrics that describe the user experience. Two of the metrics most commonly collected in quantitative usability testing are task success and time on task. Quantitative usability testing is best for collecting benchmarks for assessment and comparison.

The number of participants needed for a usability test varies depending on the type of study.

For a typical qualitative usability study of a single user group, it is recommended to use five participants to uncover the majority of the most common problems in the product.

A quantitative usability study needs a solid number of users. Without diving deep into statistics, you need roughly 300 users or more to get a statistically sound assessment for most usability testing cases.
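The "300 or more" rule of thumb falls out of the standard sample-size formula for estimating a proportion. A minimal sketch, assuming the usual normal approximation and the worst-case success rate p = 0.5:

```python
import math

# Minimum n to estimate a task-success proportion within +/- margin
# at a given confidence level (z = 1.96 for 95% confidence), using
# the worst-case variance p * (1 - p) with p = 0.5.
def sample_size(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print(sample_size(0.05))  # +/-5% margin at 95% confidence
print(sample_size(0.10))  # a looser +/-10% margin needs far fewer users
```

A ±5% margin at 95% confidence calls for 385 participants, which is the same ballpark as the 300-user figure; relaxing the margin shrinks the requirement quickly.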

Metrics, or how to measure usability

Typically, usability is measured relative to users’ performance on a given set of test tasks. The most basic measures are based on the definition of usability as a quality metric:

  • Success completion rate (whether users can perform the task at all).
  • Time on task (how long the task requires).
  • Error rate.
  • Users’ subjective satisfaction (how pleasant is it to use the design?).
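As a sketch, here is how these four metrics are typically computed from per-participant task results. The data below is invented purely for illustration:

```python
from statistics import mean

# Toy per-participant results for one task: did they succeed, how long
# it took (seconds), how many errors they made, and a 1-5 satisfaction rating.
results = [
    {"success": True,  "time_s": 42, "errors": 0, "satisfaction": 4},
    {"success": True,  "time_s": 58, "errors": 1, "satisfaction": 5},
    {"success": False, "time_s": 95, "errors": 3, "satisfaction": 2},
    {"success": True,  "time_s": 37, "errors": 0, "satisfaction": 4},
    {"success": False, "time_s": 80, "errors": 2, "satisfaction": 3},
]

success_rate = mean(r["success"] for r in results)           # fraction of completions
avg_time = mean(r["time_s"] for r in results)                # mean time on task
error_rate = mean(r["errors"] for r in results)              # errors per participant
avg_satisfaction = mean(r["satisfaction"] for r in results)  # mean 1-5 rating
```

In a real study you would report these per task, alongside the test conditions.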

Methods

Note that every test report should include a detailed description of the test conditions and the reasoning behind the testing (Why we test, What we test, How we test). WWH! (=

# Heuristic Evaluation

Description:

  • A heuristic analysis of an existing interface, comparing it against a set of established usability rules, also known as heuristics.

When to use:

  • Quick and cheap; should be applied as a first step of usability testing in almost every case.

Types:

  • Assessment

Artifacts:

  • Documented results with a list of errors found during heuristic evaluation for every single flow.
  • Action plan for the flow improvements.

# Lab usability testing

Description:

  • This type of usability research takes place inside a specially built usability testing lab. Test subjects complete tasks on computers/mobile devices while a trained moderator observes and asks questions. Typically, stakeholders also watch the proceedings and take notes behind a one-way mirror in the testing area.
  • A major benefit of lab usability testing is the control it provides: all sessions are run under the same standardised conditions, which makes it especially useful for comparison tests. However, these tests are expensive and usually based on a small population size (8–10 participants per research round) in a controlled environment, which is not necessarily reflective of your actual customer base and/or real-life use conditions.

When to use:

  • To get reliable data when comparing a few things (product versions, a competitor’s product) or assessing usability against a set of rules and metrics.

Types:

  • Moderated, in Person, Assessment/Comparative.

Artifacts:

  • Video/audio records.
  • Comparison table/assessment table.
  • List of Insights.
  • List of Concerns.
  • Documented Report with test conditions (Test scenarios, target users, number of users), metrics, artifacts, insights, and conclusions.

# Guerrilla testing

Description:

  • In guerrilla testing, test subjects are chosen at random from a public place, usually a coffee shop, mall, or airport. They are asked to perform a quick usability test, often in exchange for a gift card or other incentive.
  • Guerrilla testing is used to test a wide cross-section of people who may have no history with a product. It’s a quick way to collect large amounts of qualitative data that validate certain design elements or functionality — but it’s not a good method for extensive testing or follow-ups, as people are usually reluctant or unable to give up more than 5–10 minutes of their time.

When to use:

  • Quick way to collect large amounts of qualitative data that validate certain design elements or functionality.

Types:

  • Moderated/Unmoderated, in Person, Explorative

Artifacts:

  • Video/audio records
  • Comparison table/assessment table
  • List of Insights.
  • List of Concerns.
  • Documented Report with test conditions (Test scenarios, target users, number of users), metrics, conclusion.

# Phone interviews

Description: In a phone usability test, a moderator verbally instructs participants to complete tasks on their computer and collects feedback while the user’s electronic behavior is recorded remotely.

Phone interviews are an economical way to test users in a wide geographical area. Because they are less expensive than in-person interviews, they help collect more data in a shorter period.

When to use:

  • Economical way to test users in a wide geographical area. Collect data in short period.

Types:

  • Moderated, Remote, Explorative

Artifacts:

  • Call records.
  • List of asked questions with answers for every tested user.
  • List of Concerns.
  • Documented insights and ideas for improvement.

# Card sorting

Description: Card sorting involves placing concepts on virtual note cards and allowing participants to manipulate the cards into groups and categories. After they sort the cards, they explain their logic in a moderator-run debriefing session.

Card sorting is a great method for both new and existing websites to get feedback about layout and navigational structure. Its results show designers and product managers how people and potential customers naturally organise information, which can help make a site more intuitive to navigate.
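One common way to analyse card-sorting results is a co-occurrence count: how often each pair of cards lands in the same group across participants. Pairs that co-occur often likely belong under the same navigation category. A minimal sketch with made-up cards and three hypothetical participants:

```python
from collections import Counter
from itertools import combinations

# Each participant's sort: groups of card labels (group names vary per person,
# so only the grouping itself is compared).
sorts = [
    [{"Shirts", "Belts"}, {"Phones", "Laptops"}],
    [{"Shirts", "Belts", "Phones"}, {"Laptops"}],
    [{"Shirts", "Belts"}, {"Phones", "Laptops"}],
]

# Count how often each pair of cards ends up in the same group.
co_occurrence = Counter()
for groups in sorts:
    for group in groups:
        for pair in combinations(sorted(group), 2):
            co_occurrence[pair] += 1

print(co_occurrence.most_common(2))
```

Here "Shirts" and "Belts" were grouped together by all three participants, a strong signal they belong under one category.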

When to use:

  • To get feedback about navigation and structure.

Types:

  • Unmoderated, in Person/Remote, Explorative/Comparative.

Artifacts:

  • Table of sorted instances with quantitative assessment for categories.

# Tree testing

Description: Tree testing is a usability technique for evaluating the findability of topics in a website/application. It is also known as reverse card sorting or card-based classification. A large website is typically organised into a hierarchy (a “tree”) of topics and subtopics. Tree testing provides a way to measure how well users can find items in this hierarchy. Unlike traditional usability testing, tree testing is not done on the website itself; instead, a simplified text version of the site structure is used. This ensures that the structure is evaluated in isolation, nullifying the effects of navigational aids, visual design, and other factors.

In a typical tree test:

  1. The participant is given a “find it” task (e.g., “Look for men’s belts under $25”).
  2. They are shown a text list of the top-level topics of the website.
  3. They choose a heading, and are then shown a list of subtopics.
  4. They continue choosing (moving down through the tree, backtracking if necessary) until they find a topic that satisfies the task (or until they give up).
  5. They do several tasks in this manner, starting each task back at the top of the tree.
  6. Once several participants have completed the test, the results are analysed.
Tree testing was originally done on paper, but it can now also be conducted using specialised software.
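Tree-test tasks are usually summarised as success (did the participant end on the correct node?) and directness (did they get there without backtracking?). A minimal scoring sketch, with hypothetical click paths:

```python
# Score one tree-test task. `visited` is the sequence of headings the
# participant clicked (backtracks appear as extra clicks); `correct_path`
# is the intended route from the top of the tree to the target topic.
def score_task(visited: list[str], correct_path: list[str]) -> dict:
    success = bool(visited) and visited[-1] == correct_path[-1]
    direct = visited == correct_path  # success with no detours
    return {"success": success, "direct": direct, "clicks": len(visited)}

correct = ["Clothing", "Men", "Belts"]
results = [
    score_task(["Clothing", "Men", "Belts"], correct),                # direct hit
    score_task(["Electronics", "Clothing", "Men", "Belts"], correct), # found after a detour
]
success_rate = sum(r["success"] for r in results) / len(results)
directness = sum(r["direct"] for r in results) / len(results)
```

Aggregated over many participants, a high success rate with low directness suggests the right topics exist but the top-level labels send people down the wrong branch first.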

When to use:

  • To get feedback about navigation and structure of an interface.

Types:

  • Unmoderated, in Person/Remote, Explorative/Comparative.

Artifacts:

  • Session records.
  • Comparison table for tested instances with error rates.

# Session recordings

Description: Session recordings use software to record the actions that real (but anonymized) people take on a website, such as mouse clicks, movement, and scrolling. Session recordings are a fantastic way to spot major problems with a site’s intended functionality, watch how people interact with its page elements such as menus and Calls-to-Action (CTAs), and see places where they stumble, u-turn (quickly go back to a previous page after landing on a new one), or leave completely.

When to use:

  • Watch how people interact with page elements and see places where they stumble, u-turn, or completely leave.

Types:

  • Unmoderated, Remote, Explorative/Comparative.

Artifacts:

  • Session recordings.
  • List of insights.
  • List of Concerns.
  • Error rate for tested instances.
  • Success completion rate.

# 5-second test

Description: In this test, website owners upload a screenshot of their webpage with a single question like “What is the main element of the page that stuck with you?” or “Who do you think the intended audience is?” Test subjects have five seconds to look at the page before they answer the question.

This is an easy way to collect a large amount of qualitative data about people’s first impressions and reactions to your product.

When to use:

  • To collect a large amount of qualitative data about people’s first impressions and reactions.

Types:

  • In person, Explorative, Qualitative.

Artifacts:

  • Session records.
  • List of Insights.
  • List of Concerns.

# First-click

Description: The goal of first-click testing is to evaluate whether users can easily identify where they need to navigate to complete a given task. The participant is asked a question like “Where would you click to buy this product?” and the software records where they direct their mouse.

First-click testing is useful for collecting data on user expectations and determining the prime location for menus and buttons. By measuring how long it takes users to make a decision, you learn how intuitive your interface design and linking structure are.

When to use:

  • Collecting data on user expectations and determining the prime location for menus and buttons.

Types:

  • In person, Explorative, Qualitative.

Artifacts:

  • Report with time to complete the task.
  • Error rate.
  • Success rate.
  • List of insights.

# Observation

Description: In this sort of test, the researchers watch but don’t participate, acting as a sort of ‘fly on the wall’ as participants run through a set of instructions in a lab. They may interject if a participant gets stuck, but otherwise, they remain quiet and concentrate on taking notes.

Observation testing allows researchers to see the body language and facial expressions of participants without interference from a moderator.

When to use:

  • Get quick insights into user problems and explore the emotional side of the design.

Types:

  • In person, Explorative.

Artifacts:

  • Session recordings.
  • List of insights.
  • List of concerns.
  • Error rate for tested instances.
  • Success completion rate.
  • User subjective satisfaction rate.

# Eye-tracking

Description: During eye-tracking tests, researchers observe and study users’ eye movements using a special pupil-tracking device mounted on a computer. By analyzing where users direct their attention when asked to complete a task, the machine can create heatmaps or movement pathway diagrams.

Eye-tracking studies can be used to glean information about how users interact visually with a page; they also help test layout and design elements and see what may be distracting or taking someone’s focus away from the main interface elements. The downside? Cost: an eye-tracking study requires you to rent a lab with special equipment and dedicated software (plus the trained technician who can help you calibrate the device).

When to use:

  • Obtain information about how users interact visually with an interface; helps test layout and reveal what may be distracting or pulling focus away from the main elements.

Types:

  • In person, Explorative, Qualitative.

Artifacts:

  • Heatmap of the product.
  • Session recordings.
  • Time to complete the tasks.

Methods that are not usability testing

Usability testing is all about having individuals test and experience a product’s functionality. The techniques listed below are occasionally labeled as usability testing — and although they technically are not, they can (and should) be used in conjunction with usability testing to generate more comprehensive results:

A/B testing: unlike usability testing, which investigates user behavior, A/B testing is about experimenting with multiple versions of a webpage to see which is most effective. It’s an important tool for increasing conversions.
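To make the contrast concrete: an A/B test ends in a statistical comparison of conversion rates, not in observed behaviour. A minimal two-proportion z-test sketch (the traffic and conversion numbers are invented):

```python
import math

# Two-sided two-proportion z-test: is variant B's conversion rate
# significantly different from variant A's?
def ab_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value

z, p = ab_z_test(conv_a=120, n_a=2400, conv_b=160, n_b=2400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these invented numbers (5.0% vs 6.7% conversion), the difference is significant at the 5% level; note how many sessions per variant that takes compared to a five-user usability test.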

Acceptance testing: this is often the last phase of the software testing process, where users follow a specific set of steps to ensure the software works correctly. This is a technical test of quality assurance, not a way to evaluate if the product is user-friendly and efficient; still, acceptance testing is an important step in creating a well-vetted product.

Focus groups: when conducting a focus group, researchers gather a small number of people together to discuss a specific topic. It’s a great method for discovering participants’ opinions about a product or service (but it can also introduce bias when some participants are more vocal or persuasive than others).

Surveys: a gauge of user experience, surveys can be used in conjunction with usability testing as a follow-up or a method of gathering user feedback.
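A common survey instrument to pair with usability testing is the System Usability Scale (SUS): ten alternating positive/negative statements rated 1–5, scored to a 0–100 scale. The standard scoring rule as a sketch:

```python
# SUS scoring: odd-numbered items are positively worded (contribution
# = rating - 1), even-numbered items are negatively worded (contribution
# = 5 - rating); the summed contributions are scaled by 2.5 to 0-100.
def sus_score(ratings: list[int]) -> float:
    assert len(ratings) == 10 and all(1 <= r <= 5 for r in ratings)
    total = sum(r - 1 if i % 2 == 0 else 5 - r for i, r in enumerate(ratings))
    return total * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))
```

A score around 68 is commonly cited as the benchmark average, so the hypothetical response set above would be comfortably above average.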

Heatmaps: heatmaps and scroll maps produce a visual representation of how users move around a page by showing its hottest (most popular) and coolest (least popular) parts. They are technically not usability testing because they report on user actions in aggregate, but they are a good way to observe and objectively measure behavior on your website.

Contextual inquiry: Contextual inquiry is less a usability testing method and more an interview/observation method that helps a product team obtain information about the user experience from real users. Test participants (real users) are first asked a set of questions about their experience with a product and then observed and questioned while they work in their own environments.

Interview: A user interview is not usability testing but a UX research method. During an interview, a researcher asks one user questions about a topic of interest (e.g., use of a system, behaviors, and habits) with the goal of learning about that topic. UX interviews tend to be a quick and easy way to collect user data in the early design stages, and a cost-effective way to get insights on UX issues in the middle and late stages.

Field studies: Field research is conducted in the user’s context and location: learn the unexpected by leaving the office and observing people in their natural environment. This is not a usability testing method but a user research method. Field studies can be done at any time, but it often makes sense to do them before design (or redesign) begins, because such research can lead to fundamental shifts in understanding your users and can change what you would design for them. Watching people do particular activities illuminates what they really do versus what they say they do.

So… how do I pick one?

Guide for newbies.

If you are in the Discovery and Architecture research stage:

The main goal is to define the correct product structure and avoid simple heuristic usability mistakes.

  • Card sorting
  • Field studies
  • Tree testing
  • Guerrilla testing
  • Heuristic evaluation.

If you are in the early design phases:

Here you will test prototypes, so you need one that you can share for testing purposes.

  • Guerrilla testing
  • Heuristic evaluation
  • 5-second test
  • Phone interview
  • Lab usability testing
  • Observation
  • Session recordings
  • First click

If your product is already running:

The main goal is to improve so you should carefully measure the results and then do another iteration of testing after implementation.

You can use any method.

No jokes — all are good depending on test purposes.

One thing to be aware of: implementation costs should be less than the income the change generates. Otherwise, no matter how flawed the design is, the change is not worth making.

Insights

A group of 5 testers for insights.

Big groups for statistics.

If you have the budget to test 20 users, it is better to take 5 users and run 4 iterations, improving the product after each one. This gives a better ROI for testing.

Articles used:

https://www.hotjar.com/usability-testing/methods/

https://xd.adobe.com/ideas/process/user-testing/top-7-usability-testing-methods/

https://www.nngroup.com/articles/usability-testing-101/

https://uptech.team/blog/how-to-conduct-usability-testing#toc-so-what-is-the-purpose-of-a-usability-test-

https://youtu.be/bo1j0kDY-Yo — Nielsen Norman

https://www.youtube.com/watch?v=fbTKAHiWtFw — unmoderated/moderated remote tests
