Usability Testing 101

Uri Ar · Aleph · Jun 24, 2019 · 15 min read

A how-to guide on maximizing usability testing for interpreting analytics and evaluating design decisions.

Photo copyright Smart Chicago Collaborative, used under a Creative Commons license
1. Why is usability testing such a powerful tool?

It Helps Settle Arguments

During the process of product design and planning, it is common and natural for people to rely on and express opinions and preconceptions. Claims such as “people don’t scroll” or “no one clicks red buttons” are voiced as if they are absolute truths. Some people tend to interpret the usage data based on assumptions they already have. For example, when the data shows that most users don’t click button x, some may claim that this happens because the users find the feature useless, while others may suggest that the feature is not prominent enough for people to notice. Usability testing sheds light on people’s interactions with our designs and leaves no room for ego or power struggles.

It Shows Whether a Product Meets People’s Expectations

By observing people’s behavior and reactions during a session, we can learn whether our products, features and flows meet their expectations in terms of affordance and resulting actions (e.g., “I expect a search field to appear if I click the magnifying glass.”), in terms of expectations from the product (e.g., “I expected the product to share this only with my friends, but it made my post public.”), and in terms of understanding (e.g., “Does the magnifying glass icon represent search or zooming in?”).

It Shows if Business Decisions Match Real-World Use

Observation can shed light on people’s usage and understanding of a product and provide insights about how it conforms with business goals. With usability testing, we may often discover unique and unintended ways people experience products, and then determine whether these results are due to our initial assumptions being wrong, or whether a design flaw is causing the disparity between expectation and actual usage.

It Identifies Design Flaws and Assesses Task Execution and Completion

Whether we’re testing in order to understand performance issues of an existing product or to evaluate a new feature we have designed, usability testing can help us identify flaws that can prevent people from completing a task, or just frustrate and annoy them. Alternatively, the tests can also show whether the experiences generate improvements over previous experiences in terms of enjoyment and satisfaction.

2. Where and when does usability testing fit in experience design?

Early Stage: Exploring and Testing Concepts

Formative usability testing at an early stage of the design process can serve as a quick way to assess and validate ideas and gain insights into people’s perceptions, understanding and opinions. The good news is that this type of testing can, in most cases (with the exception of very specific target audiences, contexts or scenarios), provide significant value without having to resort to processes that are budget or labor intensive. Low-fidelity prototypes, even hand-drawn paper prototypes, and informal tests with passerby traffic or people from your team who are not involved in the project can prove very helpful for gathering useful insights.

💡 Tip: Create a low-fidelity paper prototype. Test it on people in a local cafe. Recruit them by offering a free coffee and iterate the prototype between tests to optimize it. This allows you to gain many useful insights with a few hours of work and for the price of a few cups of coffee.

Keep in mind that passerby-testing does not work well when your product requires a specialized skill set (e.g., a product targeting geologists) or is targeting a very specific audience (e.g., toddlers).

Competitive Product/Service Research

Performing usability testing on the competition, integrated with publicly available stats and social sentiment research, can shed light on the field we’re exploring. This allows us to learn from what works and what doesn’t and helps us gain insights into people’s impressions and understanding.

It is relatively easy to set up. The products and services are already there. There may be a challenge in cases where a demo request requires vetting by the company, but that is rare. We are left only with the task of creating the test and bringing in the participants.

💡 Tip: Collect social sentiment from publicly available sources such as app reviews, posts by followers on companies’ Facebook pages, Reddit groups and even Glassdoor.

💡 Tip: Gather stats from free, publicly available sources such as Alexa, App Annie or Similarweb, as well as from other website and app competitive research and analysis tools. A lot of useful information is free and you can upgrade to premium if and when needed.

Existing Product Optimization

A product or a specific funnel may be underperforming, and we may be forming ideas as to the reasons why. If there aren’t any apparent reasons, we can validate our interpretation of the quantitative data we collect by tracking usage and traffic patterns and designing usability tests around the areas and funnels that are not performing well. We can follow the tests by prototyping the proposed solutions and testing them with users before taking them live. A similar approach can be used for testing new features and flows.

🧪 Example: Seventy-six percent of your buyers drop off on the checkout page of a custom jewelry web shop, while research shows that the industry baseline is twenty percent. We may have some assumptions, but we don’t know the reasons. Usability tests focused on the checkout flow can point to underlying reasons and prove or disprove our theses. Following the tests with prototyping and testing possible solutions will help us decide what solutions to implement. We can then validate in production and optimize as needed.
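
To make the drop-off arithmetic concrete: a step’s drop-off is simply the share of users who enter it but never complete it. Here is a minimal sketch of deriving such figures from analytics event counts; the step names and numbers below are hypothetical:

```python
# Hypothetical event counts for a checkout funnel, e.g. from an
# analytics export; step names and numbers are illustrative only.
funnel = [
    ("viewed_cart",      1000),
    ("started_checkout",  820),
    ("entered_payment",   450),
    ("completed_order",   197),
]

for (step, count), (next_step, next_count) in zip(funnel, funnel[1:]):
    print(f"{step} -> {next_step}: {1 - next_count / count:.0%} drop-off")

# Overall: users who started checkout but never completed an order.
print(f"checkout drop-off: {1 - 197 / 820:.0%}")  # ~76%, as in the example
```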

💡 Tip: If data is available, perform cross analytics with reviews, customer feedback and/or support requests to help design better theses and tests.

💡 Tip: When designing new features with no pre-existing stats, the actual performance numbers after launch may expose issues that did not arise in the “lab” setting of usability testing, so optimization or another round of design and testing may be required.

Testing Ideas and Designs During the Design Process

During the design of a new experience, we become so immersed in the project that we may take certain aspects of it for granted and have difficulty assessing how other people will experience them. In other cases, we may be so different from the product’s audience, professionally or otherwise, that we cannot rely on our own judgment and need to test and validate our ideas. Usability testing can provide insights into the target audience’s perceptions, understanding and opinions. Building prototypes and testing them with potential users can help us validate our ideas and adjust and optimize concepts before we go into full design or development.

💡 Tip: Test early to assess high level concepts, but also test later on for details, such as copy and comprehension of specific features and interactions.

💡 Tip: When testing generic parts of your project (i.e., features or interfaces that are not targeting a specific audience or skill set, such as a registration funnel), save effort by running many informal tests with passersby.

3. How Many Testers?

The Right Number?

When performing usability testing, we set out to find design failures and errors. Failures prevent the participants from completing a task, and errors trouble and frustrate them.

When you search the web, two quotes from user experience authorities keep popping up:

“Elaborate usability tests are a waste of resources. The best results come from testing no more than five users and running as many small tests as you can afford.”

– Jakob Nielsen (Nielsen Norman Group)

“It is widely assumed that five participants suffice for usability testing. In this study, 60 users were tested and random sets of five or more were sampled from the whole, to demonstrate the risks of using only five participants and the benefits of using more. Some of the randomly selected sets of five participants found 99% of the problems; other sets found only 55%. With 10 users, the lowest percentage of problems revealed by any one set was increased to 80%, and with 20 users, to 95%.”

– Laura Faulkner (Head of UX Research at Rackspace)

While these observations appear very divergent, Nielsen is actually referring to a series of tests with iteration (RITE: Rapid Iterative Testing and Evaluation) and Faulkner seems to be talking about a specific number for a one-time session. Steve Krug, author of Don’t Make Me Think, has talked about the value of testing with as few as three participants.

However, this study suggests that testing complex journeys requires a significantly higher number of tests.
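
For readers who want the arithmetic behind these positions: Nielsen’s five-user claim rests on a simple discovery model (attributed to Nielsen and Landauer), in which every usability problem is assumed to be found by any single tester with the same average probability, about 31% in Nielsen’s published data. A minimal sketch of that model:

```python
# A sketch of the Nielsen/Landauer discovery model behind the
# "five users" claim. It assumes every usability problem is found
# by any single tester with the same average probability lam;
# Nielsen's published cross-project average is roughly 0.31.
def share_of_problems_found(n_testers, lam=0.31):
    """Expected fraction of problems uncovered by n_testers participants."""
    return 1 - (1 - lam) ** n_testers

for n in (3, 5, 10, 20):
    print(f"{n:>2} testers -> ~{share_of_problems_found(n):.0%} of problems")
```

At the average probability, five testers uncover roughly 85% of problems, which supports Nielsen. Faulkner’s spread (55% to 99% for sets of five) is what happens when individual problems deviate far from that average, which supports using more participants for complex or high-stakes journeys.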

Here’s how we suggest approaching it:

Consider What You’re Testing
Relatively safe with a small number (around five) of participants:

  • Testing an existing product/flow and validating statistics
  • Generic experiences
  • Features tested with quick iterations (small number for each iteration)
  • Low-Fi POC tests

Probably best with a larger number:

  • Complex flows
  • Innovative and unique solutions
  • Designs that are close to the intended final product

💡 Tip: A little testing is better than no testing at all. Always take into account budget, time and the price of getting it wrong.

💡 Tip: Always recruit a few more people than you really need, as no-shows are very common.

4. Who and How: Recruiting Testers

Recruiting testers is a challenge even without taking time and budget limitations into account.

Whom Should You Recruit?

In cases where you have access, base your user profiles on your existing or potential users. When addressing existing users, be mindful of the different characteristics of different user groups, whether in terms of their level of engagement and relationship with your product, their personas, or the types of users they are (i.e., their roles in your ecosystem).

When recruiting new users (i.e., potential users), refer to research and product goals.

In cases where access to the audience is complex or difficult, try to recruit surrogate participants: people who are close to your audience in terms of demographics, behavior, roles, expertise, computer literacy, etc. It is important to be aware of your assumptions and to be careful about what compromises you make in terms of audience selection. Sometimes audiences that seem similar on the surface can be very different in practice (e.g., children of close but different age groups can have very different motor skills or literacy levels), thus producing misleading results. Also be mindful of your selection parameters: demographics may be misleading as an indicator of behavioral similarities.

How to Recruit?

The approach to recruiting changes based on need. When recruiting passerby traffic for spontaneous low-fidelity tests, you can try to recruit colleagues who are not involved in the project you’re testing, or offer a gift card in return for testing at a local mall.

When testing an audience from an existing product or service, you may reach out to specific users directly via in-product messaging, if your product supports it.

You can also use somewhat more intrusive techniques, such as banners and pop-ups, in your product when recruiting from your user pool, or on other destinations when recruiting potential users.

In some cases, when looking for specific characteristics, you can use your network, friends and family to recruit testers. In other cases, you can use groups in social networks. For example, when we were testing a kiosk prototype for an American retailer in Tel Aviv, we used posts in an expatriate Facebook group to recruit relevant users.

For remote testing, online user-testing and interviewing services, such as UserTesting.com and PingPong, also provide tools for recruiting users.

Nexar recruiting testers on Facebook

Using a Screener

When recruiting users online, screening questionnaires can help you ensure that only people who meet your criteria for testing make the cut. Design the questions to be neutral, meaning that the people completing them will not know upfront what the “right” answers are.

💡 Tip: You can build a screener form for free using Airtable and filter the collected results according to your needs, presenting only potential matches.
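
However you build the form, the filtering itself takes only a few lines. Here is a minimal sketch (not tied to Airtable’s API) that screens a CSV export of responses; the column names and criteria are hypothetical:

```python
import csv

# Hypothetical criteria: frequent online shoppers aged 25-45 who do
# not work in UX or marketing. Column names must match your form's export.
def passes_screener(row):
    return (
        25 <= int(row["age"]) <= 45
        and row["shops_online"] == "weekly or more"
        and row["works_in_ux_or_marketing"] == "no"  # screen out insiders
    )

with open("screener_responses.csv", newline="") as f:
    matches = [row for row in csv.DictReader(f) if passes_screener(row)]

for match in matches:
    print(match["name"], match["email"])
```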

Rewarding the Participants

When recruiting participants, it is good practice to draw them in by offering some form of reward or incentive, be it money or a gift. The reward should be proportionate to the effort they put in. This incentivizes participants to join and show up, and also shows them that you do not take their time, attention and effort for granted.

💡 Tip: If you’re testing a paid service, consider offering membership or premium features.

5. Writing the Script

Since testing is usually a time and budget sensitive task and we often test prototypes that do not cover every aspect of our experience, it is imperative to decide what we are going to test and how we are going to test it. Writing a script helps us plan.

  • Describe the tasks, flows and impressions you want to test.
  • Define the setup, the scenario and context that will be communicated to the participants (e.g., “Imagine you are at work, holding a bunch of receipts you collected over the course of the last month and you want to report your expenses.”).
  • Write the intro you’re going to use, introducing yourself, the concept of the experience you’re going to test and the ground rules (e.g., “Please excuse us for not answering questions during the test, as we want to test the experience in a way that simulates you doing this on your own. We’ll be more than happy to answer any questions you have at the end of the session.”).
  • Remember to request permission for recording.
  • Write the setup and instructions, describing the tasks, but not guiding people through them. Tasks are actions or activities that you want the person to perform.
  • Finish by adding follow-up questions you do not want to miss, in case the answers are not addressed during the test. This is useful for gathering overall impressions and suggestions (e.g., “What did you think of the design?” or “How can we improve the experience?”).

💡 Tip: You do not have to start from scratch. There are plenty of examples of scripts online that can serve as a starting point.

6. Prototyping

While in some cases we test existing experiences in order to learn from, validate or improve them, more often than not, usability testing is used to test new concepts and designs and assess the validity of ideas and assumptions before going through the effort of designing and developing them in full. Prototyping allows us to do this. The breadth, depth and scope of a prototype, as well as its fidelity, should be based on what we are testing, the stage of the project, the budget and time we have and the cost of getting things wrong.

Horizontal vs. Vertical

Horizontal prototypes usually provide a broad view of a system or subsystem and are useful, in the context of usability testing, for gathering impressions and confirming user interface requirements, system scope and content and structure relevance.

Vertical prototypes focus on a single workflow/user path or a few specific tasks and usually entail enhanced detailed elaborations of these.

Hi-Fi vs. Low-Fi

Depending on where you are in the life cycle of a project, and what time and resources are available, you can build prototypes with different levels of finesse and detail, in terms of both the level of design refinement and functionality.

Low-fi prototypes are usually throwaway artifacts that can even be drawn by hand on paper. Functionality can be emulated by swapping out paper pieces based on the user’s clicks. Low-fi prototypes are very useful for gathering insights rapidly and testing with quick iterations between tests.

Hi-fi prototypes are usually used in usability testing to simulate a final experience or flow, helping to sort out final touches of a product’s design and wording.
It is important to note that visual design plays an important role in a product’s usability, both in concept communication and by how it affects people’s perceptions and understanding of the product they are using.

7. Running the Test

It is important to understand the setting of a usability test and its effect on testers. Unmoderated tests, remotely observed in a “natural” setting, usually yield the best, unadulterated results. But they are difficult to manage and arrange, and they require a prototype or even a finished product to ensure that users don’t reach a dead end or run into missing functionality. It is therefore important to decide whether the test will be moderated and, if so, what type of moderation is required; moderation always affects the test, so try to minimize its effect as much as possible.

Testing a prototype of a retail kiosk

Moderation Types:
Observation with No Interference: While this most closely simulates natural settings, it relies heavily on the observer’s interpretation of the participant’s actions.

Concurrent Think Aloud: You ask the participants to describe what they are about to do before each action and to relate their impressions during the test. This helps gather real-time feedback and emotional responses, but interferes with usability metrics such as accuracy and task completion time, as it is not a very natural setting.

Retrospective Think Aloud: The observer waits until the end of the test to ask the participants to recall the experience. The problem is that there’s usually difficulty in remembering longer tests, which results in poor data.

Concurrent Probing: The observer only asks the participants for their input when they do or say something interesting. This method is less intrusive, and thus has less of an effect on the test results. However, some of the user’s impressions and actions may be missed using this method.

Retrospective Probing: The observer asks a series of questions after the test. Again, the risk of poor data due to memory issues is balanced with less intrusion during the test.

The Session

The observer introduces themselves and summarizes the procedure (e.g., “We’re going to test a food delivery application.”) and then explains the purpose and rules of the test. It is important to communicate to the participants that they are not the ones being tested; rather they are advancing the development of an experience by weeding out its flaws. Present consent forms and NDAs if needed and explain the test setup, imagined context and scenario. Next, share practical information regarding the test, like how long it’ll take and what to do if the participant gets stuck. Give the participants an opportunity to ask questions, ask them if they’re ready and then get started. Ask for impressions or assign tasks and observe the participants.

Note-Taking & Documenting

Note-taking can be done by a visible or a hidden observer. In both cases, participants should be informed. Note-taking should be non-intrusive and should not interfere with the participants’ actions.

It is good practice to record the screen, camera and microphone at the same time, allowing for visibility regarding the participants’ interactions, comments and corresponding expressions.

Since sessions can be numerous and long, and listening or watching a recording can be taxing, it is best if notes are synced with recordings using a timecode so observers can jump to interesting events in the recording.
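
If your tooling doesn’t support this, even a plain text file works. A minimal sketch, assuming each note is prefixed with its offset into the recording (the “mm:ss” format is our own convention, not a standard):

```python
import re

# Notes in "mm:ss  observation" form; contents are illustrative.
notes = """\
02:15  Hesitated on the search icon, tried the logo first
07:40  Abandoned checkout after the shipping form errored
12:05  "I expected my cart to be saved" (verbatim quote)
"""

for line in notes.strip().splitlines():
    minutes, seconds, text = re.match(r"(\d+):(\d{2})\s+(.*)", line).groups()
    offset = int(minutes) * 60 + int(seconds)
    print(f"{offset:>4}s  {text}")  # seek the recording to this offset
```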

Post-Session

Casual chatting with participants after they complete the tasks helps put them at ease and allows you to collect their impressions and opinions.

Start by asking if you forgot to ask anything or if they would like to add anything to their feedback. Continue with non-guided, open-ended questions such as, “What did you think about the experience?” Since people are often uncomfortable with open criticism, solicit more critical feedback by presenting it as a request for help, e.g., “Do you have any ideas about how we can improve the experience?”

8. Analyzing and Reporting Results

  • Analyze: collect and sort the data.
  • Prioritize by severity (failure vs. error) and frequency of occurrence.
  • Sort by how critical each issue is to the overall experience.
  • Estimate the costs and resources needed to solve each issue (ROI).
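
One simple way to make the sort concrete is a severity-times-frequency score. A minimal sketch, with hypothetical issues and an assumed three-level severity scale:

```python
# Rank observed issues by a severity-times-frequency score. The
# three-level scale is an assumption: failures block task completion,
# errors frustrate, nits merely annoy.
SEVERITY_WEIGHT = {"failure": 3, "error": 2, "nit": 1}

issues = [  # (description, severity, participants affected) -- hypothetical
    ("Could not find the checkout button", "failure", 6),
    ("Misread the shipping-cost label",    "error",   4),
    ("Disliked the button color",          "nit",     2),
]

def priority(issue):
    _, severity, frequency = issue
    return SEVERITY_WEIGHT[severity] * frequency

for desc, sev, freq in sorted(issues, key=priority, reverse=True):
    print(f"{priority((desc, sev, freq)):>2}  [{sev}] {desc} ({freq} participants)")
```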

Since the results of usability testing often challenge people’s preconceptions, they need to be neatly “packaged” in order to get buy-in from the rest of the team:

  • Background: what we were testing for.
  • Setup: how the test was facilitated.
  • Participants: the makeup of the testing group.
  • Overview of findings: an executive summary of the critical findings before diving into the details.
  • Test details: tasks, observations, steps, issue severity.
  • Conclusions and recommendations: sorted by importance and ROI.
  • Additional points that emerged from the test that we were not testing for.
Part of a test report

If you’re submitting a written document, it is best to help readers visualize the test by including photos of the setup and screenshots of the sections tested. Adding user quotes lends it authenticity and breathes some life into the report.

If you’re presenting, showing test videos grabs viewers’ attention and helps connect them on an emotional level.

💡 Tip: If there is a common failure point, editing snippets from a few tests together can really help drive home a point.

Read about the usability testing meetup we held for our portfolio designers here.

Some Useful Links:

Note-Taking

During recording:
https://www.notedapp.io
https://www.neukadye.com/mobile-applications/timestamped-field-notes/

Post recording:
http://videonotetaker.sourceforge.net

Screen mirroring (for recording a phone, tablet or computer):

https://www.airsquirrels.com/reflector

Recording:
https://silverbackapp.com

https://www.techsmith.com/morae.html

Testing Sites:

https://hellopingpong.com

https://www.usertesting.com

Recruit users from your product:

https://www.hotjar.com/tour

https://ethn.io

https://help.mixpanel.com/hc/en-us/articles/115004724586-Send-Web-In-App-Messages

https://help.mixpanel.com/hc/en-us/articles/115004708963-Send-Mobile-In-App-Messages-for-iOS-

Prototyping

https://www.axure.com

https://origami.design/

https://www.adobe.com/il_en/products/xd.html

https://www.framer.com

https://marvelapp.com (great for creating interactive prototypes from drawings)

https://principleformac.com/

https://www.invisionapp.com

https://kiteapp.co

Methodology, form templates and documentation examples:

https://www.usability.gov/

Experience and brand designer, aspiring singer-songwriter and dad trying to avoid telling dad jokes, to no avail.