User Research and Testing: How many users?

Agustina Feijóo
Apr 24, 2018

As part of my work as a UX Designer at an agency, I built a User Research “best practices” document to guide the agency’s research efforts. In that document, I addressed several topics like procedures, processes, tools, time, costs and more.

While building the document, some questions came up, some easier to answer than others. One that I found challenging was “How many users?”. For some User Research techniques I found very diverse opinions on the number of users needed, while for others I couldn’t find a straightforward answer at all. That is why I decided that, once I had finished the document, I would share a summary of my findings.

To produce the document, I mainly consulted two sources:

  • NN/g (Nielsen Norman Group)
  • Validating Product Ideas, by Tomer Sharon

Both sources are reliable, and the authors featured have vast experience in the field. Of course, there are plenty of other materials, authors and points of view on the matter. This post does not intend to carve anything in stone. It is meant as a quick guide or starting point for anyone who wishes to learn more about User Research.

User Research Techniques

The User Research techniques that I will be addressing to answer the question are:

  • Card sorting
  • Tree testing
  • First-click test
  • Usability testing
  • Lostness metric
  • Interviewing
  • Diary study
  • Experience Sampling

So, how many users?

Card sorting

Number of users to test: 15

Source: NN/g

Quote:

“You must test 15 users to reach a correlation of 0.90, which is a more comfortable place to stop. After 15 users, diminishing returns set in and correlations increase very little: testing 30 people gives a correlation of 0.95 — certainly better, but usually not worth twice the money. There are hardly any improvements from going beyond 30 users: you have to test 60 people to reach 0.98, and doing so is definitely wasteful.

Tullis and Wood recommend testing 20–30 users for card sorting. Based on their data, my recommendation is to test 15 users”.
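
To make the diminishing returns in the quote concrete, here is a minimal Python sketch that computes how much correlation each extra user adds between the sample sizes mentioned. The three data points come straight from the quote; the function name `gain_per_extra_user` and the linear interpolation between points are my own assumptions, used only for illustration.

```python
# Figures quoted above from NN/g: users tested -> correlation reached.
CORRELATION_BY_USERS = {15: 0.90, 30: 0.95, 60: 0.98}


def gain_per_extra_user(points):
    """Return (from_n, to_n, correlation gain per added user) for each step.

    Assumes a straight line between consecutive data points, which is a
    simplification and not part of the quoted study.
    """
    sizes = sorted(points)
    steps = []
    for small, large in zip(sizes, sizes[1:]):
        gain = (points[large] - points[small]) / (large - small)
        steps.append((small, large, gain))
    return steps


for small, large, gain in gain_per_extra_user(CORRELATION_BY_USERS):
    print(f"{small} -> {large} users: +{gain:.4f} correlation per extra user")
```

Each user added between 30 and 60 contributes roughly a third of what each user added between 15 and 30 does, which is the argument for stopping at 15.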

Tree testing/First-Click Testing

Number of users to test: 20

Source: NN/g

Quote:

“In the chart, the margin of error is expressed as a percent of the mean value of your usability metric. For example, if you test 10 users, the margin of error is +/- 27% of the mean. This means that if the mean task time is 300 seconds (five minutes), then your margin of error is +/- 81 seconds. Your confidence interval thus goes from 219 seconds to 381 seconds: 90% of the time you’re inside this interval; 5% of the time you’re below 219, and 5% of the time you’re above 381.

This is a rather wide confidence interval, which is why I usually recommend testing with 20 users when collecting quantitative usability metrics. With 20 users, you’ll probably have one outlier (since 6% of users are outliers), so you’ll include data from 19 users in your average. This makes your confidence interval go from 243 to 357 seconds, since the margin of error is +/- 19% for testing 19 users.

You might say that this is still a wide confidence interval, but the truth is that it’s extremely expensive to tighten it up further. To get a margin of error of +/- 10%, you need data from 71 users, so you’d have to test 76 to account for the five likely outliers.

Testing 76 users is a complete waste of money for almost all practical development projects. You can get good-enough data on four different designs by testing each of them with 20 users, rather than blow your budget on only slightly better metrics for a single design.”
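
The arithmetic in this quote can be reproduced with a short Python sketch. The margins of error (27% and 19%), the 300-second mean and the 6% outlier rate are taken from the quote; the function names `confidence_interval` and `users_after_outliers` are hypothetical helpers, and rounding the outlier count to the nearest whole user is my assumption about how the quoted numbers were obtained.

```python
def confidence_interval(mean, margin_percent):
    """Return (low, high) for a margin of error given as a percent of the mean."""
    margin = mean * margin_percent / 100
    return mean - margin, mean + margin


def users_after_outliers(users_tested, outlier_rate=0.06):
    """Estimate how many users are left once the likely outliers are dropped.

    Assumption: the expected outlier count is rounded to the nearest user.
    """
    likely_outliers = round(users_tested * outlier_rate)
    return users_tested - likely_outliers


print(confidence_interval(300, 27))  # 10 users -> (219.0, 381.0)
print(confidence_interval(300, 19))  # 19 users -> (243.0, 357.0)
print(users_after_outliers(20))      # -> 19
print(users_after_outliers(76))      # -> 71
```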

Usability Testing/Lostness Metric

Number of users to test: 5

Source: NN/g and Validating Product Ideas

Quote:

“For a qualitative online usability testing study, five participants will serve you well.”

“Lostness matrix: Use this sheet when you observe a usability test or when you look at the path that each study participant took to complete a task. […] The spreadsheet is set for 5 participants who complete 5 tasks.”

Validating Product Ideas

“As you add more and more users, you learn less and less because you will keep seeing the same things again and again. There is no real need to keep observing the same thing multiple times, and you will be very motivated to go back to the drawing board and redesign the site to eliminate the usability problems.

After the fifth user, you are wasting your time by observing the same findings repeatedly but not learning much new.”

NN/g
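
The diminishing returns described in the NN/g quote are often modeled with the Nielsen and Landauer formula, where testing n users uncovers a share 1 − (1 − L)^n of the usability problems. Note that the formula and the value L = 0.31 are not part of the quotes above; I am bringing them in as an assumption, and the function name `share_of_problems_found` is hypothetical. A minimal Python sketch:

```python
def share_of_problems_found(users, problem_visibility=0.31):
    """Share of usability problems found after testing `users` participants.

    problem_visibility is L, the proportion of problems a single user
    uncovers. 0.31 is an assumed typical value, not a figure from the
    quotes in this post.
    """
    return 1 - (1 - problem_visibility) ** users


for users in (1, 3, 5, 10, 15):
    share = share_of_problems_found(users)
    print(f"{users:>2} users: {share:.0%} of problems found")
```

Under that assumption, the curve flattens quickly after the first few users, which matches the recommendation of five participants.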

Interviewing

Number of users to test: 10

Source: Validating Product Ideas

Quote:

“Interviews generate huge amounts of rich data, somewhat similar to the amounts you might collect in observation (see Chapter 3) or diary studies (see Chapter 4). These large amounts of collected data directly affect your choice for the number of interviewees you include in the study. As in other qualitative methods, keep this number low and digestible.

Ten interviewees is a good number. More than that means this is a large study. You will need more time or hands when it comes to analyzing data and coming up with results. Ten interviewees are good also in terms of study length. If your team splits into two interviewing pairs, each pair can complete five interviews in one day and finish data collection in one day. Alternatively, if you can only conduct interviews in the evening, schedule two interviews per pair per evening for a data collection time of three days”.
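
The scheduling reasoning in the quote boils down to a ceiling division. Here is a minimal Python sketch of it; the function name `data_collection_days` is a hypothetical helper, and it assumes every interviewing pair works every day at the same pace.

```python
import math


def data_collection_days(interviewees, pairs, interviews_per_pair_per_day):
    """Days needed to interview everyone, rounding partial days up."""
    interviews_per_day = pairs * interviews_per_pair_per_day
    return math.ceil(interviewees / interviews_per_day)


print(data_collection_days(10, 2, 5))  # full days -> 1
print(data_collection_days(10, 2, 2))  # evenings only -> 3
```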

Diary Study

Number of users to test: 8

Source: Validating Product Ideas

Quote:

“Diary studies generate huge amounts of rich data, somewhat similar to the amounts you might collect in observation (see Chapter 3) or interviewing (see Chapter 2). These large amounts of collected data directly affect your choice for the number of participants you include in the study. As in other qualitative methods, keep this number low and digestible.

Eight participants is a good number, yet any number between 6 and 12 participants makes sense. More than that means this is a large study. If that is the case, you will need more time or hands when it comes to analyzing data and coming up with results”.

Experience Sampling

Number of users to test: 20

Source: NN/g and Validating Product Ideas

Quote:

“Experience sampling generates huge amounts of data that affect your choice for the number of participants you can include in the study. The number of participants should be a trade-off between having enough participants who contribute enough answers to the question you ask over and over again and having a number that is too much to handle. For example, 5 participants is a very small number that will not get you enough data and verity. If these participants give you 5 answers each day for 5 days, you’ll have 125 answers. That’s not enough. On the other hand, 1,000 participants are probably too many for you to handle. Imagine if each one of them contributes 5 answers each day for 5 days. That’s 25,000 answers that need to be read, classified, and analyzed. Can you handle that?

Depending on how many answers you want, make sure that the number of participants is relatively low and digestible. Almost any number between 25 and 200 participants is something that probably makes sense. 500–1,000 answers is a range of answers you can work with, be confident it’s comprehensive, and handle alone or with a team of people who support the analysis”.

Validating Product Ideas

“Also, remember that the +/- 19% is pretty much a worst-case scenario; you’ll do better 90% of the time. The red curve shows that half of the time you’ll be within +/- 8% of the mean if you test with 20 users and analyze data from 19. In other words, half the time you get great accuracy and the other half you get good accuracy. That’s all you need for non-academic projects”.

NN/g

Example:

20 users × 5 responses/day = 100 responses/day

100 responses/day × 10 days of study = 1,000 responses
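
The same calculation can be used to try different study setups before recruiting. This is a minimal Python sketch: the 500 to 1,000 range is the one named in the Validating Product Ideas quote, while the function names `total_answers` and `is_workable` are hypothetical and the range check is my own simplification.

```python
def total_answers(participants, answers_per_day, days):
    """Total answers collected over the whole study."""
    return participants * answers_per_day * days


def is_workable(answers, low=500, high=1000):
    """True if the answer count falls in the 500-1,000 range from the quote."""
    return low <= answers <= high


setups = [
    (5, 5, 5),      # too few participants (example from the quote)
    (1000, 5, 5),   # too many participants (example from the quote)
    (20, 5, 10),    # the example in this post
]
for participants, per_day, days in setups:
    answers = total_answers(participants, per_day, days)
    print(participants, "participants:", answers, "answers, workable:", is_workable(answers))
```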

Agustina Feijóo

UX Designer currently based in Argentina, working remotely for the world. - www.uxagustina.com