The Right Number of User Interviews


When planning out product research, I usually have to estimate the number of generative user interviews to conduct. Teams need to know how many users it will take to confidently make product decisions based on their feedback. Budgets, schedules, and the success of the product itself depend on a reasonable estimate.

I estimate a round of generative user interviews like this:

  • Start with 5
  • Increase by up to 5 more depending on domain complexity
  • Multiply by the number of personas covered
  • Add one to three interviews to work out process kinks if this is new territory or a new team

Justification below, but first some background:

You’re after saturation.

The key goal qualitative researchers shoot for is saturation.

“Theoretical saturation occurs when all of the main variations of the phenomenon have been identified and incorporated into the emerging theory.” (Guest et al., 2006)

A qualitative researcher’s job is to learn all of the ideas, behaviors, stories, or experiences from a set of subjects within a domain. The researcher starts with an open-ended question, breaks it down into a series of smaller questions or topics, recruits, and begins interviews. Research hits a saturation point when incremental interviews don’t introduce new information. Since every new insight is important, you want to get reasonably close to a complete understanding of your domain.

So, interview until you don’t learn much from new interviews.

To understand whether you’ve arrived at saturation, compare successive synthesis sessions after interviews. How many new flashes of insight came up? How many new categories did you create to account for user sentiments? When the learnings taper, move on. This should be accompanied by a collective sense among the team that you’ve learned enough.
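
One way to make that comparison concrete is to count how many new categories each interview adds during synthesis. The sketch below is only an illustration: the function names, the sample codes, and the thresholds (`window` and `max_new`) are my own assumptions, not a standard definition of saturation.

```python
def new_codes_per_interview(coded_interviews):
    """Return how many previously unseen codes each interview introduced."""
    seen = set()
    new_counts = []
    for codes in coded_interviews:
        fresh = set(codes) - seen
        new_counts.append(len(fresh))
        seen |= fresh
    return new_counts


def looks_saturated(new_counts, window=3, max_new=1):
    """True when each of the last `window` interviews added at most `max_new` new codes.

    Both thresholds are arbitrary planning assumptions; tune them to your study.
    """
    if len(new_counts) < window:
        return False
    return all(n <= max_new for n in new_counts[-window:])


# Hypothetical categories assigned during synthesis, one list per interview.
interviews = [
    ["commute", "pricing", "trust"],
    ["pricing", "onboarding"],
    ["trust", "support"],
    ["pricing", "commute"],
    ["support", "trust"],
    ["onboarding", "billing"],
]

counts = new_codes_per_interview(interviews)
print(counts)                   # [3, 1, 1, 0, 0, 1]
print(looks_saturated(counts))  # True
```

A tally like this supports the team's judgment rather than replacing it. A single late interview that opens up a whole new category is a reason to keep going, whatever the count says.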

There’s no single correct number

Expert recommendations are all over the map regarding the “right” number of interviews to conduct, across scientific and user research communities.

  • Gaskin, Griffin, Hauser & Co recommend 10–30 interviews that produce 75–150 statements from customers in Voice of the Customer.
  • Daniel Bertaux said “15 is the smallest acceptable sample” in a 1981 paper on sociological research.
  • Greg Guest and colleagues concluded from an ethnographic study that they had identified 94% of their most frequently applied research codes within the first 6 interviews and 97% within 12.
  • Nielsen Norman says that 5 usability test subjects are all you should target per round.*
  • The 560 qualitative research papers that Mark Mason surveyed in 2010 conducted anywhere from 1–95 interviews, averaging 31 with a large standard deviation.

There’s no clear consensus on the topic because the researchers are conducting different kinds of interviews, in different domains, with different populations.

Studies claiming that very few user interviews are sufficient assume a tight, favorable set of conditions, like usability testing an update to one mobile app scenario. This is like saying you can feel confident driving 120 mph on the freeway when it’s sunny with good visibility, the road is straight, and your Porsche is in good repair. There’s a good reason why the speed limit is a lot lower than 120.

Usability testing a product among one homogeneous group of users will start to feel repetitive after just a few interviews, and you can reasonably plan for a small number of them (5 or so). On the other hand, performing ethnographic research on a large domain and a diverse set of individuals requires a larger sample.

How to (Roughly) Estimate Your Sample Size

To roughly estimate the number of interviews you’ll need to conduct, consider a few factors:

1. Diminishing Returns

You’ll learn less from each successive interview. The curve of new insights has a diminishing-returns shape: it climbs steeply over the first few interviews and then flattens out.

When interviews start to feel repetitive, you’re not likely to gain much by doing more. Pull the rip cord and don’t conduct unnecessary interviews.
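
For a rough feel of that shape, here is a sketch that borrows the formula from Nielsen’s usability article cited below: the share of findings seen after n sessions is 1 - (1 - L)^n, where L is the share a single session uncovers. The 0.31 default is the average Nielsen reports for usability problems. Applying the same curve to generative interviews is my assumption, and the function name is hypothetical.

```python
def share_found(n_interviews, per_interview_rate=0.31):
    """Expected share of all findings seen after n interviews: 1 - (1 - L)^n.

    per_interview_rate (L) defaults to 0.31, the usability-testing average
    from Nielsen (2000); treat it as a placeholder for other research types.
    """
    return 1 - (1 - per_interview_rate) ** n_interviews


# Print the cumulative share and the gain contributed by each extra interview.
previous = 0.0
for n in range(1, 11):
    total = share_found(n)
    print(f"interview {n:2d}: total {total:.2f}, gain {total - previous:.2f}")
    previous = total
```

Under these assumptions about 84% of findings show up within five interviews, and each later interview adds only a few points. A larger or less familiar domain behaves like a smaller L, which stretches the curve out and pushes saturation later.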

2. The size of your domain

The size of the topic you’re researching changes the amount of information you’ll need to gain before you’ve hit saturation. Usability testing for a single scenario or screen is a relatively tight domain, so 5 interviews might be just fine. Bigger, more nuanced, and less-understood domains bump up your expected sample size. Exploring a big issue like young people’s opinions about healthcare coverage, a broad emotional issue like postmarital sexuality, or a poorly-understood domain for your team like mobile device use in another country can drastically increase the number of interviews you’ll want to conduct.

3. Your population’s heterogeneity

The diversity of your population is a key indicator of how many interviews you should conduct. I like to think about this in terms of personas. Lauren Gilchrist’s article is a good explanation of how we do that at Pivotal. Your personas are basically a set of self-consistent people who represent your research population or market. If you’re researching around one persona (say, “Bill,” a 45-year-old married tax accountant from Milwaukee who hates his morning commute), you don’t need to add interviews to your baseline. But each additional persona is a multiplier: if you want to understand two groups of people, interview the baseline number of each of them.

Take this example for a product that connects politicians to their constituents:

I have one persona for my politicians and one for my constituents — they have different characteristics, wants, needs, and problems, so my learning curve for each of them will be relatively independent of the other. I double the number of interviews in my estimate and recruit from both groups.

For personas who are similar or who interact similarly with your product, you might not quite need to multiply your sample. For example, if I’m making a shopping app that men and women use similarly, I don’t need to double my sample size just because I have male and female personas. However, I’ll want to make sure to represent those groups correctly in interviews to control for selection bias.

4. The skill and experience of the interviewer and note-takers

It takes time to warm into a round of user interviews. The script isn’t quite right, the interviewers are new or rusty, the stupid Google Hangout keeps glitching — whatever. Unless interview conditions are completely solid, I need an interview or two to warm up and make adjustments. Likewise, the team usually needs some time to sync on note-taking and synthesis methods. I treat this like a learning curve in front of the diminishing-returns curve for saturation. I tack on an extra interview or three when things are still new.

Summing it up

Repeating the above, I estimate a round of user interviews like this:

  • Start with 5
  • Increase by up to 5 more depending on domain complexity
  • Multiply by the number of personas covered
  • Add one to three interviews to work out process kinks if this is new territory or a new team
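
The four steps can be written as a small planning helper. This is a minimal sketch: the function and parameter names are hypothetical, and it assumes the warm-up interviews are added once, after multiplying by personas, which is how I read the list above.

```python
def estimate_interviews(personas=1, domain_bump=0, warmup=0, baseline=5):
    """Planning estimate: (baseline + domain_bump) * personas + warmup.

    domain_bump: 0 to 5 extra interviews for a larger or less familiar domain.
    warmup: 0 to 3 extra interviews for a new team or new territory.
    """
    if personas < 1:
        raise ValueError("personas must be at least 1")
    if not 0 <= domain_bump <= 5:
        raise ValueError("domain_bump should be between 0 and 5")
    if not 0 <= warmup <= 3:
        raise ValueError("warmup should be between 0 and 3")
    return (baseline + domain_bump) * personas + warmup


# One persona, tight domain, experienced team.
print(estimate_interviews())  # 5

# Politicians and constituents, a moderately complex domain, a new team.
print(estimate_interviews(personas=2, domain_bump=2, warmup=2))  # 16
```

For similar personas that don’t warrant a full multiplier, pass a smaller `personas` value than the real count, or simply adjust the result by hand.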

This method has held up well on dozens of rounds of user interviews across several of my client projects at Pivotal Labs, but I always use it as a planning tool for budgeting time and money, not as a fixed count of the interviews we actually conduct.

It goes without saying that there are a ton of other factors to consider when planning your actual interviews. Coming up with the right questions, controlling your researcher bias, pulling a reasonably representative sample, and learning to explore the extremes of your research topics are the difference between gathering the right information and the wrong information, no matter how completely. The primary goal of your research should always be to gain empathy for your customers — the better you can understand things from their perspective, the better you’ll be able to make something they find valuable.


* Ethnographic research, exploratory interviews, contextual inquiry, and usability testing are different types of qualitative research, but they all involve performing user interviews to a saturation point. I believe the type of research you’re conducting influences the size of your domain, so I account for research type using that lever.


References

Bertaux, Daniel (1981). From the life-history approach to the transformation of sociological practice. In Daniel Bertaux (Ed.), Biography and society: The life history approach in the social sciences (pp. 29–45). London: Sage.

Gaskin, S. P., Griffin, A., Hauser, J. R., Katz, G. M. and Klein, R. L. 2010. Voice of the Customer. Wiley International Encyclopedia of Marketing. 5.

Guest, Greg; Bunce, Arwen & Johnson, Laura (2006). “How many interviews are enough? An experiment with data saturation and variability”. Field Methods, 18(1), 59–82.

Mason, Mark (2010). Sample Size and Saturation in PhD Studies Using Qualitative Interviews [63 paragraphs]. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, 11(3), Art. 8, http://nbn-resolving.de/urn:nbn:de:0114-fqs100387.

Morse, Janice M. (2000). Determining sample size. Qualitative Health Research, 10(1), 3–5.

Nielsen, Jakob. (2000). Why You Only Need to Test with 5 Users. Nielsen Norman Group. http://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/

Ritchie, Jane; Lewis, Jane & Elam, Gillian (2003). Designing and selecting samples. In Jane Ritchie & Jane Lewis (Eds.), Qualitative research practice: A guide for social science students and researchers (pp. 77–108). Thousand Oaks, CA: Sage.