Using list experiments to research sensitive topics

When you can’t just ask someone “are you a racist? Y/N”

Chris Liu
Bootcamp
7 min read · Feb 9, 2023


Unhappy woman covering face with hand
(Photo by Keira Burton from Pexels)

List experiments are a way to estimate how many users hold a belief or engage in a behaviour that they don’t want to admit to (aka when you’re facing social desirability bias). That’s a pretty specific use case. But I think it’s a super cool yet accessible research method that’s not commonly seen in UX. When I search “list experiment” on Medium, I only see articles that are actual lists of experiments that the author has done, is doing, or wants you to do.

I’ll give an overview of this method and then instructions on how to do one. I’ll use number-y words at times, but please don’t let any of this intimidate you if you’re a qualitative researcher—I’m writing this primarily for you and I promise that this method is straightforward enough that you don’t need a statistical background to do it.

I think of the list experiment as one tool in my methodological toolbox, and UX researchers thrive when we have many tools to draw on for the many different problems we encounter. My hope is that after reading this, if you do encounter a situation where you expect to run up against social desirability bias, you’ll have at least one tool you can use to tackle it.

What are list experiments?

Illustration of person looking at two screens side-by-side
(Undraw.co)

To understand list experiments, let’s first talk briefly about experiments in general (or A/B tests in tech lingo).

Say you want to understand whether checkout flow A or B leads to more sales. You could show your users flow A for a while, then switch over to B, then compare results. But then you couldn’t rule out the possibility that the folks who saw A were different from the folks who saw B, and that those differences in users were what drove any differences in sales. Maybe users who saw A had higher incomes, or users who saw B were looking to buy different products. Those are perhaps things you can control for in a regression model, but can you control for every possible factor that can affect sales?

Not via regression — but you can with a randomized experiment. By randomly assigning users to see A or B, you create two groups of people that are statistically identical (the groups are equal in expectation). Importantly, the groups are equal (in expectation) not only on things like income, but on all possible factors and attributes. The only difference between A and B users is what checkout flow they saw, which means you can attribute any difference in sales to the checkout flow itself.

This is why randomized experiments are considered the “gold standard for causal inference.” Their magic and power lie entirely in randomization.

List experiments build on that idea. In both the A and B groups of a list experiment, users are shown a list of beliefs or behaviours and are asked how many of those items (not which ones) apply to them. Users in A and B see the exact same list, with one crucial difference: the B list also contains the belief or behaviour that you actually want to ask about.

The logic of the list experiment is this: if no one in your sample held the belief or did the thing you’re asking about, the average number of list items users check off would be identical in both groups, because no one in B would have checked off the additional item. And because users are asked only to say how many items apply, not which ones, they are more likely to own up to a socially undesirable item. Therefore, you can use any difference between A and B to estimate the prevalence of the belief or behaviour you’re studying among your sample.

The most well-known example from my own discipline is Kuklinski et al. 1997. In that study, the researchers wanted to study negative attitudes towards Black people among Whites in the American South. Half of the White Southerners in the study — randomly selected — were asked how many of the following made them angry:

  1. The federal government increasing the tax on gasoline
  2. Professional athletes getting million-dollar contracts
  3. Large corporations polluting the environment

The other half saw the exact same list, with one additional item:

  4. A Black family moving in next door

By subtracting the mean number of items that made people angry in the first group (1.95 items) from the mean in the second group (2.37 items), the researchers estimated that 42 per cent of White Southerners were angered by the thought of a Black family moving in next door — pretty shocking evidence of the prevalence and endurance of racial prejudice.
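
If it helps to see the arithmetic spelled out, here’s a minimal sketch in Python. The group means are the published figures from the study; in your own study, you’d compute them from the raw responses.

```python
# Difference-in-means estimator for a list experiment.
# Group means below are the published figures from Kuklinski et al. (1997);
# with your own data, compute them from individual responses.

mean_control = 1.95    # three-item list (no sensitive item)
mean_treatment = 2.37  # four-item list (sensitive item included)

# The estimated prevalence of the sensitive attitude is simply the
# difference in mean item counts between the two randomized groups.
prevalence = mean_treatment - mean_control
print(f"Estimated prevalence: {prevalence:.0%}")  # Estimated prevalence: 42%
```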

Step-by-step instructions

Note: I include here only what’s specific to list experiments and only the bare minimum to get you on your way. Important topics like sampling I leave for another time. There are references at the end if you want to build on what’s here and learn more about the ins and outs.

Person checking off items on a list in a notebook
(Photo by Glenn Carstens-Peters on Unsplash)

One: Scope and plan your research

The starting point for any research project. Understand the problem, what your stakeholders want to know about users, and why they want to know it. In particular: is the belief or behaviour we want to capture something that we expect at least some users won’t want to admit to? Is there past research that suggests this is the case? And if the study did return an estimate, what would our team be able to accomplish with that information that we couldn’t accomplish otherwise?

Two: Create your survey or interview protocol

List experiments are most often built into survey questionnaires. Your survey platform will need to allow for randomization. In Qualtrics, for example, use the Randomizer element in the survey flow to show each version of your list to a randomly selected group of users. You can accomplish the same in Alchemer through Percent Branching.

There’s no reason why a list experiment can’t be done in a more manual fashion, however: say, as part of a series of user interviews. If you go this route, remember that larger-than-typical interview sample sizes will make your tests for statistical significance between the two groups more trustworthy. The most important thing is to randomly assign your interviewees, and to do so in a way that you can defend. Use a random number generator to create a set of random numbers, match your interviewees to that set, and use the corresponding variant during the interview.
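
If you want a concrete way to do that assignment, here’s a minimal sketch in Python. The participant IDs are hypothetical placeholders, and fixing the seed is just one way to make the assignment reproducible and auditable.

```python
import random

# Hypothetical participant IDs; substitute your actual interviewees.
interviewees = ["P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08"]

# A fixed seed makes the assignment reproducible, and therefore defensible.
rng = random.Random(42)

shuffled = interviewees[:]
rng.shuffle(shuffled)

# The first half sees list A (control); the second half sees list B,
# which includes the sensitive item.
half = len(shuffled) // 2
assignment = {person: ("A" if i < half else "B") for i, person in enumerate(shuffled)}
print(assignment)
```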

When designing your list of items, create items that are plausible and relatively neutral, and avoid having more than three or four items. The goal is for the user to be reassured that they can voice a socially undesirable belief or behaviour without the researcher knowing it for sure, and to avoid them having to do too much counting or math on the spot.

Remember that the response you’re collecting is not which items the user agrees with, but how many.

Three: Analyze the results

Whether you’ve gone the survey or the interview route, after you’ve collected the data you should have a set of individual responses, where each response is the number of items that person selected. Calculate the mean number of items for the group of users who didn’t receive the item of interest, and calculate the mean for the group who did.
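
If you prefer to do this in code, here’s what the calculation looks like in Python. The response counts below are made up for illustration; substitute your own exported data.

```python
# Each value is the number of items one participant said applied to them.
group_a = [1, 2, 1, 3, 2, 1, 2, 2]  # control list (no sensitive item)
group_b = [2, 2, 1, 3, 2, 2, 2, 2]  # treatment list (sensitive item included)

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

print(f"Control mean: {mean_a:.2f}")    # 1.75
print(f"Treatment mean: {mean_b:.2f}")  # 2.00
print(f"Estimated prevalence: {mean_b - mean_a:.0%}")  # 25%
```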

Lastly, perform a test to see if the difference between the two groups is statistically significant (i.e. how confident you can be in rejecting the idea that any difference you see is due to random variation). This part might seem daunting if you’re a purely qualitative researcher, but I assure you it’s not difficult, and you don’t need statistical analysis software or programming skills to do it.

The first step is to choose the appropriate test for your data; most likely you’ll want to use what’s called an unpaired two-sample t-test, but GraphPad has a guide that clearly walks you through what your options are and what they mean. The second step is to perform the test itself, and again GraphPad has an online calculator that you can use, though you can find many calculators out there.
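
And if you’d rather run the test in code than in an online calculator, SciPy covers the unpaired two-sample t-test. This sketch reuses the hypothetical responses from the previous example.

```python
from scipy import stats

# Same hypothetical responses as in the previous sketch.
group_a = [1, 2, 1, 3, 2, 1, 2, 2]  # control
group_b = [2, 2, 1, 3, 2, 2, 2, 2]  # treatment

# Unpaired two-sample t-test. Pass equal_var=False if you'd rather not
# assume the two groups have equal variances (Welch's t-test).
result = stats.ttest_ind(group_b, group_a)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")

# A small p-value (conventionally below 0.05) suggests the difference
# between groups is unlikely to be due to random variation alone.
```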

Then you’re done!

To recap: when we want to study behaviours or attitudes that users don’t want to admit to, social desirability bias can make it difficult to measure how many users actually hold that attitude or exhibit that behaviour. The list experiment is one method for making those measurements, and a valuable tool to have in your UXR methodological toolkit.

Further reading

For some advanced techniques when working with list experiment data:

Blair, Graeme and Kosuke Imai. 2012. “Statistical Analysis of List Experiments.” Political Analysis 20: 47–77.

Glynn, Adam. 2013. “What Can We Learn with Statistical Truth Serum? Design and Analysis of the List Experiment.” Public Opinion Quarterly 77: 159–172.

For ideas on correcting list experiment failure when working with marginalized populations:

Kramon, Eric and Keith Weghorst. 2019. “(Mis)Measuring Sensitive Attitudes with the List Experiment: Solutions to List Experiment Breakdown in Kenya.” Public Opinion Quarterly 83(1): 236–263.
