User Testing: Doing it Better. Doing it Faster.
This article is a primer for my Front Workshop in January 2017 on remote user testing. But I hope you’ll find the content valuable whether you’re attending or not.
For some of you, some of this will be review. But I’m hoping all of you will learn something new or consider old information in a new light. We all got on board the UX design or product management train for different reasons. But I’m willing to bet, for most of us, it was because we love building cool technology. We like being part of the creative process. We are, by our very nature, all of us, makers.
I got on the UX train at the stop entitled “Information Architecture”. In 2011 I attended my first real tech conference, a little event called An Event Apart. There I learned about something called information architecture. Fast forward a month or so, when the yearly A List Apart survey findings were released, and I read that the job title of “Information Architect” was at the very top of the web maker’s pay scale. At that point, I was sold. I had very little idea what an Information Architect was, but I knew I was going to be one. I bought a book about IA and started down the path to UX design.

As a junior web engineer / project manager, I had no credentials the day I asked my boss if I could be the Director of UX (a fancy title that meant: first guy at the agency to try to do UX without any real clue how to start). Imagine my excitement when he said, “I think that’s a great idea!” It was my second week on the job when I knew I couldn’t do it alone. An engineer I knew well came up to ask me how a certain interface should behave. In the course of the conversation I remember him calling me “the expert.” I felt then, and I still feel now, that this is not a title I am worthy of. I didn’t have a good solution or answer for my friend, so my response was, “I’m not the expert. The user is the expert. Let’s try an experiment with the users.” That is still my answer.
Like many of you, my entry point to the product train was not at the “user testing” stop. Even though I recognized the absolute need for it, and quickly became reliant on it, for a long time I felt that user testing was the “annual doctor’s checkup” part of product building. It’s never convenient to plan, execute, and synthesize. And it’s certainly never convenient to deal with all of the changes you realize you have to make afterwards.
But herein lies the problem with the way user testing is perceived. Most companies treat user testing like it’s this big thing that takes a great deal of effort to plan and coordinate. When it is planned, it’s done with much ado and fanfare. And when it’s over, graphs and data points are drawn up. Video clips are sent out. Everyone congratulates the researcher for her fine work. And then everyone settles back into a routine until next quarter, when someone sighs and says, “I guess we better put together another user test campaign.” Done like this, it’s no wonder executives might raise an eyebrow (or even two) if told that the product team wants to run user tests every week.
But I would submit that if this is the current climate for testing at your place of employment, it is wrong. Props for doing user testing every once in a while, and all… but guys, we can do better. I want to talk to you about how testing can be done more efficiently, more accurately, and more often.
First, some things that will increase your efficiency. One of the best ways to speed up a test is to eliminate time-wasting questions. It’s important to discuss what we can and cannot learn in a user test so we can become more streamlined. Let’s eliminate the waste: the rubbish questions that get us arbitrary information at best, and harmful information at worst.
What We Cannot Learn in User Testing
Testing WILL NOT teach us: Whether something is pretty or not
Beauty is subjective, and therefore is not easily measured. It’s not the same as measuring quantifiable data. And yet, so many of us want to pretend we can get to the “best design” from user testing. If we ask a group of people their opinion on the aesthetics of something and average out the results, that’s exactly what we will get: average. An average experience. We will eliminate the “creative edges”: the risks and uniqueness that a visual designer spent hours creating. We will snuff all that out in the name of “user centered design”. Doubt me? Here’s a great talk that may help convince you.
Beauty is subjective, and therefore is not easily measured. It’s not the same as measuring quantifiable data. And yet, so many of us want to pretend we can get to the “best design” from user testing.
Now, we’ve heard for years that whenever we have a design question or debate with a team, the way to solve it is through testing. And I don’t disagree, unless we’re talking about aesthetics. We cannot — we should not be supporting our arguments about color and typography with user research. Unfortunately, we’re going to have to find other ways to handle those internal struggles. I’ll talk more about this further down the page.
The designer should be the only person who cares how thick the drop-shadow should be. Hire the right visual designer. Then trust their judgement. Don’t waste time in interviews on arbitrary design decisions that won’t get you better data in the end anyway.
Testing WILL NOT teach us: What users have done in the past, or would do in the future
Humans are terrible at self-reporting. Questions like “How many hours a week do you spend doing X?”, “How have you solved problem Y in the past?”, or “If you were trying to do X, what would you do?” are not going to get us accurate responses. These questions are rubbish because their answers are rubbish. Don’t waste your time in user testing interviews asking self-reporting questions. I’ve heard, and even used, the argument, “Well, it’s just to get an idea or a range.” There are better ways to get this data.
Don’t waste your time in user testing interviews asking self-reporting questions.
First off, the data is going to be wildly inaccurate (and you know it will be), so you’re wasting everyone’s time: the participant’s, the product owners’ you’ll share the data with, and your own. Second, you can find this data more accurately and with a lot less effort (RJ Metrics, Fullstory, Google Analytics, for goodness sake). Who are we helping by sticking these questions in our elaborate “moderator’s guide”? Is it because we want to seem thorough to the boss who wants to see our list of questions ahead of time? Throw them out; they won’t help you.
Testing WILL NOT teach us: If users will buy/use it
It’s been 45 minutes. You’ve been speaking with Martha all about how she uses your Craft-O-Matic product. She’s complimented your designs. You’ve complimented her afghan. She’s made mistakes, you’ve learned things. You have formed a researcher and participant bond. A special thing. And now you’re at the end of your list of questions and you ask, “So, now that you’ve seen what Craft-O-Mega-Matic product will do, would you buy it? How much money would you expect to pay?”
Seriously? Do you really expect your new friend Martha to tell you she isn’t really interested in it? I mean, maybe she took a vow of honesty and lost all her friends long ago because she told them what she thought of their poor mothering techniques when asked. Maybe she has no problem ruining your afternoon. But if she’s not completely pathological, she will probably let you down easy, or straight up lie about her interest in your shiny product.
A better way: find out whether they’re already spending money, time, or energy, or otherwise attempting to solve the problem that your product solves. If the answer is yes, chances are they would expend resources to procure your product too.
Example: I once was asked to give some feedback to some students who were busy creating a smartphone application which would help smart grocery shoppers plot their pathway through their local supermarket in an efficient manner. “This app,” they told me, with great enthusiasm (naive little college kids. Oh how soon their dreams of truth, justice, and an ideal society will be crushed). ;) Kidding. Anyway. “This app, will help a potential shopper create their shopping list right here in the app. And then when they go to the store, the app will chart them an ideal course through the store.” Sounds amazing. I asked what testing they had done. And to their credit, they had done many interviews. They said that people who shop a lot told them they would love an app like that. Of course they did.
Instead of asking if they would buy your magic thing, try to observe behavior. Find out how they are using their time as it relates to the problem you’re trying to solve now.
But let’s think about that a moment. I have three kids and I go to the store way more often than I would like. Sure, the idea of more efficiently navigating a grocery store sounds nice. But am I currently doing anything to be more efficient in my grocery store excursions? I almost never even create a list, let alone expend the time to type one into the little typing pad on my phone. No thanks. And even if I did, I am currently expending no effort at all to memorize the store or otherwise become more efficient there. I start at one end and scan my list as I cross each aisle. “Let’s see, is yeast on the bean aisle or the baking aisle?” It has never occurred to me to try to memorize the layout, or find any other strategy to help me be faster at this chore. Let me be clear: I love the idea of being more efficient at the store. But since I am doing nothing about it now, the likelihood of my doing so with an app (let alone one I pay for) is nil.
Behavior never lies.
So there you go. Another wasted question you don’t need to ask. Instead of asking if they would buy your magic thing, try to observe behavior. Find out how they are using their time as it relates to the problem you’re trying to solve now. I’m sure it’s a problem, or you wouldn’t be there. But is it a problem they are spending resources and energy to try to solve? Or is it a problem they just live with and can’t be bothered to do anything about? Behavior never lies.
What You Can Learn in User Testing
Testing WILL teach us: If they trust your product
You either already know this, or you soon will: there will always be debates about design between product owners. What color something should be, where a certain button should go, someone doesn’t like a certain layout, and so on. The way to answer these questions isn’t to ask users what they like, but what they trust. To understand if they trust it, ask questions such as, “When you click this button, what do you think it’s going to do?”, “Do you trust that this product will do what it claims it will do?”, or “Is there anything here that you don’t trust to do what you need it to do? What makes you doubt it?” These questions about trust and function quickly make user preferences clear and, hopefully, steer those aesthetic arguments in the right direction.
Whether a user likes the visual design doesn’t matter as much as if they trust it. Unless it’s distracting.
Keep in mind that whether a user likes the visual design doesn’t matter as much as if they trust it. Unless it’s distracting. Comments like “The color of that button is distracting and makes it hard to read,” are important comments to pay attention to. You want to ensure the design doesn’t harm the usability. But otherwise, don’t waste time in interviews talking about arbitrary aesthetics. Focus on trust.
Testing WILL teach us: If it’s usable
When I was a young, innocent anthropology major (before my dreams of truth, justice, and an ideal society were crushed), I learned in my linguistic anthropology class that when cultures come in contact, even when they share a common language, there are (obviously) rifts. These happen where expectations and assumptions, culture to culture, go unmet. My professor called these “rich points”, and they were the focus of our studies.
User testing is much like that. Our product is one culture and our user is another. They might share a language, but they certainly don’t share assumptions. Expectations will go unmet. Look for rich points. Expectations are absolutely key to everything. The very first time a user clicks a button, you lose an opportunity if you don’t know what is going through their mind. What did they think it would do before clicking? What surprises occurred on the other end? Sometimes the surprises are good, sometimes bad. But if we’re not hyper-focused on finding these rich points, we’re wasting everyone’s time.
Our product is one culture and our user is another. They might share a language but they certainly don’t share assumptions. Expectations will go unmet. Look for rich points.
Remember, it is their assumptions that matter. If a user is trying to use a feature in a way it wasn’t intended, there are really two options. You can change the affordance and alter the experience to eliminate the possibility of misunderstanding its purpose. Or you can “make them right”: change the experience to match what they expected it to be.
Testing WILL teach us: Efficiency of the product
You can, and should, measure how fast users accomplish certain tasks. It might be tempting to measure this in a user test, and it’s possible. But in my experience it’s hard: you’ve trained your participant to think out loud and walk you through every step, and now you want to time them on a task. It is much better to observe this through data you can glean elsewhere. I like using Fullstory for this. I can observe users after the fact, without them having any idea that I am. I can get the exact timestamps of when they began a task and when they finished. I can see if they navigated away from the page (and were thus distracted) or left the computer. I get far more accurate information, far faster, with Fullstory than by trying to measure efficiency within a user test.
On the other hand, if you are testing the potential of a product for efficiency, you may not have any alternative to live, in person testing. Testing, for instance, a prototype of a proposed change to the software. In these instances a live interview and a stop watch are probably in order. But let’s try, when we can, to be more efficient in our efficiency measuring (that’s so meta).
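To make the timestamp approach concrete, here is a minimal sketch of computing a task duration from exported session events. The event names and log format are made up for illustration; a real session-replay export (from Fullstory or any analytics tool) will look different, so treat this as an assumed schema.

```python
from datetime import datetime

# Hypothetical export of one participant's session events.
events = [
    {"ts": "2017-01-09T14:02:03", "name": "task_start"},
    {"ts": "2017-01-09T14:02:41", "name": "navigated_away"},  # possible distraction
    {"ts": "2017-01-09T14:03:10", "name": "task_complete"},
]

def first_time(events, name):
    """Timestamp of the first event with the given name."""
    return next(datetime.fromisoformat(e["ts"]) for e in events if e["name"] == name)

def task_duration_seconds(events):
    """Seconds between the first task_start and the first task_complete."""
    start = first_time(events, "task_start")
    done = first_time(events, "task_complete")
    return (done - start).total_seconds()

distractions = sum(1 for e in events if e["name"] == "navigated_away")
print(task_duration_seconds(events), distractions)  # 67.0 1
```

The point is simply that wall-clock duration and distraction counts fall out of the data automatically, with no stopwatch and no think-aloud contamination.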
Improving User Testing Quality
I’ve spoken a lot about efficiency in testing. But I want to finish with a couple of important points that will help improve the quality of your time spent with users.
You Need the Right Participants
It’s vital that we have the right participants. It’s not enough to hit our desired number of study participants with warm bodies who loosely fit our general demographic description. I realize that most of you have a well-researched “target demographic”. Or maybe it’s a persona you use in journey mapping that you try to match as closely as possible with your participants. Or maybe, for some of you, it’s just friends or warm bodies from the street. But this is very important: it’s not enough to have just a person. In fact, the wrong participants can be very harmful to your results.
But I’m going to go even further and argue that it’s not even enough to have “the target demographic”. Take my company, rev.com, as an example: we create work-from-home opportunities for folks in the U.S. and abroad. Right now those opportunities revolve around transcription, translation, and video captioning. Our freelancers are generally stay-at-home moms, or folks who, for medical or other reasons, are unable to keep steady employment. It’s not hard to find a bunch of research participants who are stay-at-home moms. My wife knows a bajillion. But in the figure below you can see that the people who would actually apply for, and be good at, a freelance position at rev.com are a subset of the people who fit “the target demographic”.
When recruiting, then, let’s go a step further than just getting the right twenty-something guy who goes to the gym for the personal trainer app we’re working on. Let’s get someone who is already paying for a personal trainer and already uses a smartphone to track their progress. Can you see how those results might be more powerful? In my case, I quickly learned to use Craigslist to find and recruit stay-at-homers who were already searching for work-from-home opportunities.
Start Generally
Another best practice that many interviewers miss: start generally. You will very likely have specific hot-topic features that need to be tested (you know the ones: the CEO wants it to be a dropdown menu; the PM wants it to be a list on the page). It’s important, though, to let the user start out guiding the conversation toward what interests or distracts them. Otherwise you miss a very valuable opportunity. You also run the risk of fixating the user on something they may not have cared much about previously; subtly, you might end up skewing the results toward a data point that doesn’t serve the product’s interests.
So here are some examples of questions I like to start with, in order of how I typically use them. I don’t always ask all of these and I don’t always follow this exact order. And sometimes they just don’t apply to every test. But I try to let the user guide the initial conversation.
- What are your first reactions to this?
- Where does your eye go first?
- What is surprising?
- Do you trust X? (or, Do you trust that X will do what it claims?) Why or why not?
- Is anything distracting?
- What do you think happens when you click X?
Then you can get specific. Then you can tease out their thoughts on those hot-topic issues.
Keep them talking
When they trail off, restate the trail off.
Participant: “I just don’t understand…”
Interviewer: “You just don’t understand…”
Participant: “I just don’t understand why this button makes magic unicorns appear all over the page.”
Don’t answer their questions immediately
Once your participant begins interacting with the design, try to stay as quiet as possible, and if they ask a question, go ahead and just repeat it back to them.
Participant: “What would happen if I pressed this blinking red light?”
Interviewer: “What do you think would happen if you pressed the blinking red light?”
Participant: “The world will explode?”
Interviewer: “Why don’t you try it?”
We learn a lot more by not answering questions immediately. I’m not saying we never answer their questions. But certainly we use it as an opportunity to tease out those rich points we spoke about earlier.
Don’t be quick to come to the rescue
You shouldn’t get in the way of the user’s interaction with the interface in any way, and that means you can’t rescue them. If a user gets stuck, it might be tempting to help. But instead, explain to them that you understand it is awkward, but that it is better for the test if they struggle to figure it out alone. You will gather powerful data that way.
You shouldn’t get in the way of the user’s interaction with the interface in any way, and that means you can’t rescue them.
Just think how powerful video footage can be of a user struggling to accomplish a task involving a feature the CEO personally loves. You may not have the ability to convince said CEO that his feature is confusing; after all, you are not the expert. But watching a prized customer struggle will make a much more powerful case for change. Besides, watching a user struggle and eventually figure out the solution may teach you what the solution should have been all along.
Don’t correct the user
Remember, the user is the expert. They are never wrong. You are wrong, or the software is wrong, because you haven’t designed it in such a way that their expectations were met. If the user apologizes for getting something “wrong”, remind them that it’s the software being tested, not them. Users being “wrong” are important rich points. When a user tells me, “Oh! That’s cool. That button will add this word to my glossary,” and the button was actually intended to delete everything, I still don’t correct them. I’ve learned it’s better to wait and see if, or when, they figure it out for themselves. You learn a lot from watching this. Besides, if we correct them they may start trying to guess what we want them to do or say, not what they would normally do or say. They might start feeling inhibited or fearful in a way that drastically changes their natural behavior.
One other thing I have to say. You’ll recall that early on in this discourse, I mentioned that we should avoid discussions about visual design and aesthetics. However, if the user wants to talk about this, we should not correct them or brush off their suggestions. I usually nod my head and thank them for the suggestions. But I don’t really do anything with that information. Again, it’s rubbish. But just because it’s rubbish doesn’t mean we need to make the participant feel “wrong” about bringing it up.
Avoid manipulative questions
This stuff should be dead obvious. But my list is not complete without mentioning it: never guide your participants’ responses. You have a lot of power as a researcher. We can steer a user right down the course we want, and, if we have an agenda, we can often even get them to say what we want them to say. More importantly, we choose which quotes we share with the stakeholders and which data we present. Subconsciously, or even consciously, we may also want to avoid the extra work created when our early product ideas are found lacking. We may be proud of the design, or sick of iterating on it. Presenting the data accurately and fairly is our ethical obligation. More than that, it is how we improve the product that feeds us.
Presenting the data accurately and fairly is our ethical obligation. More than that, it is how we improve the product that feeds us.
Avoid the temptation to defend the software
Never defend the software, explain that something broken will soon be fixed, or promise that a requested feature is coming soon. This can be hard, especially if you have stakeholders with you who are anxious for a customer to know they’ve already thought through a solution and it’s on the roadmap. But if a customer suggests or requests something that is in the works, throwing that out there effectively tells them, “We don’t need your ideas. We’ve already thought through all of this.” They may be pleased and excited to hear it’s coming. But at some level, they will start to go on autopilot: “Oh, these guys don’t need me to be creative. They just want me to click on stuff so they can see if it works right.” Maybe you can still mention it after the interview is over. But before then, the best answer is, “Oh, that’s an interesting suggestion,” and write it down.
Okay. We’ve covered how to be efficient and how to take our testing to a higher level of quality. Let’s talk about how to test more often.
Most designers understand the importance of user testing, but not all understand the importance of testing frequently. Frequently doesn’t mean once every three months and only for the big stuff. Frequently means throughout the process, at every turn. Ideally, you are testing every week. But there are ways to do this so it’s not such a big deal. Here are some suggestions:
Don’t spend tons of time writing a “moderator’s guide”
It’s important to have a plan. It’s not a good idea to “wing it”: your interviews will all be different, and the data you gather will be diluted. But it’s also a huge time suck to build an elaborate test plan every week. Instead, keep one page with your list of general-to-specific questions (see Start Generally, above) that you always draw upon, plus a few bullets of things to have participants try. It doesn’t have to take long. If your current process includes sharing your moderator’s guide with the whole team and asking for feedback and approval, testing every week just isn’t going to work. Instead of a “big test” that requires lots of eyes on a complex guide, test smaller pieces of the product, more often. Don’t use documentation as an end (to impress the team with your thorough research practices), but rather as a means.
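To make that concrete, here is a sketch of what such a reusable one-page guide might look like. The general questions are the ones from earlier in this article; the specific bullets are hypothetical placeholders you would swap out each week.

```text
MODERATOR'S GUIDE (reusable)

General (always start here):
- What are your first reactions to this?
- Where does your eye go first?
- What is surprising?
- Do you trust that X will do what it claims? Why or why not?
- Is anything distracting?
- What do you think happens when you click X?

Specific (swap out weekly; examples only):
- Have them add an item using the new flow
- Watch for: do they notice the confirmation message?
```

One page, no approvals, reused every week; only the bottom half changes.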
Focus your test on one or two hypotheses
If your test campaign is titled “Quarterly Audit of ACME Nose Hair Trimmer App,” you’re doing it wrong. Focus the title of the campaign on the hypothesis you want to test. “Project Metatagging Design Solution r2 test.” Focus in on the smallest testable feature, if you can. I should note that it doesn’t always work that way. For instance, if the thing you’re testing is on the last page of a long employment application, you may need the user to accomplish the previous steps to have the necessary context and mind-set at the end. But maybe not. The point is, we should be focusing on “How should we design for this particular problem?” or “How do I make this particular experience flow better?”.
Use remote and on-demand technology
Not every test needs to be in person. My favorite tool for testing right now is Validately. I can very quickly whip up a test campaign for remote live testing or on-demand, asynchronous testing. Often the thing I want feedback on is very small, and spending the time to recruit and coordinate tests with at least six participants, just to ask five simple questions and observe them using a small feature, is not the best use of time. Instead, I spend 15 minutes building a test, put a link to it up on our freelancer forum, and ask people to jump on and take it whenever they want. I forget about it, and 24 hours later I have my six participants’ recordings (screen and voice) and important data I would not otherwise have had. This is much more powerful than saving all of these small tests up and asking them all at once in a monster user-testing campaign requiring hours and hours of effort spanning multiple pages and features.
Can I always go on-demand? No, of course not. But here’s where I value remote interviewing. Why would I spend four hours (or longer) out of my day driving to meet with users for two hours, when there is technology that lets me speak with them from my office? I’ve had so many cancelled user tests and no-shows. It’s just not a great use of time. I know I might be sacrificing a little of what I’d learn being in the users’ natural habitat and seeing their faces up close, and I still do in-person testing every once in a while to stay connected to them. But if I’m testing each week, there’s no way I’ll have time to travel that much. Instead of driving, again, I use Validately. Sure, I could use Google Hangouts, Skype, Bluejeans, Zoom, or a bunch of others. But Validately lets me control what URL the user goes to without sending it in chat or email (or worse, trying to dictate the testing URL). It also lets me assign a participant and observers, record the participant’s screen and face, and get the user set up and on the call much more smoothly.
You should feel uncomfortable without it
Just a few weeks ago a product manager found an old design I did, which was never implemented, and said, “This is exactly what we need! Let’s get this in next sprint.” But my design was just some exploratory work. I could have been flattered. I mean, I was flattered. He loved my design! It’s exactly what the business needs! But hold up. I had not validated the solution at all. Not a single user had seen it. This made me very uncomfortable and trumped any self-gratification which I may have felt. I explained that I agreed it was an excellent idea (all of mine are), but that it needed to be tested and refined. So we tested it, and, sure enough, we did change the design (happens every time).
The point is, you need to get to where you are thirsting for feedback. Where you feel as uncomfortable as a cardinal at speed-dating night if someone wants to rush a solution through to engineering before testing. My favorite thing to say when someone tries: “Look, this is going to get tested either way. Would you like to test it after it’s built, when changes potentially mean quality assurance time, wasted customer time and incurred ill-will, engineering debt, and so on? Or would you like to test it now, when it’s just a design?” I almost always get the answer I want.
Well, those are all my suggestions. Remember, meeting with users weekly doesn’t have to be a major ordeal. It can be done very simply, and it is so worth it to try. Look for my “Weekly User Testing Reminder and Suggestioner App” in the app store soon. (Kidding.)