Benchmark Usability Testing

A baseline strategy for core user flows

Mar 24, 2016

Paul, an early supporter of my product design mailing list, recently reached out to ask about the specifics of my process and strategy surrounding new features. Simply put, how did I get new feature ideas ranked, validated and built?

A fantastic question and one that led to some introspection about my validation process from past projects. What worked and didn’t work? What could I recommend to a fellow product designer to save him time? In short, the answer is to talk with your product’s target users. Then talk with them some more.

However, in an effort to give a more concrete answer with real world examples, I’m writing here about one strategy I’ve employed in the past to validate and rank the importance of new product features.

During my time consulting with Hewlett Packard in their research and development (R&D) division, my agile software team developed an ongoing strategy for our longer-term products. This strategy included a monthly usability test which sent first-time users from our target demographic through our product, end-to-end.

I would suggest that any larger product team incorporate this type of benchmarked usability testing into their process. Here, I’ll outline what to benchmark in this type of test and the nitty-gritty specifics of how our monthly benchmark test was run.

Losing Sight of Core Flows

Working within a corporate R&D department meant that our agile software team was tasked with new projects every few months. Once a successful product was formed, the project moved on to a larger, more permanent team. Then we continued on with what we did best, developing out new ideas into user-friendly products.

As new features are iteratively added to software over time, the core user flows within the application are bound to shift slightly. There are almost always unintended consequences to user flows introduced in parallel with new features.

It can be easy to lose focus on the bigger picture and overall goals of a software product, as we’re often head-down on a new feature. This is natural, as so much work goes into introducing a successful feature from strategy to testing prototypes with the target audience.

Our team recognized that this was beginning to happen when higher-up managers occasionally stopped by, confused about our more basic user flows, while we presented smaller, singular features. This led us to conduct a large-scale usability test of our product’s core flows.

After running through this end-to-end test just once, the team realized the potential benefits and ease of recreating this test. We realized that we’d just created the first set of data which we could then benchmark against in future testing sessions with new first-time users.

By focusing the test script on validating our primary user goals and the functionality of the current software as a whole, our report became a sort of “state of the software”. I was able to benchmark the product’s state and revalidate its success over time.

What to Measure

Your goal is to record and build out a library of test results, showing the changing state of your product over time. This could be later presented to the team or company in parallel with a record of when features were introduced and how these changed the benchmark test results.

In order to conduct a repeatable test, we need to measure success and failure through evaluative questions. Be sure you’re asking these evaluative questions with a consistent rating system. For example, use a rating scale: “How difficult (1) or easy (5) was it to <complete a core task of your software>?”

Ask your users to complete primary tasks and observe the success/failure of each. Record how the tasks are being completed, levels of confusion, questions asked and the time it takes to complete tasks. It’s also helpful to jot down the number of errors encountered along the way and the user’s understanding of these errors.
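As a rough illustration of the measurements described above, here is a minimal Python sketch that reduces one round of observations to a few comparable numbers per task. The record fields, the task name and the sample values are all hypothetical, made up for this example and not taken from our actual sessions.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class TaskObservation:
    """One participant attempting one core task (field names are hypothetical)."""
    participant: str
    task: str
    completed: bool    # did the user complete the task?
    seconds: float     # time taken on the task
    errors: int        # errors encountered along the way
    ease_rating: int   # 1 = difficult, 5 = easy


def summarize(observations):
    """Group observations by task and reduce each group to comparable numbers."""
    by_task = {}
    for obs in observations:
        by_task.setdefault(obs.task, []).append(obs)

    summary = {}
    for task, group in by_task.items():
        summary[task] = {
            "participants": len(group),
            "completion_rate": sum(o.completed for o in group) / len(group),
            "median_seconds": median(o.seconds for o in group),
            "mean_errors": mean(o.errors for o in group),
            "mean_ease": mean(o.ease_rating for o in group),
        }
    return summary


# Illustrative data only: three first-time users attempting the same task.
session = [
    TaskObservation("P1", "create account", True, 95.0, 1, 4),
    TaskObservation("P2", "create account", False, 210.0, 3, 2),
    TaskObservation("P3", "create account", True, 120.0, 0, 5),
]
summary = summarize(session)
print(summary)
```

With only 3–5 participants per round, numbers like these are best read as a trend indicator alongside your notes on confusion and questions asked, not as statistically rigorous measurements.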

Record and store the results in spreadsheets, as this format by its very nature forces you to be quantitative. It also gives you a head start on parsing and comparing the data with future rounds of sessions.
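If your spreadsheet is saved as CSV, comparing rounds can be partly automated. The sketch below is one possible approach, not the tooling we used: the column names, round labels and numbers are assumptions invented for the example, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical column layout for the results library.
FIELDS = ["round", "task", "completion_rate", "median_seconds", "mean_ease"]


def append_round(csv_file, round_label, task_summaries):
    """Append one benchmark round; write the header only if the file is empty."""
    writer = csv.DictWriter(csv_file, fieldnames=FIELDS)
    if csv_file.tell() == 0:
        writer.writeheader()
    for task, metrics in task_summaries.items():
        writer.writerow({"round": round_label, "task": task, **metrics})


def load_rounds(csv_file):
    """Read the library back as {round: {task: {metric: value}}}."""
    rounds = {}
    for row in csv.DictReader(csv_file):
        metrics = {name: float(row[name]) for name in FIELDS[2:]}
        rounds.setdefault(row["round"], {})[row["task"]] = metrics
    return rounds


def compare(rounds, baseline, latest):
    """Return the change in each metric for tasks present in both rounds."""
    changes = {}
    for task, now in rounds[latest].items():
        before = rounds[baseline].get(task)
        if before is None:
            continue  # task was not part of the baseline round
        changes[task] = {name: round(now[name] - before[name], 2) for name in now}
    return changes


# Illustrative numbers only; the buffer stands in for a CSV file on disk.
library = io.StringIO()
append_round(library, "2016-02", {
    "create account": {"completion_rate": 0.6, "median_seconds": 140.0, "mean_ease": 3.2},
})
append_round(library, "2016-03", {
    "create account": {"completion_rate": 0.8, "median_seconds": 110.0, "mean_ease": 4.0},
})
library.seek(0)
changes = compare(load_rounds(library), "2016-02", "2016-03")
print(changes)
```

Keeping one row per task per round makes it straightforward to line the results up against a record of when features were introduced.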

Work hard to get your script in a solid state for your first benchmark session. Ideally, this will be a consistent, repeatable usability session that has comparable results each time it is conducted. So, once you do have a script in place, don’t tinker with it, as changes will compromise your results.

The Test Specifics

Our benchmark test came together as follows:

One hour, task-based sessions
3–5 first-time users from our target demographic
In-person sessions in a casual conference room
Video/audio recorded on an iPad over-the-shoulder of the user
Conducted once monthly, typically on a Thursday
One moderator + two note takers

Setup Notes

We opted for the over-the-shoulder iPad camera rig so we could capture the user’s mouse on the screen as well as the voices in the room. Instruct your note-takers to remain quiet until the session’s end. At that time, they may ask questions directly to the user. Keep in mind that these questions are generally more qualitative and the responses should be left out of the official benchmark testing results.

Working out of a casual conference room as I’ve described here tends to put new users at ease. Professional testing rooms complete with one-way mirrors can make a first-time usability tester nervous and unable to share their true reflections.

We learned over time that neither Monday nor Friday worked well for our schedules or our users’. So, try for somewhere in the middle of the week and maybe that will save you the hassle of re-learning this fact.

Depending on your project’s rate of change, you could get away with running these benchmark tests less often. I found that in our agile development environments, the once per month timeline worked well as our products were evolving quickly. Play with this timeline in your situation.

Otherwise Leveraging the Benchmark Test

Training, Awareness & Empathy

Our regularly scheduled benchmark usability test proved to have other huge advantages within our software team. These included personnel training, company-wide user experience awareness, increased empathy for the user, management’s buy-in and more.

I tried to summarize each of these findings and how they impacted our team here, but quickly realized this subject should be its own complete article. So, more on that in next week’s article.

Thanks for reading and remember that the solutions to this type of problem are anything but prescriptive. Make changes and bend my suggestions so they’ll work with your team and product. This process worked well in my situation, so I hope that by sharing the process, someone else’s team may benefit as well. #sharewhatyouknow

Cheers.

Ray Sensenbach is a Product Designer. You can follow him on Twitter.
