Experimental design for usability testing in the field

Shreepriya
Research Stories by Naver Labs Europe
8 min read · Dec 17, 2020

Usability testing gives direct input on how people use a system. In this article, we discuss aspects to consider when defining the experimental setup to conduct usability tests.

Usability testing helps to develop an empirical understanding of the usage and user experience of interfaces, through experimental testing or qualitative observation of people using them in the wild. In both industry and academia, a “lab-based” or “controlled” setup is commonly used to gather quantitative data: for example, participants test a new website in a controlled environment while the researcher observes and collects data. This setup guarantees minimal distraction and allows the measurement of desired “variables”, such as completion time, with strict control over “confounds”, i.e. variables like noise or other elements that can unintentionally affect the results. A controlled setup supports a strong causal claim, i.e., that the introduction or manipulation of interaction design or interface features in the system led to the observed results.

Figure 1: A typical lab-based setup to conduct usability tests for web-based services.

Though this experimental setup is often preferred, controlled experiments are more widely used in HCI research than in industry: for practitioners, the costs of designing and running a rigorous experiment typically outweigh the benefits. Moreover, it can be difficult to find an experimental translation of a phenomenon as it occurs naturally; this translation is referred to as experimental realism. Failures of experimental realism particularly affect industry, because the causes of success or failure of new interactive systems usually lie in the broader context of activity rather than in the details. For example, a new feature of a voice-activated personal assistant might work well in the comfort of a user’s home but fail in a public space such as a busy city square. Field research therefore becomes relevant: it naturally lends itself to the observation of behaviour ‘in the wild’, but it can also be experimental.

Field usability testing combines techniques from traditional lab testing and field research to test a product in the actual context in which it is used. Participants use the product in their own environment and can use their own equipment, data, etc. As in lab testing, each participant performs the same tasks. While confounding variables cannot be fully controlled in a field test, they are identified and recorded so that any effect they might have on the results can be analysed. Field testing is considered appropriate for released products, prototypes and storyboards of new designs.

To illustrate field usability testing we describe our method for the prototype of RunAhead [2,3], a novel navigation system for runners, within the project named Tour Generation [4].

RunAhead: Exploring Head Scanning based Navigation for Runners

This project investigates the issue of providing appropriate navigation support to runners while exercising in unknown places. In principle, runners have the freedom just to put on their shoes and go. However, they often refrain from doing so, especially in unknown environments, because they don’t know where to go and fear getting lost. Available commercial systems provide visual or voice-based navigation which is often distracting and which disrupts the running activity.

To guide runners through running tours, we wanted efficient, yet less disruptive, navigation support compared to existing solutions. Through prior user studies, we observed that, when arriving at an intersection, runners naturally perform a head scanning movement to look at and evaluate the possible path options and their suitability for running. We decided to exploit this head movement to let the runner query for information about the direction s/he is looking in.

Figure 2: Intersection with Good and Bad Paths. Once the runner enters the circle, the Head Scanning Mechanism is activated; once s/he leaves the circle, a safety check is applied and a warning issued if the runner has taken a wrong direction.

We designed RunAhead, a system that provides navigation feedback to runners in response to their natural head scanning movement. Our design is based on the definition of a circle around each intersection, and the mapping of the enclosing angles of the good path options onto this circle (see Figure 2). When the runner enters such a circle, the RunAhead head scanning mechanism activates, continuously monitoring the direction in which the runner is looking, or, more precisely, the direction in which they have turned their head. It then compares the angle of this direction with the enclosing angles of the good path options, i.e. following the proposed itinerary or deviating onto one of the alternatives, and provides navigation feedback depending on whether the runner is looking in the direction of a good path or not.
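The mechanism above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the actual RunAhead implementation: the activation radius, the function names and the sector representation are all assumptions made for the example.

```python
import math

# Assumed activation radius of the circle around each intersection, in metres.
INTERSECTION_RADIUS_M = 20.0

def distance_m(lat1, lon1, lat2, lon2):
    """Approximate planar distance in metres (adequate at intersection scale)."""
    dlat = (lat2 - lat1) * 111_320.0
    dlon = (lon2 - lon1) * 111_320.0 * math.cos(math.radians(lat1))
    return math.hypot(dlat, dlon)

def in_scanning_zone(runner_pos, intersection_pos):
    """True once the runner has entered the circle around the intersection."""
    return distance_m(*runner_pos, *intersection_pos) <= INTERSECTION_RADIUS_M

def looking_at_good_path(heading_deg, good_sectors):
    """Compare the head direction with the enclosing angles of good paths.

    good_sectors: list of (start_deg, end_deg) angular sectors, measured
    clockwise from north, one per good path option at the intersection.
    """
    h = heading_deg % 360
    for start, end in good_sectors:
        start, end = start % 360, end % 360
        if start <= end:
            if start <= h <= end:
                return True
        else:  # sector wraps around north (0 degrees)
            if h >= start or h <= end:
                return True
    return False
```

Once the runner is inside the zone, the app would poll the compass heading and call `looking_at_good_path` to decide which feedback to trigger.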

Figure 3: RunAhead device for head tracking

The system was implemented with a small compass sensor worn on the head, which sends its readings wirelessly to a mobile app running on the runner’s smartphone (see Figure 3). We complement these values with GPS information to trigger the correct feedback. We designed different feedback modes to target two types of runners, those who run listening to music and those who don’t, preferring to run immersed in the environment. For runners who exercise with music, we used a change in volume to communicate the quality of the path the runner is looking at. If the path is good, the music volume remains unchanged. Otherwise, it is set to low. For runners without music, the haptic feedback mode uses vibration as a negative signal whenever the runner looks in the wrong direction at an intersection. Thus, the absence of vibration implicitly constitutes the positive signal, given when the user looks at the good path.
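The two feedback modes reduce to a simple mapping from (mode, path quality) to an output action, which can be sketched as follows. The mode names and return strings are illustrative, not taken from the actual app.

```python
def feedback(mode, looking_at_good_path):
    """Sketch of RunAhead's two feedback modes.

    - music mode: the volume stays normal on a good path, drops to low otherwise
    - haptic mode: vibration only on a bad path (absence of vibration = good)
    """
    if mode == "music":
        return "volume:normal" if looking_at_good_path else "volume:low"
    if mode == "haptic":
        return "vibrate:off" if looking_at_good_path else "vibrate:on"
    raise ValueError(f"unknown feedback mode: {mode!r}")
```

Note the asymmetry: in music mode both states produce an audible signal, whereas in haptic mode the positive signal is implicit silence.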

We wanted to evaluate two hypotheses: that the head scanning mechanism of RunAhead is effective in guiding runners through a predefined path, and that RunAhead improves the running experience compared to voice turn-by-turn navigation. This required the collection of three measures: number of errors, task workload and system usability. The user and system errors recorded in each condition served to assess the effectiveness of the system. Task workload questionnaires [5] and semi-structured interviews helped measure the participants’ running experience, and a system usability questionnaire [6] measured the usability of the different versions, or conditions, of RunAhead.
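The system usability questionnaire [6] is the standard ten-item System Usability Scale, whose scoring rule is well defined and easy to show in code. This sketch assumes responses on the usual 1–5 scale:

```python
def sus_score(responses):
    """Compute a System Usability Scale (SUS) score from ten 1-5 responses.

    Odd-numbered items are positively worded (contribution = response - 1),
    even-numbered items are negatively worded (contribution = 5 - response).
    The summed contributions are multiplied by 2.5 to yield a 0-100 score.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # items 1, 3, 5, ... sit at even indices
        for i, r in enumerate(responses)
    )
    return total * 2.5
```

A participant answering 5 to every positive item and 1 to every negative item would score 100; uniform answers of 3 yield the midpoint of 50.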

Next, we defined our experimental setup for the test.

Defining the Experimental setup

Experimental testing aims to establish a strong causal relationship. Researchers naturally turn to a lab setting because confounding variables are easy to control there. But all experiments are context-dependent: a good experimental design takes into account all of the unique considerations of the system under study to produce information that is both valid and relevant [1]. Hence, we used the following rationales to decide whether a lab or a field setup was suitable for us.

How believable can experimental realism be in the lab?

To aptly test RunAhead, we required a realistic setup of a running tour containing several intersections where the participants could try the head scanning mechanism. These intersections had to be sufficiently separated to facilitate the detection of the head-scanning zone and its activation.

In a lab-based setup, experimental realism could be created through artificial and limited intersections. Some prior research has also used virtual reality (VR) to simulate the experimental conditions. However, a field setup gives us the opportunity of using ‘real’ intersections of paths suitable for running (sidewalks, parks, etc.).

How important is the situated interaction?

In earlier research, participants have run on treadmills to simulate the experience of running in the wild [7]. While this efficiently mimics the activity, it does not capture the context: participants often have to imagine a situation or context in such a setup.

The RunAhead system design leverages the natural head scanning action of a runner to evaluate possible paths ahead. In a lab setting, this head scan might be forced and judged as an unnatural interaction, creating an unpleasant experience for the participants.

How ethical is the experimental design?

Conducting good research requires the experimental design to be ethical. This involves obtaining informed consent for participation, respecting the privacy and anonymity of participants’ data, and avoiding any harm to the participants. In academia, an ethics board of the university or organisation generally approves the study.

Running is a strenuous activity, so we had to be careful to avoid any harm to the participants. In a field setting, the safety risk depends on the location of the test: on a crowded street or a road with heavy traffic, for example, it is difficult to guard against accidental bumping or failures to adhere to traffic rules.

Figure 4: RunAhead participant running a tour, followed and videotaped by an experimenter wearing a GoPro

Considering the aforementioned, we conducted a field usability test in a public park. The variables of weather, time and crowd were identified as possible confounding variables and noted for analysis. A (very fit) researcher shadowed (Figure 4) the participant at all times, recording their run and ensuring their safety.

For the test, a baseline voice condition providing conventional turn-by-turn voice directions was added to the two feedback modes, or conditions, of RunAhead (music and haptic) to enable comparison. We recruited 24 participants (11 male and 13 female), aged 17 to 56, who ran on a regular basis. We conducted a within-subject experiment, i.e., all participants ran in all the conditions, but the order of conditions was permuted across participants to counterbalance any learning bias.
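The counterbalancing works out neatly here: three conditions have 3! = 6 possible orders, so each order can be assigned to exactly four of the 24 participants. A minimal sketch (condition labels are ours, and the round-robin assignment is one possible scheme, not necessarily the one used in the study):

```python
from itertools import permutations

CONDITIONS = ("voice", "music", "haptic")

def assign_orders(n_participants):
    """Assign each participant one of the 6 condition orders, round-robin."""
    orders = list(permutations(CONDITIONS))  # 6 orders for 3 conditions
    return [orders[i % len(orders)] for i in range(n_participants)]
```

With 24 participants, every one of the six orders is run exactly four times, so position effects are balanced across conditions.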

Through our user tests and analysed data, we verified that the head scanning feedback mechanism was indeed effective and found that music and haptic feedback were the preferred and least intrusive feedback modes.

Conclusion

It’s imperative to understand the aim of the test to choose a proper test setup. Is the purpose to validate the entire concept, part of the concept, or specific features of a product? While some usability testing is relatively easy to conduct in a controlled environment, others require field tests depending on situated interaction and experimental realism. In general, the experimental setup should be ethical and produce a credible causal relationship.

By outlining our process for usability testing, we hope to promote a more comprehensive understanding of the academic method amongst industry practitioners.

Project contributors:

Shreepriya (Author and editor), UX Research Scientist; Danilo Gallo, UX Research Scientist; Jutta Willamowski, Research Scientist.

References:

1. Rebecca Bevans 2020, Scribbr, accessed 17 December 2020, https://www.scribbr.com/methodology/experimental-design/

2. Gallo D., Shreepriya S. and Willamowski J. (2020) RunAhead: Exploring Head Scanning based Navigation for Runners. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–13. DOI:https://doi.org/10.1145/3313831.3376828

3. Shreepriya S., Willamowski J., and Gallo D. (2019) Supporting Natural Navigation for Running in Unknown Places. In Companion Publication of 2019 on Designing Interactive Systems Conference 2019 Companion (DIS ’19 Companion). Association for Computing Machinery, New York, NY, USA, 277–281. DOI:https://doi.org/10.1145/3301019.3323895

4. Willamowski J., Clinchant S., Legras C., Michel S., Shreepriya S. (2019) Running Tour Generation for Unknown Environments. In Stephanidis C. (eds) HCI International 2019 — Posters. HCII 2019. Communications in Computer and Information Science, vol 1033. Springer, Cham. DOI: https://doi.org/10.1007/978-3-030-23528-4_72

5. Hart, S. G. & Staveland, L. E., Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research, In Advances in Psychology, North-Holland, Volume 52, 1988, Pages 139–183, https://doi.org/10.1016/S0166-4115(08)62386-9.

6. Brooke, J. 1996. “SUS-A quick and dirty usability scale.” Usability evaluation in industry. CRC Press (1996), 4–7, — ISBN: 9780748404605

7. Hamdan N. A., Kosuru R. K., Corsten C., and Borchers J. (2017). Run&Tap: Investigation of On-Body Tapping for Runners. In Proceedings of the 2017 ACM International Conference on Interactive Surfaces and Spaces (ISS ’17), 280–286. https://doi.org/10.1145/3132272.3134140
