VisTA: Analyzing Think-Aloud Sessions with Machine Learning

Voice and machine learning to aid user experience evaluation.

Yuexi (Tracy) Chen
VisUMD
4 min read · Oct 25, 2021


What is the most popular method for identifying usability problems? That would be think-aloud sessions. An international survey found that at least 90% of UX practitioners use think-aloud sessions to identify usability problems [1].

However, analyzing recorded sessions is often time-consuming and tedious. UX practitioners acknowledge that they may zone out while watching recordings and miss issues on the first pass. Yet there is usually no time for a second pass, nor will another UX practitioner review the same recorded session.

Is it possible to leverage machine intelligence to analyze recorded user sessions? Researchers have found that when users encounter usability problems, they are more likely to stop to observe, to change their speech rate or pitch, and to express negative sentiment.

Therefore, researchers at the Rochester Institute of Technology trained a machine learning model to predict usability issues from those features: speech rate, pitch, and negative verbal sentiment. They also created a video annotation tool called VisTA, underpinned by this model [2].
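To make the idea concrete, here is a minimal, purely illustrative sketch of detecting problem segments from such speech features. The actual system in the paper uses a trained classifier; the feature names, thresholds, and two-cue rule below are my own assumptions, not VisTA's implementation.

```python
def detect_problem_segments(segments):
    """Flag segments whose speech features suggest a usability problem.

    Each segment is a dict with illustrative (assumed) keys:
      - "speech_rate": words per second
      - "pitch_dev":   deviation from the speaker's baseline pitch (semitones)
      - "sentiment":   verbal sentiment score in [-1, 1]
    Returns indices of segments matching the patterns the researchers
    observed: slowed speech, shifted pitch, negative sentiment.
    """
    flagged = []
    for i, seg in enumerate(segments):
        slow_speech = seg["speech_rate"] < 1.5     # user pauses to observe
        pitch_shift = abs(seg["pitch_dev"]) > 2.0  # raised/lowered pitch
        negative = seg["sentiment"] < -0.3         # negative verbalization
        # Require at least two cues to reduce false positives (an assumption,
        # not the paper's rule).
        if sum([slow_speech, pitch_shift, negative]) >= 2:
            flagged.append(i)
    return flagged

segments = [
    {"speech_rate": 2.8, "pitch_dev": 0.5, "sentiment": 0.2},   # fluent
    {"speech_rate": 1.0, "pitch_dev": 3.1, "sentiment": -0.5},  # struggling
    {"speech_rate": 2.5, "pitch_dev": 0.2, "sentiment": -0.4},  # one cue only
]
print(detect_problem_segments(segments))  # → [1]
```

In practice these features would come from speech-processing and sentiment-analysis pipelines run over the session audio; the point here is only that each feature alone is a weak signal, and combining them is what makes detection useful.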

How does VisTA work?

VisTA plots a “problem timeline” for each video, with spikes marking detected problems.

It also has a “feature timeline,” which shows in detail when each underlying feature is present, e.g., negative sentiment, high/low pitch, and low speech rate. UX practitioners can also add and export custom problem tags and descriptions at any point on the video timeline. (For more details, please check the VisTA demo.)
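One way to turn per-second detections into the spikes a problem timeline displays is to group consecutive flagged seconds into intervals, which can then double as the revisit anchors described later. This is an illustrative sketch, not VisTA's actual rendering code:

```python
def spikes(flags):
    """Group consecutive flagged seconds into (start, end) intervals.

    `flags` is a list of booleans, one per second of video; returns
    half-open [start, end) intervals that would render as timeline spikes.
    """
    intervals, start = [], None
    for t, flagged in enumerate(flags):
        if flagged and start is None:
            start = t                       # a spike begins
        elif not flagged and start is not None:
            intervals.append((start, t))    # the spike ends
            start = None
    if start is not None:                   # spike runs to the end of video
        intervals.append((start, len(flags)))
    return intervals

# Seconds 3-5 and second 9 flagged as problems:
flags = [False] * 3 + [True] * 3 + [False] * 3 + [True]
print(spikes(flags))  # → [(3, 6), (9, 10)]
```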

Is VisTA helpful? Yes!

To test how VisTA's features contribute to its usefulness, the researchers designed two versions: VisTASimple, with only the problem timeline, and the full VisTA, with both the problem timeline and the feature timeline.

They recruited 30 UX practitioners to analyze recorded think-aloud sessions of three different tasks. The practitioners were randomly divided into three groups: VisTA, VisTASimple, and a control group that analyzed the recorded sessions without any tool.

The researchers found that the VisTA group identified the most usability issues, followed by VisTASimple, with the control group identifying the fewest. Notably, across all three tasks, the VisTA and VisTASimple groups identified 30%–55% more issues than the control group, with VisTA performing slightly better than VisTASimple.

VisTA’s Role in UX Practitioner Workflows

The problem timeline proved to be the most helpful. Some UX practitioners used spikes in the problem timeline as anticipatory alerts, while others paid extra attention to flat periods in case the machine learning model had missed usability issues. The spikes also served as anchors when practitioners needed to revisit a part of the video.

As for the feature timeline, some UX practitioners believed that knowing the features underlying the machine learning model would help them notice usability issues the model misses, e.g., those signaled only by visual cues. However, the feature timeline and the feature-filtering function were underused in the study, possibly due to VisTA's own learning curve.

What are UX practitioners’ attitudes toward VisTA?

  • “Co-worker”: some UX practitioners treated VisTA as a co-worker that could correct their confirmation bias.
  • “Backup”: some UX practitioners didn’t view VisTA as a different perspective, just a way to confirm their own findings.
  • “Competitor”: some UX practitioners viewed VisTA as a competitor and wanted to identify issues it had omitted. “I don’t feel like it’s smarter than me,” one UX practitioner said.
  • “For the lazy people”: some UX practitioners raised concerns that automated tools like VisTA may make UX practitioners less diligent. “If you don’t care about your job, you will just follow the chart,” one UX practitioner said.

Limitations

UX practitioners also pointed out several limitations of VisTA:

  1. It does not pinpoint the start and end of a usability problem, so UX practitioners still have to delimit each problem manually.
  2. It cannot recognize nuances in users’ personalities; e.g., when some users use negative words, that may not reflect genuinely negative sentiment. Moreover, the detected features are not descriptive enough to capture emotions such as excitement, surprise, and frustration.
  3. It does not consider the number of steps required to complete a task. When users take more steps than needed, that can signal a usability problem, but such situations fly under VisTA’s radar.

References

  1. McDonald, Sharon, Helen M. Edwards, and Tingting Zhao. “Exploring think-alouds in usability testing: An international survey.” IEEE Transactions on Professional Communication 55.1 (2012): 2–19.
  2. Fan, Mingming, et al. “VisTA: Integrating Machine Intelligence with Visualization to Support the Investigation of Think-Aloud Sessions.” IEEE Transactions on Visualization and Computer Graphics 26.1 (2019): 343–352.


Computer science PhD student @ University of Maryland, working on Human-Computer Interaction (HCI)