Measuring Subjectivity in Interface Design: A Quantitative Approach

Assessing cognitive appraisals of interface designs

Andrew Pennell
ThinkingMatter

--

Doesn’t this look kind of cool? Let’s just call it art.

If you’re not already familiar with it, a desirability study (also known as the Microsoft Reaction Card Method) introduces subjects to a stimulus (an interface) and then offers them a variety of descriptors (one-word adjectives) to apply to what they saw. The original intent of the study was to “develop a practitioner’s methodology suited to the challenges of measuring intangible emotional responses.” There are variations where users interact with the design and others where they don’t; keep in mind, however, that interactivity introduces another variable to account for when drawing your conclusions.

The Importance of Visual Appeal and Brand

The Aesthetic-Usability Effect implies that “users are more tolerant of minor usability issues when they find an interface visually appealing.” Simultaneously, consumers make automatic judgments about a brand based on visual cues about trustworthiness, relevance, and value. Knowing this, we set out to understand users’ opinions of the aesthetic dimensions of our website.

Methodology

This being a practical study, I had to deal with several real-world constraints: time, budget, and scope. We were originally allotted a budget for a total of 420 participants, which I broke down into groups of 105 per condition. Each condition consisted of an initial stimulus, in this case a full-page screenshot of a website, followed by 6 multiple-choice questions, each containing 9 items. These questions were then followed by bipolar Likert-style questions.

For the multiple-choice questions, I broke out each possible answer as a dimension of its own and encoded a 1 or 0 for whether or not each respondent selected that answer. I then ran Chi-Square tests on each dimension to test for differences between groups. Each group was exported into SPSS, where I ran descriptive statistics, and I compared the results of the two groups using a few different analyses.
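
To make the encoding step concrete, here is a minimal sketch of how it could look in Python (the actual analysis ran in SPSS); the group labels, adjectives, and responses below are hypothetical:

```python
# A minimal sketch of one-hot encoding reaction-card selections and running
# per-dimension Chi-Square tests. All data here is hypothetical.
import pandas as pd
from scipy.stats import chi2_contingency

# Each row is a respondent; "group" is the condition, "selections" the
# adjectives they applied to the stimulus.
responses = pd.DataFrame({
    "group": ["A", "A", "B", "B"],
    "selections": [["dull", "organized"], ["organized"],
                   ["straightforward"], ["dull", "straightforward"]],
})

# Break out each possible answer as its own 0/1 dimension.
adjectives = sorted({adj for sel in responses["selections"] for adj in sel})
for adj in adjectives:
    responses[adj] = responses["selections"].apply(lambda s: int(adj in s))

# Chi-Square test of independence for each dimension across groups.
for adj in adjectives:
    table = pd.crosstab(responses["group"], responses[adj])
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"{adj}: chi2={chi2:.2f}, p={p:.3f}")
```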

For the Likert-style items, I used two analyses, a one-way ANOVA (after testing for equal variances) and the Mann-Whitney U test, to measure the differences between groups. I kept my significance threshold at .05 but, for practical purposes, also considered items whose p-values stayed under .1. For items that were found to be significant, I calculated Cohen’s d to measure effect size.
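
A rough sketch of those comparisons, assuming 7-point bipolar items and two groups of 105; the data here is randomly generated, not the study’s:

```python
# Parametric (ANOVA) and non-parametric (Mann-Whitney U) comparisons of
# Likert responses, plus Cohen's d for significant items. Data is simulated.
import numpy as np
from scipy.stats import f_oneway, mannwhitneyu

rng = np.random.default_rng(0)
group_a = rng.integers(1, 8, size=105)  # 7-point bipolar Likert responses
group_b = rng.integers(1, 8, size=105)

f_stat, p_anova = f_oneway(group_a, group_b)
u_stat, p_mwu = mannwhitneyu(group_a, group_b, alternative="two-sided")

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) +
                         (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled_sd

print(f"ANOVA p={p_anova:.3f}, Mann-Whitney p={p_mwu:.3f}")
if p_anova < 0.05:  # .05 threshold; .05-.10 flagged for practical review
    print(f"Cohen's d = {cohens_d(group_a, group_b):.2f}")
```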

Note: At n=105, small differences (&lt;20%) in the means were not found to be statistically significant, while larger differences (&gt;30%) were. This suits our practical purposes: we have no interest in determining whether users consider us to be 5% more trustworthy or 5% more valuable. That can be a consideration for larger studies with larger budgets. I would also like to note that when I refer to differences, I mean real differences within the population, not the sample.
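
A quick power calculation along these lines shows what n=105 per group can and cannot detect; the effect sizes below are Cohen’s standard benchmarks, not figures from the study:

```python
# A rough power check for two groups of 105; effect sizes are the
# conventional small/medium/large benchmarks, chosen for illustration.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):
    power = analysis.solve_power(effect_size=d, nobs1=105, alpha=0.05)
    print(f"d={d}: power={power:.2f}")
```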

I also developed a correlation grid, with all dimensions running across the top row and again down the first column, to see which dimensions were frequently selected together. This was useful: for one design, we found that respondents who found the website “dull” were also more likely to find it “not-valuable” (r=.53). Similarly, respondents tended to select “organized” and “straightforward” together (r=.54).

Correlation grid, with conditional formatting to detect large correlation coefficients
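One way such a grid could be computed: because each dimension is encoded 0/1, Pearson’s r over the encoded columns is the phi coefficient. A sketch with hypothetical data:

```python
# Build a correlation grid from 0/1 dimension columns and flag the large
# coefficients, mimicking the conditional formatting. Data is hypothetical.
import pandas as pd

encoded = pd.DataFrame({
    "dull":            [1, 1, 0, 1, 0, 0],
    "not-valuable":    [1, 1, 0, 1, 0, 1],
    "organized":       [0, 1, 1, 0, 1, 1],
    "straightforward": [0, 1, 1, 0, 1, 0],
})

# Dimensions run across the top row and down the first column.
grid = encoded.corr()
print(grid.round(2))

# Flag large coefficients (diagonal excluded).
flagged = grid.where(grid.abs() >= 0.5).stack()
print(flagged[flagged < 1.0])
```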

I should also note that a high correlation does not license conclusions about a dimension’s relevance, importance, or the amount of agreement between respondents; the dimension may not have been important to users in the first place, with only a few respondents selecting it at all. For instance, “confusing” was selected by only 22% of respondents and “incomprehensible” by 16%, yet the pair had a large correlation coefficient of .55. What the grid helps illustrate is the relationship between variables, not the amount of agreement on a particular dimension across the entire sample.

From this correlation analysis we were able to produce a better visualization of related adjectives:

Visual representation of relationships among adjectives
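
A sketch of how such a visualization could be generated, treating each adjective as a node and drawing an edge wherever a correlation clears a threshold; the .53 and .54 coefficients come from the findings above, while the remaining values are illustrative:

```python
# Turn the correlation grid into a relationship graph. The .53 and .54
# coefficients are from the study; the other entries are illustrative.
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

labels = ["dull", "not-valuable", "organized", "straightforward"]
grid = pd.DataFrame(
    [[1.00, 0.53, -0.22, -0.18],
     [0.53, 1.00, -0.15, -0.20],
     [-0.22, -0.15, 1.00, 0.54],
     [-0.18, -0.20, 0.54, 1.00]],
    index=labels, columns=labels,
)

THRESHOLD = 0.5
G = nx.Graph()
G.add_nodes_from(labels)
for a in labels:
    for b in labels:
        if a < b and abs(grid.loc[a, b]) >= THRESHOLD:
            G.add_edge(a, b, weight=grid.loc[a, b])

pos = nx.spring_layout(G, seed=42)  # related adjectives land near each other
nx.draw_networkx(G, pos, node_color="lightsteelblue")
edge_labels = {e: f"{w:.2f}" for e, w in nx.get_edge_attributes(G, "weight").items()}
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
plt.axis("off")
plt.show()
```

A force-directed layout like this makes clusters of co-selected adjectives visible at a glance, which is harder to see in the raw grid.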

Conclusions on this methodology

This was an incredibly useful method for cheaply and quickly determining the differences in evaluation between our website and our competitors’. It was also useful in helping us identify differences by gender and age while simultaneously understanding when variables were related to each other. Ultimately, it has helped us generate hypotheses like “Can we influence perceived value by changing our design so that it is no longer considered ‘dull’?” Similarly, if we make our design more comprehensible, should we expect attitudes around how “boring” it is to change?

--

Andrew Pennell
ThinkingMatter

Trying to figure out the world of cognition, philosophy, and experience.