Exploring the tension between transparency and user experience
By Jennifer Stark and Nicholas Diakopoulos
Ranking algorithms decide how content is presented, sorting everything from search results and dating matches to job applicants and news feeds. Algorithm-curated feeds, such as those used by Facebook and online news media, are receiving critical attention in the wake of the 2016 presidential election because they can promote false or misleading material. It is becoming evident that transparency about how such feeds, and other rankings, are curated is paramount to gaining and maintaining trust in the information presented.
But achieving this is not straightforward. While transparency can increase trust in an algorithm’s output, one study demonstrated that too much transparency can negate this effect. Students receiving peer-reviewed coursework scores were assigned to three groups with varying levels of transparency around the peer review process. The low-transparency group received no additional information, the medium group received a brief explanation, and the high-transparency group also saw raw scores and the algorithmic adjustments that accounted for the peer review process. When a student’s grade was lower than expected, medium transparency improved trust in the grading system, but high transparency did not. Transparency had no effect on trust when scores matched the student’s expectations. Beyond the level of transparency, there is also a concern, as yet untested, that adding a surfeit of transparency features or information to a system will interfere with usability.
To investigate whether this tension exists, and if it does, to quantify its effect, we created a Web tool that visualizes ranks of programming languages, a topic of keen interest to our collaborating publisher, the Institute of Electrical and Electronics Engineers (IEEE) Spectrum, which publishes content for its professional audience of technically savvy engineers. The tool presents a dynamic ranking according to 12 weighted measures of each language’s use or popularity (e.g., how many search results or job postings there are for the language). The tool was published by IEEE and attracted users with domain knowledge in programming. Users who visited the tool were then invited to complete our opt-in survey.
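To give a sense of how such a weighted ranking can be computed, here is a minimal sketch; the measure names, weights, and scores below are hypothetical, only three of the 12 measures are shown, and the real tool’s data sources and normalization are not reproduced.

```python
# Minimal sketch of a weighted ranking; all measures, weights, and scores are hypothetical.
scores = {
    "Python": {"search_results": 0.95, "job_postings": 0.90, "github_repos": 0.97},
    "Java":   {"search_results": 0.90, "job_postings": 0.95, "github_repos": 0.85},
    "R":      {"search_results": 0.55, "job_postings": 0.40, "github_repos": 0.50},
}

# Weights a user might set through the interface (a data source can be
# excluded by setting its weight to zero).
weights = {"search_results": 0.5, "job_postings": 0.3, "github_repos": 0.2}

def weighted_score(measures, weights):
    """Combine normalized measures into one score using the given weights."""
    total = sum(weights.values())
    return sum(measures[m] * w for m, w in weights.items()) / total

ranking = sorted(scores, key=lambda lang: weighted_score(scores[lang], weights),
                 reverse=True)
print(ranking)  # languages ordered from highest to lowest weighted score
```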
Four transparency features were incorporated directly into the interface, allowing users to interact with how the ranking is produced: (1) selecting different pre-set weighting combinations, (2) directly editing the weighting or inclusion/exclusion of various data sources to create a custom ranking (e.g., see the figure below), (3) visually comparing two rankings with potentially different weightings, and (4) filtering the types of languages shown. These features were intended both to provide algorithmic transparency into how the ranking was synthesized and to serve as an entry point for users to interactively express disagreement with the defaults.
We analyzed survey results from 204 individuals (148 after data cleaning) who voluntarily visited the web tool and voluntarily completed the survey. Data cleaning included standardizing birth dates and removing rows with invalid inputs (such as “Brazil” for year of birth), more than one unanswered question, or inconsistent answers (such as rating both “Interesting” and “Boring” at 4 or 5). The goal was to mitigate the impact of sloppy responses that might reflect a user not paying close attention to the survey.
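A rough sketch of these cleaning rules, assuming a hypothetical survey export and hypothetical column names:

```python
import pandas as pd

# Sketch of the cleaning rules described above; the file and column names are
# hypothetical stand-ins for the real survey export.
df = pd.read_csv("survey_responses.csv")

# Keep only plausible four-digit birth years (drops entries like "Brazil").
df["birth_year"] = pd.to_numeric(df["birth_year"], errors="coerce")
df = df[df["birth_year"].between(1920, 2005)]

# Drop rows with more than one unanswered rating question.
rating_cols = [c for c in df.columns if c.startswith(("trust_", "ux_"))]
df = df[df[rating_cols].isna().sum(axis=1) <= 1]

# Drop inconsistent responses, e.g. rating both "Interesting" and "Boring" at 4 or 5.
df = df[~((df["ux_interesting"] >= 4) & (df["ux_boring"] >= 4))]
```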
For each transparency feature, we asked, “Did you interact with X?”, to which participants could answer “Used,” “Did not use,” or “Did not know it was available” (Table 1). When they were finished with the web tool, participants also rated seven trust-related and nine user experience (UX)-related terms on a Likert scale from 1 (e.g., not at all trustworthy, or not at all easy to use) to 5 (very trustworthy, or very easy to use) (Table 2). With these responses, we could then test how transparency affected ratings of trust and UX. For example, did participants who used all the transparency features report higher trust? Did they report worse UX?
We also collected information about the participants, including years of programming experience. We found that years of experience correlated negatively with ratings for the trust-related words “Trustworthy” and “Objective,” and for the UX-related word “Satisfying.” In other words, people with more domain-specific experience were more skeptical or uncertain about the information presented, and also had a poorer experience (in terms of satisfaction). This may prove important given that the goal of transparency is to increase trust: people with more expertise may prefer additional levels of transparency, but further research would be required to test this idea. The effect on trust appears to be quite robust: when the trust-related and UX-related term ratings are each averaged together (see below for an explanation), the correlation with years of experience remains for the trust terms. The correlation disappears for the averaged UX-term ratings, though, so it is possible that the negative relationship between years of experience and ratings for “Satisfying” is spurious.
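As a sketch of how such a relationship could be checked, here is one approach using Spearman’s rank correlation (a common choice for ordinal ratings, shown purely as an illustration), continuing with the hypothetical cleaned DataFrame from the sketch above:

```python
from scipy.stats import spearmanr

# Illustrative only: correlate years of experience with one trust-term rating.
# Column names are hypothetical.
rho, p_value = spearmanr(df["years_experience"], df["trust_trustworthy"])
print(f"rho = {rho:.2f}, p = {p_value:.3f}")
```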
Many of the trust- and UX-related terms may be considered similar, and averaging those ratings therefore simplifies our statistics. Factor analysis and Cronbach’s alpha calculations showed that the ratings for the trust terms, and likewise for the UX terms, exhibited high internal consistency, meaning the terms within each set behaved very similarly. Therefore, rather than running statistics on each term separately, we averaged the responses within each set, giving us one trust score and one UX score per participant.
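A short sketch of this consolidation step, using a standard formulation of Cronbach’s alpha and the hypothetical column names from the cleaning sketch above:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) array of ratings."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

trust_cols = [c for c in df.columns if c.startswith("trust_")]  # the 7 trust terms
ux_cols = [c for c in df.columns if c.startswith("ux_")]        # the 9 UX terms

print("alpha (trust):", cronbach_alpha(df[trust_cols]))
print("alpha (UX):", cronbach_alpha(df[ux_cols]))

# Given acceptable internal consistency, collapse each set into one score.
df["trust_score"] = df[trust_cols].mean(axis=1)
df["ux_score"] = df[ux_cols].mean(axis=1)
```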
We then tested for differences in trust and UX ratings between the three levels of interaction (“Used,” “Didn’t use,” “Didn’t know it was available”) with each of the four transparency features. We found that ratings of trust and UX did not depend on the level of interaction with any of the four features.
We then considered that the level of interaction could be framed in one of two ways: 1) whether a transparency feature was known about (“Used” and “Didn’t use”) vs. not known about (“Didn’t know it was available”), or 2) whether a transparency feature was used (“Used”) vs. not used (“Didn’t use” and “Didn’t know it was available”). Indeed, we found that when we consider only whether a feature was used or not, the “Compare Two Rankings” feature significantly increases both trust in the information and UX. In addition, when we consider only whether a feature was known about, knowing that the “Edit Ranking Weights or Data Sources” feature exists significantly enhances UX.
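A sketch of this regrouping, using the Mann-Whitney U test purely as an illustrative choice for comparing two groups of scores (the feature column name is again hypothetical, and the trust score comes from the averaging sketch above):

```python
from scipy.stats import mannwhitneyu

# Recode the three response levels into the two binary framings described above.
feature = "compare_two_rankings"  # values: "Used", "Didn't use", "Didn't know it was available"

used = df[feature] == "Used"
knew = df[feature].isin(["Used", "Didn't use"])

for label, in_group in [("used vs. not used", used), ("knew vs. didn't know", knew)]:
    stat, p = mannwhitneyu(df.loc[in_group, "trust_score"],
                           df.loc[~in_group, "trust_score"],
                           alternative="two-sided")
    print(f"{label}: U = {stat:.1f}, p = {p:.3f}")
```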
This preliminary analysis suggests that providing transparency features can increase trust in information as well as enhance UX, and that this approach may be effective whether or not the user actually interacts with the feature. Our findings also suggest that providing these affordances does not necessarily hurt UX. Future work should recruit more participants and use a tool that does not limit participants to a specific domain. Future experiments might also be more structured, overtly requiring participants to interact (or not) with specific parts of the tool, to avoid self-selection among survey respondents.
Jennifer is a Research Scientist in the Computational Journalism Lab, headed by Tow Fellow Nicholas Diakopoulos, at the Merrill College of Journalism, University of Maryland, College Park.