Human Centered Transcriptions

When generative user testing guides a project

What started as a university project for ESL (English as a Second Language) students led me to a larger design opportunity: redesigning transcriptions. I recognized that real-time transcription software, like Google’s Cloud Speech API, is more accurate and cheaper than ever, and that the textual support transcriptions provide can address several accessibility and efficiency issues.

I developed a set of design guidelines for transcriptions by designing a live transcription mobile app. I imagined this product in the context of lecture halls and conferences.

I started with the following high-level design principles, based on the context and the users: Feedback, Simplicity, and Non-Disruptiveness. As I went through the process, I added some more specific principles.

High-level design principles

First Iteration

I started by looking only at what I thought was most relevant to ESL students: translation of difficult words. The technology would detect words outside the 10,000 most used words in the English language and present them on the screen, where the user could tap them for a definition. I created a quick interactive prototype (focused on the interaction rather than the visuals).
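The detection step can be sketched roughly as follows. This is a minimal illustration, not the prototype's actual code: `find_uncommon_words` and the tiny `COMMON_WORDS` set are hypothetical, and a real version would load a proper 10,000-word frequency list.

```python
import re

# Hypothetical stand-in for a real 10,000-word frequency list (assumption).
COMMON_WORDS = {"the", "of", "a", "is", "in", "cell", "this", "process", "called"}


def find_uncommon_words(transcript, common_words=COMMON_WORDS):
    """Return words missing from common_words, in order of first appearance."""
    seen = set()
    uncommon = []
    # Treat runs of letters and apostrophes as words; ignore punctuation.
    for token in re.findall(r"[a-zA-Z']+", transcript):
        word = token.lower().strip("'")
        if word and word not in common_words and word not in seen:
            seen.add(word)
            uncommon.append(word)
    return uncommon


if __name__ == "__main__":
    print(find_uncommon_words("This process is called mitosis."))
```

Each flagged word would then become a tappable item on the screen.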

By critiquing this prototype and user testing it, I realized the design did not align with my initial goals. The lack of feedback while the screen was ‘listening’ for difficult words puzzled users (“is it working?”), who would lose interest in the talk and focus on the screen. Because it showed individual words without context and required the user to tap to get the translation, it was a disruptive experience that disconnected them from the lecture.

Second Iteration

For this prototype, the full transcription of the lecture appeared on screen as the lecturer spoke. It proved to be much better for users, who could simply glance at the screen when necessary. Words outside the commonly used ones were shown directly with a translation, in context.

The insights from user testing this prototype were much more about details. The main interaction worked, but it was not necessarily pleasing.
The feedback brought to my attention the fact that transcriptions are rarely delightful. How come? Now that the technology is here, they can be more widely used, and should be redesigned to be human centered.

Third (Final) Iteration

The way words appeared did not feel natural in my first iterations. User testing different paces, groupings, and motions led me to a more natural option. There were some tradeoffs, but making the words appear gradually, one syllable at a time, created the most natural feel and gave a sense of immediate feedback.
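As a rough illustration of the syllable-by-syllable reveal, here is a small sketch. It assumes the words arrive already split into syllables, and both the `reveal_steps` name and the 0.15 second pace are hypothetical choices, not values from the prototype.

```python
def reveal_steps(syllabified_words, seconds_per_syllable=0.15):
    """Return (time_in_seconds, visible_text) pairs, adding one syllable per step.

    syllabified_words is a list of words, each given as a list of syllables
    (assumed input; the syllable splitting itself is not handled here).
    """
    steps = []
    visible = ""
    elapsed = 0.0
    for index, syllables in enumerate(syllabified_words):
        if index > 0:
            visible += " "  # space between words appears with the next syllable
        for syllable in syllables:
            visible += syllable
            steps.append((round(elapsed, 2), visible))
            elapsed += seconds_per_syllable
    return steps


if __name__ == "__main__":
    for moment, text in reveal_steps([["lec", "ture"], ["hall"]]):
        print(moment, text)
```

Changing the pace or grouping whole words into a single step is an easy way to compare the alternatives mentioned above.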

I researched the best typefaces for closed captions while optimizing for mobile (narrower type). When picking the typeface, I also considered accessibility, ensuring the typeface supported accents and glyphs for multiple languages. I picked Cinecav Sans.

Comparison of 2nd Iteration (left) and 3rd Iteration (right) font.

When I thought about non-disruptiveness and how minds usually think in one language at a time, displaying translations seemed inappropriate. It would have been disruptive for the mind to go back and forth (and it is quite possible that a person does not know a word in their own language if it’s a rare one).

I explored the option of displaying definitions, but they were long and sometimes inaccessible. Synonyms ended up being my pick, as they get the message across quickly while remaining accessible and quite precise.

Because white comes from light on mobile, white on black is very legible. I considered it as an option, as it would also mean less disruptive light coming from the phone in a dark auditorium. I kept a white and a dark grey background as options but would like to test and read more on the subject before making a final decision.

The design and the tool needed to be transparent, to follow the non-disruptiveness and simplicity principles. I thought about how to offer more information (longer definitions, or explanations of names or concepts). When the user taps a bold word, a longer definition appears.

In the end, I packaged the product specifically for students in a desktop and mobile app design proposal, because students were my initial users and textual support in the lecture hall is a common need for all students, including ESL ones.

Thanks for reading!

Designer @Microsoft, previously @Google, @YouTube, and @EmilyCarrU
