Designing Audiobit

A Journey in Research and Design

Kevin Ma
Nov 8, 2017 · Unlisted

The Problem


I enjoy listening to audio on my phone, but I can’t interact with or take notes on an audiobook the way I can with a physical book.

My hypothesized problem seemed clean and reasonable, but that was a warning sign in itself. I was worried that it was the result of optimistic but misguided solutionism: making an app because I could, not because I should.

As it turned out, interviewing audiobook listeners about the main issues in their experience gave me a different starting point. As expected, some people never take notes when they read, much less when they listen to audiobooks. As I asked more people about their habits, I found that whether someone takes notes depends largely on intent. People are motivated to listen to audiobooks (and read books) for two main purposes:

  • to be entertained
  • to gain understanding

People who listen for understanding are much more likely to go back and re-listen. Some said they had to repeat parts because their attention was divided the first time through, and pointed out how difficult it was to find a past segment without scrubbing through the entire audio.

Revised Problem Statement

I’m listening to an audiobook in order to learn. However, I can’t engage with the material the way I can with a physical book. This makes it difficult to revisit the material to supplement my understanding.

I refocused my lens on people trying to learn from audiobooks, whether for school, work, personal development, or interest. While this is a nebulous group to define clearly, I would include students, teachers, academics, and other professionals whose work requires extensive reading.


Instead of focusing on how to enhance existing tools, I wanted to get a better understanding of the relationship between “interaction” and learning, what that kind of interaction looked like, and what it meant to people today.

I defined an artifact of learning as any alteration to the source material, or any new object created. My research goal was to look past the physical act of creating artifacts to the impulses that led people to create them.

I selected four participants and gave them cards to carry around throughout the day. On different days, I asked them to jot down the following whenever the event took place:

  • Day A: “Creating an artifact.” How are you doing this (marking the page, making external notes, etc.)? Why are you doing this?
  • Day B: “Revisiting an artifact.” In what context are you revisiting this? Why are you revisiting this? What was the outcome of revisiting this?

My participants returned the cards to me after two weeks. I compiled their responses and looked for ways to group them.

I was surprised when a participant told me he used his phone to take pictures of the things he deemed most important. He might forget about his notes, but he would come across the pictures on his phone sooner or later. Similar passive reminders cropped up in both digital and analog formats: sticky notes, images saved to the desktop, emails to self. What they shared was a tendency to sit in places people knew they would return to at some point, whenever that might be.

The motivations for creation fell into the following categories:

  • I want to remember. Memorization was the most common theme.
  • The idea is significant. An idea could be important within its original context, or important to the user personally.
  • I want to share this. The artifact contains value for someone else.
  • I don’t understand. Artifacts are both tools to aid problem solving and indicators that something is difficult.

What did people actually do with artifacts? I grouped the uses for artifacts under broad categories:

  • Consume once
  • Reuse
  • Send
  • Do nothing

I observed a sweeping divide between artifacts that were useful only once (writing down the definition of a new vocabulary word, repeating something out loud), and artifacts that had repeated value (a set of commonly used commands for the Terminal).

Another distinction was between artifacts useful within the process in which they were created, and artifacts that were only useful after that process had finished. One participant mentioned that he didn’t revisit his artifacts. For him, creating the artifact was the only thing that mattered, because it kept him focused on the task; it formed the bulk of the learning for him.

I synthesized my findings into design principles. I hoped these would inform my future design decisions and keep me aligned with higher-level goals when I was in the trenches. I wanted to build a product that would:

  1. Encourage active engagement while listening
  2. Allow for fast revisiting during and after listening
  3. Map people’s understanding

Early in the brainstorming process, several analogies jumped out at me: bookmarking, pins, and highlighting. With these, I started exploring what the ideal engagement experience would look like.


I was excited by the initial possibilities and, admittedly, spent a lot of time thinking about lofty ideas of questionable value. I eagerly solidified these early ideas in mockups, but each mockup pulled me further away from a high-level concept.

I took a step back and focused on creating a flow for engagement during listening, the most important interaction of the app. By homing in on this process, I hoped to build a backbone for the rest of the system before getting into details like finding a book, pages for individual books, and what users could do with their engagements. Those details, while important, could come later.

I determined that the following views would be necessary:

  • Audio player
  • Notes list
  • Chapter list

Here, the bookmarking button sits prominently on top of the cover art, and is the primary action during listening. A persistent bottom bar allows access to the book information and chapter list at all times.

An interactive prototype illuminated an interesting issue. When prompted to edit a bookmark in the bookmark view, a tester asked if she had to pause the audio. It was a simple question, but one that goes straight to the heart of the product. She was not just asking for clarification on a feature, but on the relationship between listening and revisiting.

I envisioned several similar but distinct use cases for the app: engaging casually while listening and revisiting after the listening session, or engaging and revisiting immediately. If I wanted users to be able to revisit during listening, though, I would have to bring the player and bookmark views closer together.

Wireframes would only take me so far. Interactive prototypes allowed me to get a better sense of how the app would feel in the actual use context. I changed to a paged format that scrolled between Chapters — Audio — Bookmarks, and decoupled the audio controls from the view so that they were always visible. I hoped that this would:

  1. Reinforce the idea that the views are different faces of the same object: the book that’s open
  2. Allow the user to edit notes while maintaining control of the listening experience.

I was afraid of using an unconventional format for an audio player, but user response turned out to be very positive.

Bookmark Cells

If a book is a trail, an audiobook is a stream. It drifts forward regardless of what you’re doing. By the time you’ve decided that something is significant, you’ve already passed it. Though I initially thought of multiple engagement types based on different user motivations, a single action was better suited to the audio format.

Bookmarks are pins for chunks of time. A single bookmark has to support many actions: playing it, editing its audio duration, moving the player to its location, adding a note, and deleting it.
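To make that list of actions concrete, here is a minimal Python sketch of a bookmark as a pinned time range. Everything in it is hypothetical: the class and method names, the use of seconds, and the 15-second default look-back (an assumption motivated by the point that you have already passed a significant moment by the time you react to it) are illustrative, not how Audiobit is actually built.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical default: a new bookmark reaches this many seconds back,
# because the listener usually taps after the important moment has passed.
DEFAULT_LOOKBACK_S = 15.0


@dataclass(eq=False)  # compare bookmarks by identity, not by field values
class Bookmark:
    start_s: float  # start of the pinned chunk, in seconds
    end_s: float    # end of the pinned chunk, in seconds
    note: str = ""  # the note doubles as the bookmark's title

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


@dataclass
class Book:
    length_s: float          # total audio length, in seconds
    position_s: float = 0.0  # current position of the main player
    bookmarks: List[Bookmark] = field(default_factory=list)

    def add_bookmark(self, lookback_s: float = DEFAULT_LOOKBACK_S) -> Bookmark:
        """Pin the chunk that just played, clamped to the start of the book."""
        bm = Bookmark(max(0.0, self.position_s - lookback_s), self.position_s)
        self.bookmarks.append(bm)
        return bm

    def play_range(self, bm: Bookmark) -> Tuple[float, float]:
        """Range to replay for a bookmark; the main player position is untouched."""
        return (bm.start_s, bm.end_s)

    def edit_duration(self, bm: Bookmark, start_s: float, end_s: float) -> None:
        """Resize a bookmark's audio range, kept within the book's bounds."""
        start_s, end_s = max(0.0, start_s), min(self.length_s, end_s)
        if start_s >= end_s:
            raise ValueError("bookmark start must come before its end")
        bm.start_s, bm.end_s = start_s, end_s

    def add_note(self, bm: Bookmark, text: str) -> None:
        bm.note = text

    def jump_to(self, bm: Bookmark) -> None:
        """Move the main player to the start of the bookmark."""
        self.position_s = bm.start_s

    def delete(self, bm: Bookmark) -> None:
        self.bookmarks.remove(bm)


# Usage: a listener taps the bookmark button 100 seconds into the book.
book = Book(length_s=3600.0, position_s=100.0)
bm = book.add_bookmark()  # pins 85.0 to 100.0 seconds
book.add_note(bm, "Key definition")
```

Keeping `play_range` separate from `jump_to` is one way to let a listener replay a bookmark without losing their place in the book; whether the main audio pauses during that replay is a separate design decision the sketch does not make.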

I prioritized the actions by how they related to the problem of listening for understanding. I determined that playing would be the most important: a user will want to repeat a segment to reinforce an important point, or to catch something they may have missed. Next came adding a note, the personalization that lets users distinguish between notes if the need arises. With this information hierarchy as a starting point (its order would change over time), I designed and tested different cell actions.

Initially, I intended for a single click to expose all of the functions, including every editing option. The result was an overcrowded window mixing functions for different aspects of the bookmark: editing text, editing audio, and moving the audio player to the current bookmark.

I didn’t want to overload the cell with actions, so I chose to use a more-options modal.

The bookmark editor covers the screen, set against the primary color of the cover art. I like this because it isolates the bookmark from the rest of the audio player, so there is no confusion between multiple sets of audio controls. I decided to fold note taking into the bookmark’s title, because the default “New Bookmark” provides no useful information.

Doubling Down on Previews

Previews support revisiting for understanding from within the audio player. They’re the equivalent of the “consume once” use of an artifact.

What’s in an effective preview? Should previews be condensed or expanded versions of regular bookmark cells? Should they take on a new format, one better suited to quick listening? I had lots of questions and went through dozens of iterations to get to something unobtrusive, quick, and easy.

Some more complicated interactions were appealing to me, such as allowing the user to write notes directly into the preview, but testing proved these to be too complex. I realized that with most of their attention focused on the content of the audio, users wouldn’t have much energy to devote to the previews.

Bookmarking on the go is fast, and previewing should be the same.

Previews can be played or edited by clicking on the preview body. After a brief interval, they condense into a bubble displaying the current number of bookmarks. Standard audiobook features, such as playback speed and the download option, are used much less, so they are placed in an options menu.

Missed Connections

Sometimes, ideas that you’re excited about end up on the cutting room floor. This is one of them. I had considered incorporating the text transcript into the listening experience. The user could switch contexts when appropriate, and as an added benefit, bookmarks would come with snippets of text.

I scrapped the idea because audiobooks rarely come with synced text transcriptions. Even if the audio were synced sentence by sentence, the margin of error would be too large for text snippets to be accurate. While I think media will move toward more integrated formats in the future, I needed to keep the constraints of existing technology in mind.


As I build out the rest of the application, here are some of the bigger takeaways from this project that I hope to carry into future ones.

Context is Key

What works in theory is often very different from what works on a phone in real life. This is especially true with audio. The user will likely not be looking at their phone. If they are holding their phone, which they probably aren’t, they’ll be holding it with a single hand and using the thumb for gestures. It wasn’t until I saw people listening to audiobooks in real life that I realized that some of my ideas needed to be revised.

Product Design before Visual Design

I enjoy exploring design directions in depth, and I get caught up thinking about interaction and visuals. That leaves me working off of shaky foundations. Many times I had to force myself to step back, question whether what I was doing supported the design goals, and stop if it didn’t.

Working on Audiobit has been a great learning experience, and I’m excited to take it further. I’m interested in the social component: how people get into conversations about audiobooks (like how people talk in the margins of a real book!).
