Design patterns are everywhere. They are the reusable solutions to challenges found in life, implemented in different mediums like software or physical objects. Some patterns have existed for ages and can find new life today, but some problems demand pattern innovation.

A good pattern should live beyond a specific design challenge and remain useful even if a project changes its path. While we spent some time designing for sound recording, three particularly interesting patterns emerged.


Keyword-image association: using audio content to automatically choose images for a sound file

We started out looking for a way to make it easy to identify recordings, or specific pieces of content within them. With only a filename or a waveform as an identifier, it can be quite difficult to find the right recording or the right part of it; this is an issue we see in everything from our own sound memos to podcasts. Because images are far better for memory recall and association, we auto-detect keywords and automatically add suitable pictures. When applied, this pattern makes it easy to get a visual sense of what a recording is about. Understanding the content of a podcast at a glance, without having to spend 20 minutes listening, becomes a snap. So does scrubbing through a recorded note to yourself to find where you left directions to a dinner you’re attending.

Recording a memo, keywords are used to automatically pick images
Through this, the playback view can be reimagined
Auto-generated representations for podcast chapters instead of just text
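
To make the idea concrete, here is a minimal sketch of how keywords could drive image selection. It assumes speech-to-text has already produced timestamped text segments. The stopword list, the image catalogue, and the function names are all hypothetical placeholders for illustration, and word frequency stands in for whatever keyword detection a real product would use.

```python
import re
from collections import Counter

# Hypothetical stopword list and image catalogue, for illustration only.
STOPWORDS = {"the", "a", "an", "to", "at", "and", "of", "for", "is", "on", "in"}
IMAGE_CATALOGUE = {
    "dinner": "images/dinner.jpg",
    "station": "images/station.jpg",
    "bridge": "images/bridge.jpg",
}


def extract_keywords(text, limit=3):
    """Return the most frequent non-stopword words in a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(limit)]


def images_for_segments(segments):
    """Pick one image per segment, using the first keyword that has a picture.

    segments: list of (start_seconds, text) tuples.
    Returns a list of (start_seconds, keyword, image_path) tuples.
    """
    picks = []
    for start, text in segments:
        for keyword in extract_keywords(text):
            if keyword in IMAGE_CATALOGUE:
                picks.append((start, keyword, IMAGE_CATALOGUE[keyword]))
                break  # one image per segment keeps the overview glanceable
    return picks


memo = [
    (0.0, "Dinner is at eight, dinner at the old place"),
    (42.5, "Walk from the station and cross the bridge"),
]
for start, keyword, image in images_for_segments(memo):
    print(f"{start:>6.1f}s  {keyword:<8} {image}")
```

Because each pick keeps its start time, the same list could place images along a playback timeline or act as chapter markers for a podcast.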


Abstracted visual transcription: real-time feedback without stealing attention

We also looked at how to combine recording with speech-to-text transcription. Today, emerging speech-to-text tools aim to achieve near real-time transcription and presentation of the text. For instance, Google uses this frequently in their voice search and Google Now services. While technically impressive, the problem for people is that it can create the text equivalent of that annoying feedback you get with an echo on a phone call. When transcription is presented instantly, it is easy to be thrown off by reading what you are saying. Conversely, if you just talk into an empty space, you feel like nobody is listening to you, and you start to wonder whether the tech is working at all. By abstracting the words into visual placeholders, we can give people real-time feedback without interfering with their speaking flow.

Voice transcript is hidden when recording. When paused, it's visible and possible to edit
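
The behavior can be sketched in a few lines, assuming a speech-to-text engine delivers the recognized words as a list. The function names and the placeholder glyph are hypothetical choices for illustration; a real interface would more likely draw shapes than print characters.

```python
def abstract_word(word, glyph="•"):
    """Replace a word with a placeholder of the same length."""
    return glyph * len(word)


def render_transcript(words, recording, glyph="•"):
    """Show placeholders while recording and the real text when paused."""
    if recording:
        return " ".join(abstract_word(word, glyph) for word in words)
    return " ".join(words)


words = ["turn", "left", "after", "the", "bridge"]
print(render_transcript(words, recording=True))   # placeholders only
print(render_transcript(words, recording=False))  # readable, editable text
```

Keeping each placeholder the same length as its word means the speaker still sees that something was captured, and roughly how much, without being tempted to read along.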


Identity-based audio overview: automatically recognize different voices in a sound file

Finally, if you have performed an interview or recorded a session with multiple people, it is a significant hassle to quickly find who was speaking when. The standard waveform audio representation doesn’t really help unpack who’s speaking. Typically, the result is either a messy transcript of all the different people talking, or a sound file you have to manually sift through to find the relevant stuff. Here is a pattern that automatically annotates the sound file with the people in attendance, and provides visual guidance on who is speaking at any given time.
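
As a rough sketch of the data behind such an overview, assume a separate voice-recognition step has already labeled stretches of audio as (start, end, speaker) tuples. That step is not shown here, and the function names and the `gap` threshold are hypothetical. The code only turns those labels into a timeline that a view could draw.

```python
from collections import defaultdict


def merge_turns(segments, gap=0.5):
    """Join consecutive segments by the same speaker separated by short pauses."""
    merged = []
    for start, end, speaker in sorted(segments):
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= gap:
            previous_start, previous_end, _ = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end), speaker)
        else:
            merged.append((start, end, speaker))
    return merged


def speaker_at(timeline, t):
    """Return who is speaking at time t, or None during silence."""
    for start, end, speaker in timeline:
        if start <= t < end:
            return speaker
    return None


def talk_time(timeline):
    """Total seconds spoken per person."""
    totals = defaultdict(float)
    for start, end, speaker in timeline:
        totals[speaker] += end - start
    return dict(totals)


labels = [(0.0, 4.0, "Ana"), (4.2, 9.0, "Ana"), (9.0, 15.0, "Ben"), (15.5, 18.0, "Ana")]
timeline = merge_turns(labels)
print(timeline)
print(speaker_at(timeline, 10.0))
print(talk_time(timeline))
```

Merging short pauses keeps the overview from fragmenting into dozens of tiny blocks, which matters more for a glanceable visual than exact timing does.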


The great thing about patterns is their inherent reusability. They can often be applied in contexts other than the original one. At this level of fidelity, we commonly find ways to improve and apply the patterns to problems outside the sound world. Can they be applied to TV shows? How can they be used to improve your voice or video call experiences?

Finding a pattern that does more than add polish to an existing product or service, one that solves a problem and perhaps challenges what value the product might bring to the user, is a key element of design innovation.