In my not-so-spare time I work on a new technology called Hyperaudio. A question I’m frequently asked is ‘What exactly is Hyperaudio?’ Well, it can be a lot of things, but I often find it useful to distill it into a sentence. I got it down to this: ‘Hyperaudio is to audio as Hypertext is to text.’ I usually pause at this point because that statement is loaded with implications.
It’s only quite recently that we’ve started to see the emergence of the audio interface as a serious tool. Apple’s Siri speech interface certainly hit the headlines when it was first rolled out, and Google’s audio search continues to improve. This is driven in part by the massive uptake in smartphones and the desire to use them in a hands-free way, but also by the fact that audio requires only partial attention and can convey emotion very well.
We’re still a long way from the concept of seamless audio interfaces as portrayed on the Starship Enterprise, but with the integration of HTML5 audio into the web page, things are starting to move.
So what is Hyperaudio? Hyperaudio is a series of technologies built upon the foundations of HTML5 audio which aim to make audio a first class citizen of the web. In particular, but not exclusively, we are concerned with the spoken word.
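Everything that follows builds on the plain HTML5 audio element, which takes only a few lines of markup to drop into a page (the `id` and file name here are placeholders, not part of any Hyperaudio demo):

```html
<!-- A minimal HTML5 audio player; src is a placeholder -->
<audio id="player" controls>
  <source src="interview.mp3" type="audio/mpeg">
</audio>
```

Once the element is in the page, script can read and set its `currentTime`, which is the hook that timed transcripts and libraries like Popcorn.js rely on.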
What Hyperaudio hopes to achieve:
- make audio searchable
- make audio linkable
- make audio navigable
- dynamically generate audio
- convert speech to text
- represent audio visually
When we represent spoken audio as text we immediately start to open up the content. People can see at a glance what that content is. Transcripts, then, underpin much of the work undertaken under the Hyperaudio umbrella so far. As soon as we convert speech to text we are immediately able to scan, search and link to that content. Add timings and we can use that transcript as a form of navigation. We can call these hyperlinked transcripts hyper-transcripts.
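The timing idea can be sketched in a few lines. In a real hyper-transcript the timings would typically live as data attributes on the words of an HTML transcript; the data shape and function name below are illustrative only, not part of any Hyperaudio API:

```javascript
// A hypothetical timed transcript: each word carries the second
// at which it is spoken.
const transcript = [
  { word: "Hyperaudio", start: 0.0 },
  { word: "is",         start: 0.8 },
  { word: "to",         start: 1.1 },
  { word: "audio",      start: 1.4 },
  { word: "as",         start: 2.0 },
  { word: "Hypertext",  start: 2.3 },
  { word: "is",         start: 3.1 },
  { word: "to",         start: 3.3 },
  { word: "text",       start: 3.6 }
];

// Index of the word being spoken at a given playback time:
// the last word whose start time is <= time, or -1 before the first.
function activeWordIndex(words, time) {
  let active = -1;
  for (let i = 0; i < words.length; i++) {
    if (words[i].start <= time) active = i;
    else break;
  }
  return active;
}
```

Highlighting follows the player (call `activeWordIndex` on each `timeupdate` event), and navigation runs the other way: clicking a word seeks the player with `audio.currentTime = word.start`.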
So let’s look and listen to a few Hyperaudio demos to get a feel for what it can do.
- An early demo created for Denmark’s biggest radio-station DR (dual language) (note: 17/05/2017 no longer online)
- A more visual demo for WNYC’s famous RadioLab programme
- Hyperaudio Pad — a tool for manipulating media from their transcripts (Work In Progress) (note: 17/05/2017 latest version at hyperaud.io/pad/)
- Breaking Out — An experiment in dynamically generated speech
Many of these demos take advantage of the great work that is being done with the Popcorn.js library. While Popcorn.js is often associated with video it can equally be applied to audio. In short what Popcorn allows you to do is to trigger events at set times in pieces of media.
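Popcorn.js exposes this time-triggering through calls like `Popcorn("#audio").cue(time, callback)`. The toy dispatcher below is not Popcorn's actual code, just a sketch of the underlying idea: cues are callbacks attached to media times, fired as playback crosses them.

```javascript
// Toy cue dispatcher illustrating the Popcorn.js idea.
function createCueList() {
  const cues = [];
  return {
    // Register a callback to fire at a given media time (seconds).
    cue(time, fn) {
      cues.push({ time, fn });
      cues.sort((a, b) => a.time - b.time);
    },
    // Call on every "timeupdate": fire any cues whose time was
    // crossed between the previous tick and the current one.
    tick(previousTime, currentTime) {
      for (const c of cues) {
        if (c.time > previousTime && c.time <= currentTime) c.fn(c.time);
      }
    }
  };
}

// Example: show a subtitle at 5s and an image at 10s.
const fired = [];
const media = createCueList();
media.cue(5, t => fired.push("subtitle at " + t + "s"));
media.cue(10, t => fired.push("image at " + t + "s"));
media.tick(0, 6); // playback moved from 0s to 6s: only the first cue fires
```

A real media element drives `tick` for you via its `timeupdate` events; the point is simply that events hang off media time rather than wall-clock time.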
As you can see, Hyperaudio has many applications: we can use it to create a range of compelling experiences, educational applications, even tools, and it is not difficult to see how this technology could be applied to the medium of games.
Excitingly, as most video contains audio, much of Hyperaudio can also be applied to video. Actually the HTML5 APIs are very similar.
If this article has piqued your interest in Hyperaudio, please feel free to join the growing Hyperaudio community.
This blog post has been written by Mark Boas
Originally published at appsfuel.com on August 30, 2012.