It’s Time To Define A Digital Audio Data Standard

Let’s Do This

Until recently, when you spoke into a microphone and recorded a show, virtually the only way to reach an audience was to have someone with a broadcast license send the recording out over the airwaves to people who owned a radio and tuned in to a specific frequency at a specific time.

Ten years ago, Apple accidentally created a new audio medium when it introduced native podcast support, so called because people could use their iPods to listen to recordings. The original podcast spec included only about a dozen fields, such as headline, creation date and author. Because Apple is the biggest distributor of spoken-word audio, its technical podcast standards remain the foundation of most digital audio distribution systems. But over the last 10 years, hundreds of other audio distribution systems have cropped up, and with each, a new technical standard has emerged.

Audio Data’s Wobbly First Years

This stuff is pretty cool if you’re a geek like me.

For a long while, podcasts were a niche medium for techies and devoted fans who took the time to listen to a whole show. But podcasts have become big business: National Public Radio says podcast sponsorships put it in the black this year. And we're not just listening to long-form podcasts. We're listening to all kinds of spoken-word recordings broken into segments just a few minutes long: local news, commentary and even game shows like "Wait Wait Don't Tell Me" are being consumed in short, individual pieces.

These brief spoken-word segments are then distributed to dozens of different players. Rivet Radio is distributing segmented content, and so are players for individual stations, players in cars, the NPR One app, conference-call hold systems and much, much more. Anywhere you can listen to something, segmented digital audio can be delivered.

So, everything's different. Sure, broadcast radio listeners can still hear you, but so can people listening via podcast players, a station's own proprietary apps, streaming websites or even Rivet Radio's "smart audio" app that's reinventing radio. And if you'd like your audio to be playable on dozens or even hundreds of apps and websites, you'll need to tag it with metadata and associate it with an XML feed.
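To give a sense of what that tagging looks like in practice, here is a minimal sketch, in Python, of a podcast-style RSS 2.0 feed with one short segment. The element names come from the public RSS 2.0 spec and Apple's published iTunes podcast tags; the station name, URLs and file details are placeholders, not anyone's real feed.

```python
import xml.etree.ElementTree as ET

# Apple's podcast extensions live in the "itunes" XML namespace.
ITUNES = "http://www.itunes.com/dtds/podcast-1.0.dtd"
ET.register_namespace("itunes", ITUNES)

# A bare-bones RSS 2.0 feed with one episode ("item").
rss = ET.Element("rss", attrib={"version": "2.0"})
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Example Station News"
ET.SubElement(channel, "link").text = "https://example.com/podcast"
ET.SubElement(channel, "description").text = "Short spoken-word segments."
ET.SubElement(channel, f"{{{ITUNES}}}author").text = "Example Station"

item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = "Morning headlines"                 # headline
ET.SubElement(item, "pubDate").text = "Mon, 06 Jul 2015 08:00:00 GMT"   # creation date
ET.SubElement(item, f"{{{ITUNES}}}author").text = "Jane Reporter"       # author
ET.SubElement(item, f"{{{ITUNES}}}duration").text = "02:45"
ET.SubElement(item, "enclosure", attrib={
    "url": "https://example.com/audio/headlines.mp3",  # the audio file itself
    "length": "2640000",                                # size in bytes
    "type": "audio/mpeg",                               # delivery file format
})

print(ET.tostring(rss, encoding="unicode"))
```

That handful of tags is roughly what the original spec anticipated, and it's what every player downstream has had to build on.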

While you may prefer to avoid this sort of technical mumbo-jumbo, the price of ignorance can be reduced listenership.

Right now, the data for all this great content is delivered on top of Apple's original podcast format, with a Balkanized patchwork of different standards layered over it to serve the many needs audio distributors and producers have discovered since the launch of the original 2005 standard. It works, but not well. It's getting harder to move content from one system to another, which makes distributing content to multiple platforms cumbersome and serves no one in the digital audio business well.

Striding Forward

We need a new spoken-word audio standard covering the metadata that describes the audio, the XML that packages it and the file format that delivers it.
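To make the conversation concrete, here is one purely illustrative sketch, in Python, of the kind of record a segment-level standard might need to describe. The field names (topics, transcript_url, expires and so on) are assumptions for discussion only; this is neither Apple's spec nor Rivet's internal standard.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class AudioSegment:
    """One short spoken-word segment, described independently of any single player.
    Purely illustrative field names, not an existing standard."""
    headline: str                       # fields the original podcast spec already covers
    author: str
    published: datetime
    audio_url: str                      # where the audio file lives
    mime_type: str = "audio/mpeg"       # file format for delivery
    duration_seconds: Optional[int] = None
    topics: List[str] = field(default_factory=list)     # e.g. ["local news", "weather"]
    transcript_url: Optional[str] = None                 # useful for search and accessibility
    expires: Optional[datetime] = None                   # time-sensitive segments age out

segment = AudioSegment(
    headline="Morning headlines",
    author="Jane Reporter",
    published=datetime(2015, 7, 6, 8, 0),
    audio_url="https://example.com/audio/headlines.mp3",
    duration_seconds=165,
    topics=["local news"],
)
```

The point isn't these particular fields; it's that short, time-sensitive segments need richer, shared metadata than the original dozen podcast fields provide.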

Rivet’s developed an internal standard, which we will be sharing with the world. We recognize that our standard probably doesn’t serve everyone. So, we’d like to open a discussion on best practices and form a working group on a new spoken-word data standard — something we’re calling the Audio Digital Format.

To kick off the discussion, we’re sponsoring a meeting at the Online News Association conference in Los Angeles this September. If you’re interested in digital audio, we’d like you to join us and play a part in setting this new standard.