Gretta on making a social media interface for podcasts.

Published in A Matter-Driven Narrative · 6 min read · Jun 15, 2017

Hi there! I’m Kim, Chief at Gretta.com for podcasts, started in Vancouver, Canada way back in the summer of 2015 with my cofounders Kelly and Rainer. Our mission: to bring podcasts in from the cold—to finally make them socially shareable.

Why? While podcasting’s popularity continues to skyrocket (there will be 43 million podcast listeners in the United States alone by 2020, with an estimated market size of more than half a billion dollars), pretty much everyone seems to agree that discoverability of podcasts is broken. Unlocking the social sharing and discovery loop for podcasts is central to growing the medium, maybe even to fundamentally disrupting the space.

But even way back then, we could also see there might be something even bigger over the horizon. More and more, we’re becoming used to speaking to our devices. Voice interfaces are being adopted at an impressive rate. And at the end of the day, most podcast content is spoken-word audio. What if we could get really good at making podcast audio—voice—social? Today, being “social” on the internet mostly means typing at each other. What if a future existed where “being social on the internet” meant talking?

We have to admit, we didn’t get off to an especially auspicious start. Coming into the summer of 2016, after most of a year working on a social podcast-creation platform, we hit a dead end (read the story here). We knew we couldn’t afford to make the mistake of building the wrong thing (again). So we began a series of small releases aimed at failing fast.

This is when one of my former co-workers, the legendary Jim Pick, joined us. With his help, we quickly spun up an experiment around podcast transcription — kinetic transcript player 1.0. This looked encouraging. Richard Campbell, entrepreneur, celebrated technologist, and one of the fathers of podcasting, came on as our first investor, giving us a huge back-catalogue of content to work with. We began to test hypotheses rapidly.

An excerpt from Gimlet’s Reply All #52 — Raising the Bar

The original: “Do ‘kinetic transcripts’ improve the podcast consumption experience?” Transcript Player 1.0 (from Reply All #52 — Raising the Bar)
“Can we programmatically make transcribed voice-clips nice to consume?” Audio Visualizer with Transcripts

We started to play with Automatic Speech Recognition (ASR) transcription.

“ASR transcripts aren’t perfect: is it feasible to get humans to correct them?” Transcript Editor 1.0, starting with machine-learning-generated transcripts.
“Can we get as many hits from visualizations on YouTube as we do from live-action video?” YouTube Search-Engine Test (.NET Rocks! #1298: GMO, BT and Glyphosate GeekOut)
“Is the amount of labour involved in creating an ‘audio newsletter’ worth it?” Audio Newsletter Transcript (feat. Breakmaster Cylinder)

By now, it was obvious that our “kinetic transcripts” were evocative. You could remember a phrase you heard on a podcast a few weeks ago, type it into search, and be linked straight to it. The internet is made of links. Content that cannot be linked doesn’t “exist” on the internet. With these transcripts, podcast audio could now be deep-linked—the key to social sharing, and discovery.
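To make the “type a phrase, land on the moment” idea concrete, here is a minimal sketch in Python. It is not Gretta’s actual implementation: the `Word` record, the `find_phrase` and `deep_link` helpers, and the example URL are all made up for illustration, and it assumes each transcript word carries a start time and that the player honours a `#t=<seconds>` fragment.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One transcript word with its start time (hypothetical record shape)."""
    text: str
    start: float  # seconds from the start of the episode


def normalize(token: str) -> str:
    """Lowercase a token and strip punctuation so 'Bar,' matches 'bar'."""
    return re.sub(r"[^a-z0-9']", "", token.lower())


def find_phrase(words: List[Word], phrase: str) -> Optional[float]:
    """Return the start time of the first exact occurrence of phrase, or None."""
    target = [t for t in (normalize(p) for p in phrase.split()) if t]
    if not target:
        return None
    tokens = [normalize(w.text) for w in words]
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            return words[i].start
    return None


def deep_link(base_url: str, words: List[Word], phrase: str) -> Optional[str]:
    """Build a link that jumps to the phrase (assumes '#t=' is supported)."""
    start = find_phrase(words, phrase)
    if start is None:
        return None
    return f"{base_url}#t={int(start)}"
```

Calling `deep_link("https://example.com/episode", words, "raising the bar")` would give back a URL pointing at the second where the phrase begins, or `None` if the phrase never appears. A real search would also need fuzzy matching, since listeners rarely remember a quote word for word.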

We also sort of knew that vanilla machine-learning generated transcripts wouldn’t cut it unedited (introducing some questions around scale), but it felt promising.

More experiments…

“Can we avoid the stores?” Progressive Web Apps (various .NET Rocks episodes)
“How long is it ‘fun’ to just sit and stare at these things?” Video Visualizer Roulette (Hanselminutes)
“Is it feasible to get human beings to correct ASR transcripts 2: Electric Boogaloo” Transcript Editor 2.0 (bonus release video. Stay for the out-takes.)

This one was interesting. Fellow Olds may remember Katamari Damacy for PlayStation 2, in which players were invited to roll a ball around the world to pick up random junk. Getting to a spotless floor was incredibly satisfying.

Because the automated transcripts were pretty good (but not perfect), cleaning up the transcripts felt a little like playing Katamari. It was actually, well…fun. Looking at the flawless transcripts afterwards felt great. Hm.

NA-NAAAAAA-NA-NA-NA-NA-NA-NA-NAAAAA Katamari Damacy.

More…

“Which ASR ‘recipe’ do humans prefer?” Robots Round Robin (simple web game with excerpts from Cool Games Inc., 2 Dope Queens)
“Can we retain users by embedding in a podcast website?” Human transcription embed (Hanselminutes)
“Will users share kinetic transcripts?” Transcript Player 2.0, featuring transcript-enabled audiograms
“Does production-value of shareable videos affect engagement?” Better-looking audiograms
Even more coming soon…

On the business side, things started to go right… along with Richard, we were joined by The Good Doctors, Ray Muzyka and Greg Zeschuk, former bosses from my days at BioWare. They brought with them the experience of having built an $860M entertainment software company. Our first investors outside Canada were David Grampa and Javier Soto, who kicked our asses on roadmap (for which we were very thankful). We made it to the final stage at the Seattle Angel Conference, hosted by the lovely John Sechrest. And Cam Cavers, Jim’s ex-business partner and Gretta’s former landlord, came in as our UX engineer.

We started to form partnerships with podcasts. Richard’s .NET Rocks and RunAs Radio. Hanselminutes took a chance on us. Media Indigena. The Lapse. Beards, Cats, and Indie Game Audio. Game/Life Balance. Software Engineering Daily. All told, we had access to more than 600,000 listeners as we continued to run experiments.

Then we were invited to join visionary accelerator Matter VC, an opportunity for which we’re profoundly grateful.

With that, you’re basically current to where we are. We’ve built a lot, but we have a lot more to build, test, rebuild, and test again.

In the near term, we want to form more partnerships with podcasts. There’s a huge amount of work to do on our machine-learning transcription pipeline, and on our player UX. We have to recruit more developer, designer, and data science talent to do that. Although we have a pretty good idea of where to go with this, we still need to prove a revenue model (or models).

And it’s going to require resources to get there! We’re extending our pre-seed so we can hire an extremely experienced C-level resource (and because San Francisco is expensive). If you’re interested in joining us as an investor, happy to chat.

There’s also the larger vision. The more we build, the more we realize that what we’re doing is hard…really hard. We also know that we’re onto something. Here’s where we think we’re headed:

Folks go nuts over the prototype player interface. It’s a sea-change improvement in navigation, deep-linking, SEO, and sharing over traditional consumption. And hey, that spells network effects! Better discovery for listeners, better adoption for podcasters. The more we polish the UX to support everyday podcast consumption, the more true that will be.
Correction is sort of fun; we can make it funner. By feeding human corrections back into our speech-to-text corpora, we can improve our transcriptions, getting even better results for podcasters. And because we’re dealing with power laws (we’re only trying to solve for one podcast at a time, not all of human speech), we might even be able to get there faster than some of the bigger players.
By doubling down on machine learning, we can start to link podcasts up in a social graph. Imagine being able to follow your favourite hosts, no matter which podcast they’re on. And that’s just the beginning. Once voice audio and transcript are fungible, we can perform on voice audio any operation that today is only possible with text.
There are a lot of places we can go from there :)
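For the curious, the “feed human corrections back in” step above could start with something as simple as the sketch below. It is an illustration only, not a description of Gretta’s pipeline: the function names are hypothetical, and it assumes we have the raw ASR text and the human-corrected text for the same clip. It uses Python’s standard `difflib` to line the two up word by word and count which substitutions keep recurring for a given podcast.

```python
import difflib
from collections import Counter
from typing import Iterable, List, Tuple


def correction_pairs(asr_text: str, corrected_text: str) -> List[Tuple[str, str]]:
    """Return (asr words, corrected words) pairs for each replaced span."""
    asr = asr_text.lower().split()
    fixed = corrected_text.lower().split()
    matcher = difflib.SequenceMatcher(a=asr, b=fixed, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' means the human swapped one run of words for another;
        # pure insertions and deletions are ignored in this sketch.
        if tag == "replace":
            pairs.append((" ".join(asr[i1:i2]), " ".join(fixed[j1:j2])))
    return pairs


def tally_corrections(clips: Iterable[Tuple[str, str]]) -> Counter:
    """Count how often each substitution shows up across many clips."""
    counts: Counter = Counter()
    for asr_text, corrected_text in clips:
        counts.update(correction_pairs(asr_text, corrected_text))
    return counts
```

The most frequent pairs for a single show (a host’s name, a recurring piece of jargon) are the kind of per-podcast vocabulary that the power-law argument suggests is worth learning first.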

Maybe most importantly, we believe that: (1) Speech, the most essential form of human communication, is key to improving understanding, empathy, and accessibility of discourse on the Internet. And (2) solving for podcasts can help us solve more generally for the human voice. And that Matters. We feel incredibly fortunate to have the opportunity to work on this.

Thanks for reading! Follow our progress @meetgretta. And don’t hesitate to drop us a line anytime; we’re friendly. Looking forward to talking to you again soon.

Love,
Kim & the rest of the Gretta family: Kelly, Rainer, Jim, Cam, Sam, Safiyya, Richard, Ray, Greg, David(s), Javier, Justin, Jacques, Karj, Jenn, Clifford, and Daryl.

P.S. Hello to Jason Isaacs, David Morrissey, Stephen Fry, Fairport Convention, and Roman Mars #HTJI.

Written by Kim Hansen