How to index your podcasts with language AI in Python

Enias Cailliau
Steamship

--

The other week I wrote about using Language AI to listen to 5500 hours of Joe Rogan’s podcasts. My code reads in MP3s, transcribes them, tags entities and sentiments, and then gives you a query endpoint to search the results.

I kept that post high level, so I wanted to follow up with a tutorial on how you can use the code I wrote on your own audio. The examples below are in Python, but everything runs over an HTTP API, so you could do the same from JavaScript, R, or even Microsoft Excel if you’ve got that axe to grind (reach out if you need help).

In 5 minutes, you’ll have analyzed your own audio

In this post, we’ll step through a minimal example where I’ll try to answer two questions from one of Joe Rogan’s podcasts:

  • What does Edward Snowden say about Trump, Obama, and Biden?
  • What makes Edward Snowden sad?

Let’s go!

Step 1. Connect to Steamship

We’ll be using Steamship to run our AI pipeline, so you’ll need to set up your Steamship credentials. If it’s your first time connecting to Steamship, you can log in and fetch your credentials using the CLI.
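Once you’re logged in, the Python client picks up your credentials automatically. Here’s a minimal sketch (assuming the steamship pip package; the client also accepts an explicit api_key):

    # pip install steamship
    from steamship import Steamship

    # Reads the API key saved by the CLI login; you can also
    # pass api_key="..." explicitly.
    client = Steamship()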

Step 2. Import my audio-analytics package

I wrapped all the code you need in a package called audio-analytics (link to repo). Here’s how you can spin up your own instance to run it in a private workspace.
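In Python, that’s a one-liner with Steamship.use. The instance handle below (“my-audio-workspace”) is just a name I made up; pick your own:

    from steamship import Steamship

    # Create (or reconnect to) a private instance of the package.
    # The second argument names the instance and its workspace.
    instance = Steamship.use("audio-analytics", "my-audio-workspace")
    print(instance.handle)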

A Steamship package instance auto-scales, so you only have to do this once per workspace. You can browse through the package code here, but here’s the TL;DR: I use distributed Steamship services called plugins to transcribe, analyze, and index audio files.

Step 3. Upload your audio

You can upload your MP3 files by pointing the package at any publicly accessible URL, including pre-signed URLs to S3 or Google Storage. If you want to upload local audio files directly to your package, you can use the helper method upload_audio_file featured here.
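Here’s a sketch of what that call looks like. The endpoint name analyze_url is my assumption about the package’s API, so check the repo for the exact method it exposes:

    # "analyze_url" is an assumed endpoint name; the package repo
    # has the exact method. Any public MP3 URL works here.
    file_id = instance.invoke(
        "analyze_url",
        url="https://example.com/jre-snowden-episode.mp3",
    )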

Step 4. Transcribe and analyze your audio

Your instance will transcribe, analyze, and index your MP3 asynchronously as soon as you’ve uploaded it. You’ll have to wait for the analysis to finish before fetching the transcription and its language AI features.
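A simple way to wait is to poll until the analysis reports it’s finished. The get_status endpoint and its return value below are assumptions; the package repo shows the real status call:

    import time

    # Hypothetical polling loop; "get_status" is an assumed endpoint.
    while instance.invoke("get_status", file_id=file_id) != "done":
        time.sleep(10)  # long podcasts take a while to transcribe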

Step 5. Get the transcription and language AI features

Once processed, each file contains a transcript and a list of language AI features called tags, spanning 7 unique feature types. All tags are anchored to the transcription, meaning you can navigate overlapping features using queries. For those interested, I created a more in-depth workshop that looks at the language AI features here.
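Fetching the results might look like the sketch below. The get_file endpoint and the response shape (a transcript string plus a tag list) are assumptions, so check the package docs for the exact fields:

    # Assumed endpoint and response shape; consult the package docs.
    result = instance.invoke("get_file", file_id=file_id)

    print(result["transcript"][:500])  # first 500 characters
    for tag in result["tags"]:
        # Tags are anchored to character spans in the transcript.
        print(tag["kind"], tag["name"], tag["startIdx"], tag["endIdx"])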

Step 6. Query

All the audio files you upload to your instance will get added to your private workspace for future retrieval. That means we can use queries to retrieve relevant fragments and extract statistics across all your audio files in one go.

In this example, we’ll answer two questions:

What does Edward Snowden say about Trump, Obama, and Biden?
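A query for this could look like the following sketch. The query endpoint name and the exact tag-filter grammar are assumptions on my part (modeled on Steamship’s tag query language), so verify both against the package docs:

    # Assumed "query" endpoint; filters tags by kind and name.
    for president in ["Trump", "Obama", "Biden"]:
        fragments = instance.invoke(
            "query",
            query=f'kind "entity" and name "{president}"',
        )
        print(president, fragments)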

Snowden is quick to point out that these presidents have all made mistakes: the war on whistleblowers (link), drone strikes (link), and other abuses (link). Meanwhile, Joe Rogan suggests Trump might pardon him (link). Snowden politely disagrees, suggesting he might not accept a pardon from Trump if it were offered.

What makes Edward Snowden sad?
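Because sentiment tags are anchored to the same transcript as entity tags, you can ask for negatively tagged fragments that overlap a mention of Snowden. As before, the endpoint, tag names, and overlaps syntax are sketched under assumptions:

    # Assumed endpoint, tag kinds/names, and "overlaps" operator.
    sad_fragments = instance.invoke(
        "query",
        query='kind "sentiment" and name "NEGATIVE" '
              'and overlaps {kind "entity" and name "Edward Snowden"}',
    )
    print(sad_fragments)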

We all know Snowden is not happy with the NSA, CIA, or FBI, but interestingly he also has some negative thoughts about Julian Assange and WikiLeaks.

It’s up to you now

That’s it, really: a few lines of code. I scaled my instance and workspace to more than 50 of Joe Rogan’s podcasts and made them queryable. You can see the results here. I’m curious to see what cool apps YOU can make with this. Searching the voice memos on your phone? Finding strange statements in C-SPAN hearings? Finding every negative thing Sylvester Stallone ever said about Arnold Schwarzenegger?

Comment down below what you want to build, and we’ll make it happen 💪
