Transcribe Your Podcast for Pennies

Alwin Aind
Aug 15, 2023

Podcasts have been around for over two decades. Even though they have become very popular lately, there are still big gaps in the market that anyone with a couple of microphones and a few hours of talking can fill.

Podcasts are a rich source of learning, information, laughter, and relaxation. Transcription makes that source even richer.

Transcription is the process of converting audio into text. It helps listeners engage with your content more deeply. This is especially true for podcasts on intellectual topics, where many listeners prefer a transcript to understand the content better. Transcripts also make your content accessible to deaf and hard-of-hearing listeners.

Transcriptions benefit not only listeners but also creators. They can help with SEO (Search Engine Optimization): search engines can index the keywords in your transcript, which can drive more traffic to your podcast.

There are two main methods of transcription:

  • Human Transcription
  • Machine Transcription

Both are pretty straightforward. You provide your audio files to a transcription service, and it does one of two things. It either has a human listen to the audio and write it down as text, or it uses a computer algorithm to do the same.

Human transcriptions are still generally more reliable and accurate. However, machine transcriptions have become far more accurate lately thanks to recent developments in AI, so much so that they are now comparable to human performance.

We have already seen the power of ChatGPT: it can pass some of the toughest exams in the world and crack lame jokes at the same time. Similarly, STT (Speech-to-Text) AI models have been approaching human-level performance. On top of that, human transcription has an average turnaround time of about 24 hours, while machine transcription services take only a couple of minutes.

One of the most popular of these models is Whisper. It comes from OpenAI, the same company that created ChatGPT. Whisper is a transformer-based neural network that converts speech to text quickly. According to OpenAI, it approaches human-level robustness and accuracy on English speech recognition.

However complicated the model may be, using it is simple enough for any layperson to transcribe their podcasts regularly.

OpenAI offers Whisper through its API (Application Programming Interface). In simpler terms, an API is an online service that takes some data from you, processes it in some way, and sends back some data or a message. In the case of Whisper, the data you send is your audio file, and the data the API sends back is the text transcription.
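The minimal Python sketch below shows roughly what that request-and-response exchange looks like. It assumes the official `openai` package (v1.x, installed with `pip install openai`), an API key stored in the `OPENAI_API_KEY` environment variable, and OpenAI's `whisper-1` model name. The `transcribe` helper and the `episode-01.mp3` file name are hypothetical.

```python
def transcribe(audio_path: str, model: str = "whisper-1") -> str:
    """Upload one audio file to OpenAI's transcription endpoint and return the text.

    Assumption: the official `openai` Python package (v1.x) is installed and
    the OPENAI_API_KEY environment variable holds a valid key.
    """
    # Imported inside the function so this file can be loaded without the package.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(audio_path, "rb") as audio_file:
        # The audio file goes in; a transcription object comes back.
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

Calling `transcribe("episode-01.mp3")` would upload the (hypothetical) episode and return its transcript as a plain string. At the time of writing, OpenAI's documentation lists a 25 MB limit per upload. Long, high-bitrate episodes may therefore need to be compressed or split before sending.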

Other platforms offer human transcription at $1.50 per minute and machine transcription at $0.25 per minute on a pay-as-you-go basis. A few also offer subscriptions, where you pay a monthly fee for a transcription allowance of a few hours per month.

Whisper costs roughly 40 times less than even that machine rate. At only $0.006 per minute, you can transcribe a 2-hour podcast for just 72 cents.
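The arithmetic behind that comparison is simple enough to check in a few lines. This sketch uses only the per-minute prices quoted in this post. The function and variable names are illustrative, and the figures only cover pay-as-you-go pricing, not subscriptions.

```python
# Per-minute prices quoted in this post (USD, pay-as-you-go).
WHISPER_PER_MIN = 0.006
MACHINE_PER_MIN = 0.25
HUMAN_PER_MIN = 1.50


def transcription_cost(minutes: float, price_per_minute: float) -> float:
    """Pay-as-you-go cost of transcribing `minutes` of audio."""
    return minutes * price_per_minute


episode_minutes = 2 * 60  # a 2-hour episode

for label, rate in [("Whisper", WHISPER_PER_MIN),
                    ("Machine", MACHINE_PER_MIN),
                    ("Human", HUMAN_PER_MIN)]:
    print(f"{label}: ${transcription_cost(episode_minutes, rate):.2f}")

# How many times cheaper Whisper is than each alternative.
print(f"vs machine: {MACHINE_PER_MIN / WHISPER_PER_MIN:.1f}x cheaper")
print(f"vs human: {HUMAN_PER_MIN / WHISPER_PER_MIN:.1f}x cheaper")
```

For a 2-hour episode, this prints $0.72 for Whisper, $30.00 for machine transcription, and $180.00 for human transcription. Whisper comes out about 41.7 times cheaper than the machine rate and 250 times cheaper than the human rate.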

So, what’s the catch? Seems too good to be true, right?

Well, there obviously is a catch. With the other services, transcription takes no effort: you just upload your audio file. With Whisper, you have to get your hands a little dirty. Still, it's not a steep learning curve, and it is definitely worth the price you pay.

I will explain all the details of how to use OpenAI’s Whisper model for transcription in my next post.
