Google VS AWS — speech to text
I’m launching a podcast (want to be on it?) and got stuck almost immediately after the interview on the task of transcribing it. As a developer, I found the idea of paying $75 to transcribe a podcast untenable (and still do), so I looked to some of the big players to see how well their transcription services actually work.
The answer, it turns out, is not so well.
The audio file used was a one-minute segment from an upcoming episode of Product (a product-focused podcast).
AWS
What was nice for an initial test was the AWS interface: simply upload the audio (mp3) to S3 and create a transcription job. It took about 5 minutes for a 45-minute audio file.
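For anyone who would rather script this than click through the console, here is a minimal sketch of the same two steps using boto3. The job name, bucket, and key are hypothetical placeholders, the language code and speaker count are assumptions, and it presumes AWS credentials are already configured. It is not what was used for this post, just one way the upload and job creation could look.

```python
def build_job_request(job_name, bucket, key, max_speakers=2):
    """Build keyword arguments for Transcribe's start_transcription_job."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": "mp3",
        "LanguageCode": "en-US",  # assumption: English-language audio
        "Settings": {
            # Ask Transcribe to label who is speaking.
            "ShowSpeakerLabels": True,
            "MaxSpeakerLabels": max_speakers,
        },
    }


def submit(local_path, job_name, bucket, key):
    """Upload the audio to S3, then start the transcription job."""
    import boto3  # imported here so the helper above works without AWS set up

    boto3.client("s3").upload_file(local_path, bucket, key)
    request = build_job_request(job_name, bucket, key)
    return boto3.client("transcribe").start_transcription_job(**request)
```

Calling `submit("episode.mp3", "episode-1", "my-bucket", "audio/episode.mp3")` would start the job; all four arguments here are made-up names. The job runs asynchronously, so the transcript still has to be fetched once it finishes.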
Sample output:
The first thing to notice is the transcript is garbage. Just terrible. Totally unusable.
The second thing to notice is the speaker labels. With nothing but a timestamp to go on, there’s no easy way to correlate times in the transcript with who is speaking when. So A+ for speaker identification, but it’s useless if I can’t easily split the transcript up by speaker.
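If the job's raw JSON is available, the timestamps can in principle be stitched back together with a bit of code. The sketch below assumes the JSON has the shape AWS documents for Transcribe output: `results.speaker_labels.segments` carrying a `speaker_label` and per-word `start_time` values, and `results.items` carrying the words themselves. The function name `split_by_speaker` is made up for illustration, and this is a rough approach, not a polished tool.

```python
def split_by_speaker(transcript):
    """Group words from a Transcribe-style JSON dict into speaker turns."""
    results = transcript["results"]

    # Map each word's start time to the speaker of its enclosing segment.
    speaker_at = {}
    for segment in results["speaker_labels"]["segments"]:
        for item in segment["items"]:
            speaker_at[item["start_time"]] = segment["speaker_label"]

    turns = []  # each entry is [speaker, text]
    for item in results["items"]:
        text = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation has no timestamp; attach it to the previous word.
            if turns:
                turns[-1][1] += text
            continue
        speaker = speaker_at.get(item["start_time"], "unknown")
        if turns and turns[-1][0] == speaker:
            turns[-1][1] += " " + text
        else:
            turns.append([speaker, text])
    return [(speaker, text) for speaker, text in turns]
```

Loading the downloaded file with `json.load` and passing the result to `split_by_speaker` gives a list of `(speaker, text)` pairs, one per uninterrupted turn. It does nothing for accuracy, of course: a garbage transcript split by speaker is still garbage.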
… next!
Google
Google’s transcription service was a little more of a pain to use. You have to use a .wav file (I only found that out by searching after an obscure error message). So I converted a 29.7MB mp3 file into a 237.4MB wav file, and then it worked 🙄.
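The roughly 8x size jump is expected, since WAV stores uncompressed PCM samples. A small sketch of the arithmetic and of one possible conversion command follows. It assumes ffmpeg is installed. The 16 kHz mono target is an assumption based on Google's general recommendation for speech audio, not the settings used for this post, and the canonical 44-byte WAV header is assumed.

```python
def ffmpeg_to_wav_command(src, dst, sample_rate=16000, channels=1):
    """Build an ffmpeg command that converts audio to 16-bit PCM WAV."""
    return [
        "ffmpeg", "-i", src,
        "-ac", str(channels),     # number of audio channels
        "-ar", str(sample_rate),  # sample rate in Hz
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        dst,
    ]


def wav_size_mb(duration_s, sample_rate, channels=1, bits_per_sample=16):
    """Estimate the size of a PCM WAV file in (decimal) megabytes."""
    data_bytes = duration_s * sample_rate * channels * bits_per_sample // 8
    return (44 + data_bytes) / 1_000_000  # 44 bytes: canonical WAV header


# 45 minutes of 16-bit mono audio at two sample rates.
print(round(wav_size_mb(45 * 60, 44100), 1))  # 238.1
print(round(wav_size_mb(45 * 60, 16000), 1))  # 86.4
```

The first figure lands close to the 237.4MB file mentioned above, which would be consistent with a 45-minute mono recording at 44.1 kHz, though that is a guess about the source file. To actually run the conversion, pass the command to `subprocess.run(cmd, check=True)`. Downsampling to 16 kHz mono would cut the upload to roughly a third of that size.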
Conclusion
For all of the machine learning drip that Google and Amazon have, their transcription services are still not good enough to throw a podcast at and get sensible output, with multiple speakers identified and text cleanly segmented by speaker. The Google output was more accurate than the AWS output, but the reality is that I can’t use either of these as a final transcription without some serious cleanup, in which case I might as well pay the $75 for a great transcription.
It’s really mind-blowing to think of how highly we regard these companies (or at least I do). But when it comes down to practicality for a specific use case, it looks like there is still some white space for vertical products (in this case podcast transcription) that go straight up against some of the biggest names in the world.
This article was brought to you by SugarKubes. Sugarkubes is a container marketplace. Want to start running AI at the edge? Need some sweet machine learning models that work out of the box? Check us out at https://sugarkubes.io.