Use AI to Live Caption Your Meetings
Many people rely on captions every day. Unfortunately, not everything has captions, from simple things like that webinar you’re attending to more advanced systems like Mozilla Hubs. Recently, I discovered Otter.ai, which offers AI-based live transcription. It’s not perfect, but it works about as well as most live TV broadcasts if you hook it up correctly. I’d still recommend that pre-recorded videos have human-corrected caption files, but not all live systems support captioning. Hopefully, this guide will help you hook up a bit of a stop-gap solution for those cases. Here we’ll look at captioning Mozilla Hubs in Firefox. First, you’ll need to set up a loopback of your audio, and you’ll need a free Otter.ai account (you can use my referral code to get a month of free premium features too).
Audio loopback lets you take the audio from an application, say Firefox, and make it look like a new microphone attached to your computer. This method provides a clean signal instead of just turning up your speakers and hoping your microphone picks up the sound acoustically, which adds a lot of noise that tends to make automated speech recognition services fail fast.
You’ll need software to make a proper loopback. If you’re on a PC, I believe you can use LoopBeAudio. If you’re on a Unix flavor, run vi and figure it out. I’m on a Mac, so I’ll be using Loopback. You can use it for 20 minutes for free, and then it degrades your audio by adding noise. The $100 license isn’t cheap, but it works very well. If you know any better tools for Mac, PC, or Unix, let me know. When you run Loopback, you’ll see a little flow chart for the audio.
You can give it a friendly name that says what it is, like “Firefox and Microphone.” Then add Firefox as a source, plus a microphone source or whatever else you want to ‘listen’ to.
You’ll see them pop in the flow chart.
If you want the captured audio to keep coming out of your speakers, it’s essential to turn off Mute when capturing. Make sure the device is switched on in Loopback and voila!
Now head over to Otter.ai and sign in. You can click Record, and it will prompt you for a microphone. Select your Loopback microphone from the list.
And you’re done! Anything from Firefox goes to Otter.ai.
Here, I’m playing an embedded video in Mozilla Hubs in one tab. The other tab is Otter.ai listening to Firefox and transcribing! It’s not perfect, but if you have a system with no captioning support like Mozilla Hubs, Jitsi, or even Zoom, this is an excellent way to capture and caption that content. Of course, it records all the audio, so inform people you’ll be doing this in advance.
Otter.ai gives you an interface for correcting things, lets you add vocabulary to help it know some of your words (though acronyms like UBICOMP and UIST still confuse it), and gives you a fun keyword summary when completed. You can then export the notes as text or even an SRT for subtitles if you need them. I updated my Remote Video Presentation Guide to include more specifics if you need a timestamped caption file.
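If you end up tweaking an exported SRT by hand, it helps to know the format is simple: numbered blocks, each with a line like 00:00:01,000 --> 00:00:03,500 followed by the caption text. Here is a minimal Python sketch, assuming a standard SRT file, that shifts every caption by a fixed offset (say, because the recording started a few seconds after the video did). The names parse_srt, shift_cues, and dump_srt are my own illustrations, not anything Otter.ai provides, and I have not checked this against every quirk an export might contain.

```python
import re
from dataclasses import dataclass

# Standard SRT timestamp: hours:minutes:seconds,milliseconds
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def to_ms(stamp: str) -> int:
    """Convert 'HH:MM:SS,mmm' to milliseconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def to_stamp(ms: int) -> str:
    """Convert milliseconds back to 'HH:MM:SS,mmm', clamping at zero."""
    seconds, millis = divmod(max(ms, 0), 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def parse_srt(text: str) -> list[Cue]:
    """Split SRT text into cues; blocks are separated by blank lines."""
    cues = []
    for block in re.split(r"\n\s*\n", text.lstrip("\ufeff").strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip anything that is not index + timing + text
        start, end = lines[1].split("-->")
        cues.append(Cue(int(lines[0]), to_ms(start), to_ms(end), "\n".join(lines[2:])))
    return cues


def shift_cues(cues: list[Cue], offset_ms: int) -> list[Cue]:
    """Return new cues with both timestamps moved by offset_ms."""
    return [Cue(c.index, c.start_ms + offset_ms, c.end_ms + offset_ms, c.text) for c in cues]


def dump_srt(cues: list[Cue]) -> str:
    """Serialize cues back to SRT text."""
    blocks = (
        f"{c.index}\n{to_stamp(c.start_ms)} --> {to_stamp(c.end_ms)}\n{c.text}"
        for c in cues
    )
    return "\n\n".join(blocks) + "\n"
```

To use it, read the exported file into a string, run it through parse_srt, apply shift_cues with a positive or negative number of milliseconds, and write the result of dump_srt back out.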
We can assume AI-based captioning will get better over time, though technical content is a challenge. Unfortunately, many commercial or experimental systems don’t have proper captioning or even hooks for captions (for example, Otter.ai doesn’t currently have an API, or else I would have written it into Mozilla Hubs). Hopefully, this guide will help bridge that gap just a bit!