How to transcribe phone / video calls with perfect speaker labeling

Using Descript’s new Multitrack Transcription to transcribe your recording with unprecedented accuracy.

Multitrack Transcription, introduced in Descript 1.3, is a powerful feature that lets you combine multiple audio recordings into a single interactive transcript. Along with improved audio quality, Multitrack opens the door to near-perfect speaker labeling — which is good news for anyone who works with transcripts.

The secret to using Multitrack is to capture your recording with multiple microphones, each with its own audio track. This is common practice in professional studio environments, where multiple microphones are hooked into the same computer.

Most of us don’t have a professional recording setup — but our phone and video calls get us most of the way there, since everyone’s already speaking into their own microphone. The trick is to record these mics so that each participant is captured to their own audio track. This used to be a logistical headache, but today a handful of modern communication apps make it easy.

Here’s how to use apps like Zoom and Skype to capture multitrack recordings, and get perfect speaker labeling in your transcripts, with just a few clicks.

Zoom is a popular app for making voice and video calls. 1-on-1 calls are always free, and larger group calls are free for up to 40 minutes (Zoom offers reasonable pricing tiers for users who make more group calls).

Here’s how to get started:

  1. Sign up for a Zoom account.
  2. Download the Mac app, install it, and log in.
  3. In the app, open Preferences > Recording and check “Record a separate audio file for each participant that speaks”.

How to record a phone or video call with Zoom

  1. Start a Zoom meeting on your computer (you can start with or without video).
  2. Invite your meeting participants, who can join from:
     • Their computers, using the Zoom app. The app lets participants choose whether to share video or just audio.
     • Their telephones, using these instructions to dial in.
     • Their iOS and Android devices, using the Zoom app.
  3. Click the “Record” button to start recording. Wait until all of your meeting’s participants have joined before you begin; otherwise you’ll have to manually adjust the resulting audio files so that they all start at the same time (you can do this with GarageBand).
  4. Conduct your call as usual. When you end the meeting, Zoom will automatically export your audio files.
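
If someone does join after recording has started (see step 3), the alignment can also be scripted rather than done by hand in GarageBand. The sketch below is only an illustration under stated assumptions: the late participant’s track has already been converted to an uncompressed WAV file, and you know roughly how many seconds after the start of the recording they joined. The function name and parameters are hypothetical, not part of Zoom or Descript.

```python
import wave


def pad_with_leading_silence(src_path, dst_path, delay_seconds):
    """Write a copy of a PCM WAV file with `delay_seconds` of silence prepended.

    Assumptions (not from the original guide): the input is an uncompressed
    WAV file, and `delay_seconds` is how late this participant joined.
    """
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    # Number of audio frames that make up the requested delay.
    silent_frames = int(round(delay_seconds * params.framerate))

    # 8-bit WAV samples are unsigned (silence is 0x80); wider samples are
    # signed, so silence is all zero bytes.
    fill = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = fill * (silent_frames * params.sampwidth * params.nchannels)

    with wave.open(str(dst_path), "wb") as dst:
        dst.setparams(params)
        # The wave module corrects the frame count in the header on close.
        dst.writeframes(silence + frames)
```

For example, calling `pad_with_leading_silence("guest.wav", "guest_aligned.wav", 12.5)` (file names are placeholders) would produce a track that starts 12.5 seconds later, so it lines up with a track recorded from the beginning of the call.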

(For more information on getting up and running with Zoom, see their official documentation.)

Getting a transcript with speaker labels

  1. Once you end your Zoom meeting, locate the folder where the files were saved. You can choose which folder to save to in Zoom’s Recording options, pictured earlier.
  2. Open the “Audio Record” subfolder to find separate audio files for each speaker. (Zoom will also generate a “combined” audio recording; you want the files under “Audio Record”.)
     The highlighted tracks are separated for each speaker
  3. Follow the instructions to create a Multitrack Transcription.
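
If you record calls often, gathering the per-speaker files from step 2 can be scripted. The sketch below is a minimal illustration, not an official Zoom or Descript tool: it looks inside the “Audio Record” subfolder of a recording folder and returns the audio files it finds, leaving out the combined recording that sits in the parent folder. The function name and the set of file extensions are assumptions made for the example.

```python
from pathlib import Path

# Assumption: per-speaker tracks use one of these extensions.
AUDIO_EXTENSIONS = {".m4a", ".mp3", ".wav"}


def find_speaker_tracks(recording_folder, subfolder="Audio Record"):
    """Return the per-speaker audio files in a recording folder, sorted by name.

    Only files inside `subfolder` are returned, so a combined recording
    stored in the parent folder is skipped.
    """
    track_dir = Path(recording_folder) / subfolder
    if not track_dir.is_dir():
        return []
    return sorted(
        path
        for path in track_dir.iterdir()
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Pass it the folder you chose in Zoom’s Recording options; the returned paths are the files to use when you create the Multitrack Transcription.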


Skype doesn’t offer native call recording, but third-party developers have built plugins that accomplish this. (It’s a little involved, but some people get better audio quality with Skype than with other conferencing services.)

You can find Skype’s official list of plugins here; for the purposes of this guide we’ve chosen one of the most popular: eCamm Call Recorder for Skype. It’s free for 7 days and costs $40 beyond that. (We haven’t tested all of the available Skype plugins; note that some do not offer multitrack recording.)

Ray Ortega has published an excellent tutorial on getting started with Call Recorder for Skype. By following the steps in this short video, you’ll wind up with separate audio tracks for each speaker. Note that this technique works for calls with two participants.

Follow this tutorial to get multitrack Skype recordings

Once you have your individual audio tracks, follow the instructions to create a Multitrack Transcription.
