How to edit audio files without using an audio editor
I paraphrased the title of this Sonic Youth album to explain my latest experiment: creating and editing audio from code and, if possible, automating the whole creation process. The goal of this experiment was not to produce an actual hour-long podcast (yet) but to find out whether and how it could be done.
TL;DR & Final Result
I used Sonic Pi and a few other tools to edit audio from code and the command line only.
From script to result
Let’s say you have some cool voice recordings, an interview for example, or a speech by Greta Thunberg at the United Nations, and you want to create a cool podcast out of it. You might script something like:
- I would like to hear her first sentence, with a cool ambient sound in the background, let’s say, some rainy atmosphere
- “Oh yes”, your friends would say, “then you should pause her speech for a few seconds, and add some more ambient sound, and make it sound darker”
- Then you could play some other parts of her speech, and add music to it at a specific moment for drama
And this would be the script of your short podcast (for the sake of this article, I’ll stick to it).
Turning it into a podcast would normally mean downloading the speech with a youtube-to-mp3 service, opening a multi-track audio editor, finding some cool sounds, and then manually realizing your script: cutting and arranging sounds on the editor’s timeline, adding effects, managing volume and balance, etc.
But what if we could use a computer script to turn your literal script into a podcast? Code is literally a scripted text to turn data into stuff, right?
You might, or might not, have already heard of Sonic Pi. It’s a small coding environment (based on the Ruby language) that lets people create music from code, that is: turning written instructions into sound.
It might sound complicated but it’s actually pretty simple:
- You tell the computer to play a sound from a file
- Then another one with a reverb effect
- Then synth notes in a given order
- Then loop back to the start
And it occurred to me that we could actually use it to script a podcast rather than editing it in dedicated software…
Tools & Process
To script our podcast, we need to:
- Get the recording (with youtube-dl)
- Convert it to wav (with Mpg321)
- Split it automatically into different parts based on silent moments (with SoX)
- Get some ambient sounds (on the great sound bank Freesound)
- Translate our “human script” into a “sonic script” (with Sonic Pi)
Get the recording from Youtube
If you don’t have the interview recordings yourself and rely on a Youtube video, you can either use a Youtube-to-mp3 converter website or, if you feel geeky (which is required for the rest of this article), use youtube-dl, a great command line tool to easily fetch audio from a Youtube link.
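youtube-dl -x --audio-format mp3 -o "greta.%(ext)s" https://youtube.com/watch?v=gEanyDxn0UY
This will extract the audio of the video into the greta.mp3 file (the -x and --audio-format options rely on ffmpeg or avconv being installed). You’ll then need to convert it from mp3 to wav format, which can be done easily with Mpg321: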
mpg321 -w greta.wav greta.mp3
Great! That was the easy part.
Split the file on silent moments
Now you want to split this recording, and things get tricky. You could do it manually with an audio editor, but that’s not what we want: we want to Script and Automate, which is our motto.
After some duckduck-going (which is the new googling, all the cool kids are doing it) I stumbled upon SoX (Sound eXchange), which is a command line tool defined as the Swiss Army knife of sound processing programs.
SoX has an effect called silence which allows us to trim silent parts of a file. It’s a bit tricky to use, but this article helped me a lot and I recommend reading it if you want to go deeper. Here is how I used it:
sox greta.wav greta-split.wav silence 1 0.0 0% 1 1.0 0.3% : newfile : restart
As I understood it, this translates roughly to:
- 1 0.0 0%: leave the silence at the beginning of the audio sample alone. It can be tuned to something like “trim until the sound stays above Y% of max volume for X.X seconds”, but I wanted to keep that silence
- 1 1.0 0.3%: cut the file if the sound goes below 0.3% of max volume for more than 1.0 second
- : newfile: create a new file with the given template (the output file name, to which SoX appends a number)
- : restart: do it over and over again, until the whole file has been processed
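To make the positional arguments a bit easier to read, here is a minimal Ruby sketch that rebuilds the same command from named values. The method name sox_split_command and its keyword arguments are labels I made up for illustration (they are not SoX options), and the sketch only builds and prints the argument list without running anything.

```ruby
# Hypothetical helper: builds the argument list for the SoX call shown above.
# The keyword names are illustrative labels, not SoX options.
def sox_split_command(input, output, min_silence: 1.0, threshold: "0.3%")
  [
    "sox", input, output,
    "silence",
    "1", "0.0", "0%",                 # start of file: trim nothing
    "1", min_silence.to_s, threshold, # cut once it stays this quiet for this long
    ":", "newfile",                   # write the next chunk to a new numbered file
    ":", "restart"                    # run the same effect again on what is left
  ]
end

cmd = sox_split_command("greta.wav", "greta-split.wav")
puts cmd.join(" ")
# To actually run it (SoX must be installed): system(*cmd)
```

Raising min_silence or threshold should produce fewer, longer chunks, which is probably the first thing to tune on a noisy recording.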
This will generate a batch of files:
greta-split001.wav, greta-split002.wav, ... greta-splitNNN.wav
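The three command-line steps so far (download, convert, split) can be chained from a single script. The sketch below is one way to do it in Ruby, assuming youtube-dl (with ffmpeg for audio extraction), mpg321 and sox are installed and on the PATH. The method names pipeline_commands and run_pipeline are my own, and the last line is left commented out so that nothing is downloaded by accident.

```ruby
# Sketch of the download -> convert -> split steps as one Ruby script.
# Assumes youtube-dl, mpg321 and sox are installed and on the PATH.
URL = "https://youtube.com/watch?v=gEanyDxn0UY"

# Hypothetical helper: returns the three commands as argument lists.
def pipeline_commands(url, name)
  [
    # -x / --audio-format ask youtube-dl to extract the audio track as mp3
    ["youtube-dl", "-x", "--audio-format", "mp3", "-o", "#{name}.%(ext)s", url],
    # -w writes the decoded audio to a wav file
    ["mpg321", "-w", "#{name}.wav", "#{name}.mp3"],
    # same silence settings as in the article
    ["sox", "#{name}.wav", "#{name}-split.wav",
     "silence", "1", "0.0", "0%", "1", "1.0", "0.3%", ":", "newfile", ":", "restart"]
  ]
end

# Runs each command in order and stops at the first failure.
def run_pipeline(commands)
  commands.each do |cmd|
    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  end
end

commands = pipeline_commands(URL, "greta")
commands.each { |cmd| puts cmd.join(" ") }
# run_pipeline(commands)  # uncomment to actually download and split
```

Passing each command as an argument list rather than one string avoids shell quoting issues with the URL and the output template.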
Great! Now let’s create something!
Edit audio like a hacker with Sonic Pi
As I said earlier, Sonic Pi is a coding environment dedicated to music creation, which lets you use instructions like “play this note, then play this sound with this effect, loop to the start”, etc.
But we will use it to generate a simple linear audio file. To do so, we will translate our podcast script into a code script using only a few instructions:
- load samples into a variable, either individually or a full directory at once
- play these sounds with sample
- wait a few seconds with sleep
- add some effects with with_fx
Each of these instructions can take some parameters to specify its action:
- sample greta, 1 will play the second sample of the directory from which we loaded the split samples of the speech
- sample rain, amp: 0.2 will play the rain sample at a low volume with the amp: param (0 is silent, 1 is normal volume)
- sample ambiance, amp: 0.2, pan: -0.5 will play the ambient sound at a low volume AND a bit in the left ear with the pan: param (to be set between -1: full left, and 1: full right)
- with_fx :reverb do will apply a reverb effect to the following instructions until it encounters the matching end
Here is the full code:
# 0 - Load the audio samples
greta = "D:/Code/bin/SonicPiPortable/Work/Samples/greta/split/"
rain = "D:/Code/bin/SonicPiPortable/Work/Samples/rain.wav"
cello = "D:/Code/bin/SonicPiPortable/Work/Samples/violon.wav"
ambiance = "D:/Code/bin/SonicPiPortable/Work/Samples/ambiance.wav"

# 1 - Play the first part of the speech
sample greta, 1
# 2 - Play the background sound of rain with a low volume (amp param)
sample rain, amp: 0.2

# 3 - Wait a few seconds, then play the second part of the speech
sleep 3 # assumed value: tune it to the length of the previous clip
with_fx :lpf, cutoff: 80 do
  sample greta, 2, amp: 1
end

# 4 - Play the ambient sound at a low volume, a bit on the left ear
sample ambiance, amp: 0.2, pan: -0.5

# 5 - Wait-and-play next speech part
sleep 3 # assumed value, same remark as above
sample greta, 3

# 6 - Wait for 2secs before playing the cello on the right
sleep 2
with_fx :reverb do
  sample cello, amp: 0.2, pan: 0.5
end

# 7 - Finally, play the last speech part (for now)
sample greta, 4
This process may seem complicated at first but once all the tools are installed, it becomes clear that:
- You can edit audio without a proper audio editor …
- … and without even touching your mouse!
- You can automate the download and splitting of the files
- You can create a Sonic Pi template file, loop through the split files and automate the edit
- You can find a great text-to-speech tool to read articles from an RSS feed and automatically create podcasts from newspaper articles!
- And since Sonic Pi runs a server, you can even script the whole process, get the .wav recording and publish it automatically!
That’s a bit ambitious but it is technically possible.
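As a small step in that direction, the "template" idea can be sketched as a plain Ruby script that writes Sonic Pi code for a list of split files. This is only an illustration under a few assumptions: the method name sonic_pi_script and the pause: parameter are mine, the file names are placeholders (Sonic Pi will most likely want full paths), and the generated code relies on Sonic Pi's sample_duration to wait for each clip to finish before playing the next one.

```ruby
# Hypothetical generator: turns a list of split files into Sonic Pi code.
# Each clip is played, then the script sleeps for the clip's duration plus
# a pause, so the clips play one after another instead of all at once.
def sonic_pi_script(files, pause: 1.0)
  lines = files.sort.map do |path|
    [
      "sample #{path.inspect}",
      "sleep sample_duration(#{path.inspect}) + #{pause}"
    ]
  end
  lines.flatten.join("\n") + "\n"
end

# Placeholder names; in a real run the list could come from something like
# Dir.glob("greta-split*.wav") expanded to full paths.
files = ["greta-split002.wav", "greta-split001.wav"]
script = sonic_pi_script(files)
puts script
```

The printed text can then be pasted into a Sonic Pi buffer, and effects or background sounds added around the generated lines by hand.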
I’m not sure text-to-speech is good enough yet to be actually pleasant to listen to, but I’m pretty confident this will come in the near future.
Meanwhile, I think this process can actually be used to automatically edit interviews or monologues, more easily and quickly than with a full editor.