Experimental Podcasting, Code and No Speakers

How to edit audio files without using an audio editor

I paraphrased the title of this Sonic Youth album to explain my latest experiment: creating and editing audio from code and, if possible, automating the whole creation process. The goal was not to produce an actual hour-long podcast (yet) but to find out if and how it could be done.

Original Photo : Roman Pohorecki

TL;DR & Final Result

I used Sonic Pi and some other tools to edit audio from code and command line only.

Final result on Soundcloud

From script to result

Let’s say you have some cool voice recordings, an interview for example, or a speech by Greta Thunberg at the United Nations, and you want to create a cool podcast out of them. You might script something like:

  • I would like to hear her first sentence, with a cool ambient sound in the background, let’s say, some rainy atmosphere
  • “Oh yes”, your friends would say, “then you should pause her speech for a few seconds, and add some more ambient sound, and make it sound darker”
  • Then you could play some other parts of her speech, and add music to it at a specific moment for drama

And this would be the script of your short podcast (for the sake of this article, I’ll stick to it)

Turning it into a podcast would normally mean downloading the speech from a youtube-to-mp3 service, opening a multi-track audio editor, finding some cool sounds, and then manually carrying out your script: cutting and arranging sounds on the editor’s timeline, adding effects, managing volume and balance, etc.

But what if we could use a computer script to turn your literal script into a podcast? Code is literally a scripted text that turns data into stuff, right?

You may or may not have heard of Sonic Pi. It’s a small coding environment (based on the Ruby language) that lets people create music from code, that is, turn written instructions into sound.

It might sound complicated but it’s actually pretty simple :

  • You tell the computer to play a sound from a file
  • Then another one with a reverb effect
  • Then synth notes in a given order
  • Then loop back to the start

And it occurred to me that we could actually use it to script a podcast rather than edit it in dedicated software …

Tools & Process

To script our podcast, we need to :

  • Get the recording (with youtube-dl )
  • Convert it from mp3 to wav (with mpg321 )
  • Split it automatically in different parts based on silent moments (with sox )
  • Get some ambient sounds (on the great sound bank Freesound)
  • Translate our “human script” into a “sonic script” (with Sonic Pi)

Get the recording from Youtube

If you don’t have the interview recordings yourself and rely on a Youtube video, you can either use a Youtube-to-mp3 converter website or, if you feel geeky (which is required for the next parts of this article), use youtube-dl, a great command line tool to easily fetch audio from a Youtube link.

youtube-dl -x --audio-format mp3 -o "greta.%(ext)s" "https://youtube.com/watch?v=gEanyDxn0UY"

This extracts the audio track of the video (the -x option, which needs ffmpeg installed) and saves it as greta.mp3. You then need to convert it from mp3 to wav, which is easily done with mpg321:

mpg321 -w greta.wav greta.mp3
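
Since the goal is automation, the download and conversion steps above can be chained from a small Ruby script. This is only a minimal sketch, not something used in the original experiment: the helper names (download_command, convert_command, fetch_wav) are made up for illustration, and it assumes youtube-dl, ffmpeg and mpg321 are installed and available on the PATH.

```ruby
# Build the youtube-dl command that extracts the audio track as mp3.
# The -x and --audio-format options rely on ffmpeg being installed.
def download_command(url, basename)
  ["youtube-dl", "-x", "--audio-format", "mp3", "-o", "#{basename}.%(ext)s", url]
end

# Build the mpg321 command that decodes the mp3 into a wav file.
def convert_command(basename)
  ["mpg321", "-w", "#{basename}.wav", "#{basename}.mp3"]
end

# Run both steps in order and stop at the first failure.
# Returns the name of the wav file that should now exist.
def fetch_wav(url, basename)
  [download_command(url, basename), convert_command(basename)].each do |cmd|
    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  end
  "#{basename}.wav"
end
```

Calling fetch_wav("https://youtube.com/watch?v=gEanyDxn0UY", "greta") would then run both commands and hand back "greta.wav". Passing the commands as argument arrays avoids any shell quoting issue with the ? and = characters in the URL.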

Great ! That was the easy part.

Split the file on silent moments

Now you want to split this recording, and things get tricky. You could do it manually with an audio editor, but that’s not what we want: we want to Script and Automate, which is our motto.

No audio editor allowed

After some duckduck-going (which is the new googling, all the cool kids are doing it) I stumbled upon SoX (Sound eXchange), which is a command line tool defined as the Swiss Army knife of sound processing programs.

SoX has an effect called silence which allows us to trim silent parts of a file. It’s a bit tricky to use, but this article helped me a lot and I recommend reading it if you want to go deeper. Here is how I used it:

sox greta.wav greta-split.wav silence 1 0.0 0% 1 1.0 0.3% : newfile : restart

As I understood it, this translates roughly to :

  • 1 0.0 0% : leave the silence at the beginning of the audio sample alone. It can be tuned to something like “trim until the sound stays above Y% of max volume for X.X seconds” but I wanted to keep it
  • 1 1.0 0.3% : cut the file if the sound goes below 0.3% of max volume for more than 1.0 second
  • : newfile create a new file with the given template ( greta-split.wav )
  • : restart do it over and over again, until you processed the whole file

This will generate a batch of numbered files: greta-split001.wav, greta-split002.wav, ... greta-splitNNN.wav
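
To make the two tunable values (silence duration and volume threshold) easier to experiment with, the same SoX call can be built from Ruby. This is a hedged sketch: split_command and split_parts are hypothetical helper names, and the leading-silence settings are simply copied from the command above.

```ruby
# Build the SoX arguments that split `input` wherever the level stays below
# `threshold` (percent of max volume) for at least `min_silence` seconds.
def split_command(input, output, min_silence: 1.0, threshold: 0.3)
  ["sox", input, output,
   "silence", "1", "0.0", "0%",              # same leading-silence settings as above
   "1", min_silence.to_s, "#{threshold}%",   # where to cut
   ":", "newfile", ":", "restart"]           # write a new file, then start over
end

# List the numbered parts written next to `output`, in playback order.
def split_parts(output)
  ext = File.extname(output)
  stem = output.delete_suffix(ext)
  Dir.glob("#{stem}[0-9]*#{ext}").sort
end
```

Running system(*split_command("greta.wav", "greta-split.wav")) executes the split, and split_parts("greta-split.wav") then returns the generated files sorted by their number. A longer min_silence gives fewer, longer parts.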

Great ! Now let’s create something !

Edit audio like a hacker with Sonic Pi

As I said earlier, Sonic Pi is a coding environment dedicated to music creation: it lets you use instructions like “play this note, then play this sound with this effect, loop to the start”, etc.

But we will use it to generate a simple linear audio file. And to do so, we will translate our podcast script into a code script using only a few instructions :

  • load samples in a variable, either individually or a full directory at once
  • play these sounds with sample
  • wait a few seconds with sleep
  • add some effects with with_fx

Each of these instructions can use some parameters to specify its action :

  • sample greta, 1 will play the second sample (indices start at 0) of the directory from which we loaded the split samples of the speech
  • sample rain, amp: 0.2 will play the rain sample at a low volume with the :amp param (0.0 is silent, 1.0 is the normal volume)
  • sample ambiance, amp: 0.2, pan: -0.5 will play the ambient sound at a low volume AND a bit toward the left ear with the :pan param (to be set between -1: full left, and 1: full right)
  • with_fx :reverb do will apply a reverb effect to the following instructions until it encounters an end keyword

Here is the full code :

# 0 - Load the audio samples
greta = "D:/Code/bin/SonicPiPortable/Work/Samples/greta/split/"
rain = "D:/Code/bin/SonicPiPortable/Work/Samples/rain.wav"
cello = "D:/Code/bin/SonicPiPortable/Work/Samples/violon.wav"
ambiance = "D:/Code/bin/SonicPiPortable/Work/Samples/ambiance.wav"
# 1 - Play the first part of the speech
sample greta, 1
# 2 - Play the background sound of rain with a low volume (amp param)
sample rain, amp: 0.2
# 3 - Wait a few seconds, then play the second part of the speech
sleep 6
with_fx :lpf, cutoff: 80 do
  sample greta, 2, amp: 1
end
# 4 - Play the ambient sound at a low volume, a bit on the left ear
sample ambiance, amp: 0.2, pan: -0.5
# 5 - Wait-and-play next speech part
sleep 6
sample greta, 3
# 6 - Wait for 2secs before playing the cello on the right
sleep 2
with_fx :reverb do
  sample cello, amp: 0.2, pan: 0.5
end
# 7 - Finally, play the last speech part (for now)
sleep 8
sample greta, 4
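
Because a script like this is only a sequence of sample, sleep and with_fx calls, its timing can be checked without rendering any audio. The sketch below is plain Ruby, not Sonic Pi: Timeline is a hypothetical stand-in class written for this illustration, the file names are placeholders, and it assumes the default tempo of 60 bpm, where one beat of sleep lasts one second.

```ruby
# Dry-run stand-in for the three Sonic Pi calls used in this article.
# It plays nothing: it only records which sample would start at which time.
class Timeline
  attr_reader :events

  def initialize
    @events = []
    @now = 0
    @fx = []
  end

  # Record a sample start at the current time, with the active effects.
  def sample(source, index = nil, **opts)
    @events << { at: @now, source: source, index: index, fx: @fx.dup, opts: opts }
  end

  # Advance the clock instead of actually waiting.
  def sleep(beats)
    @now += beats
  end

  # Mark everything triggered inside the block as going through the effect.
  def with_fx(name, **_opts)
    @fx.push(name)
    yield
  ensure
    @fx.pop
  end

  # Evaluate a script block against a fresh timeline and return its events.
  def self.trace(&script)
    timeline = new
    timeline.instance_eval(&script)
    timeline.events
  end
end

# Placeholder paths; the block body uses the same calls as the script above.
events = Timeline.trace do
  sample "split/", 1
  sample "rain.wav", amp: 0.2
  sleep 6
  with_fx :lpf, cutoff: 80 do
    sample "split/", 2
  end
end

events.each do |e|
  puts format("%5.1fs  %s %s fx=%s", e[:at], e[:source], e[:index], e[:fx])
end
```

Each printed line gives the start time of one sample and the effects wrapping it, which is a quick way to spot a forgotten end or a sleep that is too short before opening Sonic Pi.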


This process may seem complicated at first but once all the tools are installed, it becomes clear that :

  • You can edit audio without a proper audio editor …
  • … and without even touching your mouse !
  • You can automate the download and splitting of the files
  • You can create a Sonic Pi template file, loop through the split files and automate the edit
  • You could find a great text-to-speech tool to read articles from an RSS feed and automatically create podcasts from newspaper articles !
  • And since Sonic Pi runs a server, you can even script the whole process, get the .wav recording and publish it automatically !

That’s a bit ambitious, but it is technically possible.
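
As a first step in that direction, the template idea could look like the sketch below: a plain Ruby function that writes the text of a Sonic Pi script for any number of split files. The function name podcast_script, the sample paths and the fixed pause between parts are all assumptions made for illustration; real parts have different lengths, so the pause would need tuning.

```ruby
# Generate the text of a Sonic Pi script that plays every split file in order.
# `pause` is a fixed gap (in beats) between the starts of two parts.
def podcast_script(split_dir, part_count, background: nil, pause: 6)
  lines = ["parts = #{split_dir.inspect}"]
  # Optional background sound, started at a low volume with the first part.
  lines << "sample #{background.inspect}, amp: 0.2" if background
  part_count.times do |index|
    lines << "sample parts, #{index}"
    lines << "sleep #{pause}" unless index == part_count - 1
  end
  lines.join("\n") + "\n"
end

# Hypothetical paths, for illustration only.
puts podcast_script("samples/greta/split/", 3, background: "samples/rain.wav")
```

The generated text can be pasted into the Sonic Pi editor as-is. Keeping the generator separate from Sonic Pi means it can run right after the splitting step, with part_count taken from the number of files found.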

I’m not sure text-to-speech is good enough yet to be actually pleasant to listen to, but I’m pretty confident this will come in the near future.

Meanwhile, I think this process can actually be used to automatically edit interviews or monologues, more easily and quickly than with a full editor.


