Dynamic Range Compression for audio with ffmpeg and compand

Jud Dagnall
Sep 4, 2018


How I used ffmpeg and Audacity to fix the very loud and very quiet parts of an audio recording.

I listen to a lot of audiobooks and talks in the car on my commute. Recently, I found a great talk that was very hard to listen to. The speaker dramatically varied his voice level for emphasis, causing his overall volume to jump around as he went from quiet to an occasional shout. In the car, I had to turn the volume up a lot to hear the normal/quieter parts, and even then, I could barely hear some of the questions from the audience. And then the shouting was ear-numbingly loud!

I wanted to adjust the volume so that the quiet parts were a lot louder, and the loudest parts were a lot quieter. I know almost nothing about audio processing, so I went to the internet to figure out a bit more. Initially, I thought this was called “normalizing” the volume, and tried a few techniques. However, I learned that what I actually wanted was called Dynamic Range Compression. Normalization just fixes the loudest part so it’s reasonably consistent: it doesn’t significantly affect the quiet parts.

I’ve used an open source command-line tool called ffmpeg for various video processing tasks, and I knew it could also process audio. Because the audio had been split up into around twenty separate files, I wanted a command-line tool so I could process them all in a batch with one command.

I discovered that ffmpeg has an audio filter called compand that appeared to do what I wanted. However, the interface was very cryptic, and I didn’t understand the documentation. Well, I understood (nearly) all of the words, but I had no idea how to translate that into what I wanted.

I discovered a GUI tool called Audacity (also open source) that enabled me to visualize the audio track as a waveform, do some experiments, and combine a couple of sections of the track into my sample. The waveform allowed me to look quickly at the results of the changes I made, without playing through the entire sample.

I ended up creating a reference sample as I tried various things. It was about a minute long, and allowed me to experiment without jumping around.

Here’s what my sample track looked like. It has normal speaking at the beginning, a really quiet section, and then some shouting before going back to a normal voice. I’ve used Audacity to generate these waveform visualizations, although I discovered that ffmpeg can generate waveform images as well, which you’ll see later. This image has a left and right track, and the overall volume is how far from the center the blue lines go for each track. I’ve marked several sections with red boxes and annotations to make it clear what normal, loud and quiet looks like.

Waveform of original track with extreme loud and quiet parts annotated

I started by focusing on the “points” part of the compand filter, which is a way of translating or mapping one range of volume levels to another. The first problem I ran into (and the one that eventually led me to Audacity) was that I didn’t know what volume level (in decibels) the various parts of the track were at. However, Audacity has a nice playback volume meter, and I could get a pretty reasonable idea of how loud each section was by watching where the volume hovered or peaked for each section:

Audacity volume meter

So I saw that the really quiet parts of the audio were around -40 to -20 dB.
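For intuition (this is background I worked out along the way, not part of the original commands): the decibel values here are relative to digital full scale, and every 20 dB step is a factor of 10 in amplitude. A quick sketch of the conversion:

```python
# Convert a dB value (relative to full scale) to a linear amplitude
# fraction, where 1.0 is digital full scale. Every 20 dB is a
# factor of 10 in amplitude.
def db_to_amplitude(db):
    return 10 ** (db / 20)

print(db_to_amplitude(-40))  # ~0.01: the quietest speech
print(db_to_amplitude(-20))  # ~0.1: the upper end of the quiet range
print(db_to_amplitude(0))    # 1.0: digital full scale
```

So the quiet sections were sitting at roughly 1% to 10% of full-scale amplitude, which matches how hard they were to hear.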

To test this, and my understanding of compand, I added a very simple filter to remove the quiet sections of the audio by decreasing their volume dramatically:

ffmpeg -i in.mp3  -filter_complex \
"compand=attacks=0:points=-30/-900|-20/-20|0/0|20/20" \
out.wav

ffmpeg takes an input file (in.mp3 in this case), does an audio filter called “compand” with some specific parameters (described later), and then writes it to the “out.wav” file. Note that I used .wav files for my experiments, and eventually changed the final output format to mp3.

From here on out, I’ll only show the parameters for the compand portion, as the rest of the command should be the same.

attacks=0 means that I wanted to measure absolute volume, not an average of the sound over a short (or long) period of time. When the speaker suddenly yells, or talks back and forth with someone off mic, I want the volume adjustment to be immediate. The downside is that you may hear the volume being clamped.
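To illustrate what the attack time does, here is a simplified model of a volume detector (my own sketch; ffmpeg’s compand internals differ in detail): the detector tracks a smoothed version of the signal level, and a zero attack time makes it follow the instantaneous level with no smoothing.

```python
import math

# Simplified envelope follower: the detector tracks a smoothed
# absolute level. A larger attack time means a slower response to
# volume spikes; attack = 0 follows the instantaneous level exactly.
# (Illustrative model only, not ffmpeg's exact implementation.)
def envelope(samples, attack_seconds, sample_rate=44100):
    if attack_seconds == 0:
        return [abs(x) for x in samples]
    alpha = 1 - math.exp(-1 / (attack_seconds * sample_rate))
    env, out = 0.0, []
    for x in samples:
        env += alpha * (abs(x) - env)
        out.append(env)
    return out

spike = [0.1] * 5 + [1.0] * 5            # quiet speech, then a sudden shout
print(envelope(spike, 0)[5])              # 1.0: reacts to the shout instantly
print(envelope(spike, 0.3)[5] < 0.2)      # True: a 0.3s attack lags far behind
```

With attacks=0 the shout is clamped on its very first sample, which is exactly the immediate reaction I wanted.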

points is the actual volume mapping function, and I’ll walk through it:

I added a mapping of -30/-900, which means that volume below -30 dB in the original input track gets converted to -900 dB (completely silent).

However, at -20 dB, I want it back to normal (volume completely unchanged), so I put in a -20/-20 mapping. That means that between -30 and -20, the output volume ramps from -900 to -20, so everything below about -21 dB will be completely quiet. The 0/0 and 20/20 points were just a few additional anchors to indicate that I didn’t want any other part of the volume changed. I ran my sample audio through this “quiet filter”, and saw that I’d caught nearly all of the quiet material: the section near the middle is now very empty.

Here’s a graph that illustrates the translation between input and output volume:

Volume mapping
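The points list is just a piecewise-linear transfer function: input levels between two listed points are mapped by linear interpolation. A sketch of the idea (clamping at the endpoints is my simplification, not necessarily how compand handles inputs outside the listed range):

```python
# Piecewise-linear volume mapping, a sketch of what compand's points
# option does: an input level between two listed points is linearly
# interpolated between their output levels. Inputs outside the listed
# range are clamped here, which is a simplification.
def map_volume(db_in, points):
    points = sorted(points)
    if db_in <= points[0][0]:
        return points[0][1]
    if db_in >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= db_in <= x1:
            return y0 + (db_in - x0) * (y1 - y0) / (x1 - x0)

quiet_filter = [(-30, -900), (-20, -20), (0, 0), (20, 20)]
print(map_volume(-25, quiet_filter))  # -460.0: halfway down the steep ramp
print(map_volume(-10, quiet_filter))  # -10.0: unchanged above -20 dB
```

The steep segment between -30/-900 and -20/-20 is why everything much below -20 dB effectively vanishes.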

Now that I knew the quiet sections were at or below -20 dB, I wanted to make them a lot louder. So I added a mapping that would boost the quiet sections instead of removing them, while still keeping the loud parts mostly the same:

compand=attacks=0:points=-80/-900|-45/-15|-27/-9|-5/-5|20/20

The volume mapping was as follows:

  • -80/-900: Remove the really quiet stuff.
  • -45/-15: Make the quietest parts of the audience questions pretty clear (a 3x increase). You will likely need to fiddle with this if there’s a lot of audience noise, chairs moving, etc. that you don’t want to hear.
  • -27/-9: Make the medium-volume parts of the questions easy to hear.
  • -5/-5: Keep the normal-to-loud voice unchanged (for now).
  • 20/20: Just an extra anchor point to keep the loud stuff loud (for now).
Waveform after boosting the quiet portions 3x

And a visualization of the mapping. Note that any input below about -45 drops off very rapidly, and there’s a lot of compression between -45 and -9.

Make the quiet parts louder, and remove the very, very quiet

So now I could clearly hear all parts of the audio, but there were still the yelling parts that I needed to bring down. After a bit more experimentation, I found that -7 dB was about the loudest of the normal spoken-voice parts. So I put another point there to make sure nothing got louder than that:

compand=attacks=0:points=-80/-900|-45/-15|-27/-9|0/-7|20/-7
  • 0/-7: take what should be about the loudest part, and decrease it all the way down to -7. That means all the normal and loud stuff ends up between -9 and -7. The shouting still seems loud if it goes on for very long, because it’s just a lot more sound at the top volume. But it’s not jarring any more.
  • 20/-7: Add another anchor just to be sure nothing ends up louder than -7.
Increase the quiet sections, limit the loudest parts

This was pretty much the way I wanted it. The questions were still a bit quieter than the main speaker, but were now much easier to hear. There’s not a lot of range in the output, which means that on the highway, I can turn it up and hear just about everything without having my eardrums blown out when he shouts!

However, the overall volume was now pretty low, with a maximum of -7 dB instead of about 0 dB, which I read is the typical target for digital audio. So I added one more parameter to compand to bring the whole volume up by 5 dB (called adjusting the gain). Any more than that and it seemed too loud to me:

compand=attacks=0:points=-80/-900|-45/-15|-27/-9|0/-7|20/-7:gain=5

The same mapping points, the same attack setting, but now everything was just louder. It’s not quite hitting the top of the bars, but it sounded the way I liked it.

Here’s the final before and after, with the full command again:

ffmpeg -i in.mp3 -filter_complex \
"compand=attacks=0:points=-80/-900|-45/-15|-27/-9|0/-7|20/-7:gain=5" out.mp3

And here’s a before and after of the waveforms:

before and after waveform with boost, compression and gain adjustment

This ended up being much easier to listen to.

Along the way, I discovered a few additional fun things with both Audacity and ffmpeg:

  1. Audacity has its own tool for doing dynamic range compression. I didn’t end up using it because I wanted to script the solution so I could run it on multiple files, and also because I couldn’t quickly figure out how to do the very specific mappings that I wanted.
  2. ffmpeg can generate waveform images too. I couldn’t get them to look as good as Audacity’s, but they’re great for quick and dirty (visual) feedback. Here’s the waveform from the original sample, with both channels combined into a single channel (mono).
ffmpeg -i in.mp3 -filter_complex "aformat=channel_layouts=mono,showwavespic=s=1000x200" -frames:v 1 in.png
waveform generated with ffmpeg

A Compand cheat sheet and explanation

Here’s my cheat sheet for the compand commands. Now that I’ve experimented with some of these settings, the docs make more sense, but something like this would have been helpful!

First, some essential reading.

Compand takes the options listed below. For conciseness, you can just jam them together in order, separating each section with “:”. Here’s the example from the docs:

compand=.3|.3:1|1:-90/-60|-60/-40|-40/-30|-20/-20:6:0:-90:0.2

Unfortunately this was quite confusing at first. It really means:

  • attacks=.3|.3 (left and right channels: the amount of time in seconds to average over when detecting volume spikes. You can instead give a single value that applies to all channels, e.g. attacks=.3, which is what I did in my examples. 0 means no averaging, just use the current value, which can be choppy in some cases, but will also clamp short volume spikes, which is exactly what I wanted.)
  • decays=1|1 (same as attacks above, except for detecting volume drops)
  • points=-90/-60|-60/-40|-40/-30|-20/-20 (the mapping function, input dB/output dB)
  • soft-knees=6 (a softening function for the changes)
  • gain=0 (increase or decrease the overall volume of the output)
  • volume=-90 (per the docs, the initial volume, in dB, assumed for each channel when the filter starts)
  • delay=0.2 (per the docs, a delay in seconds: the audio is analyzed immediately, but delayed before the volume is adjusted, so the filter can react to spikes slightly ahead of time)

You can specify only a subset of the parameters, and use the parameter labels, like attacks=XXX, decays=XXX, points=XXX, still joining each section with “:”

compand=attacks=0:points=-80/-900|-45/-15:gain=5
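Since I was experimenting with lots of variations, it helped to think of the filter string as structured data. Here’s a small convenience helper (my own, not part of ffmpeg) that assembles the labeled “key=value” syntax:

```python
# Assemble a compand filter string from labeled parameters.
# This is just a convenience for building the "key=value:key=value"
# syntax; the parameter names (attacks, points, gain) are compand's own.
def compand_filter(attacks, points, gain=None):
    parts = [
        "attacks=" + str(attacks),
        "points=" + "|".join(f"{i}/{o}" for i, o in points),
    ]
    if gain is not None:
        parts.append("gain=" + str(gain))
    return "compand=" + ":".join(parts)

print(compand_filter(0, [(-80, -900), (-45, -15)], gain=5))
# compand=attacks=0:points=-80/-900|-45/-15:gain=5
```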

At the end of this process, I was able to write a simple loop to process all twenty files, and the listening experience was much better in my car. ffmpeg and Audacity are now part of my audiobook toolbox.
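The loop can be as simple as a shell for-loop over the files. Here’s a Python sketch of the same idea (the talks/ directory and the -compressed output suffix are my own hypothetical names, and actually running process_all assumes ffmpeg is on your PATH):

```python
import pathlib
import subprocess

# The final filter from above, applied identically to every file.
FILTER = "compand=attacks=0:points=-80/-900|-45/-15|-27/-9|0/-7|20/-7:gain=5"

def build_command(src):
    # Write each result next to its source with a "-compressed" suffix.
    out = src.with_name(src.stem + "-compressed.mp3")
    return ["ffmpeg", "-i", str(src), "-filter_complex", FILTER, str(out)]

def process_all(directory):
    # Run the same compand filter over every mp3 in the directory.
    for src in sorted(pathlib.Path(directory).glob("*.mp3")):
        subprocess.run(build_command(src), check=True)

print(build_command(pathlib.Path("talks/part01.mp3")))
```

Building the command as a list (rather than one big string) avoids any shell-quoting issues with the “|” characters in the points list.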

Play around with these techniques yourself, and see what works. Happy listening!
