2015.05.13 Using ffmpeg and WaveBooster to fix sound levels in video files

posted May 13, 2015, 10:01 PM by Troy Cheek   [ updated Nov 25, 2015, 6:50 PM ]
The problem with getting old, in addition to simply feeling old, is that my hearing isn't what it used to be.  Also, I swear that movies and television shows aren't sound mixed the way they used to be.  I have a homespun DVR that records all my television.  I hardly ever watch anything live anymore.  But when I sit down to watch TV for the evening, it seems like every program is different:
  1. Sound levels are perfectly fine.
  2. Sound levels are uniformly too quiet.
  3. whisper whisper GUNSHOT! whisper whisper EXPLOSION! whisper whisper CAR CRASH! whisper whisper COMMERCIAL!

Not only do I have to adjust the volume level for every show, but for some shows I have to ride the volume control the whole time.  #3 is especially annoying if I drift off to sleep in the middle of a show.  Sure enough, the closing credits theme music will be 10 times louder than the dialog and will wake me up.

#1 needs no solution.  #2 can be fixed by adding uniform gain or normalization or whatever you want to call it to the whole file.  #3 needs something called dynamic range compression, meaning it amplifies only the quiet parts and leaves the loud parts alone.  A good DRC utility would pretty much leave #1 alone, uniformly boost the volume of #2, and selectively boost the quiet parts of #3.  The problem is finding a good DRC utility.  And as my DVR computer is stuck with Windows Vista for hardware compatibility reasons, said utility had to work with that.
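To make the difference between #2 and #3 concrete, here is a toy Python sketch.  It works on a handful of made-up loudness readings in decibels rather than real audio, and the function names, the -10 dB threshold, and the 2:1 ratio are all assumptions picked for illustration, not anything The Levelator, SoX, or WaveBooster is known to use.

```python
def uniform_gain(levels_db, gain_db):
    """Raise every level by the same amount (the fix for case #2)."""
    return [level + gain_db for level in levels_db]


def boost_quiet_parts(levels_db, threshold_db, ratio):
    """Toy DRC: lift levels below the threshold, leave louder ones alone."""
    boosted = []
    for level in levels_db:
        if level < threshold_db:
            # Shrink the distance below the threshold by the given ratio.
            level = threshold_db - (threshold_db - level) / ratio
        boosted.append(level)
    return boosted


# Hypothetical readings: whisper, whisper, GUNSHOT, whisper, EXPLOSION
levels = [-40.0, -38.0, -6.0, -42.0, -3.0]

print(uniform_gain(levels, 10.0))             # [-30.0, -28.0, 4.0, -32.0, 7.0]
print(boost_quiet_parts(levels, -10.0, 2.0))  # [-25.0, -24.0, -6.0, -26.0, -3.0]
```

With uniform gain the gunshot and explosion land above 0 dB, which in a real file would mean clipping.  The toy compressor brings the whispers up by about 15 dB and leaves the loud parts exactly where they were.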

The best one I've found, results-wise, is without a doubt The Levelator.  Meant for use in podcasts and broadcast radio, this utility is designed to take sound files recorded by different people at different times with different equipment and massage them into something that can be edited together and played without giving the sound guy a stroke.  I've used it in some of my YouTube videos, and it works very well.  Unfortunately, it uses a drag-and-drop interface that cannot be modified.  It works perfectly fine for the occasional fix, but it can't be automated to go to work on, say, every video file recorded by a DVR.

My second attempt was a program called SoX aka the Sound eXchange, the Swiss Army knife of audio manipulation.  "SoX is a cross-platform (Windows, Linux, MacOS X, etc.) command line utility that can convert various formats of computer audio files in to other formats. It can also apply various effects to these sound files..."  SoX has a "compand" effect which does DRC, and since it's a CLI (command line interface) program, it's pretty easy to automate.  Unfortunately, SoX is incredibly complicated (I mean powerful), and I never quite figured out the correct incantation (I mean parameters) to properly DRC a sound file.  The example settings didn't seem to do much of anything, and my trying to change the settings just made things worse.

Finally, I found WaveBooster, "a tool that allows normalisation and dynamic compression of high resolution wave files."  It's a CLI, so it's fairly easy to automate.  It has a few options, and I won't claim to really understand what they do, but the suggested options did indeed greatly improve the audio files I tried on it.  It doesn't sound as good as The Levelator, and it doesn't have the raw power of SoX, but I could make it work and that's what counts.  It's a discontinued beta version from 2001 so it might be hard to find, so if you do download it, back it up somewhere.

For the following experiment you will need to download WaveBooster and a current (circa Spring 2015) version of ffmpeg.  (If you don't know which version of ffmpeg to download, try the 32 bit static build.  It has to be a fairly recent version because ffmpeg adds and changes options all the time and even occasionally removes some!)  Create a scratch directory somewhere, preferably on your fastest drive with some excess capacity.  To this directory, from the ffmpeg archive copy ffmpeg.exe, and from the WaveBooster archive copy BoostCLI.exe and WavEngine.dll.  Also to this directory, copy (!!!) a few of your favorite video files which have too quiet or inconsistent audio.  I'll be using MPEG-2 source files (*.mpg) in my examples because that's what my DVR software uses.

The first step is to use ffmpeg to copy out the audio part of the source file into a separate file that we can work with.  The command for that will be something like this:

C:\> ffmpeg -i input.mpg -ac 2 output.wav

This tells ffmpeg to look in a file called input.mpg and copy out the first audio stream it finds into a file called output.wav, converting the audio to two channel (stereo) generic format WAV in the process.  If your source file has more than one audio stream, you're on your own.  ffmpeg can do it, but I'm fuzzy on the details.  If you have a fancy surround sound system and want to preserve your fancy 5.1 DTS audio, you're again on your own because WaveBooster only works with simple WAV files.  Use your media player of choice to check that this new file is indeed the audio from your video file.  I like VLC.
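For what it's worth, the usual way to pick a specific audio stream is ffmpeg's -map option, the same option used further down to pull video and audio from different files.  Here is a rough Python sketch that builds the extraction command with a stream number in it.  The function name and file names are made up for illustration, and it assumes the audio streams are numbered from 0 in the order ffmpeg lists them when it opens the file.

```python
def extract_audio_args(source, wav_out, audio_stream=0):
    """Build an ffmpeg command that copies one audio stream out as stereo WAV."""
    return [
        "ffmpeg",
        "-i", source,
        "-map", f"0:a:{audio_stream}",  # input file 0, audio stream N (from 0)
        "-ac", "2",                     # convert to two channels (stereo)
        wav_out,
    ]


# Second audio stream of input.mpg, as a command line you could paste:
print(" ".join(extract_audio_args("input.mpg", "output.wav", audio_stream=1)))
```

Leaving audio_stream at 0 asks for the first audio stream, which is what the plain command above picks when the file has only one.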

The second step is to run the extracted audio through WaveBooster:

C:\> boostcli /i="output.wav" /o="boosted.wav" /a=-12 /m=-3

This tells WaveBooster to take the input file output.wav, apply some automatic dynamic range compression (minimum allowed) to it, and save the results as boosted.wav.  Use your media player of choice to compare the original and boosted audio files.

Now, this is where things get crazy.

C:\> ffmpeg -i input.mpg -i boosted.wav -map 0:v:0 -map 1:a:0 -ac 2 -vcodec copy boosted.mpg

This time, we're telling ffmpeg to look at two input files: our original video file and our boosted audio file.  The map commands tell ffmpeg to take the first video stream from input file 1 (or 0 since computer nerds start counting at 0) and the first audio stream from input file 2 (or 1 because, well, you know, reasons).  The "-ac 2" is probably not needed as we know the audio file is a plain stereo WAV, but I've got in the habit of using it ever since ffmpeg started throwing up every time I accidentally feed it 5.1 audio.  (I swear it didn't use to do that!)  The "-vcodec copy" command tells ffmpeg to just copy the original video to the new file, no conversion or decoding or anything like that.  This should give an output video with no loss of video quality from the original.  Use your media player of choice to compare the original and boosted videos.

Now, of course, MPEG-2 is a pretty old and inefficient format.  As long as we're fiddling with video files anyway, why not update to h.264 mp4 while we're at it?

C:\> ffmpeg -i input.mpg -i boosted.wav -map 0:v:0 -map 1:a:0 -ac 2 boosted.mp4

And that's what I'm doing right now to some files my DVR is recording, just to test and see how well this works with the various type 1, 2, and 3 files I have lying around.  I'll let you know once I get around to watching them.  (Update:  It works pretty darn well.  I've added splitting out the audio, processing it with WaveBooster, and recombining the processed audio into the final video file to my standard DVR setup.)
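The three steps chain together easily enough.  What follows is only a sketch of how that automation might look in Python, not the actual DVR setup: the function names and the output naming scheme (show.mpg becomes show.boosted.mp4) are invented here, the -y switch (overwrite existing output files) is an addition so that reruns don't stop to ask, and it assumes ffmpeg.exe and BoostCLI.exe are in the current directory or on the PATH.

```python
import os
import subprocess


def pipeline_commands(source):
    """Return the three commands from this post for one video file, in order."""
    stem, _ext = os.path.splitext(source)
    raw_wav = stem + ".wav"
    boosted_wav = stem + ".boosted.wav"
    final_video = stem + ".boosted.mp4"
    return [
        # 1. Copy the audio out as plain stereo WAV.
        ["ffmpeg", "-y", "-i", source, "-ac", "2", raw_wav],
        # 2. Same WaveBooster switches as above. Assumption: BoostCLI accepts
        #    the file names however subprocess ends up quoting them.
        ["boostcli", f"/i={raw_wav}", f"/o={boosted_wav}", "/a=-12", "/m=-3"],
        # 3. Original video plus boosted audio, converted to mp4.
        ["ffmpeg", "-y", "-i", source, "-i", boosted_wav,
         "-map", "0:v:0", "-map", "1:a:0", "-ac", "2", final_video],
    ]


def process(source, run=subprocess.run):
    """Run each step in turn; check=True stops at the first one that fails."""
    for command in pipeline_commands(source):
        run(command, check=True)
```

Calling process("show.mpg") would run all three steps for that one file, and looping over every *.mpg in the scratch directory is a small step from there.  The run parameter exists only so the command list can be inspected or tested without launching anything.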

"Why are you going to all the trouble to use a separate utility when ffmpeg has the same compand option as SoX and can do everything you need in one operation?  ffmpeg can do it for you!  ffmpeg can do anything!"

Yes, ffmpeg has the same compand option as SoX.  Since they're both open source projects, they probably use the same code.  The problem is twofold and just like SoX: first, using the suggested settings the results really aren't that different from the source, and second, the settings are so complex that I haven't a clue on how to adjust them.  Here are the examples I'm using:

C:\> sox asz.wav asz-car.wav compand 0.3,1 6:-70,-60,-20 -5 -90 0.2
C:\> ffmpeg -i input.wav -af "compand=.3|.3:1|1:-90/-60|-60/-40|-40/-30|-20/-20:6:0:-90:0.2" output.wav
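One piece of the ffmpeg settings can at least be decoded.  According to the ffmpeg filter documentation, the third field of compand is a list of in/out points in dB that define a transfer curve, with a 0/0 point assumed at the top.  The Python sketch below reads those points from the example above and estimates the output level for a given input level by drawing straight lines between them.  It is only an approximation: it ignores the soft-knee, attack, and decay settings, it makes no claim about what happens below the first point, and the function names are made up.

```python
def parse_points(spec):
    """Turn 'in/out|in/out|...' into a list of (in_db, out_db) pairs."""
    points = []
    for pair in spec.split("|"):
        in_db, out_db = pair.split("/")
        points.append((float(in_db), float(out_db)))
    return points


def transfer(level_db, points):
    """Estimate the output level by straight-line interpolation between points."""
    pts = list(points)
    if pts[-1][0] < 0:
        pts.append((0.0, 0.0))  # the docs say a 0/0 point is assumed
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= level_db <= x1:
            fraction = (level_db - x0) / (x1 - x0)
            return y0 + fraction * (y1 - y0)
    raise ValueError("level is outside the range covered by the points")


points = parse_points("-90/-60|-60/-40|-40/-30|-20/-20")
for level in (-60.0, -50.0, -30.0, -10.0):
    print(level, "->", transfer(level, points))
```

On this curve a -60 dB sound comes out at -40 dB, but something at -30 dB only rises to -25 dB and anything above -20 dB is untouched.  If the quiet dialog in a recording sits around -30 dB, that would be roughly a 5 dB boost, which might be part of why the results sound so close to the source.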

Now, while both SoX and ffmpeg create files with some improvement, they are nowhere near as good as The Levelator or WaveBooster.  I believe this is for about three reasons.  1) The Levelator and WaveBooster are much more aggressive in how they alter the sounds.  2) The Levelator and WaveBooster are two-pass automated systems, meaning they can scan the file as a whole before they go back and start intelligently processing it, therefore having a better idea of the highs and lows before they start changing anything.  SoX and ffmpeg are more like stream processors that manipulate data as it goes by in exactly the manner you specify.  3) I have no idea what I'm doing most of the time.
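Reason 2 is easier to see with a toy example.  The Python sketch below uses simple peak normalization, which is not what The Levelator or WaveBooster actually do (their internals aren't documented here), purely to show why getting a look at the whole file first helps.  The sample values and function names are invented.

```python
def two_pass_normalize(samples, target_peak=1.0):
    """Pass 1 finds the loudest sample; pass 2 applies a gain that just fits."""
    peak = max(abs(s) for s in samples)      # pass 1: scan the whole file
    if peak == 0:
        return list(samples)                 # pure silence, nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]       # pass 2: apply the safe gain


def streaming_gain(samples, gain):
    """One pass: the gain has to be chosen before any audio has been seen."""
    for s in samples:
        yield s * gain


quiet_then_loud = [0.1, -0.25, 0.5]
print(two_pass_normalize(quiet_then_loud))         # [0.2, -0.5, 1.0]
print(list(streaming_gain(quiet_then_loud, 4.0)))  # [0.4, -1.0, 2.0]
```

The two-pass version knows the loudest sample is 0.5 and scales everything to fit exactly.  The streaming version, with a gain guessed in advance, pushes the last sample to 2.0, twice the maximum, so either you guess conservatively or you clip.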

If there are any SoX or ffmpeg experts out there who want to show me just how wrong I am, please feel free to email me.

Since it's pretty hard to locate WaveBooster, I've attached a copy at the bottom of this page.

Update on October 16, 2015:  Since I wrote all of the above, I have found a program called Dynamic Audio Normalizer.  In addition to standalone GUI and CLI versions, there's a custom version of SoX that includes it, and it is now included as an audio filter in the standard version of FFmpeg.  I've only played with it a little, but so far, so good.  The dynaudnorm filter is much, much, much better than the compand filter.  I have not yet implemented this as part of my standard DVR setup, but intend to the next time I feel up to coding.

Update on November 25, 2015:  I tried to skip the whole "extract audio from the video, massage the audio, and mix them back together" process.  I just used the dynaudnorm filter.  The resulting video files didn't play properly on some of my devices.  I think something went wrong in the conversion of the 5.1 or other special audio into regular plain stereo.  Instead, I now use the same process I did with WaveBooster/BoostCLI, except that the boosting step is now this:

C:\> ffmpeg -i output.wav -af dynaudnorm boosted.wav

The boosted audio is not quite as loud as with WaveBooster but seems more natural and mixes back in with the video just fine.
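Since only the middle step changes, the swap is simple to express in code.  Here is a small Python sketch of that one step with the tool as a switch.  The function name and the tool labels are invented for illustration, and dynaudnorm is run with its default settings, exactly as in the command above.

```python
def boost_step(raw_wav, boosted_wav, tool="dynaudnorm"):
    """Build the command for the 'massage the audio' step with either tool."""
    if tool == "dynaudnorm":
        # ffmpeg's Dynamic Audio Normalizer filter, default settings
        return ["ffmpeg", "-i", raw_wav, "-af", "dynaudnorm", boosted_wav]
    if tool == "wavebooster":
        # The original BoostCLI call with the same switches as before
        return ["boostcli", f"/i={raw_wav}", f"/o={boosted_wav}", "/a=-12", "/m=-3"]
    raise ValueError(f"unknown tool: {tool}")


print(" ".join(boost_step("output.wav", "boosted.wav")))
print(" ".join(boost_step("output.wav", "boosted.wav", tool="wavebooster")))
```

Keeping the extract and recombine steps untouched means the final video still gets a plain stereo WAV mixed in, which is the part that played properly everywhere.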