Raising the sound and the standards bars

Vittorio Giovara
Vimeo Engineering Blog
Dec 21, 2020

Some of the most important innovations and technical improvements that push humanity forward take place backstage, often going completely unnoticed and slowly becoming part of what we simply expect. I want to fix that by showcasing the immense importance of Opus, the new codec just released at Vimeo, and make sure that the story behind an audio codec does not go unjustly…unheard.

Opus isn’t a simple audio codec. It isn’t even a single codec, but a combination of two technologies, CELT and SILK, that excel at different tasks: CELT at general audio and music, SILK at speech. These were originally separate codecs, developed by the Xiph.Org Foundation and Skype, respectively, as natural evolutions of their earlier codecs: Vorbis and Speex on one side, SVOPC on the other.

They were merged into a single codec called Opus, which the IETF standardized in 2012 as RFC 6716. The requirements were that it deliver high-fidelity audio at efficient compression rates and perform well in low-delay applications, with voice and telephony sounding “as good as your stereo system.” The result was a hybrid design in which the compression mode can be chosen frame by frame. I couldn’t do justice to all the technical qualities of this codec here (and I would go wildly off topic), so I’ll just refer you to the Wikipedia article.
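To see how a decoder learns which mode each packet uses, it helps to know that every Opus packet begins with a one-byte table of contents (TOC), defined in RFC 6716, Section 3.1. Its top five bits select the mode (SILK-only, hybrid, or CELT-only), the audio bandwidth, and the frame duration; the next bit flags stereo; the last two bits describe how many frames the packet carries. The following is a minimal sketch that decodes only that first byte. The function and class names, and the example bytes, are illustrative and not taken from any particular library.

```python
from dataclasses import dataclass

# Frame durations in ms for each mode, in TOC configuration order (RFC 6716, Sec. 3.1).
SILK_FRAME_MS = (10.0, 20.0, 40.0, 60.0)
HYBRID_FRAME_MS = (10.0, 20.0)
CELT_FRAME_MS = (2.5, 5.0, 10.0, 20.0)


@dataclass(frozen=True)
class Toc:
    mode: str        # "SILK", "Hybrid" or "CELT"
    bandwidth: str   # "NB", "MB", "WB", "SWB" or "FB"
    frame_ms: float  # duration of each frame in the packet
    stereo: bool
    frame_code: int  # 0-3: how frames are laid out in the packet


def parse_toc(toc: int) -> Toc:
    """Decode the table-of-contents byte that starts every Opus packet."""
    if not 0 <= toc <= 0xFF:
        raise ValueError("the TOC is a single byte")
    config = toc >> 3              # top 5 bits: mode, bandwidth, frame size
    stereo = bool((toc >> 2) & 1)  # 1 bit: stereo flag
    frame_code = toc & 0x03        # bottom 2 bits: frame count code
    if config < 12:                # configs 0-11: SILK-only
        mode, bw = "SILK", ("NB", "MB", "WB")[config // 4]
        frame_ms = SILK_FRAME_MS[config % 4]
    elif config < 16:              # configs 12-15: Hybrid (SILK + CELT)
        mode, bw = "Hybrid", ("SWB", "FB")[(config - 12) // 2]
        frame_ms = HYBRID_FRAME_MS[(config - 12) % 2]
    else:                          # configs 16-31: CELT-only
        mode, bw = "CELT", ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = CELT_FRAME_MS[(config - 16) % 4]
    return Toc(mode, bw, frame_ms, stereo, frame_code)


# First bytes of three packets from a hypothetical stream that switches mode every packet.
for first_byte in (0x08, 0x78, 0xFC):
    t = parse_toc(first_byte)
    print(t.mode, t.bandwidth, t.frame_ms, "stereo" if t.stereo else "mono")
# SILK NB 20.0 mono
# Hybrid FB 20.0 mono
# CELT FB 20.0 stereo
```

Because the mode lives in each packet's own header, an encoder is free to switch between speech-oriented SILK, music-oriented CELT, and the hybrid of the two without renegotiating anything with the decoder.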

One interesting detail is that the codec internally always converts audio to a fixed sample rate (48 kHz) and adjusts the audio bandwidth to the best match. This was done to simplify decoding overall, but it led to a few gotchas when the decoder was written for FFmpeg, the open-source multimedia toolkit. In FFmpeg, every encoder and decoder lives in a library called libavcodec, which had no resampling capability of its own, so a new linking mechanism had to be designed to let the decoder use the libswresample library instead.
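RFC 6716 defines five audio bandwidths, each tied to an effective sample rate: narrowband (4 kHz audio, 8 kHz sampling), mediumband (6 kHz, 12 kHz), wideband (8 kHz, 16 kHz), super-wideband (12 kHz, 24 kHz), and fullband (20 kHz, 48 kHz). The sketch below illustrates the two halves of the idea. One function picks a bandwidth for an arbitrary input rate, assuming that “best match” means the narrowest Opus bandwidth that still covers the input's Nyquist frequency; this rule is a hypothetical stand-in, not libopus's actual heuristic. The other function computes the exact ratio needed to bring a lower internal rate (such as SILK's 16 kHz) up to 48 kHz, which is the kind of conversion FFmpeg's decoder hands off to libswresample.

```python
from fractions import Fraction

# Opus audio bandwidths with their effective sample rates (RFC 6716, Table 1).
OPUS_BANDWIDTHS = (
    ("NB", 4_000, 8_000),     # narrowband
    ("MB", 6_000, 12_000),    # mediumband
    ("WB", 8_000, 16_000),    # wideband
    ("SWB", 12_000, 24_000),  # super-wideband
    ("FB", 20_000, 48_000),   # fullband
)
DECODE_RATE = 48_000  # the fixed rate decoded audio is delivered at


def pick_bandwidth(input_rate: int) -> str:
    """Hypothetical 'best match': narrowest bandwidth covering the input's Nyquist limit."""
    nyquist = input_rate / 2
    for name, audio_hz, _ in OPUS_BANDWIDTHS:
        if audio_hz >= nyquist:
            return name
    return "FB"  # Opus codes nothing above 20 kHz, so wider inputs cap at fullband


def upsample_factor(internal_rate: int) -> Fraction:
    """Exact resampling ratio from an internal rate (e.g. SILK's 16 kHz) to 48 kHz."""
    return Fraction(DECODE_RATE, internal_rate)


print(pick_bandwidth(44_100), pick_bandwidth(16_000), upsample_factor(16_000))
# FB WB 3
```

Keeping the ratio as a `Fraction` makes it obvious that 8, 12, 16 and 24 kHz all map to integer factors of 48 kHz, while a rate like 44.1 kHz does not (160/147).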

Besides the per-frame hybrid mode decision, the other groundbreaking innovation was that this codec is completely free: not only are the reference encoder and decoder open source, but the standard itself is free, too. All known patents that apply to the specification are licensed royalty-free, and there is none of the complicated licensing that usually plagues competing codecs. The one catch is that you immediately and retroactively lose the license to use the codec if you initiate any Opus-related patent litigation, as described in the Opus license terms. This was a nice deterrent, later reused (with a few modifications) by other codecs such as AV1.

With great performance at low delay and no patent encumbrances, Opus was the best possible candidate for WebRTC, the real-time communication system standardized by the IETF shortly afterward. WebRTC enables simple, interoperable audio communication over the Internet. It was initially designed for browser-to-browser connectivity, but it quickly became so popular that many devices and conferencing applications adopted it, too.

But what about on-demand (non-live) video? Everything looked great on paper for this codec, but there was one major showstopper: its default encapsulation format met with opposition from most implementers. When the codec was completed, the formats of choice were Ogg and Matroska, but those aren’t really streaming formats. Media on the web is consumed as either WebM or MP4 (and, at the time, MPEG-TS, too), and the amount of tooling and support for MP4 simply trumps the alternatives. Opus needed a specification defining how it would be packaged in MP4, and the open source community stepped in.

The first step was to request a code point from the MPEG-4 Registration Authority, so that there would be a universal code for recognizing Opus streams. I wish it had been as adventurous as the MPEG-TS registration at SMPTE, where the applicant had to physically fax the application, twice. No Victorian-era technology was needed this time around, though, and we got a nice “Opus” code to tag our files with.
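In an MP4 file, that code point appears as the four-character type of the audio sample entry: an Opus track's sample description (`stsd`) box contains an `Opus` sample entry, just as an AAC track contains `mp4a`. Here is a minimal sketch that reads a standard ISOBMFF box header (a 32-bit big-endian size followed by a four-character type, with an optional 64-bit “largesize”) and checks for that code. The helper names are hypothetical, and walking down from the file root to the `stsd` box is left out for brevity.

```python
import struct
from typing import NamedTuple

OPUS_FOURCC = b"Opus"  # sample entry code registered for Opus


class BoxHeader(NamedTuple):
    size: int        # total box size in bytes, header included (0 = extends to end of file)
    box_type: bytes  # four-character code
    header_len: int  # 8, or 16 when a 64-bit size follows the type


def read_box_header(data: bytes, offset: int = 0) -> BoxHeader:
    """Parse an ISOBMFF box header at `offset`; all integers are big-endian."""
    size, box_type = struct.unpack_from(">I4s", data, offset)
    if size == 1:  # the real size is a 64-bit 'largesize' right after the type
        (size,) = struct.unpack_from(">Q", data, offset + 8)
        return BoxHeader(size, box_type, 16)
    return BoxHeader(size, box_type, 8)


def is_opus_sample_entry(data: bytes, offset: int = 0) -> bool:
    """True if the box starting at `offset` is tagged with the Opus code point."""
    return read_box_header(data, offset).box_type == OPUS_FOURCC


fake_entry = struct.pack(">I4s", 8, OPUS_FOURCC)  # header-only stand-in, not a real sample entry
print(is_opus_sample_entry(fake_entry))  # True
```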

Then we needed a way to encapsulate the streams, that is, how to actually package the bytes. This was done by the L-SMASH project, with feedback from Apple (in case you don’t know, L-SMASH is simply the best ISOBMFF packager and analyzer). The specification document was submitted to the appropriate MPEG office and referenced in official emails, and it is now the de facto standard used by all implementations. Having this codec encapsulated in MP4 means that anybody can deploy Opus without many changes to their existing pipeline.
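A central piece of that specification is the decoder configuration box, `dOps` (OpusSpecificBox), carried inside the `Opus` sample entry. As I understand the spec, its fields mirror the Ogg `OpusHead` header but are stored big-endian: Version (0), OutputChannelCount, PreSkip, InputSampleRate, OutputGain, and ChannelMappingFamily, followed by a channel mapping table only when the family is nonzero. The sketch below builds and parses the simple mono/stereo case (mapping family 0) with Python's `struct` module. Treat the layout as an assumption to check against the published encapsulation document; the function names and the 312-sample pre-skip value are illustrative.

```python
import struct
from typing import NamedTuple

# Version, OutputChannelCount, PreSkip, InputSampleRate, OutputGain, ChannelMappingFamily
DOPS_FIELDS = ">BBHIhB"  # big-endian, no padding: 11 bytes


class OpusSpecific(NamedTuple):
    channels: int
    pre_skip: int           # samples (at 48 kHz) to discard at the start of decoding
    input_sample_rate: int  # informational only: rate of the original input, in Hz
    output_gain: int        # Q7.8 fixed-point gain in dB to apply to the output


def build_dops(cfg: OpusSpecific) -> bytes:
    """Serialize a 'dOps' box for channel mapping family 0 (mono or stereo only)."""
    if cfg.channels not in (1, 2):
        raise ValueError("mapping family 0 only covers 1 or 2 channels")
    payload = struct.pack(DOPS_FIELDS, 0, cfg.channels, cfg.pre_skip,
                          cfg.input_sample_rate, cfg.output_gain, 0)
    return struct.pack(">I4s", 8 + len(payload), b"dOps") + payload


def parse_dops(box: bytes) -> OpusSpecific:
    """Parse a family-0 'dOps' box produced by build_dops (or a compatible muxer)."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"dOps" or size < 8 + struct.calcsize(DOPS_FIELDS):
        raise ValueError("not a well-formed dOps box")
    version, channels, pre_skip, rate, gain, family = struct.unpack_from(DOPS_FIELDS, box, 8)
    if version != 0:
        raise ValueError(f"unsupported dOps version {version}")
    if family != 0:
        raise NotImplementedError("channel mapping tables are not handled in this sketch")
    return OpusSpecific(channels, pre_skip, rate, gain)


box = build_dops(OpusSpecific(channels=2, pre_skip=312, input_sample_rate=48_000, output_gain=0))
print(len(box), parse_dops(box))  # 19 bytes: 8-byte header + 11-byte payload
```

Note the contrast with Ogg, where the equivalent `OpusHead` packet is little-endian and starts with a magic signature; in MP4 the box type already identifies the payload.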

Most browsers (Chrome, Firefox, and Edge) implemented the specification right away, alongside their WebM support, while others (Safari) are still working on it; see caniuse.com for more information. Either way, we could finally encode Opus at Vimeo, store it as MP4, and serve it through our on-the-fly media packager. Supporting this format also brought several good improvements to our software, and you can check whether you’re receiving Opus from our debug panel (press the D key on any video).
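For readers who want to produce a similar file themselves, FFmpeg can encode audio with its libopus wrapper (`-c:a libopus`) and mux the result into MP4. The sketch below only builds the command line from Python, assuming an FFmpeg build with libopus enabled. It is not Vimeo's transcoding pipeline: the 128k bitrate and file names are arbitrary placeholders. Some older FFmpeg releases treated Opus-in-MP4 muxing as experimental and refused it without `-strict experimental`, which is why that flag is optional here.

```python
import shlex


def opus_mp4_command(src: str, dst: str, bitrate: str = "128k",
                     experimental: bool = False) -> list[str]:
    """Build an ffmpeg command that drops video and encodes the audio to Opus in MP4."""
    cmd = ["ffmpeg", "-i", src, "-vn", "-c:a", "libopus", "-b:a", bitrate]
    if experimental:
        # Only needed on older FFmpeg releases that gated Opus-in-MP4 muxing.
        cmd += ["-strict", "experimental"]
    cmd.append(dst)  # the .mp4 extension selects the MP4 muxer
    return cmd


print(shlex.join(opus_mp4_command("input.mov", "audio-opus.mp4")))
```

Pass the returned list to `subprocess.run(cmd, check=True)` to actually run it with `ffmpeg` on your `PATH`.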

https://vimeo.com/454808914

We’re rolling out this codec gradually: every new upload to Vimeo (after December 17th) will receive it, but to actually hear it in a supported browser, you need to enable it in Player flags, under the beta panel. For now, each viewer needs to set this preference (not the creator), but we’ll flip the switch to enable it by default pretty quickly, as soon as we’re confident there are no major issues.

We’ve heard plenty about media living in a Multi Codec World nowadays, and there are competing efforts to develop the best next-generation video codec. However, audio is also an essential part of the experience (an audio glitch is usually more noticeable than a video artifact), and it was about time for Vimeo to deliver the best available solution to our viewers. Opus improves upon previous royalty-free codecs by combining general and speech audio in a single hybrid codec, and it is simply the best in class in terms of performance, availability, and distribution.

I would like to acknowledge the immense work done by teams outside Vimeo, in particular the Xiph.Org Foundation, the FFmpeg volunteers, and the L-SMASH authors, and, inside Vimeo, especially the Player team, Staff Engineering, and the Video QA team, for their contributions to making this release possible.

Interested in flexing your engineering chops at Vimeo? Join our team!
