Encore Open Source — Licensed to once again Transcode

Published in

The SVT Tech Blog

15 min read · May 5, 2021

An update on how, and why, we built our own Transcoding solution, the struggles that we had, the positive results we finally got, how it basically operates and why we are now releasing it as Open Source.

“Encore — Transcoding at its core” by Olof Lindman / CC-BY-ND 4.0

To be able to work on a truly meaningful project where the potential seems limitless, whilst still keeping the main idea crystal clear, where all the needed tools are at your disposal and you are able to effectively discover your way forward, is a remarkable privilege often earned at the conclusion of brave ideas and hard work. It is the same feeling that you get when an endeavour strikes a seemingly perfect balance between innovative experimentation and practical results. Because even though the vast majority of the surrounding solution space remains unknown at any given time, every step of the way somehow feels familiar.

Like a blast from an epic past we finally find ourselves back here! It has been roughly three years since I wrote an enthusiastic blog post about SVT Encore, our in-house transcoder solution for VoD streaming (link). Back then, the solution we presented was merely a PoC (Proof of Concept) but it did prove, without a doubt, that Encore could one day replace the proprietary commercial solution we were using at the time.

So we kept the name, some of the inherent design, transcoding profiles and clever media handling but threw out most of the rest. This time we put a larger emphasis on stability, flexibility, scalability and control. Just about one and a half years after its initial inception the new version of Encore was working at full speed.

Today, almost all VoD content that is being released on SVT Play has been transcoded using Encore. I say “almost all” because live content is still transcoded using commercial solutions and technically those are repackaged into VoD files after the event. Since we are very proud of what we have achieved up to this point, not to mention hyped about where we are headed, we figured it was about time to shed some light on what we have been up to since March 2018.

But first things first!

SVT Encore is being released as Open Source as of today (5/5/2021), something that we are very, very excited about. Even though it is an unsupported alpha (think of it as version 0.1), we have been using this very piece of software in production for the better part of two years. We hope that others will find Encore useful and beneficial to their transcoding needs, and that new collaborations will spring from the project itself. So if you want to get started with Encore right away, have a look at our GitHub repo: https://github.com/svt/encore

To really bring the point of our excitement home I have asked my colleague and team mate Josef Andersson, who works as our Open Source Lead, to describe the reasoning behind the Open Source decision. This is very much the main news of the day, and the point we want to put across.

Why Encore is going open source

So why are we releasing Encore as Open Source, you might ask? Here are three reasons:

Collaboration
Innovation
Transparency

Now, traditionally, software was built in-house, used in-house, and discarded in-house. But for many years now, that has no longer been the case. From being something mysterious to something everyone takes for granted, Open Source is eating the world. There is not one single digital project of size that is not dependent on Open Source, and probably not many small ones either. We, like most others, use and depend heavily on it in all our IT-based projects, every day, every year. Thousands of projects, and thousands of contributing people and organisations, become part of our little projects and our code, so necessary for our daily deliveries and services to the Public.

In these times, Public Service and the whole media landscape are changing and going through transformations. With the ongoing digital transformation, many of our colleagues and friends in the media industry, Public Broadcasters and commercial actors alike, are putting resources into solving the same problems as us and meeting the same challenges. We are not alone, of that we can be sure. One of these problems is how to do the video transcoding part. At some point we have to ask ourselves: why should we, and you, re-invent the wheel again and again, solving the same problems and wasting resources on building the same solutions as our media neighbours, by continuing to build closed software? Collaboration makes us all stronger and fosters innovation. Open Source IS a proven method of collaboration. If we collaborate, we can spend our time building new, never-before-seen features and innovating instead of re-implementing the same function again and again, and our resources can be put where they should be: giving the public the best service we possibly can.

With collaboration comes more innovation: new ideas and suggestions that we would never have thought of in a siloed development environment. Sure, not everything will end up in our project, but even ideas that are not used will be food for thought. And the ones that are used will be invaluable.

Last but not least, a public organisation working in the public’s best interest has an obligation to strive for transparency in its work and efforts. It is an important democratic issue, and it is a trust issue. On the technical level, Open Source is a way to provide this. It shows what an organisation is doing with the resources it is given and how the results can be reused and improved. With transparency through Open Source we hopefully build positive commitment in the technical communities, and with that follows that we attract talent to work with us.

Yes, we admit that we are quite new to doing Open Source. This project is young, and will change in many ways before it can be considered mature. You will find errors and warts in the code, in our process or somewhere else. Bear with us, we are learning, and with every release and with experience we will improve. We sincerely hope this project can benefit you, and who knows, maybe we can even improve it together at a Meetup, Hackathon or conference in the future.

The story thus far

So, it all began back in 2018 with the task of converting the PoC to an actual production-grade application, and it turns out that the project started rather slowly. Staying true to its name, Encore spent the better half of a year being re-iterated into new conceptual incarnations, meaning that we continuously came up with new ideas for its usage and architecture. One of the biggest questions early on revolved around the handling of on-prem hardware, scalability, deployment and orchestration.

In other words, the first half of 2018 was spent determining the scope and purpose of Encore.

During the second half of 2018 we also decided, due to external circumstances, to build our own packager tool (which is based on Shaka Packager) for HLS and DASH. That particular journey, filled with the usual bumps and highlights, is well deserving of its own blog post, and thus best left to another day. Suffice it to say that the flexibility our packager now offers has opened up a new world of possibilities for our VoD streaming capabilities. Although this development put Encore on hold for another six months, in hindsight it has become very obvious that we are currently reaping the benefits of working with a set of self-developed tools.

Then finally, during the first half of 2019 we began developing the real deal, the actual production-level Encore. After a couple of months we had completed a stable and functional transcoder (which had all the features of the initial PoC and more), and slowly but steadily we began the process of migrating transcoding workflows to Encore.

What is SVT Encore

In this blog post I am making the presumptuous assumption that the reader has no previous knowledge of Encore, so we might as well start off with the most obvious question:

What is Encore?

“Encore — Meaning” by SVT / CC-BY-SA 4.0

Encore (officially SVT ENCORE) is an automated transcoding solution, developed at SVT, that transcodes media files for online OTT distribution on SVT Play. It is a complete, hardware agnostic software service that utilizes a combination of Open Source and in-house developed tools. Most of the actual transcoding is done through the ubiquitous FFmpeg framework, which in turn harnesses the power of various libraries for dedicated tasks. This is one of the many areas where the inherent flexibility of Open Source and the modular design of Encore itself allow us to precisely achieve our desired outcome.

So what does Encore actually do?

Encore is able to analyse, filter, rearrange, ingest, decode, transcode, mux, and deliver digital media files. Even though that summary might seem simple enough, let me elaborate on what the above statement actually entails.

In its current form, Encore has the ability to import virtually any media file, regardless of inherent stream layout, and transcode it into almost any format we want. Encore is also able to analyse the streams of a given media file and adaptively apply filter operations, such as deinterlacing, graphics, captions / subtitles, conversions of pixel formats and scaling, as needed. Similarly, Encore has been designed to handle a wide array of audio stream setups, and cleverly rearrange or down-mix the different variations into formats better suited for our VoD streaming. While some of these features already exist inherently within FFmpeg, we are also able to add our own filters and analysis without much hassle.

The beauty of this setup is that if we find an Open Source / royalty free piece of software that does any of the above operations in a better way than whatever we are currently using, we can simply choose to implement the newly found piece instead. It is hard to accurately describe how powerful this flexibility really is, but in a sense almost every single tool that Encore makes use of can be changed or continuously improved without major development overhauls or excessive maintenance.

Encore is a mixture of in-house developed software and Open Source libraries | “Encore — At a Glance” by SVT / CC-BY-SA 4.0

For instance, the ability to write our own subtitle filter that is able to decode, render and encode subtitles (captions if we are being picky) allows us to make proper use of our custom captions format (developed by our SubText-team). All we have to do is write a filter that conforms to our filter proxy (link), decode and read an AVFrame, then render the captions on top of that and finally add it in our filter graph during the transcode. This would be very hard to achieve with a commercial transcoding solution for many reasons, the most obvious one being the fact that no other company would have any idea of how our custom subtitle format works.

Similarly, the usage of FFmpeg allows us to freely choose among available Open Source video and audio codecs, design our own parameters and utility files and create encoding profiles with tailor made settings.

Workflow Overview

Below is a simplified workflow overview of how Encore operates conceptually. Encore processes each transcode through a series of five steps, decreasing the level of abstraction with every step. Now, in practice some of these steps are in turn affected by surrounding microservices as well as internal and external business logic from other on-prem systems. But for the context of this blog post we are going to ignore all outside influence since Encore itself functionally acts as if these processes exist in a vacuum. This in and of itself is very much a design choice, because even though it does put requirements on the surrounding digital infrastructure, it also makes Encore adaptable enough to fit into any production pipeline.

Every transcode that Encore performs begins with an Encore-Job. The Job contains four key pieces of information: first and foremost the media file that is to be transcoded, secondly the priority of that particular transcode, thirdly which predefined transcoding profile Encore should use, and finally specific instructions that are not covered by any of the other three (for example how to handle strange aspect ratios or audio channel layouts). Once the Job has been created, it is sent to the Queue.
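To make the four pieces of information a bit more tangible, here is a minimal Python sketch of what such a Job could look like. It is only an illustration: the class name, the field names and the example values are all hypothetical and are not taken from the actual Encore code base.

```python
from dataclasses import dataclass, field


@dataclass
class EncoreJob:
    """Hypothetical model of the four pieces of information in a Job."""
    input_file: str   # the media file that is to be transcoded
    priority: int     # assumption: a higher number means more urgent
    profile: str      # name of a predefined transcoding profile
    # Optional extras that none of the other three fields cover.
    instructions: dict = field(default_factory=dict)


# Example values are made up for illustration.
job = EncoreJob(
    input_file="/media/in/programme.mxf",
    priority=50,
    profile="program",
    instructions={"audio_layout": "stereo"},
)
```

Keeping the special instructions in a separate, optional field mirrors the idea that most transcodes only need a file, a priority and a profile.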

Queue

The Queue-step is where priority, scaling and concurrency for Jobs are handled. In essence a user can define one or more Queues which in turn handle different levels of priority. For instance, in our case we use three Queues: one for high priority (clips and express transcodes), one for standard priority (regular programs) and finally one for low priority (extra refinement transcodes with long processing time). However, a Queue can also handle different levels of priority among its list of jobs, that is to say within the high priority Queue one can imagine having different levels of priority as well (news clips being more important than regular clips). Once there is a transcoding thread slot available in an Encore Instance, a Job is picked from the Queue and forwarded to processing.

This particular feature becomes really powerful once you start using it in order to scale. Since you can have just about as many Queues as you would like, and they in turn can have an arbitrary level of internal priority handling, a user can choose to extend this setup across as many instances of Encore as hardware will allow. Simply put:

The Encore instances form a self-regulating cluster of prioritised, concurrency-handling transcoding jobs.
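The queueing idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Encore itself is implemented: the `JobQueue` class, the `next_job` helper and the rule that a free slot always drains the more important Queue first are all hypothetical.

```python
import heapq
import itertools


class JobQueue:
    """Hands out the highest-priority job first, FIFO among equals."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps insertion order

    def put(self, job_name, priority):
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._order), job_name))

    def take(self):
        """Return the next job name, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


def next_job(queues):
    """Assumption: a free slot checks the queues in order of importance."""
    for queue in queues:
        job = queue.take()
        if job is not None:
            return job
    return None


high, standard = JobQueue(), JobQueue()
standard.put("regular program", priority=10)
high.put("regular clip", priority=10)
high.put("news clip", priority=90)

picked = [next_job([high, standard]) for _ in range(4)]
```

Here the news clip is picked before the regular clip, and the regular program only gets a slot once the high priority Queue is empty. The final `None` shows what an idle slot would see.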

Job Processing

As mentioned earlier, an Encore Job is an aggregate of the information needed to perform a transcoding process. The Job contains a transcoding profile that includes choice of codec, bitrate, parameters and so on. There is also an option to include special instructions alongside the profile, that might not be directly related to encoding options, but that nonetheless will affect the transcode (such as conversions from SAR to DAR). To make sure that the profile and optional instructions can be applied correctly, the media file itself has to be analysed, which is done by the Analyser.

The Command Builder combines information from several sources and creates one FFmpeg command | “Encore — Process Overview” by SVT / CC-BY-SA 4.0

True to its name, the Analyser probes the media file by reading through various headers and sifting through the first 10–15 seconds of every stream. The Analyser establishes what sort of media Encore is dealing with, video as well as audio, together with relevant metadata in order to summarise useful information and measured metrics. Although this might sound easy enough, anyone with previous experience of transcoder development will tell you that this is actually one of the more complex tasks to accomplish in a sufficiently satisfying manner. This becomes especially true whenever you are dealing with older media formats, which were more often than not made for a different time period with hardware eco-systems that were mainly analogue.

Thus, the Analyser allows us to correctly interpret the various properties of a media file, such as codec, resolution, format, color primaries, storage aspect ratio, display aspect ratio, pixel aspect ratio, timecode, timestamps, channel layout, and so on. This in turn is summarised into a neat JSON-formatted metrics file, which is sent together with the actual media file and encoding profile into the CB, the Command Builder.
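As a small worked example of one such property, the display aspect ratio follows from the storage aspect ratio (width divided by height) multiplied by the pixel aspect ratio. The Python sketch below is only an illustration: the function names, the shape of the `probed` dictionary and the resulting metrics are assumptions and do not reflect the real format of Encore's metrics file.

```python
from fractions import Fraction


def display_aspect_ratio(width, height, pixel_aspect):
    """DAR = storage aspect ratio (width / height) x pixel aspect ratio."""
    storage_aspect = Fraction(width, height)
    return storage_aspect * pixel_aspect


def summarise_video(stream):
    """Reduce one probed video stream (a plain dict) to a few metrics."""
    num, den = (int(part) for part in stream["pixel_aspect"].split(":"))
    dar = display_aspect_ratio(stream["width"], stream["height"], Fraction(num, den))
    return {
        "codec": stream["codec"],
        "resolution": f'{stream["width"]}x{stream["height"]}',
        "display_aspect": f"{dar.numerator}:{dar.denominator}",
    }


# Hypothetical probe result for a PAL SD source with anamorphic pixels.
probed = {"codec": "mpeg2video", "width": 720, "height": 576, "pixel_aspect": "64:45"}
metrics = summarise_video(probed)
```

For this made-up source, 720/576 reduces to 5/4, and 5/4 times 64/45 gives 16/9, so the 720x576 picture should be displayed as 16:9 widescreen. Using exact fractions avoids the rounding issues that floating point ratios would bring.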

The CB, which is truly at the heart of Encore, is where most of the transcoding magic happens. This is where the chosen encoding profile is combined with the analysed metadata to form the final FFmpeg command which will transcode the original input media file into the desired output. Usually, but not always, said output is actually a set of several transcoded media files with different bitrates and varying compression complexity. These are then used to create bitrate ladders for online streaming. While the CB is designed to be a sort of “it just works” solution, it is far from perfect and much like the rest of the project it has enormous potential for improvement. Due to this very potential, the CB has been, and will probably remain, one of the main focus points of future Encore development.

In practice the CB cleverly creates a command that maps the input to our desired outputs through a massive filter graph that is programmatically created. This means that the CB, together with the measured metrics file from the Analyser and metadata supplied from our MAM, automagically solves many of the practical questions an operator might ask, such as:

- What is the input picture like? Storage-, Display- and Pixel Aspect Ratio?

- What sort of content are we dealing with?

- Do we need to scale the video, and if so what algorithm do we use?

- What are the inherent colour primaries, encoding models, and how do we need to transform them?

- How are the audio channels distributed and mastered? What do we want to keep and what do we want to change?

Having answered these questions, and better yet solved them, the CB sends the FFmpeg commands to the Progress-step for transcoding.
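To give a rough feeling for what building such a command involves, here is a heavily simplified Python sketch. It only handles video, and the function name, the `interlaced` metric and the layout of the ladder are hypothetical; the real Command Builder is far more elaborate. The FFmpeg options used (`-filter_complex`, `-map`, `-c:v`, `-b:v`) and the `yadif`, `split` and `scale` filters are standard FFmpeg features.

```python
def build_ffmpeg_command(input_file, metrics, ladder):
    """Combine analysed metrics with a profile's ladder into one command."""
    count = len(ladder)
    chain = "[0:v]"
    # Deinterlace only when the analysis says the source is interlaced.
    if metrics.get("interlaced"):
        chain += "yadif,"
    # Fan the video out into one branch per rendition in the ladder.
    labels = "".join(f"[v{i}]" for i in range(count))
    parts = [f"{chain}split={count}{labels}"]
    for i, rung in enumerate(ladder):
        parts.append(f"[v{i}]scale={rung['width']}:{rung['height']}[out{i}]")

    cmd = ["ffmpeg", "-i", input_file, "-filter_complex", ";".join(parts)]
    # Each rendition maps its own filter output to its own file.
    for i, rung in enumerate(ladder):
        cmd += ["-map", f"[out{i}]", "-c:v", "libx264",
                "-b:v", rung["bitrate"], rung["output"]]
    return cmd


# A made-up two-step bitrate ladder.
ladder = [
    {"width": 1280, "height": 720, "bitrate": "2500k", "output": "out_720.mp4"},
    {"width": 640, "height": 360, "bitrate": "800k", "output": "out_360.mp4"},
]
command = build_ffmpeg_command("input.mxf", {"interlaced": True}, ladder)
```

Even this toy version shows the principle: the analysed metrics decide which filters end up in the graph, while the profile decides how many outputs there are and how each one is encoded.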

Progress

Once the command has been successfully constructed, having long lost any notion of being human readable, it is time to start the actual transcoding. The transcoding of a Job is initialised, the Encore instance proceeds with the execution and continuously reports on the progress. In our case, all of these number crunching operations are run on custom servers which are in turn handled by our in-house cloud, but more on that below. When the transcoding process is complete, the output files are sent to a predefined output folder.
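Progress reporting can be illustrated with a short Python sketch as well. It assumes, purely for the sake of the example, that the transcode is run with FFmpeg's `-progress` option, which emits `key=value` lines such as `out_time=00:00:10.000000` and `progress=continue`. The function names are hypothetical, and this is not necessarily how Encore itself tracks progress.

```python
def parse_timestamp(value):
    """Convert 'HH:MM:SS.ffffff' into seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def percent_done(progress_lines, duration_seconds):
    """Return the latest progress as a percentage, capped at 100."""
    latest = 0.0
    for line in progress_lines:
        key, _, value = line.strip().partition("=")
        if key == "out_time" and value != "N/A":
            latest = parse_timestamp(value)
        elif key == "progress" and value == "end":
            return 100.0
    return min(100.0, 100.0 * latest / duration_seconds)


# Sample lines in the key=value style; the input duration would come
# from the analysis step.
sample = ["frame=250", "out_time=00:00:10.000000", "progress=continue"]
done = percent_done(sample, duration_seconds=40.0)
```

With ten seconds written out of a forty second input, `done` ends up at 25 percent.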

Deployment, Hardware and Orchestration

In order to get Encore running in production, however, we required a proper setup of hardware and workflow orchestration. Several questions were important in this regard: Who manages the hardware? Who updates the servers? Who replaces broken components? Who plans for general lifecycle management? Since the team working on Encore wanted to focus on software development, and thus stay away from the vicissitudes of hardware management, we would need help. In most cases, someone in our position would turn to a public cloud service, or simply outsource the problem to a contractor. But in our case, we did what we usually do: turn to our in-house solutions!

You see, over the last six or so years SVT has been building its own in-house cloud which runs thousands of containers, mostly microservices, that are all on physical on-prem hardware. It powers a large chunk of all SVT’s digital services and even an early PoC of Encore used this environment. As one might imagine, it was obvious early on that it would make sense to use dedicated servers for Encore within our own cloud.

The in-house cloud itself has a lot of tools in place already, alongside metrics and monitoring. Naturally, it made perfect sense to involve the people managing the in-house cloud and work together to design the Encore servers. The hardware choices were normalised to be a close fit with existing servers so that they would still be usable for other tasks at EoL.

Thankfully, installing these servers required a minimal amount of effort. Thanks to the elegant design of said cloud our skilled team managed to adapt the dedicated Encore servers in record time. We were able to physically unpack, rack, PXE boot & provision (all automatic, of course) on the same day as the hardware arrived. We then spent the better part of a week adapting said servers and tweaking tooling to make them exclusively available for Encore.

Since then, it has been a huge help to be able to work together closely to track down performance bottlenecks, share knowledge or just talk about something important (like the choice of operating system) over a cup of coffee.

To quote my colleague Stefan Berggren, who at the time worked as the product owner of the in-house cloud team:

The Encore servers, just like the rest of the in-house cloud, use ordinary commercially available servers. We like to make things as cost efficient as possible, no need to waste money. The Encore servers are mainly CPU optimized with the reasonably priced Intel Xeon 6154 CPU. We chose the 6154 because it had a nice 3 GHz base frequency with 18 cores while not being too expensive, a good compromise. With two of these in each server, we are able to chew through plenty of raw video files in record time. The rest of the hardware is boring, and not that important: 192 GB RAM with a somewhat small mirror of fast SSDs for storage (the actual video files are not stored on the servers).

Like Encore, the internal cloud is not a static thing; we regularly change, adapt and tweak it to better fit our needs. System containers allow us to pack applications efficiently and cheaply compared to classic virtualisation. Most applications (like Encore) run inside stateless Docker containers in a microservice orchestration framework that is then used by most of the development teams.

Conclusion

Encore has been, and continues to be, a truly exciting and innovative project. In the coming years we will most likely keep the current pace of development and improve upon the solution even more. Thus, I am very optimistic for the future of digital video at SVT Play, and it is amazing to be working at the forefront of our video transcoding endeavours.