DVB bitmap subtitles processing at Zattoo

Milos Pesic
May 11, 2020 · 8 min read

Subtitles are a very important component of a rich multimedia experience: they allow us to consume content in different languages, and they help viewers with hearing impairments (~5% of the audience) to follow the media content. According to some figures, more than 80% of Facebook videos are viewed without sound. For the ultimate experience Zattoo is trying to deliver, subtitles are not optional; they are a necessity.

Text-based subtitles

In some cases, subtitle streams are provided in the form of text with presentation timing metadata. One example is the SubRip text format (SRT):

1
00:00:00,498 --> 00:00:02,827
- Here's what I love most
about food and diet.

2
00:00:02,827 --> 00:00:06,383
We all eat several times a day,
and we're totally in charge

3
00:00:06,383 --> 00:00:09,427
of what goes on our plate
and what stays off.

A broadly similar concept underpins other text-based subtitle formats such as WebVTT (Web Video Text Tracks) and TTML (Timed Text Markup Language). The textual content is transferred over the wire, and on the receiver (player) side the text can easily be extracted from the stream. This makes these formats a very popular choice for delivering multimedia content over adaptive HTTP-based streaming protocols (DASH and HLS), a combination that is widely supported on the vast majority of popular devices and set-top boxes.

DVB bitmap subtitles

Not all channels provide subtitles in textual form, however. Another quite popular format, especially in Europe, is DVB bitmap subtitling. Instead of text, a subtitle entry of this type is a graphical bitmap (image) with timing and screen-positioning metadata. Players receiving these images overlay them on top of the video at the time and screen position indicated by the associated metadata.

The whole chain is text-agnostic and, as such, convenient in several respects.

However, there are some very limiting concerns when it comes to DVB bitmap subtitles: in particular, they cannot easily be carried over HTTP adaptive streaming protocols and are not supported on many client devices.

DVB bitmap subtitles at Zattoo

Many European channel providers on our platform still use this subtitling format, and due to the constraints mentioned above, we were until recently unable to carry these subtitles to our end users. However, since we at Zattoo aim to provide the best possible TV experience, we have invested work in bridging the gap and making these subtitles available.

For a long time, our real-time system for recording streams from satellites and preparing them for delivery over HTTP adaptive streaming protocols simply discarded DVB bitmap subtitle streams, because of the limited ability to carry them over those protocols.

Our goal was to keep using HTTP adaptive streaming protocols and to internally transform DVB bitmap subtitles into a textual form that is supported on every client device and can be carried over these protocols.

To achieve efficient, accurate, real-time text extraction from DVB bitmaps (images), we decided to use the open-source OCR (optical character recognition) engine Tesseract, the de facto industry standard for reliable text recognition. Hewlett-Packard originally developed the engine (starting in 1985) and later open-sourced it; since 2006, further development has been sponsored by Google. Recent versions add an LSTM neural-network recognizer alongside the legacy engine, with ready-to-use pre-trained models for a large number of languages. More details on the Tesseract architecture can be found here.

Implementation findings

During implementation, we came across a couple of interesting findings.

The C++ Tesseract API we used is quite simple and easy to use.

To initialize the engine, the path to the trained data files, the language and the engine mode should be specified:

#include <memory>
#include <tesseract/baseapi.h>

// trainedDataPath points to the directory containing the .traineddata files.
auto tessAPI = std::make_unique<tesseract::TessBaseAPI>();
auto res = tessAPI->Init(trainedDataPath.c_str(), "fra", tesseract::OEM_TESSERACT_ONLY);

There are two main conceptual engine modes: the legacy Tesseract recognizer (tesseract::OEM_TESSERACT_ONLY, used in the snippet above) and the newer LSTM-based neural-network engine (tesseract::OEM_LSTM_ONLY).

We experimented with both modes and found no significant difference in recognition rate for DVB subtitle images.
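For completeness, switching to the LSTM engine only changes the mode argument; a minimal sketch, assuming the tessdata directory also contains the LSTM models for the requested language:

auto lstmRes = tessAPI->Init(trainedDataPath.c_str(), "fra", tesseract::OEM_LSTM_ONLY);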

Once the object has been initialized, it is ready to be used for text extraction. Assuming that the DVB bitmap subtitle image data has been extracted from the stream (e.g. by using the FFmpeg API avcodec_decode_subtitle2) and that the image, in RGB24 format with a given height and width, is in a buffer:

std::vector<uint8_t> imageBuffer;
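As an illustration only (not our production code), the sketch below shows one way such a buffer could be filled from a decoded AVSubtitleRect, assuming the rectangle carries a paletted bitmap in data[0] and 32-bit 0xAARRGGBB palette entries in data[1], as FFmpeg uses for paletted subtitle pictures:

extern "C" {
#include <libavcodec/avcodec.h>
}
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand a paletted DVB subtitle rectangle into a packed RGB24 buffer
// (alpha is ignored here for simplicity).
std::vector<uint8_t> rectToRGB24(const AVSubtitleRect *rect)
{
    std::vector<uint8_t> rgb(static_cast<std::size_t>(rect->w) * rect->h * 3);
    const uint8_t *indices = rect->data[0]; // one palette index per pixel
    const auto *palette = reinterpret_cast<const uint32_t *>(rect->data[1]); // 0xAARRGGBB entries
    for (int y = 0; y < rect->h; ++y) {
        for (int x = 0; x < rect->w; ++x) {
            const uint32_t argb = palette[indices[y * rect->linesize[0] + x]];
            const std::size_t o = (static_cast<std::size_t>(y) * rect->w + x) * 3;
            rgb[o] = (argb >> 16) & 0xFF;    // R
            rgb[o + 1] = (argb >> 8) & 0xFF; // G
            rgb[o + 2] = argb & 0xFF;        // B
        }
    }
    return rgb;
}

// e.g. imageBuffer = rectToRGB24(subtitle.rects[i]);

The resulting buffer, together with the rectangle's width and height, is what gets handed to the OCR engine.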

The following API could be used to extract the text:

char *res = tessAPI->TesseractRect(
imageBuffer.data(), // pointer to the beginning of the image
3, // bytes per pixel
width * 3, // bytes per line
0, // left
0, // top
width,
height);

The result is a string containing the extracted subtitle text.
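One detail worth noting: the returned buffer is allocated by Tesseract and must be released by the caller with delete []. A small (hypothetical) wrapper keeps that tidy:

#include <cstdint>
#include <string>
#include <vector>
#include <tesseract/baseapi.h>

// Run OCR on an RGB24 buffer and return an owning std::string,
// always releasing the Tesseract-allocated result.
std::string recognizeRGB24(tesseract::TessBaseAPI &api,
                           const std::vector<uint8_t> &image,
                           int width, int height)
{
    char *raw = api.TesseractRect(image.data(), 3, width * 3, 0, 0, width, height);
    std::string text = raw ? raw : "";
    delete [] raw;
    return text;
}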

The OCR engine produces results in real time, with sub-200 ms latency at the 90th percentile, which enables us to use the transformation in our pipeline for delivering live content.

Improving recognition accuracy

At Zattoo, we were in a position to observe the text recognition process continuously across a large number of channels, constantly transforming subtitle images into text, analyzing the results and applying techniques to reach the best possible accuracy (> 95%). Some of these techniques are described below.

Truncating background

When the subtitle text represents only a relatively small portion of the image, we observed frequent failures in text extraction. An example of an image that would frequently result in text recognition failure:

To solve this problem, we experimented with trimming unnecessary background noise before submitting the image buffer to the OCR engine:

As a result we have achieved reliable recognition in 100% of these cases.
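A simplified sketch of one way this trimming could be done, assuming the background color can be sampled from the top-left pixel of the RGB24 buffer and cropping to the bounding box of everything that differs from it, plus a small margin:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Crop an RGB24 image to the bounding box of all pixels that differ from
// the assumed background color (sampled at the top-left corner), keeping
// a small margin so glyphs are not cut flush.
std::vector<uint8_t> trimBackground(const std::vector<uint8_t> &rgb,
                                    int width, int height,
                                    int &outWidth, int &outHeight,
                                    int margin = 8)
{
    const uint8_t bgR = rgb[0], bgG = rgb[1], bgB = rgb[2];
    int minX = width, minY = height, maxX = -1, maxY = -1;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t o = (static_cast<std::size_t>(y) * width + x) * 3;
            if (rgb[o] != bgR || rgb[o + 1] != bgG || rgb[o + 2] != bgB) {
                minX = std::min(minX, x);
                maxX = std::max(maxX, x);
                minY = std::min(minY, y);
                maxY = std::max(maxY, y);
            }
        }
    }
    if (maxX < 0) { // nothing but background, return the image unchanged
        outWidth = width;
        outHeight = height;
        return rgb;
    }
    minX = std::max(0, minX - margin);
    minY = std::max(0, minY - margin);
    maxX = std::min(width - 1, maxX + margin);
    maxY = std::min(height - 1, maxY + margin);
    outWidth = maxX - minX + 1;
    outHeight = maxY - minY + 1;
    std::vector<uint8_t> cropped(static_cast<std::size_t>(outWidth) * outHeight * 3);
    for (int y = 0; y < outHeight; ++y) {
        const std::size_t src = ((static_cast<std::size_t>(y) + minY) * width + minX) * 3;
        std::copy_n(&rgb[src], static_cast<std::size_t>(outWidth) * 3,
                    &cropped[static_cast<std::size_t>(y) * outWidth * 3]);
    }
    return cropped;
}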

Inverting colors

Although the recognition rate was already quite good, we wanted to improve it further, and an idea that proved successful was born while looking at how a typical subtitle image appears: a black background with white or colored text. In these cases, the recognition rate in LSTM mode was roughly 20% worse than in the legacy Tesseract mode.

We knew that the OCR engine's neural network has mostly been trained on text where the background is white and the text is black or colored, so we decided to invert the colors of the DVB subtitle image before asking the OCR engine to extract the text:

This simple and cheap technique brought an even higher recognition success rate, especially in LSTM mode, which ended up matching the legacy mode's recognition rate once the transformation was applied.
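The inversion itself is trivial on an RGB24 buffer; a minimal sketch:

#include <cstdint>
#include <vector>

// Invert every color channel so that white-on-black subtitle text becomes
// black-on-white, closer to what the OCR models were trained on.
void invertColors(std::vector<uint8_t> &rgb)
{
    for (auto &channel : rgb)
        channel = static_cast<uint8_t>(255 - channel);
}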

Reducing the number of color components from 3 to 2

We started observing a relatively low success rate in cases where the image has 3 dominant color components, e.g.:

As you can see, in this example we have 3 different colors for the background, the text border and the text itself. During experimentation we found a relatively simple technique that increases accuracy significantly. In all these cases the text border color is black, so the idea was to change the background color to black as well, effectively reducing the number of dominant color components to 2, e.g.:
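In code, that recoloring step could look roughly like this, again assuming the background color can be sampled from the top-left pixel of the RGB24 buffer:

#include <cstddef>
#include <cstdint>
#include <vector>

// Repaint every pixel matching the sampled background color to black, so
// that only two dominant colors remain: black (background and text border)
// and the text color itself.
void backgroundToBlack(std::vector<uint8_t> &rgb)
{
    if (rgb.size() < 3)
        return;
    const uint8_t bgR = rgb[0], bgG = rgb[1], bgB = rgb[2];
    for (std::size_t o = 0; o + 2 < rgb.size(); o += 3) {
        if (rgb[o] == bgR && rgb[o + 1] == bgG && rgb[o + 2] == bgB) {
            rgb[o] = rgb[o + 1] = rgb[o + 2] = 0;
        }
    }
}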

Finally, after also applying the color-inversion technique, the image submitted to the Tesseract engine would appear as follows:

The final result is complete subtitle coverage on all channels streamed over Zattoo.

Examples

A sample with input DVB subtitle images and recognized text on some of our channels:

Mise en orbite !

Bonjour, docteur.

Ca fait un bail.

Vous détenez quelque chose

qui m’appartient.

On veut rentrer chez nous.

- Venez vite !

- Rattrapez-le !

If you want to try it out, join Zattoo, tune into a channel, turn on subtitles and enjoy.

