Seeing is believing: introducing enhanced video intelligibility

Fredrik Lundkvist
Published in The SVT Tech Blog
Apr 1, 2022

EDIT: This was an April Fools’ joke that was perhaps executed a little too well.

At SVT, we care about accessibility and about making our content available to everyone in Sweden. To this end, we’ve previously launched technical enhancements such as “tydligare tal” (dialogue enhancement), and we’re actively working on more features to improve the viewing experience for our viewers.

Today, the members of the Video Core team are pleased to announce the next step on our journey towards a streaming service for everyone: we call it “tydligare video”.

In times such as the ones we’re all living in right now, it’s common to feel mentally exhausted, and in those situations it can be hard to focus on the small details in a show that are crucial to the plot: a loaded gun mounted above the fireplace that might be the murder weapon in your favourite detective drama, or a small facial movement revealing that a crew member on the space ship is an impostor in your favourite sci-fi epic. To help our viewers notice such details, we’ve come up with “tydligare video”. It’s one of those terms that is a bit hard to translate into English without losing the meaning, but we think “enhanced video intelligibility” does the trick; that’s quite a mouthful though (and not trademarked by us yet), so we’ll stick with “tydligare video” in this blog post.

To ensure that our viewers won’t miss those all-important details, we’ve used the awesome power of deep learning. Over the last three months we’ve trained our model on decades of video from one of the largest media archives in Europe, laboriously annotated by hordes of interns, and built a system that always knows which part of the image is most relevant to the plot and magnifies it, so the viewer doesn’t miss anything of importance.

An illustration of how “tydligare video” works
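
In practice, the enhancement step boils down to cropping around the region the model flags as relevant and scaling it back up to full resolution. The sketch below is purely illustrative (not our production pipeline): it assumes a hypothetical relevance model that hands us a bounding box per frame, and uses OpenCV for the cropping and upscaling.

```python
import cv2
import numpy as np

def magnify_relevant_region(frame, box, zoom=2.0):
    """Zoom in on the plot-relevant region found by a (hypothetical) model.

    frame: BGR image as a numpy array of shape (H, W, 3)
    box:   (x, y, width, height) of the relevant region, in pixels
    zoom:  how much the detail should be magnified
    """
    h, w = frame.shape[:2]
    x, y, bw, bh = box
    cx, cy = x + bw / 2.0, y + bh / 2.0  # centre of the relevant region

    # Crop a window 1/zoom the size of the frame, centred on the region
    # and clamped so it stays inside the image.
    crop_w, crop_h = int(w / zoom), int(h / zoom)
    x0 = int(min(max(cx - crop_w / 2, 0), w - crop_w))
    y0 = int(min(max(cy - crop_h / 2, 0), h - crop_h))
    crop = frame[y0:y0 + crop_h, x0:x0 + crop_w]

    # Scale the crop back to full resolution, magnifying the detail.
    return cv2.resize(crop, (w, h), interpolation=cv2.INTER_LINEAR)

# Usage with a hypothetical relevance model:
# box = relevance_model.most_relevant_box(frame)  # (x, y, width, height)
# enhanced_frame = magnify_relevant_region(frame, box)
```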

The key to creating such a model is obviously attention, and the model we’re using for tydligare video is based on groundbreaking attention-based models such as YOLO and GPT.

For those who are interested in the details, we provide an architectural diagram:

Illustration of the “tydligare video” neural network
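
For those who would rather read code than diagrams: the central ingredient the diagram hints at is ordinary scaled dot-product attention, which fits in a few lines of NumPy. This is a textbook sketch rather than our model code:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d)        # query/key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
    return weights @ V                                  # weighted sum of values

# Toy example: 4 "image patches" with 8-dimensional features attending to each other.
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(q, k, v).shape)  # (4, 8)
```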

As always when working with accessibility features, we hope that tydligare video will be beneficial not only to those who are hard of noticing details, but to all viewers. We are currently trialing the feature before rolling it out for all titles; if you’re interested in trying it out for yourself (after all, seeing is believing), take a look here! And if you run into any problems swapping between the video tracks, you can always try the reference player here.
