
Realtime Translated Subtitles

Written by Saidusmon Oripov and Tarek Madany Mamlouk

This November, Axel Springer held its first fully virtual tech conference. We are an English-speaking company, but with our headquarters in Germany, most of the conference's participants were German. The smaller sessions were organized in MS Teams, where you can enable real-time subtitles. This can be really helpful if you are struggling with the spoken language, but it would be even better if the live-generated subtitles were translated directly into your preferred language. We don't have that? Let's build it!

Choosing the right Technology

There are currently three popular types of machine translation systems on the market: neural, statistical, and rules-based. Over the past few years, big technology companies like Google, Amazon, Microsoft, Facebook, and IBM have been transitioning from old-fashioned phrase-based statistical machine translation to neural machine translation. The main reason is that the new technology shows better translation accuracy. According to a study by Tilde, a neural machine translation system handles word ordering, morphology, syntax, and agreement up to five times better than a statistical machine translation system.

Visualization of data from tilde.com

Hard-to-translate content like acronyms, jargon, slang, industry terminology, and cultural references is critical for accurate translations and remains a big challenge for machine translation. However, rapid advances in machine intelligence have improved speech and image recognition capabilities and continue to drive up quality. These systems are increasingly being employed in diverse business areas, introducing new applications and enhanced machine-learning models. Large organizations are turning to machine learning to augment their workloads and make their content accessible faster than would be possible without automation.

For now, we decided to implement our solution on the basis of Google's new Media Translation API. This service is currently in its beta phase and therefore offers limited support. Since this is a prototype, working with the beta release was totally fine.

Google's approach for Media Translation uses bidirectional streaming RPCs to move data between client and server in both directions. Both streams act independently, so the server can decide to answer requests immediately or wait for enough information before sending a consolidated response.

In gRPC, the client can set a timeout for the completion of its calls. How timeouts are defined is language-specific and might mean setting a duration for the call or a fixed point in time as a deadline. Calls can be terminated independently on each side, without any dependency on the outcome of the other stream.
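
As a generic illustration (not code from our prototype), this is roughly how a deadline can be attached to a call with @grpc/grpc-js; the translator.proto file and the Translator service are hypothetical placeholders:

```js
// Sketch: attaching a deadline to a gRPC call with @grpc/grpc-js.
// "translator.proto" and the Translator service are hypothetical.
const grpc = require('@grpc/grpc-js');
const protoLoader = require('@grpc/proto-loader');

const packageDefinition = protoLoader.loadSync('translator.proto');
const proto = grpc.loadPackageDefinition(packageDefinition);

const client = new proto.example.Translator(
  'localhost:50051',
  grpc.credentials.createInsecure()
);

// The deadline is an absolute point in time; if the call has not
// completed by then, it fails with DEADLINE_EXCEEDED.
const deadline = new Date(Date.now() + 5000);

client.translate(
  {text: 'Hello'},
  new grpc.Metadata(),
  {deadline},
  (err, response) => {
    if (err) {
      console.error('Call failed:', err.code, err.details);
      return;
    }
    console.log('Translation:', response.text);
  }
);
```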

In our case, we implemented a slim client in Node.js based on Google's Media Translation SDK. The hardest part was actually getting access to the device's microphone via SoX. While this worked on one device without any problems, others ran into issues.
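
A condensed sketch of such a client, closely modeled on Google's published Node.js sample for the Media Translation API (the exact options and language codes here are illustrative): it captures 16 kHz audio from the microphone through SoX and streams it to the API, printing the German translation as partial results arrive.

```js
// Sketch of a streaming translation client, based on Google's Node.js
// sample for the Media Translation API. Requires SoX ("rec") on the PATH
// and Google Cloud credentials in the environment.
const recorder = require('node-record-lpcm16');
const {
  SpeechTranslationServiceClient,
} = require('@google-cloud/media-translation');

const client = new SpeechTranslationServiceClient();

const config = {
  audioConfig: {
    audioEncoding: 'linear16',
    sourceLanguageCode: 'en-US',
    targetLanguageCode: 'de-DE',
  },
  singleUtterance: false,
};

// The first message on the bidirectional stream carries only the config;
// every following message carries a chunk of raw audio.
let isFirst = true;

const stream = client
  .streamingTranslateSpeech()
  .on('error', err => console.error('Translation stream error:', err))
  .on('data', response => {
    const {result} = response;
    if (result && result.textTranslationResult) {
      const {translation, isFinal} = result.textTranslationResult;
      // Partial translations arrive continuously; isFinal marks a
      // settled sentence.
      process.stdout.write(`${isFinal ? '\n' : '\r'}${translation}`);
    }
  });

// Capture 16 kHz mono audio from the default microphone via SoX.
recorder
  .record({sampleRateHertz: 16000, threshold: 0, recordProgram: 'rec'})
  .stream()
  .on('data', chunk => {
    if (isFirst) {
      stream.write({streamingConfig: config, audioContent: null});
      isFirst = false;
    }
    stream.write({
      streamingConfig: config,
      audioContent: chunk.toString('base64'),
    });
  });
```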

Components of our real-time translation overlay

Building a Prototype for Production

Display German subtitles in real-time while speaking English

Integration into our talks was easy because we managed our stream via OBS. For a clean overlay in OBS that always shows the latest subtitles while the speaker is talking, we wrote a small React application that uses EventSource. This way, our client subscribes to updates from the Node.js server and refreshes the display immediately. The chroma-key filter in OBS lets us generate a transparent overlay on top of our video, so the viewer sees the subtitles while the speaker is on stage.
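
As a rough sketch of how these pieces might fit together (endpoint names, the port, and the background color are illustrative, not taken from our actual code): the Node.js server exposes a server-sent-events endpoint that broadcasts every new subtitle, and the React overlay subscribes to it with EventSource; OBS then keys out the overlay's solid background color.

```js
// server.js — sketch of an SSE endpoint that pushes each new subtitle
// to all connected overlay clients (route and port are illustrative).
const express = require('express');

const app = express();
const clients = new Set();

app.get('/subtitles', (req, res) => {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  clients.add(res);
  req.on('close', () => clients.delete(res));
});

// Call this whenever the translation stream produces new text.
function broadcastSubtitle(text) {
  for (const res of clients) {
    res.write(`data: ${JSON.stringify({text})}\n\n`);
  }
}

app.listen(4000);
```

```jsx
// Overlay.jsx — the overlay subscribes to the SSE endpoint and re-renders
// on every update; OBS removes the solid green background via chroma key.
import React, {useEffect, useState} from 'react';

export default function Overlay() {
  const [subtitle, setSubtitle] = useState('');

  useEffect(() => {
    const source = new EventSource('http://localhost:4000/subtitles');
    source.onmessage = event => setSubtitle(JSON.parse(event.data).text);
    return () => source.close();
  }, []);

  return (
    <div style={{background: '#00ff00', color: '#ffffff', padding: '1rem'}}>
      {subtitle}
    </div>
  );
}
```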

Do we need this?

As digitalization becomes more widespread across industries, demand for automated machine translation will increase. We can also expect that the ongoing pandemic will have a positive impact on the market. In 2019, the machine translation market was valued at USD 550 million, and it's expected to reach USD 1.5 billion by 2026 (see marketwatch.com and prnewswire.com, which both agree on the estimated market valuation).

Visualization of data from prnewswire.com

It's easy to presume that the world is becoming more and more fragmented, e.g., trade wars, tariffs, populism. But in spite of the backlash, the corporate world has never been so connected. The largest organizations worldwide are embracing a stronger push for globalization and expanding their services internationally because they see increasing value in delivering their products and content globally. That said, becoming a genuinely global business brings its own challenges. Companies that fail at digital transformation can't keep pace with globalization and lose competitiveness.

So the question is not if we need this but rather if we can afford NOT to use this.

Where do we go from here?

I personally see this implementation as successful because it proves that software can provide some form of real-time translation to help people understand the spoken word. What I question is the way the subtitles are displayed. There was too much text pouring across the screen for people to follow. A next iteration could condense the real-time translations into keywords or abbreviated sentences to give viewers context and partial translations, assuming they have a basic understanding of the given talk. This experiment includes a design element (displaying the text in chunks that are easy to read) and a language-processing element (extracting a semantic excerpt from a sentence). If we can manage that in real-time, then we have a unique and stylish solution for a very old problem.
