The Netflix HERMES Test: Quality Subtitling at Scale
Since Netflix launched globally, the scale of our localization efforts has increased dramatically. It’s hard to believe that just five years ago, we only supported English, Spanish and Portuguese. Now we’ve surpassed 20 languages, including Korean, Chinese, Arabic and Polish, and that number continues to grow. Our desire to delight members in “their” language, while staying true to creative intent and being mindful of cultural nuances, is central to ensuring quality. It’s also fueling a need to rapidly add great talent who can help provide top-notch translations for our global members across all of these languages.
The need for localization quality at an increasing scale inspired us to build and launch HERMES, the first online subtitling and translation test and indexing system by a major content creator. Before now, there was no standard test for media translation professionals, even though their work touches millions of people’s lives on a daily basis. There is no common registration through a professional organization that captures the total number of professional media translators worldwide, and there are no license numbers, accreditations, or databases for qualified professionals. For instance, the number of working, professional Dutch subtitlers is estimated to be about 100–150 individuals worldwide. We know this through market research Netflix conducted during our launch in the Netherlands several years ago, but this is a very anecdotal “guesstimate” and the actual number remains unknown to the industry.
Resourcing Quality at Scale
In the absence of a common registration scheme and standardized test, how do you find the best resources to do quality media translation? Netflix does this by relying on third parties to source and manage localization efforts for our content. But even this method often lacks the precision needed to drive constant improvement and innovation in the media translation space. Each of these vendors recruits, qualifies and measures its subcontractors (translators) differently, so it’s nearly impossible for Netflix to maintain a standard across all of them and ensure consistent quality at the reliability and scale we need to support our continued international growth. We can measure a vendor’s success through metrics like rejection rates and on-time rates, but we can’t measure the individual translator. This is like trying to win the World Cup in soccer while only being able to look at your team’s win/loss record: not knowing how many errors your players are making, blindly creating lineups without scoring averages and not having any idea how big your roster is for the next game. It’s difficult and frustrating to try to “win” in this environment, yet this is largely how Netflix has had to operate in the localization space for the last few years, while still trying to drive improvement and quality.
HERMES is emblematic of Hollywood meets Silicon Valley at Netflix. It was developed internally by the Content Localization and Media Engineering teams, in collaboration with renowned academics in the media translation space, as a five-part test for subtitlers. The test is designed to be highly scalable and consists of thousands of randomized combinations of questions so that no two tests should be the same. The rounds consist of multiple-choice questions given at a specifically timed pace, designed to test the candidate’s ability to:
- Understand English
- Translate idiomatic phrases into their target language
- Identify both linguistic and technical errors
- Subtitle proficiently
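As a rough illustration of how randomized question combinations like this can be assembled (this is a minimal sketch, not the actual HERMES implementation), the Python below draws a fresh set of questions for each round from a larger pool. The pool names, question IDs, pool sizes and the `build_test` function are all hypothetical, and the four rounds simply mirror the four abilities listed above.

```python
import random

# Hypothetical question pools, one per skill area. The IDs are placeholders;
# the real HERMES question banks and round structure are not described here.
QUESTION_POOLS = {
    "english_comprehension": [f"EC-{i}" for i in range(1, 41)],
    "idiom_translation": [f"ID-{i}" for i in range(1, 41)],
    "error_identification": [f"ER-{i}" for i in range(1, 41)],
    "subtitling_proficiency": [f"SP-{i}" for i in range(1, 41)],
}


def build_test(candidate_seed, questions_per_round=10):
    """Draw a random, non-repeating set of questions for every round.

    Seeding the generator per candidate (an assumption made for this sketch)
    makes a given candidate's test reproducible for later review.
    """
    rng = random.Random(candidate_seed)
    return {
        round_name: rng.sample(pool, questions_per_round)
        for round_name, pool in QUESTION_POOLS.items()
    }


if __name__ == "__main__":
    test = build_test(candidate_seed=12345)
    for round_name, questions in test.items():
        print(round_name, questions)
```

Even with these small made-up pools, choosing 10 of 40 questions in each of four rounds yields far more combinations than there are candidates, which is what makes two identical tests so unlikely.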
Idioms are expressions that are often specific to a certain language (“you’re on a roll”, “he bought the farm”) and can be a tough challenge to translate into other languages. There are approximately 4,000 idioms in the English language, and being able to translate them in a culturally accurate way is critical to preserving the creative intent of a piece of content. Here’s an example from the HERMES test for translating English idioms into Norwegian:
Upon completion, Netflix will have a good idea of the candidate’s skill level and can use this information to match projects with high-quality language resources. The real long-term value of the HERMES platform is in the issuance of HERMES numbers (H-Numbers). This unique identifier is issued to each applicant upon sign-up for the test and will stick with them for the remainder of their career supplying translation services to Netflix. By looking at the quantity of H-Numbers in a given language, Netflix can start to more precisely estimate the size of the potential resource pool for that language and better project the time needed to localize libraries. Starting this summer, all subtitles delivered to Netflix will be required to have a valid H-Number tied to them. This will allow Netflix to better correlate the metrics associated with a given translation to the individual who did the work.
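To make the idea concrete, here is a small sketch of how delivery metrics could be rolled up per H-Number once every subtitle file carries one. It is only an illustration under stated assumptions: the `Delivery` record, its fields, the H-Number format and both function names are hypothetical, and the rejection and on-time rates are simply the vendor-level metrics mentioned earlier applied to individuals.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Delivery:
    """One delivered subtitle file (hypothetical record shape)."""
    h_number: str   # identifier of the translator who did the work
    language: str   # target language of the subtitle
    rejected: bool  # whether the delivery was rejected
    on_time: bool   # whether the delivery met its due date


def summarize_by_h_number(deliveries):
    """Compute rejection and on-time rates for each individual H-Number."""
    totals = defaultdict(lambda: {"deliveries": 0, "rejected": 0, "on_time": 0})
    for d in deliveries:
        t = totals[d.h_number]
        t["deliveries"] += 1
        t["rejected"] += int(d.rejected)
        t["on_time"] += int(d.on_time)
    return {
        h: {
            "deliveries": t["deliveries"],
            "rejection_rate": t["rejected"] / t["deliveries"],
            "on_time_rate": t["on_time"] / t["deliveries"],
        }
        for h, t in totals.items()
    }


def pool_size_by_language(deliveries):
    """Count distinct H-Numbers seen per language as a rough pool-size estimate."""
    seen = defaultdict(set)
    for d in deliveries:
        seen[d.language].add(d.h_number)
    return {language: len(h_numbers) for language, h_numbers in seen.items()}


if __name__ == "__main__":
    sample = [
        Delivery("H-0001", "nl", rejected=False, on_time=True),
        Delivery("H-0001", "nl", rejected=True, on_time=True),
        Delivery("H-0002", "nl", rejected=False, on_time=False),
        Delivery("H-0003", "no", rejected=False, on_time=True),
    ]
    print(summarize_by_h_number(sample))
    print(pool_size_by_language(sample))
```

Keying everything on the H-Number rather than on the vendor is the point of the sketch: the same individual can then be tracked consistently even if they work through several vendors.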
Over time, we’ll be able to use these metrics in concert with other innovations to “recommend” the best subtitler for specific work based on their past performance for Netflix. Much like we recommend titles to our members, we aim to match our subtitlers to projects in a similar way. Perhaps they consider themselves a horror aficionado, but they excel at subtitling romantic comedies. Theoretically, we can make this match so they’re able to do their best quality work.
Since we unveiled our new HERMES tool two weeks ago, thousands of candidates around the world have already completed the test, covering all represented languages. This is incredible to us because of the impact it will ultimately have on our members as we focus on continually improving the quality of the subtitles on the service. We’re quickly approaching an inflection point where English won’t be the primary viewing experience on Netflix, and HERMES allows us to better vet the individuals doing this very important work so members can enjoy their favorite TV shows and movies in their language.
By Chris Fetner and Denny Sheehan