Multi-Language Videogame Localization — a Quest for Quality

Demid Tishin, CEO, All Correct, dtishin@allcorrectgames.com.

I would like to thank my colleague Natalie Novikova for invaluable details and examples, without which this article would not have been possible.

It was a chance we could not miss.

By 2011 All Correct was essentially an SLV, translating a wide range of content from or into Russian, with a focus on huge Chinese and Korean online role-playing games for Russian players. Suddenly it all changed: mobile and casual games boomed, while the MMORPG market became saturated and stabilized. Hundreds of new mobile developers and publishers appeared, often with little expertise in internationalization and no staff to manage localization in-house, let alone to manage each target language separately. A typical new client needed only one or two providers to handle a whole line of game titles, as well as regular updates and marketing copy, simultaneously into 10+ languages, for players in the Americas, Europe, and Asia. For All Correct it was an opportunity, but it was also a challenge, as we had only a vague idea of how to manage quality in foreign language combinations.

For Client X we translated 25 mobile titles in 2011, some of them into 14 languages. This required a total of 89 translators, all of whom we found in online databases. Initially, the vendor managers had no testing procedure and applied a few basic filters: the translator had to translate only into their mother tongue, reside in the country of the target language, and have a positive public track record (e.g. a ProZ.com Willingness to Work Again rating). The workflow was simplistic, too: the project manager selected a translator based on their videogame portfolio, the translator delivered an Excel file, the PM did a quick formal review, and delivered the software strings to the customer. In-app localization testing was done on the client side. Not only our quality control but also the client's internationalization process was immature: most loc kits provided no context for software strings, even though many titles were hidden object games, where the difference between a (long)bow and a bow(tie) was critical.

In other words, we had it coming.

In October 2011 the publisher received negative feedback from the gaming community on some of our German and French localizations. Complaints on Chinese and Brazilian Portuguese followed in 2012. There was a whole range of errors — language, style, consistency and accuracy. A few times players referred to localization as raw machine translation output. We admitted the fault and started on a solution.

First of all, we put a question mark over every translator, dramatically expanded the translation team, and re-translated all problematic content from scratch.
  • To filter out unprofessional vendors we launched a massive cross-check campaign. We devised a competence assessment form (Fig. 1) and organized peer checks by two or three translators for every translated chunk of content. The form allowed us to assess six translation competences separately (subject matter expertise, understanding of the source language, proficiency in the target language, style / literary competence, regional standards, and compliance with instructions / procedures) on a 1-to-5 scale and to provide error examples and overall recommendations. Each form was analyzed by the project manager or vendor manager for validity and then imported into our vendor database.
  • As a result, only 24 people from the 2011 team (27%) remained in 2013, while the other 73% of the original team were dismissed as unprofessional. One of the vendors who failed the check was a certain Stefan Jacob (a respected German translator, according to his CV), who was eventually exposed as an impostor whose real name was Heba Qudaih, living a good thousand miles from Germany.
  • 10 German, 10 French, 17 Chinese and 25 Brazilian Portuguese localizations were re-delivered and fixed with urgent localization patches. This stabilized the situation with quality claims and stopped further damage to the publisher’s image. Needless to say, no “machine translation” complaints have been received ever since.
Fig. 1 Sample competence assessment form
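For illustration, the assessment form described above can be modeled as a simple record with the six competences and an averaged overall score. This is only a sketch: the field names, the averaging rule, and the ID formats are our assumptions, not the actual database schema (the real forms were processed manually).

```python
from dataclasses import dataclass, field

# The six competences assessed on the form, each on a 1-to-5 scale.
COMPETENCES = (
    "subject_matter",      # subject matter expertise
    "source_language",     # understanding of the source language
    "target_language",     # proficiency in the target language
    "style",               # style / literary competence
    "regional_standards",  # regional standards
    "compliance",          # compliance with instructions / procedures
)

@dataclass
class CompetenceAssessment:
    translator: str
    reviewer: str
    marks: dict                     # competence name -> mark (1..5)
    error_examples: list = field(default_factory=list)
    recommendation: str = ""

    def overall(self) -> float:
        """Average of the six marks (illustrative aggregation rule)."""
        return sum(self.marks[c] for c in COMPETENCES) / len(COMPETENCES)

# Hypothetical filled-in form: top marks everywhere except style.
form = CompetenceAssessment(
    translator="T-042",
    reviewer="R-007",
    marks={c: 5 for c in COMPETENCES} | {"style": 3},
    error_examples=["Awkward register in dialogue strings"],
)
print(round(form.overall(), 2))  # → 4.67
```

A record like this makes it straightforward to track a vendor's rating over time as new peer-review forms are filed.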
Secondly, we introduced a testing procedure for new vendors.
  • Every translator candidate is given a test job that is checked by a regular translation team member; a competence assessment form is completed and checked for validity by the PM. Red flags for an invalid assessment include competence marks lower than 5 with no error examples given, or a low target language rating when in fact only style issues have been detected.
  • Next, the vendor is given a pilot (real) job, which is also checked, and this time not only a competence assessment form, but also a quality check form is filled out (Fig. 2). Both forms are also checked by the PM for validity.
  • If no major errors are detected in the translation job, the candidate becomes a regular team member, but competence assessment does not end here: every 5th job of 1000 words or more is peer-reviewed, a competence assessment form is filed in the database, and the vendor's current rating is updated. This way, 170 new translators and quality reviewers were added to the team by 2014.
  • The vendor managers keep an eye on the grades and inform the PMs in case of any significant grade changes for familiar translators.
  • To reward talent and keep more options open, the PMs give every 100th translation job to a fresh translator who has so far passed only the entrance test.
  • To achieve a satisfactory level of reliability for competence assessment (i.e. to make assessment results more or less reproducible), we prepared concise guidelines and organized and recorded a webinar for all quality reviewers.
  • The Head of Production checks on a monthly basis whether any PM has assigned a job to translators with low grades, and follows up accordingly.
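The validity red flags described above lend themselves to a simple automated pre-screen before the PM's review. The sketch below is hypothetical (in practice these checks were done by the PM by hand), and the function name, thresholds, and issue-type labels are our inventions:

```python
def validity_red_flags(marks: dict, error_examples: list,
                       detected_issue_types: set) -> list:
    """Return the reasons an assessment form looks invalid (illustrative rules)."""
    flags = []
    # Red flag 1: some competence mark is below 5, yet no error examples given.
    if any(m < 5 for m in marks.values()) and not error_examples:
        flags.append("marks below 5 but no error examples")
    # Red flag 2: target language rated low while only style issues were found.
    if marks.get("target_language", 5) <= 2 and detected_issue_types <= {"style"}:
        flags.append("low target-language mark but only style issues detected")
    return flags

# A form that trips both red flags and should go back to the reviewer:
print(validity_red_flags({"target_language": 2, "style": 4}, [], {"style"}))
```

An empty returned list would mean the form can be imported into the vendor database; anything else warrants a secondary check.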
Thirdly, we overhauled the localization workflow.
  • Before assigning a job to an unfamiliar translator, the PM checks that their target language and subject matter grades are 4 or higher.
  • All gaming translations are now done in a server-based translation environment, which greatly increases terminological and stylistic consistency.
  • Besides translation proper, every translator submits a glossary update, which is peer-checked according to a special checklist before being added to the main project glossary. This also lowers the chance of terminology defects.
  • Every translation job of 1500 words or more is submitted to a peer translator for quality check, and the quality check form is reviewed by the PM. If the PM suspects an invalid assessment, or if the quality reviewer has a short track record, a secondary check is performed. Ideally, quality review is performed before the loc kit is submitted to the publisher for integration with the build. If the deadline is tight, the quality check can be performed after translation delivery (but before the game build is compiled), or in rare cases even after the first round of in-app localization testing, so that any corrections are verified during regression testing iterations. This minimizes the risk of language, stylistic and accuracy defects. There is no standard sample size for quality review; instead, a time slot of 1 to 4 hours is allocated for the check.
  • Error categories used in the localization quality check form strictly correspond to the QT LaunchPad MQM (Multidimensional Quality Metrics) Version 2, a European Commission-funded initiative (for details see http://www.qt21.eu/launchpad/content/delivered).
Fig. 2 Sample quality check form
  • Small translations are checked once their total wordcount for a single translator adds up to 1500 words.
  • After the client compiles the game build, our team of localization testers plays the game and submits all localization-related bugs to a bug tracking system (JIRA, Redmine, etc.). At this stage we detect truncated strings, untranslated source text (added at later development stages and not included in the loc kit), compilation errors (e.g. chunks of alien language, as in Fig. 3), and word agreement problems that appear once translated strings are put into context. Most of the bugs are fixed by the translation team, the corrected strings are re-submitted, and regression testing iterations follow until no defects remain.
Fig. 3 A chunk of alien language is detected during loctesting
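The review-trigger rules above (every job of 1500+ words is peer-checked, and smaller jobs accumulate per translator until they reach 1500 words) can be sketched as follows. The threshold and the batching behaviour come from the text; the class name and interface are our assumptions, since the real process was tracked by PMs rather than software.

```python
REVIEW_THRESHOLD = 1500  # words, per the workflow rules described above

class ReviewScheduler:
    """Decides when a translator's delivered work is due for peer review."""

    def __init__(self):
        self.pending_words = {}  # translator -> unreviewed wordcount so far

    def submit(self, translator: str, wordcount: int) -> bool:
        """Register a delivered job; return True if a peer review is now due."""
        if wordcount >= REVIEW_THRESHOLD:
            return True  # large jobs are always reviewed on their own
        total = self.pending_words.get(translator, 0) + wordcount
        if total >= REVIEW_THRESHOLD:
            self.pending_words[translator] = 0  # accumulated batch goes to review
            return True
        self.pending_words[translator] = total
        return False

sched = ReviewScheduler()
print(sched.submit("T-042", 400))   # → False
print(sched.submit("T-042", 600))   # → False
print(sched.submit("T-042", 700))   # → True (400 + 600 + 700 >= 1500)
print(sched.submit("T-042", 2000))  # → True (large job, reviewed on its own)
```

The per-translator counter is what lets many small updates receive the same scrutiny as a single large job.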
Finally, we worked with our customers to improve their in-house software internationalization process.
  • In 2013, a loc kit without images became a rarity, compared to the 2011 practice.
  • Software builds are often provided so that our team can get immersed in the game before working on the translation.
  • Loc kits are often provided in a logically structured order.
  • Shared Q&A sheets are actively used by the development, translation and testing teams.

Some outcomes

As a result, the relative number of valid quality claims and translation withdrawals per videogame localization revenue dropped by 51% (more than half) in 2013 compared to 2012. The relative share of translation accuracy errors dropped from 61% in 2012 to 29% in 2013, and that of typos from 11% to 4%. Also, root cause analysis of localization defects showed a decrease in errors caused by poor workflow planning or vendor management from 20% in 2012 to 5% in 2013.

As for Client X, in 2013, we expanded our partnership, having translated content of 54 video game titles into as many as 28 target languages, compared to 25 titles and 14 languages in 2011.

Even though our multi-language video game localization has seen a dramatic improvement in quality over the last three years, there are still challenges to meet. We have organized a few webinars for our translation teams on memoQ tips, terminology work, and translator competence assessment, but there are many other skills to develop, and some 36 training activities are planned for 2014.

Originally published in MultiLingual, June 2014.