Of Whom, by Whom, and for Whom?

Published on TAUS Review Issue 1

The importance of the Asian market for translation and localization is relatively well known, as one may easily find a vast amount of market research statistics on population, supply and demand, purchasing power parity, and the number of languages, among other things. Therefore, this review offers a different, qualitative perspective on the translation market by examining the “of whom, by whom, and for whom” of language business and technology in Asia.

Language business and technology in Asia depend on bilingual people. The crowd-translation pioneers Gengo and Conyac, both of which, intriguingly, started in Tokyo, Japan, invite multilingual speakers to translate content such as subtitles in order to introduce local culture to the world, and vice versa. The major difference between Gengo and Conyac is their approach to quality assurance: the former evaluates a translator’s ability by exams, while the latter uses peer review to build up the community. Recently, DuoLingo launched English learning courses in Japanese and Chinese. It will be interesting to observe how far this form of online education can go towards its goals of cultivating translators and selecting better translations by voting. Flitto from South Korea is also worth noting for its social network mechanism, which combines consumer incentives with rewards from companies’ public relations activities in return for translations. For instance, a Korean native speaker who is fluent in English can localize an American mobile phone app’s menu into Korean and get gift cards in return.

When it comes to crowd-translation, Gengo and Conyac both encourage customers to order their services via API. MemSource, a cloud-based translation project management and CAT platform, even partners with Gengo and uses the crowd-translation API as a pre-translation service. This partnership has changed the conception that pre-translation always means translation done by machine. To push the boundaries of the API even further, a viable direction could be software as a service (SaaS). Instead of selling translation services or computer-assisted translation (CAT) software, SaaS in the language business has begun exploring the potential of selling value-added products on demand, built on various technologies. For example, PIJIN just launched QR Translator to enable access to localized information by QR code, while NTT Docomo just integrated speech recognition and optical character recognition APIs to create an augmented reality of translation similar to, but not limited to, the visual-only approaches of WordLens and Waygo. SoftBank Technology, on the other hand, is promoting FonTrans, which adds the perk of open web fonts to translation APIs for website localization.

As for localization, through the New York-based Smartling and startups such as Jordan’s Dakwak, OneSky from Hong Kong, and the Japanese companies WOVN and Yaraku, Asian businesses can expand over either web or mobile channels. While Dakwak emphasizes search engine optimization (SEO) and OneSky provides functions specific to mobile apps, such as translation length limits, WOVN sticks to a JavaScript one-liner solution like those of Tolq and Google. Yaraku is bringing translation management systems from professionals to ordinary businesspeople. This particular angle towards ordinary people could relate to the latest projects by the creators of the Moses machine translation (MT) toolkit, namely Cognitive Analysis and Statistical Methods for Advanced Computer Aided Translation (CASMACAT) and MateCat, although they are still quite research-oriented for the time being. The ultimate goal of MateCat resembles that of the other companies mentioned above: to increase translators’ productivity with the help of computation and, eventually, to benefit communication between speakers of different languages.

In terms of computation, or more specifically computational linguistics, Kyoto University still leads research on example-based machine translation (EBMT), perhaps because of the linguistic distance between Japanese and Western languages, such as SOV vs. SVO word order and agglutinative vs. fusional morphology. EBMT matches and extends the capabilities of translation memory, and the border between it and statistical machine translation (SMT) is becoming less and less distinct. For example, the Baidu-I2R Research Centre in Singapore has just completed the most accurate Vietnamese-to-English linguistics-statistics hybrid MT system in the world. On the linguistic side, meanwhile, Nanyang Technological University is concentrating on growing resources for cross-language semantics and grammars, including Wordnet Bahasa, Open Multilingual Wordnet, and head-driven phrase structure grammar parsers for Japanese and Korean. Without loss of generality, readers are kindly invited to test the English parser at http://erg.delph-in.net/logon to get an idea of these systems.

At the very least, these resources can help generate paraphrases in Asian languages and thereby indirectly increase the coverage of translation memory in the near future. Currently, only Microsoft provides a similar English paraphrase API. Concerning translation memory coverage, Minna no Hon’yaku and Tatoeba Project are nurturing their potential to become for Asian languages what TAUS or MyMemory is for Western languages. Of course, proprietary companies are also accumulating private translation memories, which leads to the next question: how to leverage translation memory without leaking confidential information or violating intellectual property laws. As a baby step towards an answer, EBMT-like technologies can make educated guesses by decomposing and recombining translation memories. To date, Déjà Vu and MemoQ have features called AutoAssemble and Fragment Assembly, respectively. A reasonable next step would be to combine paraphrasing and assembly so that they reinforce each other and improve both the precision and the recall of translation memory.

A self-disclosure here is that the author, who works for Yaraku, is also working on this very topic, concerning numbers and classifiers in particular. Composing a new translation out of a translation memory automatically can be difficult, even if the only difference between the two is a number. For example, besides the plural form issue, one may translate “12” in a French sentence into “a dozen” in English for the sake of fluency, not to mention that a classifier is usually required in Chinese. In software localization, a possible compromise is to prepare templates like “next {number_goes_here} page(s)” and replace the variables in braces with numbers later. Although this kind of template approach is not pretty, it suggests that one may deduce patterns from numeral/classifier phrases, and that those patterns could be expressed as regular or context-free grammars in terms of the Chomsky hierarchy. In other words, a computational treatment is feasible. One may argue that treating numeral phrases will not help much because they occur infrequently, but they can be crucial in many domains. In the patent MT task at NTCIR-10 (the 10th conference of NII Testbeds and Communities for Information access Research) in 2013, BBN Technologies, the first-place winner in Chinese-English patent MT, reported that its numeral phrase treatment was an efficient way to gain a better BLEU (Bilingual Evaluation Understudy) score.
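The template idea above can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's or any vendor's actual method: the one-entry translation memory, the `{n}` placeholder, and the function name `translate_with_number` are all hypothetical, and the plural rule is the simple English one (singular only for 1). A Japanese source segment is used because Japanese does not mark the plural, so the choice has to be made on the English side.

```python
import re

# A run of ASCII digits: a regular (Chomsky type-3) pattern.
NUMBER = re.compile(r"[0-9]+")

# Hypothetical numeral-generalized translation memory:
# source template -> (target template, singular noun, plural noun)
TEMPLATE_TM = {
    "次の{n}ページ": ("next {n} {noun}", "page", "pages"),
}


def translate_with_number(segment):
    """Assemble a translation from a numeral template, or return None."""
    numbers = NUMBER.findall(segment)
    if len(numbers) != 1:
        return None  # this sketch handles exactly one number per segment
    # Generalize the segment by replacing the number with a placeholder.
    template = NUMBER.sub("{n}", segment)
    entry = TEMPLATE_TM.get(template)
    if entry is None:
        return None  # no matching template in the memory
    target, singular, plural = entry
    n = int(numbers[0])
    # Simplified English plural rule, assumed for illustration only.
    noun = singular if n == 1 else plural
    return target.format(n=n, noun=noun)


print(translate_with_number("次の3ページ"))
print(translate_with_number("次の1ページ"))
```

A single stored template thus covers every segment that differs only in the number. Cases such as “a dozen” or Chinese classifiers would need richer target-side rules than this sketch provides.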

Finally, for the dream of pursuing computational semantics, pragmatics, and the related concept of interlingual knowledge representation, not too long ago the only application was the pivot language of MT. However, the current trend of deep learning has brought a certain kind of word sense disambiguation (for example, disambiguating the sense of “bank” between a stream bank and a financial bank) back into the spotlight. Since Google released the word2vec source code, there have already been follow-up experiments in Asian languages. One official example of word2vec is that the result of the vector operation “vector(king) - vector(man) + vector(woman)” is close to “vector(queen)” in English. As for Japanese, it turns out that one may easily group “唐揚 (karaage /KAH-rah-AH-ge/; a specific type of deep frying)” with “唐揚げ (karaage),” “から揚げ (karaage),” and “空揚げ (karaage)” in terms of vector similarity without knowing that they are the same kind of deep-fried cuisine. With other similar words such as “揚げ (/AH-ge/; deep frying)” and “鶏肉 (chicken)” in the same group, it might further imply that the dish is usually limited to chicken. Imagine working on Japanese restaurant menu translation with word2vec: the previously confusing Chinese characters and Hiragana could cross-reference each other and become useful suggestions for foreign tourists trying to get a sense of what is in a meal.
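The vector operation mentioned above can be illustrated with a toy sketch. The three-dimensional vectors below are hand-picked and hypothetical; real word2vec vectors are learned from large corpora and have hundreds of dimensions. The function names `cosine` and `analogy` are likewise assumptions made for illustration, not part of the word2vec tool.

```python
import math

# Hypothetical toy embeddings, hand-picked for illustration only.
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.3, 0.3],
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word nearest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    # Exclude the three input words, as is usual for analogy queries.
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))


print(analogy("king", "man", "woman"))
```

With these toy values the nearest remaining word is “queen”. Grouping spelling variants such as the karaage examples would rely on the same cosine similarity, applied to vectors trained on Japanese text.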

Besides the obvious demand for tourism-related information about restaurants, sightseeing spots, transportation, and accommodation, Asia presents various other opportunities. For joint training of ASEAN armies, translation is crucial, yet major machine translation providers, including Yandex from Russia, do not serve Burmese. The National Electronics and Computer Technology Center of Thailand therefore funded a Network-based ASEAN Language Translation Public Service to fill this gap. As unfortunate as it sounds, other funding sources could come from Australia or the U.S., due to the stereotype that terrorism is planned in Arabic, Malay, Indonesian, and so on. Another stereotype involves serving typical outsourcing countries like India, where, surprisingly, English does not suffice for team building in development or for customer service. International pharmaceutical companies in Singapore want drug names localized, hence there is a Special Interest Group on Transliteration & Transcription (SIG-T&T) under the Asian Federation of Natural Language Processing (AFNLP).

Meanwhile, Ginger, OKpanda, and WritePath are focused on English skills for Asians. Ginger helps people write better English articles by offering a grammar checker and other natural language processing tools under a freemium plan, and interestingly, only the Japanese version of its website is different from the others. This could be a hint to other language businesses that the Japanese market could be a special target. Unlike Ginger’s pure machine approach to writing assistance, WritePath is focused on professional proofreading of essays and papers, and OKpanda combines speech recognition technology with human tutors to teach English conversation.

As a final example, allow me to introduce myself in Japanese: “八楽のマイクと申します (I am Mike from Yaraku; Ya-raku, of, Mike, is stating).” Firstly, in many other languages, the expression could be translated into “my name is…” or “this is … speaking” depending on usage, especially for readers who regard varied expressions of the same meaning as a matter of aesthetics. In a controlled language such as Japanese, however, consistency almost always ranks above fluency, unless the pragmatics concern honorifics, such as using “と申します” rather than “です” as the sentence-final expression. Secondly, “八” may look like something about “eight” at first glance, but it is actually part of a name derived from the Japanese for “8 million spirits,” so it has to be transliterated with an uncommon pronunciation, “ya,” and excluded from numeral treatment by translation memory. Lastly, “マイク,” being my transliterated first name, is neither Japanese nor Mandarin (different from my “real” Chinese name), so it could be a cost-benefit issue for someone to decide whether to back-transliterate it to English or search for my Chinese name.

Despite the presence of the usual big players in language business and technology, Asia is shaped more from the bottom up, at the grassroots level, by relatively small or non-profit organizations, or by anyone who is willing to understand others who live in different places, eat different foods, and speak different tongues, especially with regard to long-misunderstood cultures such as the Muslim world or the mysterious Far East. These “culture shocks” range from right-to-left scripts and writing systems without word separators to honorific connotations and spiritual interpretations. Considering the situations discussed here, strategies in language business and technology may vary drastically, and they represent challenges and opportunities for all of us.
