
We’ve told you before: (Re-)discovering Translationese in Machine Translation Research

Even though translation scholars have been arguing for decades about the specific properties of translations, the MT community has been turning a deaf ear. Until recently.

Alina Karakanta
6 min read · Oct 1, 2019


During the past year, the term translationese made a strong comeback in papers on machine translation (MT), mostly in relation to MT evaluation. Researchers in MT seem to be discovering that disregarding the specific properties of translations, as opposed to original texts, can affect the performance of MT systems and can even lead to inflated evaluation scores, calling into question the validity of results reported over more than a decade of MT research.

Could this signify a new era in MT deployment and evaluation, one in which translation studies finds its well-deserved place inside MT research?

Number of papers mentioning the term “translationese” published in the ACL Anthology (references excluded) per year of publication.

Translationese and translation studies

The term translationese was coined by Gellerstam back in 1986 to denote the specific ‘fingerprints’ that the process of translation leaves on the product (the translated text). These fingerprints can be introduced either by interference from the source language or by the translation process itself. Interference relates to the work of Toury (1980), who proposed that translators operate according to a set of laws, the translation norms, which develop through routine translation. The law of interference from the source language leaves marks on the translation product which are language-specific. The marks introduced by the translation process itself, on the other hand, could be present in any translated text, which is why they are also referred to as translation universals (Baker, 1993), even though scholars have at times argued against their existence, showing that there is variation within translation itself. The main translationese features can be described as:

  • Interference: influence of the language system of the first language (or other languages), which can manifest in lexical choice (loan words) and in grammatical and syntactic structures.
  • Simplification: the choice of simpler structures and linguistic features, such as more common and shorter words, shorter sentences and the avoidance of subordinate clauses (a rough computational sketch of such proxies follows this list).
  • Explicitation: the tendency to add information in order to explain or make more explicit what is mentioned in the text, such as explicit naming instead of pronouns, multiple naming (the German Chancellor Angela Merkel instead of simply Merkel) and cohesive markers.
  • Normalisation: standardisation, over-grammaticality, the tendency to avoid repetition and contractions, and the use of more formal language.
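Some of these features, simplification in particular, can be operationalised as simple corpus statistics. Below is a minimal, illustrative sketch in Python that computes a few rough proxies (average sentence and word length, type-token ratio); the feature set, tokenisation and function names here are just for illustration and are much cruder than the measures actually used in the translationese literature.

```python
# Illustrative proxies for "simplification": average sentence length,
# average word length and type-token ratio. Naive regex tokenisation;
# real studies would use a proper tokenizer and richer feature sets.
import re
from statistics import mean

def simplification_proxies(text: str) -> dict:
    """Return rough, corpus-agnostic proxies for simplification."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "avg_sentence_length": mean(len(re.findall(r"[A-Za-z']+", s)) for s in sentences),
        "avg_word_length": mean(len(w) for w in words),
        "type_token_ratio": len(set(words)) / len(words),
    }

# Toy comparison of an "original-sounding" and a "simplified" rendering.
original = "The committee deliberated extensively before reaching its verdict."
translated = "The committee talked a lot. Then it made a decision."
print(simplification_proxies(original))
print(simplification_proxies(translated))
```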

There is often the false impression that translationese is related to ‘poor’ translation quality. This impression is mirrored implicitly in several recent papers:

  • “The term translationese has been used to describe the presence of unusual features in translated text.” (Graham et al., 2019)
  • “[…] making the actual (human) translation a less natural example of the target language.” (Freitag et al., 2019)
  • “[…] rephrase the NMT output in a more natural way, aiming to remove undesirable translation artifacts that have been introduced.” (Freitag et al., 2019)
  • “[…] generating a rough gloss by means of a dictionary and then ‘translating’ the resulting pseudo-translation, or ‘Translationese’ into a fully fluent translation.” (Pourdamghani et al., 2019)

This is not the case. Translations are a unique text type, and translationese features are an integral part of any translation. Translations are not inferior to originals; they are simply different. An analogy can be drawn with other types of texts. For example, a spoken text is characterised by contracted forms (isn’t instead of is not), speech particles and other elements not found in written texts. This doesn’t mean that the spoken text is of ‘poor’ quality. Likewise, a translation is identified by the properties of translated text, and it serves a communicative purpose just as important as the one the original serves in the source language.

Translationese and Machine Translation

Unlike in translation studies, where translationese refers to particular characteristics of translated texts, in MT the term has ended up referring to translated data, especially when it is used on the source side during model training. When creating corpora for MT, researchers mix all bilingual data regardless of the translation direction: pairs whose source side is original text and whose target side is a translation are combined with pairs whose source side is itself a translation and whose target side is the original. The effect of translation direction in MT corpora was first reported in relation to training data during the era of Statistical Machine Translation (SMT), where SMT systems trained on data in the “real” direction performed slightly better than systems trained on data with mixed directions (Kurokawa et al., 2009; Lembersky, 2013). Later work (Stymne, 2017) found similar effects of translation direction on tuning sets.
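To give an idea of what splitting by direction involves in practice, here is a hypothetical sketch that separates sentence pairs whose source side is original text from pairs whose source side is itself a translation, assuming per-sentence metadata about the original language (as recorded, for instance, in Europarl speaker tags). The file format and field names below are assumptions for illustration, not an existing API.

```python
# Hypothetical sketch: split an English-German parallel corpus by translation
# direction, assuming each sentence pair carries metadata about the language
# the text was originally produced in. Field names and TSV layout are assumed.
import csv

def split_by_direction(tsv_path: str, src_lang: str = "en"):
    """Separate pairs with an original source side from pairs whose
    source side is itself a translation ("translationese")."""
    original_source, translated_source = [], []
    with open(tsv_path, encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            pair = (row["src_sentence"], row["tgt_sentence"])
            if row["original_language"] == src_lang:
                original_source.append(pair)    # "real" direction: original -> translation
            else:
                translated_source.append(pair)  # reverse or third-language direction
    return original_source, translated_source
```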

With the neural hype, the topic of splitting training data by translation direction was completely sidelined. Neural MT systems are data-hungry, so the problem of insufficient training data became more pressing than ever. After the bold claims of attaining human parity (Hassan et al., 2018), though, researchers started questioning current practices for evaluating MT systems, especially with regard to mixed-direction test sets. Several works showed that human assessment scores, and even the rankings of the best-performing systems at the evaluation campaigns organised at the workshops and conferences on machine translation (WMT), could change depending on the translation direction in the test data (Toral et al., 2018; Zhang and Toral, 2019; Graham et al., 2019). Their hypothesis is that translationese is easier for MT systems to translate because of simplification, although this hypothesis was questioned in a recent paper (Mielke et al., 2019) showing that, although different, “translationese” is not easier to model than natively written language.
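As a minimal sketch of what direction-aware evaluation could look like: score the MT output separately on test sentences whose source side was originally written in the source language and on sentences whose source side is itself a translation. This assumes the test set records the original language of each segment (WMT test sets, for instance, carry such annotations) and uses sacreBLEU for scoring; the variable names and data layout are illustrative.

```python
# Direction-aware evaluation sketch. Requires `pip install sacrebleu`.
# `is_original_source` is a list of booleans derived from test-set metadata
# (an assumption here): True if the source sentence is original text.
from sacrebleu.metrics import BLEU

def direction_aware_bleu(hypotheses, references, is_original_source):
    bleu = BLEU()
    subsets = {"original-source": [], "translationese-source": []}
    for hyp, ref, orig in zip(hypotheses, references, is_original_source):
        key = "original-source" if orig else "translationese-source"
        subsets[key].append((hyp, ref))
    scores = {}
    for name, pairs in subsets.items():
        if pairs:
            hyps, refs = zip(*pairs)
            # One reference stream; sacreBLEU expects a list of reference lists.
            scores[name] = bleu.corpus_score(list(hyps), [list(refs)]).score
    return scores
```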

Machine Translation and Corpora

If translation direction matters so much, why do researchers mix translation directions when creating corpora for training MT systems? In the early days of MT research, parallel data was not easy to obtain. The first attempts to create large corpora for training MT systems (e.g. Europarl) needed all the data available, regardless of translation direction. This became an established practice in the MT community. But translation direction is not the only important element that gets ignored. In the process of creating corpora for MT systems, important information is often lost. For example, shuffling sentences or randomly creating train/dev/test sets in an x/1/1 ratio breaks document coherence. These steps are to a large extent irreversible. Researchers compiling corpora should be encouraged to make their pre-processing steps as reversible as possible and to preserve information even if there is no immediate need for it.
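One small step in that direction is to split at document boundaries rather than shuffling sentences, so that coherence is preserved and the original documents remain reconstructible. A minimal sketch, assuming the corpus is stored as a mapping from document IDs to lists of aligned sentence pairs (an assumed layout, not a standard format):

```python
# Document-aware train/dev/test split: shuffle documents, not sentences,
# and keep document IDs so the split stays reversible.
import random

def document_level_split(docs: dict, dev_docs: int = 10, test_docs: int = 10, seed: int = 42):
    """Split a corpus into train/dev/test at document boundaries."""
    doc_ids = sorted(docs)
    random.Random(seed).shuffle(doc_ids)            # shuffle whole documents
    test_ids = doc_ids[:test_docs]
    dev_ids = doc_ids[test_docs:test_docs + dev_docs]
    train_ids = doc_ids[test_docs + dev_docs:]
    subset = lambda ids: {d: docs[d] for d in ids}  # keep doc IDs for traceability
    return subset(train_ids), subset(dev_ids), subset(test_ids)
```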

Lessons learned

All this shows the importance of interdisciplinarity and communication between translation studies and machine translation. Although MT emerged from a mathematical, formal tradition while translation studies has a linguistic, prescriptive outset, the two fields deal with two sides of the same coin. Being aware of the literature in both fields and finding a common language is essential for translation studies-informed MT research. This, in turn, would be an opportunity for translation studies to develop more computational and quantitative methods complementary to descriptive ones, also within the context of MT.

Traditions and established practices need to be questioned and adapted to new situations. As evaluating MT systems on mixed-direction test sets starts to be abandoned, evaluating sentences without their co-text might also slowly be replaced by document-level evaluation. Still, no metric or method exists to evaluate translations with regard to their text function (documentary vs. instrumental) or their purpose (Skopos), nor in their general or social context. And MT research on the creative aspects of translation (transcreation, literary translation, dubbing) is still trammelled by the lack of an automatic evaluation method, since it has to rely on human judgements as the only indication of translation quality.

All these are significant challenges that cannot be solved overnight, but let’s see if translation studies has an answer…

Inspired by these issues, we have created Europarl-UdS, a corpus specifically tailored for translationese research. It provides parallel and comparable corpora filtered not only by translation direction, but also by making sure that the source side contains sentences produced by native speakers, so the translationese signal can proudly shine through. It was compiled with this amazing pipeline by José Manuel Martínez Martínez. New languages can be added upon request.

