This is Why Translation Software Can Never Replace Humans
A few times a year, machine learning makes a breakthrough in translation. Recently, Google's learning algorithms have been improving translation accuracy. But even with these breakthroughs, is translation software the better choice for your translation needs?
To answer that question, you first need to know how translation software and machine learning work.
Statistical Machine Translation (SMT)
SMT learns from a large, existing pool of human translations, commonly referred to as “bilingual text corpora.” Statistical Machine Translation software needs huge amounts of human-translated text in both the source language and the language you are translating into.
Machine translation before neural networks came in several forms. The two covered here are the older Rules Based Translation process (RBMT), which relies on hand-written linguistic rules and dictionaries rather than statistics, and the newer Phrase Based Translation process (PBMT), the dominant form of SMT before neural networks.
Rules Based Translation
Rules Based Translation was one of the first techniques used to translate text automatically. Using linguistic rules and a bilingual dictionary, the system analyzed each word and could rearrange words according to whatever context it determined each word had.
However, if the system looks at one word at a time, and never at the whole sentence, how accurate can a word choice based on “context” really be?
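To make the word-at-a-time problem concrete, here is a minimal Python sketch of a rules-based translator. It assumes a tiny hypothetical English-to-Spanish dictionary (`LEXICON`) and a single reordering rule (Spanish adjectives usually follow the noun); real RBMT systems use far larger dictionaries and rule sets, and every name here is illustrative only.

```python
# Hypothetical toy dictionary: each English word maps to one Spanish word and a part of speech.
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
    "is": ("está", "VERB"),
    "near": ("cerca de", "ADP"),
    "bank": ("banco", "NOUN"),  # one fixed entry: "bank" always becomes the financial kind
}


def rules_based_translate(sentence: str) -> str:
    """Translate word by word, then apply one linguistic reordering rule."""
    # Unknown words are copied through untranslated with an "UNK" tag.
    tagged = [LEXICON.get(word, (word, "UNK")) for word in sentence.lower().split()]
    # Rule: adjective followed by noun becomes noun followed by adjective.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in tagged)


print(rules_based_translate("the red car"))
print(rules_based_translate("the car is near the bank"))
```

The first sentence comes out as "el coche rojo" thanks to the reordering rule. The second comes out as "el coche está cerca de el banco": the system misses the Spanish contraction "del", and its single entry for "bank" can never mean a riverbank, however the rest of the sentence reads.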
Phrase Based Translation
Enter Phrase Based Translation. Phrase Based software builds “phrases” of text learned from large corpora in both languages. These are not linguistic phrases, however, just sequences of words that appear together often in the corpora the software learned from.
The goal of Phrase Based Translation is to translate whole groups of words at once, reducing the mistakes Rules Based Translation made by working one word at a time.
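The difference between word-at-a-time and phrase-at-a-time lookup can be sketched in a few lines of Python. This assumes a small hypothetical phrase table (`PHRASE_TABLE`) and a greedy longest-match strategy; real phrase-based systems score many possible segmentations with probabilities and a language model rather than picking greedily.

```python
# Hypothetical phrase table: multi-word keys stand in for phrases learned from a corpus.
PHRASE_TABLE = {
    ("good",): "bueno",
    ("morning",): "mañana",
    ("good", "morning"): "buenos días",
    ("thank",): "agradecer",
    ("you",): "tú",
    ("thank", "you"): "gracias",
}
MAX_PHRASE_LEN = max(len(key) for key in PHRASE_TABLE)


def phrase_based_translate(sentence: str) -> str:
    """Greedily cover the sentence with the longest phrases found in the table."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest possible phrase first, then progressively shorter ones.
        for length in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + length])
            if phrase in PHRASE_TABLE:
                output.append(PHRASE_TABLE[phrase])
                i += length
                break
        else:
            output.append(words[i])  # unknown word: copy it through untranslated
            i += 1
    return " ".join(output)


def word_by_word(sentence: str) -> str:
    """For comparison: use only single-word entries, one word at a time."""
    return " ".join(PHRASE_TABLE.get((w,), w) for w in sentence.lower().split())
```

Here `phrase_based_translate("good morning")` returns "buenos días", while `word_by_word("good morning")` returns the stilted "bueno mañana".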
If you are choosing software that uses SMT, pay close attention to the size and quality of its data pool. The larger the corpora, the better. And because these data pools are collections of bilingual human translations, those translations must themselves be accurate for the machine translation to be as good as possible.
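Why corpus size and quality matter can be seen from how phrase-based systems commonly estimate translation probabilities: by relative frequency, P(target | source) = count(source, target) / count(source), over phrase pairs extracted from the bilingual corpus. The sketch below assumes hypothetical, already-extracted phrase pairs; real systems extract them through word alignment first.

```python
from collections import Counter


def translation_probabilities(phrase_pairs):
    """Relative-frequency estimate: P(target | source) = count(source, target) / count(source)."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(source for source, _ in phrase_pairs)
    return {
        (source, target): count / source_counts[source]
        for (source, target), count in pair_counts.items()
    }


# Hypothetical aligned phrase pairs; one of the three human translations is wrong.
small_corpus = [
    ("good morning", "buenos días"),
    ("good morning", "buenos días"),
    ("good morning", "buenas noches"),  # a bad translation in the data pool
]
# The same single error diluted in a larger pool of correct translations.
large_corpus = [("good morning", "buenos días")] * 99 + [("good morning", "buenas noches")]

small = translation_probabilities(small_corpus)
large = translation_probabilities(large_corpus)
print(small[("good morning", "buenas noches")])
print(large[("good morning", "buenas noches")])
```

One bad translation among three gives the wrong phrase a probability of about 0.33; the same single error among a hundred pairs drops it to 0.01.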
But even then, segmenting your text into non-linguistic phrases still leaves the chance that your translated text won’t be contextually correct.
Neural Machine Translation
Neural Machine Translation (NMT) is the translation software industry's newest answer to the mistakes of both Rules Based and Phrase Based Translation. Instead of translating word for word or phrase by phrase, NMT translates the full sentence.
Google adopted this method in 2016, using its own Deep Learning technology to improve its translations.
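The key difference from phrase-based lookup is that an NMT decoder chooses each output word based on the entire source sentence plus the words it has already produced. The Python sketch below shows that greedy decoding loop under a loud assumption: `TOY_MODEL` and `next_word_scores` are hypothetical stand-ins, a hand-written table of scores where a real system would run a trained neural network. Production systems also commonly use beam search instead of always taking the single best word.

```python
from typing import Dict, List, Tuple

END = "</s>"  # end-of-sentence marker
SRC = "i saw the bank of the river"

# Hypothetical stand-in for a trained network: (whole source sentence, words so far) -> next-word scores.
TOY_MODEL: Dict[Tuple[str, Tuple[str, ...]], Dict[str, float]] = {
    (SRC, ()): {"vi": 0.9, "yo": 0.1},
    (SRC, ("vi",)): {"la": 0.8, "el": 0.2},
    (SRC, ("vi", "la")): {"orilla": 0.7, "banco": 0.3},
    (SRC, ("vi", "la", "orilla")): {"del": 0.9, END: 0.1},
    (SRC, ("vi", "la", "orilla", "del")): {"río": 0.95, END: 0.05},
    (SRC, ("vi", "la", "orilla", "del", "río")): {END: 1.0},
}


def next_word_scores(source: str, prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Score candidate next words given the full source sentence and the output so far."""
    return TOY_MODEL.get((source, prefix), {END: 1.0})


def greedy_decode(source: str, max_len: int = 20) -> str:
    """Build the translation one word at a time, always taking the highest-scoring word."""
    prefix: List[str] = []
    while len(prefix) < max_len:
        scores = next_word_scores(source, tuple(prefix))
        best = max(scores, key=scores.get)
        if best == END:
            break
        prefix.append(best)
    return " ".join(prefix)


print(greedy_decode(SRC))
```

Because the scores are conditioned on the whole sentence, they can favor "orilla" (riverbank) over "banco" for "bank" when "river" appears later in the sentence, which a word-at-a-time system cannot do; here `greedy_decode(SRC)` returns "vi la orilla del río".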
Deep Learning is a family of machine learning techniques built on many-layered neural networks. The same kind of technology helps decide what you might watch, buy or search for next, and helps predict what you type. It is part of the reason why, when you start typing “how to” into Google, you get a dropdown list of other popularly searched “how to’s.”
Deep Learning translation software is trained by sifting through huge amounts of data. Whatever data it is fed is what it learns from, and based on that data it starts to make decisions from what it “knows.”
When the software thinks it has learned something new, it will go back and apply that to all it had previously learned.
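The idea of new learning reshaping old learning can be illustrated with the smallest building block of a neural network: a single artificial neuron trained by stochastic gradient descent. This is a minimal sketch on a toy task (logical OR), not a translation model; deep learning systems stack many layers of such units, and the function names here are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(examples, epochs=1000, learning_rate=0.5, seed=0):
    """Train one logistic neuron with stochastic gradient descent."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(examples[0][0]))]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            prediction = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = prediction - label  # gradient of the log-loss w.r.t. the neuron's input sum
            # Every example nudges the same shared weights, so later examples
            # refine what was learned from earlier ones.
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, features) -> int:
    return int(sigmoid(sum(w * x for w, x in zip(weights, features)) + bias) >= 0.5)


# Toy training data (logical OR), standing in for "the data the software is fed".
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_neuron(data)
print([predict(weights, bias, x) for x, _ in data])
```

After training, the neuron classifies all four inputs correctly and prints [0, 1, 1, 1].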
Unlike Statistical Translation, it is not tied to a fixed set of phrases, and there is no set number of patterns and rules to follow based on human input. The model can keep being retrained on new data, so “learning” is never truly over.
Loosely speaking, Deep Learning lets translation software ask itself true/false questions and catalog the answers until it forms a functioning system. When it sees a word, a phrase or an entire sentence, it puts the text through a series of these questions and produces a more accurate output.
Jay Marciano, a leader in machine translation and Director of Machine Translation at Lionbridge, says that deep learning and Neural Translation can “identify complicated patterns and associations among these patterns, in ways that are beyond human ability to recognize.”
Neural Machine Translation is certainly the future of machine translation. But is it better than a human? Even though it may identify patterns humans cannot see, it still cannot fully understand the nuance and meaning of the written word.
A Translation Test
Back in January of 2017, Sejong Cyber University in South Korea and the International Interpretation and Translation Association of Korea hosted the ultimate translation battle with three machines.
The three machines were Google Translate (a Neural Machine translator), Systran Translation Program (a Phrase Based translator) and an app, Papago (a Phrase Based translator).
Each machine was given four pieces of writing: a Fox News article, a Korean-language opinion piece from a local paper, and two book excerpts, one in English and one in Korean.
There were three scoring criteria: accuracy, language expression, and logic and organization. Each criterion was worth up to five points, so each text could earn up to 15 points, and the four texts together gave a maximum total score of 60.
How did the machines do?
- Last place: Systran 15/60
- Second place: Papago 17/60
- First place: Google Translate 28/60
Surprised by the results? Machine translation services have come a long way with deep learning technologies, but they are still far from replacing a human translator, who scored 49/60 on the same texts.
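The scoring arithmetic from the test works out as follows. This short Python sketch simply restates the published numbers (three criteria, five points each, four texts, and the reported scores) and converts each score to a percentage.

```python
CRITERIA = ["accuracy", "language expression", "logic and organization"]
POINTS_PER_CRITERION = 5
NUM_TEXTS = 4

max_per_text = len(CRITERIA) * POINTS_PER_CRITERION  # 15
max_total = max_per_text * NUM_TEXTS  # 60

# Reported totals out of 60.
scores = {"Systran": 15, "Papago": 17, "Google Translate": 28, "Human translator": 49}
for name, score in scores.items():
    print(f"{name}: {score}/{max_total} = {score / max_total:.0%}")
```

Google Translate's 28/60 comes to about 47%, against roughly 82% for the human translator.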
“No matter how fast the translation programs are, many [people] will doubt they can perfectly translate subtle expressions of emotion in literature.” — International Interpretation and Translation Association Chairman Kim Dong-ik.
Neural translation backed by deep learning is certainly the future, but it still has a long way to go. Even something as small as a syllable can trip up Google Translate, as one chef found out the hard way.
During the first official day of the 2018 Winter Olympics, Norway’s chef Stale Johansen needed 1,500 more eggs and drafted his order in Korean with the help of Google Translate. He was surprised to find 15,000 eggs delivered the next day.
Only one syllable separates 1,500 and 15,000 in Korean, a very small nuance missed by what is supposed to be one of the smartest translation tools out there.
Trusting Translation Software
It is tempting to put your content into a computer program and have it spit out a translated copy in minutes. But to get a reliable translation, you would want to run it through both a Phrase Based translator and a Neural Machine Translator, and even then you still might not have a complete and accurate translation.
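If you do run the same text through two engines, one simple way to use both outputs is to flag the sentences where they disagree for a human to check. The sketch below is one possible approach, not a standard tool: `flag_for_review` and its 0.8 `threshold` are hypothetical choices, the two output lists are made-up examples, and similarity is measured with Python's standard `difflib.SequenceMatcher` on word sequences.

```python
from difflib import SequenceMatcher


def flag_for_review(output_a, output_b, threshold=0.8):
    """Return the indexes of sentences where two machine translations disagree noticeably."""
    flagged = []
    for index, (a, b) in enumerate(zip(output_a, output_b)):
        # Ratio of matching words between the two versions (1.0 means identical).
        similarity = SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()
        if similarity < threshold:
            flagged.append(index)
    return flagged


# Hypothetical outputs from two different engines for the same three source sentences.
phrase_based = ["The meeting starts at nine.", "Please send the report tomorrow.", "Thank you."]
neural = ["The meeting starts at nine.", "Send the report tomorrow, please.", "Thank you."]
print(flag_for_review(phrase_based, neural))
```

Here only the second sentence (index 1) is flagged. Agreement between two engines does not prove a translation is correct, since both can make the same mistake.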
Originally published at ivannovation.com on July 17, 2018.