Hey Microsoft, congrats on achieving human parity 😉

Jan Hinrichs
Beluga-team
Mar 19, 2018

Microsoft claims to have hit a major milestone in the battle for human-quality translation, and we looked into it.

First came Google about 180 days ago, claiming it had discovered the “New World” and that its engineers had produced nearly perfect translations with their neural machine translation system. The bold statement fell flat when the system had issues translating rare words, which actually surprised translators. In the end, Google deserves credit for cutting translation errors by up to 60% compared with its phrase-based system on English↔French, English↔Spanish, and English↔Chinese. *slow clap*

Meanwhile, in techland, a new player emerged with similarly bold claims: Microsoft, with its Chinese-to-English translation system. To add insult to injury for Google, Microsoft went on to claim it had achieved human parity. Ouch, Google, that’s got to sting…

On March 14th, Microsoft published a research paper describing its process and findings. It defines human parity as follows:

“Definition 1. If a bilingual human judges the quality of a candidate translation produced by a human to be equivalent to one produced by a machine, then the machine has achieved human parity.

Definition 2. If there is no statistically significant difference between human quality scores for a test set of candidate translations from a machine translation system and the scores for the corresponding human translations then the machine has achieved human parity.”
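Definition 2 boils down to a statistical test. As a minimal sketch, assuming a two-sided permutation test on mean quality scores (our own choice of test; the paper’s exact statistical procedure may differ) and hypothetical per-sentence scores, the check could look like this. The function names `permutation_test` and `human_parity` are illustrative, not from the paper.

```python
import random
from statistics import fmean


def permutation_test(machine, human, trials=10_000, seed=0):
    """Two-sided permutation test on the difference in mean scores.

    Returns a p-value: the share of random relabelings whose mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = abs(fmean(machine) - fmean(human))
    pooled = list(machine) + list(human)
    n = len(machine)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # randomly reassign scores to "machine"/"human"
        if abs(fmean(pooled[:n]) - fmean(pooled[n:])) >= observed:
            extreme += 1
    return (extreme + 1) / (trials + 1)


def human_parity(machine, human, alpha=0.05):
    # Hypothetical reading of Definition 2: parity means the test finds
    # no statistically significant difference at level alpha.
    return permutation_test(machine, human) >= alpha
```

Keep in mind that “no significant difference” is not proof of equality: with few or noisy scores, even a clearly weaker system can pass this test.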

What followed was a media frenzy, with breathless pieces on the future of translation and the cultural impact this news will have on the linguistics business. But let’s not jump on the hype train just yet!


Microsoft’s own human evaluation shows its research in a very flattering light, but things get pretty nebulous as the paper progresses. The parity test sorts the results into six categories, from best to worst, and Microsoft’s researchers grouped the first four categories together to show that they are in sync, that is, on par with each other.

Here is where the fun begins. Microsoft compared its system with Google’s NMT research and with other competitors, namely Sogou Inc.’s Sogou Knowing NMT system. The publication puts Microsoft in a bad light by unintentionally revealing that not only were they not even close, but their competitor had achieved the same thing a year earlier. Ouch again…

Meanwhile, Microsoft and Google came out as duds, scoring lowest in Microsoft’s own study.

Ok, we have to admit, there’s more to it. So here goes. On an automatic measuring system called BLEU (Bilingual Evaluation Understudy), Microsoft beat Sogou Knowing’s scores by many points. BLEU measures how close machine translation output is to human translation, and it requires two ingredients: (i) a numerical translation closeness metric and (ii) a corpus of human reference translations to measure the output against. The closeness metric is based on n-grams, runs of n consecutive words: BLEU counts how many of the candidate’s n-grams (typically n = 1 to 4) also appear in the references, combines these precisions with a geometric mean, and applies a brevity penalty so that overly short output cannot score well. Scores range from 0 to 1, where 1 means the output matches a reference exactly. Since even a good human translation rarely matches a reference word for word, real scores sit well below 1, and they are conventionally reported multiplied by 100.
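The scoring recipe above fits in a few lines of Python. This is a minimal, sentence-level sketch assuming whitespace tokenization, n-grams up to 4 with uniform weights, and no smoothing; the function names `ngrams` and `bleu` are our own. Real evaluations typically compute BLEU over a whole test corpus with standardized tokenization, so these numbers will not match published scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (tuples of n consecutive tokens)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU: clipped n-gram precision plus brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each n-gram count by its maximum count in any single reference,
        # so repeating a correct word does not inflate the score.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the candidate's.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


score = bleu("the quick brown fox jumps over the dog".split(),
             ["the quick brown fox jumped over the lazy dog".split()])
print(round(100 * score, 1))  # multiply by 100 for the usual BLEU scale
```

Without smoothing, a single sentence with no matching 4-gram scores exactly 0, which is one reason BLEU is normally reported at corpus level rather than per sentence.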

Even if still far from perfect, the system is constantly improving. Microsoft used a dual-learning method, which basically translates sentences back and forth between Chinese and English, so the system can check its own output and learn from its mistakes. It also used a deliberation method, which mimics how people edit and revise their own writing by going through it again and again: a first pass produces a draft translation, and a second pass polishes it. Furthermore, the researchers added agreement regularization: two models generate the translation, one working from left to right and the other from right to left, and training pushes them toward producing the same result.

Last words

Microsoft reached the parity milestone by having linguists compare its translations with a fixed reference, the so-called golden translation. In other words, the machine’s output is measured against a human translator’s work. The downside, as stated above, is that there is no universally perfect translation; judging one is always somewhat subjective. Another issue is that the algorithms entrusted with this task do not fully grasp human translation, at least not 100%. Yet.

Judging by this year’s events alone, machine translation has won some massive battles in the quest for translation supremacy, all thanks to harnessing the sheer power of neural networks. The race is on, with the big players competing to develop the perfect translation tool.

Even if human parity is still only a dream, Microsoft has shown that it’s gotten pretty close, and hey, the rest is a really good PR stunt. *wink again*

*********

If you liked this post, we would really appreciate a 👏 or 👏👏 or 👏👏👏

Check out our Instagram for more.

About Beluga

Beluga helps fast-moving companies translate their digital content. With more than a decade of experience, professional linguists in all major markets, and the latest translation technology, Beluga is a reliable partner to many of the most successful enterprises in the technology sector. Its business goal: to help fast-growing companies offer their international audiences an excellent and engaging user experience.

Jan Hinrichs, Founder & CEO of Beluga Linguistics, Citizen, Activist, Papa...