What is machine translation?

Chier Hu
Mr. Translator
Apr 4, 2020

Machine translation is both a branch of computational linguistics and an important application of artificial intelligence. Research on it dates back to the 1950s.
With the rapid growth of the Internet, the demand for translation between languages keeps increasing.
There are now hundreds of different languages on the Internet. English accounts for about half of all Internet content, yet native English speakers make up only about a quarter of all Internet users.
Crossing the language barrier to reach more of the Internet's content is therefore a growing need.
Machine translation, that is, using a computer to translate text from one language into another, has become one of the main ways to overcome this barrier.
As early as 2013, Google Translate was already handling as many as 1 billion translations a day. That is equivalent to a year of work by all human translators worldwide, or roughly the text of 1 million books!

The Development of Machine Translation Technology.

Research on machine translation has gone through three stages: rule-based methods, statistical methods, and neural network-based methods.
Early machine translation research relied mainly on rule-based methods.
A rule-based system translates mechanically, following translation rules written by linguists.
Its quality is limited by the quality and quantity of these hand-written rules. Writing the rules is time-consuming and laborious, and rules written for one language pair cannot be reused for another.
Moreover, as the number of rules grows, so does the number of conflicts between them. This makes it difficult to cover the full range of human language, and it became the bottleneck of rule-based machine translation systems.
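Here is a minimal sketch of a toy English-to-Spanish rule-based translator in Python, which makes the rule-based idea concrete. The lexicon, the part-of-speech tags and the single adjective-noun reordering rule are hypothetical illustrations, not taken from any real system. Real rule-based systems relied on thousands of hand-written rules.

```python
# Toy rule-based translator (English -> Spanish), for illustration only.
# The lexicon, tags and the one reordering rule are hypothetical.

LEXICON = {
    # english word: (spanish word, part of speech)
    "the": ("el", "DET"),
    "a": ("un", "DET"),
    "red": ("rojo", "ADJ"),
    "big": ("grande", "ADJ"),
    "car": ("coche", "NOUN"),
    "house": ("casa", "NOUN"),
    "is": ("es", "VERB"),
}


def translate(sentence: str) -> str:
    words = sentence.lower().split()
    # Rule 1: word-by-word dictionary lookup; unknown words pass through.
    tagged = [LEXICON.get(w, (w, "UNK")) for w in words]
    # Rule 2: English puts adjectives before nouns, Spanish usually after,
    # so swap every ADJ NOUN pair.
    out = []
    i = 0
    while i < len(tagged):
        if (i + 1 < len(tagged) and tagged[i][1] == "ADJ"
                and tagged[i + 1][1] == "NOUN"):
            out.extend([tagged[i + 1][0], tagged[i][0]])
            i += 2
        else:
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)


print(translate("the red car"))    # el coche rojo
print(translate("the red house"))  # el casa rojo (should be "la casa roja")
```

The second sentence comes out wrong because gender agreement needs yet another rule. This is exactly how rule sets keep growing and start to conflict.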
In the 1990s, statistical machine translation was proposed and quickly became the mainstream approach in machine translation research.
Statistical machine translation uses a bilingual parallel corpus as training data, that is, a corpus containing source-language texts together with their target-language translations.
The famous Rosetta Stone can be regarded as an ancient “parallel corpus”: it records the same text in Egyptian hieroglyphs, Demotic script and ancient Greek.
Its discovery gave linguists the key to deciphering Egyptian hieroglyphs.

A statistical machine translation model mines word alignments between the two languages from the parallel corpus, and automatically extracts translation rules based on these alignments.
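The alignment step can be illustrated with IBM Model 1, the simplest of the classic word-alignment models. It learns word translation probabilities from a parallel corpus using the expectation-maximization (EM) algorithm.

This is a simplified sketch under stated assumptions:
- It omits the special NULL word of the full model.
- The three-sentence German-English corpus is a toy example.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """EM training of IBM Model 1 probabilities t(tgt_word | src_word).

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict keyed by (tgt_word, src_word).
    """
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    # Start uniform: any word may translate any other with equal probability.
    t = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count = defaultdict(float)  # expected (tgt_word, src_word) link counts
        total = defaultdict(float)  # expected counts per src_word
        for src, tgt in pairs:
            for tw in tgt:
                # E-step: split one unit of alignment mass for tw across
                # all source words of the same sentence pair.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(float, {k: c / total[k[1]] for k, c in count.items()})
    return t


def best_alignment(src, tgt, t):
    """Align each target word to its most probable source word."""
    return [(tw, max(src, key=lambda sw: t[(tw, sw)])) for tw in tgt]


pairs = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(pairs)
print(best_alignment(["das", "Haus"], ["the", "house"], t))
# → [('the', 'das'), ('house', 'Haus')]
```

No sentence pair states that "das" means "the". EM infers it because "das" and "the" keep co-occurring, and that frees "Haus" to explain "house".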
A classical statistical machine translation model usually consists of three parts: a translation model, a reordering model and a language model.
The translation model estimates the probability that words and phrases in the two languages translate each other. The reordering model captures how the translated fragments should be ordered. The language model estimates how well the resulting translation matches the natural usage of the target language.
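One common way to combine the three parts is a log-linear model. Each candidate translation receives a weighted sum of log-scores from the translation, reordering and language models, and the decoder keeps the highest-scoring candidate.

The sketch below scores two candidate translations of the French "maison bleue". All of the following are hypothetical values chosen for illustration:
- the probabilities;
- the weights;
- the single-word "phrases";
- the simple distance-based reordering penalty.

Real systems tune the weights on held-out data.

```python
import math

# Hypothetical toy tables for translating French "maison bleue".
PHRASE_TABLE = {  # translation model: p(target | source)
    ("maison", "house"): 0.8,
    ("bleue", "blue"): 0.9,
}
BIGRAM_LM = {  # language model: p(word | previous word)
    ("<s>", "blue"): 0.1,
    ("blue", "house"): 0.3,
    ("house", "</s>"): 0.4,
    ("<s>", "house"): 0.1,
    ("house", "blue"): 0.001,
    ("blue", "</s>"): 0.05,
}
UNSEEN = 1e-6  # probability floor for bigrams missing from the table


def translation_score(segments):
    return sum(math.log(PHRASE_TABLE[(src, tgt)]) for src, tgt, _ in segments)


def reordering_score(segments):
    """Distance-based reordering: penalise jumps in source position."""
    score, prev = 0, -1
    for _, _, src_pos in segments:
        score -= abs(src_pos - prev - 1)  # 0 when translating left to right
        prev = src_pos
    return score


def language_score(words):
    tokens = ["<s>"] + words + ["</s>"]
    return sum(math.log(BIGRAM_LM.get(bg, UNSEEN))
               for bg in zip(tokens, tokens[1:]))


def total_score(segments, weights=(1.0, 0.5, 1.0)):
    """Log-linear combination of the three models (weights are arbitrary here)."""
    words = [w for _, tgt, _ in segments for w in tgt.split()]
    w_tm, w_rm, w_lm = weights
    return (w_tm * translation_score(segments)
            + w_rm * reordering_score(segments)
            + w_lm * language_score(words))


# Each segment: (source word, target word, position of the source word).
monotone = [("maison", "house", 0), ("bleue", "blue", 1)]
swapped = [("bleue", "blue", 1), ("maison", "house", 0)]
for cand in (monotone, swapped):
    print([tgt for _, tgt, _ in cand], round(total_score(cand), 2))
# → ['house', 'blue'] -12.53
# → ['blue', 'house'] -6.25
```

Both candidates get the same translation-model score. The reordering model prefers the monotone "house blue", but the language model strongly prefers "blue house". With these weights, the swapped order wins.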
Statistical translation reduces the need for human involvement, and both the model and its training process are language-independent. This greatly improved the performance of machine translation and broadened its range of applications.
In recent years, the introduction of neural network-based methods has greatly improved machine translation performance.
According to Google's machine translation team, Google Translate launched its Chinese-English neural network model in September 2016. By May 2017 it supported 41 language pairs, and neural models were serving more than 50% of its translation traffic.
Neural machine translation also uses a parallel corpus as training data. Statistical machine translation decomposes the model into several parts, whereas a neural model is usually a single end-to-end sequence-to-sequence model.
Take the common recurrent neural network (RNN) as an example. The model first converts the words of the source and target languages into vector representations, and then uses RNNs to model the translation process.

Typically, one RNN acts as the encoder, encoding the input sequence (the words of the source sentence) into a vector representation. Another RNN acts as the decoder, generating the output sequence (the words of the target sentence) from the vector produced by the encoder.
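The encoder-decoder structure can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions:
- The vocabularies, dimensions and random weights are hypothetical.
- It uses a simple Elman RNN cell rather than an LSTM.
- The training step (backpropagation on a parallel corpus) is omitted, so the printed words are meaningless.

What the sketch does show is the data flow. Words become vectors, and the encoder folds the source sentence into one hidden vector. The decoder then generates target words one at a time from that vector, until it emits an end-of-sentence token.

```python
import math
import random


def rand_matrix(rng, rows, cols, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_step(W, U, x, h):
    """Simple (Elman) RNN cell: h_new = tanh(W x + U h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, h))]


class Seq2Seq:
    """Untrained RNN encoder-decoder; weights are random, so output is noise."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=8, hid_dim=16, seed=0):
        rng = random.Random(seed)
        self.out_words = list(tgt_vocab) + ["</s>"]  # decoder output vocabulary
        # Word -> vector lookup tables (the vector representations of words).
        self.src_emb = {w: rand_matrix(rng, 1, emb_dim)[0] for w in src_vocab}
        self.tgt_emb = {w: rand_matrix(rng, 1, emb_dim)[0]
                        for w in self.out_words + ["<s>"]}
        self.enc_W = rand_matrix(rng, hid_dim, emb_dim)
        self.enc_U = rand_matrix(rng, hid_dim, hid_dim)
        self.dec_W = rand_matrix(rng, hid_dim, emb_dim)
        self.dec_U = rand_matrix(rng, hid_dim, hid_dim)
        self.proj = rand_matrix(rng, len(self.out_words), hid_dim)  # hidden -> word scores
        self.hid_dim = hid_dim

    def encode(self, src_words):
        h = [0.0] * self.hid_dim
        for w in src_words:
            h = rnn_step(self.enc_W, self.enc_U, self.src_emb[w], h)
        return h  # one fixed-size vector for the whole source sentence

    def decode(self, h, max_len=10):
        word, output = "<s>", []
        for _ in range(max_len):
            h = rnn_step(self.dec_W, self.dec_U, self.tgt_emb[word], h)
            scores = matvec(self.proj, h)
            # Greedy decoding: softmax is monotonic, so the argmax of the raw
            # scores is also the most probable word.
            word = self.out_words[max(range(len(scores)), key=scores.__getitem__)]
            if word == "</s>":
                break
            output.append(word)
        return output

    def translate(self, src_words):
        return self.decode(self.encode(src_words))


model = Seq2Seq(src_vocab=["我", "爱", "你"], tgt_vocab=["i", "love", "you"])
print(model.translate(["我", "爱", "你"]))
```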

Neural models have become the focus of machine translation research and applications in recent years. Many improvements to neural translation models have been proposed, such as LSTM units, attention mechanisms, improved training objectives and training on non-parallel corpora. Step by step, machine translation systems are expected to approach human-level performance.
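Attention addresses one limitation of the plain encoder-decoder: squeezing a whole sentence into a single vector. Instead, at each decoding step the decoder looks at all encoder hidden states and takes a weighted average of them.

The sketch below uses the simple dot-product score, the "dot" variant described by Luong et al. The original attention mechanism of Bahdanau et al. scores states with a small feed-forward network instead. The example vectors are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot_attention(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Each encoder hidden state is scored by its dot product with the current
    decoder state. The scores are normalised with a softmax, and the context
    vector is the weighted sum of the encoder states.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


weights, context = dot_attention([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print([round(w, 3) for w in weights])  # → [0.468, 0.063, 0.468]
```

Here the first and third encoder states match the decoder state equally well, so they share most of the weight. The context vector passed to the decoder is dominated by those two states.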

Application of Machine Translation.

Machine translation still falls short of human translation quality, but as its performance improves, its applications are becoming increasingly diverse.
Google Translate, launched by Google in 2006, has now been running for more than ten years. It supports more than a hundred languages and offers several ways to access it, including web pages, mobile apps and a programming API.
Data from May 2017 show that Google Translate served 500 million people a day.
Other companies, both Chinese and international, such as Microsoft, Baidu, Sogou and NetEase, are also continually improving their public machine translation services.
These services cannot yet be used directly for professional written translation. Even so, they have greatly lowered the barrier to understanding other languages, and machine translation has proven a useful aid in many scenarios.

When traveling abroad, being unable to communicate in the local language is a major pain point for many people.
Photo translation features in various mobile apps let people quickly understand road signs or menus in a foreign country.

Baidu, NetEase and other companies have applied machine translation to travel and launched dedicated portable translators. Users simply speak Chinese into the device, and it automatically translates what they say into other languages. This makes it a handy tool for traveling abroad.

Baidu WiFi Translator and Youdao Translator

As machine translation improves, major companies have gradually turned their attention to simultaneous interpretation.
At the 2016 World Internet Conference in Wuzhen, Sogou's CEO used real-time machine translation during his speech. The system converted his speech into text in real time and translated it into English simultaneously. The 2018 Boao Forum introduced machine simultaneous interpretation provided by Tencent, but the results were not entirely satisfactory.
This shows that although machine translation has made great progress, it still has a long way to go before it can replace humans and play an important role in simultaneous interpretation.
Machine translation is attracting more and more attention, but it also faces major challenges.
Two problems remain open: how to overcome existing shortcomings, such as the poor interpretability of neural network models, and how to further improve translation quality.
Today, machine translation is mainly used for getting the gist of text in other languages and for assisting human translators. It is still far from replacing human translation on a large scale.
However, with growing industry interest and a steady influx of talent, the field of machine translation will continue to flourish, and the Tower of Babel of the human world will eventually be rebuilt.
