What is M2M-100?
The First Many-to-Many Multilingual Machine Translation Model
Have you ever used Facebook in your native language and wondered how the translation is done? Facebook has developed a natural language processing model that translates between 100 languages without first translating the data into English. This model is called M2M-100 (Many-to-Many). Most prior translation work was based on English-centric data sets and models, which translated to and from English rather than directly between non-English languages. This English-centric bias in the data and the resulting models does not reflect how people actually use translation, and it empirically leads to lower performance for non-English translation directions. In contrast, the newly introduced M2M-100 model translates from any language to any other language without relying on English as an intermediary.
How the M2M Model Works
Before M2M-100, translating from one language to another typically depended on English, which acted as a bridge between the two languages. Facebook has now developed a model that translates from one language to another without using English as an intermediary. Compare this with how a person who knows only Urdu and another local language might learn English: in most cases, that person first translates English into Urdu and only then into the local language. The same was previously true of translation on Facebook and other social networks: text had to be translated into English first and then into the target language. This extra step hurts both the accuracy and the efficiency of the translation model. To overcome this problem, Facebook introduced M2M-100, the first multilingual machine translation model that can translate between any pair of 100 languages without relying on English.
The model does not depend on English as a link between two languages. For example, to translate between Chinese and Urdu, systems typically train on Chinese-to-English and English-to-Urdu data; M2M-100 can instead translate directly from Chinese to Urdu, which preserves the original meaning better. The model learns each language directly and does not require any intermediate language to translate between two of them.
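As a quick illustration, here is a minimal sketch of direct Chinese-to-Urdu translation using the openly released M2M-100 checkpoint through the Hugging Face transformers library. The checkpoint name (the 418M-parameter variant) and the example sentence are illustrative choices, not part of the original article.

```python
# Minimal sketch: direct Chinese -> Urdu translation with a released
# M2M-100 checkpoint via Hugging Face transformers (no English pivot).
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

chinese_text = "生活就像一盒巧克力。"  # illustrative input sentence

# Tell the tokenizer which source language the text is in.
tokenizer.src_lang = "zh"
encoded = tokenizer(chinese_text, return_tensors="pt")

# Force the first generated token to be the Urdu language token,
# so the decoder produces Urdu output directly.
generated = model.generate(
    **encoded, forced_bos_token_id=tokenizer.get_lang_id("ur")
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```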
What Technology M2M-100 Uses
Any text to be translated is first broken down into small units called tokens; the system receives tokens as input and produces tokens as output. Using whole words as units is problematic in multilingual translation, because it leads either to vocabularies with poor coverage or to extremely large vocabularies. Another problem is that many languages contain words that cannot be cleanly mapped to single tokens. The multilingual translation model therefore uses subword tokenization to produce the token sequences the machine consumes and emits. The model itself is based on the Transformer sequence-to-sequence architecture, which consists of two modules. The encoder takes the sequence of subword tokens in the source language and transforms it into a sequence of embeddings of the same length. The decoder attends to these embeddings and generates, token by token, the sequence of subword tokens in the target language, which is then turned back into text. In both modules, a special token indicates the language: the source language to the encoder and the target language to the decoder.
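To make the subword tokens and the special language tokens concrete, here is a small sketch using the M2M-100 tokenizer from the Hugging Face transformers library. The checkpoint name and the example sentence are illustrative choices, and the exact subword pieces shown in the comment may differ from what the tokenizer actually produces.

```python
# Sketch of how input text becomes subword tokens plus a language token.
from transformers import M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
tokenizer.src_lang = "zh"  # special token telling the encoder the source language

ids = tokenizer("今天天气很好")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
# Roughly: ['__zh__', '▁今天', '天气', '很好', '</s>']
# i.e. a source-language token, subword pieces, and an end-of-sentence token.

# On the decoder side, the target language is chosen by forcing its language
# token as the first generated token, e.g.:
#   model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("ur"))
```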
Wrapping Up
One of the best things about the model is that it is open source, so researchers can build on it. The model outperforms English-centric multilingual models trained on data where either the source or the target language is English, gaining more than 10 BLEU points on average over an English-centric baseline when translating directly between non-English languages. M2M-100 is thus competitive with bilingual models while improving translation between non-English languages without any reliance on English.