ENGLISH-TO-MALAYALAM MACHINE TRANSLATION USING PYTHON

Joel Jorly
Published in Analytics Vidhya
Dec 15, 2020

The internet now carries content in a wide array of languages, so machine translation has become an important way of connecting people who speak different languages. In this article, we look at the process of translating English to Malayalam using Transfer Rules.

WHAT IS MACHINE TRANSLATION?

Machine translation can be defined as the process by which software converts text or speech in one language to another language. In other words, it is the study of designing systems that translate text from one natural language to another. Machine translation helps people from different places understand an unfamiliar language without the aid of a human translator.

WHY MACHINE TRANSLATION?

Machine Translation is considerably cheaper than human translation. Computer programs can translate enormous quantities of text consistently within a short time frame. Done manually, the same work could take weeks or even months to complete.

“Without translation, I would be limited to the borders of my own country. The translator is my most important ally. He introduces me to the world.”
– Italo Calvino

TRANSFER RULES IN MACHINE TRANSLATION

Transfer rules are a set of linguistic rules that define correspondences between the structure of the source language and that of the target language. Making use of transfer rules is one of the most common methods of machine translation.

MT using transfer rules can be divided into three steps :

  • Analysis of the source language text to determine its grammatical structure
  • Transfer of the resulting structure to a structure suitable for generating text in the target language
  • Generation of the output text
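
The three steps above can be sketched as three small functions. This is a minimal illustration, not the program described in this article: the `POS_TAGS` and `LEXICON` tables, the function names, and the example sentence are all made up for the sketch, and the analysis step only handles a simple subject-verb-object sentence.

```python
# Toy resources written by hand for this sketch (hypothetical, not the article's data).
POS_TAGS = {"she": "PRP", "reads": "VBZ", "a": "DT", "book": "NN"}
LEXICON = {"she": "അവൾ", "reads": "വായിക്കുന്നു", "a": "ഒരു", "book": "പുസ്തകം"}


def analyse(sentence):
    """Step 1: tag each word and split the sentence around its VBZ verb."""
    tagged = [(word, POS_TAGS[word]) for word in sentence.lower().split()]
    verb_at = next(i for i, (_, tag) in enumerate(tagged) if tag == "VBZ")
    return {
        "subject": tagged[:verb_at],
        "verb": [tagged[verb_at]],
        "object": tagged[verb_at + 1:],
    }


def transfer(structure):
    """Step 2: change English subject-verb-object order to subject-object-verb."""
    return structure["subject"] + structure["object"] + structure["verb"]


def generate(tagged_words):
    """Step 3: replace each English word with its target-language entry."""
    return " ".join(LEXICON[word] for word, _ in tagged_words)


def translate(sentence):
    return generate(transfer(analyse(sentence)))


print(translate("She reads a book"))
```

Running it prints the Malayalam words in subject-object-verb order. Any word missing from the toy tables raises a `KeyError`, which a real system would have to handle.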

In this project, we make use of the Malayalam transfer rules. This is a set of rules that must be followed in order to construct grammatically well-formed Malayalam sentences :

Image Source : Anitha T Nair, Sumam Mary Idicula, 978–1–4673–2149–5/12/31.00 IEEE 2012

All the “codes” mentioned in the above table represent the various parts of speech.

Image Source : https://pythonspot.com/nltk-speech-tagging/

Several transfer rules were used in this program in order to attain accurate results. NP (Noun Phrase) and VP (Verb Phrase) are treated as the parent tags.

These are some of the Transfer Rules that were implemented :

  • If the parent tag VP contains child tags VBZ NP, it is reordered as NP VBZ
  • If the parent tag NP contains child tags NP PP, it is reordered as PP NP
  • If the parent tag NP contains child tags NP VP, it is reordered as VP NP
  • If the parent tag VP contains child tags VBG NP, it is reordered as NP VBG

[Image: POS tagging of the input text]
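
One way to express the four rules above is as a lookup table keyed by the parent tag and its child tags. The sketch below assumes the parse tree is already available as nested tuples. The tree for “She is driving a car” is written by hand here, whereas a real program would get it from a parser. The names `TRANSFER_RULES`, `reorder`, and `leaves` are hypothetical.

```python
# (parent tag, child tags in English order) -> child tags in Malayalam order.
# Only the four rules listed above are included.
TRANSFER_RULES = {
    ("VP", ("VBZ", "NP")): ("NP", "VBZ"),
    ("NP", ("NP", "PP")): ("PP", "NP"),
    ("NP", ("NP", "VP")): ("VP", "NP"),
    ("VP", ("VBG", "NP")): ("NP", "VBG"),
}


def reorder(tree):
    """Return a copy of the tree with the transfer rules applied at every node.

    A tree is (tag, [children]) for a phrase or (tag, word) for a leaf.
    """
    tag, body = tree
    if isinstance(body, str):  # leaf node: nothing to reorder
        return tree
    children = [reorder(child) for child in body]
    child_tags = tuple(child[0] for child in children)
    new_order = TRANSFER_RULES.get((tag, child_tags))
    if new_order is not None:
        # Every rule has two children with distinct tags, so a dict lookup is enough.
        by_tag = {child[0]: child for child in children}
        children = [by_tag[child_tag] for child_tag in new_order]
    return (tag, children)


def leaves(tree):
    """Collect the words of a tree from left to right."""
    tag, body = tree
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in leaves(child)]


# Hand-written parse of the sample sentence (an assumption for this sketch).
parse = ("S", [
    ("NP", [("PRP", "She")]),
    ("VP", [
        ("VBZ", "is"),
        ("VP", [("VBG", "driving"), ("NP", [("DT", "a"), ("NN", "car")])]),
    ]),
])

print(" ".join(leaves(reorder(parse))))  # She is a car driving
```

For this sentence only the inner VP (VBG NP) matches a rule, so the object moves in front of “driving”. None of the four rules listed covers a VP whose children are VBZ and VP, so “is” stays where it is in this sketch.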

PACKAGES IMPORTED

DATASET USED

The Olam English-Malayalam dataset has been used for this project. It is a growing, free and open, crowdsourced English-Malayalam dictionary with over 200,000 entries. The dataset consists of English words, their Malayalam definitions, and part / figure of speech tags.

Link to the dataset : https://olam.in/open/enml/

[Image: Olam dataset]
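
A dictionary of this shape can be loaded into a lookup table keyed by word and part of speech. The sketch below is an assumption-heavy illustration: it assumes a tab-separated file with the column names shown, and the three rows are written by hand for the example rather than copied from the Olam file. The functions `load_dictionary` and `translate_word` are hypothetical helpers.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only. The column names and the tab separator are
# assumptions about the file layout, not taken from the dataset itself.
SAMPLE_TSV = (
    "english_word\tpart_of_speech\tmalayalam_definition\n"
    "car\tn\tകാർ\n"
    "book\tn\tപുസ്തകം\n"
    "water\tn\tവെള്ളം\n"
)


def load_dictionary(handle):
    """Map (english word, part of speech) to a list of Malayalam definitions."""
    entries = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["english_word"].strip().lower(), row["part_of_speech"].strip())
        entries[key].append(row["malayalam_definition"].strip())
    return entries


def translate_word(entries, word, pos):
    """Return the first definition for the word, or the word itself if unknown."""
    candidates = entries.get((word.lower(), pos))
    return candidates[0] if candidates else word


dictionary = load_dictionary(io.StringIO(SAMPLE_TSV))
print(translate_word(dictionary, "Car", "n"))
print(translate_word(dictionary, "Tesla", "n"))  # not in the table, left unchanged
```

Keying on the part of speech as well as the word lets the lookup use the POS tag to pick between senses. Taking the first definition is a simplification, since a crowdsourced dictionary can list several definitions per word.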

ALGORITHM

SAMPLE OUTPUT

Consider the input text “She is driving a car”.

First, each word is POS-tagged, as shown below.

[Image: POS tagging of input text]

[Image: Reordering of words]

After applying the transfer rules and translating the words, we get the output.

[Image: Output text]

In a machine translation task, the input already consists of a sequence of symbols in some language, and the computer program must convert this into a sequence of symbols in another language.

— Page 98, Deep Learning, 2016.

ADVANTAGES OF MT USING TRANSFER RULES

Machine Translation using Transfer rules has its advantages over other conventional translation methods. These include :

  • This method takes the grammatical structure of the translated Malayalam sentence into account.
  • This method produces more meaningful outputs than direct, word-by-word translation, the simplest form of Rule-Based MT (RBMT).
  • Using POS tags, we can identify the part of speech each word represents in the sentence.

DISADVANTAGES OF MT USING TRANSFER RULES

This method of Machine Translation also has its fair share of disadvantages. These include :

  • In order to improve the accuracy, we need to add a large number of rules.
  • In some cases, POS tags are assigned to the words without considering the context of the sentence. This can affect the accuracy of the output.
  • Writing the transfer rules requires a lot of time. Moreover, good linguistic knowledge is necessary. One needs to be well versed in the language in order to deduce the transfer rules.
  • Inability to accurately translate sarcasm and idioms. In such cases, the literal meaning of the input is considered. The non-literal, expressive meaning of idioms such as “It’s a piece of cake” and “Let the cat out of the bag” will not be considered.
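
The point about context-free POS tags can be illustrated with a deliberately simple unigram tagger, which always assigns a word its most frequent tag. This is a sketch for illustration only, not the tagger used in this project: the tiny tagged corpus and the function names are made up for the example.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus, invented for this sketch.
TAGGED_CORPUS = [
    [("I", "PRP"), ("read", "VBP"), ("a", "DT"), ("book", "NN")],
    [("the", "DT"), ("book", "NN"), ("is", "VBZ"), ("old", "JJ")],
    [("please", "UH"), ("book", "VB"), ("a", "DT"), ("ticket", "NN")],
]


def train_unigram_tagger(corpus):
    """Remember the single most frequent tag seen for each word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, sentence, default="NN"):
    """Tag each word independently of its neighbours."""
    return [(word, model.get(word.lower(), default)) for word in sentence.split()]


model = train_unigram_tagger(TAGGED_CORPUS)
print(tag(model, "please book a ticket"))
# [('please', 'UH'), ('book', 'NN'), ('a', 'DT'), ('ticket', 'NN')]
```

Here “book” is a verb, but it is tagged NN because the noun reading is more frequent in the corpus. A transfer rule that looks for a verb followed by an NP would therefore not fire, and the word order of the output would be wrong.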

CONCLUSION

To conclude, Machine Translation is the task of automatically converting source text in one language to text in another language. In this case, we are implementing MT using Transfer Rules to convert English to Malayalam. This method can also be applied to other languages. Over the years, the accuracy of MT systems has been constantly improving. Now, we have AI translation models which are capable of producing highly accurate results at a very fast rate.

We can only wonder what the future of MT holds. Whatever it turns out to be, it will undoubtedly keep producing significant ripples in the language industry.

REFERENCES

  • Remya Rajan, Remya Sivan, Remya Ravindran, K. P. Soman, Rule Based Machine Translation from English to Malayalam, International Conference on Advances in Computing, Control and Telecommunication Technologies, 2009
  • Marta R. Costa-Jussa, Mireia Farrus, Jose B. Mariño, Jose A. R. Fonollosa, Study and Comparison of Rule-Based and Statistical Catalan-Spanish Machine Translation System, Computing and Informatics, Vol. 31, 2012
  • Anitha T Nair, Sumam Mary Idicula, 978–1–4673–2149–5/12/31.00 IEEE 2012
  • Bao Pham, Parts of Speech Tagging: Rule-Based, Harrisburg University of Science and Technology
  • https://en.wikipedia.org/wiki/Transfer-based_machine_translation
