English to Hindi text translation using MarianMT models from HuggingFace

Purnima Chowrasia
Published in Geek Culture
3 min read · Jul 20, 2021

In this short blog, I will cover how to do text translation using the popular Transformers library from HuggingFace 🤗

Image Source: Google Image Search

If you are looking for a ready-made model that can translate text from some language X into some language Y, the Language Technology Research Group at the University of Helsinki has published 1300+ machine translation (MT) models that are readily available on the HuggingFace platform. Here is the link to the page, which contains all the MT models.

As mentioned earlier, this is a small tutorial on converting English text to Hindi text. It can easily be extended to any other language pair (provided a corresponding model is available). When you visit their MT model repository page, you will see something like what is shown below.

Helsinki’s MT model hub

Finding the right MT model that fits your requirement

You will notice above that Helsinki-NLP/opus-mt is common to all the model names; only the remaining part differs. That means all model names have the following format: Helsinki-NLP/opus-mt-{src}-{tgt}, where the src and tgt placeholders hold the source and target language codes. So, for English to Hindi text translation, the model name is Helsinki-NLP/opus-mt-en-hi.
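The naming pattern can be sketched as a tiny helper. This is only an illustration: the function name opus_mt_model_name is my own and not part of any library, and building a name this way does not guarantee that a model for that language pair actually exists on the hub.

```python
def opus_mt_model_name(src: str, tgt: str) -> str:
    """Build a model name of the form Helsinki-NLP/opus-mt-{src}-{tgt}.

    Hypothetical helper for illustration; it only formats the string and
    does not check that the model is actually published.
    """
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


# English -> Hindi, the pair used in this tutorial
print(opus_mt_model_name("en", "hi"))
```

Swapping the two language codes is all it takes to point at another pair, as long as the corresponding model is available.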

Brief about Helsinki-NLP/opus-mt-en-hi

This model is trained on the OPUS dataset. OPUS, the open parallel corpus, is a collection of translated texts from the web. It also includes translations of Wikipedia, WikiSource, WikiBooks, WikiNews and WikiQuote web pages. This GitHub page provides the links to download the source and target texts obtained from the wiki web pages. The required pre-processing includes tokenizing the text using the SentencePiece library. I won't go into further detail about the architecture here. But if you are interested in knowing more, you can dig deeper in this GitHub repo and find all your answers.

Code plus Output

Here is the code snippet needed to convert text from English to Hindi.
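A minimal sketch of such a snippet, assuming the transformers, sentencepiece and torch packages are installed and the model can be downloaded from the HuggingFace hub. The translate function name and the sample sentences are my own placeholders; MarianMTModel and MarianTokenizer are the classes the Transformers library provides for these models.

```python
def translate(texts, model_name="Helsinki-NLP/opus-mt-en-hi"):
    """Translate a list of English sentences to Hindi with a MarianMT model.

    Illustrative helper (the name is hypothetical). The model and tokenizer
    are reloaded on every call, which is fine for a demo but slow in a loop.
    """
    if not texts:
        return []

    # Imported here so the function can be defined even where
    # transformers is not installed yet.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Tokenize the whole batch; pad to the longest sentence and
    # truncate anything longer than the model's maximum length.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

    # Generate translated token ids, then decode them back to text.
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


# Example call (downloads the model on first use):
# for line in translate(["How are you?", "I am learning machine translation."]):
#     print(line)
```

Passing a different model_name, following the naming format described earlier, should be enough to try another language pair.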

The Hindi text generated by the above code seems to be translated well for some sentences, but for a few sentences the translation is not up to the mark. Try it out with any English text of your own.

Output from above code

We can also fine-tune this text translation model to improve its performance on our own tasks. Notebook-friendly people can check out this colab notebook to get started. Enjoy Machine Translation!
