Neural Machine Translation using EasyNMT Library

Kalyan Ks
6 min read · Jul 30, 2022

In this blog post, you will see:

  • a brief overview of the EasyNMT library
  • the advantages of using EasyNMT over the Hugging Face pipeline for machine translation (MT)
  • sentence translation and document translation using the EasyNMT library

💥 Overview of EasyNMT Library

Image by Author

EasyNMT is an easy-to-use Python library for state-of-the-art (SOTA) machine translation, developed by NLP researchers at the UKP Lab, TU Darmstadt. Some highlights of the library:

  • performs SOTA machine translation in just three lines of code.
  • supports automatic language detection for 170+ languages, so specifying the source language code during translation is optional.
  • supports document translation.
  • supports SOTA machine translation models such as OPUS-MT, fine-tuned versions of mBART50, and M2M100.

The models supported by EasyNMT are PyTorch-based, so make sure PyTorch is installed before using the library.

💥 EasyNMT Library Demo

You can also try EasyNMT through an online demo; the screenshot below shows it. The online demo uses OPUS-MT models only, since they are lightweight compared to other SOTA MT models such as mBART50 and M2M100.

EasyNMT Demo

The EasyNMT online demo can be accessed at this link.

💥 Supported SOTA MT Models

EasyNMT supports SOTA MT models such as OPUS-MT, mBART-50 and M2M100. The table below gives a brief overview of these models.

Image by Author

From the table above, we can observe the following:

  • OPUS-MT models are much smaller than the other models.
  • OPUS-MT models support a maximum sequence length of 512 tokens, while all the other models support 1024 tokens.
  • In OPUS-MT models, the encoder and decoder consist of 6 layers each, while in the mBART50 and M2M100_418M models they consist of 12 layers each (the larger M2M100_1.2B model has 24 encoder and 24 decoder layers).

For longer inputs, EasyNMT does not simply cut the text off at the model's maximum sequence length: it splits the input into sentences and translates each sentence separately. This is referred to as document translation.
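The split-then-translate behaviour described above can be sketched in plain Python. This is a simplified illustration under our own assumptions, not EasyNMT's implementation: `split_sentences` is a naive regex splitter (real systems use proper sentence-splitting tools), whitespace-separated words stand in for model tokens, and `translate_sentence` is a hypothetical placeholder for a real model call.

```python
import re
from typing import Callable, List

# Split after sentence-final punctuation: ".", "!", "?" or the Devanagari danda "।".
_SENTENCE_END = re.compile(r"(?<=[.!?।])\s+")


def split_sentences(text: str) -> List[str]:
    """Naive regex sentence splitter (hypothetical helper, illustration only)."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def translate_document(text: str,
                       translate_sentence: Callable[[str], str],
                       max_tokens: int = 512) -> str:
    """Translate `text` in one pass if it fits, otherwise sentence by sentence.

    Whitespace-separated words are a rough stand-in for model tokens, and
    `translate_sentence` is a placeholder for a real translation call.
    """
    if len(text.split()) <= max_tokens:
        return translate_sentence(text)
    # Too long for one pass: translate every sentence separately and rejoin,
    # so no part of the input is dropped.
    return " ".join(translate_sentence(s) for s in split_sentences(text))
```

Passing a real model's translation function as `translate_sentence` would turn this into a working pipeline. Note that a single sentence longer than `max_tokens` still goes to the model whole here; a production system would need a further fallback for that case.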

💥 EasyNMT vs Hugging Face MT Pipeline

In the previous blog post, we saw the advantages of using the Hugging Face MT pipeline over manual model inference: the pipeline saves us from writing extra code for each step of model inference. In short, the Hugging Face MT pipeline simplifies MT, and EasyNMT simplifies it even further.

EasyNMT vs HF Pipeline (Image by Author)

The advantages of EasyNMT over the HF pipeline for machine translation are:

  • EasyNMT supports automatic language detection, so we need not specify the source language.
  • There is no need to load the tokenizer and the fine-tuned model explicitly; EasyNMT automatically loads both for the given model_name.
  • EasyNMT supports document translation: when the source sequence is longer than the model's maximum sequence length, it is split into sentences and each sentence is translated.
  • By default, the HF pipeline runs on the CPU, and we have to set the device explicitly to use a GPU. EasyNMT automatically uses a GPU if one is available.

💥 Template for Machine Translation using EasyNMT Library

Now let us look at the basic template for NMT inference with the EasyNMT library.

  • Step 1: Create a model object by passing the model name to the EasyNMT class.

Based on the given model name, EasyNMT downloads the model and then loads it using Hugging Face Transformers classes such as AutoModelForSeq2SeqLM.

  • Step 2: Translate the source text by calling translate() on the model object.

We pass the source text and the target language ID to translate(). Since EasyNMT detects the source language automatically, we need not specify the source language ID.
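The two steps can be sketched as follows, assuming `easynmt` is installed. `load_model` and `translate_text` are hypothetical helper names for this illustration; the `easynmt` import sits inside `load_model` so the helpers can be defined even where the library is absent. Stripped of the helpers, this is the three-line recipe: import `EasyNMT`, create the model, call `translate()`.

```python
def load_model(model_name: str = "opus-mt"):
    """Step 1 (hypothetical helper): create the EasyNMT model object.

    The model weights are downloaded on first use.
    """
    from easynmt import EasyNMT  # imported here; requires `pip install -U easynmt`
    return EasyNMT(model_name)


def translate_text(model, text: str, target_lang: str = "en") -> str:
    """Step 2 (hypothetical helper): translate `text` into `target_lang`.

    target_lang is an ISO code such as "en". source_lang is deliberately
    omitted, so EasyNMT detects the source language itself.
    """
    return model.translate(text, target_lang=target_lang)


# Usage:
#   model = load_model("opus-mt")
#   print(translate_text(model, "Das ist ein kleiner Test.", target_lang="en"))
```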

In this way, we can do SOTA machine translation with EasyNMT in just a few lines of code, which is much easier than using the Hugging Face pipeline.

💥 Machine Translation using EasyNMT Library

Now let us see how to use EasyNMT to translate text with SOTA models such as OPUS-MT, mBART50 and M2M100.

Since EasyNMT supports automatic language detection, we need not specify the source language during translation; EasyNMT identifies it automatically.
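To inspect the detected language yourself, detection can be run as a separate step. A sketch, assuming the EasyNMT model object offers a `language_detection()` method returning an ISO code such as `"hi"` and that `translate()` accepts an optional `source_lang` (both present in the EasyNMT versions we are aware of); `detect_then_translate` is a hypothetical helper.

```python
from typing import Tuple


def detect_then_translate(model, text: str, target_lang: str = "en") -> Tuple[str, str]:
    """Hypothetical helper: return (detected source language code, translation)."""
    # Detect the source language first, e.g. "hi" for Hindi text.
    source_lang = model.language_detection(text)
    # Pass it on explicitly; omitting source_lang would let EasyNMT detect it itself.
    translation = model.translate(text, source_lang=source_lang, target_lang=target_lang)
    return source_lang, translation


# Usage (requires `pip install -U easynmt`):
#   from easynmt import EasyNMT
#   model = EasyNMT("opus-mt")
#   lang, english = detect_then_translate(model, "मुझे किताबें पढ़ना पसंद है।")
```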

EasyNMT also sets the device automatically: if a GPU is available, the models run on the GPU; otherwise, they run on the CPU. So we need not set the device explicitly.
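The default device rule can be written as a one-line function. `pick_device` and `load_model` are hypothetical helpers, and `load_model` assumes the `EasyNMT` constructor accepts a `device` argument (for example `"cpu"`) to override the automatic choice, which is how we understand the library's API.

```python
from typing import Optional


def pick_device(cuda_available: bool) -> str:
    """Hypothetical helper: use the GPU ("cuda") if one is available, else the CPU."""
    return "cuda" if cuda_available else "cpu"


def load_model(model_name: str = "opus-mt", device: Optional[str] = None):
    """Hypothetical helper: load an EasyNMT model.

    device=None keeps EasyNMT's automatic device choice.
    """
    from easynmt import EasyNMT  # requires `pip install -U easynmt`
    if device is None:
        return EasyNMT(model_name)
    # Assumed constructor argument for overriding the automatic choice.
    return EasyNMT(model_name, device=device)


# Usage:
#   import torch
#   print(pick_device(torch.cuda.is_available()))   # the device the default rule picks
#   cpu_model = load_model("opus-mt", device="cpu")  # force CPU, e.g. for debugging
```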

🔥 Install and import libraries

First, install the EasyNMT and Sacremoses libraries.

Then import the required class, EasyNMT, from the EasyNMT library.
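A sketch of this step: the `pip` command is a comment because it runs in a shell rather than in Python, and `missing_packages` is a hypothetical helper that uses the standard library's `importlib.util.find_spec` to report which of the packages used in this post (`torch`, `easynmt`, `sacremoses`) are not installed yet.

```python
# Install first (in a shell):  pip install -U easynmt sacremoses
import importlib.util
from typing import Iterable, List

REQUIRED = ("torch", "easynmt", "sacremoses")


def missing_packages(names: Iterable[str] = REQUIRED) -> List[str]:
    """Hypothetical helper: return the packages that cannot be found here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("Ready: from easynmt import EasyNMT")
```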

Consider the following sentence in Hindi. Let us see how the SOTA MT models translate this Hindi sentence into English.

🔥 OPUS-MT
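A sketch of the OPUS-MT case, assuming `easynmt` and `sacremoses` are installed. OPUS-MT models are trained per language pair, so EasyNMT needs the source language to pick the right one (for Hindi to English we expect a checkpoint such as `Helsinki-NLP/opus-mt-hi-en`); it can detect the language itself or take `source_lang` explicitly. `translate_opus` is a hypothetical helper, and `HINDI_TEXT` is our own example sentence, not the one used in this post.

```python
from typing import Optional

# Our own illustrative sentence ("I like reading books."), not the post's example.
HINDI_TEXT = "मुझे किताबें पढ़ना पसंद है।"


def translate_opus(model, text: str, target_lang: str = "en",
                   source_lang: Optional[str] = None) -> str:
    """Hypothetical helper: translate with an OPUS-MT EasyNMT model.

    source_lang=None lets EasyNMT detect the language; passing e.g. "hi"
    skips detection when the source language is already known.
    """
    if source_lang is None:
        return model.translate(text, target_lang=target_lang)
    return model.translate(text, source_lang=source_lang, target_lang=target_lang)


# Usage (requires `pip install -U easynmt sacremoses`):
#   from easynmt import EasyNMT
#   model = EasyNMT("opus-mt")
#   print(translate_opus(model, HINDI_TEXT))                    # source detected
#   print(translate_opus(model, HINDI_TEXT, source_lang="hi"))  # source given
```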

🔥 mBART50_m2m

Here we want to translate the Hindi sentence into English. The mbart50_m2m model is mBART-50 fine-tuned for many-to-many translation, so it can translate from any of its supported languages into any other. We can therefore use mbart50_m2m to translate the Hindi sentence into English.
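Because this model is many-to-many, one loaded model can serve several target languages. A sketch with a hypothetical `translate_to_many` helper; the target codes `"en"`, `"fr"` and `"de"` are our own choices, assumed to be among mBART-50's languages.

```python
from typing import Dict, Iterable


def translate_to_many(model, text: str, target_langs: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: translate one text into several target languages."""
    # A many-to-many model needs no reloading between target languages.
    return {lang: model.translate(text, target_lang=lang) for lang in target_langs}


# Usage (requires `pip install -U easynmt`):
#   from easynmt import EasyNMT
#   model = EasyNMT("mbart50_m2m")
#   outputs = translate_to_many(model, "मुझे किताबें पढ़ना पसंद है।", ["en", "fr", "de"])
```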

🔥 mBART50_m2en

Here we want to translate the Hindi sentence into English. The mbart50_m2en model is mBART-50 fine-tuned for many-to-one translation: it translates from any of its supported languages into English. Since our target language is English, we can use mbart50_m2en to translate the Hindi sentence.
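`translate()` also accepts a list of texts, which suits the many-to-English model when several Hindi sentences need translating. A sketch with a hypothetical `translate_batch_to_english` helper, assuming `translate()` returns a list of translations in input order when given a list.

```python
from typing import List


def translate_batch_to_english(model, texts: List[str]) -> List[str]:
    """Hypothetical helper: translate several texts into English in one call.

    translate() returns a list when given a list; the many-to-English model
    only produces English, so target_lang is fixed to "en".
    """
    return model.translate(texts, target_lang="en")


# Usage (requires `pip install -U easynmt`):
#   from easynmt import EasyNMT
#   model = EasyNMT("mbart50_m2en")
#   english = translate_batch_to_english(model, ["मुझे किताबें पढ़ना पसंद है।", "नमस्ते"])
```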

🔥 M2M100_418M

Here we want to translate the Hindi sentence into English. The M2M100 models can translate between any pair of their supported languages, so we can use the m2m_100_418M model for this.
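Decoding settings can be passed through `translate()` as well. A sketch assuming its `beam_size` parameter; `translate_with_beam` is a hypothetical helper, and the default of 5 is our choice of a typical beam width.

```python
def translate_with_beam(model, text: str, target_lang: str = "en", beam_size: int = 5) -> str:
    """Hypothetical helper: translate with an explicit beam size.

    beam_size=1 means greedy decoding (fastest); larger beams keep more
    candidate translations alive during decoding, at extra compute cost.
    """
    if beam_size < 1:
        raise ValueError("beam_size must be at least 1")
    return model.translate(text, target_lang=target_lang, beam_size=beam_size)


# Usage (requires `pip install -U easynmt`):
#   from easynmt import EasyNMT
#   model = EasyNMT("m2m_100_418M")
#   fast = translate_with_beam(model, "मुझे किताबें पढ़ना पसंद है।", beam_size=1)
#   wider = translate_with_beam(model, "मुझे किताबें पढ़ना पसंद है।", beam_size=5)
```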

🔥 M2M100_1.2B

Here we want to translate the Hindi sentence into English. M2M100_1.2B is the larger M2M100 model and can likewise translate between any pair of its supported languages, so we can also use m2m_100_1.2B for this.
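Since all five models are loaded by name, the model sections can be condensed into one loop that translates the same sentence with each model. A sketch: `compare_models` is a hypothetical helper that takes any loader callable (the `EasyNMT` class itself works), and loading every model in turn needs time and disk space, the 1.2B model especially.

```python
from typing import Any, Callable, Dict

# EasyNMT model names for the five models covered in this post.
MODEL_NAMES = ["opus-mt", "mbart50_m2m", "mbart50_m2en", "m2m_100_418M", "m2m_100_1.2B"]


def compare_models(load_model: Callable[[str], Any], text: str,
                   target_lang: str = "en") -> Dict[str, str]:
    """Hypothetical helper: translate one text with each model, keyed by name."""
    results: Dict[str, str] = {}
    for name in MODEL_NAMES:
        model = load_model(name)  # may download weights on first use
        results[name] = model.translate(text, target_lang=target_lang)
        del model  # release this model before loading the next one
    return results


# Usage (requires `pip install -U easynmt sacremoses`):
#   from easynmt import EasyNMT
#   for name, output in compare_models(EasyNMT, "मुझे किताबें पढ़ना पसंद है।").items():
#       print(f"{name}: {output}")
```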

💥 Document Translation using EasyNMT Library

With the HF pipeline, a source sequence longer than the maximum sequence length is truncated to that length before translation. With EasyNMT, such a sequence is instead split into sentences, and each sentence is translated; this is referred to as document translation.

Consider the following English text. We will now use EasyNMT to translate the entire text into Hindi.

We can observe that the entire English text is translated into Hindi without any truncation.
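A sketch of the call behind this example, assuming `translate()`'s `perform_sentence_splitting` parameter (enabled by default, as we understand the API); setting it to `False` sends the text to the model in one piece, which makes the effect of document translation easy to compare. `translate_long_text` is a hypothetical helper, and the model name in the usage comment is our choice, not necessarily the one used in this post.

```python
def translate_long_text(model, text: str, target_lang: str = "hi",
                        split_sentences: bool = True) -> str:
    """Hypothetical helper: translate a long text, optionally sentence by sentence.

    With split_sentences=True EasyNMT breaks the text into sentences that are
    translated one by one and joined; False sends the text to the model whole.
    """
    return model.translate(text, target_lang=target_lang,
                           perform_sentence_splitting=split_sentences)


# Usage (requires `pip install -U easynmt`; `long_english_text` holds the passage above,
# and the model choice here is illustrative):
#   from easynmt import EasyNMT
#   model = EasyNMT("m2m_100_418M")
#   hindi = translate_long_text(model, long_english_text)
```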

This blog post was originally published on my personal website. Feel free to connect with me on Twitter or LinkedIn.

I’m Katikapalli Subramanyam Kalyan (Kalyan KS for short), an NLP researcher with 5+ years of academic research experience. Apart from research papers in top-tier medical informatics journals and in EMNLP and AACL-IJCNLP workshops, I have written two survey papers on transformer-based pretrained language models, which have received 35+ citations, including citations from top-tier institutes such as the University of Oxford, University of Texas, Michigan State University, NTU Singapore and IIT Madras.

NLP researcher with publications in top-tier journals and top-conference (EMNLP, AACL-IJCNLP) workshops. EMNLP 2022 reviewer. Website: https://mr-nlp.github.io