The past, present, and future of machine translation

Is it really neural machine translation vs. large language models? Can we have the best of both worlds?

Flitto
Flitto DataLab
7 min read · Jun 9, 2023



When neural machine translation (NMT) was first introduced to the market in 2016, it was a genuine technological breakthrough. To this day, it stands as one of the most advanced forms of translation technology.

Neural machine translation vs. Statistical machine translation

As its name suggests, what distinguishes neural machine translation is its ability to mimic the way the human brain processes language. It can capture how some words are more similar to each other than others, and it takes the entire sentence into consideration when generating a translation.

Until the arrival of NMT, statistical machine translation (SMT) was the dominant approach. Unlike NMT models, SMT could only digest a few words at a time when translating sentences, which often resulted in literal or awkward translations. Moreover, the separate components that make up an SMT pipeline lacked interdependence, limiting the flexibility of the system as a whole.

Neural machine translation vs. “Generative AI”?

A few months ago, ChatGPT’s ability to process human language, including translation, made an explosive viral debut and took the scene by storm.

With multifunctional chatbots like ChatGPT and Bard appearing on the market, “generative AI” has steadily become a household name. This has fueled a possible misconception (or misnomer) that generative AI is a brand-new technology poised to threaten, or even replace, neural machine translation models.

However, it is important to note that neural machine translation is, by definition, a form of generative AI. It relies on deep learning and is trained on massive datasets to produce the most probable and appropriate output. Chatbot-based generative AI models use the same mechanism; they simply generate “replies” instead of translations. When given a prompt (i.e. text in the source language), an NMT system leverages what it has learned from its training data to predict and generate new output (i.e. text in the target language).
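
To make that concrete, here is a minimal sketch of NMT as a generative process, assuming the openly available Helsinki-NLP/opus-mt models and the Hugging Face transformers library; any comparable NMT toolkit works the same way.

```python
# A minimal sketch of NMT as a generative process, assuming the openly
# available Helsinki-NLP/opus-mt models and the Hugging Face transformers
# library ("pip install transformers sentencepiece").
from transformers import pipeline

# Load a pretrained English-to-French NMT model (an encoder-decoder network).
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

# The source sentence acts as the "prompt": the model predicts the most
# probable target-language tokens, one step at a time, based on what it
# learned from its training data.
result = translator("Machine translation has come a long way since 2016.")
print(result[0]["translation_text"])
```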

For clarity’s sake, it is more accurate to compare conventional neural machine translation models with large language model-based machine translation.

Similarities between NMT and LLM-based translations

Machine translation systems, regardless of their form and architecture, leverage a specialized type of data called parallel corpora. A parallel corpus consists of strings of text in one language aligned with their corresponding translations in the target language. These parallel corpora are the building blocks of machine translation systems: they provide examples of how words relate to their multilingual counterparts and enable the systems to apply that knowledge when translating.

Example of what parallel corpora data may look like
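
Since the original image is not reproduced here, a small illustration: a handful of aligned segment pairs, with sentences invented purely for illustration, might look like this.

```python
# Illustrative only: a tiny English-Korean parallel corpus represented as a
# list of aligned segment pairs. Real corpora contain millions of such pairs.
parallel_corpus = [
    {"source": "How do I get to the station?", "target": "역에 어떻게 가나요?"},
    {"source": "The meeting starts at 3 p.m.", "target": "회의는 오후 3시에 시작합니다."},
    {"source": "Please keep the door closed.", "target": "문을 닫아 주세요."},
]

for pair in parallel_corpus:
    print(pair["source"], "->", pair["target"])
```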

Both neural machine translation systems and large language models utilize this type of data when carrying out translation tasks. This means that their capabilities can be both empowered and inhibited by similar data-level factors:

  • More diverse, accurate, and rich parallel corpora datasets mean better translation capabilities.
  • There is a disparity in output quality between high-resource and low-resource languages, due to the inherent challenge of securing trainable parallel corpora.
  • Depending on the domain or task, the models can be fine-tuned on specialized datasets to deliver more specific translation results.

Parallel corpora datasets can be built from various sources. It is common to derive them from multilingual documents or websites, though such sources can be subject to copyright or ethical issues, so it is important to proceed with caution. For specialized domains or languages, it is also possible to custom-collect and refine parallel corpora according to what the engine needs.
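
As a simplified sketch of that process, one might pair up sentences from a document and its translation by position. The file names below are placeholders, and real pipelines use dedicated sentence aligners, quality filters, and licensing checks.

```python
# A simplified sketch of deriving parallel pairs from a document and its
# translation. Alignment here is naive (by line position); real pipelines use
# dedicated sentence aligners, quality filters, and licensing checks.
# The file names are placeholders.
def build_parallel_pairs(source_path: str, target_path: str):
    with open(source_path, encoding="utf-8") as src:
        src_lines = [line.strip() for line in src if line.strip()]
    with open(target_path, encoding="utf-8") as tgt:
        tgt_lines = [line.strip() for line in tgt if line.strip()]
    # Pair lines up only as far as both documents have content.
    return list(zip(src_lines, tgt_lines))

pairs = build_parallel_pairs("article_en.txt", "article_fr.txt")
for source, target in pairs[:5]:
    print(source, "->", target)
```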

Next, let us look at the differences between the two translation tools.

Neural machine translation vs. large language model-based translation

We have established that standalone neural machine translation systems and large language model-based translation utilize similar mechanisms as well as overlapping types of data during translation processes. If so, why did people come to compare neural machine translation against large language models? Is one better than the other — and if so, which system should we use for better results?

The two offer distinct characteristics and strengths. Neural machine translation (NMT) systems are:

  • Designed solely for the purpose of carrying out translation tasks;
  • Dominantly trained in a supervised manner, meaning that the training datasets have been appropriately labeled (e.g. parallel corpora); and
  • Often trained to prioritize avoiding wrong outputs (accuracy over creativity).

Meanwhile, large language models are:

  • Designed to carry out diverse tasks related to languages, not limited to translation;
  • Not limited to supervised learning, meaning they can be trained on vast amounts of unlabeled data, with labeled data comprising only a portion of their training material; and thus
  • Able to leverage non-parallel corpora data when producing translation results (which means there are more unexpected variables).

These differences give users options. For instance, users may opt for NMT when they need straightforward translations with higher accuracy. On the other hand, large language models can be useful when there is some room for creativity in the translation output.
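
As a rough illustration of how the two are typically used: an NMT engine is called directly on the source text, while an LLM is steered toward translation through a prompt. The model name below is just an example, and the prompt would be sent to whichever chat-based LLM is in use.

```python
# A rough illustration of how the two are typically used.

# 1) NMT: a dedicated encoder-decoder model that only translates.
from transformers import pipeline

nmt = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")  # example model
print(nmt("The contract must be signed by Friday.")[0]["translation_text"])

# 2) LLM: a general-purpose model steered toward translation via a prompt.
#    The prompt below would go to whichever chat-based LLM is in use, and can
#    also ask for a particular tone, register, or degree of creative license.
prompt = (
    "Translate the following English sentence into German, "
    "keeping the formal, contractual tone:\n\n"
    "The contract must be signed by Friday."
)
```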

Translation quality also varies greatly among engines built by different companies, depending on dataset size, data quality, and model structure, which is another factor to consider.

Moving forward — Synthesis of NMT and LLM

In addition to accuracy and fluency, relevance and timeliness play crucial roles in good machine translation. Accuracy ensures the translation is correct and fluency ensures it reads smoothly; relevance ensures the translation aligns with the intended meaning and context of the source text, and timeliness ensures translations are delivered when they are needed.

To address these weaknesses and enhance relevance in machine translation, a multi-faceted approach is ideal: one that combines the strengths of different translation technologies, such as neural machine translation (NMT) and large language models (LLMs), into a more comprehensive translation tool.

By leveraging the capabilities of both NMT and LLMs, machine translation systems can benefit from the accuracy and context-awareness of NMT and the diverse linguistic knowledge and creativity of LLMs. Moreover, regular updates in training datasets will help ensure that the translation models stay current and reflect the evolving language usage and nuances.
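
One common way to combine the two, sketched below as a generic example rather than a description of any specific product, is to let an NMT engine produce an accurate draft and then have an LLM post-edit it for fluency and context. The llm_complete function is a placeholder for whatever LLM completion interface is available.

```python
# A generic sketch (not a description of any specific product): draft with an
# NMT engine, then ask an LLM to post-edit for fluency and context.
from transformers import pipeline

nmt = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

def hybrid_translate(source_text: str, llm_complete) -> str:
    """Draft with NMT, then post-edit with an LLM.

    `llm_complete` is a placeholder for whatever LLM completion function is
    available (for example, a thin wrapper around a chat API).
    """
    draft = nmt(source_text)[0]["translation_text"]
    prompt = (
        "You are a professional French editor. Improve the fluency of this "
        "machine translation without changing its meaning.\n\n"
        f"Source (English): {source_text}\n"
        f"Draft translation: {draft}\n"
        "Improved translation:"
    )
    return llm_complete(prompt)
```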

Real life example of synthesized machine translation

Flitto’s new cutting-edge translation service, Flitto AI+, is an example of a synthesis of the two technologies. It leverages both a standalone NMT engine and an LLM, taking the best of both worlds.

Flitto’s proprietary NMT engine, regularly updated with high-quality parallel corpora collected and vetted on Flitto’s language platform of 13 million users worldwide, serves as an effective fine-tuning mechanism for the advanced, human-like fluency enabled by a large language model.

A snapshot of translation by Flitto AI+

The resulting platform showed powerful performance. Flitto AI+, currently in beta, scored up to 41 percent higher than existing standalone NMT and LLM systems in a blind survey conducted among linguists and language professionals. The results held up across diverse source text domains such as marketing, poetry, and technical texts, and the service especially excelled at translating idioms and wordplay.

Moving on…

Both neural machine translation and large language models have greatly transformed the field of machine translation. While their performances both rely on parallel corpora datasets, they offer distinct advantages depending on the translation requirements. The synthesis of these technologies can lead to enhanced translation performance, offering accuracy, fluency, relevance, and creativity in machine translation outputs.

While machine translation cannot replace human translators and thorough localization processes, it can be a great aid for everyday instances where a simple, accurate translation can save the day.

Flitto DataLab offers solutions across the data-as-a-service lifecycle of machine translation engines, from high-quality off-the-shelf parallel and monolingual corpora to customized dataset collection. As a leading language data solution company, we look forward to playing a part in advancing the cutting-edge technologies to come.

  1. Grab a copy of that one foreign article, recipe, or interview that could really scratch the itch in your mind
  2. Click here to translate it using Flitto AI+ Beta (no payment required)
  3. Enjoy!

To find out more about the parallel corpora data solutions that made Flitto AI+ possible, click here.
