The State of Machine Translation 2020, new COVID report, Intento for XTM users, 10 ways to optimize text for machine translation, and more

Natalia Rice
Intento
Sep 24, 2020


Hi, network!

Hope you’ve adapted to the COVID-affected work environment, and your business is going well. It’s been great to witness how creative the localization industry has been in moving its events online. From a very successful LocWorld event in July and popular TAUS webinars to upcoming AMTA and GlobalSaké events in October, we are thrilled to be part of the community.

📕 The State of Machine Translation 2020

🦠 COVID special: machine translation performance in the COVID domain

🖥 Intento for XTM users

🤓 10 ways to optimize text for machine translation

🌇 Events in October not to be missed

🧰 Essential industry news and reads

The State of Machine Translation 2020

Building on the reports published in 2018 and 2019, the 2020 report goes one step further. We partnered with TAUS, who provided domain-specific data for 16 industry sectors. On top of that, we analyzed 15 commercial machine translation engines and 14 language pairs, which altogether made 128 combinations.

All these efforts have been made to answer the following questions:

  • Does the performance of an MT engine depend on the domain or content type?
  • Which industries show the highest MT quality across the board?
  • Which MT engine is right for me, and how many do I need?

Get your free copy of the report here.

COVID special: MT evaluation

We had hoped that by the time we published this research, the pandemic would have been a thing of the past, but sadly it’s still here. We’ve run this study to evaluate the performance of machine translation engines — both stock and custom NMT — for COVID-related content.

This evaluation has been implemented with two goals in mind:

  • Evaluate what MT engines work best for COVID-related content in different language pairs.
  • Share our approach to custom NMT training, identifying the steps to take and the pitfalls that may occur. Custom NMT training is costly, and a solid ROI is essential for anyone undertaking it. With this in mind, we present our methodology, the challenges we faced, and the factors that influenced the outcomes of this study.

The evaluation covers seven language pairs. Find the full results here.

Intento brings its magic to XTM users

We’re happy to introduce a new addition to the family of Intento TMS connectors: the XTM connector. With so many happy customers using our memoQ and Trados Studio plugins, it was only natural to bring this new tool to language professionals working in XTM.

The new plugin can be used in simple or advanced mode.

In the simple workflow scenario, a project manager using XTM can trigger Intento by selecting a particular custom field in the settings. The translation is then applied at the translation workflow stage.

Advanced users, however, can create intricate and nuanced workflows such as “Apply Intento-routed MT to all segments in Customer X’s projects with TM matches below 75%, show MT results + TM matches between 75% and 90%, and then move the document to the next workflow stage”.
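To make the advanced example concrete, here is a minimal Python sketch of the per-segment decision that rule describes. It is not the XTM or Intento configuration format: `Action`, `Segment`, `route_segment`, and `plan_document` are hypothetical names, the 75% to 90% band is assumed to be inclusive at both ends, and keeping the TM match above 90% is an assumption, since the example rule does not say what happens there.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Action(Enum):
    APPLY_MT = "apply Intento-routed MT"
    SHOW_MT_AND_TM = "show MT result together with the TM match"
    KEEP_TM = "keep the TM match"  # assumption: the example rule is silent above 90%


@dataclass
class Segment:
    source: str
    tm_match: int  # TM match score in percent, 0-100


def route_segment(tm_match: int, low: int = 75, high: int = 90) -> Action:
    """Choose an action for one segment from its TM match score (hypothetical rule)."""
    if not 0 <= tm_match <= 100:
        raise ValueError(f"TM match must be between 0 and 100, got {tm_match}")
    if tm_match < low:
        return Action.APPLY_MT
    if tm_match <= high:  # assumption: the 75-90% band is inclusive at both ends
        return Action.SHOW_MT_AND_TM
    return Action.KEEP_TM


def plan_document(customer: str, segments: List[Segment],
                  target_customer: str = "Customer X") -> Optional[List[Action]]:
    """Per-segment actions for a document, or None if the rule does not cover this customer.

    Once the actions are applied, the workflow would move the document to the next stage.
    """
    if customer != target_customer:
        return None
    return [route_segment(seg.tm_match) for seg in segments]


doc = [Segment("Hello", 40), Segment("Save file", 80), Segment("Cancel", 100)]
print([a.name for a in plan_document("Customer X", doc)])
# ['APPLY_MT', 'SHOW_MT_AND_TM', 'KEEP_TM']
```

Keeping the thresholds as parameters means the same function could express another customer's rule with different cut-offs.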

Ten ways to optimize text for machine translation

The way you prepare your source text for machine translation determines the results you get. We’ve been evaluating the performance of machine translation engines for years. This article summarizes our experience (sometimes funny, sometimes curious or painful) with one mission: to help you prepare your source text in a way that yields the best machine translation results.

Read the article.

More Intento product updates

  1. Intento Translator for Excel and Intento Translator for Word have become one! The two popular add-ins have joined forces so that you can boost your productivity with a more seamless translation experience.
  • Stay in Word or Excel, no need to open a new window
  • Instead of just one machine translation engine, access them all at once via smart routing
  • Be sure about the quality of the translation: it is at or above the level of any known MT engine
  • All your data is safe and handled securely
  • Tailor machine translation output to your needs with glossaries, tone of voice, and more — just contact us for a bespoke solution

2. Intento’s Chrome Extension has been updated.

Here’s what’s new:

• You can save precious time by using keyboard shortcuts to get translations without opening the Intento Translator extension each time. Here’s your quick guide to the shortcuts.
• Not only can you read content written in other languages, you can also write it! We have added two-way translation, available in text areas. It gives you a superpower: if you’re chatting with a colleague or a customer from another country, just type your message, push the button, and boom, it’s instantly translated and sent to them in their language.

3. The same applies to Intento Translator for Outlook. We have added translation in compose mode: just write an email in your native language, push the button, and it will be sent to the recipient in their language.

4. SDL released SDL Trados Studio 2021 in August, and we’re happy to confirm that Intento has already released a brand-new plugin for it. Get yours so your work won’t be interrupted.

Events in October to reconnect virtually

AMTA 2020 (October 8)

Join us at AMTA! We’ll be exhibiting virtually during the event, so stop by to say hi! Our CEO Konstantin Savenkov will be talking about building a multi-purpose MT portfolio on October 8 at 2:30 pm PT.

GlobalSaké (October 22)

Another reason to connect is The Future of Function, an upcoming event hosted by GlobalSaké, which is famous for its networking. It’s free if you book in the next 24 hours!

Intento will be talking about AI biases in future-proof machine translation.

News from MT providers

Google presents LaBSE (Language-agnostic BERT Sentence Embedding), a new BERT-based model for multilingual sentence embeddings.

It produces language-agnostic cross-lingual sentence embeddings for 109 languages. The model is trained on 17 billion monolingual sentences and 6 billion bilingual sentence pairs using MLM (masked language model) and TLM (translation language model) pre-training, which makes it effective even on low-resource languages for which no data was available during training. The model also establishes a new state of the art on multiple parallel text retrieval tasks.
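Parallel text retrieval with sentence embeddings such as LaBSE’s comes down to nearest-neighbor search: sentences in both languages are embedded into the same vector space, and each sentence is paired with the candidate whose vector is most similar, commonly measured by cosine similarity. The sketch below shows only that retrieval step. It does not call LaBSE: the three-dimensional vectors are made-up stand-ins for real embeddings (which are much larger), and `cosine` and `best_match` are hypothetical helper names.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def best_match(query: Sequence[float],
               candidates: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the candidate sentence whose embedding is closest to the query, with its score."""
    if not candidates:
        raise ValueError("no candidates to search")
    scored = ((text, cosine(query, vec)) for text, vec in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins for embeddings a model such as LaBSE would produce (made-up values).
english = {
    "The cat sleeps.": [0.9, 0.1, 0.0],
    "Prices rose sharply.": [0.0, 0.2, 0.95],
}
german_query = [0.85, 0.15, 0.05]  # pretend embedding of "Die Katze schläft."

sentence, score = best_match(german_query, english)
print(sentence)  # The cat sleeps.
```

Because the embeddings are language-agnostic, the same search works in either direction and for any language pair the model covers; at scale, the linear scan would be replaced by an approximate nearest-neighbor index.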

Systran has been acquired by a Korean institutional investor, a move that signals the need for high-quality localization is only going to grow. Systran has also launched its public cloud for PNMT, available via Intento at 11 USD / 1M characters. Over the last couple of months, they added five new language pairs: Spanish → Russian, Ukrainian → Russian, Chinese → German, Chinese → Spanish, and Chinese → Thai.

It’s great to see a new addition to the MT family from Belarus: welcome to the new MT platform Lingvanex!

Promt has added three new language pairs: German ↔ Arabic, German ↔ Turkish, German ↔ Chinese.

IBM added some new language pairs as well: Bosnian ↔ English, Serbian (Cyrillic) ↔ English, Welsh ↔ English.

New languages added by Microsoft:

  • kmr — Northern Kurdish — Kurmanji
  • ku — Kurdish — Kurdî, کوردی
  • or — Oriya — ଓଡ଼ିଆ
  • prs — Dari (Persian) — فارسی
  • ps — Pashto, Pushto — پښتو

Essential reading and watching

  • A deep learning system, CUBBITT (a new Transformer-based deep-learning system), reaches news translation quality comparable to that of human professionals, according to a paper in Nature Communications whose authors include Google’s Łukasz Kaiser, Jakob Uszkoreit of Google Brain Berlin, and Ondřej Bojar of Charles University in Prague.
  • Microsoft Research published a paper on very deep Transformer models for Neural Machine Translation (NMT). Their models have 60 encoder layers and 12 decoder layers, far deeper than the 6 encoder and 6 decoder layers of the standard Transformer. These models are reported to achieve state-of-the-art benchmark results on WMT14 English-French (43.8 BLEU) and WMT14 English-German (30.1 BLEU).
  • A high-quality multilingual dataset for structured documentation translation: a project aimed at improving the use of machine translation in the localization industry, based on the Salesforce online help.
  • China bans the export of certain language technologies. The list includes speech technologies related to corpus design, recording, annotation, and extraction, as well as text prediction.
  • Is this a new branch in machine translation? Facebook AI Research presents neural machine translation without embeddings.
  • NVIDIA has launched new graphics cards that we expect will be great for neural network training.
  • If you’re looking for a weekend video, how about AI and the future of humanity? Extreme scaling: what happens if we really take it to the limit? A talk by the CEO of the Allen Institute for AI.

Happy Translating!

Intento Team
