June 2020: a deep dive into GPT-3, a webinar on Wednesday not to be missed, Intento plugins get a refresh and a comprehensive digest of essential MT news.

Natalia Rice
Intento
7 min read · Jun 24, 2020

Hello!

It looks like the world is getting back to normal, and we hope you personally and your business are keeping well.

A lot has been happening in the MT world, and we’re excited to share the latest with you. Since we haven’t been in touch in a while, this email is longer than usual!

Here’s a short recap:

  • 🧐 A full overview of GPT-3: a long read of the month
  • 🛠 Intento plugins to start using today
  • 🖥 June 24th: Language Intelligence Solutions webinar
  • 🧰 Intento gets a patent
  • 📱 Updates from Google, DeepL, Microsoft, and more
  • 📗 Other essential MT reads

Let’s dive in.

GPT-3: A Deep Dive

Following the paper published by OpenAI, Intento’s CTO provides a deep dive into the GPT-3 family of models.

In this 11-min read, you’ll get a detailed overview of the Generative Pretrained Transformer: its history, architecture, approach, applications, and comparison to GPT-2. Most importantly, it answers the question: how much does it cost to train your own?

Read the article

Intento Plugins: the newest releases

1. Microsoft Excel

We have published another handy add-in for MS Office, this time for Microsoft Excel.

The first version supports translating cell contents and whole sheets. Also, you don’t have to bother with API keys: you can sign in to your Intento account right from the add-in.

As usual, you can pick any of the 20+ MT engines we support or rely on our educated guess via the “Smart Routing” feature. In the screenshot below, it went to DeepL.

Get it from Microsoft AppSource

Here’s a short video to help you get started.

2. Chrome Extension v.0.2.2

Translate your whole website with just one click, using best-of-breed machine translation. As simple as that!

What’s new:

  • Translates even more websites
  • Renewed list of providers
  • A better user experience

Get version 0.2.2 here

3. Microsoft Outlook v. 2.0

Intento Translator v. 2.0 boasts the following features:

  • Seamless sign-in/sign-up
  • Languages and provider capabilities are shown for non-signed-up users
  • Brand new “Economy” mode
  • “Auto translation” option is now available
  • The subject of a message is now translated

Get your brand new connector via AppSource.

4. MemoQ and Trados

Version 2.2.0 is now available.

Connect your own provider accounts easily with the new Connected Accounts feature. Just add your keys via Intento Console and use them in the plugin right away, with no need to fill in any other credentials.

5. MuleSoft

We have published the first version of our MuleSoft Anypoint Connector, which makes all the world’s best Machine Translation systems available to ServiceNow, Jira, SAP, and other systems on Salesforce’s MuleSoft integration platform!

It’s not too late to join!

Have you ever wondered how to evaluate your MT ROI? Plenty of insights on the subject will be shared at the upcoming event on June 24th by LT-Innovate. Intento CEO Konstantin Savenkov will be talking about the procurement and deployment of language technologies. See you there!

Intento receives a patent from the USPTO

We’re happy to announce Intento now has a patent for “Intent-based organization of APIs”.

The patent describes a services platform for routing intent-based API requests to the most relevant API.

At the core of it is the idea that the universe of different APIs can be grouped into a hierarchy of intents according to their goals. One intent may be implemented using a mix of APIs. Once they’re tailored to an intent, they look similar to a user. This makes it possible to evaluate those APIs in relation to the intent and to use each of them in the context where it performs best.
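For the technically curious, here is a minimal sketch of the general idea in Python. It is an illustration only, not the patented design or the Intento API: the `Provider` and `IntentRouter` classes, the provider names, and the quality scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Provider:
    """A hypothetical API wrapped to serve one intent."""
    name: str
    call: Callable[[str], str]
    scores: Dict[str, float]  # context (e.g. a language pair) -> evaluated quality


class IntentRouter:
    """Groups providers by intent and picks the best one for a given context."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[Provider]] = {}

    def register(self, intent: str, provider: Provider) -> None:
        self._providers.setdefault(intent, []).append(provider)

    def route(self, intent: str, context: str, payload: str) -> Tuple[str, str]:
        candidates = self._providers.get(intent, [])
        if not candidates:
            raise KeyError(f"no provider registered for intent {intent!r}")
        # Choose the provider with the highest evaluated score for this context.
        best = max(candidates, key=lambda p: p.scores.get(context, 0.0))
        return best.name, best.call(payload)


# Made-up providers and scores, purely for illustration.
router = IntentRouter()
router.register("text/translate", Provider(
    "provider_a", lambda text: f"[A] {text}", {"en-de": 0.9, "en-ja": 0.6}))
router.register("text/translate", Provider(
    "provider_b", lambda text: f"[B] {text}", {"en-de": 0.7, "en-ja": 0.8}))

print(router.route("text/translate", "en-de", "hello"))
print(router.route("text/translate", "en-ja", "hello"))
```

Because every provider behind an intent exposes the same call shape, the caller never needs to know which one handled the request; only the per-context scores decide.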

MT PROVIDER NEWS

DeepL

The company has announced “a quantum leap in translation quality” — the next generation MT technology. As with any MT upgrade, it comes without any prior notice or release notes on what has been changed.

Make sure to check what changed on your data or use our AI Quality Monitoring that tracks changes in AI models you use in production.

DeepL has also launched Japanese, Chinese, and Brazilian Portuguese.

Also, the DeepL translator can now be customized with glossaries to achieve better results in specific domains. It does not seem to be available via API yet; let’s wait and pray for a good developer experience.

There’s also a new paragraph 7.4 in DeepL Terms of Service that governs the usage of the Training Data — a good sign we’ll see Custom NMT from DeepL soon — exciting!

IBM

IBM Translator added a number of new languages back in February: Latvian, Urdu and Vietnamese.

Later in April, they expanded to another 6: Bengali — বাংলা, Gujarati — ગુજરાતી, Malayalam — മലയാളം, Maltese — Malti, Tamil — தமிழ், Telugu — తెలుగు

In May, another two: Nepali — नेपाली and Sinhala — සිංහල, and Ukrainian — Українська in June.

Google

Google is catching up with Microsoft and Yandex on low-resource languages. In February, they added five new languages: Kinyarwanda (Rwanda, 9.8M speakers), Odia (ଓଡ଼ିଆ, India, 33M speakers), Tatar (Татар, Russia, 6.8M speakers), Turkmen (Türkmençe, Turkmenistan, 4M speakers), and Uyghur (ئۇيغۇرچە, Western China, 11M speakers).

In this article, Google Translate shows some recent progress that has been made in translation quality for supported languages, especially for those that are low-resource. Their ideas on using back-translation to generate synthetic training data and multilingual modelling finally got to production. The good news is Intento can switch the routing in our MT HUB in a snap!

Microsoft

Microsoft is well-known for its attention to low-resource languages and making Machine Translation available for underrepresented groups.

Our API documentation monitoring robot detected that Microsoft Translator rolled out support for the pIqaD script. Our Klingon linguist is on vacation at Kronos without email access at the moment, so please check the quality yourself.

Later in April, Microsoft also added support for Gujarati (ગુજરાતી, India, 56M speakers) and Marathi (मराठी, India, 83M speakers). In June, another one followed: Kazakh (қазақ тілі, Central Asia, 22M speakers).

PROMT

PROMT added support for German <> Portuguese, most likely without a pivot (check and let us know what you think!). Also, in May they added English to Czech, Slovak, Serbian (Latin) and back. In June, they also got Armenian and Georgian.

Systran PNMT

Systran now supports Russian <> Hindi, German <> Czech, Russian <> Czech, Ukrainian <> Czech, and Vietnamese <> Czech.

New MT providers on our radar

Essential Reads

  • Google has published a paper on BLEURT, a better BLEU-like score based on multilingual embeddings (BERT) https://arxiv.org/abs/2004.04696. The code is already on GitHub https://github.com/google-research/bleurt. The score is pre-trained on synthetic data (backtranslations with perturbations) and can be further trained on WMT or any other LQA scores. We will investigate how well it works and how well it correlates with the human LQA results we see in the real-world industrial MT evaluation projects we run. Stay tuned!
  • It turns out Machine Translation can be used to create training corpora for other NLP models in non-English languages. In this paper, the authors evaluated Turkish and Amazon Translate, and it worked like a charm. https://arxiv.org/abs/2004.14963 This may offer a hint for the typical dilemma: “should we use MT for chatbots during training or during production?”
  • Facebook AI announced CCMatrix — the largest parallel dataset for training and evaluating MT models https://ai.facebook.com/blog/ccmatrix-a-billion-scale-bitext-data-set-for-training-translation-models/

All the best and stay well,

Intento Team

To sign up for our monthly newsletter, please register at https://inten.to
