A quick look at Machine Translation with Amazon Translate

Julien Simon
Dec 21, 2017

Amazon Translate is a new service announced at AWS re:Invent 2017. At the time of writing, it is available in preview. Please consider joining the preview and sending us feedback!

These guys didn’t have Amazon Translate, did they?

Features

This is what Amazon Translate is capable of right now:

  • Real-time translation based on Deep Learning technology,
  • 12 language pairs (more coming): English to and from Arabic, Chinese (simplified), French, German, Portuguese and Spanish,
  • Simplest AWS service ever: only one API :)
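
To make the pair count concrete: six languages, each translated to and from English, give twelve directed pairs. Here is a small sketch that enumerates them. The helper names are mine, and the codes "ar", "de" and "pt" are assumed ISO 639-1 codes (the examples in this post only use "fr", "es", "zh" and "en").

```python
# Languages paired with English during the preview, per the list above.
# Assumption: two-letter ISO 639-1 codes.
OTHER_LANGUAGES = ["ar", "zh", "fr", "de", "pt", "es"]


def supported_pairs():
    """Return every (source, target) pair that has English on one side."""
    pairs = []
    for code in OTHER_LANGUAGES:
        pairs.append(("en", code))  # English -> other language
        pairs.append((code, "en"))  # other language -> English
    return pairs


def is_supported(source, target):
    """Check a pair locally before sending a request."""
    return (source, target) in supported_pairs()


print(len(supported_pairs()))  # 12
```

Note that a pair such as French-German is not in the list: English has to be on one side.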

Let’s try it on a few examples. Please keep in mind that the service is still in preview and that it’s constantly learning: imperfections will quickly be fixed thanks to customer feedback.

French-English

First paragraph of this article in today’s Le Monde:

aws translate translate-text --source-language-code "fr" --target-language-code "en" --text "Une Catalogne fatiguée a commencé à voter, jeudi 21 décembre, lors d’élections régionales exceptionnelles convoquées par le gouvernement de Madrid dans le cadre de la mise sous tutelle de la région. Fatiguée par trois mois de tensions et par les profondes divisions de la société catalane. Fatiguée mais très mobilisée, car on s’attend à un taux de participation qui pourrait dépasser les 75 %. Les bureaux de vote ont ouvert à 9 heures et fermerons à 20 heures."
{
"TargetLanguageCode": "en",
"TranslatedText": "A tired Catalonia began to vote on Thursday 21 December in the exceptional regional elections convened by the Government of Madrid in the context of the implementation of the region. Tired by three months of tension and by the deep divisions of Catalan society. Tired but highly mobilized, as a participation rate could be expected to exceed 75%. The polling stations opened at 9 a.m. and closed at 20 a.m.",
"SourceLanguageCode": "fr"
}

The closing time is a little off (“closed at 20 a.m.” instead of “will close at 8 p.m.”), but the rest of the translation is fine.
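
The same call is available from the AWS SDKs. Here is a minimal sketch with boto3, assuming the library is installed, credentials are configured and your account has access to the preview. The `translate` helper and its optional `client` argument are my own additions (the argument lets you pass a stub for testing); `translate_text` and its `Text`, `SourceLanguageCode` and `TargetLanguageCode` parameters mirror the CLI flags used above.

```python
def translate(text, source, target, client=None):
    """Translate `text` and return only the translated string."""
    if client is None:
        # Assumption: boto3 is installed and AWS credentials are configured.
        import boto3
        client = boto3.client("translate")
    response = client.translate_text(
        Text=text,
        SourceLanguageCode=source,
        TargetLanguageCode=target,
    )
    # The response carries the same keys as the JSON printed by the CLI.
    return response["TranslatedText"]


# Usage (requires AWS access, so it is left as a comment):
# print(translate("Les bureaux de vote ont ouvert à 9 heures.", "fr", "en"))
```

Returning only `TranslatedText` keeps calling code short; if you also need the language codes, return the whole response instead.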

Spanish-English

First paragraph of this article in today’s El Mundo:

aws translate translate-text --source-language-code "es" --target-language-code "en" --text "Las necesidades inmobiliarias y urbanísticas de las sedes corporativas han experimentado una profunda transformación durante los últimos años. Tanto, que en la actualidad lo que buscan las compañías son edificios vanguardistas, desde los puntos de vista tecnológico y medioambiental, y además prefieren que estén situados en localizaciones buenas y provistas de todo tipo de dotaciones, porque eso facilita la captación de talento."
{
"TargetLanguageCode": "en",
"TranslatedText": "The real estate and development needs of the corporate headquarters have undergone a profound transformation over the past few years. So far, what companies are looking for are avant-garde buildings, from the technological and environmental viewpoints, and they also prefer that they are located in good locations and provided with all kinds of gifted, because that facilitates the capture of talent.",
"SourceLanguageCode": "es"
}

I don’t speak Spanish, but the English definitely makes sense to me :)

Chinese-English

First paragraph of this article from the BBC:

aws translate translate-text --source-language-code "zh" --target-language-code "en" --text "近年来中国关于房产税收不收、怎么收的讨论时常出现。周三(12月20日),中国财政部长肖捷在人民日报上发表了文章,明确推进房产税立法和实施,引起广泛讨论。肖捷在文章中称,按照“立法先行、充分授权、分步推进”的原则,推进房地产税立法和实施。对工商业房地产和个人住房按照评估值征收房地产税,适当降低建设、交易环节税费负担。"
{
"TargetLanguageCode": "en",
"TranslatedText": "During the past few years, China has seen a lack of revenue collection and how much of its discussions have taken place in recent years. On Wednesday (December 20), the Minister of Finance of China, Xiao Jie, published an article in the People's Daily, which explicitly advanced the legislation and implementation of property taxes and gave rise to extensive discussions. Xiao Jie said in the article that the legislation and enforcement of real estate tax should be advanced in accordance with the principle of “legislative precedent, full authorization, step-by-step advancement”. The levy of real estate and personal housing for industrial and commercial real estate and personal housing is subject to an assessment of the real estate tax, with appropriate reductions in the cost of construction, trade ring taxes.",
"SourceLanguageCode": "zh"
}

I don’t speak Chinese either. Although some parts of the translation need improvement, the meaning of the article is quite clear: more taxes :-/

Curious about how this works? Read on :)

Sequence2Sequence (seq2seq)

seq2seq is a supervised Deep Learning algorithm where the input is a sequence of tokens (such as the words of a text) and the output is another sequence of tokens. It is a popular choice for Machine Translation applications.
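
To make the tokens-in, tokens-out idea concrete, here is a toy sketch of a decoding loop: the target sequence is produced one token at a time until an end-of-sequence marker shows up. The `toy_next_token` function and its two-word lexicon are placeholders of mine standing in for a trained neural network; a real seq2seq model would score every word in its vocabulary given the source and the tokens generated so far, and greedy decoding would keep the best one at each step. This is an illustration only, not how Amazon Translate or Sockeye is implemented.

```python
BOS, EOS = "<s>", "</s>"  # begin / end of sequence markers


def decode(source_tokens, next_token, max_len=20):
    """Build the target sequence one token at a time."""
    target = [BOS]
    while len(target) <= max_len:  # emit at most max_len tokens
        token = next_token(source_tokens, target)
        if token == EOS:
            break
        target.append(token)
    return target[1:]  # drop the BOS marker


# Hypothetical stand-in for a trained model: a word-by-word lookup table.
TOY_LEXICON = {"guten": "good", "morgen": "morning"}


def toy_next_token(source_tokens, target_so_far):
    position = len(target_so_far) - 1  # tokens emitted so far
    if position >= len(source_tokens):
        return EOS
    return TOY_LEXICON.get(source_tokens[position], "<unk>")


print(decode("guten morgen".split(), toy_next_token))  # ['good', 'morning']
```

The interesting part of a real model is of course `next_token`: it is free to reorder words and to emit more or fewer tokens than the input has, which a lookup table cannot do.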

As a sidenote, seq2seq is one of the built-in algorithms available in Amazon SageMaker. You’ll find a high-level description here.

A few months ago, AWS open-sourced a project named Sockeye, a sequence-to-sequence framework for Neural Machine Translation based on Apache MXNet.

Let’s take a look, shall we?

Sockeye

Sockeye includes a nice tutorial on building a model to translate German to English. It relies on the WMT news dataset, a great resource if you want to train other language pairs.

Following the instructions and letting the model train for 5–6 hours on a p2.8xlarge instance, I obtained my first model without any problem. Great job on the tutorial, Sockeye team :)

I threw the actual translation commands into a small shell script. Here are some samples taken from Wikipedia and German news websites.

./translate.sh "Mexiko schätzt Schäden auf mehr als zwei Milliarden Dollar ."
Mexico estimates damage to more than $ 2 billion .
./translate.sh "Chopin zählt zu den bedeutendsten Persönlichkeiten der Musikgeschichte Polens ."
Chopin is one of the most important personalities of Poland 's history .
./translate.sh "Hotelbetreiber müssen künftig nur den Rundfunkbeitrag bezahlen, wenn ihre Zimmer auch eine Empfangsmöglichkeit bieten ."
in the future , hotel operators must pay only the broadcasting fee if their rooms also offer a reception facility .
./translate.sh "Die Parker Solar Probe wird das erste Raumfahrzeug sein, dass tief in die Atmosphäre der Sonne eindringt ."
the Parker Solar Probe is the first space vehicle to penetrate deep into the atmosphere of the sun .

Pretty good results. I’m sure this would be even better with longer training times.

Conclusion

So there you have it. You can either use Amazon Translate as a fully managed translation service, or you can build and train your own model with Sockeye. Choice is king!

That’s it for today. Thank you for reading.


“Tosca” was the soundtrack to this post. Please take a few minutes to listen to this. The cry of a painter tortured and sentenced to death by a tyrant (words in Italian and English here). Blood-freezing every single time.

