Deciphering Google Translate’s subconscious

Ido Ben-Shaul
Coinmonks
7 min read · Aug 4, 2018


In today’s era of exponential technological growth, it is hard to even comprehend the enormous leaps happening all around us. In Machine Learning, this leap seems so huge that the average person doesn’t grasp the field’s capabilities, and furthermore does not understand the process behind them. Ever since the “Renaissance” of deep learning began (around 2014), the core argument against the field has been simple: how do we understand the machine’s understanding? In other words, what does the network learn, and how are we able to monitor its behavior?

It all started when a co-worker came up to me, exhilarated, and said: “Ido, you have to check this out!”. He then opened the Google Translate website, switched the input language to Somali, and started typing the word “Soi” repeatedly.

At first, the translation was what one might expect: just the word “Soi” in English. As far as I know, the word “Soi” has no meaning in Somali, so you might not expect much from the translation to English. What happened next was the kicker: as we kept adding the word “Soi” again and again, the output started looking bizarre. It was spitting out real sentences, and honestly, creepy ones.

Just me messing around. I encourage you to play with this; it’s a lot of fun

At first I thought I was being trolled; I figured a Google engineer was having lots of fun with users typing random letters as input. But then I started looking closer. All over the internet there were examples of gibberish inputs with full sentences as output: passages that could have come out of a book, things that look like they were taken from a .js file, or texts about a soon-to-be apocalypse.

First, a bit of history. Google announced its first Machine Translation model back in 2006: a statistical model relying mostly on something called Phrase-Based Machine Translation (PBMT). The aim was simple: to translate full sentences and paragraphs, rather than working word by word as every model before it had. Then, along with the hype around new Machine Learning models, Google released the following paper in 2016:

This model, called “Google Neural Machine Translation” (GNMT), was light-years more advanced than the previous one, as it harnessed the power of Deep Neural Networks.

What is GNMT?

The problem with PBMT is that in the process of splitting the content into small parts, you lose the true meaning of the overall sentence. GNMT aims to solve this issue. It is made of two main parts: the Encoder and the Decoder. Think of the Encoder as taking the input (in the source language) and turning it into “machine language”. The Decoder, in turn, is responsible for taking this vector of tokens in “machine language” form and outputting a translation in the target language.
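To make that division of labor concrete, here is a deliberately toy sketch. The stub functions below are my own illustration (nothing like Google’s real networks); the only structural point is that translation is the Decoder composed with the Encoder.

```python
from typing import List

def encode(source_tokens: List[str]) -> List[float]:
    """Stand-in for the deep network that maps a sentence into 'machine language'."""
    # Here we just fake a vector; GNMT uses a deep recurrent network.
    return [float(len(token)) for token in source_tokens]

def decode(thought_vector: List[float]) -> List[str]:
    """Stand-in for the deep network that emits target-language tokens."""
    return [f"word_{int(value)}" for value in thought_vector]

def translate(source_tokens: List[str]) -> List[str]:
    # The whole pipeline: Decoder(Encoder(input)).
    return decode(encode(source_tokens))

print(translate(["I", "have", "a", "white", "dog"]))
```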

LSTMs:

A relatively new (yet old) model that has been groundbreaking in many fields of Machine Learning is a variation of the LSTM model. This model is essentially a smarter RNN. For those of us newer to the field: an RNN (Recurrent Neural Network) is a model able to deal with variable-size inputs and outputs (as in machine translation), and it looks a bit funny compared to the classic FNNs and CNNs. I will not go too much into specifics here, but what the LSTM (Long Short-Term Memory) aims to conquer is building the next piece of the output using the last piece of input (short term), while keeping in mind all the knowledge it has acquired so far (long term). The input given at a certain time is entered into the network and helps it predict the most plausible next outcome.
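For a hands-on feel, here is a minimal PyTorch example (my own illustration, not GNMT’s code) of an LSTM consuming a variable-length sequence of token embeddings while carrying a hidden state (short-term) and a cell state (long-term memory) across steps:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

embed_dim, hidden_dim = 8, 16
lstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_dim, batch_first=True)

# A fake "sentence" of 5 token embeddings (batch of 1).
sentence = torch.randn(1, 5, embed_dim)

outputs, (h_n, c_n) = lstm(sentence)
print(outputs.shape)  # torch.Size([1, 5, 16]): one output per input token
print(h_n.shape)      # torch.Size([1, 1, 16]): final short-term hidden state
print(c_n.shape)      # torch.Size([1, 1, 16]): final long-term cell state
```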

So how are LSTMs connected to GNMT? Remember the Encoder and Decoder we were talking about? Well, they are basically a bunch of LSTM layers stacked on top of each other, giving the model the capacity to understand advanced (deep) concepts, like a human language. Of course, this is a HUGE understatement: the GNMT model is far more complex than I’m making it sound, but the main idea is composing these nice little units. For a further look into the architecture, I recommend this great blog post:
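Separately from that post, here is a hedged sketch of the stacking idea: several LSTM layers form the encoder, and their final states seed a decoder stack. This is only a skeleton of my own; the real GNMT adds attention, residual connections, and eight layers per side.

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim, num_layers = 8, 16, 4

# Stacks of LSTM layers, one for each side of the model.
encoder = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, batch_first=True)
decoder = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, batch_first=True)

source = torch.randn(1, 7, embed_dim)            # 7 source-token embeddings
_, (h, c) = encoder(source)                      # compress the source into states

target_so_far = torch.randn(1, 3, embed_dim)     # 3 target tokens generated so far
decoder_out, _ = decoder(target_so_far, (h, c))  # decode conditioned on the source
print(decoder_out.shape)                         # torch.Size([1, 3, 16])
```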

Along with this nice-looking model, something amazing happened, the real miracle from my point of view. It’s called “Zero-Shot Translation”. What this basically means: if I were to train the machine to translate the sentence “I have a white dog” from English to Spanish, and also from Spanish to French, the machine would infer from that training how to translate the sentence from English to French. This might seem like a given, but it’s far from it. What happens is that the machine actually develops an understanding of the semantic meaning of the sentence, as opposed to just the words in it. The semantic meaning of “I have a white dog” is one uniform notion, and the fact that the machine is able to identify its similarity across different languages is remarkable.
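Mechanically, Google’s multilingual system wires this up in a surprisingly simple way: one model is trained on several language pairs, and an artificial token prepended to the source sentence tells the decoder which target language to produce. The sketch below is my own stub of that convention (the token format mirrors the paper; nothing else here is real code). Pairs never seen in training can still work because all languages share one semantic space.

```python
def prepare(source_sentence: str, target_lang: str) -> str:
    """Prepend the artificial target-language token, e.g. '<2fr>'."""
    return f"<2{target_lang}> {source_sentence}"

# A direction seen in training (say, English->Spanish):
print(prepare("I have a white dog", "es"))  # "<2es> I have a white dog"
# The very same model can then be asked for an unseen direction:
print(prepare("I have a white dog", "fr"))  # zero-shot: English->French
```

The figure below shows this shared semantic space. Taken from: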

Figure 2: A t-SNE projection of the embedding of 74 semantically identical sentences translated across all 6 possible directions, yielding a total of 9,978 steps (dots in the image), from the model trained on English↔Japanese and English↔Korean examples. (a) A bird’s-eye view of the embedding, coloring by the index of the semantic sentence. Well-defined clusters each having a single color are apparent. (b) A zoomed-in view of one of the clusters with the same coloring. All of the sentences within this cluster are translations of “The stratosphere extends from about 10km to about 50km in altitude.” (c) The same cluster colored by source language. All three source languages can be seen within this cluster.

https://research.google.com/bigpicture/

This graph portrays the idea more intuitively. Here we see sentences in different languages with the same semantic meaning, and as shown, sentences with identical meaning cluster together across languages. Of course, this is a big dimensionality reduction (t-SNE), and the underlying vectors are of huge dimension, but the reduction does show us the main intuition: GNMT understands the semantic idea of a sentence.
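For intuition on the technique itself, here is a small sketch of the same kind of visualization with stand-in data: random “sentence embeddings” are copied across three pretend languages with a little noise, and t-SNE squeezes them down to 2-D, where each semantic sentence should form one tight cluster. Everything here is synthetic; only the method (scikit-learn’s TSNE) is real.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# 74 pretend "semantic sentences", each embedded from 3 languages.
centers = rng.normal(size=(74, 1024))
embeddings = np.vstack([centers + 0.01 * rng.normal(size=centers.shape)
                        for _ in range(3)])  # 3 noisy copies per sentence

# Reduce 1024 dimensions down to 2 for plotting.
low_dim = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
print(low_dim.shape)  # (222, 2): ready to scatter-plot, colored by sentence index
```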

Why does GNMT give us bizarre translations for gibberish?!

As far as this part goes, these are JUST speculations. From what I have read, there seems to be no formal comment from Google on the subject, so the best I can do is try putting the pieces together.

The theory we can derive from the success of Zero-Shot Translation is that we actually have a model that inherently understands semantics. That is, it has figured out, through rigorous training, that things of similar meaning should be “close” together in one way or another. Let’s assume, WLOG, that the input and output of the model are of constant size (this is of course far from the case, as sentences in different languages can be of different lengths, and even in the same language not all sentences are the same length, but it makes the intuition easier). We will label the input language A, the machine “token language” (excuse the informality) L, and the output language B. Then we have a function that, for a sentence s in A, calculates a matching vector in L (through the Encoder), and then calculates the mapping of this vector into B (through the Decoder).

If I were to ask you how good this mapping is, how would you grade it? You’d probably expect that, for any logical sentence in A, the matching sentence in B has the same meaning. But how about the mapping of random letters in A? Would you require the model to translate them in some particular way? That is, how would you translate “sd ds fd” into Spanish? Not so easy, is it? More importantly, it’s not well-defined. Translating gibberish sentences is not part of the goal, but since we have a mapping (in the loosest sense possible), these sentences will still be mapped to some sentence in B.
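Here is a toy illustration of that argument, entirely my own construction: because the encoder is a total function, every input, gibberish included, lands somewhere in the space L, and the decoder then emits the nearest sentence it knows. No step of this errors out; the function is simply defined everywhere.

```python
import numpy as np

def encode(sentence: str) -> np.ndarray:
    """Stub encoder: hash characters into a vector. Defined on ALL strings."""
    vec = np.zeros(16)
    for i, ch in enumerate(sentence):
        vec[i % 16] += ord(ch)
    return vec / (np.linalg.norm(vec) + 1e-9)

# A tiny "output language B" with known embeddings (made-up Spanish sentences).
known = {
    "tengo un perro blanco": encode("tengo un perro blanco"),
    "el fin se acerca": encode("el fin se acerca"),
}

def decode(vec: np.ndarray) -> str:
    """Stub decoder: return the nearest known target sentence."""
    return max(known, key=lambda s: float(known[s] @ vec))

# Gibberish still maps to *some* real-looking sentence, by construction.
print(decode(encode("sd ds fd")))
```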

The outcomes still seem weird

Yes, they seem weird, and frankly hilarious a lot of the time. But from what I can see, this is not a glitch! It’s not an error. The machine has developed some sense of “semantic closeness” in its huge, thousand-dimensional vector space, where it is able to understand (pretty well, if I might add) the idea behind a sentence. It has never been constrained on how to translate sentences that make no sense, so it has found a “function” that matches its criteria: a function that can take a sentence and translate it at a level not yet seen in Machine Translation, closing in on human translation. The point being: if you tell someone to learn to play basketball, you cannot get mad at him for not doing so well at football, especially when his basketball skills are at an expert level.
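That “semantic closeness” ultimately boils down to distance in a vector space. As a hedged toy (random stand-in vectors, not real embeddings), cosine similarity is one common way such closeness is measured:

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: ~1.0 for identical directions, ~0.0 for unrelated ones."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(1)
dog_en = rng.normal(size=1024)                  # "I have a white dog" (English)
dog_es = dog_en + 0.05 * rng.normal(size=1024)  # same meaning, tiny offset (Spanish)
weather = rng.normal(size=1024)                 # an unrelated sentence

print(cosine(dog_en, dog_es))   # close to 1.0: semantically "close"
print(cosine(dog_en, weather))  # close to 0.0: semantically far
```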

Defending Neural Networks

The biggest leap in Neural Networks, and in Machine Learning in general, is that rather than fitting a regression in which you supply the exact parameters you believe the model needs in order to learn, you give it the ability to learn the concept in the way most “comfortable” to it. In this field, a lot of human traits tend to get attributed to the models, but ultimately that is what we are trying to accomplish. Neural networks are often hard to understand, and they may act strangely on some tasks, but the whole beauty is in the fact that they come up with these “unique” methods on their own. That’s learning.


I’m a Researcher at eBay and a PhD candidate in Applied Math. You can also check out my Twitter: https://twitter.com/ml_norms