Machine Learning is Fun Part 5: Language Translation with Deep Learning and the Magic of Sequences

Making Computers Translate

So how do we program a computer to translate human language?

We just replace each Spanish word with the matching English word.

Making Computers Translate Better Using Statistics

After the failure of rule-based systems, new translation approaches were developed using models based on probability and statistics instead of grammar rules.

Training data is usually exciting! But this is just millions and millions of lines of dry government documents…

Thinking in Probabilities

The fundamental difference with statistical translation systems is that they don’t try to generate one exact translation. Instead, they generate thousands of possible translations and then they rank those translations by likely each is to be correct. They estimate how “correct” something is by how similar it is to the training data. Here’s how it works:

Step 1: Break original sentence into chunks

First, we break up our sentence into simple chunks that can each be easily translated:

Step 2: Find all possible translations for each chunk

Next, we will translate each of these chunks by finding all the ways humans have translated those same chunks of words in our training data.

Even the most common phrases have lots of possible translations.

Step 3: Generate all possible sentences and find the most likely one

Next, we will use every possible combination of these chunks to generate a bunch of possible sentences.

Statistical Machine Translation was a Huge Milestone

Statistical machine translation systems perform much better than rule-based systems if you give them enough training data. Franz Josef Och improved on these ideas and used them to build Google Translate in the early 2000s. Machine Translation was finally available to the world.

The Limitations of Statistical Machine Translation

Statistical machine translation systems work well, but they are complicated to build and maintain. Every new pair of languages you want to translate requires experts to tweak and tune a new multi-step translation pipeline.

Making Computers Translate Better — Without all those Expensive People

The holy grail of machine translation is a black box system that learns how to translate by itself— just by looking at training data. With Statistical Machine Translation, humans are still needed to build and tweak the multi-step statistical models.

Recurrent Neural Networks

We’ve already talked about recurrent neural networks in Part 2, but let’s quickly review.

Humans hate him: 1 weird trick that makes machines smarter!
This is one way you could implement “autocorrect” for a smart phone’s keyboard app


The other idea we need to review is Encodings. We talked about encodings in Part 4 as part of face recognition. To explain encodings, let’s take a slight detour into how we can tell two different people apart with a computer.

I love this dumb gif from CSI so much that I’ll use it again — because it is somehow manages to demonstrate this idea clearly while also being total nonsense.
These facial feature measurements are generated by a neural net that was trained to make sure different people’s faces resulted in different numbers.
This list of numbers represents the English sentence “Machine Learning is Fun!”. A different sentence would be represented by a different set of numbers.
Because the RNN has a “memory” of each word that passed through it, the final encoding it calculates represents all the words in the sentence.

Let’s Translate!

Ok, so we know how to use an RNN to encode a sentence into a set of unique numbers. How does that help us? Here’s where things get really cool!

  • This approach is mostly limited by the amount of training data you have and the amount of computer power you can throw at it. Machine learning researchers only invented this two years ago, but it’s already performing as well as statistical machine translation systems that took 20 years to develop.
  • This doesn’t depend on knowing any rules about human language. The algorithm figures out those rules itself. This means you don’t need experts to tune every step of your translation pipeline. The computer does that for you.
  • This approach works for almost any kind of sequence-to-sequence problem! And it turns out that lots of interesting problems are sequence-to-sequence problems. Read on for other cool things you can do!

Building your own Sequence-to-Sequence Translation System

If you want to build your own language translation system, there’s a working demo included with TensorFlow that will translate between English and French. However, this is not for the faint of heart or for those with limited budgets. This technology is still new and very resource intensive. Even if you have a fast computer with a high-end video card, it might take about a month of continuous processing time to train your own language translation system.

The Ridiculous Power of Sequence-to-Sequence Models

So what else can we do with sequence-to-sequence models?

Image from this paper by Andrej Karpathy
Example from



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Adam Geitgey

Adam Geitgey


Interested in computers and machine learning. Likes to write about it.