Machine Translation
In this blog, I build a model that functions as part of a machine translation pipeline. The pipeline accepts English text as input and returns the French translation. The goal is to achieve the highest translation accuracy.

Why is machine translation required?
With so many languages in the world, it is impractical to learn each one, and without a shared language it is hard to communicate with people who have different mother tongues. As our world becomes increasingly connected, language translation provides a critical cultural and economic bridge between people from different countries and ethnic groups.
To meet these needs, technology companies are investing heavily in machine translation. This investment and recent advancements in deep learning have yielded major improvements in translation quality.
To translate a corpus of English text to French, we need to build a recurrent neural network (RNN). Before diving into the implementation, let’s first build some intuition of RNNs and why they are useful for NLP tasks.
RNN Overview
RNNs are designed to take sequences of text as inputs or return sequences of text as outputs, or both. They’re called recurrent because the network’s hidden layers have a loop in which the output and cell state from each time step become inputs at the next time step. This recurrence serves as a form of memory. It allows contextual information to flow through the network so that relevant outputs from previous time steps can be applied to network operations at the current time step.
For example, to determine the polarity of a text review, we feed the words into an RNN one at a time, since text is a sequence of words; the RNN preserves the semantic context of the words it has seen. RNNs are also widely used for solving time-series problems.
RNN Setup
Depending on the use case, we will set up our RNN to handle inputs and outputs differently. For this project, we’ll use a many-to-many process where the input is a sequence of English words and the output is a sequence of French words.

Each rectangle is a vector and arrows represent functions (e.g. matrix multiply). Input vectors are in red, output vectors are in blue and green vectors hold the RNN’s state (more on this soon).
From left to right: (1) Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image classification). (2) Sequence output (e.g. image captioning takes an image and outputs a sentence of words). (3) Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment). (4) Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in English and then outputs a sentence in French). (5) Synced sequence input and output (e.g. video classification where we wish to label each frame of the video). Notice that in every case there are no pre-specified constraints on the lengths of the sequences because the recurrent transformation (green) is fixed and can be applied as many times as we like.
— Andrej Karpathy, The Unreasonable Effectiveness of Recurrent Neural Networks
Let’s get started with our project.
These are the steps we will follow:
- Preprocess: convert the text to sequences of integers.
- Models: create models that accept a sequence of integers as input and return a probability distribution over possible translations.
- Prediction: run the model on English text.
Load data set
The most common datasets used for machine translation come from WMT. However, training a neural network on those would take a long time, so we will instead use a data set created for this project that contains a small vocabulary.

Vocabulary for both English and French language data set.

Pre processing
For this project, we cannot use raw text as input to our model. Instead, we will convert the text into sequences of integers using the following preprocessing methods:
1. Tokenize the words into ids.
2. Add padding to make all the sequences the same length.
Tokenize
For a neural network to predict on text data, it first has to be turned into data it can understand. Text data like “dog” is a sequence of ASCII character encodings. Since a neural network is a series of multiplication and addition operations, the input data needs to be number(s).
We can turn each character into a number or each word into a number. These are called character and word ids, respectively. Character ids are used for character level models that generate text predictions for each character. A word level model uses word ids that generate text predictions for each word. Word level models tend to learn better, since they are lower in complexity, so we’ll use those.
Turn each sentence into a sequence of word ids using the Keras `Tokenizer`.
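A minimal sketch of this step might look like the following; the helper function and the example sentences are illustrative, not the project's exact code:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

def tokenize(sentences):
    # Fit a tokenizer on the corpus, then map each sentence
    # to a list of integer word ids.
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(sentences)
    return tokenizer.texts_to_sequences(sentences), tokenizer

sentences = ['the quick brown fox', 'the lazy dog']
sequences, tok = tokenize(sentences)
# Ids are assigned by frequency: 'the' appears in both sentences,
# so it gets the lowest id (1) in both sequences.
```

The tokenizer is kept around because its `word_index` is needed later to map predicted ids back to words.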


Padding
When batching the sequence of word ids together, each sequence needs to be the same length. Since sentences are dynamic in length, we can add padding to the end of the sequences to make them the same length.
Make sure all the English sequences have the same length and all the French sequences have the same length by adding padding to the end of each sequence using the Keras `pad_sequences` function.
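A small sketch of the padding step, wrapping `pad_sequences` with post-padding as described above:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

def pad(sequences, length=None):
    # Zero-pad each sequence at the end ('post') up to the longest
    # sequence in the batch, or to an explicit length if given.
    return pad_sequences(sequences, maxlen=length, padding='post')

padded = pad([[1, 2, 3], [4, 5]])
# -> [[1 2 3]
#     [4 5 0]]
```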



Models
We will use four simple RNN architectures for this project.
- Model 1 is a Simple RNN.
- Model 2 is a RNN with Embedding.
- Model 3 is a Bidirectional RNN.
- Model 4 is an Encoder-Decoder RNN.
Ids Back to Text
The neural network will output word ids, which is not the final form we want; we want the French translation. The function `logits_to_text` will bridge the gap between the logits from the neural network and the French translation.
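A minimal version of this helper could look like the following, assuming the fitted Keras tokenizer's `word_index` is available:

```python
import numpy as np

def logits_to_text(logits, tokenizer):
    # Invert the tokenizer's word -> id mapping, then take the argmax
    # id at each time step and look up the corresponding word.
    index_to_words = {idx: word for word, idx in tokenizer.word_index.items()}
    index_to_words[0] = '<PAD>'  # id 0 is reserved for padding
    return ' '.join(index_to_words[prediction]
                    for prediction in np.argmax(logits, axis=1))
```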

Simple RNN
A basic RNN model is a good baseline for sequence data. In this model, we will build an RNN that translates English to French.
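A minimal sketch of such a baseline: a single GRU layer over the padded English ids (reshaped to one feature per time step), followed by a time-distributed softmax over the French vocabulary. The layer size, sequence length, and vocabulary size below are illustrative, not the project's exact values:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, GRU, TimeDistributed, Dense

def simple_model(sequence_length, french_vocab_size):
    # One GRU over the input ids (one id per time step), then a
    # softmax over the French vocabulary at every time step.
    model = Sequential([
        Input(shape=(sequence_length, 1)),
        GRU(64, return_sequences=True),
        TimeDistributed(Dense(french_vocab_size, activation='softmax')),
    ])
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    return model

model = simple_model(21, 344)  # 21 time steps, 344 French words (illustrative)
```

With `return_sequences=True`, the GRU emits one output per input time step, so input and output sequences share the same length in this baseline.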


I got a test accuracy of 0.9456 and a train accuracy of 0.9736.
Now let’s move on to the next model.
RNN with Embedding

We have turned the words into ids, but there is a better representation of a word. This is called word embeddings. An embedding is a vector representation of the word that is close to similar words in an n-dimensional space, where the n represents the size of the embedding vectors.
Word embeddings provide a dense representation of words and their relative meanings. Word embeddings can be learned from text data and reused among projects. They can also be learned as part of fitting a neural network on text data. In an embedding, words are represented by dense vectors where a vector represents the projection of the word into a continuous vector space.
The position of a word within the vector space is learned from text and is based on the words that surround the word when it is used. The position of a word in the learned vector space is referred to as its embedding.
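A sketch of the previous model with an `Embedding` layer in front of the GRU; sizes are again illustrative (in practice the embedding's input dimension should cover the full vocabulary plus the padding id):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, GRU, TimeDistributed, Dense

def embed_model(sequence_length, english_vocab_size, french_vocab_size):
    # The Embedding layer turns each English word id into a dense
    # 64-d vector that is learned jointly with the rest of the network.
    model = Sequential([
        Input(shape=(sequence_length,)),
        Embedding(english_vocab_size, 64),
        GRU(64, return_sequences=True),
        TimeDistributed(Dense(french_vocab_size, activation='softmax')),
    ])
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    return model

model = embed_model(21, 200, 344)  # illustrative vocabulary sizes
```

Note the input is now the 2-D array of word ids directly; the embedding replaces the manual reshape to one feature per time step.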


I got a test accuracy of 0.9632 and a train accuracy of 0.9832, which is quite good compared to the simple model.
Bidirectional RNN

One restriction of a standard RNN is that it can only use past inputs; it cannot see future inputs. This is where bidirectional recurrent neural networks come in: they are able to see the future data as well.
Bidirectional recurrent neural networks (RNNs) are really just two independent RNNs put together. The input sequence is fed in normal time order to one network and in reverse time order to the other. The outputs of the two networks are usually concatenated at each time step, though there are other options, e.g. summation.
This structure allows the networks to have both backward and forward information about the sequence at every time step.

Let’s see the code snippet.


Encoder-Decoder RNN
The encoder creates a matrix representation of the sentence. The decoder takes this matrix as input and predicts the translation as output.

The Encoder-Decoder architecture with recurrent neural networks has become an effective and standard approach for both neural machine translation (NMT) and sequence-to-sequence (seq2seq) prediction in general.
The key benefits of the approach are the ability to train a single end-to-end model directly on source and target sentences and the ability to handle variable-length input and output sequences of text. An Encoder-Decoder architecture was developed where an input sequence was read entirely and encoded to a fixed-length internal representation.
A decoder network then used this internal representation to output words until the end of sequence token was reached. LSTM networks were used for both the encoder and decoder.
Let’s look at the code snippet.
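A sketch of a simple encoder-decoder, using `RepeatVector` to feed the encoder's fixed-length vector to every decoder time step. The text above mentions LSTM layers; GRUs are used here for consistency with the earlier sketches, and sizes are illustrative:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, GRU, RepeatVector,
                                     TimeDistributed, Dense)

def encdec_model(input_length, output_length, french_vocab_size):
    model = Sequential([
        Input(shape=(input_length, 1)),
        GRU(64),                         # encoder: sentence -> one fixed-length vector
        RepeatVector(output_length),     # repeat it for every decoder time step
        GRU(64, return_sequences=True),  # decoder: unroll over the output length
        TimeDistributed(Dense(french_vocab_size, activation='softmax')),
    ])
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    return model

model = encdec_model(15, 21, 344)  # input and output lengths can now differ
```

Because the encoder compresses the whole sentence into one vector before decoding, this is the only model of the four whose input and output sequence lengths are decoupled.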


With this method, I got a test accuracy of 0.9356 and a train accuracy of 0.9536.
Let’s compare all four models with the help of a PrettyTable.

As we can see, the RNN with embedding gives the best test accuracy.
That’s all about machine translation.
Hope you enjoyed it.
Cheers!
