Mr. Translator
Published in

Mr. Translator

What is the basic flow of statistical machine translation?

(1) Sentence cutting

The key training resource for a statistical machine translation system is a bilingual parallel corpus: a large collection of sentence pairs that are translations of each other.
Whatever the language pair, real bilingual corpora are always limited in size (and sometimes scarce), so they cannot cover every sentence of a language.
To learn as much as possible from the corpus, the bilingual sentence pairs in the training data are cut into smaller fragments. During translation, these fragments are matched against the source-language sentence, and the target-language fragments corresponding to the matched source fragments are spliced together into a complete target-language sentence.
These fragments can be continuous or discontinuous, and they may or may not be linguistically meaningful.

(2) Word alignment

Cutting a bilingual parallel corpus into fine-grained bilingual fragments requires that the source side and the target side of each fragment be semantically consistent. This depends on building a high-quality word-level correspondence between the two sentences of each pair, which is called word alignment.
A word alignment link can be one-to-many or many-to-one, and a word can also be left unaligned (that is, it has no counterpart on the other side).
Word alignment determines the basic, indivisible correspondences between source-language and target-language sentences in the bilingual data, and it is the basis for all later processing in machine translation.
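As a small illustration of these link types, an alignment can be stored as a set of (source index, target index) pairs. The sentence pair and the links below are invented for illustration, and the helper name links_by_target is hypothetical; this is a sketch of one possible representation, not the format of any particular alignment tool.

```python
# Toy sentence pair (illustrative only).
source = ["je", "ne", "sais", "pas"]
target = ["i", "do", "not", "know"]

# Each link (i, j) says that source word i corresponds to target word j.
# "ne" and "pas" both link to "not" (many-to-one); "do" has no link.
alignment = {(0, 0), (1, 2), (3, 2), (2, 3)}


def links_by_target(alignment, target_len):
    """Group the aligned source positions under each target position."""
    grouped = {j: [] for j in range(target_len)}
    for i, j in sorted(alignment):
        grouped[j].append(i)
    return grouped


grouped = links_by_target(alignment, len(target))

# Target words with no counterpart on the source side.
unaligned_target = [target[j] for j, srcs in grouped.items() if not srcs]

# Target words linked to more than one source word.
many_to_one = {
    target[j]: [source[i] for i in srcs]
    for j, srcs in grouped.items()
    if len(srcs) > 1
}

print(unaligned_target)  # ['do']
print(many_to_one)       # {'not': ['ne', 'pas']}
```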

(3) Translation model

Based on the word alignment of the bilingual corpus, the bilingual data can be segmented at a fine granularity and basic units of translation knowledge can be extracted; these are used to build a translation model.
The granularity of the bilingual fragments can be coarse or fine: shorter fragments match more sentences but, because they carry little context, are more ambiguous; longer fragments translate more accurately but match fewer sentences.
When the word is the basic translation unit, the result is a word-based translation model. When any continuous segment that satisfies the word alignment constraints is the basic translation unit, the result is the phrase-based translation model (Koehn et al., 2003).
Other translation models can be obtained by cutting according to other kinds of information.
For example, cutting and extracting according to linguistic syntax trees gives a linguistically syntactic translation model (Zhang et al., 2011); cutting and extracting according to a formal grammar gives formal-grammar-based translation models, such as the inversion transduction grammar model (Wu, 1997; Xiong et al., 2006) and the hierarchical phrase-based translation model built on synchronous probabilistic context-free grammar (Chiang, 2007).
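To make "continuous segments that conform to the word alignment constraint" more concrete, here is a minimal sketch of phrase pair extraction. It is a simplified version of the usual consistency check: a source span and a target span form a phrase pair only if no word inside either span is aligned to a word outside the other. The function name, the toy sentence pair and the max_len limit are assumptions made for this example, and the common refinement of extending spans over unaligned boundary words is left out.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=3):
    """Return phrase pairs whose words are aligned only to each other."""
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for i, j in alignment if s_start <= i <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency check: no word in the target span may be
            # linked to a source word outside the source span.
            violates = any(
                t_start <= j <= t_end and not (s_start <= i <= s_end)
                for i, j in alignment
            )
            if violates:
                continue
            pairs.add((
                " ".join(src[s_start:s_end + 1]),
                " ".join(tgt[t_start:t_end + 1]),
            ))
    return pairs


# Toy example with a noun/adjective swap (illustrative only).
src = ["la", "maison", "bleue"]
tgt = ["the", "blue", "house"]
alignment = {(0, 0), (1, 2), (2, 1)}

pairs = extract_phrase_pairs(src, tgt, alignment)
for pair in sorted(pairs):
    print(pair)
```

In this toy case "la maison" yields no phrase pair, because its target span would have to include "blue", which is aligned to "bleue" outside the source span. "maison bleue" and "blue house" are extracted together as one unit.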

The translation models above are generally trained with frequency-based parameter estimation, such as maximum likelihood estimation and the EM algorithm; discriminative training methods are also used.
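For fragment pairs that have already been extracted, maximum likelihood estimation reduces to relative frequency: the probability of a target fragment given a source fragment is the count of the pair divided by the count of the source fragment. The sketch below assumes a hypothetical list of observed pairs; the function name and the data are invented for illustration.

```python
from collections import Counter


def relative_frequency(phrase_pairs):
    """Estimate p(target | source) = count(source, target) / count(source)."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(src for src, _ in phrase_pairs)
    return {
        (src, tgt): count / source_counts[src]
        for (src, tgt), count in pair_counts.items()
    }


# Hypothetical fragment pairs collected from a training corpus.
observed = [
    ("haus", "house"),
    ("haus", "house"),
    ("haus", "home"),
    ("klein", "small"),
]

table = relative_frequency(observed)
for (src, tgt), prob in sorted(table.items()):
    print(f"p({tgt} | {src}) = {prob:.3f}")
```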

(4) Language model and reordering model

In addition to the translation model, two other important models are introduced into the framework of a statistical machine translation system.
One is the language model, which measures the fluency of the target translation, so that the highest-probability candidate better matches the usage of the target language. The other is the reordering model, which captures word order differences between the source and target languages and arranges the target-language fragments in an order that conforms, as far as possible, to the grammar of the target language.
In many cases the reordering model is integrated into the translation model; for example, a syntactic translation model usually performs reordering implicitly.
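A rough idea of how a language model scores fluency can be given with a bigram model and add-one smoothing. This is only a sketch under simplifying assumptions: the three-sentence corpus is invented, the function names are hypothetical, and practical systems use higher-order n-grams with better smoothing. It covers the language model only, not the reordering model.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigrams and their left contexts over a tokenized corpus."""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    # Words that can be predicted: every corpus word plus the end marker.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log probability of a sentence under add-one smoothed bigrams."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


# Tiny target-language corpus (illustrative only).
corpus = ["the house is small", "the house is blue", "the car is small"]
model = train_bigram_lm(corpus)

fluent = log_prob("the house is small", model)
scrambled = log_prob("house the small is", model)
print(fluent > scrambled)  # True: the fluent word order scores higher
```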

(5) Other auxiliary models

In addition to the three main models above, a statistical machine translation framework based on the log-linear model can incorporate other informative feature functions or models, such as the length of the target sentence.
Adjusting the weights of the various models and feature functions is also important for producing high-quality translations. This process is called parameter training (or tuning): the weights of each model are adjusted iteratively against an evaluation metric until the process converges to within a threshold or a predetermined number of iterations is completed.
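In a log-linear framework, the score of a candidate translation is a weighted sum of feature values, and the weights are what parameter training adjusts. The sketch below shows only the scoring step. The feature names, feature values and weights are all made up for illustration, and the tuning loop itself is not shown.

```python
def loglinear_score(features, weights):
    """Weighted sum of feature values: sum over i of weight_i * feature_i."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical weights, one per model or feature function.
weights = {"tm": 1.0, "lm": 0.5, "reorder": 0.3, "length": 0.2}

# Hypothetical feature values for two candidates: log scores from the
# translation, language and reordering models, plus the target length.
candidates = {
    "the house is small": {"tm": -1.2, "lm": -2.0, "reorder": 0.0, "length": 4},
    "small is the house": {"tm": -1.2, "lm": -5.5, "reorder": -2.0, "length": 4},
}

scores = {c: loglinear_score(f, weights) for c, f in candidates.items()}
best = max(scores, key=scores.get)
print(best)  # the house is small
```

Changing the weights changes which candidate wins, which is why they are tuned against an evaluation metric rather than set by hand.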

(6) Decoder

Once the model framework is in place, the translation system enumerates the fragments of the source-language sentence, matches each against candidate target-language fragments from the translation model, and combines them into the target-language sentence with the highest probability.
The number of candidate translations formed in this process grows exponentially with the length of the source sentence, which creates a very large search space. Current computing hardware cannot traverse every combination of translation fragments in a feasible time to find the optimal solution.
Therefore, while generating translation candidates, operations such as pruning and merging of search paths are used to improve decoding efficiency, and a suitable heuristic algorithm is used to find an approximate solution.
Finally, the candidate ranked highest by model score is selected as the final translation.
The process of searching for the best translation candidate is called decoding, and the module that performs it is called the decoder.
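The search described above can be sketched as a small beam search. This toy decoder makes strong simplifying assumptions: it translates strictly left to right with no reordering, it scores hypotheses with translation probabilities only, and its phrase table is invented for the example. The function name and parameters are hypothetical. It does show the two efficiency devices mentioned above: pruning (only the best few hypotheses per stack are expanded) and merging (identical partial translations reached by different segmentations keep only their best score).

```python
import heapq
import math


def decode_monotone(source, phrase_table, beam_size=3, max_phrase_len=2):
    """Left-to-right beam search over phrase segmentations."""
    # stacks[n] maps each partial translation covering the first n
    # source words to its best log score.
    stacks = [dict() for _ in range(len(source) + 1)]
    stacks[0][""] = 0.0
    for n in range(len(source)):
        # Pruning: expand only the best `beam_size` hypotheses.
        beam = heapq.nlargest(beam_size, stacks[n].items(), key=lambda kv: kv[1])
        for output, score in beam:
            for length in range(1, max_phrase_len + 1):
                if n + length > len(source):
                    break
                phrase = tuple(source[n:n + length])
                for target, logp in phrase_table.get(phrase, []):
                    new_output = (output + " " + target).strip()
                    new_score = score + logp
                    stack = stacks[n + length]
                    # Merging: keep one entry per identical output.
                    if new_score > stack.get(new_output, float("-inf")):
                        stack[new_output] = new_score
    if not stacks[-1]:
        return None  # no complete translation was found
    return max(stacks[-1].items(), key=lambda kv: kv[1])


# Hypothetical phrase table: source phrase -> [(target, log probability)].
phrase_table = {
    ("das",): [("the", math.log(0.7)), ("that", math.log(0.3))],
    ("haus",): [("house", math.log(0.8)), ("home", math.log(0.2))],
    ("das", "haus"): [("the house", math.log(0.9))],
    ("ist",): [("is", math.log(1.0))],
    ("klein",): [("small", math.log(0.6)), ("little", math.log(0.4))],
}

result = decode_monotone("das haus ist klein".split(), phrase_table)
print(result[0])  # the house is small
```

Here "the house" is reached both as one two-word phrase and as two single-word phrases; only the higher-scoring path is kept. A realistic decoder would also add language model and reordering scores and allow fragments to be translated out of order.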




Interpreter & Dictionary provided by Tencent Cloud & Smart Industries Business Group (CSIG)

Chier Hu
