Microsoft Paper Introduces First GEC System To Reach Human-Level Performance

A new research paper demonstrating the first automatic grammatical error correction (GEC) system to reach human-level performance has been published on arXiv by Tao Ge, Furu Wei and Ming Zhou of the Natural Language Computing Group, Microsoft Research Asia. The paper is titled Reaching Human-Level Performance in Automatic Grammatical Error Correction: An Empirical Study.

The paper’s authors have indicated substantial text overlap with Fluency Boost Learning and Inference for Neural Grammatical Error Correction, accepted by ACL 2018.

In recent years, neural sequence-to-sequence (seq2seq) models have become a proven approach to grammatical error correction (GEC). But most seq2seq models for GEC share two weaknesses: they can be trained only on a limited number of error-corrected sentence pairs, and their single-round inference can fully correct only sentences with few grammatical errors. When a particular error leaves the surrounding context unclear, it can confuse the model's subsequent corrections.

Figure 1: (a) an error-corrected sentence pair; (b) if the sentence becomes slightly different, the model fails to correct it perfectly; (c) single-round seq2seq inference cannot perfectly correct the sentence, but multi-round inference can.

To address these limitations, the paper proposes a novel fluency boost learning and inference mechanism based on the seq2seq framework.

For fluency boost learning, in addition to the original error-corrected sentence pairs, the mechanism generates new training pairs during training: less fluent sentences (e.g., from the seq2seq model's n-best outputs) are paired with their correct counterparts and used as additional training instances in subsequent epochs. This gives the error correction model more training sentences and accordingly helps improve its generalization ability.

As Figure 2(a) shows, the error-corrected sentence pairs generated during training by pairing the less fluent sentences with their correct counterparts are called fluency boost sentence pairs in the paper.
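The pairing step above can be sketched in a few lines. This is an illustrative sketch, not the paper's code: the function names (`fluency`, `boost_pairs`) and the fluency proxy are assumptions, and the paper derives fluency from a language model's cross-entropy rather than an arbitrary scorer.

```python
def fluency(sentence, lm_score):
    """Fluency proxy: higher means more fluent. `lm_score` is assumed to
    return a sentence-level cross-entropy from a language model."""
    return 1.0 / (1.0 + lm_score(sentence))

def boost_pairs(correct, nbest_outputs, lm_score):
    """Pair each n-best output that is less fluent than the correct
    sentence with that correct sentence, yielding extra training pairs."""
    target_fluency = fluency(correct, lm_score)
    return [(candidate, correct)
            for candidate in nbest_outputs
            if fluency(candidate, lm_score) < target_fluency]
```

In each subsequent training epoch, these generated pairs would be mixed into the training set alongside the original error-corrected pairs.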

Figure 2: Fluency boost learning and inference: (a) given a training instance (i.e., an error-corrected sentence pair), fluency boost learning establishes multiple fluency boost sentence pairs from the seq2seq’s n-best outputs during training. The fluency boost sentence pairs will be used as training instances in subsequent training epochs, which helps expand the training set and accordingly benefits model learning; (b) fluency boost inference allows an error correction model to correct a sentence incrementally through multi-round seq2seq inference as long as its fluency can be improved.

For model inference, a fluency boost inference mechanism corrects sentences incrementally through multi-round inference, continuing as long as the proposed edits boost the sentence's fluency, as Figure 2(b) shows.
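The multi-round loop can be sketched as follows. Again this is a minimal sketch under stated assumptions: `correct_once` stands in for one round of seq2seq decoding, `fluency` for a language-model-based fluency score, and `max_rounds` is an illustrative safeguard not specified here.

```python
def fluency_boost_inference(sentence, correct_once, fluency, max_rounds=5):
    """Repeatedly apply single-round correction (`correct_once`),
    accepting each new prediction only while it improves fluency."""
    current = sentence
    for _ in range(max_rounds):
        candidate = correct_once(current)
        if fluency(candidate) <= fluency(current):
            break  # no further fluency gain: stop iterating
        current = candidate
    return current
```

The stopping rule is what distinguishes this from naively re-running the model: a round that does not raise the fluency score is discarded, so the output can only get more fluent.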

The paper also proposes a “round-way correction” approach that extends the basic fluency boost inference idea with two seq2seq models whose decoding orders are left-to-right and right-to-left respectively, so that the output of one model can be further corrected by the other. Round-way correction yields a significant improvement in recall.
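One possible instantiation of the idea is sketched below; the exact combination scheme in the paper may differ. The premise is simply that the two decoding orders make different mistakes, so chaining them lets each catch errors the other misses. All names here are illustrative.

```python
def round_way_correct(sentence, l2r_correct, r2l_correct, fluency):
    """Illustrative round-way correction: run the input through both
    model orderings (right-to-left then left-to-right, and the reverse)
    and keep whichever final output scores as more fluent."""
    a = l2r_correct(r2l_correct(sentence))
    b = r2l_correct(l2r_correct(sentence))
    return a if fluency(a) >= fluency(b) else b
```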

In experiments, the combination of fluency boost learning and inference with convolutional seq2seq models achieved 75.72 F0.5 on the CoNLL-2014 dataset with 10 annotations and 62.42 GLEU on the JFLEG test set. As Table 1 shows, this result makes the GEC system the first to reach human-level performance on both GEC benchmarks.

Table 1: Evaluation results for top-performing GEC systems on the CoNLL and JFLEG datasets. Results marked in red exceed human-level performance.

System outputs for the CoNLL-2014 and JFLEG test sets are available on GitHub at getao/human-performance-gec.

Author: Chenhui Zhang | Editor: Michael Sarazen
